Visual

Sep 16, 2020

Performance analysis of human and AI for image classification

ImageNet-1K (a 1000-class image classification problem) is a task on which results have improved steadily alongside the development of CNNs. AlexNet, which marked the beginning of the deep learning era, had a top-5 error of about 17%. At that time, the top-5 error of the best existing technique (SIFT+FV) was about 26%...
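For readers unfamiliar with the metric, here is a minimal sketch of how a top-5 error can be computed. The function name `top_k_error` and the toy scores are illustrative assumptions, not taken from the post or from any ImageNet toolkit.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is NOT among the k highest-scoring classes.

    scores: list of per-sample lists of class scores (hypothetical model outputs).
    labels: list of true class indices, one per sample.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example with 6 classes and 2 samples (made-up numbers).
toy_scores = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.0],  # true class 5 has the lowest score: a top-5 miss
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.0],  # true class 0 is within the top 5: a hit
]
toy_labels = [5, 0]
print(top_k_error(toy_scores, toy_labels, k=5))
```

With 1000 classes the computation is the same; a prediction counts as correct whenever the true class appears anywhere among the five highest-scoring classes.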

Sep 14, 2020

Necessity of interaction with AR Glass concept video

Visual, Interaction, Trend

This is an AR Glass concept video created by a designer named Iskander Utebayev. Although it is only a concept video, it is quite polished, and I think that, once implemented, it could significantly change the human-machine interface of smart devices. Apply AI technology…

Sep 11, 2020

Lip2Wav: Generates a voice signal from silent lip movement

Visual, Speech, Code, Data

I have heard that, with special training, a person can tell what someone is saying just from the movements of their lips. The research linked here achieves this with AI.

Sep 7, 2020

Pixar's Super Resolution Technology and Its Applications

Visual, Trend, Code

Deep learning-based super resolution technology was adopted in NVIDIA's latest GPUs under the name DLSS (deep learning super sampling) and became a technology that ships to consumers. Mainly in the 4K gaming market, 2K…

Sep 4, 2020

Implementation feasibility with Google MixNet

Visual

The convolution commonly used on images is a 3D operation (KxKxC, where K is the kernel size and C is the number of channels). Depthwise separable convolution splits this into multiple 2D operations of size KxKx1, one per channel, and then applies a convolution of size 1x1xC along the channel direction, which greatly reduces the number of parameters...
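To make the parameter savings concrete, here is a minimal sketch that counts weights for a standard convolution and for its depthwise separable counterpart. The function names, the output channel count `c_out`, and the example sizes are assumptions chosen for illustration; bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard convolution: one KxKxC_in filter per output channel."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution (biases ignored)."""
    depthwise = k * k * c_in   # one KxKx1 filter per input channel
    pointwise = c_in * c_out   # one 1x1xC_in filter per output channel
    return depthwise + pointwise


# Hypothetical layer sizes, picked only for illustration.
k, c_in, c_out = 3, 64, 128
standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
print(standard)                         # 73728
print(separable)                        # 8768
print(round(standard / separable, 2))   # 8.41
```

For these sizes the separable version needs roughly one eighth of the weights. In general the ratio is 1/C_out + 1/K^2, so the saving grows with both the kernel size and the number of output channels.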

Sep 2, 2020

Creating body movements by voice

Visual, Code

LipGAN is a study on generating mouth shapes from speech signals. The technique can be useful for animating a virtual character's mouth, but in practice its limitation is clear: only the lips move while the character stands still. In fact, humans...

Aug 25, 2020

Multimodal Q&A – Visual Dialog Task

Visual, Interaction, Data

The Visual Dialog task is a multimodal task that adds an image to a question-answering task. For example, given a picture of a white cat and a black dog together and the question "What color is the animal next to the cat?", the answer is "black"...

Aug 21, 2020

Motion Retargeting from Motion, Skeleton and Angle

Visual

Here is the project page for “Learning Character-Agnostic Motion for Motion Retargeting in 2D”, a paper published at SIGGRAPH 2019. The paper extracts motion, skeleton, and camera angle from three (possibly different) images, and then…

Aug 20, 2020

Adobe Mixamo: 3D character model open data

Visual, Data

Here is a link to the Adobe Mixamo site, which is already widely used in game production. The site hosts 121 3D characters and 2,484 character motions, which can be downloaded in the (Autodesk) FBX 3D format. This format...