Visual

Sep 7, 2020

Pixar's Super Resolution Technology and Its Applications

Deep learning-based super resolution was adopted in NVIDIA's latest GPUs under the name DLSS (deep learning super sampling), making it a technology that consumers actually use. Mainly in the 4K gaming market, 2K…

Sep 4, 2020

Visual

Implementation feasibility with Google MixNet

The convolution commonly used on images is a 3D operation (K×K×C; K = kernel size, C = number of channels). Depthwise separable convolution splits it into multiple 2D convolutions of size K×K×1, one per channel, and then applies a 1×1×C convolution in the channel direction, which greatly reduces the number of parameters...
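To make the parameter saving concrete, here is a minimal sketch that counts weights for both layer types. The function names, the separate input/output channel counts (`c_in`, `c_out`), and the example sizes are assumptions for illustration and do not come from the post; biases are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard convolution: one K x K x C_in kernel per output channel."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise separable convolution (biases ignored)."""
    depthwise = k * k * c_in   # one K x K x 1 filter per input channel
    pointwise = c_in * c_out   # one 1 x 1 x C_in filter per output channel
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
k, c_in, c_out = 3, 64, 128
standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
ratio = separable / standard  # algebraically equal to 1/c_out + 1/k**2
print(f"standard={standard}, separable={separable}, ratio={ratio:.3f}")
```

For these assumed sizes the separable layer needs 8,768 weights instead of 73,728, roughly 12% of the original.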

Sep 2, 2020

Visual, Code

Creating body movements by voice

LipGAN is a study on generating mouth shapes from speech signals. The technique can be useful for animating a virtual character's mouth, but in practice its limitation is clear: only the lips move while the character stands still. In fact, humans...

Aug 25, 2020

Visual, Interaction, Data

Multimodal Q&A – Visual Dialog Task

Visual Dialog is a multimodal task that adds an image to a conventional question-and-answer task. For example, given a picture of a white cat and a black dog together and the question "What color is the animal next to the cat?", the answer is "black"...

Aug 21, 2020

Visual

Motion Retargeting from Motion, Skeleton and Angle

We share the project page for “Learning Character-Agnostic Motion for Motion Retargeting in 2D”, a paper published at SIGGRAPH 2019. The paper extracts motion, skeleton, and camera angle from three (possibly different) videos, and then…

Aug 20, 2020

Visual, Data

Adobe Mixamo: 3D character model open data

We share a link to Adobe Mixamo, a site already widely used in game production. The site hosts 121 3D characters and 2,484 character motions, which you can download in a 3D format called (Autodesk) FBX. This format...

Aug 16, 2020

Visual

GAN-based Image Compression

Video compression has its own counterpart to Moore's Law (the number of transistors doubles every two years): MPEG-1 in 1993, MPEG-4/AVC (H.264) in 2003, and MPEG-H/HEVC (H.265) in 2013. For reference, in the case of image compression,…

Aug 10, 2020

Visual, Speech, Code

Speech2Face: face prediction from speech signals

MIT's Speech2Face is a study that generates a speaker's face from a speech signal. Rather than performing the speech-to-face transformation with a single model, however, it combines the results of existing studies built for different purposes to produce impressive results. (The first author is now...

Aug 5, 2020

Visual, Interaction, Data

MIT DriveSeg: data for road situation awareness research

DriveSeg is a dataset created for research on road situation awareness (used for self-driving cars, etc.). Every frame of the video is semantically labeled pixel by pixel across the entire image. The labels are “vehicle, pedestrian, road, sidewalk, bicycle, motorcycle, building,...
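
As a rough illustration of what per-pixel semantic labeling means, the sketch below treats one frame as a 2D grid in which every cell holds a class ID, then counts pixels per class. The class names are the ones visible in the excerpt (the list there is truncated, so this is not the full label set); the integer IDs, the function name, and the toy frame are hypothetical and do not reflect DriveSeg's actual file format.

```python
from collections import Counter

# Class names from the excerpt only; the source list is truncated.
# The integer IDs (list positions) are hypothetical, chosen for this sketch.
CLASSES = ["vehicle", "pedestrian", "road", "sidewalk",
           "bicycle", "motorcycle", "building"]


def class_pixel_counts(mask):
    """Count how many pixels of one frame carry each class label."""
    counts = Counter(class_id for row in mask for class_id in row)
    return {CLASSES[class_id]: n for class_id, n in counts.items()}


# A toy 3x4 "frame": every pixel holds exactly one class ID.
toy_mask = [
    [6, 6, 6, 6],  # building along the top
    [3, 2, 2, 0],  # sidewalk, road, road, vehicle
    [3, 2, 2, 0],
]
pixel_counts = class_pixel_counts(toy_mask)
print(pixel_counts)
```

Because every pixel carries exactly one label, the per-class counts always sum to the frame's width times its height.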