Visual Performance analysis of human and AI for image classification

Implementation feasibility with Google MixNet

The convolution commonly applied to images is a 3D operation with a KxKxC kernel (K = kernel size, C = number of channels). Depthwise separable convolution splits this into C separate 2D operations with KxKx1 kernels, one per channel, and then applies a 1x1xC convolution along the channel direction. This greatly reduces the number of parameters...
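The parameter saving can be checked with a small NumPy sketch. It assumes stride 1, "valid" padding, no biases, and a layer mapping C_in input channels to C_out output channels. The function names below are illustrative only, not taken from MixNet or any library. A standard layer needs K·K·C_in·C_out weights; the separable version needs K·K·C_in (depthwise) plus C_in·C_out (pointwise).

```python
import numpy as np

def standard_conv_params(k, c_in, c_out):
    # One KxKxC_in kernel per output channel (biases ignored)
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one KxKx1 kernel per input channel
    pointwise = c_in * c_out   # one 1x1xC_in kernel per output channel
    return depthwise + pointwise

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """x: (H, W, C_in); dw_kernels: (K, K, C_in); pw_weights: (C_in, C_out).
    Stride 1, 'valid' padding (hypothetical helper for illustration)."""
    h, w, c_in = x.shape
    k = dw_kernels.shape[0]
    out_h, out_w = h - k + 1, w - k + 1
    dw_out = np.zeros((out_h, out_w, c_in))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + k, j:j + k, :]  # (K, K, C_in)
            # Each channel is filtered only by its own KxKx1 kernel
            dw_out[i, j, :] = np.sum(patch * dw_kernels, axis=(0, 1))
    # 1x1 pointwise convolution mixes channels: (H', W', C_in) @ (C_in, C_out)
    return dw_out @ pw_weights

if __name__ == "__main__":
    print(standard_conv_params(3, 32, 64))        # 18432
    print(depthwise_separable_params(3, 32, 64))  # 2336
```

For K = 3, C_in = 32, C_out = 64, the separable layer uses 2336 weights instead of 18432, roughly 1/C_out + 1/K² of the original count.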


Creating body movements by voice

LipGAN is a study on generating mouth shapes from speech signals. It can be useful for animating a virtual character's mouth, but its limitation becomes clear in practice: only the lips move while the rest of the character stands still. In fact, humans...