Implementation feasibility with Google MixNet
The convolution commonly used in images is a 3D operation. (KxKxC; K=kernel size, C=number of channels) After applying this by dividing it into multiple 2D operations of KxKx1, depthwise separable convolution that applies convolution with a size of 1x1xC in the channel direction greatly reduces the number of parameters...