Lip2Wav: Generating a voice signal from silent lip movements
There are stories of people who, after special training, can tell what someone is saying just from the movements of their lips; the research in the link achieves this with AI.
For large-scale language models there has long been a practical difficulty: no Korean model was available. Following SKT's KoBERT, KcBERT has now been released, trained from scratch on Naver comment data so that it reflects newly coined words and slang. Not only the trained model…
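As a minimal sketch of how such a release is typically consumed, the snippet below loads KcBERT through the Hugging Face transformers API; the model id beomi/kcbert-base is an assumption on my part, not something stated in the post.

```python
# A minimal sketch, assuming KcBERT is published on the Hugging Face hub
# under the id "beomi/kcbert-base" (an assumption, not stated in the post).
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("beomi/kcbert-base")
model = AutoModelForMaskedLM.from_pretrained("beomi/kcbert-base")

# Encode a Korean comment-style sentence and get masked-LM logits.
inputs = tokenizer("이 영화 진짜 [MASK] 재밌다", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (1, sequence_length, vocab_size)
```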
Deep learning-based super resolution has been adopted in NVIDIA's latest GPUs under the name DLSS (Deep Learning Super Sampling), making it a real consumer-facing technology. Mainly in the 4K gaming market, 2K…
LipGAN is a study on generating mouth shapes from a speech signal. It is a technique that could be useful for animating a virtual character's mouth, but its limitation is clear when applied in practice: only the lips of a character standing still move. In fact, humans...
TensorFlowTTS, an open-source project built on TensorFlow 2 that supports several recent TTS models such as Tacotron 2, MelGAN, and FastSpeech, has finally added support for Microsoft's FastSpeech2. FastSpeech2 delivers quality comparable to Transformer-based TTS while training more than twice as fast…
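For orientation, here is a sketch of FastSpeech2 inference with TensorFlowTTS, following the AutoProcessor/TFAutoModel pattern from the project's README; the model id and the exact inference signature are assumptions and may differ across versions.

```python
# A minimal sketch of FastSpeech2 inference with TensorFlowTTS. The model id
# "tensorspeech/tts-fastspeech2-ljspeech-en" and the inference signature are
# assumptions based on the repo's README pattern.
import tensorflow as tf
from tensorflow_tts.inference import AutoProcessor, TFAutoModel

processor = AutoProcessor.from_pretrained("tensorspeech/tts-fastspeech2-ljspeech-en")
fastspeech2 = TFAutoModel.from_pretrained("tensorspeech/tts-fastspeech2-ljspeech-en")

# Text -> phoneme ids -> mel spectrogram. A separate vocoder (e.g. MelGAN)
# would then turn the mel spectrogram into a waveform.
input_ids = processor.text_to_sequence("Hello, this is a FastSpeech2 test.")
mel_before, mel_after, duration, _, _ = fastspeech2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)
print(mel_after.shape)  # (1, frames, 80)
```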
Text-to-SQL is the task of automatically converting natural language into SQL. The post linked at the bottom, written by Aerin Kim of Microsoft, is a well-organized overview of Text-to-SQL. A great deal of the world's data is stored in relational databases, and in these databases...
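To make the task concrete, the toy examples below show the kind of input/output mapping a Text-to-SQL system must produce; the schema and question pairs are hypothetical, not taken from the post.

```python
# An illustration of the Text-to-SQL task itself, not of any particular model:
# given a schema and a natural-language question, the system must emit SQL.
# The schema and examples here are hypothetical.
schema = "employees(id, name, department, salary)"

examples = [
    ("How many employees are there?",
     "SELECT COUNT(*) FROM employees;"),
    ("Who earns more than 50000 in the sales department?",
     "SELECT name FROM employees WHERE salary > 50000 AND department = 'sales';"),
]

for question, sql in examples:
    print(f"Q: {question}\nSQL: {sql}\n")
```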
MIT's Speech2Face is a study that generates a speaker's face from a speech signal. Rather than performing the speech-to-face transform with a single model, however, it combines the outputs of existing studies built for different purposes to produce impressive results. (The first author is now...
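The data flow can be sketched as a composition of three stages; the function bodies below are placeholder stubs, not the paper's actual networks, and only the pipeline structure is taken from the study.

```python
# A schematic of how Speech2Face composes prior work rather than training one
# end-to-end model: a voice encoder is trained to regress face-recognition
# (VGG-Face) features from speech, and a face decoder reused from earlier work
# renders a canonical face from those features. All bodies are dummy stubs.
import numpy as np

def compute_spectrogram(waveform: np.ndarray) -> np.ndarray:
    # Placeholder: the paper feeds a spectrogram of the input speech.
    return np.abs(np.fft.rfft(waveform))

def voice_encoder(spectrogram: np.ndarray) -> np.ndarray:
    # Placeholder for the voice encoder trained in this paper to predict
    # 4096-d VGG-Face features from speech.
    return np.zeros(4096)

def pretrained_face_decoder(face_features: np.ndarray) -> np.ndarray:
    # Placeholder for the face decoder reused from prior work, which renders
    # a canonical frontal face image from face-recognition features.
    return np.zeros((224, 224, 3))

def speech_to_face(waveform: np.ndarray) -> np.ndarray:
    return pretrained_face_decoder(voice_encoder(compute_spectrogram(waveform)))

print(speech_to_face(np.zeros(16000)).shape)  # (224, 224, 3)
```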
Facebook has released a pre-trained model for wav2vec 2.0, which drew attention for building a speech recognizer with only 10 minutes of labeled data after representation learning on 53,000 hours of unlabeled data. With no fine-tuning of the representation model,...
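As a minimal sketch, the snippet below runs a released wav2vec 2.0 checkpoint for speech recognition through the Hugging Face transformers API, an alternative interface to Facebook's fairseq release; the model id is my assumption.

```python
# A minimal sketch of wav2vec 2.0 inference via Hugging Face transformers.
# The checkpoint id "facebook/wav2vec2-base-960h" is an assumption; the post
# refers to the fairseq release.
import numpy as np
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# A 16 kHz mono waveform as a float array; here one second of silence.
audio = np.zeros(16000, dtype=np.float32)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits  # (1, frames, vocab)

# Greedy CTC decoding to characters.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))
```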
L.A. Noire, a 2011 game from Rockstar, surprised many with facial animation far superior to other games of its time. The technology used here, called MotionScan, essentially places the actor in a room where multiple cameras are elaborately positioned...