Speech

MLP Singer

[Priority Research Team, Hee-Jo Yoo] TTS (text-to-speech) is a technology that takes arbitrary input text and converts it into speech in a specific voice. After Google announced the Tacotron series, the field quickly shifted from HMM (hidden Markov model)-based methods to deep learning-based ones, which are now commercially…

May 27, 2021

Can Machines Think? Emotionally

Speech, Trend

[Service Development Team, Eunji Kwon] When I was a child and drew imaginary scenes, robots in outer space were one of my favorite subjects. Looking back, from cartoons such as Galaxy Express 999, with its artificial-intelligence computer that drives the train, to recently released humanoid movies, artificial intelligence has been an important part of the media...

May 21, 2021

LaMDA: Google's Conversational Language Model

Speech, Trend

[Service Development Team, Kim Byung-in] At Google I/O 2021, Google's annual event for showcasing its latest technologies, new technologies, services, and platforms were announced across Android, the web, artificial intelligence, Chrome, and more. Among them, the hottest topic was LaMDA (Google's language…

Mar 19, 2021

A collection of AI projects for mobile devices (Awesome TensorFlow Lite)

Visual, Speech, Interaction, Code

TensorFlow Lite is a software package with tools that let AI models trained with TensorFlow run on mobile devices. It is reportedly running on more than 4 billion devices. Basically, the trained model is converted to TensorFlow Lite…
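The conversion step can be sketched with TensorFlow's Python converter API. This is a minimal example under stated assumptions, not the workflow of any project in the collection: `TinyModel` is a hypothetical stand-in for a trained model, and the final check assumes the standard `TFL3` file identifier that TensorFlow Lite flatbuffers carry.

```python
import tensorflow as tf


class TinyModel(tf.Module):
    """Hypothetical stand-in for a trained model: one dense layer mapping 4 inputs to 2 outputs."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([4, 2], seed=0))
        self.b = tf.Variable(tf.zeros([2]))

    # A fixed input signature lets the converter trace a single concrete graph.
    @tf.function(input_signature=[tf.TensorSpec(shape=[1, 4], dtype=tf.float32)])
    def __call__(self, x):
        return tf.matmul(x, self.w) + self.b


model = TinyModel()

# Trace the model into a concrete function and convert it to a TFLite flatbuffer.
concrete_fn = model.__call__.get_concrete_function()
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_fn], model)
tflite_model = converter.convert()  # bytes, ready to be saved as a .tflite file

# TFLite flatbuffers carry the file identifier "TFL3" at byte offset 4.
print(len(tflite_model), tflite_model[4:8])
```

On a phone, these bytes would be written to a `.tflite` file and loaded by the TensorFlow Lite runtime instead of the full TensorFlow stack.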

Mar 11, 2021

Google Lyra - speech compression based on deep generative models

Speech, Trend

Google Lyra is a new speech compression method based on a generative model. Existing speech codecs need roughly 8-16 kbps to reach transparent quality, that is, quality indistinguishable from the original sound; Lyra greatly improves on this, so that a low 3 kbps...
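To put these bitrates in perspective, the arithmetic below compares them with uncompressed audio and shows how few bits each frame gets at 3 kbps. The 16 kHz, 16-bit mono input format and the 40 ms frame interval are assumptions made for illustration, not figures from the text.

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bitrate of uncompressed linear PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def bits_per_frame(bitrate_bps: float, frame_ms: float) -> float:
    """Bits available to encode one frame of audio at the given bitrate."""
    return bitrate_bps * frame_ms / 1000.0


LYRA_BPS = 3_000                    # Lyra's bitrate, from the text
CONVENTIONAL_BPS = [8_000, 16_000]  # range quoted for transparent quality
FRAME_MS = 40                       # assumption: one feature frame every 40 ms

raw_bps = pcm_bitrate(16_000, 16)   # assumption: 16 kHz, 16-bit mono input
print(f"raw PCM: {raw_bps} bps, {raw_bps / LYRA_BPS:.1f}x the 3 kbps budget")
print(f"bits per frame at 3 kbps: {bits_per_frame(LYRA_BPS, FRAME_MS):.0f}")
for bps in CONVENTIONAL_BPS:
    print(f"{bps // 1000} kbps is {bps / LYRA_BPS:.2f}x the Lyra bitrate")
```

Under these assumptions, raw audio runs at 256000 bps (about 85.3 times 3 kbps), each 40 ms frame gets only 120 bits (15 bytes), and conventional codecs use 2.67 to 5.33 times Lyra's bitrate.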

Feb 19, 2021

A framework merging natural language processing and speech recognition

Speech, Interaction, Code

Hugging Face, famous for its integrated natural language processing package, has added speech recognition. Specifically, it added Wav2Vec 2.0, developed by Facebook. Wav2Vec 2.0 first performs unsupervised learning on a large amount of unlabeled data, and very…
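In `transformers`, Wav2Vec 2.0 checkpoints fine-tuned for recognition are exposed through classes such as `Wav2Vec2ForCTC`, which output a score for each vocabulary token at each audio frame; a CTC decoding step then turns those scores into text. Below is a library-free sketch of greedy CTC decoding, assuming a character vocabulary in which `<pad>` acts as the CTC blank and `|` marks word boundaries (common in English Wav2Vec 2.0 checkpoints, but treated here as assumptions). The vocabulary and scores are invented for illustration.

```python
from typing import List, Optional, Sequence

BLANK = "<pad>"   # assumption: the CTC blank token
WORD_DELIM = "|"  # assumption: the word-boundary token


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], vocab: List[str]) -> str:
    """Greedy CTC decoding: best token per frame, merge repeats, then drop blanks."""
    # 1) Pick the highest-scoring token for every audio frame.
    best = [vocab[max(range(len(s)), key=lambda i: s[i])] for s in frame_scores]
    # 2) Merge runs of the same token; 3) remove blanks. A blank between two
    #    identical tokens keeps them separate (needed for the double L in HELLO).
    out: List[str] = []
    prev: Optional[str] = None
    for tok in best:
        if tok != prev and tok != BLANK:
            out.append(tok)
        prev = tok
    # 4) Turn word delimiters into spaces.
    return "".join(out).replace(WORD_DELIM, " ").strip()


vocab = ["<pad>", "|", "E", "H", "L", "O"]
frames = ["H", "H", "E", "<pad>", "L", "L", "<pad>", "L", "O", "O"]
# One-hot scores stand in for the per-frame logits a real model would output.
scores = [[1.0 if v == f else 0.0 for v in vocab] for f in frames]
print(ctc_greedy_decode(scores, vocab))  # HELLO
```

Greedy decoding is the simplest choice; beam search with a language model is a common alternative when accuracy matters more than speed.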

Nov 18, 2020

Facebook Denoiser: real-time speech enhancement

Speech, Code

Here is a link to the GitHub repository for denoiser, Facebook's real-time noise-reduction technology presented at Interspeech 2020. It is implemented in PyTorch, and the original paper is titled “Real Time Speech Enhancement in the Waveform Domain”. As the title suggests…

Nov 2, 2020

AI technology that predicts Covid-19 infections through cough sounds

Speech, Trend

COVID-19 has yet to show signs of subsiding worldwide. Researchers at MIT trained an AI model that detects whether a person is infected with COVID-19 from a cough recorded on a mobile phone, and published the methodology and experimental results in a paper. In the experimental results...

Oct 7, 2020

Wav2Lip: generate lip motion from voice

Visual, Speech, Code

LipGAN generates the lip shape of a face image from a speech signal, but when it was actually applied to video, the results were somewhat disappointing in terms of visual artifacts and the naturalness of motion. To improve on this, the discriminator looks not at a single frame but at multiple consecutive…
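The key change, judging a short run of consecutive frames instead of a single frame, amounts to slicing the video into overlapping clips. This sketch uses placeholder strings for frames, and the window length of 5 is an illustrative assumption; a real model would stack the images of each clip into one tensor.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def consecutive_windows(frames: Sequence[Frame], window: int = 5, stride: int = 1) -> List[List[Frame]]:
    """Split a frame sequence into overlapping windows of `window` consecutive frames."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    # Each window is one discriminator input: a short clip, not a single image.
    return [list(frames[i:i + window]) for i in range(0, len(frames) - window + 1, stride)]


# Hypothetical video of 8 frames, labeled by index for readability.
video = [f"frame{i}" for i in range(8)]
for clip in consecutive_windows(video, window=5):
    print(clip)
```

With 8 frames and a window of 5 this yields 4 overlapping clips, so the discriminator can penalize jittery or unnatural motion between frames, which a single-frame discriminator cannot see.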