The National Institute of Korean Language has released a large-scale Korean corpus for training artificial intelligence (13 types of data, 1.8 billion words). The copyright issues were resolved as the corpus was built, and reportedly anyone can download the files after submitting an online agreement on the Everyone's Corpus site shared below and having it approved.
Compared with the earlier '21st Century Sejong Plan', the newly built data reportedly contain a larger share of colloquial material such as everyday conversations, messenger chats, and web documents. This reflects the growing interest in and demand for conversational language as interactive services such as artificial intelligence speakers and chatbots have spread in recent years.
In particular, the everyday conversation data go a step beyond collecting standard language: conversations were gathered by region and age group, which is said to lay the groundwork for handling regional dialects in artificial intelligence technology. The data should therefore be useful in a variety of fields. The links below point to the "Everyone's Corpus" site, where the data can be downloaded, and to related news articles.