[Prior Research Team, Ji-Hyun Song] TadGAN, an algorithm developed by an MIT research team, is reported to detect anomalies in time series data better than previously known models. Many companies researching anomaly detection are currently working in various fields (financial…

The value of a data specialist

[Service Development Team Jeon Jeon Jeon] The digital transformation of companies, accelerated by COVID-19, continues to increase the value of data. Demand for change across many industries, not only at specialized IT companies, is driving up the market value of data specialists.

GPT-Neo: Open Source GPT-3 Project

OpenAI's GPT-3 is a large language model with up to 175B parameters. Despite its surprising results, GPT-3 is not open source, so if you want to try it, you have to go through a site such as AI Dungeon or Philosopher AI...

Ubuntu Dialogue Corpus

Building a dialogue system that lets humans converse naturally with virtual agents is a difficult task in natural language processing and the basis of much ongoing research. The Ubuntu Dialogue Corpus addresses various Ubuntu-related issues…

Korean profanity text dataset

We share a Korean profanity dataset collected and labeled by Joonhee Jo. It was gathered from multiple online communities and appears suitable for evaluation on real-world data. Below is a description of the dataset. Data description: each sentence is classified as swearword...

MELD: Multimodal EmotionLines Dataset

The Multimodal EmotionLines Dataset (MELD) is a multimodal extension of EmotionLines, an emotion-labeled dialogue dataset. MELD contains the same dialogue instances available in EmotionLines, but includes audio and visual modalities along with text…

Hugging Face Datasets 1.0

Version 1.0, the first stable release of the Hugging Face Datasets library, is now available, making it easy to use NLP datasets and evaluation metrics. It currently supports about 100 datasets and about 10 evaluation metrics.