Data

Jun 1, 2021

Time Series Data Analysis: TadGAN

[Prior Research Team, Ji-Hyun Song] TadGAN, an algorithm developed by an MIT research team, is reported to outperform previously known models at detecting anomalies in time series data. Many companies researching anomaly detection are currently working in various fields (financial…

May 14, 2021

Trend, Data

The value of a data specialist

[Service Development Team Jeon Jeon Jeon] The digital transformation of companies, accelerated by COVID-19, continues to increase the value of data. The need for change across many industries, not only at specialized IT companies, is driving up the market value of data specialists.

Apr 8, 2021

Interaction, Code, Data

GPT-Neo: Open Source GPT-3 Project

OpenAI's GPT-3 is a large language model with up to 175B parameters. Despite its surprising results, GPT-3 is not open source, so if you want to try it, you have to go through a site such as AI Dungeon (https://play.aidungeon.io/main/landing) or Philosopher AI (https://philosopherai.com/)...

Feb 23, 2021

Interaction, Code, Data

DensePhrases - Near-Real-Time Wikipedia Open Domain Q&A

DensePhrases is an open-domain Q&A technology created by Jinhyuk Lee at Korea University and published in the paper “Learning Dense Representations of Phrases at Scale”. Given a question, the best phrase from Wikipedia's nearly 60 billion phrases…

Feb 10, 2021

Interaction, Data

PapersWithCode's Korean datasets

PapersWithCode, which tracks papers in the field of AI along with their linked open-source code and state-of-the-art (SOTA) results, provides links to over 3,000 useful datasets. Of these, 851 are text datasets; limited to Korean…

Feb 5, 2021

Data

Ubuntu Dialogue Corpus

Building a dialogue system that lets humans hold natural conversations with virtual agents is a difficult task in natural language processing and the basis for much ongoing research. The Ubuntu Dialogue Corpus addresses various Ubuntu-related issues…

Dec 21, 2020

Interaction, Data

Korean profanity text dataset

We share a Korean profanity dataset collected and labeled by Joonhee Jo. It was gathered from multiple online communities and seems suitable for evaluation on real-world data. Below is a description of the dataset. Data Description: each sentence is classified as swearword...

Dec 4, 2020

Data

MELD: Multimodal EmotionLines Dataset

Multimodal EmotionLines Dataset (MELD) is a multimodal extension of EmotionLines, an emotion-labeled dialogue dataset. MELD contains the same dialogue instances available in EmotionLines, but also includes audio and visual modalities along with text…

Oct 12, 2020

Interaction, Data

HuggingFace Datasets 1.0

Version 1.0, the first stable release of the HuggingFace Datasets library, is out, making it easy to use NLP datasets and evaluation metrics. It currently supports about 100 datasets and about 10 evaluation metrics for them.