Data GPT-Neo-Open Source GPT-3 Project

Ubuntu Dialogue Corpus

Building a dialogue system that lets humans converse naturally with virtual agents is a difficult task in natural language processing and the basis of much ongoing research. The Ubuntu Dialogue Corpus covers various Ubuntu-related issues…

Interaction Data GPT-Neo-Open Source GPT-3 Project

Korean profanity text dataset

This is a Korean profanity dataset collected and labeled by Joonhee Jo. It was gathered from multiple communities and appears suitable for evaluation on real-world data. Below is a description of the dataset. Data description: each statement is classified as swearword...

MELD: Multimodal EmotionLines Dataset

The Multimodal EmotionLines Dataset (MELD) is a multimodal extension of EmotionLines, an emotion-labeled dialogue dataset. MELD contains the same dialogue instances as EmotionLines, but adds audio and visual modalities alongside the text…

Hugging Face Datasets 1.0

The first stable version, 1.0, of the Hugging Face Datasets library has been released, making it easy to use NLP datasets and evaluation metrics. It currently supports about 100 datasets and about 10 evaluation metrics.

Korean language corpus from the National Institute of Korean Language

The National Institute of Korean Language has released large-scale Korean language resources for artificial intelligence training (13 types, 1.8 billion words). Copyright issues were resolved when the corpus was built, and reportedly anyone can download and use the files once an online agreement has been submitted and approved on the 'Everyone's Corpus' site. This time…