Version 1.0, the first stable release of the Hugging Face Datasets library, is out. It makes NLP datasets and evaluation metrics easy to use, and currently supports about 100 datasets and about 10 evaluation metrics.
The library started as a fork of TensorFlow Datasets, which serves a similar purpose, so the two are alike in many respects. The main difference is that it stores data with Apache Arrow instead of TFRecord, which lets it work across frameworks: it supports NumPy, Pandas, PyTorch, and TensorFlow 2.
An online dataset browser is also available, so you can easily view the contents of each dataset. Combined with the many models Hugging Face already supports, the library looks likely to create strong synergy.
huggingface/datasets
🤗 Fast, efficient, open-access datasets and evaluation metrics for Natural Language Processing and more in PyTorch, TensorFlow, NumPy and Pandas