Skip to content

Refuel-provided Datasets

Autolabel provides datasets out-of-the-box so you can easily get started with LLM-powered labeling. The full list of datasets is below:

Dataset Task Type
banking Classification
civil_comments Classification
ledgar Classification
movie_reviews Classification
walmart_amazon Entity Matching
company Entity Matching
squad_v2 Question Answering
sciq Question Answering
conll2003 Named Entity Matching

Downloading any dataset

To download a specific dataset, such as civil_comments, run:

from autolabel import get_data

get_data('civil_comments')
> Downloading seed example dataset to "data/civil_comments/seed.csv"...
> 100% [..............................................................................] 65757 / 65757

> Downloading test dataset to "data/civil_comments/test.csv"...
> 100% [............................................................................] 610663 / 610663