Refuel-provided Datasets
Autolabel provides several datasets out of the box so you can quickly get started with LLM-powered labeling. The full list of datasets is below:
Dataset | Task Type |
---|---|
banking | Classification |
civil_comments | Classification |
ledgar | Classification |
movie_reviews | Classification |
walmart_amazon | Entity Matching |
company | Entity Matching |
squad_v2 | Question Answering |
sciq | Question Answering |
conll2003 | Named Entity Recognition |
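If you want to work with every dataset of one task type, you can keep the table in code and filter it. This is a minimal sketch, not part of Autolabel. `DATASETS` and `datasets_for_task` are hypothetical names, and the entries mirror the table of Refuel-provided datasets.

```python
# Hypothetical helper (not an Autolabel API): dataset names and task types
# mirror the table of Refuel-provided datasets.
DATASETS = {
    "banking": "Classification",
    "civil_comments": "Classification",
    "ledgar": "Classification",
    "movie_reviews": "Classification",
    "walmart_amazon": "Entity Matching",
    "company": "Entity Matching",
    "squad_v2": "Question Answering",
    "sciq": "Question Answering",
    "conll2003": "Named Entity Recognition",
}


def datasets_for_task(task_type: str) -> list[str]:
    """Return the names of all listed datasets with the given task type."""
    return [name for name, task in DATASETS.items() if task == task_type]


print(datasets_for_task("Question Answering"))  # ['squad_v2', 'sciq']
```

The returned names are plain strings, so you can pass each one to `get_data`.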
Downloading any dataset
To download a specific dataset, such as `civil_comments`, run:
```python
from autolabel import get_data

# Download the seed and test splits of civil_comments
get_data('civil_comments')
```

The call reports its progress as it downloads each file:

```
Downloading seed example dataset to "data/civil_comments/seed.csv"...
100% [..............................................................................] 65757 / 65757
Downloading test dataset to "data/civil_comments/test.csv"...
100% [............................................................................] 610663 / 610663
```
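To fetch several datasets, you can call `get_data` once per name. The sketch below is a hypothetical wrapper, not part of Autolabel. `expected_files` and `download_missing` are illustrative names. The sketch assumes that each dataset is written to `data/<dataset>/seed.csv` and `data/<dataset>/test.csv` relative to the working directory, which is the layout the `civil_comments` log shows. It calls `get_data` only for datasets whose files are not already on disk.

```python
from pathlib import Path


def expected_files(dataset_name: str) -> tuple[Path, Path]:
    """Return the assumed (seed, test) CSV paths for a dataset.

    Assumption: get_data writes under ./data/<dataset_name>/, as in the
    civil_comments download log. Hypothetical helper, not an Autolabel API.
    """
    folder = Path("data") / dataset_name
    return folder / "seed.csv", folder / "test.csv"


def download_missing(dataset_names: list[str]) -> list[str]:
    """Call get_data for each dataset whose seed or test file is missing.

    Returns the names of the datasets that were downloaded.
    """
    # Imported here so expected_files works even without autolabel installed.
    from autolabel import get_data

    fetched = []
    for name in dataset_names:
        seed_path, test_path = expected_files(name)
        if not (seed_path.exists() and test_path.exists()):
            get_data(name)  # same call as in the civil_comments example
            fetched.append(name)
    return fetched
```

For example, `download_missing(["banking", "sciq"])` downloads both datasets on the first run. If `get_data` writes to the paths assumed above, later runs skip both datasets and return an empty list.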