Autolabel is a Python library to label, clean and enrich datasets with Large Language Models (LLMs).
🌟 (New!) Access RefuelLLM through Autolabel
You can access RefuelLLM, our recently announced LLM purpose-built for data labeling, through Autolabel (read more about it in this blog post). RefuelLLM is a Llama-v2-13b base model, instruction-tuned on over 2,500 unique labeling tasks (5.24B tokens) spanning categories such as classification, entity resolution, matching, reading comprehension and information extraction. You can experiment with the model in the playground here.
- Autolabel data for NLP tasks such as classification, question answering, named entity recognition, entity matching and more.
- Seamlessly use commercial and open source LLMs from providers such as OpenAI, Anthropic, HuggingFace, Google and more.
- Leverage research-proven LLM techniques to boost label quality, such as few-shot learning and chain-of-thought prompting.
- Confidence estimation and explanations out of the box for every output label.
- Caching and state management to minimize costs and experimentation time.
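To make the few-shot technique mentioned in the list above concrete, here is a minimal sketch in plain Python of how a few-shot classification prompt can be assembled from guidelines, a label set and a handful of labeled examples. It is an illustration only, not Autolabel's internal implementation; the function name `build_few_shot_prompt` and the sample reviews are hypothetical.

```python
def build_few_shot_prompt(guidelines, labels, examples, text):
    """Assemble a few-shot classification prompt as plain text.

    Hypothetical helper for illustration; not part of Autolabel.
    """
    # Fill the label set into the task guidelines.
    lines = [guidelines.format(labels=", ".join(labels)), ""]
    # Add each labeled demonstration as an Input/Output pair.
    for ex in examples:
        lines.append(f"Input: {ex['example']}")
        lines.append(f"Output: {ex['label']}")
        lines.append("")
    # End with the unlabeled item so the model completes the label.
    lines.append(f"Input: {text}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    guidelines="Classify the sentiment of each movie review as one of: {labels}.",
    labels=["positive", "negative", "neutral"],
    examples=[
        {"example": "I loved every minute of it.", "label": "positive"},
        {"example": "A complete waste of two hours.", "label": "negative"},
    ],
    text="The plot was fine but the pacing dragged.",
)
print(prompt)
```

The labeled pairs give the model a pattern to imitate, which is why a few well-chosen examples tend to improve label consistency.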
You can get started with Autolabel by simply bringing the dataset you want to label, picking your favorite LLM and writing a few lines of code.
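As a rough sketch of what those few lines of code might look like, the example below defines a sentiment-classification config and a helper that runs a labeling job. The config keys and the `LabelingAgent` / `AutolabelDataset` calls follow the project's published quickstart as best recalled and may differ between versions, so treat them as assumptions and check the installation guide. The task name, guidelines text, file name `reviews.csv` and the helper `label_dataset` are hypothetical.

```python
def build_config():
    """Return an illustrative Autolabel-style config (keys are assumptions)."""
    return {
        "task_name": "MovieSentimentReview",
        "task_type": "classification",
        "model": {"provider": "openai", "name": "gpt-3.5-turbo"},
        "dataset": {"label_column": "label", "delimiter": ","},
        "prompt": {
            "task_guidelines": (
                "You are an expert at analyzing the sentiment of movie reviews. "
                "Classify each review as one of: {labels}"
            ),
            "labels": ["positive", "negative", "neutral"],
            "few_shot_examples": [
                {"example": "I loved every minute of it.", "label": "positive"},
                {"example": "A complete waste of two hours.", "label": "negative"},
            ],
            # Assumes the CSV has a column named "example" holding the text.
            "example_template": "Input: {example}\nOutput: {label}",
        },
    }


def label_dataset(csv_path, config):
    """Run a labeling job; needs autolabel installed and an API key set."""
    # Imported lazily so the config above can be inspected without the library.
    from autolabel import LabelingAgent, AutolabelDataset

    agent = LabelingAgent(config=config)
    dataset = AutolabelDataset(csv_path, config=config)
    agent.plan(dataset)        # dry run: preview prompts and estimated cost
    return agent.run(dataset)  # label the rows


config = build_config()
# Example call (not executed here):
# labeled = label_dataset("reviews.csv", config)
```

Running the dry-run step before the full run is a cheap way to inspect the generated prompt and the expected cost before spending on API calls.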
- Installation and your first labeling task: Steps to install Autolabel and run sentiment analysis for movie reviews using OpenAI's
- Classification tutorial: A deeper dive into how Autolabel can be used to detect toxic comments at 95%+ accuracy.
- Command Line Interface: Learn how to use Autolabel's CLI to intuitively create configs from the command line.
- Here are more examples with sample notebooks that show how Autolabel can be used for different NLP tasks.