Introduction

Autolabel is a Python library to label, clean and enrich datasets with Large Language Models (LLMs).

Features

  • Label data for NLP tasks such as classification, question answering, named entity recognition, entity matching and more.
  • Seamlessly use commercial and open source LLMs from providers such as OpenAI, Anthropic, HuggingFace, Google and more.
  • Leverage research-proven LLM techniques to boost label quality, such as few-shot learning and chain-of-thought prompting.
  • Get confidence estimates and explanations out of the box for every output label.
  • Use caching and state management to minimize costs and experimentation time.
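
To make the few-shot learning feature above concrete, here is a minimal sketch of how a few-shot classification prompt can be assembled. It is not Autolabel's API or internal implementation: the function `build_few_shot_prompt`, its parameters, and the `Input:`/`Output:` template are all assumptions made for illustration.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    guidelines: str,
    labels: List[str],
    examples: List[Tuple[str, str]],
    text: str,
) -> str:
    """Assemble a few-shot prompt (hypothetical helper, not part of Autolabel)."""
    # Start with the task description and the set of allowed labels.
    lines = [guidelines, "Valid labels: " + ", ".join(labels), ""]
    # Add each labeled example as an Input/Output pair.
    for example_text, example_label in examples:
        lines.append(f"Input: {example_text}")
        lines.append(f"Output: {example_label}")
        lines.append("")
    # End with the unlabeled item, leaving the output for the LLM to complete.
    lines.append(f"Input: {text}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    guidelines="Classify the sentiment of the movie review.",
    labels=["positive", "negative"],
    examples=[
        ("A moving, beautifully shot film.", "positive"),
        ("Two hours I will never get back.", "negative"),
    ],
    text="The plot dragged, but the acting was superb.",
)
print(prompt)
```

The resulting string would be sent to whichever LLM you choose; the labeled examples give the model a pattern to follow for the final, unlabeled input.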

Getting Started

You can get started with Autolabel by simply bringing the dataset you want to label, picking your favorite LLM and writing a few lines of code.

  • Installation and your first labeling task: Steps to install Autolabel and run sentiment analysis for movie reviews using OpenAI's gpt-3.5-turbo.
  • Classification tutorial: A deeper dive into how Autolabel can be used to detect toxic comments at 95%+ accuracy.
  • Command Line Interface: Learn how to use Autolabel's CLI to intuitively create configs from the command line.
  • More examples: Sample notebooks that show how Autolabel can be used for different NLP tasks.

Resources

  • Discord: Join our Discord community for conversations on LLMs, Autolabel and so much more!
  • GitHub: Create an issue to report any bugs or give us a star on GitHub.
  • Contribute: Share your feedback or add new features, and help us improve Autolabel!