Large Language Models (LLMs)¶
Autolabel supports multiple LLMs for labeling data. Some LLMs are available by calling an API with the appropriate API keys (OpenAI, Anthropic, etc.), while others can be run locally (such as the ones available on Hugging Face). The LLM used for labeling can be controlled using the `provider` and `name` keys in the dictionary specified under `model` in the input config.
Each LLM belongs to an LLM provider: the organization or open-source framework through which the LLM is accessed. A full list of currently supported LLM providers and LLMs is provided towards the end of this page.
Autolabel makes it easy to try out different LLMs for your task, and this page will walk you through how to get started with each LLM provider and model. Separately, we've also benchmarked multiple LLMs across different datasets - you can read the full technical report here [link to blog post] or check out the latest benchmark results here.
OpenAI¶
To use models from OpenAI, you can set `provider` to `openai` when creating a labeling configuration. The specific model that will be queried can be specified using the `name` key. Autolabel currently supports the following models from OpenAI:
* `text-davinci-003`
* `gpt-3.5-turbo`, `gpt-3.5-turbo-0301` and `gpt-3.5-turbo-0613` (4,096 max tokens)
* `gpt-3.5-turbo-16k` and `gpt-3.5-turbo-16k-0613` (16,384 max tokens)
* `gpt-4`, `gpt-4-0314` and `gpt-4-0613` (8,192 max tokens)
* `gpt-4-32k`, `gpt-4-32k-0314` and `gpt-4-32k-0613` (32,768 max tokens)
The `gpt-4` family of models is the most capable (and most expensive) from OpenAI, while the `gpt-3.5-turbo` family is cheap (but still quite capable). Detailed pricing for these models is available here.
Setup¶
To use OpenAI models with Autolabel, make sure to first install the relevant packages by running:
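pip install 'refuel-autolabel[openai]'

(Autolabel exposes provider-specific package extras such as the `openai` extra above; the same pattern applies to the other providers on this page.)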
and also set the following environment variable, replacing `<your-openai-key>` with your API key, which you can get from here:
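export OPENAI_API_KEY=<your-openai-key>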
Example usage¶
Here is an example of setting `config` to a dictionary that will use OpenAI's `gpt-3.5-turbo` model for labeling. Specifically, note that in the dictionary provided under the `model` key, `provider` is set to `openai` and `name` is set to `gpt-3.5-turbo`. `name` can be switched to use any of the models mentioned above.
config = {
"task_name": "OpenbookQAWikipedia",
"task_type": "question_answering",
"dataset": {
"label_column": "answer",
"delimiter": ","
},
"model": {
"provider": "openai",
"name": "gpt-3.5-turbo",
"params": {}
},
"prompt": {
"task_guidelines": "You are an expert at answering questions.",
"example_template": "Question: {question}\nAnswer: {answer}"
}
}
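Once `config` is defined, a labeling run follows Autolabel's usual flow. Here is a minimal sketch, assuming your data lives in a local `dataset.csv` and the quickstart-style `LabelingAgent` API:

from autolabel import LabelingAgent

# Build an agent from the config dictionary defined above
agent = LabelingAgent(config=config)

# Dry run: preview the prompts and estimate the labeling cost
agent.plan('dataset.csv')

# Run the labeling job
agent.run('dataset.csv')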
Additional parameters¶
A few parameters can be passed in alongside `openai` models to tweak their behavior:

* `max_tokens` (int): The maximum number of tokens to sample from the model
* `temperature` (float): A float between 0 and 2 which indicates the diversity you want in the output. 0 uses greedy sampling (picks the most likely outcome).
These parameters can be passed in via the `params` dictionary under `model`. Here is an example:
"model": {
"provider": "openai",
"name": "gpt-3.5-turbo",
"params": {
"max_tokens": 512,
"temperature": 0.1
}
}
Anthropic¶
To use models from Anthropic, you can set `provider` to `anthropic` when creating a labeling configuration. The specific model that will be queried can be specified using the `name` key. Autolabel currently supports the following models from Anthropic:
* `claude-instant-v1`
* `claude-v1`

`claude-v1` is a state-of-the-art high-performance model, while `claude-instant-v1` is a lighter, less expensive, and much faster option. `claude-instant-v1` is ~6.7 times cheaper than `claude-v1`, at $1.63/1 million tokens; `claude-v1`, on the other hand, costs $11.02/1 million tokens.
Setup¶
To use Anthropic models with Autolabel, make sure to first install the relevant packages by running:
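pip install 'refuel-autolabel[anthropic]'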
and also set the following environment variable, replacing `<your-anthropic-key>` with your API key, which you can get from here:
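export ANTHROPIC_API_KEY=<your-anthropic-key>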
Example usage¶
Here is an example of setting `config` to a dictionary that will use Anthropic's `claude-instant-v1` model for labeling. Specifically, note that in the dictionary provided under the `model` key, `provider` is set to `anthropic` and `name` is set to `claude-instant-v1`. `name` can be switched to use either of the two models mentioned above.
config = {
"task_name": "OpenbookQAWikipedia",
"task_type": "question_answering",
"dataset": {
"label_column": "answer",
"delimiter": ","
},
"model": {
"provider": "anthropic",
"name": "claude-instant-v1",
"params": {}
},
"prompt": {
"task_guidelines": "You are an expert at answering questions.",
"example_template": "Question: {question}\nAnswer: {answer}"
}
}
Additional parameters¶
A few parameters can be passed in for `anthropic` models to control the model behavior:

* `max_tokens_to_sample` (int): The maximum number of tokens to sample from the model
* `temperature` (float): A float between 0 and 1 which indicates the diversity you want in the output. 0 uses greedy sampling (picks the most likely outcome).
These parameters can be passed in via the `params` dictionary under `model`. Here is an example:
"model": {
"provider": "anthropic",
"name": "claude-instant-v1",
"params": {
"max_tokens_to_sample": 512,
"temperature": 0.1
}
}
Hugging Face¶
To use models from Hugging Face, you can set `provider` to `huggingface_pipeline` when creating a labeling configuration. The specific model that will be queried can be specified using the `name` key. Autolabel currently supports all sequence-to-sequence and causal language models on Hugging Face. All models available on Hugging Face can be found here. Ensure that the model you choose can be loaded using `AutoModelForSeq2SeqLM` or `AutoModelForCausalLM`. Here are a few examples:
Sequence-to-sequence language models:

* `google/flan-t5-small` (all flan-t5-* models)
* `google/pegasus-x-base`
* `microsoft/prophetnet-large-uncased`

Causal language models:

* `gpt2`
* `openlm-research/open_llama_3b`
* `meta-llama/Llama-2-7b`
This will run the model locally on a GPU (if one is available). You can also specify a quantization strategy to load larger models in lower precision (thus decreasing memory requirements).
Setup¶
To use Hugging Face models with Autolabel, make sure to first install the relevant packages by running:
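pip install 'refuel-autolabel[huggingface]'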
Example usage¶
Here is an example of setting `config` to a dictionary that will use the `google/flan-t5-small` model for labeling via Hugging Face. Specifically, note that in the dictionary provided under the `model` key, `provider` is set to `huggingface_pipeline` and `name` is set to `google/flan-t5-small`. `name` can be switched to use any model that satisfies the constraints above.
config = {
"task_name": "OpenbookQAWikipedia",
"task_type": "question_answering",
"dataset": {
"label_column": "answer",
"delimiter": ","
},
"model": {
"provider": "huggingface_pipeline",
"name": "google/flan-t5-small",
"params": {}
},
"prompt": {
"task_guidelines": "You are an expert at answering questions.",
"example_template": "Question: {question}\nAnswer: {answer}"
}
}
Additional parameters¶
A few parameters can be passed in for `huggingface_pipeline` models to control the model behavior:

* `max_new_tokens` (int) - The maximum number of tokens to sample from the model
* `temperature` (float) - A float between 0 and 1 which indicates the diversity you want in the output. 0 uses greedy sampling.
* `quantize` (int) - The model quantization to use. 32-bit by default; 16-bit and 8-bit quantization are also supported for models hosted on Hugging Face.
These parameters can be passed in via the `params` dictionary under `model`. Here is an example:
"model": {
"provider": "huggingface_pipeline",
"name": "google/flan-t5-small",
"params": {
"max_new_tokens": 512,
"temperature": 0.1,
"quantize": 8
}
}
To use Llama 2, a model configuration along the following lines should work (the `quantize` value here is optional and illustrative):
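"model": {
    "provider": "huggingface_pipeline",
    "name": "meta-llama/Llama-2-7b",
    "params": {
        "quantize": 8
    }
}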
Refuel¶
To use models hosted by Refuel, you can set `provider` to `refuel` when creating a labeling configuration. The specific model that will be queried can be specified using the `name` key. Autolabel currently supports only one model:
* `flan-t5-xxl`

This is a 13-billion-parameter model, which is also available on Hugging Face here. However, running such a large model locally is a challenge, which is why we currently host the model on our servers.
Setup¶
To use Refuel models with Autolabel, make sure to set the following environment variable, replacing `<your-refuel-key>` with your API key:
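export REFUEL_API_KEY=<your-refuel-key>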
Getting a Refuel API key¶
If you're interested in trying one of the LLMs hosted by Refuel, sign up for your Refuel API key by filling out the form here. We'll review your application and get back to you soon!
Example usage¶
Here is an example of setting `config` to a dictionary that will use Refuel's `flan-t5-xxl` model. Specifically, note that in the dictionary provided under the `model` key, `provider` is set to `refuel` and `name` is set to `flan-t5-xxl`.
config = {
"task_name": "OpenbookQAWikipedia",
"task_type": "question_answering",
"dataset": {
"label_column": "answer",
"delimiter": ","
},
"model": {
"provider": "refuel",
"name": "flan-t5-xxl",
"params": {}
},
"prompt": {
"task_guidelines": "You are an expert at answering questions.",
"example_template": "Question: {question}\nAnswer: {answer}"
}
}
Additional parameters¶
A few parameters can be passed in for `refuel` models to control the model behavior:

* `max_new_tokens` (int) - The maximum number of tokens to sample from the model
* `temperature` (float) - A float between 0 and 1 which indicates the diversity you want in the output. 0 uses greedy sampling.
These parameters can be passed in via the params
dictionary under model
. Here is an example:
"model": {
"provider": "refuel",
"name": "flan-t5-xxl",
"params": {
"max_new_tokens": 512,
"temperature": 0.1,
}
}
`refuel`-hosted LLMs support all the parameters that can be passed as part of a `GenerationConfig` when calling the generate function of Hugging Face LLMs.
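For example, sampling controls such as `top_k` and `top_p` (both standard `GenerationConfig` fields) can be added to `params` alongside the options above; the values below are illustrative:

"model": {
    "provider": "refuel",
    "name": "flan-t5-xxl",
    "params": {
        "max_new_tokens": 512,
        "temperature": 0.7,
        "top_k": 50,
        "top_p": 0.95
    }
}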
Google PaLM¶
To use models from Google, you can set `provider` to `google` when creating a labeling configuration. The specific model that will be queried can be specified using the `name` key. Autolabel currently supports the following models from Google:
* `text-bison@001`
* `chat-bison@001`

`text-bison@001` is often more suitable for labeling tasks due to its ability to follow natural language instructions, while `chat-bison@001` is fine-tuned for multi-turn conversations. `text-bison@001` costs $0.001/1K characters and `chat-bison@001` costs half that, at $0.0005/1K characters. Detailed pricing for these models is available here.
Setup¶
To use Google models with Autolabel, make sure to first install the relevant packages by running:
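pip install 'refuel-autolabel[google]'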
and also set up Google authentication locally.
Example usage¶
Here is an example of setting `config` to a dictionary that will use Google's `text-bison@001` model for labeling. Specifically, note that in the dictionary provided under the `model` key, `provider` is set to `google` and `name` is set to `text-bison@001`. `name` can be switched to use either of the two models mentioned above.
config = {
"task_name": "OpenbookQAWikipedia",
"task_type": "question_answering",
"dataset": {
"label_column": "answer",
"delimiter": ","
},
"model": {
"provider": "google",
"name": "text-bison@001",
"params": {}
},
"prompt": {
"task_guidelines": "You are an expert at answering questions.",
"example_template": "Question: {question}\nAnswer: {answer}"
}
}
Additional parameters¶
A few parameters can be passed in alongside `google` models to tweak their behavior:

* `max_output_tokens` (int): Maximum number of tokens that can be generated in the response.
* `temperature` (float): A float between 0 and 1 which indicates the diversity you want in the output. 0 uses greedy sampling (picks the most likely outcome).
These parameters can be passed in via the `params` dictionary under `model`. Here is an example:
"model": {
"provider": "google",
"name": "text-bison@001",
"params": {
"max_output_tokens": 512,
"temperature": 0.1
}
}
Model behavior¶
`chat-bison@001` always responds in a "chatty" manner (illustrated below), often returning more than just the requested label. This can cause problems on certain labeling tasks.
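For instance, on a question answering task where the expected output is just the answer, a hypothetical (not actual) exchange might look like:

Question: What is the capital of France?
Expected label: Paris
chat-bison@001: Great question! The capital of France is Paris. Let me know if there is anything else I can help with!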
Content moderation¶
Both Google LLMs seem to have much stricter content moderation rules than the other supported models. This can cause certain labeling jobs to completely fail as shown in our technical report [add link to technical report]. Consider a different model if your dataset has content that is likely to trigger Google's built-in content moderation.
Cohere¶
To use models from Cohere, you can set `provider` to `cohere` when creating a labeling configuration. The specific model that will be queried can be specified using the `name` key. Autolabel currently supports the following models from Cohere:
* `command` (4,096 max tokens)
* `command-light` (4,096 max tokens)
* `base` (2,048 max tokens)
* `base-light` (2,048 max tokens)

`command` is an instruction-following conversational model that performs language tasks with high quality, while `command-light` is almost as capable but much faster. `base` is a model that performs generative language tasks, while `base-light` is much faster but a little less capable. All models cost the same at $15/1 million tokens. Detailed pricing for these models is available here.
Setup¶
To use Cohere models with Autolabel, make sure to first install the relevant packages by running:
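pip install 'refuel-autolabel[cohere]'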
and also set the following environment variable, replacing `<your-cohere-key>` with your API key, which you can get from here:
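export COHERE_API_KEY=<your-cohere-key>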
Example usage¶
Here is an example of setting `config` to a dictionary that will use Cohere's `command` model for labeling. Specifically, note that in the dictionary provided under the `model` key, `provider` is set to `cohere` and `name` is set to `command`. `name` can be switched to use any of the four models mentioned above.
config = {
"task_name": "OpenbookQAWikipedia",
"task_type": "question_answering",
"dataset": {
"label_column": "answer",
"delimiter": ","
},
"model": {
"provider": "cohere",
"name": "command",
"params": {}
},
"prompt": {
"task_guidelines": "You are an expert at answering questions.",
"example_template": "Question: {question}\nAnswer: {answer}"
}
}
Additional parameters¶
A few parameters can be passed in for `cohere` models to control the model behavior:

* `max_tokens` (int): The maximum number of tokens to predict per generation
* `temperature` (float): The degree of randomness in generations, from 0.0 to 5.0; lower is less random.
These parameters can be passed in via the `params` dictionary under `model`. Here is an example:
"model": {
"provider": "cohere",
"name": "command",
"params": {
"max_tokens": 512,
"temperature": 0.1
}
}
Provider List¶
The table below lists all of the provider and model combinations that Autolabel supports today:
| Provider | Name |
| --- | --- |
| openai | text-davinci-003 |
| openai | gpt-3.5-turbo models |
| openai | gpt-4 models |
| anthropic | claude-v1 |
| anthropic | claude-instant-v1 |
| huggingface_pipeline | seq2seq models and causalLM models |
| refuel | flan-t5-xxl |
| google | text-bison@001 |
| google | chat-bison@001 |
| cohere | command |
| cohere | command-light |
| cohere | base |
| cohere | base-light |