# Embedding Models
Autolabel also supports several models for computing text embeddings, which are used by some few-shot example selection strategies such as semantic similarity and maximum marginal relevance. Like the LLMs that Autolabel supports, each embedding model belongs to a provider. Currently the library supports embedding models from three providers: OpenAI, Google Vertex AI, and Hugging Face. If no embedding config is present in the labeling config but a few-shot strategy that requires text embeddings is enabled, Autolabel defaults to OpenAI embeddings, and an OpenAI API key will be required.

Details on how to set up the embedding config for each provider are below.
## OpenAI

To use embedding models from OpenAI, set `provider` to `openai` under the `embedding` key in the labeling configuration. The specific model that will be queried can be specified using the `model` key. If none is provided, the default embedding model is `text-embedding-ada-002`.
### Setup

To use OpenAI models with Autolabel, make sure to first install the relevant packages and set the `OPENAI_API_KEY` environment variable, replacing `<your-openai-key>` with your API key, which you can get from the OpenAI platform.
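A minimal setup sketch follows; the `openai` package extra name is an assumption, so check the installation docs for the exact command:

```shell
# Install Autolabel with OpenAI support (extra name assumed)
pip install 'refuel-autolabel[openai]'

# Make your API key available to Autolabel
export OPENAI_API_KEY=<your-openai-key>
```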
### Example usage

Here is an example of setting `config` to a dictionary that will use OpenAI's `text-embedding-ada-002` model for computing text embeddings. Note that in the dictionary under the `embedding` key, `provider` is set to `openai` and `model` is not set, so it defaults to `text-embedding-ada-002`.
```python
config = {
    "task_name": "OpenbookQAWikipedia",
    "task_type": "question_answering",
    "dataset": {
        "label_column": "answer",
        "delimiter": ","
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo",
        "params": {}
    },
    "embedding": {
        "provider": "openai"
    },
    "prompt": {
        "task_guidelines": "You are an expert at answering questions.",
        "example_template": "Question: {question}\nAnswer: {answer}"
    }
}
```
## Hugging Face

To use embedding models from Hugging Face, set `provider` to `huggingface_pipeline` when creating the labeling configuration. The specific model that will be queried can be specified using the `model` key.

The model runs locally on a GPU (if one is available). You can also specify a quantization strategy to load larger models in lower precision, thus decreasing memory requirements.
### Setup

To use Hugging Face models with Autolabel, make sure to first install the relevant packages.
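A minimal install sketch follows; the `huggingface` package extra name is an assumption, so check the installation docs for the exact command:

```shell
# Install Autolabel with Hugging Face support (extra name assumed)
pip install 'refuel-autolabel[huggingface]'
```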
### Example usage

Here is an example of setting `config` to a dictionary that will use the `sentence-transformers/all-mpnet-base-v2` model for computing text embeddings. Note that in the dictionary under the `embedding` key, `provider` is set to `huggingface_pipeline` and `model` is set to `sentence-transformers/all-mpnet-base-v2`.
```python
config = {
    "task_name": "OpenbookQAWikipedia",
    "task_type": "question_answering",
    "dataset": {
        "label_column": "answer",
        "delimiter": ","
    },
    "model": {
        "provider": "huggingface_pipeline",
        "name": "google/flan-t5-small",
        "params": {}
    },
    "embedding": {
        "provider": "huggingface_pipeline",
        "model": "sentence-transformers/all-mpnet-base-v2"
    },
    "prompt": {
        "task_guidelines": "You are an expert at answering questions.",
        "example_template": "Question: {question}\nAnswer: {answer}"
    }
}
```
## Google Vertex AI

To use embedding models from Google, set `provider` to `google` when creating the labeling configuration. The specific model that will be queried can be specified using the `model` key.
### Setup

To use Google models with Autolabel, make sure to first install the relevant packages.
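A minimal install sketch follows; the `google` package extra name is an assumption, so check the installation docs for the exact command:

```shell
# Install Autolabel with Google Vertex AI support (extra name assumed)
pip install 'refuel-autolabel[google]'
```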
You will also need to set up Google Cloud authentication locally (for example, application default credentials).

### Example usage
Here is an example of setting `config` to a dictionary that will use Google's `textembedding-gecko` model for computing text embeddings. Note that in the dictionary under the `embedding` key, `provider` is set to `google` and `model` is set to `textembedding-gecko`.
```python
config = {
    "task_name": "OpenbookQAWikipedia",
    "task_type": "question_answering",
    "dataset": {
        "label_column": "answer",
        "delimiter": ","
    },
    "model": {
        "provider": "google",
        "name": "text-bison@001",
        "params": {}
    },
    "embedding": {
        "provider": "google",
        "model": "textembedding-gecko"
    },
    "prompt": {
        "task_guidelines": "You are an expert at answering questions.",
        "example_template": "Question: {question}\nAnswer: {answer}"
    }
}
```