Overview
A task in Refuel is a sequence of steps that defines how you want to transform your data with LLMs. Tasks can range from simple to complex and may involve multiple steps executed in a chained sequence to achieve the desired transformation.
Defining your first task
Conceptually, a task is a sequence of steps executed in a specific order to transform your data with LLMs. Every task in Refuel has the following components:
- Task name: A name for your task.
- Context: An overview of the dataset you are working with and the problem you are trying to solve. This is used to guide the LLM to take on a specific role/persona.
- Field(s): One or more output fields that will be generated by the LLM. Conceptually, a field is a new column you’re adding to your dataset - the result of the data transformation.
- Models: Provider and model to use for the task.
- Advanced settings: Additional settings to configure task behavior.
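Putting the components above together, a task definition can be sketched as a configuration object. This is an illustrative sketch only; the key names and values below are hypothetical and do not reflect Refuel's exact API schema.

```python
# Illustrative task definition mirroring the components above:
# name, context, output fields, model, and advanced settings.
# Key names are hypothetical, not the actual Refuel API schema.
task = {
    "name": "product_categorization",
    "context": (
        "You are an e-commerce analyst. The dataset contains product "
        "titles and descriptions; classify each product by category."
    ),
    "fields": [
        {
            "name": "category",
            "type": "single_category_classification",
            "categories": ["Electronics", "Apparel", "Home", "Other"],
        }
    ],
    "model": {"provider": "OpenAI", "name": "GPT-4o"},
    "advanced": {"cache_llm_responses": True},
}
```

Each entry in `fields` becomes a new column in the output dataset, populated by the LLM for every input row.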
Here’s a quick video overview of how to define a new task:
Output fields
Refuel supports a variety of output field types, depending on the task you are trying to solve:
- Single category classification: The LLM classifies each input into exactly one of a set of predefined categories.
- Multi category classification: The LLM classifies each input into one or more of a set of predefined categories.
- Extraction: The LLM extracts attributes from the input and returns them as structured JSON conforming to a defined schema.
- Generation: The LLM generates free-form text output based on the input.
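As a concrete illustration of the extraction field type, here is a hypothetical attribute schema and a sample structured completion being parsed. Both the schema and the sample output are invented for illustration and are not tied to Refuel's exact wire format.

```python
import json

# Hypothetical extraction field: pull structured attributes out of a
# free-text product description. Schema and output are illustrative.
extraction_schema = {
    "type": "object",
    "properties": {
        "brand": {"type": "string"},
        "price_usd": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
}

# A structured completion matching the schema above.
raw_completion = '{"brand": "Acme", "price_usd": 19.99, "in_stock": true}'
attributes = json.loads(raw_completion)
print(attributes["brand"])  # -> Acme
```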
Enrichment fields
Enrichments allow you to configure external data sources from which relevant context can be fetched and supplied to the LLM when producing an output. This is critical for tasks where it is safer to rely on an external knowledge base (rather than the LLM’s internal knowledge) to ensure accuracy and freshness of outputs.
Refuel currently supports enrichments from the following sources:
- Web search
- Maps search
- Website scraping
- Extracting text from images
- Custom enrichments (beta) - see custom enrichments for more details.
You can add enrichments to your task by clicking on “Add field” and selecting the enrichment source you want to use. When defining an enrichment field, you will typically select the input columns used to fetch the enrichment data for each row and, optionally, any other required configuration.
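For illustration, a web search enrichment field might look like the sketch below. The key names and the example input columns (`company_name`, `city`) are hypothetical and stand in for columns from your own dataset; this is not Refuel's exact configuration schema.

```python
# Hypothetical web-search enrichment field. For each row, the values of
# the selected input columns are used to build the search query, and the
# fetched results are supplied to the LLM as extra context.
enrichment_field = {
    "name": "company_background",
    "source": "web_search",
    # Example dataset columns used to construct the query per row.
    "input_columns": ["company_name", "city"],
    # Optional source-specific configuration, e.g. how many results to fetch.
    "options": {"max_results": 3},
}
```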
Supported Models
Refuel supports LLMs from a variety of providers:
| Provider | Name |
|---|---|
| OpenAI | GPT-4 Turbo |
| OpenAI | GPT-4o |
| OpenAI | GPT-4o mini |
| OpenAI | GPT-4 |
| OpenAI | GPT-3.5 Turbo |
| Anthropic | Claude 3.5 (Sonnet) |
| Anthropic | Claude 3 (Opus) |
| Anthropic | Claude 3 (Haiku) |
| Google | Gemini 1.5 (Pro) |
| Google | Gemini 1.5 (Flash) |
| Google | Gemini 2.0 (Flash) |
| Mistral | Mistral Small |
| Mistral | Mistral Large |
| Refuel | Refuel LLM-2 |
| Refuel | Refuel LLM-2-small |
You can select the model you want to use for your task from the dropdown in the task editor. In addition to the base models, any LLMs that you have finetuned for a specific task will also be available in the same dropdown.
Advanced Settings
In addition to the model and field settings, you can configure task behavior, such as how LLM responses are cached and how provided feedback is used for few shot prompting, by navigating to the Advanced section in the task editor.
- Few shot prompting: Few shot prompting is a technique where you provide examples of the desired output to the LLM to help it understand the task better. You can update the number of examples to provide to the LLM in the Few shot learning section. Refuel will dynamically select the most relevant examples for each row to be processed.
- Caching LLM responses: By default, Refuel caches LLM responses. If the LLM is called with the exact same prompt (model configuration + task guidelines + input data values to be processed), the completion is served from the cache instead of calling the LLM. In practice this improves latency and reduces costs. You can disable this by setting the Cache LLM responses toggle to off.
- Beam search: Beam search is a technique where the LLM generates multiple completions in parallel and then selects the most likely one. This behavior is enabled by default, but you can disable it by setting the Beam search toggle to off.
- Confidence: By default, Refuel will calculate a confidence score for each row based on the LLM response. You can disable this by setting the Compute Confidence Scores toggle to off.
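The caching behavior described above can be sketched conceptually: the cache key is derived from the model configuration, the task guidelines, and the row's input values, so an identical prompt is served from the cache instead of re-calling the LLM. This is an illustrative sketch, not Refuel's implementation.

```python
import hashlib
import json

# Conceptual sketch of LLM response caching. The cache key covers the
# model configuration, the task guidelines, and the input data values,
# so only an exact repeat of all three is a cache hit.
_cache: dict = {}

def cache_key(model_config: dict, guidelines: str, row: dict) -> str:
    payload = json.dumps(
        {"model": model_config, "guidelines": guidelines, "row": row},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def complete(model_config: dict, guidelines: str, row: dict, call_llm):
    """Return a completion, serving from the cache when possible."""
    key = cache_key(model_config, guidelines, row)
    if key not in _cache:
        _cache[key] = call_llm(model_config, guidelines, row)
    return _cache[key]
```

A second call with the same model configuration, guidelines, and row returns the cached completion without invoking `call_llm` again; changing any of the three produces a new key and a fresh LLM call.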