Using applications
Playground
The playground is a tool for testing, debugging and playing around with your deployed application.
The playground can be used to test an application by sending it different inputs and seeing how the output changes. Edit the input in the Browser section to see how the corresponding output in the Response section changes. This is the exact response that any client will receive when it makes a request to this application.
Examples of using applications
The playground also provides code snippets for using this application from different clients (currently Python, JavaScript, and curl are supported). Here is an example snippet showing the general syntax for calling an application; the playground adapts it to your specific application.
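For instance, a curl call follows this general pattern (the base URL, application ID, authorization header, and input fields below are illustrative placeholders, not your application's exact values; use the adapted snippet from the playground):

```bash
# Illustrative sketch - replace the base URL, API key, and input fields
# with the values shown in the playground for your application.
curl -X POST "https://<refuel-api-base>/applications/<application_id>" \
  -H "Authorization: <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '[{"text": "an example input to label"}]'
```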
Synchronous vs Asynchronous application calls
LLM applications defined in Refuel can be used in two ways, depending on the use case and latency constraints:
- Synchronous
This is meant for processing individual inputs or small batches (tens of items) in real time, and is best suited for use cases that need an immediate, low-latency response. The client calls the application synchronously, i.e., sends a request and blocks until the response is received. The default code snippets in the playground are for synchronous calls.
Limitations: Requests may time out if the LLM is temporarily unavailable (e.g., an OpenAI outage) or traffic is throttled.
- Asynchronous
This is meant for cases where latency is less critical or each request is expected to take a long time to complete. The client calls the application asynchronously, i.e., submits a request and then polls for the response separately.
Advantages compared to synchronous calls:
- No timeout errors - even if the LLM takes a long time to process, or if we have to make multiple LLM calls (e.g. in a multi-step chain)
- More robust to LLM throttling and transient unavailability - Refuel takes care of retries
Limitations:
- Marginally higher latency per request
- Separate API calls to submit a request, and then to retrieve the results
- Currently not supported through the Python or JavaScript SDKs
How to use: The async workflow has two steps:
- Submit a request
The request is exactly the same as the curl request for a synchronous call, but with an extra query parameter, is_async=True, that tells Refuel to process the request asynchronously. When this flag is set, the endpoint returns an HTTP status code 202 with a response of the form sketched below.
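A hedged sketch of the submit step (the base URL, authorization header, and request body are illustrative placeholders; refuel_uuid is the only response field documented here, and any other fields are omitted):

```bash
# Submit a request for asynchronous processing - the same call as the
# synchronous curl request, plus the is_async=True query parameter.
curl -X POST "https://<refuel-api-base>/applications/<application_id>?is_async=True" \
  -H "Authorization: <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '[{"text": "an example input"}]'

# Example 202 response (other fields, if any, omitted from this sketch):
# {
#   "refuel_uuid": "<unique identifier for this request>"
# }
```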
The refuel_uuid is the unique identifier for this resource, and should be used for retrieving results in Step 2.
- Retrieve results: This is supported by the GET /applications/<application_id>/items/<refuel_uuid> endpoint. An example curl request is shown below.
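A sketch of the polling call, reusing the placeholder base URL and auth header from the submit step:

```bash
# Retrieve results for a previously submitted request, using the
# refuel_uuid returned in Step 1.
curl -X GET "https://<refuel-api-base>/applications/<application_id>/items/<refuel_uuid>" \
  -H "Authorization: <API_KEY>"
```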
This endpoint will return the following status codes:
- Status 202 - if the request is still being processed
- Status 200 - if the request has completed processing (along with the LLM output response)
- Status 404 - if no such refuel_uuid can be found
The response schema of the asynchronous call is the same as that of the synchronous response seen in the playground.
Optional parameters
Explanations
In addition to the label output, you can also request explanations for why a model chose a particular label.
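For illustration, this could be passed as a request parameter; the parameter name explain below is an assumption, not a confirmed flag, so refer to the playground snippets for the exact syntax:

```bash
# Hypothetical: request explanations alongside labels. The "explain"
# parameter name is assumed here for illustration only.
curl -X POST "https://<refuel-api-base>/applications/<application_id>?explain=True" \
  -H "Authorization: <API_KEY>" \
  -d '[{"text": "an example input"}]'
```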
Model override
The model used for the application call can be changed from the default model specified in the application definition. This is useful for trying out different models with the same application, without having to re-deploy it.
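For illustration, an override might be passed as a request parameter; the parameter name model and its value below are assumptions, so refer to the playground snippets for the exact syntax:

```bash
# Hypothetical: override the default model for this call only. The "model"
# parameter name and value are assumed here for illustration only.
curl -X POST "https://<refuel-api-base>/applications/<application_id>?model=<model_name>" \
  -H "Authorization: <API_KEY>" \
  -d '[{"text": "an example input"}]'
```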
Rate limits
Currently, we impose the following rate limits on all applications:
- 600 requests/min
- Up to 100 concurrent requests
Requests will be throttled if these limits are exceeded. Contact us if you need higher limits for a specific application.
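As one client-side mitigation, curl's built-in retry flags can back off and retry throttled requests. This sketch assumes throttled requests return a retryable HTTP status such as 429 (an assumption; verify the actual status code for your deployment) and reuses the placeholder endpoint from above:

```bash
# Retry throttled or transiently failing requests with a delay between
# attempts. curl treats HTTP 429 as a transient error when --retry is set.
curl --retry 5 --retry-delay 2 \
  -X POST "https://<refuel-api-base>/applications/<application_id>" \
  -H "Authorization: <API_KEY>" \
  -d '[{"text": "an example input"}]'
```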