Playground

The playground is a tool for testing, debugging and playing around with your deployed application.

The playground can be used to test an application by sending it different inputs and seeing how its output changes. Edit the input in the Browser section to see how the corresponding output in the Response section changes. This is the exact response that any client will receive when making a request to this application.

Examples of using applications

The playground also provides code snippets for calling this application from different clients (currently we support Python, JavaScript, and curl). Here is an example snippet showing the syntax for calling an application; the playground adapts it to your specific application.
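As a rough sketch of what such a snippet looks like, the example below calls an application over HTTP using Python's requests library. The endpoint path and the request payload are illustrative placeholders, not the exact API; use the snippet generated in the playground for your application to get the real values.

Python
# Minimal sketch of calling a deployed application over HTTP.
# The endpoint path and payload below are placeholders; use the snippet
# generated in the playground for your application.
import requests

API_KEY = "<Your API Key>"
APPLICATION_ID = "<Your Application ID>"

payload = [{"text": "Example input"}]  # hypothetical input shape

response = requests.post(
    f"https://cloud-api.refuel.ai/applications/{APPLICATION_ID}",
    headers={
        "accept": "application/json",
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json=payload,
)
print(response.json())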

Synchronous vs Asynchronous application calls

LLM applications defined in Refuel can be used in two ways, depending on the use case and latency constraints:

  1. Synchronous

This is meant for processing individual inputs or small batches (tens of items) in real time. It is best suited for use cases that need an immediate response and low latency. This involves calling the application synchronously, i.e., sending a request to the application and blocking until the response is received. The default code snippets in the playground are for synchronous calls.

Limitations: There is some possibility of timeouts if the LLM is temporarily unavailable (e.g. OpenAI outage) or traffic is throttled.

  2. Asynchronous

This is meant for cases where latency is less critical or where each request is expected to take a long time to complete. This involves calling the application asynchronously, i.e., sending a request to the application and polling for the response separately.

Advantages compared to synchronous calls:

  • No timeout errors - even if the LLM takes a long time to process, or if we have to make multiple LLM calls (e.g. in a multi-step chain)
  • More robust to LLM throttling/transient availability - Refuel will take care of retries

Limitations:

  • Marginally higher latency per request
  • Separate API calls to submit a request, and then to retrieve the results
  • Currently not supported through the Python or JavaScript SDKs

How to use: The async workflow has two steps:

  1. Submit a request. The request is exactly the same as the curl request for a synchronous call, but with an extra query parameter, is_async=True, that tells Refuel to process the request asynchronously. The endpoint will return an HTTP status code 202 and, if the is_async flag is set to True, a response with the following schema:
{
  "application_id": "<Application ID>",
  "application_name": "<Application Name>",
  "refuel_output": [
    {
      "refuel_uuid": "<refuel_uuid>",
      "refuel_api_timestamp": "<refuel_api_timestamp>",
      "uri": "<uri>"
    }
  ]
}

The refuel_uuid is the unique identifier for this resource, and should be used for retrieving results in Step 2.
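As a sketch of this step, the example below submits a request with the is_async query parameter and reads the refuel_uuid out of the documented response schema. The endpoint path and payload are the same illustrative placeholders as in the earlier snippet.

Python
# Sketch of submitting an asynchronous request. The is_async parameter and
# the response fields match the schema above; the endpoint path and payload
# are illustrative placeholders.
import requests

API_KEY = "<Your API Key>"
APPLICATION_ID = "<Your Application ID>"

response = requests.post(
    f"https://cloud-api.refuel.ai/applications/{APPLICATION_ID}",
    params={"is_async": True},  # serialized as is_async=True in the URL
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json=[{"text": "Example input"}],
)
assert response.status_code == 202
# Keep the refuel_uuid; it is needed to retrieve results in Step 2.
refuel_uuid = response.json()["refuel_output"][0]["refuel_uuid"]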

  2. Retrieve results. This is supported by the GET /applications/<Application ID>/items/<refuel_uuid> endpoint. An example curl request:
Curl
curl -X 'GET' \
    'https://cloud-api.refuel.ai/applications/<Your Application ID>/items/<refuel_uuid>' \
    -H 'accept: application/json' \
    -H 'Authorization: Bearer <Your API Key>' \
    -H 'Content-Type: application/json'

This endpoint will return the following status codes:

  • Status 202 - if the request is still being processed
  • Status 200 - if the request has completed processing (along with the LLM output response)
  • Status 404 - if no such refuel_uuid can be found

The response schema of the asynchronous call is the same as that of the synchronous response seen in the playground.
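A minimal polling loop for this step might look like the following, using the endpoint and status codes above; the polling interval is an arbitrary choice.

Python
# Sketch of polling for results until processing completes.
import time
import requests

API_KEY = "<Your API Key>"
APPLICATION_ID = "<Your Application ID>"
refuel_uuid = "<refuel_uuid>"  # returned by the submit step

url = f"https://cloud-api.refuel.ai/applications/{APPLICATION_ID}/items/{refuel_uuid}"
headers = {"accept": "application/json", "Authorization": f"Bearer {API_KEY}"}

while True:
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        print(response.json())  # same schema as the synchronous response
        break
    if response.status_code == 202:
        time.sleep(5)  # still processing; wait before polling again
        continue
    response.raise_for_status()  # e.g. 404 if the refuel_uuid is not found
    break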

Optional parameters

Explanations

In addition to the label output, you can also request explanations for why a model chose a particular label.

Model override

The model used for the application call can be changed from the default model specified in the application definition. This is useful for trying out different models for the same application without having to redeploy it.
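A combined sketch of both optional parameters is shown below. The parameter names used here (explanations, model) and the model identifier are hypothetical placeholders, not confirmed API names; the snippets generated in the playground show the actual parameters for your application.

Python
# Hypothetical sketch only: "explanations" and "model" are placeholder
# parameter names, and "gpt-4o" is a placeholder model identifier.
import requests

response = requests.post(
    "https://cloud-api.refuel.ai/applications/<Your Application ID>",
    params={
        "explanations": True,  # placeholder: request an explanation for each label
        "model": "gpt-4o",     # placeholder: override the application's default model
    },
    headers={"Authorization": "Bearer <Your API Key>", "Content-Type": "application/json"},
    json=[{"text": "Example input"}],
)
print(response.json())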

Rate limits

Currently, we impose the following rate limits on all applications:

  • 600 requests/min
  • Up to 100 concurrent requests

Requests will be throttled if these limits are exceeded. Contact us if you need higher limits for a specific application.
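If your client may exceed these limits, one way to stay within them is to cap concurrency and back off when a request is throttled. The sketch below assumes throttled requests return HTTP 429, which is a common convention rather than something stated above, and reuses the illustrative endpoint from the earlier snippets.

Python
# Sketch of client-side throttling: cap concurrency and retry with backoff.
# Assumes throttled requests return HTTP 429 (an assumption, not documented above).
import time
from concurrent.futures import ThreadPoolExecutor
import requests

def call_application(payload):
    response = None
    for attempt in range(5):
        response = requests.post(
            "https://cloud-api.refuel.ai/applications/<Your Application ID>",
            headers={"Authorization": "Bearer <Your API Key>"},
            json=payload,
        )
        if response.status_code == 429:  # throttled: back off and retry
            time.sleep(2 ** attempt)
            continue
        break
    return response

# Keep concurrency well under the 100 concurrent request limit.
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(call_application, [[{"text": "Example input"}]] * 50))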