Credits: How They Work

The cost of each request is calculated in real time for that specific request: the model it is sent to and the conditions it runs under. Many of the variables are dynamic, including how much compute the request needs and how much it costs us to serve. Internally we account in tokens, but we show users a clear and stable unit: the credit.

What is a “credit”?

A credit is an abstraction over real cost. It is simply the unit we use in the interface. Behind the scenes, it translates into tokens and real cost, taking the following factors into account:

the chosen model;
the number of input tokens (prompt);
the number of output tokens (response);
system messages and auxiliary contexts;
tools, agents, and external integrations used.

Simply put: we calculate everything internally in tokens and actual cost, then present users with a convenient, stable currency — credits.
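The factors listed above can be illustrated with a minimal sketch. It is not the actual billing formula: the model names, the per-token rates, the conversion factor, and the function estimate_credits are all hypothetical, chosen only to show how token counts and the chosen model might combine into a credit amount.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRate:
    """Hypothetical internal rates for one model (placeholder numbers)."""
    input_per_1k: float   # cost units per 1,000 input tokens
    output_per_1k: float  # cost units per 1,000 output tokens


# Made-up models and rates, for illustration only.
HYPOTHETICAL_RATES = {
    "light-model": ModelRate(input_per_1k=0.1, output_per_1k=0.4),
    "large-model": ModelRate(input_per_1k=1.0, output_per_1k=4.0),
}

# Assumed conversion between internal cost units and credits.
COST_UNITS_PER_CREDIT = 1.0


def estimate_credits(model, prompt_tokens, system_tokens, tool_tokens, output_tokens):
    """Estimate credits for one request under the assumed rates above."""
    rate = HYPOTHETICAL_RATES[model]
    # The prompt, system messages, and tool context all count as input.
    input_tokens = prompt_tokens + system_tokens + tool_tokens
    cost = (input_tokens / 1000) * rate.input_per_1k
    cost += (output_tokens / 1000) * rate.output_per_1k
    return round(cost / COST_UNITS_PER_CREDIT, 2)


# The same request sent to two different hypothetical models.
light = estimate_credits("light-model", 500, 300, 200, 500)
large = estimate_credits("large-model", 500, 300, 200, 500)
print(light, large)
```

Under these assumed rates, the identical request costs ten times more on the larger model, which is why the choice of model appears first in the list of factors.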

Why the cost is variable

Different models consume different amounts of compute.
Input/output length directly affects execution time and price.
Using additional system messages and agents increases the load.
A poorly specified request makes the model do more computation and may require extra clarification rounds.

Therefore, the final amount is calculated at the moment the request is executed, for exactly those conditions.

Recommendations for reducing credit usage

To use credits more efficiently and get predictable results:

State your task clearly. The more specific the request, the faster and cheaper the response.
Break large tasks into steps. Instead of “do X”, ask “1) prepare a plan; 2) write a skeleton; 3) implement function A”.
Choose your model deliberately. For simple assistance, use a lightweight model; for deep generation, use a powerful one. To learn more about available models, visit the Models page.
Reuse system prompts. Avoid resending the same large context with every request unless it is actually needed.
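To see why large repeated contexts matter, here is a small arithmetic sketch. The function total_input_tokens and all token counts are hypothetical; it only adds up input tokens over a series of requests and says nothing about how credits are actually priced.

```python
def total_input_tokens(question_tokens, context_tokens, requests, requests_with_context):
    """Sum input tokens over a series of requests.

    Every request carries the question; only `requests_with_context`
    of them also carry the large context.
    """
    if not 0 <= requests_with_context <= requests:
        raise ValueError("requests_with_context must be between 0 and requests")
    return requests * question_tokens + requests_with_context * context_tokens


# Ten requests of 200 tokens each, with an 8,000-token context (made-up numbers).
always = total_input_tokens(200, 8000, requests=10, requests_with_context=10)
when_needed = total_input_tokens(200, 8000, requests=10, requests_with_context=2)
print(always, when_needed)  # 82000 18000
```

In this illustration, attaching the context only to the two requests that need it cuts input tokens from 82,000 to 18,000.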

Transparency and data export

Transparency is very important to us, and we are actively working to provide more information in more ways. One such capability is the upcoming usage export. The export will include detailed information for each request, such as:

request ID and timestamps;
model type and name;
number of input tokens and output tokens;
system messages, prompts used, and contexts;
tools / agents and external integrations involved;
total cost in tokens and in credits;
execution latency and status (success / error);
size of transferred files or context size (if applicable);
processing region/cluster (if relevant) and other metadata (retries, number of iterations, external API calls).

We plan to make the export easy to analyze and to add filtering by date range, models, and request types, so you can easily reconcile expenses and understand exactly what credits were charged for.
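As an illustration of the kind of reconciliation such an export could support, here is a minimal sketch that totals credits per model within a date range. The export format is not specified here, so the CSV layout, the column names (request_id, timestamp, model, credits, and so on), and the sample rows are all assumptions made up for this example.

```python
import csv
import io
from collections import defaultdict
from datetime import date, datetime

# Made-up sample rows in an assumed CSV layout; a real export may differ.
SAMPLE_EXPORT = """\
request_id,timestamp,model,input_tokens,output_tokens,credits,status
req-1,2024-01-05T10:00:00,light-model,1200,300,0.4,success
req-2,2024-01-05T11:30:00,large-model,5000,2000,6.5,success
req-3,2024-01-09T09:15:00,light-model,800,200,0.3,error
req-4,2024-02-01T08:00:00,large-model,3000,1000,3.5,success
"""


def credits_by_model(csv_text, start, end):
    """Total credits per model for requests dated from start to end, inclusive."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = datetime.fromisoformat(row["timestamp"]).date()
        if start <= day <= end:
            totals[row["model"]] += float(row["credits"])
    # Round to avoid floating-point noise in the summed totals.
    return {model: round(total, 2) for model, total in totals.items()}


january = credits_by_model(SAMPLE_EXPORT, date(2024, 1, 1), date(2024, 1, 31))
print(january)  # {'light-model': 0.7, 'large-model': 6.5}
```

The same loop could group by status or request type instead of model, depending on which columns the real export provides.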

Transparency and fairness

Credits are charged based on what the request actually costs us to serve. We are committed to fair and transparent accounting: precise calculations in tokens and resources internally, and a simple, understandable unit for the user externally. You will be able to get a detailed breakdown of each charge through the usage export or in the request logs.

In brief

A credit is a convenient unit for displaying the real, dynamic cost of compute. Typical requests range from 0.1 credits (micro-tasks) to 20 credits (complex, ambiguous requests). The more precise and concise the request, the cheaper and faster the result. We are working to give you as much detail as possible and will soon add an export showing all key billing parameters.