
Group Health

Group health helps determine whether an API issue is an isolated request failure or a concentrated issue in a plan, model, upstream group, or team member. Enterprise and team admins can use it to answer three questions quickly:

  • Which group has a lower success rate in the selected time range?
  • Which user or token contributes most of the request volume, cost, or errors?
  • Is the error isolated to one token, or does it already affect the whole group?

When troubleshooting API errors, check group health first, then open the individual usage log entry to find the specific request_id.

Data scope

The public status page embedded here queries group health across all aicentos users during the selected time window. It reflects platform-wide group availability and is real-time, impartial, and stable.

The Usage Logs -> Group Health view inside the console counts data visible under the current account permissions. Personal users usually see only their own tokens; enterprise and team admins can review team usage by user, username, token, and group.

If the status page above does not load correctly, open aicentos Group Health directly.

Console entry: Console -> Usage Logs. In error logs or statistics views, filter by time range, model, token, group, error message, and status code.

Console Example

The screenshot below shows Usage Logs -> Group Health in the console, including group success rate, request count, cost, cache hit data, average latency, latest request time, and failure reasons.

[Screenshot: Console group health example]

Usage principle

Identify the impact scope first, then handle the single error. A single log entry is best for locating one request; group health is best for determining whether the issue is concentrated.

For single error message explanations, see Error Logs.

List Columns

The console list and CSV export use the same display columns. The list contains two row types:

  • Group row: summarizes the overall health of one group in the selected time range.
  • Token row: shows user and token details under a group, which helps enterprise and team admins locate members, projects, or services.
| Display column | Applies to | Description | How to use it |
| --- | --- | --- | --- |
| Type | Group row, token row | Identifies whether the row is a Group summary or a Token detail | Check group rows first for overall status, then token rows for a member or token |
| Group | Group row, token row | Groups seen in the selected time range, including pay-as-you-go groups, plan groups, default groups, or model-specific groups | Check whether the issue is concentrated in one plan, model, or upstream resource pool |
| User ID | Token row | User ID that used the token | Use it to locate the member account in enterprise troubleshooting |
| Username | Token row | Username that used the token | Use it for team reports, member communication, and permission checks |
| Token | Token row | Token name configured in the console | Check whether the issue is isolated to one token |
| Success Rate | Group row, token row | Success Rate = successful requests / total requests | Pay attention when it is below 80%; if it is clearly lower than peer rows, check that group or token first |
| Requests | Group row, token row | Total request count in the selected time range | Avoid over-reading success rate when the sample size is small |
| Success | Group row, token row | Successful requests that returned 2xx | Read it together with Requests and Errors to judge availability |
| Errors | Group row, token row | Requests that returned errors (4xx/5xx) | When errors rise, check Failure Reason and error logs first |
| Cost | Group row, token row | Accumulated quota/cost consumption in the selected time range, exported in console currency format | Use it for team cost accounting, project allocation, and abnormal cost detection |
| Cache Hit Rate | Group row, token row | Cache Hit Rate = cache-hit tokens / total tokens | Higher is cheaper; cache-hit parts are usually billed at a lower price or free |
| Cache Tokens | Group row, token row | Number of cache-hit tokens in the selected time range | These tokens are usually billed at a discounted rate, so more means more savings |
| Cache Requests | Group row, token row | Number of requests that hit cache at least once | Shows how many requests actually used cache |
| Cache Request Share | Group row, token row | Cache Request Share = cache-hit requests / total requests | Higher means more calls benefited from cache discounts |
| Avg Cache Tokens | Group row, token row | Average number of tokens per cache hit | Compare cache reuse efficiency across members, services, or groups |
| Avg Latency | Group row, token row | Average request latency in seconds | Lower means faster upstream response; when latency rises, check long context, long output, and tool chains |
| Start Time | Group row, token row | First time this group or token appeared in the current time range | Locate when the issue or traffic started |
| Latest Request | Group row, token row | Most recent time this group or token appeared in the current time range | Check whether the issue or traffic is still ongoing |
| Failure Reason | Group row | Top failure reasons by frequency, including status code and count; empty or - when there are no errors | Handle the most frequent error first; do not rely only on the latest log entry |
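
The ratio columns above can be recomputed from raw counts, which is handy when checking an export by hand. The sketch below is illustrative only: the `RowCounts` field names are hypothetical, and Avg Cache Tokens is assumed to be cache-hit tokens divided by cache-hit requests, which the column description suggests but does not state exactly.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RowCounts:
    """Raw counts for one group or token row (field names are hypothetical)."""
    requests: int
    success: int
    total_tokens: int
    cache_tokens: int
    cache_requests: int


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derived_metrics(row: RowCounts) -> Dict[str, float]:
    """Compute the ratio columns using the formulas from the column table."""
    return {
        "success_rate": ratio(row.success, row.requests),
        "cache_hit_rate": ratio(row.cache_tokens, row.total_tokens),
        "cache_request_share": ratio(row.cache_requests, row.requests),
        # Assumption: "per cache hit" means per request that hit cache.
        "avg_cache_tokens": ratio(row.cache_tokens, row.cache_requests),
    }
```

Returning 0.0 for an empty denominator keeps rows with no traffic from raising an error; whether the console shows 0, empty, or - in that case is not specified here.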

Field source

The display columns are generated from backend statistics. For day-to-day use, follow the console list and CSV export columns; only map them to raw field names when integrating an API or doing technical troubleshooting.

Team diagnosis

Check group rows first to decide whether it is a resource-pool issue, then check token rows to see whether a user or token caused it. If the group success rate is normal but one token has a high error rate, check that member's token, model name, client configuration, or request body first.

CSV Export

CSV export uses the same columns as the current list. It is suitable for enterprise and team weekly reports, cost allocation, incident reviews, and member usage reconciliation.

After exporting, you can preview the file with the online CSV viewer. It supports dragging or selecting a CSV file, and it can also parse pasted CSV text, which is useful for quickly checking columns and failure reasons.

| Export behavior | Description |
| --- | --- |
| Group row | Type is Group; User ID, Username, and Token are usually empty, representing the group summary |
| Token row | Type is Token; User ID, Username, and Token are shown, representing member or token details under the group |
| Currency format | Cost uses the console currency format, such as ¥905.48 |
| Percentage format | Success Rate, Cache Hit Rate, and Cache Request Share are exported as percentages |
| Number format | Large numbers may include thousands separators for direct reading or spreadsheet import |
| Time format | Start Time and Latest Request are exported as local time, making them easier to align with incident time |
| Failure Reason | Multiple high-frequency errors are merged and include occurrence counts at the end; empty or - when there are no errors |
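
Because the export formats values for reading (currency symbol, percent sign, thousands separators), a script needs to normalize them before doing arithmetic. The following is a minimal sketch, assuming the CSV header row uses the display column names exactly; `parse_number` and `load_rows` are hypothetical helper names, not part of any aicentos tooling.

```python
import csv
import io
import re
from typing import Dict, List, Optional


def parse_number(text: str) -> Optional[float]:
    """Parse values like '¥905.48', '1,234', or '87.5%' into a float.

    Percentages are returned as fractions (87.5% becomes 0.875).
    Empty cells and '-' return None.
    """
    cleaned = text.strip()
    if cleaned in ("", "-"):
        return None
    is_percent = cleaned.endswith("%")
    # Drop the currency symbol, thousands separators, and the percent sign.
    digits = re.sub(r"[^0-9.\-]", "", cleaned)
    value = float(digits)
    return value / 100 if is_percent else value


def load_rows(csv_text: str) -> List[Dict[str, str]]:
    """Read exported CSV text into one dict per row, keyed by column header."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```

For example, after `rows = load_rows(text)`, the group summaries are the rows where `row["Type"] == "Group"`, and `parse_number(row["Cost"])` gives a plain number for cost allocation.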

Troubleshooting Flow

1. Determine the impact scope

Check rows where Type=Group first. If Success Rate is close to normal and Errors is low, the failure is usually transient. Copy the request_id from the failing request's log entry and continue troubleshooting from there.

If one group's Success Rate is clearly lower than other groups, or Errors is concentrated, prioritize group-level checks for model, token, upstream account, plan permissions, and platform resource status.

In enterprise or team scenarios, check the Type=Token rows under that group. If only one user or token is abnormal, check that member's client configuration, token, model name, request body, and concurrency strategy first.
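
The scope check in this step can be sketched as a small filter over parsed rows. This is an illustration under stated assumptions: the row keys are hypothetical, the 80% threshold comes from the Success Rate column guidance, and the minimum request count of 20 is an arbitrary floor chosen here to avoid over-reading small samples.

```python
LOW_SUCCESS_RATE = 0.80  # "pay attention" threshold from the Success Rate column
MIN_REQUESTS = 20        # assumed sample-size floor, not a platform value


def is_unhealthy(row):
    """True when the row has enough traffic and a success rate below the threshold."""
    return row["requests"] >= MIN_REQUESTS and row["success_rate"] < LOW_SUCCESS_RATE


def impact_scope(rows):
    """Separate group-level problems from tokens that fail inside a healthy group.

    Each row is a dict with the keys: type, group, token, success_rate, requests.
    """
    bad_groups = [r["group"] for r in rows if r["type"] == "Group" and is_unhealthy(r)]
    isolated_tokens = [
        (r["group"], r["token"])
        for r in rows
        if r["type"] == "Token" and r["group"] not in bad_groups and is_unhealthy(r)
    ]
    return {"groups": bad_groups, "isolated_tokens": isolated_tokens}
```

A non-empty `groups` list points to group-level checks; a non-empty `isolated_tokens` list points to one member's client configuration, token, or request body.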

2. Check top failure reasons

Failure Reason is usually sorted by occurrence count. Start with the most frequent error, then inspect lower-frequency errors. High-frequency errors show the main failure type in the current time range.

| Error type | Common log keywords | Initial attribution | What to check first |
| --- | --- | --- | --- |
| Rate limit | Account RPM limit exceeded, Max 10/min, Max 5/min | Usage issue or upstream limit | Concurrency or requests per minute are too high |
| Daily quota limit | Account daily limit exceeded | Upstream limit | Upstream account daily quota is exhausted |
| Credential cooldown | All credentials ... are cooling down | Upstream limit | All upstream credentials for the current model are cooling down |
| Request body too large | status_code=413, openai_error | Usage issue | Context, files, images, or tool results are too large |
| Permission or authentication | 401, 403, Invalid API key, pending admin approval | Usage issue or account status | Token, plan, group, or model permissions are abnormal |
| No available resource | No available accounts, No available channel, auth_unavailable | Platform issue or configuration issue | Current group has no available account, channel, or authentication resource |
| Upstream error | 502, all upstreams failed, Upstream request failed | Upstream issue | Upstream service or intermediate network is abnormal |
| Gateway timeout | 504, 521, 522, 524 | Upstream or network issue | Upstream connection, read, or response timed out |
| Platform resource protection | system disk overloaded, Service Unavailable | Platform issue | Platform node or upstream resource is temporarily unavailable |
| Image endpoint format | gpt-image-2, prompt is required, multipart form | Usage issue | Image endpoint path, prompt, or upload format is wrong |
| Tool-call format | tool_use, tool_result, Invalid schema | Usage issue | Client tool messages or JSON Schema do not meet requirements |
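
When triaging many log lines at once, the keyword table can be turned into a first-pass classifier. The sketch below is a rough approximation: the regular expressions are my own reading of the keywords above, the rule order is a design choice (more specific rules first), and very generic keywords such as openai_error and the model name gpt-image-2 are left out because they would match unrelated errors.

```python
import re

# (error type, patterns) pairs checked in order; the first match wins.
RULES = [
    ("Rate limit", [r"RPM limit exceeded", r"Max \d+/min"]),
    ("Daily quota limit", [r"daily limit exceeded"]),
    ("Credential cooldown", [r"credentials .*cooling down"]),
    ("Request body too large", [r"\b413\b"]),
    ("Permission or authentication",
     [r"\b40[13]\b", r"Invalid API key", r"pending admin approval"]),
    ("No available resource",
     [r"No available accounts", r"No available channel", r"auth_unavailable"]),
    ("Upstream error", [r"\b502\b", r"all upstreams failed", r"Upstream request failed"]),
    ("Gateway timeout", [r"\b(504|521|522|524)\b"]),
    ("Platform resource protection", [r"system disk overloaded", r"Service Unavailable"]),
    ("Image endpoint format", [r"prompt is required", r"multipart form"]),
    ("Tool-call format", [r"tool_use", r"tool_result", r"Invalid schema"]),
]


def classify(message: str) -> str:
    """Return the first matching error type, or 'Unclassified' if none match."""
    for error_type, patterns in RULES:
        if any(re.search(p, message, re.IGNORECASE) for p in patterns):
            return error_type
    return "Unclassified"
```

Treat the result as a starting point for the "What to check first" column, not as a final attribution; a message that matches several rules is reported under the first one only.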

3. Handle by impact scope

| Symptom | More likely cause | What to do |
| --- | --- | --- |
| Only one token fails | Token configuration, permission, or local request format issue | Copy the token again and check client configuration and request body |
| Only one model fails | Model permission, model channel, or upstream model resource issue | Switch to a similar model and confirm the current plan supports the model |
| Only one group has a low success rate | Group resource pool, plan permission, or upstream account issue | Switch group/model; when contacting support, provide the group name and time range |
| Multiple groups show 502, 504, 521, 522, or 524 at the same time | Upstream or network path issue | Retry later and reduce long-running tasks; contact support if it persists |
| Multiple requests show 413 | Request body is too large | Shorten context, split files, compress images, or reduce tool results |
| Multiple requests show 429 | Request rate is too high, daily quota is exhausted, or credentials are cooling down | Reduce concurrency; distinguish RPM, daily limit, and cooldown from the log |

4. Combine cost and cache data

| Symptom | More likely cause | What to do |
| --- | --- | --- |
| Cost is clearly higher than other tokens in the same group | Large context, long output, high-frequency calls, or repeated tasks | Combine Requests, Avg Latency, and error logs to locate the service or member |
| Cache Hit Rate is high but Cache Request Share is low | A small number of large requests hit cache | Check whether only fixed tasks are reusing context |
| Cache Request Share is high but Avg Cache Tokens is low | Many requests hit cache, but each hit saves little | Check whether context is too short or cache content is unstable |
| One token has clearly higher Avg Latency | Heavy client tasks, long context, long output, or slow upstream | Compare that token's Requests, cache data, Failure Reason, and single logs |
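
The two cache patterns above can be checked mechanically once the columns are parsed. In this sketch every threshold is an assumption picked for illustration ("high" as 50% or more, "low" as 20% or less, a "small" hit as 1,000 tokens or fewer); the document does not define these cutoffs, so tune them against your own traffic.

```python
def cache_pattern(hit_rate, request_share, avg_cache_tokens,
                  high=0.5, low=0.2, small_hit=1000):
    """Name the cache pattern for one row, using assumed thresholds.

    hit_rate and request_share are fractions between 0 and 1.
    """
    if hit_rate >= high and request_share <= low:
        # Most cached tokens come from a few large requests.
        return "few large requests hit cache"
    if request_share >= high and avg_cache_tokens <= small_hit:
        # Many requests hit cache, but each hit saves little.
        return "many small cache hits"
    return "no notable pattern"
```

For instance, `cache_pattern(0.7, 0.1, 50000)` reports the first pattern, which suggests checking whether only fixed tasks are reusing context.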

Information for Support

For simple issues, check Error Logs and Group Health first. If the issue remains unresolved, open the error log details in console/log and click the copy icon to copy the troubleshooting details. When contacting support, provide the following in one message so the technical team can investigate with less back-and-forth:

  • User ID
  • Time range: when the issue started and when it last appeared
  • Group name: group
  • Model name used by the request
  • Status code, such as 429, 413, 502, or 503
  • Error content: error_reasons.content
  • Request ID: request_id from a single log entry or API response
  • Impact scope: one token, one model, one group, or multiple groups at the same time
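
Teams that file support requests often may find it convenient to assemble this checklist in code so no item is forgotten. The class below is a hypothetical helper, not an aicentos API; its field names simply mirror the list above, and the values in the usage note are placeholders.

```python
from dataclasses import dataclass


@dataclass
class SupportReport:
    """One support message; fields mirror the checklist above."""
    user_id: str
    started_at: str
    last_seen_at: str
    group: str
    model: str
    status_code: int
    error_content: str
    request_id: str
    impact_scope: str

    def to_message(self) -> str:
        """Render all fields as a single plain-text message."""
        lines = [
            f"User ID: {self.user_id}",
            f"Time range: {self.started_at} to {self.last_seen_at}",
            f"Group name: {self.group}",
            f"Model name: {self.model}",
            f"Status code: {self.status_code}",
            f"Error content: {self.error_content}",
            f"Request ID: {self.request_id}",
            f"Impact scope: {self.impact_scope}",
        ]
        return "\n".join(lines)
```

Because every field is required, constructing a `SupportReport` with a missing item fails immediately, which is the point of the helper.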

Quick summary

For 401 / 403, check permissions. For 413, check request body size. For 429, check rate and quota. For 502 / 504 / 524, check upstream and long-running tasks. For 503, check whether resources are temporarily unavailable.
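
The same summary can be kept as a lookup table in a client or alerting script. This is a minimal sketch: the mapping restates the sentence above, while the fallback text for other status codes is my own wording.

```python
# First thing to check for each status code, taken from the quick summary.
FIRST_CHECK = {
    401: "permissions",
    403: "permissions",
    413: "request body size",
    429: "rate and quota",
    502: "upstream and long-running tasks",
    504: "upstream and long-running tasks",
    524: "upstream and long-running tasks",
    503: "whether resources are temporarily unavailable",
}


def first_check(status_code: int) -> str:
    """Return the first check for a status code, with a generic fallback."""
    return FIRST_CHECK.get(status_code, "the error log details for this request")
```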