Methodology
Harness Arena aggregates metadata exported from local AI coding harness histories. Before anything reaches this site, the uploader normalizes each harness's native history format into a shared session metric schema. No prompt or response content is displayed. This page defines the metrics shown in the UI and explains what each data-completeness badge means.
Normalization Model
Each supported harness stores history differently. Claude, Codex, Gemini, Cursor Agent, and OpenCode all have their own log layouts, retention behavior, and native field names.
The uploader parses those harness-specific sources and maps them into a common session schema before upload. The site then aggregates only that normalized schema, rather than reinterpreting each harness independently.
That means cross-harness totals such as tokens, prompts, sessions, tool calls, subagents, and MCP calls are all computed from the same normalized fields after ingestion.
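As a rough illustration of the normalized schema, the sketch below models one session record and sums it across harnesses. The field names `total_tokens`, `message_count_user`, `total_tool_calls`, `subagent_calls`, and `mcp_calls` come from the Metric Definitions table; the `Session` class, its `harness` field, and the `aggregate` helper are hypothetical names used only for this example, not the uploader's actual code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Session:
    """Hypothetical normalized session record (one row per session)."""
    harness: str                 # assumed label, e.g. "claude" or "codex"
    total_tokens: int = 0
    message_count_user: int = 0
    total_tool_calls: int = 0
    subagent_calls: int = 0
    mcp_calls: int = 0


def aggregate(sessions: Iterable[Session]) -> Dict[str, int]:
    """Sum the normalized fields across every session in the selected scope."""
    rows = list(sessions)
    return {
        "tokens": sum(s.total_tokens for s in rows),
        "prompts": sum(s.message_count_user for s in rows),
        "sessions": len(rows),
        "tool_calls": sum(s.total_tool_calls for s in rows),
        "subagents": sum(s.subagent_calls for s in rows),
        "mcp_calls": sum(s.mcp_calls for s in rows),
    }
```

Because every harness is mapped into the same fields before this step, the aggregation never needs to branch on which harness produced a session.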
Metric Definitions
| Metric | Definition |
|---|---|
| Tokens | Sum of `sessions.total_tokens` across the selected scope. |
| Prompts | Sum of `sessions.message_count_user` across the selected scope. |
| Sessions | Count of session records in the selected scope. |
| Tool Calls | Sum of `sessions.total_tool_calls` across the selected scope. |
| Subagents | Sum of `sessions.subagent_calls` across the selected scope. |
| MCP Calls | Sum of `sessions.mcp_calls` across the selected scope. |
| Days Active | Count of distinct dates with activity, after daily activity rows are merged by date. |
| Tokens / Prompt | Project detail view: `round(total_tokens / total_prompts)` when prompts > 0, else `0`. |
| Tools / Prompt | Project detail view: `round((total_tool_calls / total_prompts) * 10) / 10` when prompts > 0, else `0`. |
| Intervention Rate | Prompts per 1,000 tokens, rounded to two decimals. Current site formula: `round((total_prompts / (total_tokens / 1000)) * 100) / 100` when tokens > 0, else `0`. |
| Leaderboard Rank | Users are ranked by total tokens descending. |
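
The derived rows in the table can be sketched in Python as follows. The formulas are copied from the table; the function names, the `(date, tokens)` shape of a daily row, and the tie handling in `leaderboard` are assumptions made for illustration. Python's built-in `round` sends exact `.5` ties to the nearest even integer, so results may differ from the site at exact ties if its runtime rounds differently.

```python
from typing import Dict, Iterable, List, Tuple


def tokens_per_prompt(total_tokens: int, total_prompts: int) -> int:
    """Tokens / Prompt: whole-number average, 0 when there are no prompts."""
    return round(total_tokens / total_prompts) if total_prompts > 0 else 0


def tools_per_prompt(total_tool_calls: int, total_prompts: int) -> float:
    """Tools / Prompt: average rounded to one decimal place."""
    if total_prompts <= 0:
        return 0
    return round((total_tool_calls / total_prompts) * 10) / 10


def intervention_rate(total_prompts: int, total_tokens: int) -> float:
    """Intervention Rate: prompts per 1,000 tokens, two decimal places."""
    if total_tokens <= 0:
        return 0
    return round((total_prompts / (total_tokens / 1000)) * 100) / 100


def days_active(daily_rows: Iterable[Tuple[str, int]]) -> int:
    """Days Active: merge rows by date, then count the distinct dates."""
    merged: Dict[str, int] = {}
    for date, tokens in daily_rows:  # hypothetical row shape: (date, tokens)
        merged[date] = merged.get(date, 0) + tokens
    return len(merged)


def leaderboard(user_tokens: Dict[str, int]) -> List[Tuple[int, str, int]]:
    """Rank users by total tokens, descending. Ties keep input order (assumed)."""
    ordered = sorted(user_tokens.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, user, tokens)
            for rank, (user, tokens) in enumerate(ordered, start=1)]
```

For example, a project with 120,000 tokens, 48 prompts, and 310 tool calls yields 2,500 tokens per prompt, 6.5 tools per prompt, and an intervention rate of 0.4.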
Data Completeness
Completeness describes how much of the original harness history still survives locally for the sessions contributing to a view. It affects how confidently metrics such as tokens, tool calls, and daily detail can be interpreted.
Full data
Every counted session for the current scope came from the harness's primary history source, so token, prompt, model, timing, and tool details are available at normal fidelity.
Incomplete data
Some or all sessions for the current scope are missing detailed metrics. Session existence, prompt counts, and dates may be available, but token counts, tool calls, and richer per-session metadata may be absent due to harness garbage collection or partial sync.
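The badge rule described above can be sketched as a single check over the sessions in scope. This is a minimal illustration, assuming each session carries a boolean saying whether it came from the harness's primary history source; that flag and the function name are hypothetical, and the behavior for an empty scope is not specified by this page.

```python
from typing import Iterable


def completeness_badge(from_primary_source: Iterable[bool]) -> str:
    """Return "Full data" only if every counted session has full detail.

    Each element is a hypothetical per-session flag: True when the session
    was read from the harness's primary history source, False when only a
    lightweight record (for example a prompt-history entry) survived.
    """
    # A single session without detailed metrics downgrades the whole scope.
    # Note: all() of an empty iterable is True, so an empty scope reports
    # "Full data" in this sketch; the real site may treat that case differently.
    return "Full data" if all(from_primary_source) else "Incomplete data"
```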
How Completeness Affects Metrics
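`Tokens`, `Tool Calls`, `Subagents`, `MCP Calls`, and most derived metrics are most trustworthy when completeness is `Full data`. When a project shows `Incomplete data`, these raw counts may be understated, and the ratios built on them shift accordingly: `Tokens / Prompt` and `Tools / Prompt` tend to read low, while `Intervention Rate` tends to read high because tokens sit in its denominator.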
`Prompts`, `Sessions`, and coarse activity timing can often survive longer because some harnesses keep lightweight indexes or prompt-history files after richer session logs are pruned.
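A small worked example shows the effect. It assumes a project whose prompt counts survived pruning while one session's token detail did not. The numbers are invented for illustration, and the two helper functions restate the formulas from the Metric Definitions table under hypothetical names.

```python
def tokens_per_prompt(total_tokens: int, total_prompts: int) -> int:
    """Tokens / Prompt formula from the Metric Definitions table."""
    return round(total_tokens / total_prompts) if total_prompts > 0 else 0


def intervention_rate(total_prompts: int, total_tokens: int) -> float:
    """Intervention Rate formula from the Metric Definitions table."""
    if total_tokens <= 0:
        return 0
    return round((total_prompts / (total_tokens / 1000)) * 100) / 100


# Invented totals: the same 30 prompts, with and without one pruned session's tokens.
full = {"tokens": 90_000, "prompts": 30}        # all session logs present
incomplete = {"tokens": 60_000, "prompts": 30}  # 30,000 tokens lost to pruning

full_tpp = tokens_per_prompt(full["tokens"], full["prompts"])
incomplete_tpp = tokens_per_prompt(incomplete["tokens"], incomplete["prompts"])
full_rate = intervention_rate(full["prompts"], full["tokens"])
incomplete_rate = intervention_rate(incomplete["prompts"], incomplete["tokens"])
```

Here tokens per prompt drops from 3,000 to 2,000 and the intervention rate rises from 0.33 to 0.5, even though the underlying activity is identical.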