Public Benchmark Feed

Gemini vs Claude Live Scorecard

Real benchmark runs from the ekkOS proxy stack. We track long-horizon constraint recall, task completion, tool correctness, latency, and cost on the same workload suite.

Top Model

Waiting for data

No benchmark rows yet

--

Runs (72h)

--

Calculating...

Global Pass Rate

--

Weighted by run volume
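
One way to read "weighted by run volume" is that each model's pass rate counts in proportion to how many runs it logged in the window. The sketch below illustrates that reading only. The `ModelStats` type, its field names, and the `global_pass_rate` function are hypothetical and are not taken from the ekkOS feed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelStats:
    """Hypothetical per-model summary row; field names are assumptions."""
    model: str
    pass_rate: float  # fraction of runs that passed, 0.0 to 1.0
    runs: int         # number of runs in the reporting window


def global_pass_rate(rows: List[ModelStats]) -> Optional[float]:
    """Run-volume-weighted mean of per-model pass rates.

    Returns None when there are no runs, which corresponds to the
    placeholder state shown before any benchmark rows arrive.
    """
    total_runs = sum(r.runs for r in rows)
    if total_runs == 0:
        return None
    return sum(r.pass_rate * r.runs for r in rows) / total_runs
```

With this weighting, a model with 30 runs moves the global figure three times as much as a model with 10 runs, so a low-volume outlier cannot dominate the headline number.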

Feed Status

Refreshing...

Waiting for first sync

source: unknown

Live proxy rows use derived quality metrics unless benchmark overrides are provided.
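
The page does not say how overrides are combined with derived metrics. A minimal sketch, assuming an override replaces the derived value one metric at a time and that metrics without an override keep their derived value, might look like this. The function name `resolve_metrics` and the metric keys are hypothetical.

```python
from typing import Dict, Optional


def resolve_metrics(
    derived: Dict[str, float],
    overrides: Optional[Dict[str, Optional[float]]] = None,
) -> Dict[str, float]:
    """Return the metrics to display for one row.

    Assumption: a benchmark override wins over the derived value for the
    same metric name; a missing or None override leaves the derived value.
    """
    resolved = dict(derived)  # copy so the caller's dict is not mutated
    for name, value in (overrides or {}).items():
        if value is not None:
            resolved[name] = value
    return resolved
```

For example, `resolve_metrics({"pass": 0.8, "recall": 0.7}, {"recall": 0.9})` keeps the derived pass value and takes the overridden recall value.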

Provider Standings

window: 72h

Model Leaderboard

auto-refresh 15s

Rank | Model | Score | Pass | Recall | Tool | Latency | Cost | Runs

Recent Runs

most recent first