Public Benchmark Feed

Gemini vs Claude: Live Scorecard

Real benchmark runs from the ekkOS proxy stack. We track long-horizon constraint recall, task completion, tool correctness, latency, and cost on the same workload suite.

Top Model

Waiting for data

No benchmark rows yet

Runs (72h)

Calculating...

Global Pass Rate

Weighted by run volume
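As a rough illustration of what "weighted by run volume" could mean, the sketch below averages per-model pass rates using each model's run count as the weight. The `ModelStats` type, its field names, and the model labels are hypothetical and not taken from the ekkOS feed; the scorecard's actual formula is not documented here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ModelStats:
    """Hypothetical per-model row; field names are assumptions."""
    model: str
    runs: int          # number of runs in the window (e.g. 72h)
    pass_rate: float   # fraction of passing runs, in [0, 1]


def global_pass_rate(rows: Iterable[ModelStats]) -> Optional[float]:
    """Run-volume-weighted mean of per-model pass rates.

    Returns None when there are no runs, which corresponds to the
    "waiting for data" state rather than a misleading 0%.
    """
    rows = list(rows)
    total_runs = sum(r.runs for r in rows)
    if total_runs == 0:
        return None
    return sum(r.pass_rate * r.runs for r in rows) / total_runs


# Example with made-up numbers: 30 runs at 90% and 10 runs at 50%.
example = [ModelStats("model-a", 30, 0.9), ModelStats("model-b", 10, 0.5)]
weighted = global_pass_rate(example)
```

Under this weighting, a model with many runs moves the global figure more than a model with only a few, so a handful of runs from a new model cannot swing the headline number.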

Feed Status

Refreshing...

Waiting for first sync

source: unknown

Live proxy rows use derived quality metrics unless benchmark overrides are provided.
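One way to read the note above is as a simple fallback rule: start from the derived metrics and replace a value only when a benchmark override is supplied for it. The sketch below shows that reading. The function name `resolve_metrics` and the metric keys are assumptions for illustration, not the feed's actual schema.

```python
from typing import Dict, Optional


def resolve_metrics(
    derived: Dict[str, float],
    overrides: Optional[Dict[str, Optional[float]]] = None,
) -> Dict[str, float]:
    """Return derived metrics, replaced by overrides where provided.

    An override of None is treated as "not provided", so the derived
    value is kept. Keys and semantics here are hypothetical.
    """
    resolved = dict(derived)  # copy so the caller's dict is not mutated
    for key, value in (overrides or {}).items():
        if value is not None:
            resolved[key] = value
    return resolved


# Made-up values: only "recall" has a benchmark override.
row = resolve_metrics(
    {"recall": 0.70, "tool": 0.90},
    {"recall": 0.85, "tool": None},
)
```

With this rule a row with no overrides is shown entirely from derived values, which matches the stated default for live proxy rows.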

Provider Standings

window: 72h

Model Leaderboard

auto-refresh 15s

Rank	Model	Score	Pass	Recall	Tool	Latency	Cost	Runs

Recent Runs

most recent first