API sync · High confidence
LMArena Text-to-Image
Best automatically synced source for image-generation rankings that mix open and closed models.
- Metric: Arena rating, higher is better
- Cadence: Sync every 30 minutes; the upstream "latest" split updates whenever LMArena republishes.
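The per-card cadences could drive a simple poll-based scheduler. This is a minimal sketch, assuming a hypothetical `SYNC_INTERVALS` table keyed by made-up source IDs and hypothetical helpers `is_due` and `due_sources`. The intervals mirror cadences listed on these cards: 30 minutes for LMArena boards, hourly for SWE-bench Verified, and daily for the OCRBench v2 CSVs.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Hypothetical source IDs; the intervals mirror the cadences listed on the cards.
SYNC_INTERVALS: Dict[str, timedelta] = {
    "lmarena-text-to-image": timedelta(minutes=30),
    "lmarena-image-edit": timedelta(minutes=30),
    "swe-bench-verified": timedelta(hours=1),
    "ocrbench-v2-en": timedelta(days=1),
}


def is_due(source_id: str, last_synced: Optional[datetime], now: datetime) -> bool:
    """A source is due if it has never synced or its interval has fully elapsed."""
    if last_synced is None:
        return True
    return now - last_synced >= SYNC_INTERVALS[source_id]


def due_sources(last_runs: Dict[str, Optional[datetime]], now: datetime) -> List[str]:
    """Return the IDs of all configured sources that are due, sorted alphabetically."""
    return sorted(sid for sid in SYNC_INTERVALS if is_due(sid, last_runs.get(sid), now))


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    last_runs = {
        "lmarena-text-to-image": now - timedelta(minutes=45),
        "lmarena-image-edit": now - timedelta(minutes=10),
        "swe-bench-verified": now - timedelta(minutes=59),
        # "ocrbench-v2-en" has no recorded run yet, so it is due.
    }
    print(due_sources(last_runs, now))  # ['lmarena-text-to-image', 'ocrbench-v2-en']
```

Timestamps are timezone-aware UTC, so interval comparisons are not skewed by local clock changes.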
Open and closed models mixed

API sync · High confidence
LMArena Image Edit
Complements the text-to-image board with image-editing models such as GPT Image, Gemini image models, and Seedream.
- Metric: Arena rating, higher is better
- Cadence: Sync every 30 minutes.
Open and closed models mixed

API sync · High confidence
LMArena Text-to-Video
Good primary source for text-to-video, with models from ByteDance, Kuaishou, xAI, Google, and other providers.
- Metric: Arena rating, higher is better
- Cadence: Sync every 30 minutes.
Open and closed models mixed

API sync · High confidence
LMArena Image-to-Video
Secondary video source for image-to-video workflows.
- Metric: Arena rating, higher is better
- Cadence: Sync every 30 minutes.
Open and closed models mixed

API sync · High confidence
SWE-bench Verified
Best structured source for agentic coding on real GitHub issues.
- Metric: Resolved percentage, higher is better
- Cadence: Sync hourly; the source is a Hugging Face benchmark leaderboard API.
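Resolved percentage is the number of benchmark instances whose issue a model's patch resolves, divided by the number of instances, times 100. The sketch below only illustrates that arithmetic and the higher-is-better ordering. It assumes hypothetical per-instance results (True = resolved) and hypothetical model names. In practice, the synced value would normally be read directly from the leaderboard.

```python
from typing import Dict, List, Tuple


def resolved_percentage(results: List[bool]) -> float:
    """Share of instances marked resolved (True), as a percentage of all instances."""
    if not results:
        raise ValueError("no instances to score")
    return 100.0 * sum(results) / len(results)


def rank_by_resolved(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order models by resolved percentage, highest first (higher is better)."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical models and per-instance outcomes, for illustration only.
    scores = {
        "model-a": resolved_percentage([True, False, True, True]),
        "model-b": resolved_percentage([True, False, False, False]),
    }
    print(rank_by_resolved(scores))  # [('model-a', 75.0), ('model-b', 25.0)]
```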
Open and closed models mixed

API sync · High confidence
LMArena WebDev
Complements SWE-bench with web-development preference rankings.
- Metric: Arena rating, higher is better
- Cadence: Sync every 30 minutes.
Open and closed models mixed

API sync · High confidence
Open ASR Leaderboard
Strong source for speech recognition. Audio generation leaderboards still need a secondary curated source.
- Metric: Average word error rate (WER), lower is better
- Cadence: Sync daily or hourly; the benchmark API is structured and reproducible.
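Word error rate counts the substitutions (S), deletions (D), and insertions (I) needed to turn a hypothesis transcript into the reference, divided by the number of reference words N: WER = (S + D + I) / N. This is a minimal sketch using word-level edit distance. It assumes whitespace tokenization with lowercasing only; real ASR evaluation usually applies a fuller text normalizer. The `average_wer` helper assumes an unweighted mean over per-dataset WERs, which may not match the leaderboard's exact aggregation.

```python
from typing import Dict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    # Minimal normalization only: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the word-level edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def average_wer(per_dataset_wer: Dict[str, float]) -> float:
    """Unweighted mean of per-dataset WERs (an assumed aggregation)."""
    if not per_dataset_wer:
        raise ValueError("no datasets to average")
    return sum(per_dataset_wer.values()) / len(per_dataset_wer)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(round(wer, 4))  # 0.1667 (one deletion over six reference words)
```

Because lower WER is better, ranking ASR models means sorting in ascending order, the opposite of the arena and resolved-percentage boards.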
Open and closed models mixed

CSV sync · Medium confidence
OCRBench v2 English
Useful for OCR and text-centric visual understanding; treat as medium-confidence until mirrored into D1 snapshots.
- Metric: Average score, higher is better
- Cadence: Sync daily from the raw CSV; the upstream update cadence is less formal than that of the HF benchmark API.
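The following sketch shows the daily CSV path: parse the raw file, drop rows without a numeric score, and write a dated snapshot. It assumes a hypothetical two-column layout (`model`, `average`) that may not match the real OCRBench v2 CSV. It uses Python's `sqlite3` with an in-memory database as a local stand-in for D1, which is SQLite-based. The table name `snapshots` and the helpers `parse_scores` and `write_snapshot` are hypothetical.

```python
import csv
import io
import sqlite3
from datetime import date
from typing import List, Tuple

# Hypothetical CSV layout; the real OCRBench v2 file may use different column names.
SAMPLE_CSV = """model,average
model-a,61.2
model-b,n/a
model-c,55.0
"""


def parse_scores(raw_csv: str) -> List[Tuple[str, float]]:
    """Read (model, average score) pairs, skipping rows whose score is not numeric."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        try:
            rows.append((record["model"].strip(), float(record["average"])))
        except (TypeError, ValueError):
            continue  # e.g. "n/a", a blank cell, or a short row
    return rows


def write_snapshot(conn: sqlite3.Connection, source: str, snapshot_day: date,
                   rows: List[Tuple[str, float]]) -> None:
    """Store rows keyed by (source, day, model); a same-day re-run overwrites matching rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS snapshots (
               source TEXT, day TEXT, model TEXT, score REAL,
               PRIMARY KEY (source, day, model))"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO snapshots VALUES (?, ?, ?, ?)",
        [(source, snapshot_day.isoformat(), model, score) for model, score in rows],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    write_snapshot(conn, "ocrbench-v2-en", date(2024, 1, 1), parse_scores(SAMPLE_CSV))
    print(conn.execute("SELECT model, score FROM snapshots ORDER BY score DESC").fetchall())
    # [('model-a', 61.2), ('model-c', 55.0)]
```

Keying rows by (source, day, model) means a re-run on the same day overwrites that day's rows instead of duplicating them. However, a model that disappears from the CSV keeps any row it already had for that day.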
Open and closed models mixed

CSV sync · Medium confidence
OCRBench v2 Chinese
Chinese OCR and document-understanding companion table.
- Metric: Average score, higher is better
- Cadence: Sync daily from the raw CSV.
Open and closed models mixed

API sync · High confidence
LMArena Document
Good live source for document-style multimodal tasks.
- Metric: Arena rating, higher is better
- Cadence: Sync every 30 minutes.
Open and closed models mixed

API sync · High confidence
LMArena Vision
Broad visual reasoning and multimodal model comparison.
- Metric: Arena rating, higher is better
- Cadence: Sync every 30 minutes.
Open and closed models mixed

HTML watch · Watch source
Artificial Analysis Video
High-quality public leaderboard, but parsing HTML is more fragile than using benchmark APIs.
- Metric: Video Arena Elo rating, higher is better
- Cadence: Use as a cross-check unless a stable public API becomes available.
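Because HTML parsing is fragile, one low-risk way to treat a watch source is to detect that the page's visible content has changed and flag it for manual review, rather than extract scores automatically. This sketch assumes the HTML has already been fetched (fetching is omitted). The helpers `page_fingerprint` and `has_changed` are hypothetical and built on the standard-library `html.parser` and `hashlib` modules.

```python
import hashlib
from html.parser import HTMLParser
from typing import List, Optional


class _VisibleTextCollector(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def page_fingerprint(html: str) -> str:
    """Hypothetical helper: SHA-256 of the visible text with whitespace collapsed."""
    collector = _VisibleTextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join(" ".join(collector.chunks).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(html: str, previous_fingerprint: Optional[str]) -> bool:
    """Hypothetical helper: True if nothing is stored yet or the visible text differs."""
    return page_fingerprint(html) != previous_fingerprint


if __name__ == "__main__":
    old = "<table><tr><td>Model X</td><td>1200</td></tr></table><script>v=1</script>"
    new = "<table><tr><td>Model X</td><td>1210</td></tr></table><script>v=2</script>"
    print(has_changed(new, page_fingerprint(old)))  # True: a visible score changed
```

Skipping `<script>` and `<style>` content means script-only churn does not trigger a review. Any change to visible text still does, including unrelated page copy.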
Open and closed models mixed

HTML watch · Watch source
Aider Polyglot
Good benchmark for code editing across C++, Go, Java, JavaScript, Python, and Rust, but no stable public JSON endpoint has been found.
- Metric: Pass rate, higher is better
- Cadence: Manual or HTML-parse fallback; useful as a code-editing cross-check.
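One way to use a secondary board as a cross-check is to measure whether it orders the models it shares with a primary source, such as SWE-bench Verified, in the same way. The sketch uses Spearman's rank correlation, rho = 1 - 6Σd² / (n(n² - 1)). This technique is chosen here for illustration; the source does not prescribe it. The helper and model names are hypothetical, model names are assumed to match exactly across sources, and tied scores are not handled.

```python
from typing import Dict, List


def ranks(scores: Dict[str, float], models: List[str]) -> Dict[str, int]:
    """Rank the given models 1..n by score, highest first; assumes no tied scores."""
    ordered = sorted(models, key=lambda m: scores[m], reverse=True)
    return {model: position for position, model in enumerate(ordered, start=1)}


def spearman_agreement(primary: Dict[str, float], cross_check: Dict[str, float]) -> float:
    """Spearman's rho over models present in both sources (no-ties formula)."""
    shared = sorted(primary.keys() & cross_check.keys())
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two models shared by both sources")
    primary_ranks = ranks(primary, shared)
    check_ranks = ranks(cross_check, shared)
    d_squared = sum((primary_ranks[m] - check_ranks[m]) ** 2 for m in shared)
    return 1 - 6 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    # Hypothetical scores: resolved % from a primary source, pass rate from the cross-check.
    primary = {"model-a": 70.0, "model-b": 60.0, "model-c": 50.0}
    cross_check = {"model-a": 80.0, "model-b": 40.0, "model-c": 55.0, "model-d": 90.0}
    print(spearman_agreement(primary, cross_check))  # 0.5
```

A value near 1 means the two boards broadly agree on ordering. A low or negative value flags models worth checking by hand.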
Open and closed models mixed