Tiny-ML Leaderboard

Sub-100M parameter language models, same eval harness, transparent methodology.

Why this exists. The community deserves a single place to compare tiny LMs fairly. We include every model with verifiable benchmarks — ours, our competitors', yours. Submit a model via PR.

Detailed Results

# | Model | Org | Params | WikiText-2 ↓ | BLiMP ↑ | ARC-Easy ↑ | Training Tokens | Released | Links

Model Release Timeline

Chronological order

Benchmark Overview

BLiMP ↑

Higher is better

ARC-Easy ↑

Higher is better

WikiText-2 Perplexity ↓

Lower is better · bubble size = perplexity (smaller bubble = better)

Model Efficiency

Parameters vs Avg Score

High efficiency zone (≥1σ above trend) highlighted

Legend: average trend · high-efficiency threshold · outperforming zone
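One way the "≥1σ above trend" rule could be computed is sketched below. This is an illustration under stated assumptions, not the leaderboard's actual method: it assumes the trend is a least-squares line of average score against log10(parameters), and that σ is the standard deviation of the residuals around that line. The function name `high_efficiency_flags` and the sample numbers are hypothetical.

```python
import math
from statistics import mean, pstdev


def high_efficiency_flags(params, avg_scores, n_sigma=1.0):
    """Flag models at least n_sigma residual std devs above the trend.

    Assumption: the trend is an ordinary least-squares line of
    avg_score against log10(params). The page does not say how the
    trend or the average score is defined.
    """
    xs = [math.log10(p) for p in params]
    x_bar, y_bar = mean(xs), mean(avg_scores)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct parameter counts")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, avg_scores))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual = how far each model sits above (+) or below (-) the trend.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, avg_scores)]
    sigma = pstdev(residuals)
    # With a perfect fit (sigma == 0) no model is flagged.
    return [sigma > 0 and r >= n_sigma * sigma for r in residuals]


# Illustrative numbers only, not leaderboard data.
flags = high_efficiency_flags([1e6, 1e7, 1e8, 1e7], [40, 50, 60, 70])
print(flags)  # [False, False, False, True]
```

Fitting against log10(parameters) rather than raw parameter counts is a design choice made here because model sizes span orders of magnitude. A different trend model would move the threshold.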

Models Released by Org

Add your model

Open a PR with your model's benchmark results and reproduction steps. We require: parameter count, training data provenance, the eval harness used, and scores for at least two of the three benchmarks (WikiText-2, BLiMP, ARC-Easy).
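The requirements above can be checked before opening a PR. The sketch below is a minimal pre-submission check, assuming a submission is represented as a Python dict. The field names (`params`, `training_data_provenance`, `eval_harness`, `scores`) and benchmark keys are hypothetical, since the page does not define a submission schema.

```python
# Hypothetical field and benchmark names; the leaderboard's real schema may differ.
REQUIRED_FIELDS = ("params", "training_data_provenance", "eval_harness")
BENCHMARKS = ("wikitext2_ppl", "blimp", "arc_easy")
MIN_BENCHMARKS = 2          # "at least two of the three benchmarks"
MAX_PARAMS = 100_000_000    # the leaderboard covers sub-100M models


def validate_submission(entry):
    """Return a list of problems; an empty list means the entry passes."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if entry.get(name) in (None, "")
    ]
    params = entry.get("params")
    if isinstance(params, int) and params >= MAX_PARAMS:
        problems.append("params must be below 100M")
    scores = entry.get("scores") or {}
    reported = [b for b in BENCHMARKS if isinstance(scores.get(b), (int, float))]
    if len(reported) < MIN_BENCHMARKS:
        problems.append(
            f"need scores for at least {MIN_BENCHMARKS} of {len(BENCHMARKS)} "
            f"benchmarks, got {len(reported)}"
        )
    return problems


# Placeholder values for illustration, not a real model or real scores.
example = {
    "params": 50_000_000,
    "training_data_provenance": "describe datasets and licenses here",
    "eval_harness": "name and version of the harness used",
    "scores": {"blimp": 70.0, "arc_easy": 40.0},
}
print(validate_submission(example))  # []
```

Returning a list of problems rather than raising on the first failure lets a submitter fix everything in one pass.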
