Tiny-ML Leaderboard

Sub-150M parameter language models, same eval harness, transparent methodology.

Unknown org? The "Unknown" tag is for model makers submitting their model ahead of official release. If that's you — DM glintresearch on Discord.

Detailed Results

How Efficiency (⚡) is calculated: overall benchmark score (avg of BLiMP, ARC-Easy, and normalized WikiText-2) × a size bonus for smaller models. The size bonus is capped on a log scale — the smallest model on the board tops out at 1.5×, the largest gets 1.0× (no bonus) — so a tiny model can't out-rank a much larger, better-performing one purely from an inflated size multiplier.
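The calculation described above can be sketched in a few lines. This is a minimal illustration, not the leaderboard's actual code: the description does not give the exact interpolation, so the sketch assumes the bonus falls linearly in log(params) from 1.5× at the smallest model to 1.0× at the largest. The function names, the `max_bonus` parameter, and the assumption that the WikiText-2 score arrives already normalized to the same scale as the other two benchmarks are all hypothetical.

```python
import math

def size_bonus(params, min_params, max_params, max_bonus=1.5):
    """Size multiplier: max_bonus at min_params, 1.0 at max_params.

    ASSUMPTION: linear interpolation in log(params); the page only says
    the bonus is capped "on a log scale".
    """
    if min_params == max_params:
        return 1.0  # a single-size board has no range to interpolate over
    # Position on the log scale: 0.0 = largest model, 1.0 = smallest.
    t = (math.log(max_params) - math.log(params)) / (
        math.log(max_params) - math.log(min_params)
    )
    t = min(1.0, max(0.0, t))  # clamp so the bonus stays within [1.0, max_bonus]
    return 1.0 + (max_bonus - 1.0) * t

def efficiency(blimp, arc_easy, wikitext_norm, params, min_params, max_params):
    """Overall score (plain average of the three benchmarks) times the size bonus.

    ASSUMPTION: wikitext_norm is already normalized; the normalization
    itself is not specified on the page.
    """
    overall = (blimp + arc_easy + wikitext_norm) / 3.0
    return overall * size_bonus(params, min_params, max_params)
```

Under these assumptions, on a board spanning 10M to 150M parameters, the 10M model gets the full 1.5× multiplier, the 150M model gets 1.0×, and a model at the geometric midpoint (about 38.7M) gets roughly 1.25×.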

#	Model	Org	Params	Eff. ⚡	WikiText-2 byte_ppl ↓	BLiMP ↑	ARC-Easy ↑	Training Tokens	Released	Links

Model Release Timeline

Most recent first

Benchmark Overview

BLiMP ↑

Higher is better

ARC-Easy ↑

Higher is better

WikiText-2 Byte-Level Perplexity ↓

Lower is better · byte_ppl = exp(−loglik / total_bytes) · bubble size = perplexity (smaller bubble = better)
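The byte_ppl formula above translates directly into code. The sketch below assumes `total_loglik` is the summed natural-log likelihood of the whole evaluation text (in nats) and that `total_bytes` is the UTF-8 byte length of that text. The function name is hypothetical, not taken from the eval harness.

```python
import math

def byte_perplexity(total_loglik, total_bytes):
    """byte_ppl = exp(-loglik / total_bytes).

    ASSUMPTIONS: total_loglik is a sum of natural-log probabilities over
    the full text; total_bytes counts UTF-8 bytes, not characters or tokens.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return math.exp(-total_loglik / total_bytes)

# Sanity check: a model that assigns uniform probability 1/256 to every
# byte should land at a byte-level perplexity of 256.
text = "hello world"
n_bytes = len(text.encode("utf-8"))
uniform_loglik = n_bytes * math.log(1.0 / 256.0)
uniform_ppl = byte_perplexity(uniform_loglik, n_bytes)
```

Normalizing by bytes rather than tokens keeps the metric comparable across models with different tokenizers, since the denominator depends only on the text.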

Model Efficiency

Parameters vs Leaderboard Score

Scatter of each model's overall score vs its parameter count. Points above the dashed threshold line are ≥1σ above the trend. Top 3 marked.
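One way to reproduce a trend line and a 1σ threshold like the ones described is sketched below. The page does not say how the trend is fitted, so this assumes an ordinary least-squares line of score against log10(params), with σ taken as the standard deviation of the residuals around that line. All names are hypothetical.

```python
import math

def fit_trend(params, scores):
    """Least-squares line of score vs log10(params), plus residual std dev.

    ASSUMPTION: the chart's actual fitting method is not documented.
    """
    xs = [math.log10(p) for p in params]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct parameter counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, scores)]
    sigma = math.sqrt(sum(r * r for r in residuals) / n)
    return slope, intercept, sigma

def above_threshold(params, scores):
    """True for each model whose score is at least 1 sigma above the trend."""
    slope, intercept, sigma = fit_trend(params, scores)
    return [
        y >= slope * math.log10(p) + intercept + sigma
        for p, y in zip(params, scores)
    ]

# Made-up data: three models on the trend and one 10M model well above it.
example_params = [1e6, 1e7, 1e8, 1e7]
example_scores = [0.40, 0.50, 0.60, 0.70]
flags = above_threshold(example_params, example_scores)
```

With this made-up data only the last model clears the threshold: the fitted line predicts 0.55 at 10M parameters, σ is about 0.087, and that model scores 0.70.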

How Efficiency (⚡) is calculated: Efficiency score = overall benchmark score (avg of BLiMP, ARC-Easy, and normalized WikiText-2) × a size bonus. The size bonus rewards smaller models but is capped — using a log scale across the full parameter range on this board, the smallest model can score at most 1.5× its raw benchmark score, and the largest gets no bonus (1.0×). This keeps small models competitive without letting a tiny parameter count alone produce an unrealistic (e.g. 100×) score gap over a larger, higher-performing model.

Legend: Avg trend · High-efficiency threshold · Outperforming zone