COMPASS

Reading-level calibration.

15/15 within-1-tier

Research papers

Calibrated tier assignment

What It Is

COMPASS tests reading-level complexity calibration on research papers. The system must classify each text's complexity within one tier of its ground-truth label. Research papers are the hardest category because they combine technical vocabulary, discipline-specific knowledge, and high inferential demand.

The 15/15 within-1-tier result means every research paper in the test set was assigned a complexity tier within one level of its ground-truth classification. Surface metrics such as Flesch-Kincaid and Gunning Fog routinely misclassify research papers, because they score only surface features such as sentence length and syllable counts.
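To make the metric concrete, here is a minimal sketch of how within-1-tier accuracy could be computed. It assumes tiers are encoded as consecutive integers, which this page does not state. The function name `within_k_tier_accuracy` and the sample tiers are hypothetical and are not taken from the COMPASS repo or corpus.

```python
from typing import Sequence


def within_k_tier_accuracy(
    predicted: Sequence[int], truth: Sequence[int], k: int = 1
) -> float:
    """Fraction of documents whose predicted tier is at most k tiers from its label.

    Assumption: tiers are consecutive integers, so |predicted - truth| is the
    tier distance. k=1 gives within-1-tier accuracy; k=0 gives exact match.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        raise ValueError("at least one document is required")
    hits = sum(1 for p, t in zip(predicted, truth) if abs(p - t) <= k)
    return hits / len(truth)


# Hypothetical labels for illustration only, not the sealed corpus.
truth = [3, 4, 5, 5, 2]
predicted = [3, 5, 4, 5, 3]

print(within_k_tier_accuracy(predicted, truth, k=1))  # 1.0
print(within_k_tier_accuracy(predicted, truth, k=0))  # 0.4
```

In this toy data every prediction is within one tier, but only two of five are exact. That is why a perfect within-1-tier score and a lower exact-match score can coexist.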

Methodology

Corpus: Sealed research paper corpus

Baselines: Flesch-Kincaid, Gunning Fog, Coleman-Liau

Pipeline: Multi-dimensional complexity scoring

Metric: Within-1-tier accuracy

Dimensions: Vocabulary, domain-specificity, argument structure, inferential load

Reproducibility: Full instructions in GitHub repo
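For reference, the three baselines listed above are closed-form formulas over simple text counts. The sketch below uses their standard published coefficients and takes the counts as inputs. How the benchmark tokenizes text, counts syllables, or maps a baseline score to a tier is not described here, so those steps are left out. The function names are illustrative.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level from word, sentence, and syllable counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog index; complex_words are words of three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def coleman_liau(letters: int, words: int, sentences: int) -> float:
    """Coleman-Liau index from letter, word, and sentence counts."""
    letters_per_100_words = letters / words * 100
    sentences_per_100_words = sentences / words * 100
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8


# Hypothetical counts for a 100-word passage with 5 sentences.
print(round(flesch_kincaid_grade(words=100, sentences=5, syllables=150), 2))  # 9.91
print(round(gunning_fog(words=100, sentences=5, complex_words=10), 2))        # 12.0
print(round(coleman_liau(letters=500, words=100, sentences=5), 2))            # 12.12
```

None of these inputs capture domain-specificity, argument structure, or inferential load, which are the dimensions the pipeline is described as scoring.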

Reproducibility

Corpus: 15 research papers (sealed)
Metric: Within-1-tier classification
Repo: github.com/jourdanlabs/benchmarks/compass

Limitations

Within-1-tier, not exact. The metric counts within-1-tier matches, not exact matches. Exact-match accuracy is lower and is documented in the repo.

English-only. The corpus and pipeline are English-language only. Multilingual calibration is out of scope.

Domain coverage. The tier system is designed for the document types in the benchmark corpus. Novel document types may produce degraded calibration.