COMPASS
Calibrated reading-level assessment for technical documents.
What it is
COMPASS calibrates reading-level assessments for documents across a complexity spectrum. Existing tools (Flesch-Kincaid, Gunning Fog, Coleman-Liau) score surface features such as sentence length, syllable count, and word length. The resulting numbers are easy to game and are poor predictors of actual comprehension difficulty for technical documents.
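To illustrate why surface scores are easy to game, here is a minimal sketch of the published Flesch-Kincaid grade-level formula, 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59. The tokenization and the vowel-run syllable counter are crude assumptions made for this example, and the function names are hypothetical; none of this is part of COMPASS.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption): count runs of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level computed from naive sentence and word splits."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Same technical content, once as a single sentence and once chopped into fragments.
original = "The model converges because the loss is convex and the step size is small."
chopped = "The model converges. The loss is convex. The step size is small."

print(f"original: {flesch_kincaid_grade(original):.2f}")
print(f"chopped:  {flesch_kincaid_grade(chopped):.2f}")
```

Chopping the sentence lowers the grade level even though the reader still has to understand convergence, convexity, and step size. The score tracks punctuation and word shape, not the inferential work the text demands.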
COSMIC's COMPASS pipeline scores documents along multiple dimensions (vocabulary complexity, domain specificity, argument structure, inferential load) and produces a calibrated tier assignment rather than a raw score. The key performance indicator is the within-1-tier metric: a placement counts as within-1-tier when the assigned tier is the correct tier or one adjacent to it, which is acceptable for most real-world applications.
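The within-1-tier metric can be sketched as follows. This assumes tiers are encoded as ordinal integers and that an exact match also counts as within one tier. The function name `tier_accuracy` and the sample tiers are hypothetical illustrations, not COMPASS code or COMPASS results.

```python
from typing import Sequence


def tier_accuracy(predicted: Sequence[int], actual: Sequence[int], tolerance: int = 0) -> float:
    """Fraction of documents whose predicted tier is within `tolerance` tiers of the actual tier."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one document is required")
    hits = sum(abs(p - a) <= tolerance for p, a in zip(predicted, actual))
    return hits / len(actual)


# Made-up tiers for five documents (illustration only).
actual = [3, 4, 4, 5, 2]
predicted = [3, 5, 4, 4, 4]

exact = tier_accuracy(predicted, actual, tolerance=0)
within_one = tier_accuracy(predicted, actual, tolerance=1)
print(f"exact match: {exact:.0%}, within-1-tier: {within_one:.0%}")
```

Because every exact match is also a within-1-tier match, the within-1-tier figure can never be lower than exact-match accuracy on the same set of documents.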
The 15/15 result on the research-paper category is the headline number because research papers are the hardest category: they combine technical vocabulary, discipline-specific knowledge, and high inferential demand. Surface metrics routinely misclassify research papers as easier than they actually are. COMPASS places all 15 within one tier of the correct tier.
Limitations
Within-1-tier, not exact. The 15/15 figure counts placements within one tier of the correct tier, not only exact matches. Exact-match accuracy on the research-paper category is lower and is documented in the repo.
English-only. The corpus and pipeline are English-language only. Multilingual reading-level calibration is out of scope.
Domain coverage. The tier system was designed for the document types in the benchmark corpus. Novel document types not represented in the sealed corpus may produce degraded calibration.