Open research and validation infrastructure.
CRUCIBLE is the JourdanLabs research division. It runs the benchmark program, maintains the COSMIC methodology playbook, and operates VANTAGE — the diagnostic suite that validates the other four divisions.
Multi-engine capability diagnostic with per-division scan receipts. Products validating products.
Sealed corpus · BCa CI · Honest refusal rates
Six sealed benchmarks spanning pharmacovigilance, corporate hierarchy, SOC triage, factual verification, semantic search, and reading calibration.
Open corpora · Honest baselines · Reproducible
The JourdanLabs playbook: sealed corpora, honest baselines, per-fix attribution, runtime determinism, and LUNA audit trails.
Non-negotiable · Applies to all benchmarks
Step-by-step instructions for re-running any benchmark. Corpus SHA verification. Scoring harness documentation.
No special access · Engine API for full results
RAVEN is a next-generation research initiative. Details to follow.
—
CRUCIBLE benchmarks are public because the claim that COSMIC outperforms baselines in regulated domains is only credible if it can be verified. Engine implementations are proprietary — the corpus, scoring harnesses, and baseline code are not. Anyone can run the baselines. Anyone can verify the corpus SHA. Anyone can point their own pipeline at our scoring harness and compare.
This is the same model that SuperGLUE, HELM, and other credible benchmark programs follow.
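As an illustration of the corpus SHA check described above, here is a minimal sketch in Python. It assumes the published digest is SHA-256 and that the corpus ships as a single file. Neither detail is stated here, and the function names `sha256_of_file` and `corpus_matches` are hypothetical, not part of any CRUCIBLE tooling.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks.

    Assumption: the corpus digest is SHA-256. Swap in another
    hashlib constructor if the published digest uses a different one.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large corpora never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def corpus_matches(path, published_hex):
    """Compare a local file's digest against a published hex digest.

    Hypothetical helper: normalizes whitespace and case before comparing.
    """
    actual = sha256_of_file(path)
    expected = published_hex.strip().lower()
    return hmac.compare_digest(actual, expected)
```

A caller would pass the local corpus path and the digest published alongside the benchmark. A `False` result means the local copy differs from the sealed corpus, so any scores computed on it are not comparable.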