CRUCIBLE

Open research and validation infrastructure.

CRUCIBLE is the JourdanLabs research division. It runs the benchmark program, maintains the COSMIC methodology playbook, and operates VANTAGE — the diagnostic suite that validates the other four divisions.

What's in CRUCIBLE

VANTAGE

Flagship

Multi-engine capability diagnostic with per-division scan receipts. Products validating products.

Sealed corpus · BCa bootstrap confidence intervals · Honest refusal rates

Benchmark Program

6 benchmarks

Six sealed benchmarks spanning pharmacovigilance, corporate hierarchy, SOC triage, factual verification, semantic search, and reading calibration.

Open corpora · Honest baselines · Reproducible

Methodology

7 principles

The JourdanLabs playbook of seven principles, including sealed corpora, honest baselines, per-fix attribution, determinism at runtime, and LUNA audit trails.

Non-negotiable · Applies to all benchmarks

Reproducibility

Public

Step-by-step instructions for re-running any benchmark, with corpus SHA verification and scoring harness documentation.

No special access · Engine API for full results

RAVEN

Coming

RAVEN is a next-generation research initiative. Details to follow.

Why open research?

CRUCIBLE benchmarks are public because the claim that COSMIC outperforms baselines in regulated domains is only credible if it can be verified. Engine implementations are proprietary — the corpus, scoring harnesses, and baseline code are not. Anyone can run the baselines. Anyone can verify the corpus SHA. Anyone can point their own pipeline at our scoring harness and compare.

This is how SuperGLUE, HELM, and other credible benchmark programs operate. We follow the same model.
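
The corpus SHA check described above can be sketched in a few lines. This is a minimal illustration, not the CRUCIBLE harness itself: it assumes the published digest is SHA-256 (the page only says "SHA"), and the function names sha256_of_file and verify_corpus are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large corpora never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_corpus(path: Path, expected_sha256: str) -> bool:
    """Compare a local corpus file against a published digest.

    Assumption: the digest is a SHA-256 hex string. Surrounding
    whitespace and letter case in the expected value are ignored.
    """
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

To use it, download a corpus file, copy the digest published alongside it, and call verify_corpus(Path("corpus.jsonl"), "<published digest>"); the file name here is a placeholder. A False result means the local copy differs from the sealed corpus, so any scores computed on it are not comparable.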

github.com/jourdanlabs/benchmarks →

Read the methodology →