JOURDANLABS
CRUCIBLE / Benchmark Program

Six benchmarks. Publicly reproducible.

Every result ships with a sealed corpus (SHA-verified), honest baselines against real tools, per-fix attribution, self-assessed limitations, and step-by-step reproduction instructions. Engine implementations are proprietary. Results are not.
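As an illustration of what checking a SHA-verified corpus can look like on the reader's side, here is a minimal sketch. It assumes SHA-256 and a single corpus file with a published hex digest; the page does not say which SHA variant or manifest layout is used, and the function names `sha256_of_file` and `verify_corpus` are hypothetical, not part of any CRUCIBLE tooling.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large corpora never load fully into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_corpus(path: Path, expected_hex: str) -> bool:
    """True if the file's digest equals the published digest.

    Assumption: the published digest is a hex string; surrounding
    whitespace and letter case are ignored.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A reader would run `verify_corpus(Path("corpus.tar"), published_digest)` before reproducing a result, where both the file name and the digest come from the benchmark's own reproduction instructions.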

At a glance
SIGNAL
F1 0.639
24.3mo median detection window · pharmacovigilance
CITADEL
F1 0.616
400/660 coverage · corporate hierarchy
SENTINEL
94.0%
held-out accuracy · SOC triage
ORACLE
51%
vs 31% / 25% baselines · factual verification
LENS
25×
vs grep · intent-based search
COMPASS
15/15
within-1-tier · reading-level calibration
All benchmarks
SIGNAL
Pharmacovigilance
Adverse drug event extraction from clinical narratives
24.3mo median detection window
F1 0.639
CITADEL
Financial compliance
Corporate subsidiary hierarchy reconstruction from SEC Exhibit 21
400-entity corpus · Checkpoint E.2
F1 0.616
SENTINEL
Security operations
SOC alert triage with HEIMDALL confidence gate
held-out accuracy
94.0%
ORACLE
Factual verification
Claim verification with honest refusal against sealed KB
vs 31% / 25% always-confident baselines
51%
LENS
Semantic search
Intent-based retrieval for dense technical corpora
vs grep on intent queries
25×
COMPASS
Document calibration
Reading-level tier assignment for technical documents
within-1-tier · research-paper category
15/15