CRUCIBLE / VANTAGE · FLAGSHIP

Products validating products.

VANTAGE is the COSMIC diagnostic suite and now the CLARION-powered code auditor. VANTAGE X was already strong. VANTAGE 2.0 rebuilt the auditor around reject-first verification and cleared the internal fixture suite with 100% expected recall.

Request a VANTAGE scan →
VANTAGE 2.0 / CLARION receipts

The auditor beat the auditor.

The first VANTAGE line established a hard internal bar for deterministic code review. VANTAGE 2.0 did not chase a bigger confidence score. It changed the question: what claims about this codebase cannot survive contradiction, missing proof, stale evidence, unsafe execution patterns, or brittle architecture?

That CLARION posture produced a cleaner benchmark profile: every expected finding recovered, no forbidden-hit classes, no severity mismatches, and audit-hashable output that can be rerun rather than trusted.

Benchmark cases: 9 / 9 (all VANTAGE 2.0 fixture suites cleared)
Expected recall: 100% (15 of 15 expected findings recovered)
Forbidden hits: 0 (no banned false-positive classes triggered)
Severity drift: 0 (no expected severity mismatches)
Fixture | Pressure | Outcome
clean-node | false-alarm discipline | murked
package-hygiene | missing build/test/lint/readme/license signals | murked
runtime-danger | child process, env, destructive filesystem, dynamic execution | murked
architecture-shape | circular dependency and long-function detection | murked
duplicate-family | same project across variant names | murked

Internal deterministic code-audit benchmark. This is a product validation receipt, not a public held-out benchmark claim.
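
A minimal sketch of how figures like expected recall, forbidden hits, and severity drift could be tallied from fixture expectations. The `Finding` shape, the rule identifiers, and both helper functions are hypothetical illustrations, not the VANTAGE implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str       # hypothetical rule identifier
    severity: str   # e.g. "high", "medium", "low"


def score_fixture(expected, forbidden_rules, actual):
    """Compare auditor output against one fixture's expectations."""
    actual_by_rule = {f.rule: f for f in actual}
    # An expected finding counts as recovered if its rule fired at all.
    recovered = [e for e in expected if e.rule in actual_by_rule]
    # Severity drift: the rule fired, but at a different severity than expected.
    drift = [e for e in recovered if actual_by_rule[e.rule].severity != e.severity]
    # Forbidden hits: findings from classes the fixture bans as false positives.
    forbidden = [f for f in actual if f.rule in forbidden_rules]
    return {
        "expected": len(expected),
        "recovered": len(recovered),
        "forbidden_hits": len(forbidden),
        "severity_drift": len(drift),
    }


def summarize(results):
    """Aggregate per-fixture tallies into suite-level figures."""
    total_expected = sum(r["expected"] for r in results)
    total_recovered = sum(r["recovered"] for r in results)
    return {
        "expected_recall": total_recovered / total_expected if total_expected else 1.0,
        "forbidden_hits": sum(r["forbidden_hits"] for r in results),
        "severity_drift": sum(r["severity_drift"] for r in results),
    }
```

Under this sketch, a suite clears only when recall is 1.0 and both other counters are zero, which mirrors the three figures reported above.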

Scan Receipts
Division | Task | Score | Refusal | Corpus
MineralLogic | Mineral title chain reconstruction | 88.5% | 11.0% | sealed · ATLAS
PropertyGraph | Ownership & encumbrance graph | 88.9% | 8.2% | sealed · ATLAS
NAUTILUS | Commodity flow modeling | 81.5% | 14.7% | sealed · ATLAS
Valhalla AI | Cross-domain reasoning (CRUCIBLE-adjacent) | 91.7% | 5.1% | sealed · CRUCIBLE
SENTINEL | SOC alert triage | 94.0% | 6.0% | sealed · HEIMDALL
ORACLE | Factual claim verification | 51.0% | 67.5% | cd5de198 · CRUCIBLE

BCa bootstrap confidence intervals (B=2,000) computed per scan. Full receipts available on request. BACCHUS and HELIX codebases will be added to VANTAGE scan rotation in the coming weeks.
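
For readers unfamiliar with BCa (bias-corrected and accelerated) intervals, here is a standard-library sketch with B=2,000 resamples. It assumes each task is scored as 1 (correct) or 0 (incorrect), which the page does not state; the function name, the seed, and the sample data are illustrative only.

```python
import random
from statistics import NormalDist, mean


def bca_interval(data, stat=mean, B=2000, alpha=0.05, seed=0):
    """BCa bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)          # fixed seed keeps the interval reproducible
    n = len(data)
    theta_hat = stat(data)
    boots = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(B)
    )
    nd = NormalDist()

    # Bias correction z0: share of bootstrap replicates below the point estimate,
    # clamped away from 0 and 1 so the inverse CDF stays finite.
    below = sum(1 for b in boots if b < theta_hat)
    p0 = min(max(below / B, 1 / (B + 1)), B / (B + 1))
    z0 = nd.inv_cdf(p0)

    # Acceleration a: skewness of the leave-one-out (jackknife) estimates.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jbar = mean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6.0 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = num / den if den else 0.0

    def adjusted(q):
        # Map a nominal quantile to its bias- and skew-adjusted quantile.
        z = nd.inv_cdf(q)
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    lo_idx = min(B - 1, max(0, int(adjusted(alpha / 2) * B)))
    hi_idx = min(B - 1, max(0, int(adjusted(1 - alpha / 2) * B)))
    return boots[lo_idx], boots[hi_idx]


# Illustrative data only: 100 made-up task outcomes, 80 of them correct.
outcomes = [1] * 80 + [0] * 20
low, high = bca_interval(outcomes)
```

Where SciPy is available, `scipy.stats.bootstrap` with `method="BCa"` covers the same ground.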

How VANTAGE works
01. Sealed corpus per domain

Each VANTAGE scan runs against a domain-specific corpus sealed before the scan begins. The corpus SHA is published in the receipt. You can verify the corpus was not modified between scans.
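
A sketch of what verifying a sealed corpus could look like. It assumes the corpus is a directory tree and that the published SHA is a SHA-256 digest over sorted relative paths and file bytes; the page specifies neither, and both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def corpus_digest(root):
    """SHA-256 over every file under root, in sorted relative-path order."""
    root = Path(root)
    files = sorted(
        (p.relative_to(root).as_posix(), p) for p in root.rglob("*") if p.is_file()
    )
    h = hashlib.sha256()
    for rel, path in files:
        name = rel.encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so file boundaries cannot be shifted
        # without changing the digest.
        h.update(len(name).to_bytes(8, "big"))
        h.update(name)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def verify_corpus(root, published_sha):
    """True if the corpus on disk still matches the SHA in the receipt."""
    return corpus_digest(root) == published_sha
```

Hashing paths as well as contents means a renamed or moved file changes the digest, not only an edited one.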

02. Full COSMIC pipeline execution

NOVA → ECLIPSE → PULSAR → AURORA → LUNA. Every stage runs deterministically. No LLM calls at runtime. The AURORA gate refuses tasks where aggregate confidence falls below the threshold; refusal rates are reported, not suppressed.
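
The gate behavior described above can be sketched as follows. The aggregation rule (taking the weakest stage confidence), the 0.7 threshold, and every name here are assumptions for illustration; the engine source is not published.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GateResult:
    task_id: str
    answer: Optional[str]   # None when the task is refused
    confidence: float
    refused: bool


def aurora_gate(task_id, answer, stage_confidences, threshold=0.7):
    """Pass the answer through only if aggregate confidence meets the threshold."""
    # Assumed aggregation: the weakest upstream stage bounds the aggregate.
    confidence = min(stage_confidences)
    if confidence < threshold:
        return GateResult(task_id, None, confidence, True)
    return GateResult(task_id, answer, confidence, False)


def refusal_rate(results):
    """Share of tasks refused; reported alongside the score rather than hidden."""
    return sum(r.refused for r in results) / len(results) if results else 0.0
```

Because refusals stay in the result list, the refusal rate and the score are computed from the same denominator and can be published together.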

03. Per-engine confidence attribution

VANTAGE produces a failure-class taxonomy alongside the score. Each failure class is attributed to a specific pipeline stage with a confidence estimate. The taxonomy informs the benchmark checkpoint arc: it is how CITADEL's E.1 and E.2 fixes were identified.

04. Sealed receipt with BCa CIs

The scan receipt includes: corpus SHA, per-task scores, refusal rate, BCa bootstrap confidence intervals (B=2,000), and a LUNA audit chain head. The receipt is tamper-evident. If you share it with a third party, they can verify it against the public corpus.
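
To illustrate why a chain head makes a receipt tamper-evident, here is a minimal hash-chain sketch. The SHA-256 choice, the canonical JSON encoding, the genesis value, and the `audit_chain_head` field name are all assumptions; the actual LUNA format is not documented here.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for an empty chain


def chain_append(prev_head, entry):
    """Fold one log entry into the chain and return the new head."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(prev_head.encode("ascii") + payload).hexdigest()


def chain_head(entries):
    """Recompute the head from the full, ordered list of entries."""
    head = GENESIS
    for entry in entries:
        head = chain_append(head, entry)
    return head


def verify_receipt(receipt, entries):
    """True if the log entries reproduce the head recorded in the receipt."""
    return chain_head(entries) == receipt["audit_chain_head"]
```

Each head depends on every earlier entry, so editing, dropping, or reordering any entry changes the final head and the check fails.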

Engines covered
NOVA: Initial evidence retrieval and claim decomposition
ECLIPSE: Adversarial challenge generation
PULSAR: Evidence aggregation and conflict resolution
LUNA: SHA-chained audit log, tamper-evident per-run
AURORA: Confidence gate; emits refusal below threshold
HEIMDALL: Tier gating; capability access control
DOLOS: Normalization and entity disambiguation

Engine implementations are proprietary. The VANTAGE diagnostic framework ships without engine source code. To run full diagnostics, request an evaluation API key.

What is and is not claimed

Is claimed: VANTAGE correctly identified CITADEL failure classes A, B, C with stated confidence levels (95%, 88%, 80%). The E.1 and E.2 fixes were implemented from VANTAGE's diagnosis and produced the documented F1 improvements.

Is not claimed: that VANTAGE scan scores are published benchmarks. They are capability demonstrations. Division scan scores (MineralLogic 88.5%, PropertyGraph 88.9%, NAUTILUS 81.5%) have no held-out ground truth separate from the scan corpus. The CRUCIBLE numbers (SENTINEL 94.0%, ORACLE 51.0%) are the publicly published sealed benchmark results.

View the sealed benchmark program →
Request an evaluation →