CRUCIBLE / VANTAGE · FLAGSHIP

Products validating products.

VANTAGE is the COSMIC diagnostic suite and now the CLARION-powered code auditor. VANTAGE X was already strong. VANTAGE 2.0 rebuilt the auditor around reject-first verification and cleared the internal fixture suite with 100% expected recall.

Request a VANTAGE scan →

VANTAGE 2.0 / CLARION receipts

The auditor beat the auditor.

The first VANTAGE line established a hard internal bar for deterministic code review. VANTAGE 2.0 did not chase a bigger confidence score. It changed the question: what claims about this codebase cannot survive contradiction, missing proof, stale evidence, unsafe execution patterns, or brittle architecture?

That CLARION posture produced a cleaner benchmark profile: every expected finding recovered, no forbidden-hit classes, no severity mismatches, and audit-hashable output that can be rerun rather than trusted.

VANTAGE 2.0 branch →

CLARION / OMNIS branch →

Benchmark cases

9 / 9

all VANTAGE 2.0 fixture suites cleared

Expected recall

100%

15 of 15 expected findings recovered

Forbidden hits

0

no banned false-positive classes triggered

Severity drift

0

no expected severity mismatches

Fixture | Pressure | Outcome
clean-node | false-alarm discipline | murked
package-hygiene | missing build/test/lint/readme/license signals | murked
runtime-danger | child process, env, destructive filesystem, dynamic execution | murked
architecture-shape | circular dependency and long-function detection | murked
duplicate-family | same project across variant names | murked

Internal deterministic code-audit benchmark. This is a product validation receipt, not a public held-out benchmark claim.
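As an illustration of how the headline numbers above (expected recall, forbidden hits, severity drift) could be derived from a fixture, here is a minimal Python sketch. The `Finding` shape, the rule identifiers, and `score_fixture` are hypothetical names chosen for the example; they are not taken from the VANTAGE implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str       # hypothetical rule identifier
    severity: str   # e.g. "high", "medium", "low"


def score_fixture(expected, reported, forbidden_rules):
    """Compare reported findings with a fixture's expectations.

    Returns (recovered, forbidden_hits, severity_drift):
      recovered       expected findings that were reported
      forbidden_hits  reported findings in a banned false-positive class
      severity_drift  recovered findings whose severity differs from expected
    """
    reported_by_rule = {f.rule: f for f in reported}
    recovered = sum(1 for f in expected if f.rule in reported_by_rule)
    forbidden_hits = sum(1 for f in reported if f.rule in forbidden_rules)
    severity_drift = sum(
        1
        for f in expected
        if f.rule in reported_by_rule
        and reported_by_rule[f.rule].severity != f.severity
    )
    return recovered, forbidden_hits, severity_drift


# Illustrative fixture data only; not real VANTAGE rules or results.
expected = [
    Finding("runtime.child-process", "high"),
    Finding("package.missing-license", "low"),
]
reported = [
    Finding("runtime.child-process", "high"),
    Finding("package.missing-license", "low"),
]
forbidden = {"style.nitpick"}

recovered, forbidden_hits, drift = score_fixture(expected, reported, forbidden)
recall = recovered / len(expected)
print(f"recall={recall:.0%} forbidden_hits={forbidden_hits} severity_drift={drift}")
```

Because the comparison is a pure function of the fixture and the auditor output, rerunning it on the same inputs reproduces the same numbers.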

Scan Receipts

Division | Task | Score | Refusal | Corpus
MineralLogic | Mineral title chain reconstruction | 88.5% | 11.0% | sealed · ATLAS
PropertyGraph | Ownership & encumbrance graph | 88.9% | 8.2% | sealed · ATLAS
NAUTILUS | Commodity flow modeling | 81.5% | 14.7% | sealed · ATLAS
Valhalla AI | Cross-domain reasoning (CRUCIBLE-adjacent) | 91.7% | 5.1% | sealed · CRUCIBLE
SENTINEL | SOC alert triage | 94.0% | 6.0% | sealed · HEIMDALL
ORACLE | Factual claim verification | 51.0% | 67.5% | cd5de198 · CRUCIBLE

BCa (bias-corrected and accelerated) bootstrap confidence intervals are computed per scan with B = 2,000 resamples. Full receipts are available on request. BACCHUS and HELIX codebases will be added to the VANTAGE scan rotation in the coming weeks.
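For readers unfamiliar with BCa intervals, the following is a minimal standard-library sketch of the textbook procedure (bias correction from the bootstrap distribution, acceleration from a jackknife). The per-task outcome data is invented for illustration, and the function name `bca_interval` is an assumption; this is not the VANTAGE code.

```python
import random
from statistics import NormalDist, mean

_N = NormalDist()


def bca_interval(data, stat=mean, n_boot=2000, alpha=0.05, seed=0):
    """Return a (low, high) BCa bootstrap interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = stat(data)

    # Bootstrap replicates: resample with replacement, recompute the statistic.
    boots = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))

    # Bias correction z0 from the share of replicates below the point estimate.
    below = sum(1 for b in boots if b < theta_hat)
    p = min(max(below / n_boot, 1 / n_boot), 1 - 1 / n_boot)  # keep inv_cdf finite
    z0 = _N.inv_cdf(p)

    # Acceleration a from jackknife (leave-one-out) estimates.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jack_mean = mean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den else 0.0

    def adjusted(q):
        # Map a nominal quantile to its bias- and skew-adjusted quantile.
        z = _N.inv_cdf(q)
        return _N.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    def pick(q):
        # Read the adjusted quantile off the sorted replicates.
        return boots[min(max(int(q * n_boot), 0), n_boot - 1)]

    return pick(adjusted(alpha / 2)), pick(adjusted(1 - alpha / 2))


# Hypothetical per-task outcomes: 94 passes and 6 failures out of 100 tasks.
outcomes = [1] * 94 + [0] * 6
low, high = bca_interval(outcomes)
print(f"score={mean(outcomes):.3f}  95% BCa CI=({low:.3f}, {high:.3f})")
```

The fixed seed makes the interval reproducible, which matters if the interval is meant to be rerun and checked rather than taken on trust.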

How VANTAGE works

Sealed corpus per domain

Each VANTAGE scan runs against a domain-specific corpus sealed before the scan begins. The corpus SHA is published in the receipt. You can verify the corpus was not modified between scans.
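A third party could check a published corpus SHA along the following lines. This is a sketch under stated assumptions: the page only says "SHA", so SHA-256 is assumed, and the hashing layout (sorted relative paths, each followed by the file's digest) and the function names are hypothetical rather than the actual sealing scheme.

```python
import hashlib
from pathlib import Path


def corpus_digest(entries):
    """Digest a corpus given as {relative_path: bytes}.

    Entries are hashed in sorted path order so the result does not depend
    on filesystem or dict ordering. SHA-256 is an assumption.
    """
    outer = hashlib.sha256()
    for name in sorted(entries):
        outer.update(name.encode("utf-8"))
        outer.update(b"\0")  # separator between path and content digest
        outer.update(hashlib.sha256(entries[name]).digest())
    return outer.hexdigest()


def load_corpus(root):
    """Read every file under root into a {relative_path: bytes} mapping."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.read_bytes()
        for p in root.rglob("*")
        if p.is_file()
    }


def verify_corpus(entries, published_sha):
    """True if the corpus still matches the SHA published in the receipt."""
    return corpus_digest(entries) == published_sha


# Illustrative in-memory corpus; real use would call load_corpus(path).
sealed = {"cases/001.json": b'{"claim": "example"}', "cases/002.json": b"{}"}
published = corpus_digest(sealed)

tampered = {**sealed, "cases/002.json": b'{"edited": true}'}
print(verify_corpus(sealed, published), verify_corpus(tampered, published))
```

Any edit, rename, addition, or removal of a corpus file changes the digest, so a match between two scans indicates the corpus was unchanged.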

Full COSMIC pipeline execution

NOVA → ECLIPSE → PULSAR → AURORA → LUNA. Every stage runs deterministically. No LLM calls at runtime. The AURORA gate refuses tasks where aggregate confidence falls below threshold — refusal rates are reported, not suppressed.
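The gating behavior described above can be sketched as follows. The threshold value, the choice of `min` as the aggregation rule, and all names here are assumptions made for illustration; the page does not specify how AURORA aggregates confidence or what threshold it uses.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    answered: int = 0
    refused: int = 0

    @property
    def refusal_rate(self):
        total = self.answered + self.refused
        return self.refused / total if total else 0.0


def aggregate(stage_confidences):
    # Assumption: the weakest stage bounds the overall confidence.
    return min(stage_confidences)


def run_gate(tasks, threshold=0.7):
    """tasks maps a task id to a list of per-stage confidences in [0, 1].

    Tasks below the threshold are refused and counted, never dropped,
    so the refusal rate can be reported next to the score.
    """
    result = GateResult()
    decisions = {}
    for task_id, confidences in tasks.items():
        if aggregate(confidences) < threshold:
            decisions[task_id] = "refuse"
            result.refused += 1
        else:
            decisions[task_id] = "answer"
            result.answered += 1
    return decisions, result


# Illustrative confidences only.
tasks = {
    "t1": [0.95, 0.90, 0.88],
    "t2": [0.92, 0.40, 0.85],  # one weak stage pulls the task below threshold
    "t3": [0.81, 0.79, 0.90],
    "t4": [0.99, 0.97, 0.96],
}
decisions, result = run_gate(tasks)
print(decisions, f"refusal_rate={result.refusal_rate:.1%}")
```

Since the rule is a fixed comparison against a threshold, the same inputs always yield the same decisions, consistent with the deterministic pipeline described above.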

Per-engine confidence attribution

VANTAGE produces a failure-class taxonomy alongside the score. Each failure class is attributed to a specific pipeline stage with a confidence estimate. The taxonomy is what informs the benchmark checkpoint arc — it's how CITADEL's E.1 and E.2 fixes were identified.

Sealed receipt with BCa CIs

The scan receipt includes: corpus SHA, per-task scores, refusal rate, BCa bootstrap confidence intervals (B=2,000), and a LUNA audit chain head. The receipt is tamper-evident. If you share it with a third party, they can verify it against the public corpus.
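To show why a chain head makes a log tamper-evident, here is a minimal hash-chain sketch. SHA-256, the JSON canonicalization, the genesis value, and the event payloads are all assumptions for the example; the actual LUNA format is not described on this page.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for an empty chain


def entry_hash(prev_hash, payload):
    """Hash of one entry: previous hash plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain, payload):
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "hash": entry_hash(prev, payload)})
    return chain


def chain_head(chain):
    return chain[-1]["hash"] if chain else GENESIS


def verify(chain, expected_head):
    """Recompute every link and compare the result with the receipt's head."""
    prev = GENESIS
    for entry in chain:
        if entry_hash(prev, entry["payload"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return prev == expected_head


# Hypothetical events; stage names come from the pipeline described above.
log = []
append(log, {"stage": "NOVA", "event": "retrieval complete"})
append(log, {"stage": "AURORA", "event": "gate evaluated"})
append(log, {"stage": "LUNA", "event": "receipt sealed"})
head = chain_head(log)

print(verify(log, head))
log[1]["payload"]["event"] = "gate skipped"  # tamper with a middle entry
print(verify(log, head))
```

Each hash depends on every entry before it, so changing any entry invalidates the head recorded in the receipt.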

Engines covered

NOVA

Initial evidence retrieval and claim decomposition

ECLIPSE

Adversarial challenge generation

PULSAR

Evidence aggregation and conflict resolution

LUNA

SHA-chained audit log, tamper-evident per-run

AURORA

Confidence gate; emits refusal below threshold

HEIMDALL

Tier gating; capability access control

DOLOS

Normalization and entity disambiguation

Engine implementations are proprietary. The VANTAGE diagnostic framework ships without engine source code. To run full diagnostics, request an evaluation API key.

What is and is not claimed

Is claimed: VANTAGE correctly identified CITADEL failure classes A, B, and C with stated confidence levels (95%, 88%, 80%). The E.1 and E.2 fixes were implemented from VANTAGE's diagnosis and produced the documented F1 improvements.

Is not claimed: VANTAGE scan scores are not published benchmarks; they are capability demonstrations. The division scan scores (MineralLogic 88.5%, PropertyGraph 88.9%, NAUTILUS 81.5%, all on the ATLAS corpus) have no held-out ground truth separate from the scan corpus. The CRUCIBLE numbers (SENTINEL 94.0%, ORACLE 51.0%) are the publicly published sealed benchmark results.

View the sealed benchmark program →

Request an evaluation →