SIGNAL

Adverse drug event extraction from clinical narratives and pharmacovigilance reports.

F1: 0.639

Median detection: 24.3 months

Precision: 0.712

What It Is

SIGNAL identifies adverse drug events (ADEs) in unstructured clinical text: spontaneous reports, case narratives, and pharmacovigilance databases. The 24.3-month figure is the median lead time between SIGNAL identifying a safety signal and that signal appearing in regulatory action.
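As a rough illustration of how a median detection lead time can be computed, the sketch below takes pairs of dates (when a signal was flagged, when regulatory action followed) and returns the median gap in months. The function names, the average-month approximation, and the example dates are assumptions made for this sketch; they are not taken from the SIGNAL codebase or its results.

```python
from datetime import date
from statistics import median

# Average month length (365.25 / 12); an approximation chosen for this sketch.
AVG_DAYS_PER_MONTH = 30.4375

def lead_time_months(flagged: date, action: date) -> float:
    """Months between a signal being flagged and the later regulatory action."""
    return (action - flagged).days / AVG_DAYS_PER_MONTH

def median_lead_time(pairs) -> float:
    """Median lead time over (flagged_date, action_date) pairs."""
    return median(lead_time_months(f, a) for f, a in pairs)

# Hypothetical dates for illustration only; these are not SIGNAL results.
EXAMPLE_PAIRS = [
    (date(2018, 1, 1), date(2020, 1, 1)),
    (date(2019, 6, 1), date(2020, 6, 1)),
    (date(2017, 3, 1), date(2020, 3, 1)),
]

print(round(median_lead_time(EXAMPLE_PAIRS), 1))
```

A median is less sensitive than a mean to a few signals with unusually long or short lead times, which may be why a median is the reported statistic.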

Unlike LLM-based extraction pipelines, SIGNAL uses a sealed corpus of FAERS data — a public-domain government dataset — combined with a deterministic entity extraction and normalization pipeline. No inference is made about causality or severity unless the corpus explicitly supports it. Ambiguous mentions trigger honest refusal via AURORA.
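To make "deterministic extraction and normalization with refusal" concrete, here is a minimal toy sketch. It assumes a dictionary-lookup design with a small list of uncertainty cues; the lexicons, cue list, and `extract` function are hypothetical and do not describe SIGNAL's actual pipeline or how AURORA decides to refuse.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicons. The normalized labels are placeholders,
# not actual MedDRA or drug-vocabulary codes.
DRUG_LEXICON = {"warfarin": "warfarin", "coumadin": "warfarin"}
EVENT_LEXICON = {"bleeding": "haemorrhage", "hemorrhage": "haemorrhage", "rash": "rash"}

# Hypothetical uncertainty cues that mark a mention as ambiguous.
UNCERTAINTY = re.compile(r"\b(possible|possibly|suspected|may be|rule out)\b")

@dataclass(frozen=True)
class Extraction:
    status: str   # "extracted" or "refused"
    pairs: tuple  # normalized (drug, event) co-mentions; empty when refused

def extract(sentence: str) -> Extraction:
    """Same input always yields the same output: lookup only, no model inference."""
    text = sentence.lower()
    if UNCERTAINTY.search(text):
        # Ambiguous mention: refuse rather than guess.
        return Extraction("refused", ())
    tokens = re.findall(r"[a-z]+", text)
    drugs = sorted({DRUG_LEXICON[t] for t in tokens if t in DRUG_LEXICON})
    events = sorted({EVENT_LEXICON[t] for t in tokens if t in EVENT_LEXICON})
    # Pairs record co-mention only; nothing here asserts causality or severity.
    return Extraction("extracted", tuple((d, e) for d in drugs for e in events))

print(extract("Patient on Coumadin developed bleeding."))
print(extract("Possible rash after warfarin."))
```

In this sketch the first sentence normalizes the brand name to its generic form and yields one drug-event pair, while the second is refused because of the word "possible".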

Results

System                  F1     Precision  Recall  Refusal Rate

SIGNAL v0.1 (COSMIC)    0.639  0.712      0.580   reported per-class

Keyword baseline        0.550  0.481      0.644   0.000

Dictionary lookup       0.512  0.773      0.384   0.000

Baselines are real implementations — keyword matching against MedDRA and drug-name dictionary lookup — not straw men.
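For readers who want to check how the reported metrics relate, the sketch below computes precision, recall, and F1 from sets of predicted and gold items, and recomputes F1 as the harmonic mean of precision and recall. The function names and the set-based scoring are assumptions for illustration; the source does not state whether the benchmark scores at the mention, report, or class level.

```python
def f1_from(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def score(predicted: set, gold: set):
    """Return (precision, recall, F1) for predicted vs. gold items."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_from(precision, recall)

# Recompute F1 from the precision and recall reported for SIGNAL v0.1.
print(round(f1_from(0.712, 0.580), 3))

# Hypothetical toy items, for illustration only.
print(score({"a", "b", "c"}, {"b", "c", "d", "e"}))
```

The SIGNAL row is consistent with this formula. The two baseline rows come out within roughly 0.001 of their reported F1 under the same formula; the small gap may reflect rounding or per-class averaging, which the table does not specify.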

Reproducibility

Corpus source: FDA FAERS (public domain)
Corpus seal: SHA-256 in CHECKPOINT_RESULTS.md
Repo: github.com/jourdanlabs/benchmarks/signal

Limitations

FAERS report quality variance. Incomplete, noisy, or duplicative reports can affect extraction quality and downstream signal timing.
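One common way to reduce duplication in FAERS data is to keep only the latest version of each case. The sketch below assumes each report is a dict carrying the `caseid` and `caseversion` fields found in the FAERS demographic files; the `latest_versions` helper is hypothetical and is not a description of how SIGNAL handles duplicates.

```python
def latest_versions(reports):
    """Keep one report per caseid: the one with the highest caseversion."""
    latest = {}
    for report in reports:
        case_id = report["caseid"]
        current = latest.get(case_id)
        if current is None or int(report["caseversion"]) > int(current["caseversion"]):
            latest[case_id] = report
    return list(latest.values())

# Hypothetical records for illustration only.
EXAMPLE_REPORTS = [
    {"caseid": "1001", "caseversion": "1", "narrative": "initial report"},
    {"caseid": "1001", "caseversion": "2", "narrative": "follow-up report"},
    {"caseid": "1002", "caseversion": "1", "narrative": "single report"},
]

print(latest_versions(EXAMPLE_REPORTS))
```

This only removes superseded versions of the same case. Reports of the same event submitted under different case IDs would need fuzzier matching, which is outside the scope of this sketch.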

Causality not inferred. SIGNAL does not infer causality or severity unless the corpus explicitly supports it.

Corpus recency. FAERS is updated monthly; detection latency depends on data availability and processing cadence.