JOURDANLABS

SIGNAL

Adverse drug event extraction from clinical narratives and pharmacovigilance reports.

F1: 0.639
Median detection lead time: 24.3 months
Domain: Pharmacovigilance

What it is

SIGNAL identifies adverse drug events (ADEs) in unstructured clinical text: spontaneous reports, case narratives, and pharmacovigilance databases. An ADE is any harmful outcome associated with exposure to a drug. Finding ADEs early matters. The 24.3-month median detection lead time is the median interval between SIGNAL first flagging a safety signal and the corresponding regulatory action.
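The median lead time described above is simple arithmetic over (detection date, regulatory action date) pairs. A minimal sketch, assuming such pairs are available; the record shape, the helper names, and the example dates are hypothetical, not benchmark data, and a month is approximated as 365.25/12 days.

```python
from datetime import date
from statistics import median

DAYS_PER_MONTH = 365.25 / 12  # average month length in days (an approximation)


def months_between(start: date, end: date) -> float:
    """Approximate months from start to end (negative if end precedes start)."""
    return (end - start).days / DAYS_PER_MONTH


def median_lead_time_months(pairs: list[tuple[date, date]]) -> float:
    """Median of (regulatory action date - first detection date), in months."""
    return median(months_between(detected, action) for detected, action in pairs)


# Hypothetical (first detection, regulatory action) pairs for three signals.
example = [
    (date(2019, 1, 1), date(2021, 1, 1)),
    (date(2018, 6, 1), date(2021, 6, 1)),
    (date(2020, 3, 1), date(2021, 3, 1)),
]
print(round(median_lead_time_months(example), 1))  # → 24.0
```

Using the median rather than the mean keeps a few very early or very late signals from dominating the summary figure.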

Unlike LLM-based extraction pipelines, SIGNAL uses a sealed corpus of FAERS (FDA Adverse Event Reporting System) data — a public-domain government dataset — combined with a deterministic entity extraction and normalization pipeline. No inference is made about causality or severity unless the corpus explicitly supports it. Ambiguous mentions trigger honest refusal via AURORA.
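Deterministic normalization with refusal can be sketched as a lookup that returns a result only when a mention maps to exactly one concept. The lexicon, the placeholder concept IDs, and the `Normalized`/`Refusal` types below are hypothetical. This is not AURORA's interface, only an illustration of refusing on ambiguity instead of guessing.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical lexicon: lowercase surface form -> candidate concept IDs.
# These IDs are placeholders, not real MedDRA codes.
LEXICON: dict[str, set[str]] = {
    "nausea": {"EVT:nausea"},
    "rash": {"EVT:rash"},
    "cold": {"EVT:common_cold", "EVT:feeling_cold"},  # one surface form, two concepts
}


@dataclass(frozen=True)
class Normalized:
    mention: str
    concept: str


@dataclass(frozen=True)
class Refusal:
    mention: str
    reason: str


def normalize(mention: str) -> Normalized | Refusal:
    """Map a mention to exactly one concept, or refuse instead of guessing."""
    candidates = LEXICON.get(mention.strip().lower(), set())
    if len(candidates) == 1:
        # Unambiguous: the only candidate is the answer.
        return Normalized(mention, next(iter(candidates)))
    if not candidates:
        return Refusal(mention, "no lexicon entry")
    # Several candidates: the text alone does not decide, so refuse.
    return Refusal(mention, f"ambiguous: {len(candidates)} candidate concepts")


print(normalize("Nausea"))  # Normalized(mention='Nausea', concept='EVT:nausea')
print(normalize("cold"))    # Refusal(mention='cold', reason='ambiguous: 2 candidate concepts')
```

Because the same input always follows the same lookup path, repeated runs over a sealed corpus produce identical outputs, which is what makes the results auditable.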

This matters because pharmacovigilance has a regulatory chain-of-custody requirement. If a safety signal leads to a label change or market withdrawal, regulators ask: what was your evidence, and when did you detect it? SIGNAL answers both questions with SHA-verified receipts.
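A receipt that addresses both "what was your evidence" and "when did you detect it" can be sketched by hashing the evidence text with SHA-256 and binding that hash to a detection timestamp and the corpus seal. The field names and the `make_receipt`/`verify_receipt` helpers are hypothetical; the page does not document SIGNAL's receipt format.

```python
import hashlib
import json
from datetime import datetime, timezone


def _sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _canonical(body: dict) -> str:
    # Sorted keys and fixed separators so the same body always hashes identically.
    return json.dumps(body, sort_keys=True, separators=(",", ":"))


def make_receipt(report_id: str, evidence_text: str, corpus_seal: str,
                 detected_at: datetime) -> dict:
    """Hypothetical receipt: evidence hash + corpus seal + UTC detection time.

    detected_at must be timezone-aware.
    """
    body = {
        "report_id": report_id,
        "evidence_sha256": _sha256_hex(evidence_text),
        "corpus_seal": corpus_seal,
        "detected_at": detected_at.astimezone(timezone.utc).isoformat(),
    }
    body["receipt_sha256"] = _sha256_hex(_canonical(body))
    return body


def verify_receipt(receipt: dict, evidence_text: str) -> bool:
    """Recompute the evidence hash and the receipt hash and compare both."""
    body = {k: v for k, v in receipt.items() if k != "receipt_sha256"}
    return (_sha256_hex(evidence_text) == receipt["evidence_sha256"]
            and _sha256_hex(_canonical(body)) == receipt["receipt_sha256"])


text = "Patient developed a rash after starting drug X."
receipt = make_receipt("REPORT-001", text, "placeholder-seal",
                       datetime(2020, 1, 1, tzinfo=timezone.utc))
print(verify_receipt(receipt, text))  # True
```

A hash shows that the evidence and the receipt were not altered afterward; on its own it does not prove when the receipt was created. Establishing the time independently would additionally require anchoring `receipt_sha256` with an external party, such as an RFC 3161 time-stamping authority.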


Results

System                 F1     Precision  Recall  Refusal rate
SIGNAL v0.1 (COSMIC)   0.639  0.712      0.580   reported per class
Keyword baseline       0.550  0.481      0.644   0.000
Dictionary lookup      0.512  0.773      0.384   0.000
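F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the F1 column can be cross-checked against the other two. A minimal sketch using the precision and recall values from the results table; the `f1` helper is illustrative.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs copied from the results table.
rows = {
    "SIGNAL v0.1 (COSMIC)": (0.712, 0.580),
    "Keyword baseline": (0.481, 0.644),
    "Dictionary lookup": (0.773, 0.384),
}
for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1(p, r):.3f}")
```

This prints 0.639, 0.551 and 0.513, matching the reported F1 values to within 0.001. Gaps of that size can come from rounding, or from F1 being averaged differently (for example, per class) than the pooled precision and recall.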

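The baselines are real implementations (keyword matching against the MedDRA dictionary, and drug-name dictionary lookup), not straw men. SIGNAL outperforms both on F1, although dictionary lookup has higher precision (0.773 vs. 0.712) and the keyword baseline has higher recall (0.644 vs. 0.580).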
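To make the comparison concrete, a keyword baseline of the kind described can be sketched as whole-word, case-insensitive matching against a term list. The term list and the `keyword_baseline` function are hypothetical illustrations, not the benchmark's baseline code, and the terms stand in for MedDRA entries.

```python
import re

# Hypothetical stand-ins for MedDRA terms; the real dictionary is far larger.
EVENT_TERMS = ["nausea", "rash", "liver injury", "headache"]


def keyword_baseline(text: str, terms: list[str]) -> set[str]:
    """Return every term that occurs as a whole-word, case-insensitive match.

    There is no negation, context, or ambiguity handling, so the function
    never refuses and counts negated mentions as hits.
    """
    found = set()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(term)
    return found


print(sorted(keyword_baseline("Patient denied nausea but reported a rash.", EVENT_TERMS)))
# ['nausea', 'rash']  ("denied nausea" is still counted)
```

Matching negated mentions such as "denied nausea" is one way a baseline like this trades precision for recall. That is consistent with the keyword baseline's 0.481 precision and 0.644 recall, though the page does not break down its errors.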


Reproducibility

Corpus source: FDA FAERS (public domain)
Corpus seal: SHA-256 hash in CHECKPOINT_RESULTS.md
Repository: github.com/jourdanlabs/benchmarks/signal
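Checking a corpus against its published seal comes down to recomputing a SHA-256 digest and comparing it to the recorded value. The sketch below assumes the corpus is a directory of files and hashes relative paths and contents in sorted order. That layout and the `seal_corpus`/`verify_seal` names are assumptions; the page does not document how the seal in CHECKPOINT_RESULTS.md is computed.

```python
import hashlib
from pathlib import Path


def seal_corpus(root: Path) -> str:
    """Hypothetical sealing scheme: SHA-256 over (relative path, content) of every file.

    Files are visited in sorted POSIX-path order so the digest is reproducible
    across machines; length prefixes keep path/content boundaries unambiguous.
    """
    digest = hashlib.sha256()
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    for path in files:
        rel = path.relative_to(root).as_posix().encode("utf-8")
        data = path.read_bytes()
        for chunk in (rel, data):
            digest.update(len(chunk).to_bytes(8, "big"))
            digest.update(chunk)
    return digest.hexdigest()


def verify_seal(root: Path, expected_hex: str) -> bool:
    """True if the recomputed seal matches the published hex digest."""
    return seal_corpus(root) == expected_hex.strip().lower()
```

If the corpus is distributed as a single archive, hashing that one file (for example with `sha256sum`) is the simpler equivalent.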

Limitations

FAERS report quality variance. FAERS contains spontaneous reports with highly variable text quality. The pipeline performs significantly better on structured reporter-written narratives than on consumer-submitted reports.

Causality not inferred. SIGNAL identifies co-mentions of a drug and an adverse event. It does not assess or claim a causal relationship.
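The difference between a co-mention and a causal claim shows up in a minimal sentence-level pairing sketch: any drug and event named in the same sentence form a pair, whatever their clinical relationship. The `co_mentions` function, the naive sentence splitter, and the example drug and event lists are hypothetical, not SIGNAL's extraction logic.

```python
import re
from itertools import product


def co_mentions(text: str, drugs: set[str], events: set[str]) -> list[tuple[str, str]]:
    """Return (drug, event) pairs that appear in the same sentence.

    A pair records co-occurrence only; nothing here estimates causality.
    """
    def mentioned(term: str, sentence: str) -> bool:
        return re.search(rf"\b{re.escape(term.lower())}\b", sentence) is not None

    pairs: list[tuple[str, str]] = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.lower()):
        found_drugs = sorted(d for d in drugs if mentioned(d, sentence))
        found_events = sorted(e for e in events if mentioned(e, sentence))
        pairs.extend(product(found_drugs, found_events))
    return pairs


print(co_mentions("Started warfarin in May. Bleeding followed warfarin use.",
                  {"warfarin"}, {"bleeding", "rash"}))
# [('warfarin', 'bleeding')]
```

A sentence such as "Bleeding resolved after warfarin was stopped" would yield the same pair, which is why a co-mention cannot stand in for a causal claim.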

Corpus recency. The sealed corpus covers a specific time window. Drug safety landscapes evolve; corpus refresh cycles are not yet defined.
