SIGNAL
Adverse drug event extraction from clinical narratives and pharmacovigilance reports.
What it is
SIGNAL identifies adverse drug events (ADEs) in unstructured clinical text: spontaneous reports, case narratives, and pharmacovigilance databases. An ADE is any harmful outcome associated with drug exposure. Finding ADEs early matters. SIGNAL's 24.3-month median detection window is the median lead time by which SIGNAL flags a safety signal before that signal surfaces in regulatory action.
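One way to read the median detection window is as the median, across signals, of the time between SIGNAL first flagging a drug-event pair and the related regulatory action. The sketch below computes that quantity from (detection date, action date) pairs. The function names, the toy dates, and the 30.44-day average month are assumptions for illustration, not SIGNAL's evaluation code.

```python
from datetime import date
from statistics import median

AVG_DAYS_PER_MONTH = 30.44  # assumption: mean Gregorian month length


def lead_time_months(detected: date, action: date) -> float:
    """Months between SIGNAL's first flag and the regulatory action."""
    return (action - detected).days / AVG_DAYS_PER_MONTH


def median_detection_window(pairs: list[tuple[date, date]]) -> float:
    """Median lead time over (detected, action) date pairs."""
    return median(lead_time_months(d, a) for d, a in pairs)


# Toy dates, invented for illustration; not FAERS or benchmark data.
pairs = [
    (date(2015, 1, 1), date(2017, 1, 1)),
    (date(2016, 6, 1), date(2018, 9, 1)),
    (date(2018, 3, 1), date(2019, 3, 1)),
]
print(f"{median_detection_window(pairs):.1f} months")  # prints 24.0 months
```

The median, rather than the mean, keeps one unusually early or late signal from dominating the headline number.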
Unlike LLM-based extraction pipelines, SIGNAL pairs a sealed corpus of FAERS (FDA Adverse Event Reporting System) data, a public-domain government dataset, with a deterministic entity extraction and normalization pipeline. SIGNAL makes no inference about causality or severity unless the corpus explicitly supports it. Ambiguous mentions trigger an honest refusal via AURORA rather than a guess.
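A minimal sketch of the deterministic extract, normalize, or refuse pattern, assuming a lexicon that maps each surface form to its candidate concepts: a form with one candidate is normalized, and a form with several is refused rather than resolved by guessing. `LEXICON`, `extract`, `Mention`, and `Refusal` are hypothetical names, the entries are illustrative rather than MedDRA terms, and the refusal record only stands in for AURORA, whose interface is not described here.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon: (entity kind, lowercase surface form) -> candidate
# normalized concepts. Entries are illustrative, not real MedDRA terms.
LEXICON: dict[tuple[str, str], tuple[str, ...]] = {
    ("drug", "acetaminophen"): ("ACETAMINOPHEN",),
    ("drug", "paracetamol"): ("ACETAMINOPHEN",),
    ("event", "rash"): ("Rash",),
    ("event", "liver injury"): ("Liver injury",),
    ("event", "cold"): ("Nasopharyngitis", "Feeling cold"),  # ambiguous
}


@dataclass(frozen=True)
class Mention:
    kind: str
    surface: str
    concept: str
    start: int


@dataclass(frozen=True)
class Refusal:
    surface: str
    candidates: tuple[str, ...]
    start: int


def extract(text: str) -> tuple[list[Mention], list[Refusal]]:
    """Lexicon-based extraction: identical input always gives identical output."""
    mentions: list[Mention] = []
    refusals: list[Refusal] = []
    # Iterate in sorted key order so results never depend on dict ordering.
    for (kind, surface), candidates in sorted(LEXICON.items()):
        pattern = rf"\b{re.escape(surface)}\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if len(candidates) == 1:
                # Exactly one concept: normalize the mention.
                mentions.append(Mention(kind, m.group(0), candidates[0], m.start()))
            else:
                # Several plausible concepts: refuse instead of guessing.
                refusals.append(Refusal(m.group(0), candidates, m.start()))
    mentions.sort(key=lambda x: x.start)
    refusals.sort(key=lambda x: x.start)
    return mentions, refusals


mentions, refusals = extract("Patient took Paracetamol and developed a rash and a cold.")
for m in mentions:
    print(m.kind, m.surface, "->", m.concept)
for r in refusals:
    print("REFUSED", r.surface, "candidates:", r.candidates)
```

Because matching is pure lexicon lookup with sorted output, rerunning it on the same text always yields the same mentions and refusals, which is the property an auditable pipeline relies on.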
A sealed corpus and a deterministic pipeline matter because pharmacovigilance carries a regulatory chain-of-custody requirement. If a safety signal leads to a label change or market withdrawal, regulators ask two questions: what was your evidence, and when did you detect it? SIGNAL answers both with SHA-verified receipts.
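The contents of a receipt are not specified here. As a hedged illustration, the sketch below assumes a receipt records the corpus digest, the normalized drug-event pair, the supporting report IDs, and a UTC detection timestamp, sealed with a SHA-256 digest over canonical JSON so that anyone holding the same inputs can re-derive and check it. `make_receipt`, `verify_receipt`, and every field name are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) makes the hash reproducible.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_receipt(corpus_sha256: str, drug: str, event: str,
                 report_ids: list[str], detected_at: datetime) -> dict:
    """Hypothetical receipt: what the evidence was, and when it was detected."""
    body = {
        "corpus_sha256": corpus_sha256,      # digest of the sealed corpus
        "drug": drug,
        "event": event,
        "report_ids": sorted(report_ids),    # order-independent evidence set
        "detected_at": detected_at.astimezone(timezone.utc).isoformat(),
    }
    return {**body, "receipt_sha256": _digest(body)}


def verify_receipt(receipt: dict) -> bool:
    """Recompute the digest over every field except the digest itself."""
    body = {k: v for k, v in receipt.items() if k != "receipt_sha256"}
    return _digest(body) == receipt.get("receipt_sha256")


# "0" * 64 is a placeholder corpus digest, not a real checksum.
receipt = make_receipt("0" * 64, "ACETAMINOPHEN", "Liver injury",
                       ["R-002", "R-001"], datetime(2020, 1, 1, tzinfo=timezone.utc))
print(verify_receipt(receipt))                        # True
print(verify_receipt({**receipt, "event": "Rash"}))   # False: tampered field
```

Changing any field, including the timestamp, breaks verification, which is what lets a receipt answer both the "what" and the "when" question.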
Results
| System | F1 | Precision | Recall | Refusal Rate |
|---|---|---|---|---|
| SIGNAL v0.1 (COSMIC) | 0.639 | 0.712 | 0.580 | reported per-class |
| Keyword baseline | 0.550 | 0.481 | 0.644 | 0.000 |
| Dictionary lookup | 0.512 | 0.773 | 0.384 | 0.000 |
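Both baselines are real implementations, not straw men: keyword matching against the MedDRA dictionary, and drug-name dictionary lookup. SIGNAL leads on F1 (0.639 vs. 0.550 and 0.512), while the keyword baseline has higher recall and dictionary lookup has higher precision.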
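As a consistency check on the table, F1 is conventionally the harmonic mean of precision and recall, F1 = 2PR / (P + R). Recomputing it from the rounded P and R columns gives values within about 0.001 of the reported F1. Small gaps are expected from three-decimal rounding, and also from the averaging scheme if scores are macro-averaged (then F1 is not exactly the harmonic mean of the averaged P and R). The averaging used is not stated, so treat this only as a sanity check.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the Results table.
ROWS = {
    "SIGNAL v0.1 (COSMIC)": (0.712, 0.580, 0.639),
    "Keyword baseline": (0.481, 0.644, 0.550),
    "Dictionary lookup": (0.773, 0.384, 0.512),
}

for name, (p, r, reported) in ROWS.items():
    computed = f1(p, r)
    print(f"{name}: computed {computed:.3f}, reported {reported:.3f}, "
          f"gap {abs(computed - reported):.4f}")
```

The harmonic mean also explains the ranking: it punishes imbalance, so the dictionary baseline's high precision cannot compensate for its low recall.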
Reproducibility
FDA FAERS (public domain)
SHA-256 in CHECKPOINT_RESULTS.md
github.com/jourdanlabs/benchmarks/signal
Limitations
FAERS report quality variance. FAERS consists of spontaneous reports whose text quality varies widely. The pipeline performs substantially better on structured, reporter-written narratives than on consumer-submitted reports.
Causality not inferred. SIGNAL identifies co-mentions of a drug and an adverse event. It does not assess or claim a causal relationship.
Corpus recency. The sealed corpus covers a specific time window. Drug safety landscapes evolve; corpus refresh cycles are not yet defined.
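To make the "Causality not inferred" limitation concrete, here is a minimal sketch assuming sentence-level co-occurrence and small illustrative lexicons: every drug and adverse event found in the same sentence becomes a pair, and the output schema deliberately has no causality field. `co_mentions`, `DRUGS`, `EVENTS`, and the regex sentence splitter are hypothetical; SIGNAL's actual co-mention window is not specified.

```python
import re
from itertools import product

# Hypothetical normalized lexicons; entries are illustrative only.
DRUGS = {"warfarin": "WARFARIN", "aspirin": "ASPIRIN"}
EVENTS = {"bleeding": "Haemorrhage", "nausea": "Nausea"}


def _find(lexicon: dict[str, str], sentence: str) -> list[str]:
    """Sorted normalized concepts whose surface form appears in the sentence."""
    return sorted({concept for surface, concept in lexicon.items()
                   if re.search(rf"\b{re.escape(surface)}\b", sentence, re.IGNORECASE)})


def co_mentions(narrative: str) -> list[dict]:
    """Pair each drug with each adverse event mentioned in the same sentence.

    The records carry no causality field: a pair means co-occurrence only.
    """
    pairs = []
    # Naive split on terminal punctuation (assumption; clinical text with
    # abbreviations like "mg." would need a sturdier splitter).
    for i, sentence in enumerate(re.split(r"(?<=[.!?])\s+", narrative)):
        for drug, event in product(_find(DRUGS, sentence), _find(EVENTS, sentence)):
            pairs.append({"sentence": i, "drug": drug, "event": event})
    return pairs


text = ("Started warfarin in May. Two weeks later, bleeding and nausea "
        "were noted while on warfarin and aspirin.")
for pair in co_mentions(text):
    print(pair)
```

The first sentence yields nothing because warfarin appears there without any event; the second yields four pairs, and nothing in the output distinguishes a plausible cause from a bystander drug.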