Regulatory Simulation Methodology

SECTION 01

Overview: What Adversarial Validation Means

Adversarial validation is not peer review. It is not quality assurance. It is a structured attempt to find the flaw in a piece of AI-generated biopharma science before that flaw is found by the FDA, a trial monitoring committee, or an investment committee during due diligence.

The AimwellBio methodology runs four independent AI agents against every brief submitted for validation. Each agent approaches the brief from the adversarial position of a different institutional challenger: regulatory reviewer, competitive intelligence analyst, capital risk assessor, and clinical trial failure analyst. The four agents do not collaborate during the audit. They produce independent findings, which are then synthesized into a consensus verdict.

The output is a PROCEED, DELAY, or KILL verdict, with a full audit trail showing exactly which agent identified which risk, against which ground truth reference, with what confidence level. The verdict is not a recommendation. It is an institutional accountability document.

Core principle: Every AI-generated biopharma brief contains at least one assumption that has not been validated against historical regulatory outcomes. The methodology exists to find that assumption before it compounds into a decision.

SECTION 02

The Validation Problem in AI-Generated Biopharma Science

AI-generated biopharma intelligence has proliferated faster than the institutional capacity to verify it. Clinical briefs, competitive landscapes, regulatory filing summaries, and trial design recommendations are now routinely generated by large language models and delivered to decision-makers as if they carried the same epistemic weight as data-grounded analysis.

They do not. Large language models optimize for plausibility, not accuracy. A model trained on regulatory literature will generate statistically plausible regulatory arguments. Those arguments will be internally coherent. They will not be grounded in the specific approval history, complete response letter precedents, or endpoint failure rates that a regulatory expert at FDA would reference.

The consequence is not that AI is wrong. It is that AI is selectively wrong in the direction of the outcome the user was hoping for, because the prompts are written by people who have a position. The adversarial agent is the antidote: it is prompted to disprove, not to confirm.

This is the validation problem that REGTECH is designed to solve. It is not a future problem. It is the reason organizations are receiving Complete Response Letters on submissions that AI-generated analysis declared low-risk.

SECTION 03

The Four-Agent Audit Architecture

The validation layer deploys four independent agents in parallel against every brief. Each agent operates from a distinct adversarial position and references a distinct ground truth corpus. The agents are not aware of each other's findings during the audit phase; independence is a design requirement, not a feature.

Agent 01 · Regulatory

FDA Review Simulation

Simulates the regulatory reviewer's perspective. Cross-references the brief against historical FDA decisions for the relevant therapeutic area, clinical endpoint type, and statistical design. Identifies gaps between the brief's regulatory claims and precedent.

Agent 02 · Competitive

Competitive Landscape Displacement

Maps the competitive landscape at the projected approval date, not the brief date. Identifies approved or late-stage competitors the brief does not account for, and quantifies the market access risk of a landscape the sponsor is not licensing against.

Agent 03 · Clinical Failure

Trial Failure Pattern Recognition

Audits the clinical assumptions in the brief against historical trial failure modes in the same indication. Flags enrollment assumptions, endpoint definitions, and inclusion/exclusion criteria that match the failure signatures in the historical corpus.

Agent 04 · Capital Risk

Investment Committee Stress Test

Constructs the downside scenarios the brief did not model. Quantifies capital exposure under regulatory delay, competitive displacement, and trial failure scenarios. Produces the IC stress package the brief's sponsors had no incentive to include.

After the independent audit phase, a synthesis layer produces the consensus verdict. Where agents disagree, the disagreement is documented, not suppressed. A split verdict (e.g., Regulatory PROCEED, Clinical DELAY) is a DELAY at the consensus level, with the specific risk source identified in the audit trail.
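The independence requirement can be illustrated with a minimal fan-out sketch in Python. The four agent functions are hypothetical stubs that return canned findings; a real agent would query its own ground truth corpus. Each agent receives its own deep copy of the brief and no agent sees another's output, so findings stay independent until synthesis, where a per-agent verdict map preserves any disagreement instead of collapsing it.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


# Hypothetical agent stubs: each returns a canned finding for illustration only.
def regulatory_agent(brief: dict) -> dict:
    return {"verdict": "PROCEED", "flags": []}


def competitive_agent(brief: dict) -> dict:
    return {"verdict": "PROCEED", "flags": []}


def clinical_failure_agent(brief: dict) -> dict:
    return {"verdict": "DELAY", "flags": ["enrollment assumption matches a failure signature"]}


def capital_risk_agent(brief: dict) -> dict:
    return {"verdict": "PROCEED", "flags": []}


AGENTS: Dict[str, Callable[[dict], dict]] = {
    "regulatory": regulatory_agent,
    "competitive": competitive_agent,
    "clinical_failure": clinical_failure_agent,
    "capital_risk": capital_risk_agent,
}


def run_independent_audit(brief: dict) -> Dict[str, dict]:
    """Run all agents concurrently; each gets its own copy of the brief and
    none receives another agent's findings."""
    with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
        futures = {
            name: pool.submit(agent, copy.deepcopy(brief))
            for name, agent in AGENTS.items()
        }
        return {name: future.result() for name, future in futures.items()}


def disagreement_record(findings: Dict[str, dict]) -> dict:
    """Document every per-agent verdict instead of suppressing dissent."""
    per_agent = {name: f["verdict"] for name, f in findings.items()}
    return {"split": len(set(per_agent.values())) > 1, "per_agent": per_agent}
```

With these stubs, `disagreement_record(run_independent_audit({"asset": "example"}))` reports a split, because the clinical stub returns DELAY while the other three return PROCEED.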

SECTION 04

Ground Truth Framework

The adversarial agents are only as credible as their reference corpus. The AimwellBio ground truth framework is built from publicly available regulatory and clinical data, not proprietary or confidential sources. This is deliberate: institutional credibility requires verifiability.

Historical FDA Decisions: A large corpus of approval decisions, complete response letters, and refuse-to-file actions indexed by therapeutic area, endpoint type, statistical design, and sponsor size.
ClinicalTrials.gov Registry: Trial registrations, protocol amendments, trial failures, and discontinuation reasons cross-referenced against the brief's clinical design claims.
Published Clinical Trial Outcomes: PubMed-indexed trial results, including negative and null results, structured by indication and endpoint category.
Regulatory Guidance Documents: FDA CDER guidance for industry, EMA scientific guidelines, ICH harmonized tripartite guidelines, and WHO technical guidance documents, referenced specifically rather than treated as training background.
Publicly Available Regulatory Correspondence: FDA Advisory Committee meeting transcripts, warning letters, and publicly disclosed CRL response documents where available.
Competitive Pipeline Data: Drugs@FDA database, EMA EPAR, and ClinicalTrials.gov pipeline registrations for competitive landscape displacement analysis.
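The four index axes on the historical FDA decisions corpus suggest a lookup keyed on all four attributes at once. A minimal Python sketch, assuming each decision record is a dict carrying those four fields plus an outcome label; the field names, outcome labels, and records below are illustrative and are not drawn from the actual corpus.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple


class PrecedentKey(NamedTuple):
    therapeutic_area: str
    endpoint_type: str
    statistical_design: str
    sponsor_size: str


def build_index(decisions: Iterable[dict]) -> Dict[PrecedentKey, List[str]]:
    """Group decision outcomes (e.g. approval, CRL, RTF) by the four index axes."""
    index: Dict[PrecedentKey, List[str]] = defaultdict(list)
    for d in decisions:
        key = PrecedentKey(d["therapeutic_area"], d["endpoint_type"],
                           d["statistical_design"], d["sponsor_size"])
        index[key].append(d["outcome"])
    return index


def outcome_profile(index: Dict[PrecedentKey, List[str]], key: PrecedentKey) -> Counter:
    """Count outcomes for exact analogues; an empty Counter means no precedent."""
    return Counter(index.get(key, []))


# Illustrative records only.
decisions = [
    {"therapeutic_area": "oncology", "endpoint_type": "PFS",
     "statistical_design": "superiority", "sponsor_size": "small", "outcome": "approval"},
    {"therapeutic_area": "oncology", "endpoint_type": "PFS",
     "statistical_design": "superiority", "sponsor_size": "small", "outcome": "CRL"},
    {"therapeutic_area": "oncology", "endpoint_type": "ORR",
     "statistical_design": "single-arm", "sponsor_size": "large", "outcome": "approval"},
]
```

An empty profile corresponds to the no-analogue case described under Limitations & Scope, which the methodology leaves to human expert judgment.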

Scope boundary: The ground truth corpus does not include proprietary sponsor data, confidential regulatory submissions, or non-public clinical datasets. Validation outputs are based solely on publicly available ground truth. This is a design choice: institutional defensibility requires that the reference corpus be auditable.

SECTION 05

Verdict Generation: PROCEED / DELAY / KILL

The three-verdict framework is designed for institutional decision-making, not analytical nuance. Each verdict has a specific operational meaning that does not require interpretation at the board level.

PROCEED

No material adversarial finding across all four agents. The brief's core claims are consistent with historical regulatory outcomes for analogous assets. Proceed with the decision this brief was prepared to support. Audit trail archived.

DELAY

At least one agent identified a material discrepancy between the brief's claims and historical ground truth. The decision this brief supports should be delayed pending resolution of the identified risk(s). Specific flags documented in audit trail.

KILL

Multiple agents identified material failures, or at least one agent identified a structural flaw that invalidates the brief's core premise. The decision this brief was prepared to support should not proceed in its current form. Full failure analysis in audit trail.
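One literal reading of the three verdict definitions can be written as a consensus rule. This Python sketch assumes each agent reports its single most severe finding on a four-level scale; the level names are hypothetical, and the sketch treats a "material failure" as more severe than a "material discrepancy" because the DELAY and KILL definitions use different wording. It agrees with the split-verdict rule in the Four-Agent Audit Architecture section: one agent at DELAY severity with the rest clean yields DELAY.

```python
from typing import Dict, List, Tuple

# Hypothetical severity scale for an agent's most severe finding.
LEVELS = ("none", "material_discrepancy", "material_failure", "structural_flaw")


def consensus_verdict(findings: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return (verdict, agents driving it) for a map of agent -> severity level."""
    for agent, level in findings.items():
        if level not in LEVELS:
            raise ValueError(f"unknown severity {level!r} from {agent}")

    structural = [a for a, lvl in findings.items() if lvl == "structural_flaw"]
    failures = [a for a, lvl in findings.items() if lvl == "material_failure"]
    flagged = [a for a, lvl in findings.items() if lvl != "none"]

    if structural:              # one structural flaw invalidates the core premise
        return "KILL", structural
    if len(failures) >= 2:      # multiple agents report material failures
        return "KILL", failures
    if flagged:                 # at least one material discrepancy (or a lone failure)
        return "DELAY", flagged
    return "PROCEED", []        # no material finding from any agent
```

Returning the driving agents alongside the verdict keeps the risk source available for the audit trail.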

Verdict confidence is expressed as a percentage. For example, an 80% confidence DELAY means that across the historical corpus of analogous audit scenarios, 80 out of 100 reached the same verdict when the same risk pattern was present. Confidence intervals are included in the full audit trail and calibrated quarterly against new regulatory decision data.
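The percentage itself is a simple proportion of matching analogous scenarios. The document does not name the interval method used in the audit trail, so this Python sketch swaps in a Wilson score interval as one common choice for a binomial proportion; treat both the method and the 95% z-value of 1.96 as assumptions.

```python
import math
from typing import Tuple


def verdict_confidence(matching: int, total: int) -> float:
    """Share of analogous audit scenarios that reached the same verdict."""
    if total <= 0 or not 0 <= matching <= total:
        raise ValueError("need 0 <= matching <= total and total > 0")
    return matching / total


def wilson_interval(matching: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for the confidence proportion (assumed method)."""
    p = verdict_confidence(matching, total)
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

For the 80% DELAY example, `verdict_confidence(80, 100)` gives 0.8 and `wilson_interval(80, 100)` gives roughly (0.711, 0.867).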

SECTION 06

ICH & Regulatory Alignment

The methodology is aligned with, not certified by, the following regulatory guidance frameworks. Alignment means the validation logic and statistical treatment referenced in each guideline are incorporated into the relevant agent's audit criteria. It does not constitute endorsement by the referenced bodies.

Guideline	Alignment Area
ICH E9(R1) · 2020	Statistical methodology for clinical trials; estimand framework for regulatory claims; sensitivity analysis requirements. Agent 01 (Regulatory) references E9(R1) when auditing endpoint definitions and statistical design claims.
ICH E10	Choice of control group in clinical trials. Agent 03 (Clinical Failure) cross-references control group selection against E10 guidance when auditing trial design assumptions.
ICH E6(R3) · GCP	Good clinical practice standards for trial conduct. Agent 03 references E6(R3) compliance indicators when auditing clinical design assumptions against site feasibility and protocol adherence risk.
FDA CDER Guidance for Industry	Therapeutic area-specific guidance documents are referenced by Agent 01 for endpoint acceptability, labeling implications, and approval pathway claims.
EMA AI/ML Qualification Opinion	AI-specific regulatory considerations are incorporated into the framework's own documentation standards, ensuring the validation outputs themselves meet emerging AI governance expectations.

SECTION 07

Limitations & Scope

The methodology validates the consistency of AI-generated science against publicly available ground truth. It does not validate proprietary data, clinical data not yet in the public record, or decisions that have no historical regulatory analogue. Novel mechanisms, first-in-class assets, and unprecedented regulatory pathways require human expert judgment that the adversarial agent layer is designed to flag, not replace.

The verdict is an institutional accountability tool, not a regulatory determination. A PROCEED verdict does not mean the FDA will approve. A KILL verdict does not mean the asset is unviable. It means the brief, as presented, failed the adversarial test against the historical ground truth corpus available at the time of audit.

Methodology updates are published quarterly as new regulatory decision data is incorporated. Members receive notification when corpus updates materially affect active verdict confidence scores.

SECTION 08 · ADDED 2026-05-08

ccRCC Build Methodology · Disclosure Note

Clear cell renal cell carcinoma was added as the seventh vertical via a different ingest pipeline than the original six indications. We disclose the method here so partners know exactly what they are looking at and how it differs from the rest of the corpus.

Build Approach

Indication-specific keyword set: “clear cell renal cell carcinoma”, “renal cell carcinoma”, “ccRCC”, “kidney cancer”, “von Hippel-Lindau”, “papillary renal cell”, “chromophobe renal cell”, “metastatic renal cell”, combined with the full RCC drug list (belzutifan, cabozantinib, tivozanib, axitinib, lenvatinib, sunitinib, pazopanib, sorafenib, temsirolimus, nivolumab, pembrolizumab, ipilimumab in renal context).
Source endpoints used: ClinicalTrials.gov v2 API, PubMed E-utilities (5-year window), SEC EDGAR (20 RCC-focused tickers including MRK, EXEL, PFE, BMY, EISI, NVS, BAYRY, AVEO, TLX), FDA openFDA, WHO ICTRP (returned 404, excluded), and bioRxiv preprints.
Raw pull: 1,032 deduplicated signals across the four working sources.
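A minimal Python sketch of how the keyword set and drug list could be turned into queries against two of the listed endpoints. It only builds request URLs and makes no network calls. The Boolean combination (keywords OR drug names paired with a renal term) is an assumption based on the "in renal context" qualifier above; the endpoint paths and parameters (`db`, `term`, `datetype`, `mindate`, `maxdate`, `retmax`, `retmode` for E-utilities ESearch; `query.term` and `pageSize` for the ClinicalTrials.gov v2 API) are the public ones, but the manifest's actual settings are not reproduced here.

```python
from datetime import date
from urllib.parse import urlencode

KEYWORDS = ["clear cell renal cell carcinoma", "renal cell carcinoma", "ccRCC",
            "kidney cancer", "von Hippel-Lindau", "papillary renal cell",
            "chromophobe renal cell", "metastatic renal cell"]
DRUGS = ["belzutifan", "cabozantinib", "tivozanib", "axitinib", "lenvatinib",
         "sunitinib", "pazopanib", "sorafenib", "temsirolimus", "nivolumab",
         "pembrolizumab", "ipilimumab"]


def or_clause(terms):
    """Quote each term and join with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def combined_query() -> str:
    # Assumption: drug names only count when paired with a renal term.
    return f"{or_clause(KEYWORDS)} OR ({or_clause(DRUGS)} AND (renal OR kidney))"


def years_before(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # 29 February mapped onto a non-leap year
        return d.replace(year=d.year - years, day=28)


def pubmed_esearch_url(today: date, years: int = 5) -> str:
    params = {
        "db": "pubmed",
        "term": combined_query(),
        "datetype": "pdat",
        "mindate": years_before(today, years).strftime("%Y/%m/%d"),
        "maxdate": today.strftime("%Y/%m/%d"),
        "retmax": 10000,
        "retmode": "json",
    }
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + urlencode(params)


def ctgov_studies_url(page_size: int = 100) -> str:
    params = {"query.term": combined_query(), "pageSize": page_size}
    return "https://clinicaltrials.gov/api/v2/studies?" + urlencode(params)
```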

Post-Process Filter · Quality Gate

A regex-based RCC-relevance filter was applied to drop drug-name false positives. Signals were retained only if the title or body contained explicit RCC terminology (clear cell renal cell carcinoma, renal cell carcinoma, kidney cancer, nephrectomy, von Hippel-Lindau, papillary RCC, chromophobe RCC), or named an RCC-approved therapeutic in unambiguous RCC context (belzutifan, Welireg, cabozantinib, Cabometyx, tivozanib, Fotivda, sunitinib, axitinib, pazopanib). SEC filings from the 20 RCC-focused pharma tickers were retained as the regulatory paper trail of those companies.
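The quality gate can be sketched as three regular expressions in Python. Two parts are assumptions: "unambiguous RCC context" is approximated as a renal term appearing in the same title or body, and the `ticker` field on SEC signals is hypothetical. The ticker set holds only the nine tickers named above, not the full list of 20.

```python
import re

# RCC terminology from the quality-gate description (case-insensitive).
RCC_TERMS = re.compile(
    r"\b(?:clear cell renal cell carcinoma|renal cell carcinoma|kidney cancer|"
    r"nephrectomy|von hippel[- ]lindau|papillary rcc|chromophobe rcc)\b",
    re.IGNORECASE,
)
# RCC-approved therapeutics named in the description (generic and brand names).
RCC_DRUGS = re.compile(
    r"\b(?:belzutifan|welireg|cabozantinib|cabometyx|tivozanib|fotivda|"
    r"sunitinib|axitinib|pazopanib)\b",
    re.IGNORECASE,
)
# Assumption: "unambiguous RCC context" approximated as a renal term in the same text.
RENAL_CONTEXT = re.compile(r"\b(?:renal|kidney|rcc)\b", re.IGNORECASE)
# Partial list: only the tickers named in the source list; the full set has 20.
RCC_TICKERS = {"MRK", "EXEL", "PFE", "BMY", "EISI", "NVS", "BAYRY", "AVEO", "TLX"}


def is_rcc_relevant(signal: dict) -> bool:
    # SEC filings from the RCC-focused tickers are kept as a regulatory paper trail.
    if signal.get("source") == "SEC" and signal.get("ticker") in RCC_TICKERS:
        return True
    text = f"{signal.get('title', '')} {signal.get('body', '')}"
    if RCC_TERMS.search(text):
        return True
    # Drug-name hits alone are the false positives the filter exists to drop.
    return bool(RCC_DRUGS.search(text) and RENAL_CONTEXT.search(text))
```

Under this sketch, a cabozantinib signal about hepatocellular carcinoma is dropped, while a tivozanib signal that mentions renal patients is kept.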

Result

1,032 raw → 421 high-confidence retained (40.8% retention rate).
Source breakdown: ClinicalTrials 35, SEC 40, PubMed 346.
Severity: Critical 2, High 3, Medium 416.
Triple-tagged: every retained signal carries [ccrcc, renal, oncology] so it surfaces from any of the three chip filters on /signals and /atlas.
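The published counts can be cross-checked with a few lines of arithmetic. This Python snippet only restates the figures above and confirms that the retention rate rounds to 40.8% and that both breakdowns sum to the 421 retained signals.

```python
RAW_SIGNALS = 1032
RETAINED = 421
BY_SOURCE = {"ClinicalTrials": 35, "SEC": 40, "PubMed": 346}
BY_SEVERITY = {"Critical": 2, "High": 3, "Medium": 416}

# 421 / 1032 = 0.40794..., which rounds to 40.8%.
retention_pct = round(100 * RETAINED / RAW_SIGNALS, 1)
breakdowns_consistent = (sum(BY_SOURCE.values()) == RETAINED
                         and sum(BY_SEVERITY.values()) == RETAINED)
```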

Why this matters: The verticals were not built the same way. The other six verticals use a wider net (top-12 keywords against general PubMed and ClinicalTrials.gov queries) with no post-process filter; ccRCC uses a narrower, RCC-specific net with a quality-gate filter. That is why the other six verticals report 95%+ retention while ccRCC reports 40.8%. Disclosing this prevents the appearance that all verticals follow identical pipelines.

Reproducibility

The full ingest manifest is at tools/scout-ingest/sources/ccrcc.yml in the project repo. The keymap that drives secondary tagging is at tools/scout-ingest/normalizer.py. The post-process filter logic is in the run log at tools/scout-ingest/runs/.

Corpus Counting · Deduplication Disclosure

Signals tagged in multiple verticals (e.g., an SEC filing classified as both oncology and renal) appear in both indication feeds with the same canonical signal ID. The chip filter on /signals deduplicates these at render time so users see each unique signal once, but raw per-feed counts will sum to more than the deduplicated unique total.
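The render-time behavior described above can be sketched in Python: filter each feed by the active chip, then keep only the first occurrence of each canonical signal ID. The feed layout (a dict mapping feed name to a list of signals) is an assumption; the `id` and `therapeutic_areas` fields match the signal fields listed in the Knowledge Graph section.

```python
from typing import Dict, List, Optional


def raw_count(feeds: Dict[str, List[dict]]) -> int:
    """Per-feed totals summed; a cross-feed duplicate is counted once per feed."""
    return sum(len(signals) for signals in feeds.values())


def dedupe_for_render(feeds: Dict[str, List[dict]], chip: Optional[str] = None) -> List[dict]:
    """Apply the chip filter, then keep each canonical signal ID once."""
    seen = set()
    unique = []
    for signals in feeds.values():
        for signal in signals:
            if chip is not None and chip not in signal.get("therapeutic_areas", []):
                continue
            if signal["id"] not in seen:
                seen.add(signal["id"])
                unique.append(signal)
    return unique


# Illustrative feeds: sig-001 is tagged in both verticals.
shared = {"id": "sig-001", "therapeutic_areas": ["oncology", "renal"]}
feeds = {
    "oncology": [shared, {"id": "sig-002", "therapeutic_areas": ["oncology"]}],
    "renal": [shared, {"id": "sig-003", "therapeutic_areas": ["renal"]}],
}
```

Here `raw_count(feeds)` is 4 while `dedupe_for_render(feeds)` returns 3 signals, the same raw-versus-unique gap that appears in the corpus totals.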

As of 2026-05-08, the corpus counts are:

10,020 signals across the 6 vertical feeds (raw sum, with cross-feed duplicates)
7,643 unique signal IDs after cross-feed deduplication
421 signals in the dedicated ccRCC subset, of which 361 are net-new IDs not present in the 6-vertical pool
8,004 unique signals total across 9 indications after deduplication
270 entities across 6-vertical atlas seeds + 31 RCC-focused entities in the ccRCC seed (some overlap with renal and oncology atlas seeds)

Partners modeling allocations should use the deduplicated unique total (8,004) for capacity claims; the per-feed raw totals (e.g., 1,706 in renal, 1,761 in oncology) are valid for indication-specific scoping.
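The unique total reconciles with the other figures if, as the numbers imply, the 8,004 is the union of the 6-vertical pool and the ccRCC subset. This Python snippet restates that arithmetic and derives two quantities the list does not state directly: how many ccRCC signals were already present in the 6-vertical pool, and how many extra feed appearances the cross-feed duplicates add to the raw sum.

```python
SIX_VERTICAL_RAW = 10_020     # raw per-feed sum, with cross-feed duplicates
SIX_VERTICAL_UNIQUE = 7_643   # after cross-feed deduplication
CCRCC_SUBSET = 421
CCRCC_NET_NEW = 361

total_unique = SIX_VERTICAL_UNIQUE + CCRCC_NET_NEW                 # 8,004
ccrcc_already_pooled = CCRCC_SUBSET - CCRCC_NET_NEW                # 60
extra_feed_appearances = SIX_VERTICAL_RAW - SIX_VERTICAL_UNIQUE    # 2,377
```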

SECTION 09 · ADDED 2026-05-28

Knowledge Graph Architecture · Current State and Roadmap

The AIMN corpus currently operates as a collection of indication-scoped signal feeds linked to entity registries by string matching. This section documents the architecture gap between the current state and the Knowledge Graph target, where every signal is formally linked to a canonical entity record, every entity has navigable verdict history, and every VERDICT carries contributor attribution traceable to primary sources.

Current Architecture (Implemented)

Signal feeds: 9 indication-scoped JSON feeds, 8,004 unique deduplicated signals. Each signal carries: id, source, source_name, date, category, severity, title, sponsor, summary, body, therapeutic_areas[], provenance.
Entity registries: per-indication companies-{indication}.json files (name, ticker, hq, type, tags[], dossier_summary) and a 65-entity ATLAS seed (data/atlas-companies.json: name, lat, lng, type, pipeline, signal_count, phase3).
Signal→Entity link: free-text sponsor field only. No foreign key. String match at render time.
VERDICT: AIMN:VERDICT product UI and tier-access control implemented. The synthesis pipeline is not yet instantiated; verdict content in the current release is methodologically described but computationally deferred.

Knowledge Graph Target (Planned)

Entity manifest: canonical entity registry with stable UUID per entity, deduplicated across all indication files. Prerequisite for all other graph work.
Signal→Entity FK: entity_id written to each signal record via 4-pass resolution (exact name, alias, fuzzy, ticker). Estimated 85%+ auto-resolution; residual flagged for manual review.
Entity→Signal array: signal_ids[] on each entity record, enabling entity-scoped signal navigation from ATLAS and /signals.
Verdict schema: structured Verdict node with signal_ids[], cluster_scores, confidence, recommendation, contributor_ids[], expires_at. Stored in Supabase, accessed via REST (service-role key server-side only).
Contributor registry: canonical contributor records (FDA, NEJM, ClinicalTrials.gov, SEC EDGAR, ASCO, etc.) with credibility scores computed from peer-review status, impact factor, and regulatory authority class. Signal→Contributor and Verdict→Contributor edges built from provenance URL domain matching and source_code normalization.
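A minimal Python sketch of the 4-pass resolution and the inverse signal_ids[] arrays, following the pass order listed above (exact name, alias, fuzzy, ticker). It uses the standard library's `difflib.get_close_matches` for the fuzzy pass. The 0.85 similarity cutoff, the whitespace-and-case normalization, and the entity field names (`entity_id`, `name`, `aliases`, `ticker`) are assumptions, and the two example entities are illustrative.

```python
import difflib
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def normalize(s: str) -> str:
    """Lowercase and collapse whitespace; punctuation is kept for the fuzzy pass."""
    return re.sub(r"\s+", " ", s).strip().lower()


def resolve_sponsor(sponsor: str, entities: List[dict],
                    cutoff: float = 0.85) -> Tuple[Optional[str], str]:
    key = normalize(sponsor)
    for e in entities:                                   # pass 1: exact name
        if normalize(e["name"]) == key:
            return e["entity_id"], "exact"
    for e in entities:                                   # pass 2: alias
        if key in {normalize(a) for a in e.get("aliases", [])}:
            return e["entity_id"], "alias"
    candidates = {normalize(n): e["entity_id"]           # pass 3: fuzzy
                  for e in entities for n in [e["name"], *e.get("aliases", [])]}
    match = difflib.get_close_matches(key, list(candidates), n=1, cutoff=cutoff)
    if match:
        return candidates[match[0]], "fuzzy"
    for e in entities:                                   # pass 4: ticker
        if e.get("ticker") and normalize(e["ticker"]) == key:
            return e["entity_id"], "ticker"
    return None, "manual_review"


def link_signals(signals: List[dict], entities: List[dict]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Write entity_id onto each signal (in place) and build entity -> signal_ids[]."""
    entity_signals: Dict[str, List[str]] = defaultdict(list)
    review: List[str] = []
    for s in signals:
        entity_id, _ = resolve_sponsor(s.get("sponsor", ""), entities)
        s["entity_id"] = entity_id
        if entity_id is None:
            review.append(s["id"])
        else:
            entity_signals[entity_id].append(s["id"])
    return dict(entity_signals), review


ENTITIES = [
    {"entity_id": "ent-exelixis", "name": "Exelixis, Inc.", "aliases": ["Exelixis"], "ticker": "EXEL"},
    {"entity_id": "ent-merck", "name": "Merck & Co., Inc.", "aliases": ["Merck", "MSD"], "ticker": "MRK"},
]
```

Signals that fall through all four passes land in the review list, which corresponds to the residual flagged for manual review.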

Why this matters for the product trust claim: The current platform delivers signal discovery and entity context. The Knowledge Graph layer delivers navigable attribution: "this VERDICT is based on these 23 signals, sourced from FDA (6), NEJM (4), ClinicalTrials.gov (7), and SEC EDGAR (6), with credibility scores attached to each source." That is the difference between a research tool and accountability infrastructure. The target architecture is documented in full in KNOWLEDGE_GRAPH_SCHEMA_PLAN.md, SIGNAL_ENTITY_RELATIONSHIP_MAP.md, VERDICT_ROUTING_SPEC.md, CONTRIBUTOR_CONTEXT_ROUTING_SPEC.md, and ATLAS_SIGNAL_ROUTING_SPEC.md (internal architecture docs, available to institutional partners under NDA).
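A minimal Python sketch of the Verdict node and the provenance-domain attribution behind the example sentence above. The field types (a dict for cluster_scores, a 0 to 1 float for confidence, a timezone-aware datetime for expires_at), the domain-to-contributor map, and the helper names are assumptions. Credibility scoring and Supabase storage are left out because their formulas and table layout are not specified here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional
from urllib.parse import urlparse

# Hypothetical domain map; the planned registry also normalizes source codes.
CONTRIBUTOR_DOMAINS = {
    "fda.gov": "FDA",
    "nejm.org": "NEJM",
    "clinicaltrials.gov": "ClinicalTrials.gov",
    "sec.gov": "SEC EDGAR",
    "asco.org": "ASCO",
}


def contributor_for(url: str) -> Optional[str]:
    """Match a provenance URL's host (or any subdomain of it) to a contributor."""
    host = (urlparse(url).hostname or "").lower()
    for domain, contributor in CONTRIBUTOR_DOMAINS.items():
        if host == domain or host.endswith("." + domain):
            return contributor
    return None


def attribution_counts(provenance_urls: Iterable[str]) -> Counter:
    """Count signals per contributor; unmatched URLs are left out."""
    return Counter(c for c in map(contributor_for, provenance_urls) if c)


@dataclass
class Verdict:
    signal_ids: List[str]
    cluster_scores: Dict[str, float]
    confidence: float
    recommendation: str
    contributor_ids: List[str]
    expires_at: datetime

    def __post_init__(self) -> None:
        if self.recommendation not in {"PROCEED", "DELAY", "KILL"}:
            raise ValueError(f"unknown recommendation {self.recommendation!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be a fraction between 0 and 1")

    def is_expired(self, now: Optional[datetime] = None) -> bool:
        return (now or datetime.now(timezone.utc)) >= self.expires_at
```

Feeding the 23 provenance URLs of a verdict through `attribution_counts` would yield the per-source tallies quoted in the example sentence.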

Implementation Sequence

The single highest-leverage prerequisite is the entity manifest, a canonical registry with stable IDs. Every other Knowledge Graph capability (signal FK resolution, entity→signal navigation, ATLAS hotspot clustering, VERDICT→Contributor attribution) is blocked by this single 4-hour build. Once the manifest exists, the graph assembles in order.
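"Assembles in order" describes a dependency ordering, which Python's standard-library `graphlib` (3.9 and later) can express directly. Only the manifest-first dependency comes from the text; the remaining edges between capabilities are assumptions for illustration.

```python
from graphlib import TopologicalSorter

# Each capability maps to the set of capabilities it depends on.
BUILD_DEPENDENCIES = {
    "signal_fk_resolution": {"entity_manifest"},
    "entity_signal_navigation": {"signal_fk_resolution"},      # assumed edge
    "atlas_hotspot_clustering": {"entity_signal_navigation"},  # assumed edge
    "verdict_contributor_attribution": {"entity_manifest"},
}

build_order = list(TopologicalSorter(BUILD_DEPENDENCIES).static_order())
```

`build_order` always starts with `entity_manifest`, since it is the only capability with no prerequisites; a circular dependency would raise `graphlib.CycleError` instead of producing an order.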

GStack projection: Knowledge Graph score moves from 52 → 85 after full Phase 3 implementation. This is the largest single point-swing available in the platform's architecture. It does not require new data collection; it requires structural linking of the data already present.