DOC TYPE: Methodology Reference
VERSION: 1.2
ISSUED: 2026-Q1
STATUS: Active
ALIGNMENT: ICH E9(R1) · EMA AI/ML · GCP E6(R3)
Regulatory Simulation Methodology
Validation Against Historical FDA Decision Outcomes

This document describes the adversarial validation methodology underpinning the AimwellBio PROCEED / DELAY / KILL verdict system. It covers the four-agent audit architecture, ground truth framework, historical decision corpus, verdict generation thresholds, and alignment with ICH statistical and clinical guidance. This methodology is the operational core of what we call REGTECH — the emerging discipline of applying institutional-grade adversarial validation to AI-generated biopharma science before it enters regulatory, investment, or clinical decisions.

Download Full Methodology (PDF)
methodology-v1.2.pdf  ·  28 pages  ·  Members only — create a free account to download
[Figure: Regulatory calibration corpus, v1.2, May 2026. Four frameworks (ICH E9(R1) Statistical Principles, FDA CDER Guidance for Industry, EMA AI/ML Qualification Opinion, ICH E10 Clinical Trial Design) on a 2018 to 2026 timeline, with quarterly recalibration against 40,000+ historical decisions.]
SECTION 01

Overview: What Adversarial Validation Means

Adversarial validation is not peer review. It is not quality assurance. It is a structured attempt to find the flaw in a piece of AI-generated biopharma science before that flaw is found by the FDA, a trial monitoring committee, or an investment committee during due diligence.

The AimwellBio methodology runs four independent AI agents against every brief submitted for validation. Each agent approaches the brief from the adversarial position of a different institutional challenger: regulatory reviewer, competitive intelligence analyst, clinical trial failure analyst, and capital risk assessor. The four agents do not collaborate during the audit. They produce independent findings, which are then synthesized into a consensus verdict.

The output is a PROCEED, DELAY, or KILL verdict — with a full audit trail showing exactly which agent identified which risk, against which ground truth reference, with what confidence level. The verdict is not a recommendation. It is an institutional accountability document.

Core principle: Every AI-generated biopharma brief contains at least one assumption that has not been validated against historical regulatory outcomes. The methodology exists to find that assumption before it compounds into a decision.

SECTION 02

The Validation Problem in AI-Generated Biopharma Science

AI-generated biopharma intelligence has proliferated faster than the institutional capacity to verify it. Clinical briefs, competitive landscapes, regulatory filing summaries, and trial design recommendations are now routinely generated by large language models and delivered to decision-makers as if they carried the same epistemic weight as data-grounded analysis.

They do not. Large language models optimize for plausibility, not accuracy. A model trained on regulatory literature will generate statistically plausible regulatory arguments. Those arguments will be internally coherent. They will not be grounded in the specific approval history, complete response letter precedents, or endpoint failure rates that a regulatory expert at FDA would reference.

The consequence is not that AI is wrong. It is that AI is selectively wrong in the direction of the outcome the user was hoping for — because the prompts are written by people who have a position. The adversarial agent is the antidote: it is prompted to disprove, not to confirm.

This is the validation problem that REGTECH is designed to solve. It is not a future problem. It is the reason organizations are receiving Complete Response Letters on submissions that AI-generated analysis declared low-risk.

SECTION 03

The Four-Agent Audit Architecture

The validation layer deploys four independent agents in parallel against every brief. Each agent operates from a distinct adversarial position and references a distinct ground truth corpus. The agents are not aware of each other's findings during the audit phase — independence is a design requirement, not a feature.

Agent 01 — Regulatory
FDA Review Simulation

Simulates the regulatory reviewer's perspective. Cross-references the brief against historical FDA decisions for the relevant therapeutic area, clinical endpoint type, and statistical design. Identifies gaps between the brief's regulatory claims and precedent.

Agent 02 — Competitive
Competitive Landscape Displacement

Maps the competitive landscape at the projected approval date, not the brief date. Identifies approved or late-stage competitors the brief does not account for, and quantifies the market access risk of a landscape the sponsor has not planned for.

Agent 03 — Clinical Failure
Trial Failure Pattern Recognition

Audits the clinical assumptions in the brief against historical trial failure modes in the same indication. Flags enrollment assumptions, endpoint definitions, and inclusion/exclusion criteria that match the failure signatures in the historical corpus.

Agent 04 — Capital Risk
Investment Committee Stress Test

Constructs the downside scenarios the brief did not model. Quantifies capital exposure under regulatory delay, competitive displacement, and trial failure scenarios. Produces the IC stress package the brief's sponsors had no incentive to include.

After the independent audit phase, a synthesis layer produces the consensus verdict. Where agents disagree, the disagreement is documented — not suppressed. A split verdict (e.g., Regulatory PROCEED, Clinical DELAY) is a DELAY at the consensus level, with the specific risk source identified in the audit trail.
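The independence requirement and the documented disagreement can be sketched roughly as follows. This is a minimal illustration, not the production audit layer: the agent identifiers, the `AgentResult` record, and the toy rule inside `run_agent` are all hypothetical. The point is only that each agent receives the brief and nothing else, and that a split between agents is recorded rather than suppressed.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical identifiers, derived from the four agent headings above.
AGENTS = ("regulatory", "competitive", "clinical_failure", "capital_risk")


@dataclass(frozen=True)
class AgentResult:
    agent: str
    verdict: str  # "PROCEED", "DELAY", or "KILL"
    note: str


def run_agent(agent: str, brief: str) -> AgentResult:
    """Placeholder audit. A real agent would query its own ground truth corpus."""
    # Toy rule for illustration only (an assumption, not a documented criterion).
    if agent == "clinical_failure" and "single-arm" in brief:
        return AgentResult(agent, "DELAY", "control group choice needs review")
    return AgentResult(agent, "PROCEED", "no material finding")


def audit(brief: str) -> dict:
    # Each agent sees only the brief, never another agent's result.
    with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
        results = list(pool.map(lambda a: run_agent(a, brief), AGENTS))
    verdicts = {r.agent: r.verdict for r in results}
    # A split is kept in the audit trail instead of being averaged away.
    split = len(set(verdicts.values())) > 1
    return {"results": results, "verdicts": verdicts, "split": split}
```

Calling `audit("single-arm phase 2 brief")` in this sketch yields a split result with the clinical agent at DELAY, which corresponds to the split-verdict case described above.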

SECTION 04

Ground Truth Framework

The adversarial agents are only as credible as their reference corpus. The AimwellBio ground truth framework is built from publicly available regulatory and clinical data — not proprietary or confidential sources. This is deliberate: institutional credibility requires verifiability.

  • Historical FDA Decisions: 40,000+ approval decisions, complete response letters, and refuse-to-file actions indexed by therapeutic area, endpoint type, statistical design, and sponsor size.
  • ClinicalTrials.gov Registry: Trial registrations, protocol amendments, trial failures, and discontinuation reasons cross-referenced against the brief's clinical design claims.
  • Published Clinical Trial Outcomes: PubMed-indexed trial results, including negative and null results, structured by indication and endpoint category.
  • Regulatory Guidance Documents: FDA CDER guidance for industry, EMA scientific guidelines, ICH harmonized tripartite guidelines, and WHO technical guidance documents — referenced specifically rather than treated as training background.
  • Publicly Available Regulatory Correspondence: FDA Advisory Committee meeting transcripts, warning letters, and publicly disclosed CRL response documents where available.
  • Competitive Pipeline Data: Drugs@FDA database, EMA EPAR, and ClinicalTrials.gov pipeline registrations for competitive landscape displacement analysis.

Scope boundary: The ground truth corpus does not include proprietary sponsor data, confidential regulatory submissions, or non-public clinical datasets. Validation outputs are based solely on publicly available ground truth. This is a design choice: institutional defensibility requires that the reference corpus be auditable.

SECTION 05

Verdict Generation: PROCEED / DELAY / KILL

The three-verdict framework is designed for institutional decision-making — not analytical nuance. Each verdict has a specific operational meaning that does not require interpretation at the board level.

PROCEED

No material adversarial finding across all four agents. The brief's core claims are consistent with historical regulatory outcomes for analogous assets. Proceed with the decision this brief was prepared to support. Audit trail archived.

DELAY

At least one agent identified a material discrepancy between the brief's claims and historical ground truth. The decision this brief supports should be delayed pending resolution of the identified risk(s). Specific flags documented in audit trail.

KILL

Multiple agents identified material failures, or at least one agent identified a structural flaw that invalidates the brief's core premise. The decision this brief was prepared to support should not proceed in its current form. Full failure analysis in audit trail.
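
One possible reading of the three verdict definitions, expressed as code. The finding levels ("discrepancy", "failure", "structural") are hypothetical labels introduced here to separate the terms the definitions use; the actual thresholds applied by the synthesis layer are not specified in this document, so treat this as a sketch under those assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    agent: str
    level: str  # hypothetical labels: "discrepancy", "failure", "structural"


def consensus_verdict(findings: List[Finding]) -> str:
    """Map material findings to PROCEED / DELAY / KILL (assumed reading)."""
    # One structural flaw that invalidates the core premise is enough for KILL.
    if any(f.level == "structural" for f in findings):
        return "KILL"
    # Material failures reported by more than one agent also produce KILL.
    failing_agents = {f.agent for f in findings if f.level == "failure"}
    if len(failing_agents) >= 2:
        return "KILL"
    # Any remaining material finding from at least one agent produces DELAY.
    if findings:
        return "DELAY"
    # No material finding across all four agents.
    return "PROCEED"
```

For example, `consensus_verdict([Finding("clinical_failure", "discrepancy")])` returns "DELAY", while an empty list returns "PROCEED".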

Verdict confidence is expressed as a percentage. A 94% confidence DELAY means that across the historical corpus of analogous audit scenarios, 94 out of 100 reached the same verdict when the same risk pattern was present. Confidence intervals are included in the full audit trail and calibrated quarterly against new regulatory decision data.
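
The percentage described above is a simple proportion, and an interval can be attached to it. The sketch below uses a Wilson score interval, which is a choice made here for illustration; the document does not state which interval method the audit trail uses, and the function name and parameters are hypothetical.

```python
import math
from typing import Tuple


def verdict_confidence(matching: int, total: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (point estimate, lower bound, upper bound) as fractions.

    matching: analogous scenarios that reached the same verdict
    total:    all analogous scenarios with the same risk pattern
    z:        normal quantile (1.96 corresponds to roughly 95% coverage)
    """
    if total <= 0 or not 0 <= matching <= total:
        raise ValueError("need 0 <= matching <= total and total > 0")
    p = matching / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return p, centre - half, centre + half


# The example from the text: 94 of 100 analogous scenarios reached DELAY.
point, lower, upper = verdict_confidence(94, 100)
```

With only 100 analogous scenarios the interval around 94% is several points wide on each side, which is why the interval belongs in the audit trail next to the headline figure.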

SECTION 06

ICH & Regulatory Alignment

The methodology is aligned with — not certified by — the following regulatory guidance frameworks. Alignment means the validation logic and statistical treatment referenced in each guideline are incorporated into the relevant agent's audit criteria. It does not constitute endorsement by the referenced bodies.

Guideline: Alignment Area
  • ICH E9(R1) · 2020: Statistical methodology for clinical trials; estimand framework for regulatory claims; sensitivity analysis requirements. Agent 01 (Regulatory) references E9(R1) when auditing endpoint definitions and statistical design claims.
  • ICH E10: Choice of control group in clinical trials. Agent 03 (Clinical Failure) cross-references control group selection against E10 guidance when auditing trial design assumptions.
  • ICH E6(R3) · GCP: Good clinical practice standards for trial conduct. Agent 03 references E6(R3) compliance indicators when auditing clinical design assumptions against site feasibility and protocol adherence risk.
  • FDA CDER Guidance for Industry: Therapeutic area-specific guidance documents are referenced by Agent 01 for endpoint acceptability, labeling implications, and approval pathway claims.
  • EMA AI/ML Qualification Opinion: AI-specific regulatory considerations are incorporated into the framework's own documentation standards, ensuring the validation outputs themselves meet emerging AI governance expectations.

SECTION 07

Limitations & Scope

The methodology validates the consistency of AI-generated science against publicly available ground truth. It does not validate proprietary data, clinical data not yet in the public record, or decisions that have no historical regulatory analogue. Novel mechanisms, first-in-class assets, and unprecedented regulatory pathways require human expert judgment. The adversarial agent layer is designed to flag those cases for expert review, not to replace that judgment.

The verdict is an institutional accountability tool, not a regulatory determination. A PROCEED verdict does not mean the FDA will approve. A KILL verdict does not mean the asset is unviable. It means the brief, as presented, failed the adversarial test against the historical ground truth corpus available at the time of audit.

Methodology updates are published quarterly as new regulatory decision data is incorporated. Members receive notification when corpus updates materially affect active verdict confidence scores.

SECTION 08 · ADDED 2026-05-08

ccRCC Build Methodology — Disclosure Note

Clear cell renal cell carcinoma was added as the seventh vertical via a different ingest pipeline than the original six indications. We disclose the method here so partners know exactly what they are looking at and how it differs from the rest of the corpus.

Build Approach

  • Indication-specific keyword set: “clear cell renal cell carcinoma”, “renal cell carcinoma”, “ccRCC”, “kidney cancer”, “von Hippel-Lindau”, “papillary renal cell”, “chromophobe renal cell”, “metastatic renal cell” — combined with the full RCC drug list (belzutifan, cabozantinib, tivozanib, axitinib, lenvatinib, sunitinib, pazopanib, sorafenib, temsirolimus, nivolumab, pembrolizumab, ipilimumab in renal context).
  • Source endpoints used: ClinicalTrials.gov v2 API, PubMed E-utilities (5-year window), SEC EDGAR (20 RCC-focused tickers including MRK, EXEL, PFE, BMY, EISI, NVS, BAYRY, AVEO, TLX), FDA openFDA, WHO ICTRP (returned 404, excluded), and bioRxiv preprints.
  • Raw pull: 1,032 deduplicated signals across the four working sources.
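
As a rough illustration of how a keyword set like the one above can be turned into source queries, the sketch below builds request URLs for the ClinicalTrials.gov v2 studies endpoint and the PubMed E-utilities ESearch endpoint. It only constructs URLs and makes no network call. The keyword subset, page sizes, and function names are assumptions for illustration; the actual query construction is defined in the ingest manifest, not here.

```python
from urllib.parse import urlencode

# Illustrative subset of the indication keyword set listed above.
KEYWORDS = [
    "clear cell renal cell carcinoma",
    "renal cell carcinoma",
    "ccRCC",
    "kidney cancer",
]


def or_expression(keywords):
    """Join quoted keywords into a single OR expression."""
    return " OR ".join(f'"{k}"' for k in keywords)


def clinicaltrials_url(keywords, page_size=100):
    # query.cond searches the condition field; pageSize limits results per page.
    params = {"query.cond": or_expression(keywords), "pageSize": page_size}
    return "https://clinicaltrials.gov/api/v2/studies?" + urlencode(params)


def pubmed_url(keywords, years=5, retmax=200):
    # reldate with datetype=pdat restricts to publications from the last N days.
    params = {
        "db": "pubmed",
        "term": or_expression(keywords),
        "reldate": years * 365,
        "datetype": "pdat",
        "retmax": retmax,
        "retmode": "json",
    }
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + urlencode(params)
```

The 5-year PubMed window from the source list is approximated here as 1,825 days.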

Post-Process Filter — Quality Gate

A regex-based RCC-relevance filter was applied to drop drug-name false positives. Signals were retained only if the title or body contained explicit RCC terminology (clear cell renal cell carcinoma, renal cell carcinoma, kidney cancer, nephrectomy, von Hippel-Lindau, papillary RCC, chromophobe RCC), or named an RCC-approved therapeutic in unambiguous RCC context (belzutifan, Welireg, cabozantinib, Cabometyx, tivozanib, Fotivda, sunitinib, axitinib, pazopanib). SEC filings from the 20 RCC-focused pharma tickers were retained as the regulatory paper trail of those companies.
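
A filter of this kind might look roughly like the sketch below. The term and drug lists come from the paragraph above; the signal fields (`title`, `body`, `source`, `ticker`), the ticker subset, and the rule used to decide "unambiguous RCC context" (a renal term appearing alongside the drug name) are assumptions made for illustration. The logic actually applied is recorded in the run log.

```python
import re

# Explicit RCC terminology from the filter description.
RCC_TERMS = re.compile(
    r"clear cell renal cell carcinoma|renal cell carcinoma|kidney cancer"
    r"|nephrectomy|von hippel-lindau|papillary rcc|chromophobe rcc",
    re.IGNORECASE,
)
# RCC-approved therapeutics and brand names from the filter description.
RCC_DRUGS = re.compile(
    r"\b(belzutifan|welireg|cabozantinib|cabometyx|tivozanib|fotivda"
    r"|sunitinib|axitinib|pazopanib)\b",
    re.IGNORECASE,
)
# Assumed proxy for "unambiguous RCC context": a renal word near the drug name.
RENAL_CONTEXT = re.compile(r"\b(renal|kidney|rcc)\b", re.IGNORECASE)
# Subset of the tracked tickers named in the source list (illustrative).
TRACKED_TICKERS = {"MRK", "EXEL", "PFE", "BMY", "NVS", "AVEO"}


def is_rcc_relevant(signal: dict) -> bool:
    """Return True if the signal passes the RCC quality gate (assumed rules)."""
    text = f"{signal.get('title', '')} {signal.get('body', '')}"
    # SEC filings from tracked tickers are retained as the regulatory paper trail.
    if signal.get("source") == "sec" and signal.get("ticker") in TRACKED_TICKERS:
        return True
    if RCC_TERMS.search(text):
        return True
    # A drug name alone is not enough; it must appear in renal context.
    return bool(RCC_DRUGS.search(text) and RENAL_CONTEXT.search(text))
```

Under these rules a title such as "Cabozantinib in hepatocellular carcinoma" is dropped as a drug-name false positive, while "Belzutifan for advanced renal tumors" is retained.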

Result

  • 1,032 raw → 421 high-confidence retained (40.8% retention rate).
  • Source breakdown: ClinicalTrials.gov 35, SEC EDGAR 40, PubMed 346.
  • Severity: Critical 2, High 3, Medium 416.
  • Triple-tagged: every retained signal carries [ccrcc, renal, oncology] so it surfaces from any of the three chip filters on /signals and /atlas.
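
The reported figures can be checked for internal consistency with a few lines of arithmetic. The numbers below are copied from the list above; the function names and the `tags` field are hypothetical.

```python
REPORTED = {
    "raw": 1032,
    "retained": 421,
    "retention_pct": 40.8,
    "by_source": {"ClinicalTrials": 35, "SEC": 40, "PubMed": 346},
    "by_severity": {"Critical": 2, "High": 3, "Medium": 416},
}
REQUIRED_TAGS = {"ccrcc", "renal", "oncology"}


def retention_rate(raw: int, retained: int) -> float:
    """Retention as a percentage, rounded to one decimal place."""
    return round(100 * retained / raw, 1)


def is_consistent(report: dict) -> bool:
    """Both breakdowns must sum to the retained total, and the rate must match."""
    return (
        sum(report["by_source"].values()) == report["retained"]
        and sum(report["by_severity"].values()) == report["retained"]
        and retention_rate(report["raw"], report["retained"]) == report["retention_pct"]
    )


def is_triple_tagged(signal: dict) -> bool:
    """True if the signal carries all three chip-filter tags."""
    return REQUIRED_TAGS <= set(signal.get("tags", []))
```

Both breakdowns sum to 421, and 421 / 1,032 rounds to 40.8%, so `is_consistent(REPORTED)` holds for the figures as published.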

Why this matters: retention figures are only comparable across verticals when the pipeline behind them is known. The other six verticals use a wider net: top-12 keywords against general PubMed and ClinicalTrials.gov queries, with no post-process filter. ccRCC uses a narrower, RCC-specific net with a quality-gate filter. As a result, the other six verticals report 95%+ retention while ccRCC reports 40.8%. Disclosing this prevents the appearance that all verticals follow identical pipelines.

Reproducibility

The full ingest manifest is at tools/scout-ingest/sources/ccrcc.yml in the project repo. The keymap that drives secondary tagging is at tools/scout-ingest/normalizer.py. The post-process filter logic is in the run log at tools/scout-ingest/runs/.

Corpus counting — deduplication disclosure

Signals tagged in multiple verticals (e.g., an SEC filing classified as both oncology and renal) appear in both indication feeds with the same canonical signal ID. The chip filter on /signals deduplicates these at render time so users see each unique signal once, but raw per-feed counts will sum to more than the deduplicated unique total.

As of 2026-05-08, the live corpus counts, raw and deduplicated, are:

  • 10,020 signals across the 6 vertical feeds (raw sum, with cross-feed duplicates)
  • 7,643 unique signal IDs after cross-feed deduplication
  • 421 signals in the dedicated ccRCC subset, of which 361 are net-new IDs not present in the 6-vertical pool
  • 8,004 unique signals total across 7 verticals after deduplication
  • 270 entities across 6-vertical atlas seeds + 31 RCC-focused entities in the ccRCC seed (some overlap with renal and oncology atlas seeds)

Partners modeling allocations should use the deduplicated unique total (8,004) for capacity claims; the per-feed raw totals (e.g., 1,706 in renal, 1,761 in oncology) are valid for indication-specific scoping.
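
The relationship between raw per-feed counts and the deduplicated total can be sketched with plain set operations on canonical signal IDs. The feed names and IDs below are toy data, and the function name is hypothetical; only the final arithmetic uses the published figures.

```python
from typing import Dict, List


def corpus_counts(feeds: Dict[str, List[str]]) -> Dict[str, int]:
    """Compare the raw per-feed sum with the count of unique canonical IDs."""
    raw_sum = sum(len(ids) for ids in feeds.values())
    unique_ids = set().union(*feeds.values())
    return {
        "raw_sum": raw_sum,
        "unique": len(unique_ids),
        "cross_feed_duplicates": raw_sum - len(unique_ids),
    }


# Toy feeds: "sig-2" is tagged in both renal and oncology.
toy = corpus_counts({
    "renal": ["sig-1", "sig-2"],
    "oncology": ["sig-2", "sig-3", "sig-4"],
})

# Published figures: unique IDs in the 6-vertical pool plus net-new ccRCC IDs.
six_vertical_unique = 7643
ccrcc_net_new = 361
seven_vertical_unique = six_vertical_unique + ccrcc_net_new
```

In the toy data the raw sum is 5 and the unique count is 4. With the published figures, 7,643 + 361 gives the 8,004 unique total, and the remaining 60 of the 421 ccRCC signals are IDs already present in the 6-vertical pool.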

REGTECH · The Future of Regulatory Decision Infrastructure

This is not a feature. It is the direction the industry is moving — and the gap between organizations that validate their AI-generated science and those that do not is widening every quarter.

REGTECH, regulatory technology applied to the validation of AI-generated biopharma intelligence, is more than an emerging category. It is a present requirement that most organizations are ignoring. The FDA is already developing internal capabilities to detect AI-generated regulatory submissions. The question is whether your submissions will survive that scrutiny.

Every organization in this industry is now making regulatory, investment, and clinical decisions informed by AI-generated science. The organizations building adversarial validation into that pipeline now are creating an institutional capability that compounds. The organizations that are not are accumulating unverified assumptions at the same rate — but with no visibility into where the exposure sits.

The AimwellBio methodology is the first published adversarial validation framework built specifically for AI-generated biopharma science. Members have access to the full methodology, the verdict corpus, and the capability to run their own audits. Non-members are making decisions against a brief that no one has challenged.

Member Access

What the organizations already inside this system have access to — that you do not.

The methodology is publicly available. The validation layer — the capability to run your own PROCEED / DELAY / KILL audits against this framework — is members-only. Here is what is already running inside member organizations right now.

Live Adversarial Audit Runs

Submit any AI-generated brief for a real-time four-agent adversarial audit. PROCEED / DELAY / KILL verdict with full audit trail delivered within 3 minutes.

Verdict History & Audit Trail

Every verdict archived with full agent-level finding detail. Present to your investment committee or regulatory team as a documented due diligence artifact.

Regulatory Signal Monitoring

Continuous monitoring of FDA decisions, CRL patterns, and approval precedents in your therapeutic area. Verdict confidence scores updated quarterly with new ground truth data.

Board-Ready Verdict Packages

Pre-formatted audit output structured for investment committee or board presentation. Verdict-first, no interpretation required, full agent source documentation included.

Methodology Corpus Access

Full access to the ground truth corpus: 40,000+ historical FDA decisions, CRL patterns, and trial failure data that power the adversarial agents. Query the corpus directly.

Team Validation Workspace

Multi-seat workspace for regulatory, BD, and clinical teams. Share audit findings across functions. One validation layer, visible to every stakeholder who needs it.

THE ORGANIZATIONS ALREADY USING THIS ARE BUILDING A VALIDATION ADVANTAGE THAT COMPOUNDS.

Every audit run trains your institutional memory for what AI-generated biopharma science gets wrong. The longer you wait, the larger the gap between what they know and what you are guessing at.

Ready to Run an Audit?

The FDA will find the flaw. We find it first.

Submit your first AI-generated brief for adversarial validation. PROCEED / DELAY / KILL verdict in under 3 minutes. Full audit trail included.
