AAV-ML for Experimentalists #4: Where ML Is Being Applied in AAV Engineering + What to Expect

 

TL;DR: ML in AAV engineering goes far beyond capsid design. Here are 11 application areas, ranked by maturity.

Most conversations about ML in AAV focus on one thing: designing better capsids.

But the field is broader than that — and it has been quietly expanding for years. ML is now touching manufacturing, vector genome quality, regulatory elements, receptor identification, and even automating parts of R&D workflows. If you've been tracking only the capsid engineering headlines, you've been seeing a fraction of what's actually happening.

The application map in this post grew out of a list I started in 2018 to track where ML was showing up in the AAV field. What began as a handful of entries, mostly packaging prediction and early tropism work, has grown into 11 distinct application areas as the field matured, diversified, and moved from proof-of-concept to something closer to infrastructure. This post is that map, made accessible.

The goal is practical: if you're an AAV experimentalist, you should know what ML can touch in your workflow, not just capsid design, but your manufacturing bottlenecks, your vector genome quality concerns, your regulatory element choices, your receptor targets. You should also know which of these applications is ready to use today, which requires careful validation before trusting, and which is still more concept than practice. Knowing the difference protects you from hype and helps you invest your effort wisely.

Think of maturity as your risk dial. High maturity means validated by independent groups, in multiple contexts, with realistic benchmarks you can hold vendors and collaborators accountable to. Low maturity means the idea is sound and early results are intriguing, but don't build your program around it yet. Everything in between requires judgment about how closely your situation resembles the published context.

This is what the AIxAAV Interpreter is for: not just reporting what's being done, but helping you understand what it means for your work.


Application 1: Capsid Production Fitness

Can it package?

What it is: Predicting whether a capsid variant will assemble and package DNA.

Why it matters: Production fitness is the first filter. If a variant can't package, nothing else matters. Predicting this early saves synthesis and screening costs.

Key work:

  • Dyno's foundational datasets on the AAV2 28-mer region, scaling from single mutants to 200K+ variants with 12-29 mutations (ASGCT 2018 #360, 2019 #97, #183, 2020 #541)
  • Fit4Function (Eid, Deverman lab, Broad Institute) introduced a critical methodological insight: instead of training on biased existing libraries, we first used ML to generate a library that uniformly samples the manufacturable sequence space. This "production-fit-first" approach ensures that downstream functional screens aren't confounded by packaging failures and that training data is reproducible across experimental batches. The Fit4Function library design became the foundation for accurate multi-trait models (Application 3 below). (ASGCT 2021 #309, 2022 #1201; Nature Communications 2024)
  • Marques/Zolotukhin on a 33AA region achieving 78% accuracy with their own dataset (ASGCT 2020 #159)
  • Sanofi using UniRep embeddings + boosting on Dyno's public single-mutant data (ASGCT 2023 #431)
  • Voyager's transformer-based approach for VR8 peptide insertions (ASGCT 2024 #974)
  • Multiple PLM/LLM approaches including AAV-RoBERTa trained on 400M variants across serotypes (ASGCT 2024 #485) and BioMap's xTrimoAAV (ASGCT 2024 #1469)

What to expect: 2-5x enrichment for viable capsids is realistic. Models trained on sufficient data can predict packaging with high accuracy, even for variants with 10+ mutations from wild-type.
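To make this concrete, here is a minimal sketch of what a sequence-to-packaging classifier looks like: one-hot encoded variants plus gradient boosting in scikit-learn. The sequences and labels below are random stand-ins, and this is not any group's actual pipeline; the models in the work above range from boosted trees on embeddings to transformers.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

AA = "ACDEFGHIKLMNPQRSTVWY"
AA_IDX = {a: i for i, a in enumerate(AA)}

def one_hot(seq):
    """Flatten a peptide into a binary (len(seq) x 20) feature vector."""
    x = np.zeros((len(seq), 20))
    for i, a in enumerate(seq):
        x[i, AA_IDX[a]] = 1.0
    return x.ravel()

# Random stand-ins: 28-mer variants labeled 1 if they packaged in a screen.
rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list(AA), size=28)) for _ in range(2000)]
labels = rng.integers(0, 2, size=2000)  # replace with real packaging calls

X = np.stack([one_hot(s) for s in seqs])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)

clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))

# In practice: rank a candidate library by clf.predict_proba(candidates)[:, 1]
# and synthesize only the top slice -- that filtering is the 2-5x enrichment.
```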

Maturity: Very High. Production fitness prediction has reached a level of consensus rare for any ML application in biology, helped by the fact that the same task is well studied in the larger protein engineering field. The foundational datasets are large and public, multiple independent groups have replicated the core finding, and the approach has held up across diverse ML architectures. What is not yet fully mature: cross-serotype transfer. Most validated models are AAV2-centric (because of Dyno's public data availability). Applying a model trained on AAV2 to predict fitness of an AAV9 variant is an extrapolation, not a validated use case. As PLM-based approaches extend training across serotypes, this is improving, but it is not yet at the same confidence level as within-serotype AAV2 prediction.

What this means for your program: If you're designing a peptide insertion or multi-mutant library, ML-guided fitness filtering is ready to use and should be in your workflow; just make sure you understand how similar or different the training data is to your use cases.


Application 2: Transduction and Tissue Tropism

Will it get to my target tissue?

What it is: Predicting transduction efficiency in specific cell types, tissues, or organs.

Why it matters: This is the end goal for most therapeutic applications.

Key targets being pursued:

  • CNS/BBB: Shape's massive diversity screening in NHP (ASGCT 2022 #130), Dyno's BBB-crossing variants (ASGCT 2023 #382, 2024 #301), Schaffer lab's 7-mer AAV5 achieving 5-fold packaging improvement and 10-fold brain infection success (ASGCT 2023 #107), Deverman lab's BI-hTFR1 capsid engineered to bind human transferrin receptor for brain-wide delivery (Science 2024)
  • Retina: Dyno's snRNA-seq for cell-type specificity (ASGCT 2022 #934, #936, 2024 #516)
  • Muscle/Cardiac: Affinia's ATC-0108 (ASGCT 2024 #605), Dyno's Dyno-bn8 (GATC 2025)

  • Liver detargeting: Consistently pursued as a secondary objective. Fit4Function (Eid, Deverman lab, Broad Institute) showed that models trained only on mouse in vivo and human in vitro data (cell lines) could predict macaque liver biodistribution, one of the most compelling cross-species results in the field (Eid et al., Nature Communications 2024).

What to expect: Enrichment is real but context-dependent. A model trained on mouse CNS may not predict NHP CNS with high precision.
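For orientation, the label these models train on is usually an NGS enrichment score: how much a variant's frequency rises between the injected library and what is recovered from the target tissue. A minimal sketch, assuming simple frequency normalization with a pseudocount; real pipelines add replicate structure, UMIs, and sampling-noise models.

```python
import numpy as np
import pandas as pd

# Read counts per variant before selection (injected library) and after
# (e.g. recovered from target tissue); numbers are illustrative.
df = pd.DataFrame({
    "variant": ["V1", "V2", "V3"],
    "pre":  [1200,  800, 50],
    "post": [  90, 4000, 10],
})

pseudo = 0.5  # pseudocount to stabilize low-count variants
pre_f = (df["pre"] + pseudo) / (df["pre"] + pseudo).sum()
post_f = (df["post"] + pseudo) / (df["post"] + pseudo).sum()
df["log2_enrichment"] = np.log2(post_f / pre_f)

print(df.sort_values("log2_enrichment", ascending=False))
```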

Maturity: High, but heavily context-dependent. Tropism prediction is validated well enough to use but confidence varies enormously by target tissue, species, and how far the model is being asked to extrapolate. What is not mature: predicting in vivo performance from in vitro data, especially for engineered variants. A systematic review of all engineered CNS/ocular capsids tested in NHP (Genentech, 2025) found only 36 studies: NHP validation is sparse relative to the volume of mouse-validated variants. Cell-type specificity within a tissue is also less mature than organ-level tropism. While Fit4Function shows (conditional) transferability across species in liver, no further validations have been published for other organs yet.

What this means for your program: Tropism ML is a real and useful tool for prioritizing candidates before expensive in vivo studies. But validate your model's training context carefully before trusting predictions outside of it. A model trained on mouse liver says very little about human retina.


Application 3: Multi-Trait Capsid Engineering

Can I optimize for packaging AND transduction AND cross-species performance simultaneously?

What it is: Designing capsid libraries that meet multiple criteria at once, not just packaging, but transduction in specific tissues, manufacturability, and cross-species performance.

Why it matters: Real clinical candidates must satisfy multiple constraints simultaneously. Optimizing for one property at a time wastes resources and misses the true target.

Key work:

  • Fit4Function (Eid, Deverman lab, Broad Institute): A generalizable ML approach for systematically engineering multi-trait AAV capsids. By uniformly sampling the manufacturable sequence space and training sequence-to-function models, we combined 6 models to design a multi-trait (cross-species liver-targeted, manufacturable) library. 88% of variants validated on all 6 criteria. Critically, models trained only on mouse in vivo + human in vitro data accurately predicted AAV biodistribution in macaques, the first demonstration of ML-guided cross-species prediction. (ASGCT 2021 #309, 2022 #1201, 2023 #79; Nature Communications 2024)
  • Dyno's multi-property models trained across cells, organs, and species on 800K+ variants (ASGCT 2021 #23)
  • Dyno 2024: CNS targeting + 10x liver detargeting + maintained production (ASGCT 2024 #301)

What to expect: 80-90% success rates on multi-trait criteria when properly designed. Models may be able to predict cross-species performance from mouse + human cell data.
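A minimal sketch of the selection logic, assuming you already have one trained model per trait (the scores below are random placeholders). It also shows why risks compound: a variant must clear every threshold, so per-model errors multiply through the same AND gate.

```python
import numpy as np

# Per-trait predicted probabilities for a candidate library (random
# placeholders); Fit4Function-style designs train one model per trait.
rng = np.random.default_rng(1)
n = 10_000
scores = {
    "production":       rng.random(n),
    "human_hepatocyte": rng.random(n),
    "mouse_liver":      rng.random(n),
    # ... additional traits, one model each
}
thresholds = {"production": 0.7, "human_hepatocyte": 0.6, "mouse_liver": 0.6}

keep = np.ones(n, dtype=bool)
for trait, thr in thresholds.items():
    keep &= scores[trait] >= thr  # a variant must pass every trait model

print(f"{keep.sum()} of {n} variants pass all traits")
# The pass rate shrinks multiplicatively with each trait added;
# so does the compounded error of the combined models.
```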

Maturity: Medium-High. The core methodology is peer-reviewed and the results are strong. Fit4Function demonstrated a very high validation rate across six simultaneous criteria; a 2026 Frontiers review cites this as among the strongest evidence that ML can navigate multi-trait trade-offs simultaneously. What is not mature: robustness when per-trait data is thin. The approach scales with data quality for each individual trait, and multi-trait models are only as reliable as their weakest component. Some traits, particularly in vivo NHP tropism, are data-poor even for the best-resourced groups, and risks compound when models are combined.

What this means for your program: Multi-trait ML engineering is ready for use when you have sufficient training data for each individual trait and when library design is done correctly from the start. It is not a workaround for sparse data (yet). The Fit4Function paper is the best available blueprint.


Application 4: Low-Data Generative Design

What if I only have ~1,000 hits?

What it is: Using generative models to propose novel capsid variants when training data is limited.

Why it matters: Most AAV campaigns don't have hundreds of thousands of training examples. You need approaches that work with realistic data sizes.

Key work:

  • Autoregressive models for receptor targeting (Barry, Eid et al., Deverman lab, ASGCT 2023 #43): Model trained on ~1,200 high-performance binders. Best generated variants matched (or exceeded) the best known binders, expanding the hit pool without expanding the screen.
  • Sinai et al. (Church lab / Dyno, bioRxiv 2021): VAE trained on 564 natural sequences + 22,704 deep mutational scan variants of the AAV2 28-mer. Generated viable multi-mutant capsids via latent interpolation. VAEs can work, but data requirements are higher than they appear.
  • Huang, ..., Eid et al. (Deverman lab, Broad Institute, PLOS Biology 2023): SVAE trained on LY6A/LY6C1 binding data. More sequence-diverse than saturation mutagenesis but lower average hit rate. A direct experimental quantification of the exploration-exploitation tradeoff. Both approaches produced top performers, establishing them as complementary strategies.
  • Sanofi GUAAVA (ASGCT 2023 #431): Generative design with manufacturing-relevance constraints in a pharma context.
  • Architecture matters more than data size: Autoregressive models outperform VAEs and diffusion models at small N. Read more here.

What to expect: With the right architecture, ~1,000 training examples can generate diverse, functional variants that match (or exceed) the best known performers.
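To illustrate the autoregressive idea at toy scale, the sketch below fits a position-specific bigram model (each residue sampled conditioned on the previous one) to ~1,000 "hits" and samples novel neighbors. This is deliberately the simplest possible autoregressive model, not the neural networks used in the work above, and the training sequences here are random stand-ins.

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"
IDX = {a: i for i, a in enumerate(AA)}
L = 7  # 7-mer insertion peptides

def fit_bigram(seqs, alpha=0.1):
    """Position-specific P(x_i | x_{i-1}) with additive smoothing alpha."""
    first = np.full(20, alpha)
    trans = np.full((L - 1, 20, 20), alpha)
    for s in seqs:
        first[IDX[s[0]]] += 1
        for i in range(1, L):
            trans[i - 1, IDX[s[i - 1]], IDX[s[i]]] += 1
    return first / first.sum(), trans / trans.sum(axis=2, keepdims=True)

def sample(first, trans, rng):
    """Draw one peptide left to right, each residue conditioned on the last."""
    seq = [rng.choice(20, p=first)]
    for i in range(L - 1):
        seq.append(rng.choice(20, p=trans[i, seq[-1]]))
    return "".join(AA[j] for j in seq)

rng = np.random.default_rng(0)
hits = ["".join(rng.choice(list(AA), size=L)) for _ in range(1000)]  # stand-in screen hits
first, trans = fit_bigram(hits)
novel = {sample(first, trans, rng) for _ in range(500)} - set(hits)
print(f"{len(novel)} novel candidates sampled around the hit distribution")
```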

Maturity: High. The core result is established, with the caveat that generative diversity doesn't automatically mean better hits. Generative models are now commonly used, in AAV and in the larger protein engineering field, to propose variants that improve on first-round screening hits. What is not mature: evidence for generative design succeeding in whole-capsid engineering, at very large numbers of mutations, or for truly novel functional profiles is limited. Prospective hit rates compared to appropriate baselines are published for only a handful of programs.

What this means for your program: Generative design with limited data is a legitimate strategy when you have ~1,000+ high-quality labeled examples and are trying to expand the hit pool around known functional profiles. Treat generative diversity as a complement to, not a replacement for, targeted saturation mutagenesis around your best hits.


Application 5: Receptor-Guided Design

Can I engineer capsids to bind specific cell surface proteins?

What it is: Designing capsids that form de novo interactions with known receptors to enable targeted delivery.

Why it matters: Receptor-guided design is more rational than blind screening. If you know the receptor, you can engineer specific binding.

Key work:

  • Deverman lab, Broad Institute (BI-hTFR1): Engineered an AAV capsid to bind human Transferrin Receptor 1 (TfR1), enabling brain-wide gene delivery. The capsid was actively transported across human brain endothelial models and achieved widespread CNS transduction in NHP. ML worked behind the scenes. (Science 2024)
  • Deverman lab, Broad Institute (LY6A/LY6C1): Using generative models, engineered peptide-modified capsids that transduce the brain through de novo interactions with mouse BBB proteins. Demonstrated that de novo receptor targeting is achievable. (PLOS Biology 2023)
  • Affinia's receptor-guided approach: In vitro binding to 10-15 CNS candidate receptors combined with generative AI (ASGCT 2024 #982)
  • Voyager: Identification of conserved receptor for BBB-penetrant capsids (ASGCT 2024 #975)
  • WhiteLab Genomics (GEAR consortium): ML on single-cell RNA-seq + protein databases to identify photoreceptor-targeting receptors (ASGCT 2025 #1390)
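A minimal sketch of the triage logic behind the scRNA-seq + protein database approach: rank genes by how specific their expression is to the target cell type, then keep only those annotated as cell-surface proteins. Gene names, numbers, and the surfaceome set below are all illustrative.

```python
import pandas as pd

# Mean expression per retinal cell type (made-up numbers); rows = genes.
expr = pd.DataFrame(
    {"photoreceptor": [9.2, 0.4, 6.5],
     "bipolar":       [0.8, 5.1, 6.0],
     "muller_glia":   [0.3, 4.8, 5.9]},
    index=["GENE_A", "GENE_B", "GENE_C"],
)
surface_proteins = {"GENE_A", "GENE_C"}  # e.g. from a surfaceome annotation

target = "photoreceptor"
others = expr.drop(columns=target)
specificity = expr[target] / (others.max(axis=1) + 0.1)  # ratio with pseudocount

candidates = specificity[specificity.index.isin(surface_proteins)].sort_values(ascending=False)
print(candidates)  # GENE_A ranks first: high on target, near-absent elsewhere
```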

Maturity: Medium. Multiple validated examples for BBB crossing. A 2026 Frontiers review identifies receptor-defined BBB shuttles as among the field's most anticipated near-term advances. The principle of anchoring designs to human-expressed receptors with cross-species orthologs is now widely adopted by multiple groups. What is not mature: validated examples remain concentrated in BBB crossing via a handful of well-characterized receptors. Binding a receptor and achieving functional transduction are not the same thing. Receptor expression varies across species and disease states. Receptor identification itself (scRNA-seq + protein databases, structure-based ML) is in early stages across the board.

What this means for your program: Receptor-guided design is worth building into your capsid engineering strategy, particularly for CNS indications. Use human-expressed receptors with cross-species orthologs to preserve translatability. But don't assume a positive binding result guarantees functional delivery.


Application 6: Manufacturing and Process Optimization

Can I improve yield without changing the capsid?

What it is: Predicting yield from process parameters: plasmid ratios, culture conditions, purification settings.

Why it matters: Higher titers, fewer failed batches, lower cost.

Key work:

  • Kriya: Models predicting purified AAV output from 50L/500L equipment data (ASGCT 2023 #1302)
  • Apertura: Plasmid ratio optimization achieving higher titer and fewer empty capsids with fewer runs (ASGCT 2025 #378)
  • Asimov: Transient AAV manufacturability optimization (ASGCT 2024 #548, 2025 #956)
  • UNC: Yield prediction identifying critical parameters: volume, wash conductivity, elution flowrate, pH (ASGCT 2025 #192)
  • WhiteLab + Cytiva: Stable cell line development optimization, claiming up to 70% reduction in development timelines
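A minimal sketch of how yield-driver identification typically works: fit a nonlinear regressor to batch records, then use permutation importance to rank which process parameters actually move yield. The data below is synthetic by construction (yield depends on two of the four parameters); the parameter names echo the UNC abstract, but the model and numbers are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic batch records; in reality these come from 50L/500L runs.
rng = np.random.default_rng(2)
n = 120
X = pd.DataFrame({
    "plasmid_ratio":     rng.uniform(0.5, 2.0, n),
    "wash_conductivity": rng.uniform(5, 25, n),
    "elution_pH":        rng.uniform(6.5, 8.5, n),
    "harvest_day":       rng.integers(2, 6, n),
})
# Yield depends (by construction) on two parameters plus noise.
y = 2.0 * X["plasmid_ratio"] - 0.1 * (X["wash_conductivity"] - 15) ** 2 + rng.normal(0, 0.5, n)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=20, random_state=0)
for name, score in sorted(zip(X.columns, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name:20s} {score:.3f}")
```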

Maturity: High. Growing rapidly with validated results. Multiple groups have demonstrated that ML can identify which process parameters most strongly drive yield and compress the experimental runs needed for optimization. All work points in the same direction: data-driven approaches outperform one-factor-at-a-time experimentation for complex manufacturing processes. What is not mature: process-specificity is the central limitation. Models trained on one production platform do not transfer to another. Most published results are single-group internal studies, and manufacturing data is largely proprietary; the field cannot build on shared benchmarks the way capsid ML has built on public datasets. Independent cross-platform validation is essentially absent.

What this means for your program: If you're running an internal optimization campaign within a defined production system, ML-guided DoE is worth exploring. Don't expect published models to transfer to your platform; treat this as tool-assisted process development, not off-the-shelf ML deployment.


Application 7: Vector Genome Quality

Can I avoid truncations and packaging artifacts?

What it is: Predicting truncation hotspots, designing stuffer sequences, optimizing codons.

Why it matters: Truncated genomes are non-functional vectors with unknown consequences.

Key work:

  • UMass: CNN+LSTM model for truncation prediction achieving 98% classification accuracy (ASGCT 2024 #433, 2025 #492)
  • Stanford/FormBio: Stuffer sequence optimization (ASGCT 2025 #1384)
  • UMass: Codon optimization for enhanced transgene expression (ASGCT 2025 #494)
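For readers who want to see the model class, here is a minimal PyTorch sketch of a CNN+LSTM sequence classifier of the kind the UMass abstract describes: convolutions pick up local sequence features, the LSTM integrates them across the window, and a linear head emits a truncation-risk logit. All hyperparameters are illustrative; this is not the published architecture.

```python
import torch
import torch.nn as nn

class TruncationClassifier(nn.Module):
    """Conv layer for local sequence features -> BiLSTM -> truncation-risk logit."""
    def __init__(self, n_channels=4, conv_dim=64, lstm_dim=64):
        super().__init__()
        self.conv = nn.Conv1d(n_channels, conv_dim, kernel_size=7, padding=3)
        self.lstm = nn.LSTM(conv_dim, lstm_dim, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * lstm_dim, 1)

    def forward(self, x):                   # x: (batch, 4, window_len) one-hot DNA
        h = torch.relu(self.conv(x))        # (batch, conv_dim, window_len)
        h = h.transpose(1, 2)               # (batch, window_len, conv_dim)
        out, _ = self.lstm(h)               # (batch, window_len, 2*lstm_dim)
        return self.head(out.mean(dim=1)).squeeze(-1)  # pool, then classify

model = TruncationClassifier()
x = torch.randn(8, 4, 200)  # stand-in for 8 one-hot 200-nt genome windows
print(model(x).shape)       # torch.Size([8]) -- one logit per window
```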

Maturity: Emerging-Promising. The biological problem is well-defined and the early ML results are promising. The rationale in the UMass study is sound: truncations have direct regulatory implications, and predictable sequence features are a legitimate target. What is not mature: high accuracy on a training/test split is not the same as prospective generalization. The key validation questions have not yet been publicly answered: how diverse was the training data, and how closely does it resemble real-case scenarios? Does the model generalize across transgenes and ITR configurations? Has it been replicated independently? Stuffer sequence and codon optimization ML are in the same position: promising early signals, no independent replication yet.

What this means for your program: Watch this space actively. Vector genome quality ML is likely to mature quickly because the assays are clean and the industry need is acute. But validate carefully before using these tools to inform regulatory decisions.


Application 8: Regulatory Elements

Can I design better promoters or enhancers?

What it is: ML-designed tissue-specific promoters and cell-type-specific enhancers.

Why it matters: Delivery is only half the equation. Specificity in expression reduces off-target effects.

Key work:

  • Early work on synthetic promoters for muscle and liver (ASGCT 2020 #1003)
  • Asimov: Tissue-specific promoter design (ASGCT 2023 #172)
  • CMU: CNN for cell-specific enhancer targeting (ASGCT 2024 #475)
  • UCLA: Skeletal muscle enhancer-trained promoter design (ASGCT 2025 #1061)
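One technique worth knowing from this space is in-silico saturation mutagenesis: score every single-base change of a candidate element with a trained model and see which positions matter. The sketch below uses a random linear "model" purely as a stand-in so the code runs; in practice you would plug in a trained CNN like those cited above.

```python
import torch

BASES = "ACGT"
SEQ = "ACGTAGCTAGGCTATTAGCCGGAATCGA"  # illustrative candidate element

def one_hot(seq):
    x = torch.zeros(4, len(seq))
    for i, b in enumerate(seq):
        x[BASES.index(b), i] = 1.0
    return x

torch.manual_seed(0)
W = torch.randn(4, len(SEQ))                     # stand-in for a trained model
score = lambda seq: (one_hot(seq) * W).sum().item()

ref = score(SEQ)
effects = {}
for i in range(len(SEQ)):
    for b in BASES:
        if b != SEQ[i]:
            mutant = SEQ[:i] + b + SEQ[i + 1:]
            effects[(i, SEQ[i] + ">" + b)] = score(mutant) - ref

# Positions whose mutations shift the score most are the model's putative motifs.
top = sorted(effects.items(), key=lambda kv: -abs(kv[1]))[:5]
for (pos, change), delta in top:
    print(f"position {pos} {change}: {delta:+.2f}")
```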

Maturity: Medium. The conceptual foundation is well-established from the broader synthetic biology field, and AAV-specific examples across liver, muscle, and CNS represent genuine experimental validation of ML-designed elements. The approach is most reliable for well-studied tissues where training data is richest. What is not mature: regulatory element function is exquisitely context-dependent in ways sequence models don't fully capture; the same promoter can behave differently depending on transgene, capsid, route, species, and disease state. Cell-type specificity at the resolution of, say, excitatory vs. inhibitory neurons requires high-resolution training data that most groups don't have.

What this means for your program: ML-designed regulatory elements are worth including in your design toolkit, particularly for liver and muscle. Treat cell-type-specific designs as experimental hypotheses that need rigorous in vivo validation.


Application 9: Structure-Guided Approaches

Can AlphaFold help?

What it is: Using predicted structures for validation, receptor docking, or design guidance.

Key work:

  • Kriya: AlphaFold + molecular docking for massively parallel in silico screening (ASGCT 2024 #957)
  • Duke: AlphaFold3 validating synthetic MAAPs' membrane binding domain (ASGCT 2025 #60)
  • UTSW: Rosetta + AlphaFold confirming capsid hits (ASGCT 2025 #1411)
  • NCH: AF2 validating miniaturized transgene designs (ASGCT 2025 #1550)
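A small, practical example of the validation use case: AlphaFold writes its per-residue confidence (pLDDT) into the B-factor column of its PDB output, so you can sanity-check confidence over exactly the region you engineered. The filename and residue range below are placeholders; requires Biopython.

```python
from Bio.PDB import PDBParser

# AlphaFold writes per-residue pLDDT into the B-factor column of its PDB output.
parser = PDBParser(QUIET=True)
structure = parser.get_structure("vp3", "predicted_vp3.pdb")  # hypothetical filename

insertion_region = range(587, 595)  # illustrative residue numbers for an insertion site
plddts = [
    res["CA"].get_bfactor()
    for res in structure[0].get_residues()
    if res.id[1] in insertion_region and "CA" in res
]
print(f"mean pLDDT over insertion region: {sum(plddts) / len(plddts):.1f}")
# Low confidence exactly where you inserted a peptide is a red flag for
# trusting any downstream docking built on that model.
```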

Maturity: Emerging-Medium. Using predicted structures for validation is well-established and widely adopted. AlphaFold-based analysis provides a mechanistically interpretable sanity check on sequence-derived hits, and this use case is legitimate across multiple groups. What is not mature: de novo structure-guided capsid design. Predicted monomer structures are not experimental structures, and for a 60-mer icosahedral assembly with conformational dynamics, the gap between predicted structure and functionally relevant assembly is non-trivial. Docking simulations generate hypotheses; prospective experimental validation of structure-guided designs remains limited.

What this means for your program: Use structural analysis for mechanistic interpretation and validation of sequence-based hits. Treat in silico docking predictions as hypothesis generation, not experimental replacement.


Application 10: Immune-Related Prediction

Can I predict or escape immune responses?

What it is: Two related challenges at opposite ends of the treatment timeline:

  1. Pre-existing immunity (immune evasion): Designing capsids that evade neutralizing antibodies patients already have against natural AAV serotypes; an eligibility barrier.
  2. Treatment-induced immunogenicity: Predicting whether the capsid or transgene will trigger new immune responses after administration; a durability and safety concern.

Why it matters: Pre-existing NAbs exclude a significant fraction of patients from AAV therapies. Treatment-induced responses can eliminate transduced cells, reduce durability, or cause safety events. Both limit who can benefit and for how long.


Key Work — Pre-Existing Immunity:

  • Dyno's deep diversification generating capsids "distinct from any natural AAV" with the rationale of potentially removing epitopes (ASGCT 2020 #541). Conceptually interesting, but immune escape was not directly validated
  • Voyager's TRACER capsids claimed to escape pre-existing neutralizing antibodies (ASGCT 2024 #973). One of the few direct claims, awaiting broader validation

Key Work — Treatment-Induced Immunogenicity:

  • Ultragenyx modeling B-cell epitopes and transgene-HLA interactions to predict immunogenic risk (ASGCT 2025 #1747). Early framework for risk stratification

The Hard Problems — For Immune Evasion:

  • Escaping antibodies requires sequences far from natural serotypes; exactly where ML models are most uncertain
  • PLMs trained on natural proteins may bias against the diverse, unusual sequences you need (a calibration challenge)
  • Removing one epitope may create new ones!
  • Sequence changes for immune escape may compromise packaging or transduction

The Hard Problems — For Immunogenicity:

  • Immune responses are complex and patient-specific
  • HLA diversity across populations adds another layer of unpredictability
  • Longitudinal validation data is scarce

What to expect: Risk flagging and hypothesis generation, not reliable design tools. Claims should be viewed skeptically without rigorous validation against diverse human sera or longitudinal immunogenicity data.


Maturity: Speculative / Early-Emerging. Conceptually important, but both areas have thin experimental validation and a long road ahead. The conceptual frameworks for both immune evasion and immunogenicity prediction are well-articulated. The Ultragenyx B-cell epitope / HLA interaction work (ASGCT 2025 #1747) is a legitimate early attempt at risk stratification, and Voyager's TRACER immune evasion claims are among the few direct experimental assertions in this space. What is not mature: immune responses are patient-specific, HLA-diverse, and shaped by exposure history in ways sequence models cannot capture alone. Designing for immune evasion requires sequences far from natural serotypes; exactly where all current models are most uncertain and PLM biases are most likely to mislead. Critically, the field has a pattern of claiming immune evasion based on "distinctness from natural serotypes" without measuring neutralization against a diverse panel of human sera. Conceptual distinctness is not experimental immune evasion.
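To make that critique concrete: "distinctness" usually means something as simple as low sequence identity to natural serotypes, which is trivially computable and proves nothing about neutralization. A sketch follows; the sequences are made-up placeholders, not real capsid regions.

```python
# Quantify "distinctness from natural serotypes": a weak proxy that is
# sometimes cited in place of measured neutralization. Sequences below are
# made-up placeholders, not real capsid regions.
natural = {
    "serotype_A_region": "NGSGQNQQTLKFSVAGPSNMAVQGRNY",
    "serotype_B_region": "SAQAQAQTGWVQNQGILPGMVWQDRDV",
}
variant = "NGSGQNQATLKFSVAGPSNMAVQGRNY"

def pct_identity(a, b):
    matches = sum(x == y for x, y in zip(a, b))
    return 100 * matches / max(len(a), len(b))

for name, seq in natural.items():
    print(f"{name}: {pct_identity(variant, seq):.0f}% identity to variant")
# Low maximum identity says nothing about neutralization by human sera;
# only a serology panel can support an immune-evasion claim.
```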

What this means for your program: These tools are worth following closely because the clinical stakes are high. But right now they are risk-flagging frameworks, not validated design tools. Any claim of "immune evasion" or "reduced immunogenicity" should be challenged with: validated against what and in whom?


Application 11: Agentic AI and Workflow Automation

Can AI run parts of R&D for me?

What it is: AI agents that automate experimental design, data analysis, literature review, and decision-making in R&D workflows.

Why it matters: This addresses the "last mile" problem: connecting AI predictions to actual lab decisions.


Key work:
1. Dyno (GATC November 2025): Launched an expanded AI agent platform with three specialized agents:

  • Parser agent: An autonomous agent that carries both data and decisions from raw lab experiments into models and product discovery
  • Knowledge agent: Distills information across documents and data systems into executive summaries, allowing users to oversee the full history of gene therapy products
  • Structure agent (p0): Streamlines reasoning around protein structures, receptor targets, and payload engineering

Beta testers are currently being recruited.


2. Dyno + NVIDIA Psi-Phi (GTC March 2026): Launched an agentic AI suite for protein binder design:

  • Dyno Psi-1: First open-weight model from the Psi protein design family; flow-matching backbone generative model optimized for complex multimeric interfaces; prioritizes structural diversity and controllability
  • Dyno Phi agents: Experimentally-grounded filtering methods that connect in silico benchmarks to real-world outcomes
  • Psi-Phi Claude Code Skills: One-click extension providing authenticated API access to the same GPU-backed infrastructure directly within Claude Code for conversational protein design
Built on NVIDIA's La-Proteina family of models and trained on DGX Cloud using Hopper GPUs. The platform aims to make protein binder design more consistent and scalable, directly addressing the disconnect between computational optimization and experimental validation.


3. WhiteLab Genomics (ASGCT 2025 #1790): OLIVIA: a tool combining Large Language Models + Retrieval-Augmented Generation + knowledge graphs to structure and analyze heterogeneous lentiviral vector bioproduction data. Supports decision-making on production protocols.


Maturity: Very early / Emerging-Promising. The first explicitly AAV-focused agentic platforms are now live. What is not mature: there are no published prospective evaluations of agentic AI in AAV programs. There is no benchmark for error rates, decision quality, or hit rates from agent-assisted campaigns versus controls. This is expected for first-generation tools, but it means the maturity assessment must be honest. The broader drug discovery field is in a wave of agentic AI deployment that is outrunning validation, and AAV is no exception. I personally tend to think of agentic AI as 'software engineering' rather than 'machine learning', so I expect it to mature quickly under proper software testing and validation practice.

What this means for your program: Pay close attention to how these tools are positioned: augmenting scientist judgment or replacing it? The current framing is augmentation, which is right. Evaluate them on specific, measurable tasks within your workflow rather than adopting wholesale. The field will look very different in 2–3 years.


The Key Questions for Any Application

Every application in this post, mature or emerging, should be evaluated through the same lens before you act on it. These are the questions I ask:

1. Was this validated experimentally, or only computationally? Computational validation is a hypothesis. Experimental validation in a relevant system is evidence.

2. In what context? Serotype, species, cell type, assay, delivery route. The context is the result.

3. How similar is that context to mine? The closer the match, the more you can borrow. The further, the more you need to validate.

4. What's the realistic enrichment? 2-5x is meaningful and worth building on. 100x claims warrant skepticism; ask for the baseline.

5. Who validated it, and has anyone else replicated it? Internal single-group data is a starting point. Independent replication is evidence. Both are different from marketing.

These questions don't change as the field evolves. What changes is how much evidence is available to answer them.


Final Thought

ML is expanding across the AAV pipeline fast. The 11 applications in this post didn't all arrive at once. Some, like production fitness prediction, have been building since 2018 and are now mature enough to treat as standard practice. Others appeared only in the last two years and are still finding their footing. That arc matters, because it tells you something useful: the applications that feel speculative today are often on the same trajectory as the ones you now take for granted, but you cannot take them for granted now. 

Maturity is not a fixed label. It's a snapshot of where the evidence stands right now, in specific contexts, tested by specific groups. The right habit is not to memorize this map, but to internalize the questions it implies: What problem is ML actually solving here? In what context was it validated? How similar is that to mine?

Those questions apply whether you're evaluating a vendor pitch, a conference abstract, or a paper in press. And they don't expire — because the field won't stop moving.

Next in the series: Evaluating What You Hear — how to assess ML claims in AAV without being an ML expert.


PS: This is what The AIxAAV Interpreter is for: translating ML methods into actionable AAV engineering strategies. Follow me on LinkedIn for more practical insights that accelerate bio-innovation.
