Machine Learning in AAV Manufacturing: What's Real, What's Not, and How to Tell

What the published evidence actually supports, and what's being sold ahead of it.

By Fatma Elzahraa Eid, PhD | TheBioMLClinic

Brief disclosure: I am an independent AAV-ML scientist with two decades of experience in machine learning and biological problems, and I provide vendor evaluation and technical due diligence services through TheBioMLClinic, which means I have a direct professional interest in the topic of this post. The vendor archetypes described are illustrative composites drawn from public market activity, not characterizations of any specific company. This post does not constitute procurement, investment, business, or regulatory advice. Full disclosures at the bottom.

Sit through enough pitch meetings about AI for AAV manufacturing and you start to notice a pattern. The slides are clean. The founders' pedigrees are real. The funding is increasingly serious. And the numbers, when there are numbers, sit at very different tiers of evidence than the confidence of the delivery suggests.

This post is for the executive who has been pitched twice this quarter on "AI-driven AAV manufacturing" and is trying to figure out which pitch deserves a follow-up technical meeting and which one is closer to vaporware. It is also for the AAV scientist who wants to understand the landscape of what AI can and cannot do now for AAV manufacturing, or whose CEO has asked them to evaluate one of these companies, and who needs an organized way to think about what is published, what is plausible, and what is hype.

The current state of the field is more interesting than either camp tends to admit. There is real, peer-reviewed machine learning (ML) work delivering meaningful results in AAV manufacturing. There is also a layer of pitch content moving faster than the evidence supports. The work of an honest executive read is separating the two, and the work this post tries to do is give you the framework to do it.

I'll lead with the strongest published result the field has produced, because it anchors everything else.

The benchmark to anchor everything against

In May 2025, a collaboration between groups at the University of North Carolina at Chapel Hill and North Carolina State University posted a preprint (with code made public), now published in Biotechnology and Bioengineering (2026, doi:10.1002/bit.70159), that demonstrated closed-loop Bayesian optimization of AAV affinity chromatography. I covered this preprint last year.

Across three clinically relevant serotypes (AAV2, AAV5, and AAV9), the framework drove yields from a baseline of 70 percent to between 95 and 99 percent. It used roughly 30 experiments per serotype to find optimal conditions in a parameter space containing more than 930,000 unique combinations. For AAV2, the affinity capture step alone reduced host cell protein content from 223 μg/mL to under 2.5 μg/mL. The paper's reported reduction is up to 200-fold.

This is, to my knowledge, the most rigorous AAV-specific machine learning result published in the manufacturing literature to date. The methodology is sound. The serotype transfer is real; data from the AAV2 campaign was used to seed the AAV9 campaign, which then reached optimum in two cycles instead of three. The conflict of interest, that one senior author is the chief technology officer of the resin manufacturer, is disclosed openly in the paper. And, importantly, the limitations are stated honestly.

That last point matters more than it might appear. The same paper notes that for AAV9, even after Bayesian-optimized affinity capture, residual host cell protein at a Zolgensma-equivalent clinical dose of approximately 1.1 × 10¹⁴ viral particles was 2.1 μg per dose, still above the limit acceptable for in vivo administration. A polishing step, typically anion exchange chromatography, is standard practice in every conventional AAV process, and the affinity capture step is not designed to clear residual HCPs to clinical limits on its own. That is not a critique of the paper. The point is sharper than that. ML optimization of one unit operation, even one done with this level of methodological care, does not eliminate the need for the other steps in the process, and any pitch claiming that ML alone has solved AAV downstream manufacturing should be evaluated against that fact.

If you remember nothing else from this post, remember this distinction. It is the single most useful lens you can bring to any AAV manufacturing AI pitch you encounter. The category where ML has delivered is closed-loop optimization of a defined unit operation in a defined facility, when the labels are clean and the search space is constrained. The category where ML is still mostly being talked about is cross-platform, cross-serotype, cross-facility, end-to-end manufacturing intelligence. Pitches sometimes blur the two. The diligence work is keeping them separate.

Why this field is structurally harder for ML than it looks

Before walking the workflow, it helps to understand why AAV manufacturing data has properties that any machine learning approach has to contend with. None of these are solved by better algorithms.

Inter-laboratory titer variation in AAV has been documented at one to two orders of magnitude, driven primarily by reference standard differences and quantification method choice. qPCR titering can show inter-replicate differences of approximately twofold, while droplet digital PCR (ddPCR) brings the coefficient of variation to under 10 percent. A meaningful fraction of legacy training data was, however, generated by qPCR. Models trained on convolved data inherit the convolution.

The labels themselves are often more complex than they appear. "Transduction efficiency" packages binding, uptake, intracellular trafficking, uncoating, and transgene expression into a single number. A model predicting transduction efficiency is predicting the convolution of five biological processes, each with its own variance. The same is true for many manufacturing CQAs. Genome integrity, for instance, depends on packaging fidelity, reverse-packaging artifacts, and ITR truncation patterns; a single ddPCR readout cannot disaggregate these.
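A rough way to see how quickly this compounds is the sketch below. It assumes the five steps are statistically independent and assigns each a hypothetical 20 percent coefficient of variation; neither assumption comes from measured AAV data. Under independence, the quantity 1 + CV² multiplies across the steps, so the variability of the packaged label grows with every process folded into it.

```python
import math

def combined_cv(step_cvs):
    """CV of a product of independent factors: (1 + CV^2) multiplies across steps."""
    one_plus_cv2 = math.prod(1.0 + cv ** 2 for cv in step_cvs)
    return math.sqrt(one_plus_cv2 - 1.0)

# Hypothetical per-step variability, chosen only for illustration.
step_cvs = {
    "binding": 0.20,
    "uptake": 0.20,
    "trafficking": 0.20,
    "uncoating": 0.20,
    "expression": 0.20,
}

total = combined_cv(step_cvs.values())
print(f"combined CV of the single-number label: {total:.2f}")  # about 0.47
```

Five modestly noisy steps produce a label more than twice as variable as any one of them, which is the ceiling a model trained on that label has to work under.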

The dataset size problem is harder still. A typical Phase 1 process produces only tens of engineering runs in total, while modern ML, in its dominant paradigm, expects datasets one to four orders of magnitude larger. The methods that work in this regime are not the methods that win on AI for images, text, or code. They are Gaussian processes, Bayesian optimization, hybrid mechanistic-ML approaches, and active learning frameworks designed for small-N regimes. When a pitch describes deep learning trained on customer manufacturing data, the right question is not whether the model is sophisticated. The right question is whether the data could possibly support it.

These constraints are real. They are also not the field's permanent ceiling. They are the reason why the most credible work in AAV manufacturing ML so far has been at the unit-operation level, with explicit data-efficiency methodology, on bounded problems. They are also the reason why "end-to-end AI for AAV manufacturing" remains aspirational.

[1] Where ML has actually delivered: closed-loop process optimization

The downstream chromatography work I led with is not a single isolated result. It is the leading edge of a broader category that has been quietly maturing for several years. Mechanistic chromatography modeling, hybrid physics-based deep learning, and Bayesian optimization frameworks have produced documented improvements across protein purification adjacent to AAV, including LSTM-based predictive monitoring of affinity column integrity in monoclonal antibody manufacturing.

What makes the recent AAV-specific work credible is its discipline. It was constrained to one resin chemistry. It used historical data from seven serotypes to train the surrogate model, then validated on three. It used SHAP (Shapley additive explanations) analysis to identify which input parameters mattered most, finding that large feed volumes, high flow rates during wash and elution, and physiological wash buffer conditions were the dominant drivers. That last detail is more useful than it might seem. It is a specific, testable, mechanistic finding, not a generic claim about ML capability.

This is what real ML for AAV manufacturing looks like when it works. A bounded problem. A surrogate model with explicit uncertainty quantification (Gaussian processes earn their keep here because they tell you when they don't know). A defined acquisition function. A small number of experiments. Honest reporting of limitations.
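To make those ingredients concrete, here is a minimal pure-Python sketch of a closed loop: a Gaussian process surrogate, an upper-confidence-bound (UCB) acquisition function, and a budget of ten runs. It is not the method or code from the published chromatography work. The objective `run_experiment`, the single normalized parameter, the kernel settings, and the choice of UCB are all illustrative assumptions.

```python
import math

def kernel(a, b, amp=0.1, length=0.15):
    """Squared-exponential covariance between two scalar settings."""
    return amp ** 2 * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(xs, ys, x_new, noise=1e-6):
    """Posterior mean and standard deviation at x_new given the runs so far."""
    mean_y = sum(ys) / len(ys)
    K = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, [y - mean_y for y in ys]))
    k_star = [kernel(x_new, a) for a in xs]
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = kernel(x_new, x_new) - sum(t * t for t in v)
    return mu, math.sqrt(max(var, 0.0))

def ucb(x, xs, ys, kappa=2.0):
    """Acquisition: predicted yield plus a bonus where the model is uncertain."""
    mu, sd = gp_predict(xs, ys, x)
    return mu + kappa * sd

def run_experiment(x):
    """Hypothetical stand-in for one wet-lab run (fractional step yield)."""
    return 0.70 + 0.28 * math.exp(-((x - 0.63) / 0.15) ** 2)

candidates = [i / 20 for i in range(21)]  # bounded, discretized search space
tried = [2, 10, 18]                       # three seed runs
xs = [candidates[i] for i in tried]
ys = [run_experiment(x) for x in xs]

for _ in range(7):                        # seven model-guided runs
    untried = [i for i in range(len(candidates)) if i not in tried]
    nxt = max(untried, key=lambda i: ucb(candidates[i], xs, ys))
    tried.append(nxt)
    xs.append(candidates[nxt])
    ys.append(run_experiment(candidates[nxt]))

best_y = max(ys)
best_x = xs[ys.index(best_y)]
print(f"best setting {best_x:.2f} with yield {best_y:.3f} after {len(ys)} runs")
```

The standard deviation returned by `gp_predict` is what lets the loop spend its small experimental budget on regions the model has not yet seen. A surrogate that only returned a point estimate would have no principled way to do that.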

It is also worth saying clearly: at the time of writing, the only peer-reviewed AAV-specific ML result with public data, public methods, and explicit per-serotype performance numbers is in this category. Everything else is at a lower tier of evidence.

[2] Producer cell engineering: where pattern recognition meets the limits of pitch language

Some of the most-cited AAV titer improvements in the literature come from genetic engineering of producer cells, not from ML. A 2021 genome-wide CRISPRa screen in HEK293T cells, using over 70,000 sgRNAs against more than 23,000 human genes, identified that combined overexpression of SKA2 and ITPRIP produced up to 3.8-fold improvement in AAV2 manufacturing capacity, with 3.5-fold for AAV6 and smaller gains for AAV5 and AAV9 (DOI: 10.1016/j.omtn.2021.06.026). Subsequent screen-based work, published in peer-reviewed papers and ASGCT abstracts across 2023 and 2024 from multiple groups, has identified additional host factors including GPR108, TM9SF2, KIAA0319L, and MON2.

This work is real and useful. It is also, mostly, not machine learning in the modeling sense. Screen-based hit-calling against pooled CRISPR libraries uses statistical tools (MAGeCK-style ranking, false discovery rate control) rather than predictive models. The distinction matters when evaluating a pitch. "AI-designed producer cells" can mean two different things. It can mean we ran a CRISPR screen and analyzed the data with standard tools that carry some AI flavor, which is solid science. Or it can mean we built a predictive model that designs new cell line edits, which is a much stronger claim that has not been publicly demonstrated for AAV at the time of writing. The field is genuinely moving from pure hit-calling toward integrating transcriptomic, proteomic, and single-cell data with screen output to build more predictive models of producer cell state, but the most ambitious work in this direction has not yet entered the peer-reviewed literature with headline numbers. The trajectory is real. The current evidence is still thin.
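For readers who have not seen what "statistical tools" means in practice, the sketch below shows the false discovery rate step in isolation, using the Benjamini-Hochberg procedure. It is not MAGeCK itself, which also handles guide-level ranking and aggregation; the gene names and p-values are hypothetical.

```python
def benjamini_hochberg(p_by_gene, alpha=0.05):
    """Return genes called as hits at false discovery rate alpha (step-up rule)."""
    ranked = sorted(p_by_gene.items(), key=lambda kv: kv[1])
    m = len(ranked)
    n_hits = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= alpha * rank / m:
            n_hits = rank  # keep the largest rank that passes its threshold
    return [gene for gene, _ in ranked[:n_hits]]

# Hypothetical gene-level p-values from a pooled screen.
p_values = {
    "GENE_A": 0.0001, "GENE_B": 0.004, "GENE_C": 0.019, "GENE_D": 0.03,
    "GENE_E": 0.2, "GENE_F": 0.5, "GENE_G": 0.74, "GENE_H": 0.9,
}

naive = [g for g, p in p_values.items() if p < 0.05]
hits = benjamini_hochberg(p_values)
print("p < 0.05 without correction:", naive)
print("hits after FDR control:", hits)
```

Nothing here predicts anything. It ranks and thresholds observed data, which is exactly why it belongs in the hit-discovery tier rather than the generative-design tier.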

When evaluating a producer-cell ML pitch, the most useful first question is whether the work being described is closer to hit-discovery or closer to generative design. Both have value. They sit at different tiers of evidence, and they should be priced differently.

[3] Bioreactor monitoring and process analytical technology

Real-time monitoring of viral vector production is a frontier where the methodology is mature in adjacent fields and is in mid-development for AAV specifically. Raman spectroscopy combined with chemometric analysis is now standard for monitoring metabolic state (glucose, lactate, viable cell density) in CHO-based monoclonal antibody manufacturing. Adaptation to AAV titer monitoring has been done commercially and is deployed in industry. Independent peer-reviewed validation against orthogonal analytical methods (analytical ultracentrifugation, mass photometry, ddPCR) is not in the public domain.

That absence is not evidence the technology does not work. Vendors deploying these systems have customer data they cannot share publicly. The diligence implication is that buyers cannot evaluate accuracy from the literature alone. The right move is to ask for the data under NDA before committing.

A useful published benchmark, transferable from CHO to AAV, was demonstrated in 2025 for a SARS-CoV-2 spike protein process. An ML model (recurrent neural network soft sensor) tracked product titer along with cell growth and key metabolites across a 17-day fed-batch culture, with average normalized root mean squared error of 0.24 and average R² of 0.97 across the tracked variables (DOI: 10.1002/btpr.70046). No AAV-specific equivalent has been published at this level of rigor.
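When a vendor quotes these two metrics, it helps to know how they are computed. The sketch below is a minimal illustration on made-up numbers. Normalized RMSE has several conventions (dividing by the range, the mean, or the standard deviation of the observations), and which one a given paper or vendor used is something to ask about, not assume. The `by` parameter here is my own naming.

```python
import math
import statistics

def rmse(observed, predicted):
    """Root mean squared error between paired observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def nrmse(observed, predicted, by="range"):
    """RMSE divided by a scale of the observations; the convention must be stated."""
    scales = {
        "range": max(observed) - min(observed),
        "mean": statistics.mean(observed),
        "std": statistics.pstdev(observed),
    }
    return rmse(observed, predicted) / scales[by]

def r_squared(observed, predicted):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_obs = statistics.mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Made-up offline measurements versus soft-sensor predictions.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.9, 5.1]

r2 = r_squared(observed, predicted)
for convention in ("range", "mean", "std"):
    print(convention, round(nrmse(observed, predicted, by=convention), 4))
print("R2", round(r2, 4))
```

The same predictions give three different normalized errors depending on the convention, so a headline NRMSE is only comparable across vendors when the denominator is the same.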

The interpretive frame to bring here: real-time monitoring of glucose, lactate, and viable cell density during AAV production is mature. Real-time monitoring of capsid titer is deployed but not publicly validated. Real-time monitoring of empty-versus-full ratio during production, which would actually change manufacturing economics if it worked, is essentially unproven. Pitches frequently slide between these three layers without acknowledging the differences.

[4] Cassette design and genome quality: the manufacturing AI category most often hidden in plain sight

A category of pitch that has grown rapidly in the past 18 months positions AAV manufacturing AI as primarily a sequence design problem. The argument: by screening transgene constructs in silico for manufacturability risk factors before any wet-lab work, you avoid manufacturing campaigns that were doomed by their cassette design from the start. The category overlaps with vector engineering but is being sold as manufacturing capability.

The strongest published result here is an academic ML model (CNN+LSTM) for truncation hotspot prediction, reporting 98 percent classification accuracy on a training and test split. Stuffer sequence optimization and codon optimization for transgene expression are similar in maturity. The methodology is reasonable. ML excels at sequence-level pattern recognition with clean labels, and these are exactly the kinds of problems where deep learning works.

The diligence question for this category is not whether the technology works. The diligence question is whether within-distribution accuracy translates to prospective accuracy on transgenes that look nothing like the training data. A 98 percent accuracy on a held-out test split from the same distribution is a meaningful methodological achievement. It is not the same as a 98 percent success rate on a previously unseen transgene with novel ITR configuration and unusual GC content. Vendors selling in this space should be asked, explicitly, what their prospective generalization data looks like, and on how many transgenes the model has been tested without retraining.
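The difference between the two kinds of accuracy often comes down to how the test set was carved out. The sketch below, on a hypothetical dataset of sequence windows labeled by the transgene they came from, contrasts a random split with a split that holds out whole transgenes. The record layout and names are assumptions for illustration only.

```python
import random

def random_split(records, test_fraction=0.2, seed=0):
    """Shuffle individual records and cut; windows from one transgene land on both sides."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def grouped_split(records, held_out):
    """Hold out entire transgenes so the test set contains only unseen constructs."""
    train = [r for r in records if r["transgene"] not in held_out]
    test = [r for r in records if r["transgene"] in held_out]
    return train, test

def shared_transgenes(train, test):
    """Transgenes that appear on both sides of a split (a source of leakage)."""
    return {r["transgene"] for r in train} & {r["transgene"] for r in test}

# Hypothetical dataset: 5 transgenes, 40 sequence windows each.
records = [{"transgene": f"TG{t}", "window": w} for t in range(5) for w in range(40)]

rand_train, rand_test = random_split(records)
grp_train, grp_test = grouped_split(records, held_out={"TG4"})

print("random split, transgenes seen in both:", sorted(shared_transgenes(rand_train, rand_test)))
print("grouped split, transgenes seen in both:", sorted(shared_transgenes(grp_train, grp_test)))
```

A model scored on the random split has already seen neighboring windows from every test construct. Only the grouped split approximates the question a customer actually cares about: how the model performs on a transgene it has never encountered.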

It is also worth noting that the baseline against which ML truncation prediction should be measured is a real industry problem with regulatory implications. One of the most-cited published numbers in AAV genome quality comes from a 2019 study showing that only about 60 percent of packaged genomes in one batch were complete (DOI: 10.1089/hgtb.2019.031). That is part of why this category is growing. The need is genuine.

[5] Packaging quality

A separate category sits at the intersection of upstream and downstream: packaging quality, or the question of what actually ended up inside the capsid. Three sub-problems matter here. Partial capsids contain genomes shorter than full-length. Empty capsids contain no DNA at all. Mis-packaged species contain host genomic DNA, plasmid backbone sequences, or helper plasmid fragments. A 2019 study using two-dimensional ddPCR found that only about 60 percent of packaged genomes in one analyzed batch were complete (DOI: 10.1089/hgtb.2019.031). Mass photometry, charge detection mass spectrometry, and analytical ultracentrifugation are the standard ways to quantify these populations after manufacture.

ML enters this category in two ways that are genuinely deployed. The first is on the downstream side, where mechanistic and hybrid models for empty-versus-full separation by anion exchange chromatography have been published, though no closed-loop ML result yet matches the affinity capture benchmark I led with. The second is on the analytical side, where long-read sequencing pipelines using ML basecalling and read classification can quantify subgenome species in packaged material. What is not yet published, and is being pitched by several vendors aspirationally (from what I have seen so far), is predictive modeling of packaging fidelity from cassette design alone, the capability to say from sequence whether a construct will package cleanly before it reaches the bioreactor.

The diligence question for any pitch in this space is which of the three layers the vendor is actually working on. Detection, separation, and prediction sit at very different evidence tiers.

[6] Analytical ML: the most embedded, least credited application

A great deal of AAV-relevant machine learning runs invisibly inside analytical instrument software. Deep learning particle classification on cryo-electron microscopy images has become standard for quantifying empty, partial, and full capsids at high purity, particularly where absorbance-based methods break down. Long-read sequencing pipelines for AAV genome quality control use ML basecalling and read classification to assign reads to subgenome categories (full, partial, rearranged, plasmid-derived). Quantitative low-voltage transmission electron microscopy methods have been published with deep learning classification, with concordance studies demonstrating agreement between deep learning-assisted TEM and analytical ultracentrifugation for full capsid quantification on bulk drug substance.

This is real ML. It is also rarely what is being sold when a vendor pitches "AI for AAV manufacturing." When a pitch deck claims AI-powered analytics, the right question is what is novel compared to what already ships in the cryo-EM, mass photometry, or nanopore sequencing instrument software. Sometimes the answer is meaningful additional capability. Sometimes the answer is a wrapper user interface (UI).

What you'll actually hear: five archetypes of pitches

In the current market, AAV manufacturing AI pitches cluster into five broad archetypes. Naming them is useful because the diligence questions are different for each.

The first is the platform infrastructure pitch. A category positioning itself as the modeling and data layer across the gene therapy manufacturing workflow. Typically combines mechanistic modeling with ML to claim data efficiency. Peer-reviewed AAV-specific outcome data is not yet in the public domain for this category.

The second is the integrated AI suite pitch. A category bundling AI-designed genetic components (tissue-specific promoters, regulatory elements optimized for production-phase expression control, transgene sequence optimization) with an engineered host cell line and an optimized plasmid system. Sold as end-to-end design and production. Reported titers from the strongest exemplars in this category span several orders of magnitude depending on whether the system is transient or stable, with the highest publicly reported numbers reaching well above conventional industry baselines for the corresponding configuration.

The third is the design-for-manufacturability pitch. A category positioning AAV manufacturing AI as primarily a construct design problem, with in silico screening of sequence candidates for manufacturability risk. Often paired with long-read sequencing analytics for batch comparability. Frames its value proposition around regulatory risk reduction.

The fourth is the predictive cell line development pitch. Typically structured as a collaboration between an AI-focused group and a manufacturing platform provider, focused on accelerating stable cell line clone selection. Public claims in this category tend to be expressed as forward-looking targets, often substantial percentage reductions in development timelines, rather than as completed program outcomes.

The fifth is the internal industry capability. Several large gene therapy developers have built their own ML capabilities for internal manufacturing, including yield prediction at production scale (50L through 3000L), internal process optimization, and ML-enabled analytical method development. This work is not sold externally. It matters because it means the most data-rich AAV manufacturing ML is currently invisible to outside view. The data exists. It lives behind firewalls.

Recognizing the archetype is the first step in evaluation. The second is asking the right questions.

Twelve diligence questions for any AAV manufacturing AI pitch

These are the questions I find most useful when an executive asks me to evaluate a vendor claim. They are not a scoring rubric. They are a framework for separating what is published from what is plausible from what is hype. None of them is hostile. All of them have a credible answer in a serious pitch.

1. What was the baseline performance before this approach, in the same units you are reporting now? A result without a baseline is not evaluable. The published 70 to 99 percent yield benchmark exists because the baseline was reported alongside it.
2. How many experimental runs were required to train and deploy the model? Data efficiency is the actual scarce resource in this field. Thirty experiments across a 930,000-condition search space is a meaningful claim. Numbers without an experimental-cost denominator are marketing.
3. Does the model transfer across serotypes, facilities, transgenes, or production platforms? The strongest published cross-serotype transfer result is bounded to one resin chemistry. Broader claims should be backed by data, not stated as a feature.
4. What is the validation set, and is it independent of the training data? A 98 percent accuracy on a within-distribution training and test split is not a 98 percent accuracy on a previously unseen transgene. The distinction is the difference between a methodological achievement and a product capability.
5. What critical quality attribute is being predicted, and how is it labeled? Convolved labels produce convolved predictions. If transduction efficiency is the model's target variable, the model is predicting binding times uptake times trafficking times uncoating times expression, with all the variance that implies.
6. Has the methodology been peer-reviewed, presented at a clinical-grade venue, or independently replicated? The strongest current AAV manufacturing ML result is peer-reviewed with conflicts of interest fully disclosed. That is the bar against which other claims should be measured.
7. What does the model do when it encounters conditions outside its training distribution? Distribution shift is a real risk. If a vendor describes their model as extrapolating to new regions of process space, the right next question is how that extrapolation has been validated.
8. What is the comparison baseline: design of experiments, one-factor-at-a-time, or no optimization at all? Statistical design of experiments has been shown to deliver two to four-fold titer improvements for AAV across multiple published studies. ML claims should be evaluated against that baseline, not against an unoptimized starting point.
9. How is value attributed when the offering bundles multiple components? If a stable producer system reports a headline titer, the question is what fraction is attributable to AI-designed components versus cell line engineering versus plasmid system optimization. A credible vendor will be able to walk through this.
10. What does the customer still have if the relationship ends? Tool, service, and intellectual property need to be cleanly separable. A model trained on customer manufacturing data should not become a competitive asset against that customer.
11. Is the headline number a stated target or a measured outcome? "We aim to reduce development timelines by 70 percent" and "we have demonstrated a 70 percent reduction across three programs" are different claims and should be evaluated differently. Forward-looking statements are hypotheses, not data.
12. For PAT and soft-sensor claims, what is the accuracy compared to orthogonal analytical methods? Real-time monitoring claims should be backed by accuracy data against analytical ultracentrifugation, mass photometry, or ddPCR. Public validation data is rare in this category. Ask for it under NDA.
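
The comparison-baseline question above is easy to turn into arithmetic. The sketch below takes a hypothetical vendor headline (a 5-fold titer gain over an unoptimized process, a number I made up) and restates it against the two to four-fold range cited above for statistical design of experiments. It assumes both figures are measured from the same starting point, which is itself something to confirm with the vendor.

```python
def incremental_fold(claimed_fold, baseline_fold):
    """Fold improvement that remains after crediting the conventional baseline."""
    if claimed_fold <= 0 or baseline_fold <= 0:
        raise ValueError("fold changes must be positive")
    return claimed_fold / baseline_fold

claimed = 5.0            # hypothetical vendor headline versus an unoptimized process
doe_range = (2.0, 4.0)   # design-of-experiments range cited in the text

low = incremental_fold(claimed, doe_range[1])
high = incremental_fold(claimed, doe_range[0])
print(f"gain over a DoE-optimized process: {low:.2f}x to {high:.2f}x")
```

A 5-fold headline shrinks to somewhere between 1.25-fold and 2.5-fold once a competent DoE campaign is the comparator. That may still be worth paying for, but it is a different conversation.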

I left the questions at twelve rather than condensing further because each catches a different category of pitch ambiguity. In practice, two or three of them will be the live questions for any given evaluation. Which two or three depends on the archetype.

#13: The expertise question underneath all the others

Twelve diligence questions cover a lot of ground, but they all lead back to one question that none of them quite captures on its own.

ML applied to AAV manufacturing is only as good as the chemistry and biology assumptions encoded into the model. Input features, output labels, training data composition, loss functions, validation strategy: every layer of a manufacturing ML system reflects choices made by the team that built it, and every one of those choices has a biology premise embedded in it. When the premise is right, the model is calibrated to the actual physics of AAV production. When the premise is wrong, the model optimizes a slightly different problem than the one the customer is trying to solve, and the failure mode is silent. The model will report high accuracy on its own validation set. The biology will not cooperate.

This is the most important diligence lens an executive can bring to a pitch, and it is also the one that is hardest to evaluate from the outside. The chemistry and biology cannot be inspected in the slide deck. The model architecture and the headline numbers can be. The result is an asymmetry: pitches optimize for the parts of the system that can be shown, while the parts that determine whether the model is real often live in choices that never reach the slide deck.

A few examples to make this concrete.

A model designed to predict empty-versus-full capsid ratio from upstream bioreactor telemetry needs to encode the biology of capsid assembly kinetics, the chemistry of Rep and Cap protein stoichiometry, and the cell-line-specific timing at which empty and full capsids form during the production window. Empty capsids and full capsids are not formed in parallel by the same process. They are produced through partly distinct intracellular trajectories, and the upstream signals that predict each are different. A team that treats the ratio as a generic regression target on top of standard bioreactor variables will produce a model that looks plausible during validation and underperforms in production, because the model has not been told which inputs actually carry information about which capsid population. The biology has to come first. The model architecture follows.
A model designed to predict truncation hotspots from cassette sequence needs to encode ITR secondary structure, GC content effects on replication, the directionality of packaging from the 3' ITR inward, and the strand-specific kinetics of single-stranded genome formation. Truncation is not a uniform sequence-pattern problem. It is a biology problem with sequence-level correlates. A team that treats the entire transgene as a string of bases and learns a generic pattern-matcher will produce a model that performs beautifully on transgenes resembling the training set and fails entirely on a new gene of interest with unusual GC content, novel ITR configuration, or a different size relative to the packaging limit. The within-distribution accuracy will stay high. The prospective accuracy on the customer's actual program will not.
A closed-loop chromatography optimizer needs to encode the biology of capsid surface chemistry, the charge distribution differences across serotypes, and the binding mechanism specific to the affinity ligand being used. The strongest published example of this work explicitly handles the case where AAV5 binds the affinity resin under different conditions than AAV2 and AAV9, because AAV5's binding mechanism is structurally distinct. That detail is in the paper because the team understood the biology well enough to design the optimization around it. A team that treated all serotypes as generic biomolecules with tunable buffer parameters would have produced a model that worked for AAV2 and AAV9 and failed for AAV5. The model would have reported its failure as poor convergence rather than as a biology error, and the customer would have walked away thinking the technology does not work, rather than thinking the team did not understand which biology to encode.

The pattern across all three examples is the same. The model architecture is not the bottleneck. The biology that goes into the model is. A team without deep AAV biology and chemistry expertise can still build a model. The model will train. It will validate. It will produce numbers. The numbers will be confident. The biology will not be in the model in any meaningful way, and the customer will find out the hard way.

This is the underlying reason why most of the credible published AAV manufacturing ML work to date has come from collaborations that include process chemists, bioprocess engineers, or AAV biologists as full intellectual partners rather than as customers or consultants brought in late. The biology has to be in the room when the model is being designed, not after.

For the executive evaluating a pitch, the question that gets at this is not a technical one. It is a team-composition question. Who on the model-building team has direct, hands-on experience with AAV chemistry and biology, at what level of depth, and what role do they play in feature selection, label definition, and validation design? The answer is informative regardless of what it is. A credible team will name specific people, specific roles, and specific contributions. A pitch where the AAV biology expertise is described as "advisory" or "consulted" or "available" is describing a team where the biology was applied at the end of the model-building process rather than at the beginning. Those are not the same model, even if the architecture is identical.

For the scientist evaluating a pitch on behalf of a CEO, the corresponding move is to look at the model's input features and output labels and ask whether they reflect a real understanding of which biological steps produce which signals. If the features look generic and the labels look convolved, the model is generic regardless of how sophisticated the architecture is.

None of this is operational in the sense that a non-ML executive could run the analysis themselves. That is the point. The chemistry-and-biology-awareness question is the one that cannot be outsourced to the framework. It can only be evaluated by a person or team with deep working experience in both ML methodology and AAV biology. This is a necessary evaluation; if it cannot be done right, then your program is taking a risk.

Where the field actually is

Closed-loop optimization of defined unit operations in defined facilities is real and delivers documented improvements in the range of 20 to 40 percent for routine cases, and into the 70+ percent range in best cases. Producer cell engineering analysis using statistical hit-calling on CRISPR screen data is mature. Real-time process analytical technology for AAV titer is deployed but not publicly validated. Sequence-level ML for genome quality and cassette design is the highest-velocity category and will matter for regulatory submissions within the next 24 months. Most pitch content claiming territory beyond these boundaries is either unproven or proprietary, and proprietary should be treated as unproven until shown otherwise.

This is not a pessimistic read. It is a calibration. The field is doing real work. It is also being represented, in some quarters, more ambitiously than the evidence supports. Both can be true at once, and the work of an executive read is holding both in view simultaneously.

The actual bottleneck

I'll close with the structural observation that I think matters more than any individual technical critique.

The reason the AAV manufacturing AI category cannot yet produce cross-platform models is not that the algorithms are inadequate. The algorithms exist. The methodology exists. The reason is that AAV manufacturing data lives in proprietary silos, and almost no one in the field is willing to share it, understandably. Capsid engineering ML has matured rapidly in the past five years partly because public datasets exist (mainly datasets and methods from Dyno and my previous work at the Broad Institute). Manufacturing data has no equivalent. Each company optimizes its own process on its own data. Cross-platform validation, the thing that would let the field build models that actually transfer, is structurally rare. Some large gene therapy developers have internal datasets large enough to attempt cross-platform validation today, and the absence of public results from these efforts is itself a signal worth interpreting. The companies that could be furthest along are often the quietest, which is not an accident.
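For readers who want to see what "cross-platform validation" means mechanically, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data is synthetic, the platform labels "A", "B", and "C" are hypothetical, the helper name leave_one_platform_out is mine, and ordinary least squares stands in for whatever model a vendor actually uses. The idea is simply to fit on every platform except one and score on the platform that was held out.

```python
import numpy as np


def leave_one_platform_out(X, y, platform):
    """Fit on all platforms but one, then report RMSE on the held-out platform."""
    scores = {}
    for held_out in np.unique(platform):
        test = platform == held_out
        train = ~test
        # Ordinary least squares with an intercept; a stand-in for any model.
        A_train = np.column_stack([np.ones(train.sum()), X[train]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        A_test = np.column_stack([np.ones(test.sum()), X[test]])
        resid = A_test @ coef - y[test]
        scores[str(held_out)] = float(np.sqrt(np.mean(resid ** 2)))
    return scores


# Synthetic toy data (not real manufacturing data): two process parameters,
# three hypothetical platforms, 60 runs each.
rng = np.random.default_rng(0)
n = 60
platform = np.repeat(np.array(["A", "B", "C"]), n)
X = rng.normal(size=(3 * n, 2))

# By construction, platform C responds to the second parameter with the
# opposite sign, so a model trained on A and B should transfer poorly to C.
slope2 = np.where(platform == "C", -1.0, 1.0)
y = 2.0 * X[:, 0] + slope2 * X[:, 1] + rng.normal(scale=0.1, size=3 * n)

scores = leave_one_platform_out(X, y, platform)
for name, rmse in scores.items():
    print(f"held-out platform {name}: RMSE = {rmse:.2f}")
```

In this toy setup the held-out error is largest for platform C, because its response was built to differ from the other two. A random train/test split pooled across all three platforms would hide that gap, which is why a pitch that reports only pooled accuracy says little about transfer. Note also that the sketch runs only if data from more than one platform sits in the same table, which is exactly what the field lacks.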

This is a coordination problem, not a science problem. The companies that will be able to credibly pitch real cross-platform AAV manufacturing AI in five years will be the ones that solve the data sharing question now, probably through pre-competitive consortia structured to protect sensitive IP while pooling the parameters that need to be pooled. The companies that try to build it on internal data alone will hit the ceiling that the published evidence already shows is there.

That is the conversation worth having at the executive level. Not whether AI works for AAV manufacturing. It does, in the right places. The conversation worth having is what infrastructure the field needs to build for the next generation of work to be possible, and how to evaluate the companies claiming to be there already.

If you are reading this because you have a specific pitch in front of you, the framework above is where to start. If you are reading this because you are wondering whether AAV manufacturing AI is real, the answer is yes, in the categories where it is real, calibrated to the evidence that supports it. Anything beyond that requires the diligence work this post has been about.

The field will be stronger for executives who can ask the right questions, and for scientists who can answer them honestly. Empowering both of those readers is the work.

Fatma Elzahraa Eid, PhD is the founder of TheBioMLClinic, an independent advisory practice at the intersection of AAV capsid engineering and machine learning.

Full disclosures

Prior work and expertise. I am an independent computational scientist with over twenty years of experience in machine learning and biological problems, including six years as Computational Lead at the Broad Institute of MIT and Harvard's AAV Engineering Program. I am a named inventor on patents and author on peer-reviewed publications in AAV capsid engineering and ML-guided protein design. This experience directly informs my scientific perspective on the claims discussed in this post. The views expressed here do not represent the Broad Institute, any prior employer, or any current advisory client of TheBioMLClinic.

Business interest in the subject of this post. Through TheBioMLClinic, I provide independent technical advisory and vendor evaluation services to organizations considering or being pitched AAV AI capabilities. This is a direct professional interest in the topic of this post and should be treated as a material conflict of interest. The framework presented here reflects my professional analytical assessment of the field's published evidence base; it is also adjacent to the services I offer. Readers should weigh both facts when evaluating the content.

Archetype framing, not vendor identification. The five vendor archetypes described in this post are illustrative composites drawn from public market activity across multiple companies. They are not characterizations of any specific company. Any resemblance to a particular vendor is unintentional and should not be read as identification, criticism, endorsement, or assessment of that vendor's products, claims, or practices. Where specific numerical ranges from public sources are referenced, they are not attributed to any company by name in this post.

No coordination with specific companies. This post was not coordinated with, endorsed by, sponsored by, or commissioned by any company. I have no current equity stake, paid advisory role, or board position with any commercial vendor of AAV manufacturing AI products at the time of publication. I may have past, present, or future advisory engagements with organizations in the broader gene therapy and AAV-ML space; none of these engagements influenced the content of this post.

Why some entities are named and others are not. Peer-reviewed publications, preprints, and conference abstracts are cited by author, institution, and/or venue where appropriate. These citations are factual references to the published scientific literature and do not constitute endorsement or critique of the cited authors' organizations. Commercial vendors are not named because the post is intended as a framework for evaluating any pitch a reader may encounter, not as commentary on specific companies.

Time-bound and tier-of-evidence nature of the analysis. Some of the numbers cited in this post come from peer-reviewed publications. Others come from preprints or conference abstracts, which represent the best public information at the time of writing but have not undergone full peer review. The AAV manufacturing AI field is evolving rapidly. Specific claims, timelines, and capabilities described here may be superseded by new published work after the date of publication.

Not procurement, investment, business, or regulatory advice. This post does not constitute investment advice, procurement guidance, business strategy advice, or regulatory guidance. Decisions about vendor selection, technology adoption, capital allocation, or regulatory strategy should be made with input from qualified domain experts who know the specific organization making the decision. The diligence questions presented in this post are a framework for thinking, not a substitute for organization-specific technical and commercial evaluation.

Educational intent. This post is intended to provide a calibration framework for scientists and executives evaluating AAV manufacturing AI claims. It identifies where the published evidence base supports specific capabilities and where claims currently exceed that evidence. It is offered as a contribution to field-level discussion, not as a definitive or prescriptive guide.
