AI Is Not Coming for AAV Scientist Jobs. We Are Worried About the Wrong Thing (~5 min read)

TL;DR: AI doesn't have enough data to replace you. You are the one generating the data it learns from.

Every couple of days, someone slides into my inbox with a version of the same question. A postdoc. A VP of Research. A senior scientist with fifteen years in capsid biology watching presentations about models that "design novel AAV variants at scale."

The question is always: "Should I be worried?"

Not about what you think.

The fear makes sense. The headlines are real. But the anxiety is aimed at the wrong target. There are three structural reasons why, and each one builds on the last.


Realization One: The Models Need You to Function at All

Before asking whether AI will replace experimental scientists, ask whether it can even work reliably in AAV engineering.

The answer is: barely — and not because the algorithms aren't good enough.

Of the roughly 246 million protein sequences in UniProt, fewer than 600,000 have been manually reviewed and experimentally characterized. That's around 0.25% of the total. For AAV specifically — where your readout isn't "does this protein fold?" but "does this variant transduce the right tissue in an NHP at a therapeutic dose without triggering immune clearance" — the characterized fraction is smaller still.
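For scale, here's that arithmetic as a quick sanity check in Python. The counts are the rounded figures from the paragraph above, not exact database totals:

```python
# Rounded counts from the paragraph above, not exact database totals
total_sequences = 246_000_000   # ~all UniProtKB entries
reviewed = 600_000              # ~manually reviewed entries

print(f"Characterized fraction: {reviewed / total_sequences:.2%}")  # ~0.24%
```

And that 0.24% is generic protein annotation, not AAV-grade labels.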

Compare that to large language models (LLMs), trained on a substantial share of all publicly available human text, or image models trained on billions of captioned images with instant, cheap feedback. In those fields, data deficits are solvable problems; money closes the gap in weeks.

In AAV, even unlimited resources can't compress a six-week in vivo readout. The data accumulation rate is set by the biology itself, and biology doesn't negotiate. Labels cost hundreds to thousands of dollars each. Feedback loops run on animal model timelines. You can't parallelize your way out of an NHP study.
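To feel the ceiling, here's a back-of-envelope sketch. The batch size and per-label cost below are illustrative assumptions pulled loosely from the ranges above, not field benchmarks:

```python
# Back-of-envelope: label accumulation when cycle time is fixed by biology.
# Batch size and cost are illustrative assumptions, not field benchmarks.
weeks_per_readout = 6        # one in vivo design-build-test cycle (from above)
variants_per_round = 100     # assumed variants characterized per cycle
cost_per_label_usd = 1_000   # mid-range of "hundreds to thousands" per label

cycles_per_year = 52 / weeks_per_readout            # ~8.7 cycles
labels_per_year = cycles_per_year * variants_per_round
print(f"{cycles_per_year:.1f} cycles/yr, ~{labels_per_year:,.0f} labels/yr, "
      f"~${labels_per_year * cost_per_label_usd:,.0f}/yr")
```

Spending more widens the batch; it doesn't shorten the serial cycle that model-guided iteration depends on.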

The gap isn't just large. It closes slowly, and it will keep closing slowly, at the speed of wet-lab science. Which means the models don't have enough to learn from, and you are the source of what they do learn.


Realization Two: The Field Doesn't Share — and That Compounds Everything

The dramatic success of AI in computer science stems from more than raw data volume; it also reflects a radically different culture around sharing that data.

Even if the data existed, AAV has a second structural problem layered on top: it is one of the least knowledge-transparent fields in modern biology, and it's becoming less open over time, not more.

In computer science, sharing is the currency. arXiv preprints, open-source code, public benchmarks — when a new method works, the code is on GitHub before peer review finishes. Knowledge compounds publicly because the incentive structure rewards it.

AAV doesn't work that way. Capsid engineering — even in academic settings — routinely operates under industry partnerships, sponsored research agreements, or patent timelines. The most important methodological details (which library strategy worked in NHP, what the real hit rate was, why a particular screening approach failed) reach the community years late, stripped of operational detail, or not at all.

The result is that every organization runs its own private experiment, with limited ability to learn from what the field has already tried. The same mistakes get made repeatedly, in parallel, in silence. AI tools built on public data are working from a picture of AAV that is incomplete in exactly the ways that matter most. There is no shared benchmark. There is no reproducible baseline. And that's not changing anytime soon, because the economics of gene therapy push in the opposite direction.


Realization Three: The Hardest Decision Is Permanently Yours

Even with perfect data and full transparency, a third problem remains — and this one doesn't go away.

The goal of AAV engineering is, by definition, to go where the model has never been. Novel capsids. Novel receptor targets. Novel species. Every time you push into genuinely new functional space, the model's reliability degrades — silently, without warning. It has no way of knowing it's extrapolating. It returns a confident score. The number looks like the other numbers.
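Here's how silent that failure is, as a toy sketch. Everything below is hypothetical: a one-dimensional stand-in for a capsid property landscape, with scikit-learn's RandomForestRegressor standing in for whatever model you actually use:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical 1-D property landscape with a regime shift at x = 1.0
def true_property(x):
    return np.where(x < 1.0, np.sin(3 * x), -2.0)

# Train only inside the old regime (x in [0, 1]) -- "the data we have"
X_train = rng.uniform(0.0, 1.0, size=(200, 1))
y_train = true_property(X_train[:, 0]) + rng.normal(0, 0.05, 200)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Query in genuinely new functional space (x = 1.8)
x_new = np.array([[1.8]])
pred = model.predict(x_new)[0]
per_tree = np.array([t.predict(x_new)[0] for t in model.estimators_])

print(f"prediction: {pred:.2f}, truth: {true_property(np.array([1.8]))[0]:.2f}")
print(f"per-tree spread: {per_tree.std():.3f}")  # stays small: every tree
                                                 # clamps to the training boundary
```

The prediction is confidently wrong, and even the naive uncertainty proxy stays small, because every tree extrapolates to the same boundary value. Nothing in the output distinguishes this query from an in-distribution one.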

The person who catches that is you. The one who knows the training data was enriched for a different tropism profile. Who recognizes that a particular loop region has never been functionally characterized in the context you care about. Who understands that assay conditions won't transfer across species. That judgment — applied before a confident model output sends a program months in the wrong direction — is load-bearing in a way nothing currently automates.

Current AI agents in gene therapy are genuinely useful for data pipeline logistics, program summarization, and reducing cognitive overhead. Real gains. But the organizations building them say the same thing consistently:

"expert knowledge still outperforms general AI for scientific decisions, and domain expertise must inform the system rather than be replaced by it."

The agents handle logistics. What to test, what a result means, and what to do when the model was confidently wrong — that remains yours.

The Question Worth Sitting With

The job isn't at risk. The role is evolving.

The most valuable experimental scientist in an ML-augmented AAV lab is not the one who runs the most assays. It's the one who designs experiments that generate data worth learning from — who understands what the model can and cannot know, and asks before every campaign: what does this teach the model that it can't currently know?

That reframe turns the experimental scientist from a downstream consumer of ML outputs into something more important: the person who determines whether the system can learn anything useful at all.
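In ML terms, that reframe is an active-learning loop. Here's a minimal sketch; the data, names, and the ensemble-disagreement score are all assumptions for illustration, not a recommendation of any specific acquisition function:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def select_next_batch(X_labeled, y_labeled, X_candidates, batch_size=8):
    """Pick the candidates the current model is most uncertain about --
    i.e., the experiments that teach it the most."""
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_labeled, y_labeled)
    # Disagreement across trees as a rough epistemic-uncertainty score
    per_tree = np.stack([t.predict(X_candidates) for t in model.estimators_])
    uncertainty = per_tree.std(axis=0)
    return np.argsort(uncertainty)[-batch_size:]   # highest-uncertainty picks

# Toy usage with random stand-in features (hypothetical data)
rng = np.random.default_rng(1)
X_lab, y_lab = rng.normal(size=(50, 10)), rng.normal(size=50)
X_cand = rng.normal(size=(500, 10))
print(select_next_batch(X_lab, y_lab, X_cand))
```

The score is deliberately crude; as the extrapolation sketch above showed, ensemble spread can miss hard regime shifts. That gap between the score and reality is exactly where the scientist's knowledge of the training data has to override the number.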

The fear about AI replacing AAV scientists is understandable. It is also, structurally, backwards.

The question isn't "will AI take my job?"

It's "am I generating the data that makes AI worth using?"


Want the full version with the expanded IP deep dive and a concrete NHP failure scenario? [Link to long version]
Questions and field observations — send them in.

PS: This is what The AIxAAV Interpreter is for: translating ML methods into actionable AAV engineering strategies. Follow me on LinkedIn for more practical insights that accelerate bio-innovation.


