AAV-ML for Experimentalists #3: How ML Fits Into AAV Experimental Workflows

TL;DR ML can help you along every step of the AAV campaign; you just need to know how it fits.

You now know what generative and predictive models do from the previous posts of this series [Post 1, Post 2].

Generative models propose new sequences. Predictive models score the ones you have.

But knowing what they do isn't the same as knowing when to use them.

When do you bring ML into your campaign? At what step? What decisions does it actually change?

This post maps ML onto your real workflow — so you know where it fits and where it doesn't.


The Big Picture

ML doesn't replace your workflow. It augments specific steps. You still design libraries. You still synthesize. You still screen. You still validate.

ML inserts at two points:

  1. Before you make things: Generative models propose candidates. Predictive models filter out likely failures. You synthesize smarter.
  2. Before you test things: Predictive models rank your variants. You screen the most promising first. You prioritize smarter.

The goal isn't to eliminate experiments. It's to eliminate wasted experiments.

Fewer duds synthesized. Fewer duds screened. Faster path to winners.


The Core Cycle

Every capsid engineering campaign follows a cycle:

Design → Make → Screen → Learn → Repeat

ML touches three of these:

  • Design: Generative models propose candidates you wouldn't have designed.
  • Screen: Predictive models prioritize what to test first.
  • Learn: Your screening results train better models for the next round.

The "Learn" step is what makes ML iterative. Every screen generates data. That data improves your models. Better models improve your next round (if applicable). 

This is why thinking about ML early — even before your first screen — pays off later.


Decision Point 1: Starting a New Campaign

You're launching a new campaign. Where does ML fit? 

First question: Do you have relevant training data?

If yes (yours from previous work, or public datasets that match your serotype and property):

  • Use predictive models to filter your designed library before synthesis
  • Use fine-tuned generative models to propose additional candidates
  • You're starting ahead

If no (new serotype, new property, no relevant data):

  • Consider zero-shot generative models trained on other serotypes or public data
  • Or start with rational design and traditional diversity
  • Plan to collect ML-ready data in round 1

Key insight: Don't wait for "enough data." Even a modest first screen (1,000 to 5,000 variants) gives you enough to train useful models for round 2. Plan for this from the start.
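To make that concrete, here's a minimal sketch of what a round 1 model can look like, assuming a table of 7-mer peptide inserts with a quantitative enrichment readout. The file and column names are hypothetical, and scikit-learn's random forest stands in for whatever model you prefer:

```python
# Minimal sketch: train a first predictive model on ~1,000-5,000 round 1 variants.
# Assumes a CSV with hypothetical columns "insert_seq" (7-mer peptide insert, all
# the same length) and "log_enrichment" (quantitative readout).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

AA = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {a: i for i, a in enumerate(AA)}

def one_hot(seq: str) -> np.ndarray:
    """One-hot encode a peptide sequence (positions x 20 amino acids), flattened."""
    x = np.zeros((len(seq), len(AA)))
    for pos, aa in enumerate(seq):
        x[pos, AA_INDEX[aa]] = 1.0
    return x.ravel()

df = pd.read_csv("round1_screen.csv")                 # hypothetical file
X = np.stack([one_hot(s) for s in df["insert_seq"]])
y = df["log_enrichment"].to_numpy()

model = RandomForestRegressor(n_estimators=200, random_state=0)
# Cross-validation gives a rough sense of whether the model ranks variants
# better than chance before you trust it for round 2.
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"cross-validated R^2: {scores.mean():.2f} +/- {scores.std():.2f}")

model.fit(X, y)  # final model trained on all round 1 data
```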


Decision Point 2: Designing Your Library

Traditional library design optimizes for diversity. Cover as much sequence space as possible.

ML-informed library design adds another consideration: will this data be useful for training models later?

What makes a library "ML-ready":

  • Coverage of the fitness landscape: Not just hits, but a range of outcomes. Models learn from contrast between winners and losers.
  • Quantitative readouts: Binary data (worked/didn't) enables filtering. Continuous measurements (titers, fold-enrichment) enable ranking. More granularity means more useful models, so decide what you'll use the models for before settling on a readout.
  • Controls and spike-ins: Known variants included across batches enable normalization and let you separate batch effects from real biology. Without them, your training data can be corrupted.
  • Metadata: Batch, conditions, timepoints. What seems irrelevant now may matter for modeling later.

You don't need to redesign everything. But small adjustments in how you collect data can dramatically increase its ML value downstream.
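In practice, "ML-ready" often just means a tidy table: one row per variant per batch, quantitative readouts, controls flagged, metadata attached. A hypothetical example of such a record (column names are illustrative, not a standard):

```python
# A tidy, ML-ready record: one row per variant per batch, with quantitative
# readouts, control flags, and metadata. Column names and values are illustrative.
import pandas as pd

records = pd.DataFrame({
    "variant_id":      ["V0001", "V0002", "CTRL_AAV9", "V0003"],
    "insert_seq":      ["TLAVPFK", "SGGQSSA", None, "NNVRTSA"],
    "is_control":      [False, False, True, False],        # spike-in flag
    "log_enrichment":  [1.82, -0.45, 0.10, 0.67],           # quantitative, not binary
    "packaging_titer": [3.1e11, 8.0e9, 2.5e11, 6.4e10],     # vg/mL
    "batch":           ["B1", "B1", "B1", "B2"],            # metadata
    "assay_date":      ["2024-03-01"] * 3 + ["2024-03-15"],
    "cell_line":       ["HEK293"] * 4,
})
records.to_csv("screen_results_ml_ready.csv", index=False)
```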


Decision Point 3: Before Library Synthesis

You have a set of candidate sequences — designed rationally, generated computationally, or both.

Synthesis is expensive. You can't make everything.

This is where predictive filtering pays off.

A packaging fitness model scores your candidates. You remove the bottom tier: sequences predicted to fail before you spend money making them.

How aggressive should you filter?

  • Aggressive filtering (keep top 10%): Saves the most cost, but risks discarding good variants the model underestimated (false negatives).
  • Conservative filtering (keep top 50%): Costs more to synthesize, but hedges against model error.

The right threshold depends on your synthesis budget, your risk tolerance, and your confidence in the model.

Rule of thumb: Filter aggressively on well-validated properties like production fitness (packaging/viability), where models are reliable. Filter conservatively on less-validated properties, like in vivo tropism, where models are uncertain.
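Here's how that trade-off looks in code, as a sketch: assume your designed candidates already carry a model score in a hypothetical pred_packaging column, and you just choose how much of the distribution to keep:

```python
# Sketch: filter candidates before synthesis at different stringencies.
# Assumes a hypothetical "pred_packaging" column with model scores
# (higher = more likely to package).
import pandas as pd

candidates = pd.read_csv("designed_candidates_scored.csv")  # hypothetical file

def keep_top_fraction(df: pd.DataFrame, frac: float) -> pd.DataFrame:
    """Keep the top `frac` of candidates by predicted packaging score."""
    cutoff = df["pred_packaging"].quantile(1 - frac)
    return df[df["pred_packaging"] >= cutoff]

aggressive   = keep_top_fraction(candidates, 0.10)  # keep top 10%: cheapest, riskiest
conservative = keep_top_fraction(candidates, 0.50)  # keep top 50%: hedges model error

print(f"{len(candidates)} designed -> "
      f"{len(aggressive)} (aggressive) / {len(conservative)} (conservative) to synthesize")
```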


Decision Point 4: After Round 1 Screening

You've screened. You have hits.

Now what?

Option A: Go straight to validation.

Take your top hits into secondary assays, animal studies, or development.

Option B: Expand computationally first.

Train a generative model on your hits. Generate new candidates. Screen your original hits AND the generated candidates together in round 2.

Why consider Option B?

Your round 1 hits came from your round 1 library. But the sequence space is vast. There are almost certainly good variants you never made.

A generative model trained on your hits can propose novel variants that follow the same patterns — but explore further.

In round 2, you screen both:

  • Your original top hits (confirmed performers)
  • Generated candidates (novel proposals)

The generated candidates sometimes outperform your original winners. You're finding variants you would have missed. Read more here.

When is Option B worth it?

  • When you have time for another round
  • When you want more diversity or backup candidates
  • When your hit rate suggests there's more to find
  • When you plan to screen multi-trait predictions: learning and proposing variants predicted to combine several properties at once
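As a toy illustration of the generate-then-screen pattern in Option B, the sketch below learns per-position amino-acid frequencies from a handful of hypothetical round 1 hits and samples novel inserts from them. Real campaigns typically use trained generative sequence models; this is only meant to show the shape of the workflow:

```python
# Toy sketch of "expand computationally": learn per-position amino-acid
# frequencies from round 1 hit inserts and sample novel 7-mers from them.
import numpy as np

AA = list("ACDEFGHIKLMNPQRSTVWY")
rng = np.random.default_rng(0)

hits = ["TLAVPFK", "SLAVPFK", "TLAIPFK", "TLAVPYK"]   # hypothetical round 1 hits
length = len(hits[0])

freqs = np.full((length, len(AA)), 0.5)                # pseudocounts keep diversity
for seq in hits:
    for pos, aa in enumerate(seq):
        freqs[pos, AA.index(aa)] += 1
freqs /= freqs.sum(axis=1, keepdims=True)

def sample_variant() -> str:
    """Sample one novel insert, position by position, from hit-derived frequencies."""
    return "".join(rng.choice(AA, p=freqs[pos]) for pos in range(length))

generated = {sample_variant() for _ in range(200)} - set(hits)   # drop exact repeats
round2_pool = list(hits) + sorted(generated)   # screen confirmed hits + proposals together
print(f"round 2 pool: {len(hits)} hits + {len(round2_pool) - len(hits)} generated")
```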


Decision Point 5: Before Individual Screens

You have a set of candidate sequences planned for individual screens.

But screening capacity is limited. You might test 1,000 variants in a medium-throughput screen, or 5-10 in a low-throughput one, but not 10,000.

This is where predictive ranking pays off.

A model scores your variants by predicted performance, and you screen the top n first. This is particularly useful when your candidates were selected for one property, say tropism, and you rank them on a secondary property like production fitness.

Important: Keep some random samples in each tier. This lets you measure whether the model is actually enriching — and provides unbiased training data for future models.
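A sketch of what ranked screening with random spike-ins can look like, assuming a scored candidate table with a hypothetical pred_score column. The tier size and the 10% random fraction are placeholders to adjust to your capacity:

```python
# Sketch: rank candidates by predicted score, but reserve a random slice of
# each screening tier so you can measure model enrichment later.
import pandas as pd

candidates = pd.read_csv("tropism_candidates_scored.csv")   # hypothetical file
ranked = candidates.sort_values("pred_score", ascending=False).reset_index(drop=True)

tier_size = 1000          # screening capacity per round
random_fraction = 0.10    # portion of the tier drawn at random, not by rank

n_random = int(tier_size * random_fraction)
top_ranked = ranked.head(tier_size - n_random)
remaining = ranked.iloc[len(top_ranked):]                    # not selected by rank
random_fill = remaining.sample(n=n_random, random_state=0)

tier1 = pd.concat([top_ranked.assign(selection="model"),
                   random_fill.assign(selection="random")])
tier1.to_csv("tier1_to_screen.csv", index=False)
```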


Decision Point 6: When to Stop Iterating

ML enables iteration. Each round improves your models. Better models improve your next round.

But more rounds aren't always better.

  • Diminishing returns: Early rounds yield the biggest gains. By round 3 or 4, incremental improvement shrinks.
  • Practical limits: Timelines, budgets, organizational patience. At some point, you need to move forward with what you have.
  • Good enough vs. optimal: Know your threshold. If you have candidates that meet your specs, additional optimization may not be worth the time.

Rule of thumb: Plan for 2-3 rounds. If you're not seeing meaningful improvement by round 3, you've likely captured most of the ML-addressable gains.


What Changes When You Plan for ML

If you know ML will be part of your campaign, some upfront decisions change:

  • Collect quantitative data, not just binary. "Packaged / didn't package" trains a filter. Actual titers train a ranker. The latter is more valuable.
  • Include controls and spike-ins. Known variants across batches enable normalization (see the sketch after this list). Without them, batch effects look like biology — and corrupt your models.
  • Think about what you'll want to predict later. If tropism matters, design readouts that capture tropism — even in early screens. You can't train a tropism model on packaging data.
  • Record metadata. Batch IDs, dates, conditions, operators. What seems like noise now might explain variance later.
  • Plan for negative examples. Models learn from failures too. Don't discard them. A library of "what didn't work" is training data for "what might work."
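As one example of putting those spike-ins to work, the sketch below centers each batch's enrichment values on that batch's own control measurements. It's a simple median shift; your normalization may be more sophisticated, but the idea is the same:

```python
# Sketch: use spike-in controls to normalize enrichment across batches.
# Column names follow the hypothetical ML-ready table shown earlier.
import pandas as pd

df = pd.read_csv("screen_results_ml_ready.csv")

# Median enrichment of the known control variants, per batch
control_medians = (df[df["is_control"]]
                   .groupby("batch")["log_enrichment"]
                   .median())

# Subtract each batch's control median so batches share a common baseline
df["log_enrichment_norm"] = df["log_enrichment"] - df["batch"].map(control_medians)
df.to_csv("screen_results_normalized.csv", index=False)
```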


Common Pitfalls

  • Waiting too long to involve ML. "We don't have enough data yet." But even 1,000 variants with outcomes is enough to start. Don't wait for perfection.
  • Over-trusting predictions without validation. Predictions are probabilities, not guarantees. Always validate top hits experimentally. A model that's 80% accurate is still wrong 20% of the time.
  • Using a model trained on a different context. A model trained on AAV2 packaging may not transfer to AAV9. A model trained on HEK cell binding may not predict liver transduction. Always ask: how similar is my context to the training data?
  • Not closing the loop. You use predictions to prioritize screening. Great. But did you track how the predictions performed? Without feedback, you can't improve — and you don't know if the model is actually helping (a quick check is sketched after this list).
  • Discarding "failed" variants. Non-functional variants are training data too. They teach the model what doesn't work. Keep them.


Final Thought

ML fits into your workflow at specific decision points:

  • Before synthesis: Filter out predicted failures.
  • Before screening: Rank and prioritize.
  • After screening: Train better models, generate new candidates.

The earlier you plan for ML, the more value you extract. Small changes in how you design libraries and collect data pay dividends in later rounds.

It's not about replacing your experiments. It's about making each experiment count for more.


Next in the series: Where ML Is Being Applied (a survey of capsid fitness, tropism, manufacturing, and beyond).

PS: This is what The AIxAAV Interpreter is for: translating ML methods into actionable AAV engineering strategies. Follow me on LinkedIn for more practical insights that accelerate bio-innovation.

