Why Protein Language Models (PLMs) won't let you explore distant AAV capsids + how to fix it
TL;DR: Your PLM is confusing "distant" AAVs with "broken" AAVs.
If you've ever used a protein language model (like ESM) to filter AAV capsid libraries, you've probably noticed something frustrating:
Deep mutants always look bad.
That 12-mutation capsid you designed for immune evasion? ESM scores it poorly. The ancestral reconstruction with 15 changes? Even worse. Meanwhile, the 2-mutation tweak to AAV9 looks great, not because it's better, but because it's closer to what the model already knows.
This is the distance bias problem, and it quietly shapes which capsids make it through your pipeline.
What's happening:
PLMs like ESM learned from natural sequences. They've internalized an accurate belief: most random mutations break proteins. So when you hand them a sequence far from anything in their training set, they hedge their bets and score it low.
The result? Three capsids with equal true transduction efficiency, sitting 1, 5, and 10 mutations from wild type, get scored Excellent → Good → Weak. Not because of biology. Because of distance.
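You can watch this happen in a few lines of code. Below is a minimal sketch assuming the fair-esm package and an unmasked pseudo-log-likelihood as the scoring heuristic; the toy fragment and the score_sequence / random_variant helpers are illustrative stand-ins, not anyone's published pipeline:

```python
# Minimal sketch: watch PLM scores fall with mutational depth alone.
# Assumes the fair-esm package (pip install fair-esm). The small ESM-2
# checkpoint keeps the demo fast; swap in a larger one for real work.
import random

import torch
import esm

model, alphabet = esm.pretrained.esm2_t12_35M_UR50D()
model.eval()
batch_converter = alphabet.get_batch_converter()

AAS = "ACDEFGHIKLMNPQRSTVWY"

def score_sequence(seq):
    """Unmasked pseudo-log-likelihood: sum of log p(residue | sequence)."""
    _, _, tokens = batch_converter([("seq", seq)])
    with torch.no_grad():
        log_probs = torch.log_softmax(model(tokens)["logits"], dim=-1)
    # Token 0 is the BOS token, so sequence position i sits at token i + 1.
    return sum(log_probs[0, i + 1, alphabet.get_idx(aa)].item()
               for i, aa in enumerate(seq))

def random_variant(seq, depth):
    """Apply exactly `depth` random substitutions."""
    chars = list(seq)
    for p in random.sample(range(len(seq)), depth):
        chars[p] = random.choice(AAS.replace(chars[p], ""))
    return "".join(chars)

wt = "MAADGYLPDWLEDTLSEGIRQWWKLKPGPPPPKPAERHKDDSRGLVLPGYKYLGPFNGLDKGE"  # toy fragment; use your full VP1
wt_score = score_sequence(wt)
for depth in (1, 5, 10):
    deltas = [score_sequence(random_variant(wt, depth)) - wt_score
              for _ in range(10)]
    print(f"depth {depth:2d}: mean delta score = {sum(deltas) / len(deltas):.1f}")
```

The point isn't that random deep mutants score low; many genuinely are broken. The point is that your carefully designed 10-mutation variant gets pooled into the same depth penalty.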
Why does this matter in AAV engineering? Because the designs you actually want push you away from wild type:
- Escape neutralizing antibodies (requires getting far from natural serotypes)
- Explore novel tropisms (often found in deep sequence space)
- Make large insertions or domain swaps (automatically pushes you far from WT)
The fix is simple but powerful:
Instead of comparing raw PLM scores across your whole library:
- Group candidates by mutational depth
- Build a background distribution for each depth (uniform shell sampling works)
- Convert raw scores → percentiles within each depth
- Rank by depth-normalized percentiles
Now your 10-mutation immune escape variant competes fairly against other 10-mutation capsids, not against 2-mutation tweaks.
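Here's what that looks like end to end. A minimal sketch assuming you already have a plm_score(seq) -> float function (like the one above); hamming_depth, sample_shell, and calibrated_rank are illustrative names, not an established API:

```python
# Depth-normalized percentile calibration: a minimal sketch.
# Assumes a plm_score(seq) -> float function; all helper names here
# are illustrative, not from the paper.
import random
from collections import defaultdict

import numpy as np

AAS = "ACDEFGHIKLMNPQRSTVWY"

def hamming_depth(seq, wt):
    """Mutational depth = substitution count relative to wild type."""
    return sum(a != b for a, b in zip(seq, wt))

def sample_shell(wt, depth, n):
    """Uniform shell sampling: n random variants exactly `depth` mutations out."""
    shell = []
    for _ in range(n):
        chars = list(wt)
        for p in random.sample(range(len(wt)), depth):
            chars[p] = random.choice(AAS.replace(chars[p], ""))
        shell.append("".join(chars))
    return shell

def calibrated_rank(library, wt, plm_score, n_background=500):
    # Step 1: group candidates by mutational depth.
    by_depth = defaultdict(list)
    for seq in library:
        by_depth[hamming_depth(seq, wt)].append(seq)

    ranked = []
    for depth, seqs in by_depth.items():
        # Step 2: background score distribution for this depth.
        background = np.array([plm_score(s)
                               for s in sample_shell(wt, depth, n_background)])
        # Step 3: convert each raw score to a within-depth percentile.
        for seq in seqs:
            pct = (background < plm_score(seq)).mean() * 100
            ranked.append((seq, depth, pct))

    # Step 4: rank the whole library by depth-normalized percentile.
    return sorted(ranked, key=lambda t: -t[2])
```

A 10-mutation variant in the 95th percentile of its own shell now outranks a 2-mutation variant in the 60th percentile of its shell, which is exactly the comparison you wanted all along.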
Where this changes the game in AAV:
- Immune evasion screens: Stop penalizing the very distance you're trying to achieve
- VR insertion libraries: Long insertions naturally push sequences far from WT; calibration lets them compete
- Ancestral/chimeric designs: These often sit at intermediate distances where raw PLM bias is strongest
- Active learning loops: Without calibration, your model keeps pulling you back toward known serotypes (a one-line fix, sketched below)
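For the active-learning case, the change can be as small as one line in the acquisition step: select the next batch by calibrated percentile instead of raw score. A sketch, reusing the illustrative calibrated_rank from above:

```python
# Acquisition by calibrated percentile instead of raw PLM score,
# reusing the illustrative calibrated_rank sketch above.
def select_next_batch(candidates, wt, plm_score, batch_size=96):
    ranked = calibrated_rank(candidates, wt, plm_score)
    # The top of the list now mixes depths fairly instead of collapsing
    # back toward low-depth, near-serotype sequences every round.
    return [seq for seq, depth, pct in ranked[:batch_size]]
```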
What this means for industry:
If you're at a gene therapy company running ML-guided capsid screens, this bias is silently shaping your pipeline, and probably not in your favor.
The capsids that survive your filters tend to be safe, incremental, close to known serotypes. The bold candidates, the ones that might actually solve your immunogenicity or tissue-targeting problem, get ranked down before anyone reviews them.
That's not a science problem. It's a prioritization problem. And it's fixable in an afternoon.
One calibration step. No new model. No new data. Just a statistical correction that lets your pipeline see what it's been missing.
The bigger picture:
Distance bias isn't an AAV quirk. It comes from how PLMs are trained, so any ML-guided protein campaign that explores deep sequence space needs some form of depth calibration.
Credit: This technique was elegantly demonstrated by Ada Shaw and the Debora Marks Lab (Harvard Medical School). Paper: [Link]
PS: If you want to go deeper on translating ML predictions into actionable AAV biology, that's what The AIxAAV Interpreter is for. I've spent two decades bridging ML theory and application.
Follow me on LinkedIn (#AIxAAV #TheBioMLClinic #TheBioMLPlayBook) for more practical insights that accelerate bio-innovation.