TL;DR When you only have ~1,000 functional hits, your generative model architecture matters more than your dataset size. Autoregressive models learn dependencies position-by-position; a stronger inductive bias that works where VAEs and diffusion models fail.