
BioML Seminar 2.4 - Evaluating Protein Design Models with Tianyu Lu
Tianyu is a PhD candidate in Po-Ssu Huang's group at Stanford University, working on generative models of protein structures. Previously, he worked at ProteinQure on Gaussian Process kernels for non-canonical peptide property prediction and on optimizing library design rules to increase nanobody developability and the likelihood of specific binding. He studied with Philip Kim and Alan Moses at the University of Toronto, where he worked on DNA mimic proteins and on inferring the structure of gene regulatory networks.
Abstract: Recent advances in generative modeling enable efficient sampling of protein structures, but the tendency of these models to optimize for designability imposes a bias toward idealized structures at the expense of loops and other complex structural motifs critical for function. We introduce SHAPES (Structural and Hierarchical Assessment of Proteins with Embedding Similarity) to evaluate five state-of-the-art generative models of protein structures. Using structural embeddings across multiple structural hierarchies, ranging from local geometries to global protein architectures, we reveal substantial undersampling of the observed protein structure space by these models. We use Fréchet Protein Distance (FPD) to quantify distributional coverage. The models differ in their coverage behavior across sampling noise scales and temperatures, and the frequency of TERtiary Motifs (TERMs) further supports these observations. More robust sequence design and structure prediction methods are likely crucial in guiding the development of models with improved coverage of the designable protein space.
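For readers unfamiliar with Fréchet-style metrics, here is a minimal sketch of how such a distance could be computed between two sets of structural embeddings. The abstract does not give the formula, so this assumes FPD follows the same construction as the Fréchet Inception Distance used for image models: fit a Gaussian to each embedding set and compare means and covariances. The function name `frechet_distance` and the random stand-in embeddings are hypothetical and not taken from SHAPES.

```python
import numpy as np


def frechet_distance(x, y):
    """Fréchet distance between Gaussians fitted to two embedding sets.

    x, y: arrays of shape (n_samples, dim) with dim >= 2.
    Assumed form (by analogy with FID, not confirmed by the abstract):
        ||mu_x - mu_y||^2 + Tr(C_x + C_y - 2 (C_x C_y)^{1/2})
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # Tr((C_x C_y)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of C_x @ C_y, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)


# Stand-in data: random vectors in place of real structural embeddings.
rng = np.random.default_rng(0)
reference = rng.normal(size=(2000, 8))  # e.g. embeddings of observed structures
shifted = reference + 1.0               # e.g. a model whose samples are offset

d_same = frechet_distance(reference, reference)
d_shift = frechet_distance(reference, shifted)
```

Under this assumed definition, a lower value means the generated samples cover the reference distribution more closely in embedding space. In the toy data above, the two sets share a covariance, so the distance comes entirely from the offset in means.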
