San Francisco ML + Bio Salon
Overview
Brandon White, Amplify Partners, and Dimension Capital are hosting the San Francisco ML + Bio Salon, a gathering of ML/AI researchers from all backgrounds (tech, bio, chem, language, audio, vision, physics, etc.) to discuss the latest advances in applying ML to biochemistry. The salon focuses on one question: how can multimodal models and language models be used to understand biology and chemistry?
Agenda
Food and drinks
Philip Fradkin Talk
Alex Beatson Talk
Joshua Meier Talk
Debate, questions, conversation
Talks
Philip Fradkin will present Orthrus: Towards Evolutionary and Functional RNA Foundation Models.
In the face of rapidly accumulating genomic data, our ability to accurately predict key mature RNA properties that underlie transcript function and regulation remains limited. Pre-trained genomic foundation models offer an avenue to adapt learned RNA representations to biological prediction tasks. However, existing genomic foundation models are trained using strategies borrowed from textual or visual domains that do not leverage biological domain knowledge. Here, we introduce Orthrus, a Mamba-based mature RNA foundation model pre-trained using a novel self-supervised contrastive learning objective with biological augmentations. Orthrus is trained by maximizing embedding similarity between curated pairs of RNA transcripts, where pairs are formed from splice isoforms of 10 model organisms and transcripts from orthologous genes in 400+ mammalian species from the Zoonomia Project. This training objective results in a latent representation that clusters RNA sequences with functional and evolutionary similarities. We find that the generalized mature RNA isoform representations learned by Orthrus significantly outperform existing genomic foundation models on five mRNA property prediction tasks, and require only a fraction of the fine-tuning data to do so. Finally, we show that Orthrus is capable of capturing the divergent biological functions of individual transcript isoforms.
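For readers less familiar with contrastive objectives, the idea of "maximizing embedding similarity between curated pairs" can be illustrated with a minimal sketch. The abstract does not specify the exact loss Orthrus uses, so this example assumes a standard InfoNCE-style loss as a stand-in. The function names, the temperature value, and the toy two-dimensional embeddings are all hypothetical and are not taken from the Orthrus work.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (an assumed stand-in, not Orthrus's actual loss).

    anchors[i] and positives[i] are embeddings of a curated pair, for example
    two splice isoforms of one gene. All other positives in the batch act as
    negatives for anchors[i].
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine_similarity(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        # Cross-entropy with the true partner at index i as the target.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Hypothetical toy embeddings: two transcripts and their paired partners.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each partner lies close to its anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each partner lies close to the wrong anchor

loss_aligned = info_nce_loss(anchors, aligned)
loss_swapped = info_nce_loss(anchors, swapped)
```

In this toy setup the loss is lower when each transcript's embedding sits near its true partner than when the partners are swapped, which is the pressure that pulls related sequences together in the latent space.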
Alex Beatson will present Axiom: Building AI for toxicity assessment.
Toxicity is one of the biggest reasons drug programs fail (in discovery, in the clinic, or in the market) and has been dramatically underserved by AI. I’ll discuss the need for AI for toxicity assessment, why no good tools already exist, and how we’re tackling this at Axiom. We have built a cell imaging dataset of primary human liver cells exposed to a large library of small molecules, and we use this to train and deploy models to predict in vitro readouts and phenotypes as well as clinical liver injury risk. Using AI throughout our data engine allows us to overcome some traditional limitations of in vitro and clinical datasets. Models trained on this data allow us to predict toxicity and to answer key questions such as the risk at a given dosage, the reason a molecule is toxic, or how a molecule’s toxicity can be reduced.
Joshua Meier will present Chai-1: Decoding the molecular interactions of life.
We introduce Chai-1, a multi-modal foundation model for molecular structure prediction that achieves state-of-the-art performance across a variety of tasks relevant to drug discovery. Chai-1 can optionally be prompted with experimental restraints (e.g., derived from wet-lab data), which boosts performance by double-digit percentage points. Chai-1 can also be run in single-sequence mode without MSAs while preserving most of its performance. We release Chai-1 model weights and inference code as a Python package for non-commercial use, and via a web interface where it can be used for free, including for commercial drug discovery purposes.