De novo prediction of tissue-specific expression effect and disease risk for every mutation in a human genome - ExPecto
Princeton Docket # 18-3423
Sequence-based models, with their scalability and use of sequence dependencies, enable mutation analysis at unprecedented scale, which can yield a new perspective on human variation in diseases and complex traits. Researchers in the Department of Computer Science and the Lewis-Sigler Institute for Integrative Genomics at Princeton University and the Center for Computational Biology at the Flatiron Institute, a part of the Simons Foundation, have developed data-driven models (ExPecto) that predict tissue-specific transcription levels for each gene in the human genome directly from 40 kbp promoter-proximal sequences, leveraging sequence features learned from chromatin profiling data. ExPecto can predict the expression-altering effects of any mutation with high confidence across more than 200 tissues and cell types. ExPecto was used to systematically predict likely expression-altering human genome variants, which were then used to prioritize causal variants within GWAS disease- or trait-associated loci. The researchers showed experimentally that, at three loci associated with four diseases, the putative causal SNPs predicted by ExPecto, but not the lead SNPs identified by the original GWAS studies, cause expression-altering effects.
The scalability of ExPecto makes it possible to systematically characterize the full predicted expression-effect space of potential mutations for each gene by profiling over 140 million promoter-proximal mutations. The resulting distribution of predicted mutation effects is informative about gene-specific evolutionary constraints on expression, indicating whether a specific gene is under evolutionary pressure for low or high expression in a particular tissue. Understanding such constraints on human gene expression could systematically provide valuable information on the deleterious impact of gene transcription dysregulation, information that is otherwise difficult to obtain because of experimental limitations in humans, and could enable the identification of novel potential disease-associated non-coding mutations.
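The mutation profiling described above can be illustrated with a minimal sketch, which is not the ExPecto implementation. It enumerates every single-base substitution in a short sequence, scores each mutated sequence, and summarizes the resulting distribution of predicted effects. The scoring function `toy_expression_score` (GC fraction) is a hypothetical stand-in for a trained expression model, and all function names and the example sequence are assumptions made for illustration only.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

BASES = "ACGT"
Mutation = Tuple[int, str, str]  # (position, reference base, alternate base)


def toy_expression_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model: fraction of G/C bases."""
    return sum(base in "GC" for base in seq) / len(seq)


def all_single_substitutions(seq: str) -> List[Mutation]:
    """Enumerate every possible single-base substitution in the sequence."""
    return [(i, ref, alt) for i, ref in enumerate(seq) for alt in BASES if alt != ref]


def predicted_effects(seq: str, score: Callable[[str], float]) -> Dict[Mutation, float]:
    """Effect of each mutation = score(mutated sequence) - score(reference)."""
    ref_score = score(seq)
    effects = {}
    for pos, ref, alt in all_single_substitutions(seq):
        mutated = seq[:pos] + alt + seq[pos + 1:]
        effects[(pos, ref, alt)] = score(mutated) - ref_score
    return effects


def summarize(effects: Dict[Mutation, float]) -> Dict[str, float]:
    """Summarize the distribution of predicted effects for one sequence."""
    values = list(effects.values())
    return {
        "n": len(values),
        "mean": mean(values),
        "frac_increase": sum(v > 0 for v in values) / len(values),
        "frac_decrease": sum(v < 0 for v in values) / len(values),
    }


# Illustrative sequence only; real promoter-proximal regions are far longer.
effects = predicted_effects("ACGTTGCA", toy_expression_score)
summary = summarize(effects)
print(summary)
```

In this sketch the per-sequence summary plays the role of the per-gene effect distribution described in the text; with a real tissue-specific model in place of the toy scorer, the same loop would be repeated for each gene and tissue.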
Publications
Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk, Jian Zhou, Chandra L. Theesfeld, Kevin Yao, Kathleen M. Chen, Aaron K. Wong and Olga G. Troyanskaya, Nature Genetics, Vol. 50, Aug 2018, 1171-1179.
Potential Applications
Use in genetic counseling to assess the risk of genomic variations, with or without prior knowledge
Identification of genomic variations that affect gene expression in patients
Identification of disease-causing non-coding gene alterations
Advantages
Prediction of expression-altering effects with high confidence
Scalability and usage of sequence dependencies
Intellectual Property & Development Status
Patent protection is pending.
Princeton is currently seeking commercial partners for the further development and commercialization of this opportunity.
The Inventors
Olga Troyanskaya is a professor at the Lewis-Sigler Institute for Integrative Genomics and the Department of Computer Science at Princeton University, where she has been on the faculty since 2003. In 2014 she became the deputy director of Genomics at the Center for Computational Biology at the Flatiron Institute, a part of the Simons Foundation in NYC. She holds a Ph.D. in Biomedical Informatics from Stanford University, has been honored as one of the top young technology innovators by the MIT Technology Review, and is a recipient of the Sloan Research Fellowship, the National Science Foundation CAREER award, the Overton award from the International Society for Computational Biology, and the Ira Herskowitz award from the Genetics Society of America.
Jian Zhou is a Flatiron fellow at the Flatiron Institute at the Simons Foundation who mainly works on understanding chromatin and genome variation. He received a B.S. from Peking University
Chandra Theesfeld is a research scientist in the laboratory of Professor Olga Troyanskaya at Princeton University.