The vast majority of coding variants are rare, and assessment of their contribution is hampered by low statistical power and limited functional data. Elucidating the molecular mechanisms that link a mutation’s impact to phenotype is often non-trivial, and functional interpretation of mutation data has consequently lagged behind the generation of those data by modern high-throughput techniques. This is further complicated by the multitude of effects a mutation may have on a protein’s function.
We have developed a suite of programs that uses graph-based signatures to represent the wild-type environment of a residue in order to predict the effects of a mutation on protein stability and on affinity for protein partners, nucleic acids, metal ions and small molecules, including drugs and ligands. We present here a novel, knowledge-guided, integrated and scalable computational workflow designed to evaluate the effects of missense mutations on protein structure and interactions, and to associate these effects with phenotypic data.
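To make the idea of a graph-based signature concrete, the following is a minimal sketch, not the implementation used in the programs described here. It assumes that atoms in a residue's environment have already been assigned to one of three illustrative classes (`hydrophobic`, `polar`, `charged`) and counts atom pairs of each class combination within a series of cumulative distance cutoffs. The function name `distance_signature`, the class labels and the cutoff values are all hypothetical choices made for illustration.

```python
from itertools import combinations_with_replacement
from math import dist

# Illustrative atom classes and distance cutoffs (in angstroms); assumptions, not published values.
CLASSES = ("hydrophobic", "polar", "charged")
CUTOFFS = (4.0, 6.0, 8.0)


def distance_signature(atoms, cutoffs=CUTOFFS, classes=CLASSES):
    """Return a flat vector of cumulative atom-pair counts.

    atoms: list of (label, (x, y, z)) tuples, where label is one of `classes`.
    The vector holds, for each cutoff in turn, the number of atom pairs of
    each unordered class combination separated by at most that cutoff.
    """
    # Unordered class pairs, e.g. ("hydrophobic", "polar"), mapped to a column index.
    pairs = list(combinations_with_replacement(classes, 2))
    column = {pair: i for i, pair in enumerate(pairs)}

    counts = [[0] * len(pairs) for _ in cutoffs]
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            (label_a, xyz_a), (label_b, xyz_b) = atoms[i], atoms[j]
            # Look up the pair in whichever order it was registered.
            key = (label_a, label_b) if (label_a, label_b) in column else (label_b, label_a)
            d = dist(xyz_a, xyz_b)
            for k, cutoff in enumerate(cutoffs):
                if d <= cutoff:  # cumulative: a short distance counts at every larger cutoff
                    counts[k][column[key]] += 1

    # Flatten to a single feature vector, cutoff by cutoff.
    return [c for row in counts for c in row]


# Toy environment of three atoms around a hypothetical wild-type residue.
environment = [
    ("hydrophobic", (0.0, 0.0, 0.0)),
    ("polar", (3.0, 0.0, 0.0)),
    ("charged", (0.0, 5.0, 0.0)),
]
signature = distance_signature(environment)
```

A vector of this kind describes the wild-type environment only; in a sketch like this it would be combined with information about the substitution itself before being passed to a supervised learning model, which is not shown.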
Using this pipeline, we have analysed hundreds of mutations generated in saturation mutagenesis studies of DBR1 and Gal4, and show that the experimental phenotypes correlate well with the predictions for over 80% of the mutations. This methodology has also allowed us to correlate mutations in VHL with the risk of developing renal cell carcinoma to guide patient treatment; to analyse the consequences of mutations in the Mendelian disease alkaptonuria, results that are being used to guide clinical trial analysis; and to automatically characterise drug resistance mutations from whole-genome sequencing in tuberculosis.
These examples highlight that structural bioinformatics tools, when applied in a systematic, integrated way, can provide a powerful and scalable approach for predicting structural and functional consequences of mutations in order to reveal molecular mechanisms leading to clinical and experimental phenotypes.