Using whole genome sequencing to identify non-coding elements associated with common, complex phenotypes

Understanding the role of the non-coding genome in disease development to help improve treatments

Genetics and Genomics

Summary

We’re using an unprecedented amount of whole-genome sequence data to understand how certain parts of our DNA affect common, complex traits and characteristics in people.

What are we doing?

The past 20 years of genome-wide association studies have told us that the non-coding genome is highly relevant to disease progression and to the regulation of complex traits and characteristics in people. What we don’t know, in the majority of cases, is how, where (in which cells) or when. We aim to release an open-source pipeline for performing genome-wide analyses of the non-coding genome. Based on an analysis of people with type 2 diabetes, we hope to develop methods for identifying, interpreting and prioritising regulatory regions.

How are we doing it?

Our computational analysis will use data from the UK Biobank, TOPMed and All of Us. Together these contain nearly 1 million whole-genome sequences of people from diverse genetic backgrounds. The analysis will be performed on a remote, cloud-based analysis platform designed to maximise participant data security.

What happens next?

Our next step is to identify regions of the genome associated with type 2 diabetes. To do this, we will draw on existing functional annotations of the non-coding genome, in collaboration with Prof Jorge Ferrer at the Centre for Genomic Regulation.

Collaborators

Prof Michael Weedon

Prof Timothy Frayling