This impressive undertaking brings new understanding to the functional aspects of the genome and can probably be considered the most significant step in genomics since the draft sequence of the human genome was released in 2000. The ENCODE project assigned biochemical function to about 80% of the genome, and in particular to elements outside the well-studied protein-coding regions.
The findings of the ENCODE consortium – comprising over 400 researchers working in 32 laboratories across the world – indicate that a much greater proportion of the genome is biologically active than had previously been thought, and should effectively dispel the notion of ‘junk’ DNA.
The results of functional analyses on 147 different cell types demonstrate that at least 80% of the genome performs a specific function – mostly regulating the activity of the 2% that comprises protein-encoding genes. The work identified some 4 million regulatory elements in total, many of which are located far away on the genome from the gene they control.
This is arguably the most significant step forward in our understanding of how the human genome works since the releases of the initial draft sequence in 2000 and the final draft in 2003. At that time it was deeply surprising to many to find that humans possessed only around 20,000 genes occupying less than 2% of the genome, leading some to label the other 98% as ‘junk’ DNA.
Evidence from functional and genome-wide association studies has steadily undermined this term over the years, as it became apparent that a large proportion of the single-base mutations that cause disease fall between protein-coding regions. This new comprehensive analysis should put an end to the idea once and for all.
Dr Ewan Birney of the European Bioinformatics Institute (EBI) in Cambridge, who coordinated the data analysis, said: “This will give researchers a whole new world to explore and ultimately, it’s hoped, will lead to new treatments.” He also pointed out that the job was still far from done, and that deep characterisation is probably only around 10% complete. It is quite possible that much of the remaining 20% of the genome has a functional role that has yet to be identified.
The mapping provides new insights into gene organisation and, most of all, into mechanisms of regulation. A central goal in biology – understanding the enormous diversity of gene expression in different cell types under various physiological conditions – can be considered partly achieved.
The project yielded invaluable information on the human transcriptional regulatory network with systematic analyses of transcription factors, chromatin structure and regulatory modifications. All these findings shine new light on our concept of the gene.
Some of the newly identified elements correspond to sequence variants linked to human disease, and can therefore guide interpretation of these variations. Genome-wide association studies have previously identified many noncoding variants associated with common diseases and traits. Such variants systematically perturb transcription, alter chromatin states, and form regulatory networks. ENCODE’s results point to the involvement of regulatory DNA variation in common human disease and provide pathogenic insights into diverse disorders.
The publication of such a detailed analysis of the functions of the human genome has understandably generated much enthusiasm among scientists and the general public alike. Confirmation that a far larger part of our genome is biologically active than previously thought is an exciting discovery, and researchers hope the findings will lead to a deeper understanding of numerous diseases.
It is, however, important to remember, and for the scientific community to acknowledge clearly, that despite these fantastic results it may be many years before patients see any benefits from the project. A better understanding of the functional complexity of the human genome will undeniably lead to improved control of disease and to better treatments, but the road to clinical applications is still long and difficult.