Low penetrance variants and colorectal tumours
Although inherited susceptibility is responsible for 30% of all colorectal cancer (CRC) (Lichtenstein, Holm et al. 2000), high-penetrance mutations in APC, the mismatch repair (MMR) genes, MUTYH, SMAD4, BMPR1A and STK11 account for <5% of cases (Aaltonen et al. 2007). The nature of the residual inherited susceptibility to CRC is at present undefined, but a model in which high-risk alleles account for all of the excess inherited risk seems improbable. It is likely that the remaining CRC inherited risk is largely accounted for by common, low penetrance alleles. These alleles may either predispose directly to colorectal tumourigenesis or may have an additive effect on predisposition. Candidate alleles studied include variants in known tumour suppressor genes, oncogenes, DNA repair genes, folate metabolising genes, and others.
The APC I1307K variant is present in about 6% of Ashkenazi Jews, but is much rarer in those of other ethnic groups. I1307K creates an A8 tract (eight consecutive adenine residues) which appears to be somatically unstable, leading to frameshift mutations (Laken et al. 1997). The tumour risk associated with I1307K has been controversial, but most recent reports suggest that it has a relatively small effect (perhaps only 1.5-fold risk of colorectal cancer), suggesting that the A8 tract is only modestly hypermutable (Gryfe et al. 1999).
A number of other low-penetrance alleles have been found with varying degrees of evidence and importance (Table 1.1). The ability to identify these genes and to understand their interactions with other relevant environmental and genetic factors remains important, however. It will help to stratify an individual patient’s risk for entry into surveillance programs and to reveal causative factors, allowing more effective prevention strategies.
Genome-wide association studies in cancer
To date, a number of genome-wide association studies have been performed in breast (Easton et al. 2007; Stacey et al. 2007; Stacey et al. 2008), lung (Amos et al. 2008), prostate (Gudmundsson et al. 2007; Gudmundsson et al. 2007; Eeles et al. 2008; Gudmundsson et al. 2008), melanoma (Gudbjartsson et al. 2008) as well as colorectal cancer (Broderick et al. 2007; Tomlinson, Webb et al. 2007; Jaeger, Webb et al. 2008; Tomlinson et al. 2008). Most of these studies have been published over the last 2 years. The odds ratios for the loci identified range from 1.1 to 1.75, the majority having an odds ratio <1.5 (Easton and Eeles 2008). There has been a certain amount of replication between these studies, particularly for the locus 8q24, which has been associated with risk of breast, prostate and colorectal cancer in separate studies. However, results so far suggest that these loci account for only a small proportion of the overall risk.
It is difficult to speculate on the true function of these risk alleles. There appears to be very little epistasis between the 28 loci identified in these 5 cancer types. None of these loci is involved in DNA repair, defects in which frequently underlie higher-penetrance susceptibility. This may explain why so many case-control studies have failed to yield significant results consistently, as the underlying hypothesis may have been inaccurate. One might speculate that many of the associations are driven through effects on gene expression, particularly as many lie in gene-poor regions.
Most GWAS have not been powered to detect the effects of polymorphisms with minor allele frequencies (MAFs) <0.05; such variants are therefore sometimes included in the rare variant class. More often, rare variants are considered to be subpolymorphic (MAF <0.01), with very rare or ‘private’ variants having MAF <0.001. Clearly much of the distinction between ‘common disease-common variant’ and ‘rare variant’ models is arbitrary. Nevertheless, it is probably worth defining them, however arbitrarily, in order to illustrate important differences between common and rare variant models in terms of gene discovery and possible clinical relevance. For example, rare variants are likely to have more biological impact than common variants, having arisen more recently in evolutionary terms (Bodmer and Bonilla 2008).
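The frequency bands quoted above can be written down as a small classifier. This is only a sketch: the function name `classify_maf` and the class labels are illustrative assumptions, and the cut-offs are simply the arbitrary ones given in the text (0.05, 0.01 and 0.001).

```python
def classify_maf(maf: float) -> str:
    """Assign a variant to a frequency class using the cut-offs quoted in the text.

    The labels are illustrative; the boundaries themselves are arbitrary.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie between 0 and 0.5")
    if maf < 0.001:
        return "very rare / private"
    if maf < 0.01:
        return "rare (subpolymorphic)"
    if maf < 0.05:
        return "low frequency (below typical GWAS power)"
    return "common"


for example_maf in (0.0005, 0.005, 0.03, 0.06):
    print(example_maf, classify_maf(example_maf))
```

A variant at 6% frequency falls in the "common" class under these cut-offs, which shows how much the label depends on where the boundaries are drawn.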
Rare variants as low-penetrance alleles
Rare variants will not be detectable by population association studies based on the use of linked polymorphic markers, even with very large case/control cohort studies. This is because of low allelic frequency and individually small contributions to the overall inherited susceptibility of a disease. These variants are less common than those studied in association studies (i.e. minor allele frequency (MAF) <0.05) but not as rare as obvious mutations (MAF >0.01), although such mutations may also be identified. Finding rare variants requires nomination of candidate genes likely to have a role in disease aetiology, which are then directly screened for sequence variants which may affect protein function. This is known as the ‘common-disease/rare-variant’ hypothesis (Pritchard 2001).
So far few rare variants have been identified in colorectal cancer, partly because candidate genes are not easily identified and partly because only a few studies have been performed. In one such study the APC variants I1307K and E1317Q, and variants in AXIN1, CTNNB1, and the mismatch repair genes hMLH1 and hMSH2, were more common in 124 multiple adenoma cases than in controls (Fearnhead et al. 2004). Studies of other candidate genes have, however, produced results of low or no significance (Dallosso et al. 2008; Zogopoulos et al. 2008).
Labelling APC I1307K a rare variant may not be accurate, as the frequency of the polymorphism in the Ashkenazi population where it is present is 6%, thus potentially suitable for large association studies. This distinction underlines the arbitrary nature of how such polymorphisms are labelled as rare or common variants.
Although the population attributable risk (PAR) of common variants may be relatively high, the relative influence of these common variants is low, with reported odds ratios below 2 and peaking at approximately 1.2 (Easton and Eeles 2008). Most rare variants have odds ratios a little higher than 2 but not above 5, with a mean of 3.7 in observations thus far (Bodmer and Bonilla 2008). Their individual contributions are small, and they do not give rise to familial concentrations of cases. As techniques to interrogate genetic sequence become inexpensive, high-throughput and efficient, this method of identifying variants is likely to generate a higher yield of significant results in the near future.
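To make the contrast between odds ratio and population attributable risk concrete, the sketch below applies Levin's formula, PAR = p(RR - 1) / (1 + p(RR - 1)), where p is the fraction of the population carrying the risk allele. Several things here are assumptions rather than data from the studies cited: the odds ratios 1.2 and 3.7 are taken from the text and used as stand-ins for relative risk, the allele frequencies (0.30 and 0.005) are invented purely for illustration, and carriers are counted under Hardy-Weinberg proportions with a dominant model.

```python
def carrier_frequency(allele_freq: float) -> float:
    """Fraction of individuals carrying at least one copy (Hardy-Weinberg assumed)."""
    return 1.0 - (1.0 - allele_freq) ** 2


def population_attributable_risk(exposed_fraction: float, relative_risk: float) -> float:
    """Levin's formula for the proportion of cases attributable to an exposure."""
    excess = exposed_fraction * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Illustrative allele frequencies (assumed); odds ratios as quoted in the text.
common_par = population_attributable_risk(carrier_frequency(0.30), 1.2)
rare_par = population_attributable_risk(carrier_frequency(0.005), 3.7)

print(f"common allele: PAR = {common_par:.3f}")
print(f"rare allele:   PAR = {rare_par:.3f}")
```

Under these assumed frequencies the common allele accounts for the larger share of cases (roughly 9% against under 3%), even though its odds ratio is far smaller.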
A candidate gene approach demonstrated rare novel low penetrance breast cancer predisposition loci in three genes, PALB2, BRIP1, and RAD51C (Seal et al. 2006; Rahman et al. 2007; Meindl et al. 2010). This discovery was assisted by the identification of breast cancer cases in Fanconi Anaemia pedigrees. In general, however, it is not a simple task to prioritize candidates for rare variant studies. In the short term, it is likely that discovery efforts will be focused largely on sequencing candidate genes. Nevertheless, it is becoming feasible to sequence entire genomes to discover variants, owing to the decreased cost and increased efficiency of such methods. In a proof of principle study, complete exomic sequencing of a patient with familial pancreatic cancer identified a germline truncating mutation in PALB2 which appeared responsible for this individual’s predisposition to the disease (Jones et al. 2009), although mutations in this gene are thought to be rare events in familial pancreatic cancer (Tischkowitz et al. 2010).
The above-mentioned rare variant loci for breast cancer in PALB2, BRIP1, and RAD51C were present in 10, 8 and 2 cases and 0, 1 and 0 controls respectively. Owing to lack of power, rare variants are difficult to validate by frequency alone in an association-type study. If we assume that a single variant or a set of related variants (for example, in the same gene) occurs at a general population frequency of 0.01–0.001, as many as 1000 unselected cases or controls will be required to detect more than one variant in a discovery screen with a probability of about 0.7 (Bodmer & Tomlinson 2010).
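The sample-size argument above can be explored with a simple binomial calculation. This is a sketch under stated assumptions, not the calculation used by the cited authors: the quoted frequency is treated as a per-individual carrier frequency, "more than one variant" is read as more than one carrier among those screened, and the function name `prob_more_than` is hypothetical.

```python
from math import comb


def prob_more_than(k: int, n_individuals: int, carrier_freq: float) -> float:
    """P(X > k) for X ~ Binomial(n_individuals, carrier_freq)."""
    p_at_most_k = sum(
        comb(n_individuals, i)
        * carrier_freq ** i
        * (1.0 - carrier_freq) ** (n_individuals - i)
        for i in range(k + 1)
    )
    return 1.0 - p_at_most_k


# Probability of seeing more than one carrier among 1000 screened individuals,
# at the two ends of the frequency range quoted in the text.
p_low = prob_more_than(1, 1000, 0.001)
p_high = prob_more_than(1, 1000, 0.01)

print(f"carrier frequency 0.001: {p_low:.3f}")
print(f"carrier frequency 0.01:  {p_high:.3f}")
```

With these assumptions the probability runs from roughly a quarter at a frequency of 0.001 to near certainty at 0.01, so the quoted figure of about 0.7 sits inside that range. It is not reproduced exactly here because the assumptions behind the original calculation are not given in the text.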
Nevertheless, in principle the more common a variant is in the population the less its biological impact, thus allowing it to be passed on through generations without affecting reproductive ability. Rare variants are likely to reveal more about the pathophysiology of the disease process than common variants, as they are likely to have functional significance, as opposed to common variants which are probably in linkage disequilibrium with the causative mutations.
However, it is more problematic to design useful studies of rare variants, as random variation identified cannot be readily assumed to be of functional significance. For example, over 1500 variants of uncertain significance (VUSs) have been identified in BRCA1 using a sequencing-based approach in breast cancer cases. The difficulty with rare variant discovery, particularly with whole exomic sequence analysis, will be to sort out the candidate functional variation from an almost overwhelming background of functionally irrelevant variation. The choice of targets will, in general, require some a priori assessment of functional effects. In silico biometric approaches have been developed with increasing predictive ability, although in vitro demonstration is generally preferable in order to determine functional effects, for example simple effects on expression or protein truncation.
Studying a cohort of affected cases and subsequently examining a control set for variants identified can cause ascertainment bias. Thus it would be preferable to search for them in affected individuals and controls with equal rigour, and to use a statistical framework to determine whether variants are truly more common in the affected. These studies are likely to require extremely large and/or enriched data sets in order to identify and verify significant rare variants. Nevertheless it is becoming increasingly cost and time effective to perform even whole genome sequencing to determine genetic predisposition to both common and rare disease.
Copy number variation and predisposition
A copy number polymorphism (CNP) in MTUS1 was found to be associated with breast cancer predisposition (Frank et al. 2007), but not colorectal cancer (Monahan et al 2008). Recently, multiple studies have discovered an abundance of germline copy number variation (CNV) of DNA segments ranging from small to large chromosomal segments (e.g. Down syndrome results from trisomy 21), probably encompassing over 12% of the human genome (Redon et al. 2006). These include deletions, insertions, duplications and complex multi-site variants. The extent and role of these copy number polymorphisms (CNPs) is increasingly understood with the development of new techniques which allow us to identify such variation (Lupski 2007).
Many new CNPs have been identified from studies using whole genome SNP chips (Redon et al. 2006). However, the extent of linkage disequilibrium between SNPs and CNPs is unclear. The biological impact of these types of variation, for example on gene expression, is strikingly different. Expression profiles from SNPs and CNPs had little overlap (Stranger et al. 2007). Multiplex ligation-probe amplification (MLPA) has revealed complex whole exon duplications and deletions in APC which lead to the classic FAP phenotype (Schouten et al. 2002; McCart et al. 2006; Pagenstecher et al. 2007). High penetrance conditions such as FAP are rare whatever the type of mutation may be, e.g. point mutations or exon CNV. In theory, complex disease might be more susceptible to subtle, lower penetrance forms of variation which alter whole gene copy number without disabling gene function. In addition, the impact of individual CNPs may be even subtler, with disease phenotype being caused by combinations of low penetrance alleles.
Identification of significant CNPs is thus far hampered by the cost of performing such studies and the lack of techniques available. Genome-wide association studies using SNPs are better at identifying deletion copy number variation than duplication (Locke et al. 2006). The new generation arrays (e.g. the Affymetrix 5.0 and 6.0, and Illumina 1 M) are being designed to offer the potential to interrogate SNPs and CNPs simultaneously in a single experiment. However, it may be that more comprehensive genome-wide CNP maps, with the level of detail for CNPs that the HapMap project provided for SNPs, are first required before such genome-wide CNP arrays are truly useful.
Much as SNPs can be either common or rare variants, so can CNPs. Using an array comparative genomic hybridisation (aCGH) platform, a large study concluded that common CNVs are well tagged on existing SNP platforms and probably contribute little to disease predisposition (Craddock et al. 2010). However, this study was limited by the selection of CNVs and did not examine the impact of rare CNVs. While genome-wide association using common CNPs may be a useful method to elucidate predisposition caused by such CNPs, this technique is not useful for rare variants. The true role of rare CNVs in human disease is as yet undetermined.
Functional consequences of risk alleles
When a Mendelian cancer predisposition gene is first identified, much of the evidence of its linkage to the phenotype derives from the finding of several different variants in that gene that:
- Have strong functional effects (for example, protein-truncating mutations).
- Are often accompanied by ‘second hits’ in the cancers themselves.
- Are essentially absent from the general population and are hence associated with a very high relative risk.
Conversely, the finding of a statistical association of low penetrance alleles with disease in association studies does not necessarily prove that the underlying variant has a biological consequence such as causing low-penetrance predisposition. The likely disease-causing locus (with which the polymorphism is in linkage disequilibrium) has rarely been identified. The IGF1 microsatellite and the TSER TYMS polymorphisms may be in linkage disequilibrium with a sequence variant which alters gene expression (Monahan et al. 2009). In a number of recent genome-wide and candidate gene association studies, the downstream effect of such variation on RNA and protein function is largely unknown. Nevertheless, identification of a germline mutation in linkage disequilibrium with predisposition alleles has remained elusive, and it is felt that allele-specific expression may be an important aetiological factor in colorectal cancer predisposition, particularly as many observed significant variants are not close to any known coding regions (Houlston et al. 2008; Valle et al. 2008). A SNP in SMAD7, whilst strongly associated with colorectal cancer risk, was not found to alter expression of the gene despite lying in its 3’UTR (Broderick et al. 2007). This study may have been limited by the effects of tissue-specific expression, as it was performed on lymphoblastoid cell lines derived from cases. In contrast, the colorectal cancer associated locus 8q24 lies in a gene desert but contains regulatory elements of MYC, and this region preferentially binds TCF4, the primary target of the canonical Wnt signalling pathway (Tuupanen et al. 2009; Pomerantz et al. 2009).
Whilst association studies may not easily reveal germline mutations, quantitative and qualitative gene expression studies may be a useful direction for future studies.
Understanding proteomics may be used to yield information as to epistasis between genes as protein-protein interactions are amongst the most important determinants of interaction between genes. However, in variants identified to date there appears to be very little epistasis (Houlston et al. 2008). There have been some significant advances in the understanding of diseases such as Crohn’s disease (Parkes et al. 2007) and Coeliac disease (van Heel et al. 2007) due to the results of non-hypothesis driven association studies. A number of low-penetrance loci have been linked to specific biological pathways with likely biological relevance in these conditions. Five of the 10 SNPs identified by GWAS of colorectal cancer are in close LD with genes of the TGF/BMP signalling pathway including SMAD7, BMP2 and BMP4. In the next few years research is likely to reveal further advances in our understanding of the role of both common and rare low penetrance alleles in colorectal cancer by analysing the associated effects on expression and protein function, and by the identification of disease causing mutations.
Recently published analysis of data from the CAPP2 study demonstrates significant modification of colorectal cancer risk in Lynch Syndrome patients by aspirin (Burn et al. 2011). Thus even high-penetrance syndromes may be modifiable by the environment. A priori, environmental agents are even more likely to modify lower penetrance genetic risk factors. An association of smoking-related cancers with polymorphisms at the cancer susceptibility locus 8q24 (identified by genome-wide association) has been suggested (Park et al. 2008). When the odds ratios for predisposition alleles are well below 1.5 there is a possibility of interaction (or bias) through an unmeasured environmental factor, as in the context of lung cancer risk and association with 15q, which contains the nicotinic acetylcholine receptor (Chanock and Hunter 2008). Furthermore, the role of gene-environment interactions remains poorly defined, and a reductionist approach to understanding the aetiology of colorectal neoplasia means that few such studies exist. Naturally, common low penetrance susceptibility alleles will individually contribute little to overall risk, and it is likely that environmental ‘modification’ by smoking, exercise, body habitus, diet, etc. will provide a more complete explanation of what drives normal colonic crypts along the pathway to cancer. Indeed, the odds ratios for environmental risk factors are comparable to those of many low penetrance alleles.
It is likely that combining data from genetic and environmental studies will provide clinicians with an increasingly powerful tool to understand an individual patient’s risk and tailor an appropriate management plan, whether this be colonoscopic screening, genetic testing, or lifestyle modification. It has been proposed that these data may be used in future association studies in a two-step process whereby patients are first screened for epidemiological risk factors before entering the genotyping analysis (Murcray et al. 2009).
COloRectal Gene Identification (CORGI) Study
In 1997, the ColoRectal tumour Gene Identification (CoRGI) Study Consortium was formed to ascertain and collect biological samples and data from families segregating colorectal cancer, in order to identify novel predisposition genes. This study, led by Prof Ian Tomlinson, has largely been undertaken in this laboratory by colleagues. Families and individuals are being collected with the following entry criteria:
- Bowel cancer aged < 75 years old
- Colorectal adenoma < 45 years old
- Three or more adenomas at any time
- Severely dysplastic/villous/large (> 1cm) adenoma
- Exclusions: patients with IBD, pathogenic germline mutations, Peutz-Jeghers syndrome or juvenile polyposis.
Families were collected from centres throughout England, Scotland and Ireland.
CORGI 1 – Linkage Analysis: A genome-wide linkage analysis has been performed on 69 families with a history of bowel cancer and/or polyps using the GeneChip Mapping 10K Xba 142 arrays containing 10,204 SNP markers (Kemp et al. 2006). Families in this study had at least 2 individuals (except parent/child) affected. A maximum non-parametric linkage statistic of 3.40 (P=0.0003) was identified at chromosomal region 3q21–q24. The Galway family is the largest pedigree, with over 29 informative meioses, and a decision was taken for it to be studied separately (Chapters 3 and 4).
CORGI 1b: A second similar set of 34 families has been collected. Linkage analysis was performed by colleagues which confirmed linkage at 3q22 (Papaemmanuil, Carvajal-Carmona et al. 2008).
CORGI 1c: Approximately 100 families where siblings are affected are being collected for sib-pair analysis.
CORGI 2 – Genome Wide Association (GWA): CORGI 2 is a GWA study using an Illumina SNP platform on cases with the same entry criteria as CORGI 1 but without a family history. Colleagues initially genotyped 550,163 tag SNPs in 940 individuals with familial colorectal neoplasia and 965 controls using the Illumina Infinium platform (Tomlinson, Webb et al. 2007). In CORGI 2b, approximately 42,000 candidate SNPs with the most significant association in CORGI 2 are being re-tested in a group of ~3000 colorectal cancer patients. Several loci which contain SNPs associated with colorectal cancer susceptibility (at 8q23, 10p14, 11q24, 15q13.3 and 18q21) have recently been identified by colleagues in this cohort (Broderick, Carvajal-Carmona et al. 2007; Tomlinson, Webb et al. 2007; Jaeger, Webb et al. 2008; Tenesa et al. 2008; Tomlinson, Webb et al. 2008). However, no mutations with proven functional relevance have yet been identified at these loci.
CORGI 3 – Candidate gene screening: Patients in the CORGI 2 cohort are being screened for sequence abnormalities in functionally important genes such as those involved in DNA repair, the Wnt pathway, or other genes involved in the aetiology of colorectal neoplasia. Colleagues are also screening the patients included in CORGI 1 and CORGI 2 for gene mutations at the loci identified by linkage or association respectively. Candidate genes EPHB1 and MBD4 at 3q21-24 have been screened for mutations in the CORGI 1 family set but none were found (Kemp, Carvajal-Carmona et al. 2006).
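The CORGI 1 linkage peak at 3q21–q24 reported above (non-parametric linkage statistic 3.40, P=0.0003) can be given a rough sanity check. The sketch below assumes, as an approximation that is not stated in the source, that the statistic behaves like a standard normal deviate under the null hypothesis of no linkage, and computes the corresponding one-sided tail probability. The function name `one_sided_normal_p` is hypothetical.

```python
from math import erfc, sqrt


def one_sided_normal_p(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal deviate Z."""
    return 0.5 * erfc(z / sqrt(2.0))


# Linkage statistic quoted in the text; normal approximation is an assumption.
p_value = one_sided_normal_p(3.40)
print(f"{p_value:.4f}")  # prints 0.0003
```

Under that assumption the tail probability agrees with the reported P value to the precision quoted. It is a pointwise value and takes no account of testing across the whole genome.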
Because of the evidence from the adenoma-to-carcinoma sequence model (Morson 1968; Fearon and Vogelstein 1990), the National Polyp Study (Winawer et al. 1993) and other prospective studies (Dove-Edwin et al. 2005; Dove-Edwin et al. 2006), we know that if polyps are removed during colonoscopy, cancer may be prevented. Thus colorectal cancer is one of the most preventable of all cancers, and some early evidence is emerging that colonoscopic screening may reduce colorectal cancer related mortality (Baxter et al. 2009). However, national colonoscopic screening programs are expensive and stretch the capacity of already busy services, and therefore do not reach the whole population they target. In addition to lifestyle modification advice to reduce environmental risk factors, it may be possible to identify two groups of patients with inherited risk by understanding the underlying molecular aetiology.
(Copyright, Dr Kevin Monahan)