Ku et al. Mod Pathol. 2012;25(8):1055-1068
Recent advances in genotyping and sequencing technologies have provided powerful tools with which to explore the genetic basis of both Mendelian (monogenic) and sporadic (polygenic) diseases. Several hundred genome-wide association studies have so far been performed to explore the genetics of various polygenic or complex diseases, including those cancers with a genetic predisposition. Exome sequencing has also proven very successful in elucidating the etiology of a range of hitherto poorly understood Mendelian disorders caused by high-penetrance mutations. Despite such progress, the genetic etiology of several familial cancers, such as familial colorectal cancer type X, has remained elusive. Familial colorectal cancer type X and Lynch syndrome are similar in that both fulfill certain clinical criteria, but the former is not characterized by germline mutations in DNA mismatch-repair genes. On the other hand, the genetics of sporadic colorectal cancer have been investigated by genome-wide association studies, leading to the identification of multiple new susceptibility loci. In addition, there is increasing evidence to suggest that familial and sporadic cancers exhibit similarities in terms of their genetic etiologies. In this review, we summarize our current knowledge of familial colorectal cancer type X, discuss current approaches to probing its genetic etiology through the application of new sequencing technologies and the use of results from colorectal cancer genome-wide association studies, and explore the challenges that remain to be overcome given the uncertainty of the current genetic model (ie, monogenic vs polygenic) of familial colorectal cancer type X.
Recent developments in high-throughput sequence capture methods and next-generation sequencing technologies have made exome sequencing a viable approach to the identification of pathological mutations, both from a technical standpoint and in terms of being cost-effective.[1–4] The advent of exome sequencing has already contributed significantly toward the identification of new causal mutations (and genes) for a number of previously unresolved Mendelian disorders such as Kabuki syndrome, Miller syndrome, Sensenbrenner syndrome, and Fowler syndrome, to name just a few. Further, exome sequencing has proven to be an effective tool to interrogate the genetic basis of Mendelian disorders in samples derived from both families and unrelated individuals.[5–8] Since the inception of the idea of using exome sequencing as both a discovery and a diagnostic tool for Mendelian disorders, this field has advanced considerably. That progress has been accompanied and aided by other technical advances: computational and statistical approaches to interrogate the myriad variants identified by exome sequencing,[12, 13] including algorithms to detect copy number variants using exome sequencing data, and the idea (and practical demonstration) of using single-nucleotide polymorphism genotypes extracted from exome sequencing data to perform accurate genetic linkage mapping and so reduce the ‘search space’ for genetic variants. As a result, exome sequencing has emerged as a mature analytical approach.
Although major progress has been made in understanding the genetic basis of Mendelian disorders over the past 3 years using exome sequencing, so far only a limited number of studies have interrogated familial forms of cancer, ie, familial pancreatic cancer and hereditary pheochromocytoma (a rare neural crest cell tumor). By harnessing the latest technological advances, Jones et al identified a germline truncating mutation in PALB2 by sequencing the exome of a single patient with familial pancreatic cancer. That this patient might have a familial form of pancreatic cancer was suggested by the fact that his sister had also developed the disease. In a similar manner, mutations in MAX, the MYC-associated factor X gene, were also identified through sequencing the exomes of three unrelated individuals with hereditary pheochromocytoma.
Since 2005, >100 genome-wide association studies have been performed to interrogate the genetic basis of various sporadic or polygenic forms of cancer (such as colorectal, prostate, breast, and lung) for which numerous statistically robust and novel single-nucleotide polymorphisms or genetic loci have been identified.[18, 19] In addition to their polygenic nature, these cancers are multifactorial, involving a complex interaction of multiple genetic and environmental factors. By contrast, little progress has so far been achieved in the context of ‘familial’ cancers (ie, cancers displaying a very evident family history with clustering of multiple affected family members). More specifically, familial forms of cancer typically occur in more individuals in a given family than would be expected by chance alone. Familial cancers are often characterized by their occurrence at a comparatively early age, thereby indicating the potential presence of a gene mutation that increases the risk of cancer. However, familial clustering of cases may also be a sign of a shared environment or lifestyle, or alternatively chance alone. By contrast, sporadic cancers lack any obvious family history of the disease.
The slow progress of research into familial cancer is illustrated, for example, by hereditary diffuse gastric cancer. CDH1 was the first causal gene identified for this cancer, in 1998, and it remains the only known gene underlying hereditary diffuse gastric cancer. However, germline mutations in this gene account for only a proportion of hereditary diffuse gastric cancer cases, suggesting that one or more as-yet-unidentified genes are likely to be responsible for the remaining cases unexplained by CDH1. Similarly, BRCA1 and BRCA2 are the only high-penetrance genes for familial breast cancer, although numerous novel single-nucleotide polymorphisms and genetic loci conferring low-to-moderate risk or effect size (odds ratio <1.5) have been identified by genome-wide association studies of polygenic breast cancer.[22, 23] Some of these common alleles have been reported to modify risk in BRCA1 and BRCA2 mutation carriers. However, so far the results from genome-wide association studies have limited value for individual risk prediction, as compared with the high-penetrance inherited mutations in causal genes for familial breast cancer, which can prompt drastic clinical intervention such as mastectomy. An analysis to evaluate the potential for individualized disease risk stratification based on common single-nucleotide polymorphisms identified by genome-wide association studies in breast cancer came to the conclusion that the clinical utility of single, common, low-penetrance genes for breast cancer risk prediction is currently quite limited.
In the context of familial colorectal cancer, the genetic causes of familial adenomatous polyposis and Lynch syndrome have been well documented; in most instances, they are accounted for by germline mutations in the APC gene and DNA mismatch-repair genes (ie, MSH2, MLH1, MSH6, and PMS2), respectively. For example, ~90% of familial adenomatous polyposis cases are caused by germline mutations in the APC gene. The majority of these mutations introduce a premature stop codon resulting in a truncated protein. Similarly, the MSH2 and MLH1 genes harbor >90% of the germline mutations found in Lynch syndrome patients.[27, 28] By contrast, the genetic etiology of familial colorectal cancer type X remains largely unknown. It is widely anticipated that new insights generated from studies on familial colorectal cancer type X will lead to the molecular characterization of a novel form of familial colorectal cancer which will necessitate the reclassification of subsets of families with a strong history of colorectal cancer.
How to Interrogate the Genetics of Familial Colorectal Cancer Type X?
The nature of the disease determines the study design required to unravel the causal mutations or risk-predisposing variants for familial colorectal cancer type X. However, there is little evidence to show whether familial colorectal cancer type X is a monogenic or polygenic disease or whether it is somewhere in between. The evidence suggesting that familial colorectal cancer type X is a monogenic disease comes mainly from the fulfillment of the Amsterdam Criteria, which state that at least three relatives must have colorectal cancer. However, familial aggregation, with multiple affected members in one family, could also be due to shared non-genetic factors, which would not necessarily be compatible with the monogenic model. Such environmental factors would be expected to interact with multiple genetic risk factors causing colorectal cancer, a multifactorial disease model proposed for polygenic disease. This raises the question as to whether the Amsterdam Criteria are sufficient to support a monogenic basis for familial colorectal cancer type X. Furthermore, some of the clinical features of familial colorectal cancer type X imply that it could have a polygenic basis. This uncertainty about the nature of familial colorectal cancer type X presents substantial challenges in terms of deciding upon an optimal approach to interrogate its genetic basis.
The targeted sequencing of causal genes already implicated in other familial cancers (such as CDH1 (hereditary diffuse gastric cancer), BRCA1 and BRCA2 (familial breast cancer), and the genes underlying hereditary pheochromocytoma) appears to be a worthwhile approach to identify deleterious germline mutations for familial colorectal cancer type X. The rationale is that germline mutations in these genes could underlie different familial cancers, as for example in the case of the PALB2 germline mutations that have been found in both familial pancreatic and breast cancers.[16, 63] Another notable example is provided by the germline mutations in the BRCA2 gene that increase the risk not only of breast and ovarian cancer, but also of pancreatic cancer. This targeted sequencing approach has been greatly aided by high-throughput enrichment methods, which selectively enrich for regions of interest, and by next-generation sequencing technologies. With these technological advances, hundreds of genes can be sequenced far more efficiently than with traditional PCR-based Sanger sequencing. The efficiency of this approach has been exemplified in a targeted sequencing study of germline mutations in 21 tumor suppressor genes for 360 women with inherited ovarian, peritoneal, or fallopian tube carcinoma. This study harnessed the power of the Sure-Select enrichment system and the Illumina sequencing platform to sequence these genes; 24% of the patients were found to carry germline loss-of-function mutations in 12 genes, six of which had not previously been implicated in inherited ovarian carcinoma. Although this targeted approach has limited discovery value, as these genes have already been implicated in causing familial cancers, it could still have some novelty value by identifying germline mutations in known genes for cancers that have not yet been linked to these genes.
This targeted approach can be expanded to include the entire set of exons in all genes in the human genome. Exome sequencing on its own or coupled with linkage analysis has already unravelled multiple new causal mutations and genes for Mendelian disorders.[7, 8] Furthermore, these discoveries were made by exome sequencing fewer than 10 patient samples in most of the studies reported. As such, it is also widely anticipated that exome sequencing will represent a powerful tool to reveal the genetic causes of familial colorectal cancer type X by identifying rare and deleterious or high-penetrance mutations within gene coding regions. However, the appropriate selection of cases will have a key role in determining the success or otherwise of exome sequencing in this context. In addition to requiring fulfillment of the Amsterdam Criteria and the exclusion of germline mutations in mismatch-repair genes, selecting cases with a very early onset of disease, severe clinico-pathological manifestations, or the ‘extreme’ familial colorectal cancer type X phenotypes is expected to enrich for the ‘monogenic’ component and hence enhance our chances of identifying high-penetrance mutations. Recurrent mutations (similar mutations in different samples) or genes harboring several different deleterious mutations (which include single-nucleotide variants and small indels) across multiple samples can then be prioritized for further studies using a larger sample of cases.
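The prioritization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`sample`, `gene`, `variant`, `deleterious`), the gene and sample names, and the `min_samples` cut-off are all hypothetical, and the deleteriousness call is assumed to come from an upstream annotation step that is not shown.

```python
from collections import defaultdict

def prioritize_genes(variants, min_samples=2):
    """Rank genes by the number of distinct cases carrying a putatively
    deleterious variant (hypothetical record layout, see lead-in)."""
    samples_by_gene = defaultdict(set)
    variants_by_gene = defaultdict(set)
    for v in variants:
        if not v["deleterious"]:
            continue  # ignore variants not flagged as deleterious upstream
        samples_by_gene[v["gene"]].add(v["sample"])
        variants_by_gene[v["gene"]].add(v["variant"])
    ranked = [
        (gene, len(samples), len(variants_by_gene[gene]))
        for gene, samples in samples_by_gene.items()
        if len(samples) >= min_samples
    ]
    # Most cases first, then most distinct variants, then gene name.
    ranked.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return ranked

# Invented toy data: GENE_A carries a recurrent mutation plus a second one.
example_variants = [
    {"sample": "case1", "gene": "GENE_A", "variant": "c.100C>T", "deleterious": True},
    {"sample": "case2", "gene": "GENE_A", "variant": "c.100C>T", "deleterious": True},
    {"sample": "case3", "gene": "GENE_A", "variant": "c.250delG", "deleterious": True},
    {"sample": "case1", "gene": "GENE_B", "variant": "c.5A>G", "deleterious": False},
    {"sample": "case2", "gene": "GENE_B", "variant": "c.77G>A", "deleterious": True},
    {"sample": "case3", "gene": "GENE_C", "variant": "c.9T>C", "deleterious": True},
    {"sample": "case4", "gene": "GENE_C", "variant": "c.12dupA", "deleterious": True},
]
ranked_genes = prioritize_genes(example_variants)
print(ranked_genes)
```

Each returned tuple holds the gene, the number of cases affected, and the number of distinct deleterious variants, so both recurrent mutations and genes hit by several different mutations rise to the top of the list.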
On the other hand, if we assume that familial colorectal cancer type X has a polygenic component, then genome-wide association studies would represent the ideal approach to identify common single-nucleotide polymorphisms associated with this disease. Further, whole-genome genotyping arrays would also allow copy number variants to be investigated to a certain extent for their associations with familial colorectal cancer type X within a single genome-wide association study. High-density genotyping arrays have been used to identify copy number variants in a cohort of 41 colorectal cancer patients who were below 40 years of age at diagnosis and/or who exhibited an overt family history. Multiple copy number variants, encompassing genes such as CDH18, GREM1, and BCR, were identified in six patients, as well as two deletions encompassing two microRNA genes, hsa-mir-491/KIAA1797 and hsa-mir-646/AK309218. Interestingly, these copy number variants had not previously been reported in relation to colorectal cancer predisposition, nor had they been encountered in large control cohorts. This illustrates the potential power of copy number variant investigation to identify novel causal or susceptibility genes or genetic loci for both familial and sporadic colorectal cancers. In another interesting observation, multiple genomic aberrations, including copy number gains and losses in different chromosomes, have also been detected in 30 mismatch repair-proficient familial colorectal cancers. In particular, the frequency of 20q gain was remarkably increased when compared with sporadic colorectal cancer, suggesting that the 20q gain is involved in the genetic etiology of these mismatch repair-proficient familial colorectal cancers. The finding that most of these genomic aberrations were also observed in sporadic colorectal cancer further suggests that familial and sporadic colorectal cancers could share genetic predisposition to a certain extent.
It is, however, noteworthy that genome-wide association studies represent an indirect association study design, based on linkage disequilibrium, to detect disease-causing variants, as compared with direct sequencing. To achieve the statistical power needed to detect common single-nucleotide polymorphisms conferring small effect sizes (odds ratio <1.5) at the required significance threshold, several thousand cases and controls are required for the initial genome-wide genotyping and subsequent replication studies. Although the cost of genotyping arrays is steadily falling, a hefty investment is still required to analyze thousands of samples. In addition to this cost, collecting an adequate sample of patients to embark on a genome-wide association study is a considerable challenge if this is to be achieved without an international consortium (because of the rarity of familial colorectal cancer type X as compared with sporadic colorectal cancer). The polygenic basis of familial colorectal cancer type X is still a speculative issue. Bearing in mind this uncertainty, an alternative is to leverage the results from genome-wide association studies of colorectal cancer by genotyping the robustly associated single-nucleotide polymorphisms in a familial colorectal cancer type X cohort. This approach might be more feasible in terms of cost-effectiveness and sample size, because it does not need the stringent significance threshold that accounts for several hundred thousand single-nucleotide polymorphisms. Avoiding the multiple-testing penalty imposed in genome-wide association studies makes it all the more attractive to test, in familial colorectal cancer type X, the single-nucleotide polymorphisms that those studies have already identified.
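The effect of the multiple-testing penalty on sample size can be made concrete with a rough calculation. The sketch below is illustrative only and rests on several assumptions that are not taken from the studies discussed here: a Bonferroni correction at a family-wise error rate of 0.05, 500,000 single-nucleotide polymorphisms for a genome-wide scan versus a hypothetical panel of 20 candidate polymorphisms, a control allele frequency of 0.30, an allelic odds ratio of 1.2, 80% power, equal numbers of cases and controls, and the standard normal-approximation formula for comparing two allele frequencies.

```python
from math import ceil, sqrt
from statistics import NormalDist

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

def case_allele_freq(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    return odds_ratio * control_freq / (1 + control_freq * (odds_ratio - 1))

def cases_needed(control_freq, odds_ratio, alpha, power=0.8):
    """Approximate number of cases (with as many controls) for a two-sided
    allelic test, using the normal approximation for two proportions."""
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    alleles = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2 / (p1 - p0) ** 2
    return ceil(alleles / 2)  # each individual contributes two alleles

# Assumed numbers of tests; both are illustrative, not from the source.
gwas_threshold = bonferroni_threshold(0.05, 500_000)
candidate_threshold = bonferroni_threshold(0.05, 20)

n_gwas = cases_needed(0.30, 1.2, gwas_threshold)
n_candidate = cases_needed(0.30, 1.2, candidate_threshold)
print(gwas_threshold, candidate_threshold, n_gwas, n_candidate)
```

Under these assumptions the candidate-polymorphism design needs markedly fewer cases than the genome-wide scan for the same effect size, which is the argument made above; the exact figures depend heavily on the assumed allele frequency and odds ratio.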
One may speculate that if familial colorectal cancer type X has a polygenic component, some of these polymorphisms should also be associated with familial colorectal cancer type X, which would then warrant a comprehensive genome-wide association study for familial colorectal cancer type X in the future. This speculation appears reasonable because common shared single-nucleotide polymorphisms or genetic loci have been found in several different cancers. There have been several examples of the practical utility of genome-wide association study results in the context of familial cancers. These studies have provided evidence to suggest that low-penetrance variants may explain the increased cancer risk in familial colorectal cancer[106–108] and in familial testicular germ cell tumors.
Finally, the genes or genetic loci implicated in colorectal cancer by genome-wide association studies can be captured and sequenced. This targeted sequencing approach is very cost-effective as up to 96 samples can be multiplexed through barcoding for massively parallel sequencing. This targeted sequencing approach will interrogate both rare variants and common single-nucleotide polymorphisms in the loci identified by genome-wide association studies. The promise of this approach in unravelling rare variants in loci implicated by genome-wide association studies has already been demonstrated.[51,53–55] For example, deep resequencing of such loci has identified independent rare variants associated with inflammatory bowel disease.
Perspectives and Conclusions
The genetic and clinical differences between Lynch syndrome and familial colorectal cancer type X have been well documented. However, the genetic etiologies of familial colorectal cancer type X remain to be determined. There is also a paucity of evidence to indicate one way or the other whether familial colorectal cancer type X is a monogenic or a polygenic disease. On the other hand, the genetics of sporadic/polygenic colorectal cancer have been comprehensively investigated by >10 genome-wide association studies over the past few years. One striking observation is the sharing of common single-nucleotide polymorphisms or genetic loci across different cancers. It is therefore reasonable to speculate that if familial colorectal cancer type X has a polygenic basis, some of the single-nucleotide polymorphisms identified by genome-wide association studies as conferring risk of colorectal cancer might be expected to show associations with familial colorectal cancer type X as well. Given the expense and logistic challenges involved in collecting a large number of familial colorectal cancer type X cases to embark on a genome-wide association study, together with the uncertainty of the disease model, we believe that the genotyping of genome-wide association study-identified single-nucleotide polymorphisms in familial colorectal cancer type X would be a more feasible first approach to explore the genetic etiology of this disease. However, given the low incidence of familial colorectal cancer type X (ie, only ~2–3% of colorectal cancer families meet the Amsterdam Criteria and about half of these are Lynch syndrome cases), collecting an adequately large sample is difficult, especially for studying the association of single-nucleotide polymorphisms with modest effect sizes. Thus, national or international consortia involving many centers are likely to be needed to recruit large numbers of patients.
Alternatively, the genes or loci identified by genome-wide association studies could be investigated using a targeted sequencing approach to unravel rare variants of larger effect size.
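The recruitment problem implied by the incidence figures quoted earlier in this section (~2–3% of colorectal cancer families meeting the Amsterdam Criteria, about half of them Lynch syndrome) can be illustrated with simple arithmetic. In the sketch below, the target of 1000 familial colorectal cancer type X families is a purely hypothetical figure chosen for illustration, and the function name is our own.

```python
from math import ceil

def families_to_screen(target_cases, amsterdam_fraction, lynch_share=0.5):
    """Colorectal cancer families that must be screened to expect
    `target_cases` families with familial colorectal cancer type X."""
    # Amsterdam-positive families that are not explained by Lynch syndrome.
    fcctx_fraction = amsterdam_fraction * (1 - lynch_share)
    return ceil(target_cases / fcctx_fraction)

# Hypothetical target of 1000 families, at either end of the 2-3% range.
best_case = families_to_screen(1000, 0.03)
worst_case = families_to_screen(1000, 0.02)
print(best_case, worst_case)
```

Even at the upper end of the quoted range, tens of thousands of colorectal cancer families would have to be screened to reach such a target, which is why multi-center consortia are likely to be needed.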
One of the limitations of genome-wide association studies is that they are based upon an indirect association study design, which is reliant on linkage disequilibrium to identify the functional disease variants. As a result, the surrogate markers (ie, the associated single-nucleotide polymorphisms) identified by genome-wide association studies generally lack functional significance. Furthermore, to enhance the statistical power, genome-wide association studies have tended to lump all colorectal cancers into a single disease group, even though it is well recognized that colorectal cancers are inherently heterogeneous. These challenges have led to the conceptualization of ‘molecular pathological epidemiology’, which is a relatively new field of epidemiology based upon the molecular classification of cancer. It is a multidisciplinary field involving the investigation of the interrelationship between exogenous and endogenous (eg, genetic) factors, tumoral molecular signatures, and tumor progression. Further, integrating genome-wide association studies with molecular pathological epidemiology allows examination of the relationship between susceptibility alleles identified by genome-wide association studies and specific molecular alterations/subtypes, which can help to elucidate the function of these alleles and provide insights into whether the detected susceptibility alleles are truly causal. Although there are challenges, molecular pathological epidemiology has unique strengths, and can provide insights into the pathogenic process.
In addition, exome sequencing of multiple ‘well-selected’ cases could be performed, assuming a monogenic basis in which high-penetrance mutations are predicted to underlie the genetic etiology of familial colorectal cancer type X. Exome sequencing of families with multiple affected individuals also represents a promising study design. This family-based design has the advantage that it allows for the genetically heterogeneous nature of familial colorectal cancer type X. Comparing unrelated individuals or probands from different families to identify ‘common/shared’ putative pathological variants or genes harboring putative pathological variants might not be a successful strategy for genetically heterogeneous diseases. However, it still depends on the degree of genetic heterogeneity (ie, allelic heterogeneity versus locus heterogeneity) characterizing the disease and this remains unknown. Although the family design is robust with respect to genetic heterogeneity (comparing affected and unaffected members in a family), one must recognize that it could also be problematic because the penetrance of disease mutations for familial colorectal cancer type X is likely to be lower than that for Lynch syndrome.
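The family-based comparison of affected and unaffected members, and the way reduced penetrance complicates it, can be sketched as a simple set operation. The pedigree identifiers, variant labels, and the `allow_unaffected_carriers` switch below are hypothetical and serve only to illustrate the reasoning; a real analysis would also need to account for genotype quality, age at assessment, and phenocopies.

```python
def segregating_variants(genotypes, affected, unaffected,
                         allow_unaffected_carriers=False):
    """Return variants carried by every affected family member.

    genotypes maps a member ID to the set of variant IDs that member carries.
    With allow_unaffected_carriers=False, variants seen in any unaffected
    member are discarded, which implicitly assumes complete penetrance.
    """
    if not affected:
        raise ValueError("at least one affected member is required")
    shared = set.intersection(*(genotypes[m] for m in affected))
    if allow_unaffected_carriers:
        return shared
    in_unaffected = set().union(*(genotypes[m] for m in unaffected))
    return shared - in_unaffected

# Invented toy pedigree: v2 is shared by all affected members but is also
# carried by the unaffected relative II-2.
family = {
    "II-1": {"v1", "v2", "v3"},
    "II-3": {"v1", "v2"},
    "III-2": {"v1", "v2", "v4"},
    "II-2": {"v2", "v5"},
}
affected = ["II-1", "II-3", "III-2"]
unaffected = ["II-2"]

strict = segregating_variants(family, affected, unaffected)
relaxed = segregating_variants(family, affected, unaffected,
                               allow_unaffected_carriers=True)
print(sorted(strict), sorted(relaxed))
```

The strict filter discards `v2` because an unaffected relative carries it. If penetrance is incomplete, as is suspected for familial colorectal cancer type X, that relative may simply be a non-penetrant carrier, so the strict filter risks removing a genuine causal variant; the relaxed filter keeps it at the cost of a longer candidate list.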
Moving forward, it is arguable that whole-genome sequencing should be considered instead of exome sequencing, as the cost differential between the two approaches (given a small patient sample size) would not be substantial, and because the former approach will generate genetic data for the entire genome rather than just the 1–2% covered by exome sequencing. However, one should select the study design that best fits the hypothesis, here that rare deleterious mutations in coding regions underlie the genetic etiology of a Mendelian disorder or familial cancer. So far, all the discoveries made by whole-genome sequencing for Mendelian disorders could also have been achieved using exome sequencing. Furthermore, the genetic variants in most of the non-coding regions revealed by whole-genome sequencing remain ‘uninterpretable’ biologically. From a practical (rather than theoretical) point of view, whole-genome sequencing still presents a very substantial technical challenge as well as a challenge in terms of analyzing and interpreting the sequence data generated.
The disease models underpinning multiple familial cancers such as familial nasopharyngeal carcinoma, familial testicular germ cell tumor, familial chronic lymphocytic leukemia, and familial colorectal cancer (familial colorectal cancer type X) remain contentious as the high-penetrance mutations are yet to be identified. By contrast, multiple low-penetrance variants that confer an effect size of odds ratio <1.5 have been revealed through genome-wide association studies for the sporadic cases of these cancers; interestingly, some of these single-nucleotide polymorphisms have also been found to be associated with the familial cases (nasopharyngeal carcinoma, testicular germ cell tumor,[116, 117] chronic lymphocytic leukemia, and colorectal cancer). In the context of familial colorectal cancer type X, we believe that the disease model and its genetic basis are likely to become more apparent when the approaches that we have outlined and discussed are applied in practice. This should facilitate the iterative interrogation of the genetics of familial colorectal cancer type X and other familial cancers of a similar nature before embarking on either a comprehensive genome-wide association study or a whole-genome sequencing approach.