Genetic association and GWAS studies: Principles and applications

Literature review current through: Jan 2024.
This topic last updated: May 20, 2022.

INTRODUCTION — Interest in the genetic determinants of disease originated with Gregor Mendel's observations on the genetics of the pea in the 1860s. Subsequent studies have identified many of the genes responsible for "Mendelian" diseases, conditions that follow a clear familial pattern. However, diseases inherited in a Mendelian fashion (eg, Huntington disease and cystic fibrosis) are rare.

The Human Genome Project has generated growing interest in genetic contribution to "complex" diseases. Such diseases combine some familial predisposition with a large environmental contribution. Examples include cardiovascular disease, diabetes, asthma, cancer, and obesity.

This topic will discuss the principles and clinical applications of genetic association studies in the elucidation of the genetic basis for common diseases with complex genetic components. Additional discussions of modes of inheritance and a glossary of genetic terms are presented separately:

Monogenic disorders – (See "Inheritance patterns of monogenic disorders (Mendelian and non-Mendelian)".)

Complex trait disorders – (See "Principles of complex trait genetics".)

Glossary of terms – (See "Genetics: Glossary of terms".)

TERMINOLOGY AND STUDY DESIGN

Genetic association studies — Genetic association studies are analogous to traditional epidemiologic association studies. Instead of seeking association between traditional risk variables (eg, hypertension) and disease outcomes (eg, stroke), a genetic association study looks for an association between a genetic variable and a specified condition.

Single nucleotide polymorphisms — The genetic variable most commonly studied is a single nucleotide polymorphism (SNP, pronounced "snip"). A SNP is a base pair change that occurs in at least 1 percent of the population. SNPs are not clearly deleterious, in contrast to mutations, which occur less frequently and generally have an adverse effect on protein function. (See "Basic genetics concepts: DNA regulation and gene expression", section on 'Genetic variation'.)

Study design — The usual study design for association studies is case-control or nested case-control, where controls are selected from the general population. Other designs, using family-based data (eg, pedigrees, parent-offspring trios, or sibling pairs) are becoming less popular due to difficulties in obtaining sufficient sample size and the need for specialized statistical methods for analysis [1].

Candidate gene approach — Earlier gene association studies employed a "candidate" gene approach in which a genetic variant of interest was selected on the basis of the known biology of a disease. The genetic variant may be of interest because of its presumed biologic function, or association with a disease in previous studies. In this approach, one or a small number of known variants are genotyped, usually by PCR methods, in a number of cases and controls.

Genome-wide approach (GWAS) — Genome-wide association studies (GWAS, pronounced "gee-wass") test hundreds of thousands of genetic variants simultaneously (figure 1). GWAS use microarray technology, which "arrays" a large series of test sequences on a solid surface. Technological advances with microarrays have enabled researchers to genotype an individual for between 500,000 and 4 million SNPs on a platform no bigger than a microscope slide.

The principle behind microarrays is that DNA from an individual is hybridized against an array of short oligonucleotide probes (ie, DNA sequences) that are immobilized on a surface (the "array"). Sequences on the array are chosen to assay the regions of the genome with the most variation. If the individual being tested has the DNA sequence that matches a probe on the array, the DNA binds and is detected (figure 2).

Most GWAS are case-control studies, where a group of cases and a group of separate controls are gathered, DNA is isolated, and microarray data is obtained on each individual. The case-control design facilitates large sample sizes and therefore greater power, which is particularly important for detecting potentially small genetic effects. GWAS have also been performed using longitudinal cohorts, with cases and controls "nested" within the cohort.

At the most basic level, the distribution of the three genotypes for each SNP (AA, Aa, aa) is compared between cases and controls using a chi-square test. This is repeated for each of the 500,000 or more SNPs on the array. As human genetic variation is increasingly catalogued, for example in databases such as "HapMap" and "1000 Genomes," this information can be used to impute genetic variants that were not directly measured by the microarray. In other words, the 500,000 to 2 million SNPs that are directly assayed are used to make a best guess on the other SNPs that were not measured, often yielding up to 6 million SNPs or more to test for association. With reference panels such as the Haplotype Reference Consortium (HRC), imputation can yield almost 40 million SNPs per person.
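
The single-SNP comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used by any particular study: the function name and the genotype counts are hypothetical, and it assumes every genotype class is observed at least once so that no expected count is zero.

```python
import math


def genotype_chi_square(case_counts, control_counts):
    """Pearson chi-square test on a 2 x 3 table of genotype counts (AA, Aa, aa)."""
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    # A 2 x 3 table has (2 - 1) * (3 - 1) = 2 degrees of freedom. For 2 df the
    # chi-square upper-tail probability has the closed form exp(-x / 2).
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value


# Hypothetical genotype counts (AA, Aa, aa) for a single SNP.
cases = (300, 500, 200)
controls = (400, 450, 150)

chi2, p_value = genotype_chi_square(cases, controls)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2e}")
```

In a GWAS this calculation would be repeated for every SNP on the array, which is why the significance threshold discussed under 'Multiple comparisons' below is far stricter than 0.05.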

The number of association studies in the medical literature has expanded dramatically. Between 2007 and 2009, GWAS identified nearly 1000 SNPs associated with a range of human traits and common diseases [2]. By March 2011, that number had risen to almost 4000 SNPs for over 200 conditions [3]. The growth has continued, and a manually curated catalogue of GWAS is available to help readers navigate the expanding body of data [4].

Despite the issue of "missing heritability" (see 'Missing heritability' below), GWAS have had some notable successes identifying genes of large effect. An example is the relationship between a complement factor H (CFH) polymorphism and age-related macular degeneration (AMD). This polymorphism is involved in regulation of the alternative complement pathway that results in increased inflammation. Individuals with one allele coding for a histidine substitution for tyrosine in position 402 of the CFH protein (CFH Y402H) are at increased risk for AMD. Meta-analysis indicates a possible multiplicative model in that each copy of the C allele at this locus (coding for histidine) increases the risk of AMD by approximately 2.5-fold compared to the T allele (ie, the TC heterozygote group has an odds ratio [OR] for AMD of 2.5 and the CC homozygote has an odds ratio of approximately 6 relative to the TT genotype group) [5].

GENETIC DETERMINANTS OF COMPLEX TRAITS

Common disease-common variant hypothesis — The basic assumption regarding genetic determinants of complex disease is that common variants in many genes will each lead to a small rise (or fall) in the risk of disease, and that the overall risk of disease is determined by the combination of multiple variants and environmental exposures [6]. The single nucleotide polymorphisms (SNPs) chosen for inclusion on microarrays are usually the more common variants in the population.

A competing theory is "common disease-rare variant," in which it is anticipated that most common diseases will be due to multiple rare variants (<1 percent), perhaps even unique mutations. If this is the case, GWAS would be less useful, because they are designed to reveal associations that occur commonly; and one would need to move to whole genome sequencing to detect rare variants associated with a disease. The technology for whole genome sequencing has advanced quickly and has become competitively priced, and hence it is becoming more common. (See "Next-generation DNA sequencing (NGS): Principles and clinical applications".)

In reality, the dichotomy between common and rare variants is not absolute, and results from one type of association will likely complement the other. As an example, sequencing of the cystic fibrosis gene revealed that one major allele, delta F508, accounts for 67 percent of disease in Europe, but the remainder is due to hundreds of rarer disease alleles/mutations [7]. A study from 2018 indicated that many of the signals seen in complex diseases overlapped with the signals seen for Mendelian diseases that share the same phenotype; it seems likely therefore that common variants with small effects in a particular gene could explain the disease risk for a complex disease and that rare mutations with very large effects in that same gene could explain the disease risk for a Mendelian version of that disease [8].

GWAS have led to the successful identification of many genes involved in complex disease, such as coronary artery disease, type 2 diabetes, stroke, multiple sclerosis, breast cancer, bipolar disorder, rheumatoid arthritis, and Alzheimer's disease [9-17]. These studies largely support the hypothesis that multiple genes, each conferring small relative risks (ie, 1.2-1.8), collectively contribute to increased disease risk. Other GWAS results have shed light on the pathogenesis of a disease process (eg, the role of melatonin in type 2 diabetes) [18,19].

Missing heritability — Despite many successes, GWAS have not explained as much of the variance in complex traits as was originally expected. Heritability studies look at the variance in a trait and use family relationships (eg, twins and siblings) to partition how much of the variance is due to genetic factors and how much is due to environmental factors. As an example, studies have estimated that height is 80 to 90 percent "heritable," but GWAS and meta-analyses of GWAS on height have turned up more than 40 genes, accounting for less than 5 percent of the variance [20,21]. This discrepancy between the numerator (amount of heritability explained from GWAS SNPs) and denominator (amount of total heritability from family/population data) has been termed "missing heritability" [2,22].

There are several hypotheses about what may account for the "missing heritability":

Some of the variation may be explained by rare alleles that will only be detected by whole genome sequencing, rather than SNPs. However, results from next-generation sequencing (NGS) studies (see "Next-generation DNA sequencing (NGS): Principles and clinical applications") do not always find major new genetic associations or explain further heritability [23]. In many cases the rare variants discovered have been "tagged" by common variants, and the extra sequencing has not revealed much additional information. Debate remains about whether rare variants are effectively tagged by common variants or whether they are independent. This debate is made more confusing by the finding that the answer depends on which measure of linkage disequilibrium is used, as different measures can give conflicting answers [24]. In other cases, NGS does indeed identify multiple new variants that increase the explained heritability [25].

Some of the variation may be explained by copy number variants (CNVs). CNVs are polymorphisms that consist of deletions or duplications of whole segments of DNA, spanning at least 1000 bases [26]. Although CNVs can show association for some diseases where SNPs have not yielded any signal (eg, autism) [27], it seems unlikely that common CNVs will explain most of the unaccounted heritability for the majority of complex traits [28]. An investigation of the association of CNVs with eight complex diseases found that most CNV signals were in areas that had already been highlighted by SNP associations, and most CNVs were "tagged" by SNPs [29]. (See "Genomic disorders: An overview", section on 'Copy number variations'.)

Current methodology may not be adequate to detect gene/gene interactions (epistasis). This is particularly true if each genetic variant alone shows no effect but shows an effect in the presence of the other variant. Modelling indicates that gene/gene and higher order interactions could account for missing heritability, but would not have been detected due to insufficient sample size in the studies to date [30].

Heritability may also be accounted for by epigenetic changes (eg, methylation of DNA, modification of histone proteins). There is increasing evidence that these non-DNA changes can also be passed on from generation to generation, potentially through various RNAs, challenging the traditional dogma that sperm carries only DNA [31,32]. Alternatively, epigenetic changes influence gene expression and can explain some of the variation in human disease not captured purely by sequence-based methods [33]. (See "Principles of epigenetics".)

Others, however, maintain that there is no "missing heritability," arguing that estimates of heritability are complex and may have been interpreted simplistically. High heritability does not necessarily mean genetic determination or genes of large effect; it may reflect a large component of shared environment as well [34]. For example, a study looking at within- and between-family polygenic risk scores found that a substantial proportion of within-family heritability was explained by socioeconomic status; this raises the possibility that shared environment may be underestimated and shared genetics overestimated by current methods [35].

Overestimates of total heritability may have created "phantom heritability" (presumed genetic contributions that could not be identified even if all causative genetic variants were identified) [30]. This line of argument is supported by the inability to explain all of the genetic variance in a trait even when thousands of SNPs are evaluated. For example, one model using nearly 30,000 SNPs could account for 45 percent of the variance in height [36]. Fitting all genotyped SNPs explains a substantial proportion, but not a majority, of the heritability for such complex phenotypes as height, body mass index (BMI), von Willebrand factor levels, and QT interval [37].

The dichotomy between the common variant versus rare variant hypotheses is increasingly moot. It is becoming clear that the many genetic variants responsible for a disease span the whole spectrum of frequency from common to rare, and the spectrum of effect sizes ranges from small to large; together, it is the combined effect of hundreds to millions of genetic variants that determines disease risk, not to mention the environmental variables that also influence risk [38]. Chasing the "missing" heritability may end up detracting from the more fruitful work of identifying the etiologic pathways involved in a disease or identifying new "druggable" targets for therapeutic development.

JUDGING STUDY VALIDITY — It is increasingly necessary for physicians to become familiar with GWAS data and how to gauge validity of these studies.

Potential biases — There are multiple potential biases for case-control studies in traditional epidemiology. The potential for bias is less, though not absent, in genetic studies. In contrast to most environmental exposures, the genetic "exposure" is not chosen by the participant, doesn't vary with age or calendar year, is not subject to recall bias, and is not influenced by the disease or treatment.

One potential bias that cannot be avoided, however, is survivor bias, in that those who are included in a case-control study are those who have survived the initial "insult" (figure 3). As an example, in a study of stroke, results may differ between a case-control and cohort GWAS if certain genes are associated with initial stroke severity, causing early mortality before recruitment into a case-control study.

Validity framework — Many groups have published guidelines for critical appraisal of genetic association studies [39,40]. One framework, from the "Users' Guides to the Medical Literature" series, asks the following questions [41,42]:

Was the disease phenotype properly defined and accurately recorded by someone blind to the genetic information?

Have any potential differences between disease and non-disease groups, particularly ancestry, been properly addressed?

Was measurement of the genetic variants (ie, genotyping) unbiased and accurate?

Do the genotype proportions observe Hardy-Weinberg equilibrium?

Have the investigators adjusted for multiple comparisons?

Are the results consistent with other studies?

The issues raised in these questions are as follows:

Phenotyping — What may seem like a unique disease (eg, ischemic stroke), may be a heterogeneous group of diseases (eg, large vessel stroke, lacunar stroke, and embolic stroke), or may be variously defined (eg, clinical signs only, or CT/MRI imaging confirmation). Investigators may selectively report only the disease definition or subgroup that yields a significant association [43].

Alternatively, an apparent single disease entity may represent genetically separate but clinically similar diseases ("genetic heterogeneity"), as seen with epilepsy [44]. Including diseases with different genetic backgrounds may obscure a true genetic association.

To avoid bias, those who are doing the phenotyping should be blind to the results of the genotyping and vice versa. The possibility of bias is greater for candidate gene studies where single nucleotide polymorphisms (SNPs) are typed manually in contrast to GWAS in which the genotype identification is automated.

Ancestry and comorbidities — Although genetic association studies are less prone to traditional confounding than other case-control studies, there are two potential sources of bias:

Ethnicity or racial mix — This particular form of confounding, referred to as "population stratification," occurs when disease and non-disease populations include a different ethnic/racial mix. If the likelihood of developing the condition of interest varies with ancestry, then any SNP with a different allele frequency between ancestral groups will appear to be linked (spuriously) to the disease (figure 4).

As with traditional confounders, the way to correct for this is to measure the confounder and adjust for it in the analysis or to stratify the analysis. Ethnic subgroups may be identified by self-report or by statistical analysis of SNPs themselves [45,46]. As an example, a spurious association between the CYP3A4-V polymorphism and prostate cancer in African-Americans disappeared when results were adjusted for additional genetic markers associated with ancestry in the population studied [47].
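
A small numeric sketch may help show how population stratification produces a spurious signal and how stratifying removes it. All counts below are hypothetical and were chosen so that the variant has no effect within either ancestry group; the group labels and function name are illustrative only.

```python
def odds_ratio(case_carrier, case_noncarrier, control_carrier, control_noncarrier):
    """Odds ratio for carrying the variant, cases versus controls."""
    return (case_carrier * control_noncarrier) / (case_noncarrier * control_carrier)


# Hypothetical counts per ancestry group:
# (case carriers, case noncarriers, control carriers, control noncarriers).
# Group 1 has both a higher carrier frequency and a higher share of cases.
strata = {
    "ancestry group 1": (80, 20, 40, 10),
    "ancestry group 2": (10, 40, 40, 160),
}

# Stratified analysis: one odds ratio per ancestry group.
for name, counts in strata.items():
    print(name, "OR =", odds_ratio(*counts))

# Pooled analysis that ignores ancestry: add the tables cell by cell.
pooled = tuple(sum(cell) for cell in zip(*strata.values()))
print("pooled OR =", odds_ratio(*pooled))
```

Within each group the odds ratio is 1, yet the pooled table suggests a strong association, because ancestry is related both to carrier frequency and to case status.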

Comorbidities/associated phenotypes — An apparent association between a genotype and condition of interest may represent a direct association between the genotype and another condition, with the disease of interest being linked to this other condition. As an example, two GWAS showed an association between type 2 diabetes and a SNP in the FTO (fat mass and obesity associated) gene [10,48]. These studies selected diabetic patients and controls irrespective of their body mass index (BMI); another study that matched diabetic patients and controls on BMI showed no association. Thus, obesity is associated with diabetes and the initial SNP was thought to be linked with diabetes when in fact it was linked with the associated obesity.

Accuracy of genotyping — Genotyping error is a threat to the validity of genetic association studies. Genotyping errors may arise if there is a problem with the DNA samples or with the technology that is employed to identify alleles [49].

DNA samples may differ between diseased and non-diseased participants in ways that lead to inaccuracies in genotyping. As an example, in a GWAS for type 2 diabetes, blood used for the control population was stored from a cohort in 1958 while samples from patients with known diabetes were from the present day. The older blood resulted in genotyping errors that led to false positive SNP associations [50].

Genotyping error rates vary widely, from <1 to 30 percent [51]; rates up to a few percent are common in even the best studies [52]. It is common practice for investigators to cull SNPs that do not have a high call rate (eg, 95 to 98 percent). Call rate refers to the percent of DNA samples that can be genotyped. This increases the validity of the data but does not ensure that the genotype information is correct. The genotype may be misidentified as, for example, when one of the alleles in a heterozygote is harder to identify than the other, resulting in a heterozygote being mislabeled a homozygote. Checking for Hardy-Weinberg equilibrium may be one way of detecting genotyping problems. (See 'Hardy-Weinberg equilibrium' below.)
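
The call-rate filter described above can be sketched as follows. This is an illustrative example only: the SNP identifiers and genotype data are hypothetical, genotypes are coded as the number of copies of one allele (0, 1, or 2) with None marking a failed call, and each SNP is assumed to have at least one sample.

```python
def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call (None = no call)."""
    called = sum(1 for genotype in genotypes if genotype is not None)
    return called / len(genotypes)


def filter_by_call_rate(snp_genotypes, min_call_rate=0.95):
    """Keep only SNPs whose call rate meets the chosen threshold."""
    return {
        snp: calls
        for snp, calls in snp_genotypes.items()
        if call_rate(calls) >= min_call_rate
    }


# Hypothetical data for 20 samples at two SNPs.
snps = {
    "snp_1": [0, 1, 2, 1] * 5,        # 20 of 20 samples called
    "snp_2": [0, 1, None, 1, 2] * 4,  # 16 of 20 samples called
}

kept = filter_by_call_rate(snps, min_call_rate=0.95)
print(sorted(kept))
```

As the text notes, passing this filter only shows that a genotype was assigned, not that the assigned genotype is correct.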

Hardy-Weinberg equilibrium — In the same way that most continuous variables in medicine observe a normal distribution, most allele distributions observe what is called Hardy-Weinberg equilibrium (HWE). This describes the steady state where there are no selective forces (eg, mutation, inbreeding, selective survival) acting on a particular locus or gene. The Hardy-Weinberg law states that if there are two alleles (named A and a) at a particular locus, with frequency p and q respectively, then after one generation of random mating, the genotype frequencies of the AA, Aa, and aa groups in the population will be $p^2$, $2pq$, and $q^2$, respectively. Given that there are only two alleles possible, A or a, then:

$p + q = 1$,

and

$p^2 + 2pq + q^2 = 1$.

HWE is commonly used as a quality measure. It has become general practice, in a genetic association study, to check whether the genotype frequencies at a particular SNP observe HWE proportions. Results are considered to be consistent with HWE when the p-value from the test for deviation is >0.05; a p-value <0.05 suggests a departure from HWE. HWE calculators are available online [53,54]. For a cohort study, HWE should be tested in the whole study population, whereas for a case-control study, it should be tested in the controls, since these are supposedly representative of the general population.

However, HWE is non-specific and may be insensitive [55]. Genotyping error and population stratification are two of many factors that may upset HWE proportions. Other factors that may affect HWE include new mutations, inbreeding, or a selective advantage of one allele over another.
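
A check for HWE can be sketched as a chi-square goodness-of-fit test that compares observed genotype counts with those expected from the estimated allele frequencies. This is one common approach and an assumption here, since the text does not specify which test the online calculators use; the genotype counts are hypothetical, and the locus is assumed to carry both alleles so that no expected count is zero.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit test for Hardy-Weinberg proportions."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # estimated frequency of allele A
    q = 1 - p                        # estimated frequency of allele a

    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # 1 degree of freedom: 3 genotype classes - 1 - 1 estimated allele frequency.
    # For 1 df the chi-square upper-tail probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical control group in exact HWE proportions (p = 0.6).
print(hwe_chi_square(360, 480, 160))

# Hypothetical control group with a heterozygote deficit, as might occur if
# heterozygotes were being miscalled as homozygotes.
print(hwe_chi_square(400, 400, 200))
```

A small p-value here indicates a departure from HWE proportions; it does not identify the cause, which may be genotyping error, population stratification, or one of the other factors listed above.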

Multiple comparisons — The usual p-value for judging statistical significance in traditional epidemiological studies (0.05) is intended for a single comparison. The 0.05 threshold means that a result as extreme or more extreme than the one seen will occur by chance once in 20 times; this is taken as low enough to not be chance and to indicate a significant association. However, if one is looking at 100 SNPs, then by chance alone one might expect 5 SNPs to reach this threshold; in fact there is over a 99 percent chance that at least one will reach this threshold.

In GWAS, in which 1 to 40 million SNPs are tested simultaneously, the opportunities for false-positive results are vast. The current consensus is that for such large-scale studies, a p-value in the range of $5 \times 10^{-8}$ (in contrast to the usual $5 \times 10^{-2}$) should be considered the threshold [56]. Some have argued that a simple Bonferroni correction for $10^6$ comparisons is overly conservative given that the tests are not independent; the finding of linkage disequilibrium (LD) means that the tests are correlated. A review has summarized the various approaches to determining the correct threshold, including techniques that account for LD, false-positive report probability, false discovery rate, and Bayesian methods [57]. These various methods all suggest a threshold between $10^{-7}$ and $10^{-8}$, and hence the original threshold of $5 \times 10^{-8}$ remains widely used.
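
The arithmetic behind these figures can be reproduced with a short sketch. It assumes the tests are independent, which, as noted above, is not strictly true for SNPs in linkage disequilibrium; the function names are illustrative.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of tests reaching the threshold by chance alone."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent null tests reaches the threshold."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test threshold that keeps the overall false-positive rate near family_alpha."""
    return family_alpha / n_tests


# 100 SNPs tested at the conventional 0.05 threshold.
print(expected_false_positives(100))
print(prob_at_least_one_false_positive(100))

# Bonferroni correction for one million comparisons.
print(bonferroni_threshold(10**6))
```

Dividing 0.05 by one million comparisons gives the familiar genome-wide threshold of 5 x 10^-8.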

Replication — It is essential that studies replicate their results, given the large potential for false positive signals. This is commonly done by repeating the study in different populations.

There is a growing movement to combine GWAS results using meta-analysis. Most of the genetic associations between SNPs and complex diseases are small in magnitude, and therefore even sizable studies may fail to detect underlying associations [58]. Meta-analysis improves the identification of replicable associations and increases precision by increasing power. This is illustrated by studies in various conditions such as Crohn disease and asthma [59,60].
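
The gain in precision from pooling can be illustrated with a fixed-effect, inverse-variance meta-analysis of log odds ratios. This method is an assumption chosen for illustration, as the text does not say which meta-analytic model the cited studies used, and the study results below are hypothetical.

```python
import math


def two_sided_p(log_or, se):
    """Two-sided p-value from a normal approximation to the log odds ratio."""
    z = abs(log_or / se)
    return math.erfc(z / math.sqrt(2))


def fixed_effect_meta(studies):
    """Pool (odds_ratio, standard_error_of_log_or) pairs by inverse-variance weighting."""
    weights = [1 / se ** 2 for _, se in studies]
    log_ors = [math.log(odds_ratio) for odds_ratio, _ in studies]
    pooled_log_or = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return math.exp(pooled_log_or), pooled_se, two_sided_p(pooled_log_or, pooled_se)


# Three hypothetical studies, each estimating OR = 1.2 with the same precision.
studies = [(1.2, 0.1), (1.2, 0.1), (1.2, 0.1)]

single_p = two_sided_p(math.log(1.2), 0.1)
pooled_or, pooled_se, pooled_p = fixed_effect_meta(studies)

print(f"single study: p = {single_p:.3f}")
print(f"pooled: OR = {pooled_or:.2f}, SE = {pooled_se:.3f}, p = {pooled_p:.4f}")
```

In this example no single study reaches even the conventional 0.05 level, while the pooled estimate does, because the standard error shrinks as studies are combined. A genome-wide analysis would need far more data to reach the stricter threshold.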

The HuGE Net website lists many of the meta-analyses performed to date and also hosts the HuGE Navigator, where one can find out what single studies, genome-wide association studies, meta-analyses, and synopses are available [61,62].

Replication can also take the form of functional studies in vitro or in animal models. Demonstration by cell culture or mouse mutagenesis that genetic variants can lead to differences in protein level or function can provide powerful support for the effect of genetic variants. Follow-up studies can also provide evidence to support an initial GWAS result, including:

Candidate gene association studies that focus on a few genetic variants that are near the site of the initial GWAS locus

Fine mapping studies, involving a denser set of SNPs around the initial GWAS locus or involving resequencing of the entire locus to pick up rare variants

INTERPRETING GWAS RESULTS — Investigators usually report the magnitude of a genetic association using traditional measures of association: relative risks (RRs) in cohort studies, odds ratios (ORs) in case-control studies, and hazard ratios (HRs) in survival analyses that take account of the timing of events.

Understanding the magnitude of the risk depends on the genetic model involved. For dominant variant alleles (producing a protein isoform that dominates function), the presence of even one copy (ie, heterozygosity) will result in maximal increase in risk. For recessive variant alleles, two copies of the variant allele must be present to result in an increase in risk (ie, heterozygotes will not show an increase in risk). In both cases, a single RR, OR, or HR describes the magnitude of the association.

If the effect of a variant allele is additive, then there is a "dose-response" effect: its presence on one copy of the gene will lead to an increase in risk, while its presence on both copies will lead to a further increase. There are two possible ways to calculate this further increase: one is to take the square of the risk (variably called the log-additive, per-allele, or multiplicative risk model), and the other is to take two times the risk (called the linear additive model). Recent work in diabetes indicates that most associations seem to follow the log-additive model [63].
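
As a worked illustration, let r denote the risk in heterozygotes relative to noncarriers (this symbol is introduced here and is not the source's notation). Taking the two descriptions above at face value, the relative risk (RR) in homozygotes would be:

```latex
% r: heterozygote risk relative to noncarriers (notation assumed for this sketch)
\begin{align*}
\text{log-additive (multiplicative):}\quad & \mathrm{RR}_{\text{homozygote}} = r^{2} \\
\text{linear additive (as described above):}\quad & \mathrm{RR}_{\text{homozygote}} = 2r \\
\text{example with } r = 2.5:\quad & r^{2} = 6.25, \qquad 2r = 5
\end{align*}
```

The CFH Y402H figures quoted earlier (an odds ratio of 2.5 for the heterozygote and approximately 6 for the homozygote) sit closer to the squared value, in line with the multiplicative model suggested for that locus.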

It is important to understand that the magnitude of the effect does not necessarily translate into a causal relationship between the single nucleotide polymorphism (SNP) and the disease. It is possible that the SNP is only a marker for another SNP nearby that is linked (termed "linkage disequilibrium") and that is the true causal variant. While linkage disequilibrium does not detract from the potential use of the SNP as a clinical marker, it limits the potential to draw pathophysiologic conclusions from the identified SNP.

Much of the work related to interpreting GWAS focuses on making sense of the hits that have been found and understanding how the variants affect function. In fact, over 85 percent of the hits in GWAS of complex diseases are in non-coding regions and presumably do not affect the coding sequence; emerging evidence indicates that these non-coding regions play a role in splicing, binding to proteins, and activating or repressing enhancers and promoters, among other key functions [64,65].

CLINICAL APPLICATION — A framework for translating GWAS results to clinical application has been developed and is presented here [41,42]:

Does the genetic association improve predictive power beyond easily measured clinical variables?

What are the absolute versus relative genetic effects?

Is the risk-associated allele likely to be present in my patient?

Is the patient likely better off knowing the genetic information?

These issues need to be addressed thoughtfully and quickly, particularly given the "direct-to-consumer" availability of genetic testing. (See "Personalized medicine", section on 'Direct-to-consumer testing'.)

Predictive power — The effect of single SNPs in complex disease to date has been small (ie, odds ratios in the 1.1-1.6 range). Therefore, there has been interest in combining the effect of many genes into a genetic "profile" for greater clinical utility; these are called polygenic risk scores [66].

As an example, in one study, investigators created a profile of 5 SNPs associated with prostate cancer [67]; they found an OR of 1.6 for those who were homozygous or heterozygous for the risk allele at one SNP, and up to 4.5 for those who were homozygous or heterozygous for the risk allele at 4 SNPs. The greater the number of SNPs in each profile, however, the lower the number of people with that combination; hence the increased magnitude of effect is offset by the small number of people to whom that effect is relevant [68].
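
One common way to build such a profile is a weighted sum: the number of risk alleles carried at each SNP is multiplied by that SNP's per-allele log odds ratio, and the products are added. This formulation is an assumption for illustration and is not necessarily the scoring used in the cited study; the SNP names and odds ratios are hypothetical.

```python
import math


def polygenic_risk_score(dosages, odds_ratios):
    """Sum of risk-allele counts (0, 1, or 2) weighted by per-allele log odds ratio."""
    return sum(dosages[snp] * math.log(odds_ratios[snp]) for snp in odds_ratios)


# Hypothetical per-allele odds ratios for three SNPs.
odds_ratios = {"snp_a": 1.2, "snp_b": 1.1, "snp_c": 1.5}

# Hypothetical individual: two risk alleles at snp_a, none at snp_b, one at snp_c.
person = {"snp_a": 2, "snp_b": 0, "snp_c": 1}

score = polygenic_risk_score(person, odds_ratios)
print(f"score = {score:.3f}, implied combined OR = {math.exp(score):.2f}")
```

Exponentiating the score gives a combined odds ratio relative to someone carrying no risk alleles, but only under the assumptions that the SNPs act independently and that each follows a log-additive model.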

For dichotomous (eg, yes/no) outcomes, there are a number of statistical tools to quantitate how much predictive power the genetic information adds to existing data. One is to calculate the area under the receiver operating characteristic (ROC) curve, an approach often used for diagnostic tests [69]. An ROC curve plots the true positive rate (sensitivity) on the y-axis against the false positive rate (1-specificity) on the x-axis (figure 5). An ROC curve with no greater predictive ability than chance would approximate a straight diagonal line from the origin (0, 0) to the upper right hand corner (1.0, 1.0). The area under the curve (AUC) would be 0.5. The visual representation of a perfectly predictive test would be a line that goes straight up the y-axis to 1.0 and then straight across the top of the plot to the upper right hand corner (1.0, 1.0), and it would have an AUC of 1.
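
The AUC can be computed without drawing the curve, because it equals the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The sketch below uses that equivalence; the risk scores are hypothetical.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0   # case correctly ranked above control
            elif case == control:
                wins += 0.5   # tie counts as half
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks from a model.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]

print(f"AUC = {auc(cases, controls):.3f}")
```

Running the same function on predictions made with and without the genetic variables gives the increment in AUC attributable to the genetic information.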

Success in creating risk scores with a handful of SNPs has led some to try creating risk scores with tens or hundreds of thousands of SNPs, hoping to increase predictive power. However, predictive power levels off very quickly. As an example, one study looked at polygenic risk scores for five common complex diseases and found that increasing the number of SNPs in the risk score from tens to millions only increased the predictive power negligibly (increase in AUC of 0.01) and only explained 2 to 4 percent of the variance in disease risk [70].

There are a number of reasons why the addition of SNP information may not increase the predictive power of these models substantially:

The effect of genetic polymorphisms present from birth may genuinely be dwarfed by environmental factors over a lifetime, especially for late-onset diseases.

SNP arrays are based on the most common polymorphisms (>1 percent) and lose the ability to tag variants that are increasingly rare (<0.1 percent). It appears that in some cases the sheer number of rare variants can contribute to heritability despite their individual rarity [25].

Some of heritability appears to be carried by copy number variants (CNVs), and CNVs are not routinely included in polygenic risk scores [71]. (See "Genomic disorders: An overview", section on 'Copy number variations'.)

There are rare examples, particularly in the area of pharmacogenomics, where genetic associations may have clinical applications because of their large magnitude. As an example, a SNP in the thiopurine methyltransferase (TPMT) gene identifies individuals who are at increased risk of life-threatening hematologic toxicity from the chemotherapeutic agent 6-mercaptopurine [72]. Genotyping this SNP can avoid substantial harm by additional monitoring or substituting an alternative chemotherapeutic agent in those with the high-risk genotype. (See "Overview of pharmacogenomics" and "Thiopurines: Pretreatment testing and approach to therapeutic drug monitoring for adults with inflammatory bowel disease" and "Treatment of acute lymphoblastic leukemia/lymphoma in children and adolescents".)

Another pharmacogenomics example is the use of CYP2C19 genetic testing for guiding the choice of P2Y12 inhibitors after primary percutaneous coronary intervention. In a trial using genotype-guided selection of antiplatelet agent in which individuals with loss-of-function alleles received ticagrelor or prasugrel and those without received clopidogrel, bleeding rates were significantly lower in the genotype-guided group compared with non-genotype-guided controls, with equivalent thrombotic event rates [73].

The most likely short-term clinical application of a genetic association is to provide prognostic information in the context of other known predictors. To be of value, the genetic marker must provide independent predictive power beyond traditional clinical predictive variables (eg, age, sex, smoking status) and beyond easily obtained surrogate measures of familial aggregation, such as family history. This is often not the case, particularly if the genetic polymorphism exerts its effect through a readily measured variable (eg, a gene controlling lipids exerts its effect through increases in LDL).

As another example, the addition of genetic data to a model for prediction of type 2 diabetes did not improve the model fit and resulted in appropriate reclassification of, at most, 4 percent of people [74].

However, a high AUC or even an increment in AUC with the addition of genetic variables does not guarantee that the genetic information will be clinically useful. Points along the ROC curve may still correspond to sensitivities and specificities that are too low to be used for screening or prognosis. Furthermore, even if the sensitivity and specificity are high, the usefulness of the genetic variant for classification of risk may be limited due to low allele frequency (ie, only a small proportion of the risk group will be detected or stratified correctly) [75].

In summary:

A very small p-value does not necessarily translate into a useful clinical marker because effect size (RR or OR) may be small.

A high effect size does not necessarily translate into a clinically useful marker because the AUC may be low.

A high AUC does not necessarily translate into a clinically useful marker because the combined sensitivity and specificity may be insufficiently high for screening or prognostication.

High sensitivity and specificity do not necessarily translate into a clinically useful marker because the allele frequency may be low.

These considerations show the difficulty in translating a statistically significant genetic predictor into a clinically useful and accurate classifier.
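The chain of reasoning above can be made concrete with a small calculation. The following Python sketch treats a single variant as a binary test (carrier versus noncarrier) and derives its sensitivity, specificity, and AUC from an odds ratio and a carrier frequency in controls. The function name and the input values (an odds ratio of 3.0 and a carrier frequency of 5 percent) are hypothetical and chosen only for illustration; they are not taken from any study cited here.

```python
def binary_marker_performance(odds_ratio, carrier_freq_controls):
    """Classification performance of a single binary marker.

    Assumes the marker is either present or absent and that the odds ratio
    compares carriers with noncarriers (hypothetical helper for illustration).
    """
    # Odds of carrying the marker among controls
    odds_controls = carrier_freq_controls / (1 - carrier_freq_controls)
    # The odds ratio scales those odds to give the odds among cases
    odds_cases = odds_ratio * odds_controls
    carrier_freq_cases = odds_cases / (1 + odds_cases)

    sensitivity = carrier_freq_cases          # cases correctly flagged
    specificity = 1 - carrier_freq_controls   # controls correctly not flagged
    # For a binary test the ROC curve has one interior point, so the
    # area under it reduces to the mean of sensitivity and specificity.
    auc = (sensitivity + specificity) / 2
    return sensitivity, specificity, auc


# Illustrative inputs only: OR = 3.0, 5 percent of controls carry the marker
sens, spec, auc = binary_marker_performance(3.0, 0.05)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

With these illustrative inputs, an odds ratio that would be considered large for a complex disease still identifies only about 14 percent of cases (sensitivity 0.136, specificity 0.950), and the AUC is approximately 0.54, which is only slightly better than chance.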

Absolute versus relative effects — If the patient's risk of disease is low in the absence of a variant allele, even a 5- or 10-fold increase in risk in the presence of the allele may represent a small absolute increase in risk. Conversely, if the baseline risk is high, a modest increase in relative risk could impact clinical decision making.

As an example, the factor V Leiden mutation increases the risk of venous thrombosis approximately sixfold [76]. However, the baseline risk of thrombosis in the general population is sufficiently low (approximately 0.2 percent) that genotyping would not be used as a population screening test [77]. By contrast, the prevalence of factor V Leiden in patients with venous thrombosis is 12 to 20 percent, so testing some patients with established venous thrombosis may be appropriate [78,79]. (See "Evaluating adult patients with established venous thromboembolism for acquired and inherited risk factors", section on 'Evaluation for hypercoagulable disorders'.)
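The arithmetic behind the distinction between relative and absolute effects can be sketched as follows. The Python snippet below uses the approximate figures quoted above (baseline risk of 0.2 percent and a sixfold relative risk) and, as a simplifying assumption, treats the general-population risk as the risk in noncarriers. The second call uses hypothetical numbers (baseline risk of 20 percent, relative risk of 1.5) to illustrate the converse situation; the function name is likewise an illustrative choice.

```python
def absolute_risk_increase(baseline_risk, relative_risk):
    """Return (risk with variant, absolute increase over baseline).

    Simplifying assumption: baseline_risk is the risk without the variant.
    """
    risk_with_variant = baseline_risk * relative_risk
    return risk_with_variant, risk_with_variant - baseline_risk


# Figures quoted in the text: ~0.2 percent baseline, ~sixfold relative risk
risk, increase = absolute_risk_increase(0.002, 6)
print(f"low baseline:  risk {risk:.1%}, absolute increase {increase:.1%}")

# Hypothetical contrast: high baseline risk, modest relative risk
risk, increase = absolute_risk_increase(0.20, 1.5)
print(f"high baseline: risk {risk:.1%}, absolute increase {increase:.1%}")
```

Under these assumptions, the sixfold relative risk raises the risk from 0.2 to 1.2 percent, an absolute increase of 1 percentage point, whereas a relative risk of only 1.5 on a 20 percent baseline adds 10 percentage points.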

Allele frequency in the relevant population — In applying the results, clinicians must consider the likelihood that the particular allele is present in a particular patient. As an example, while factor V Leiden is relatively common in certain White populations (approximately 1 in 20 White individuals are heterozygous), it is virtually non-existent in populations from China. Hence, genotyping for factor V Leiden is unnecessary in an individual from China presenting with unprovoked deep vein thrombosis. (See "Factor V Leiden and activated protein C resistance", section on 'FVL genotypes'.)
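A heterozygote frequency such as the one quoted above can be converted into an approximate allele frequency if the population is assumed to be in Hardy-Weinberg equilibrium, in which case heterozygotes make up 2q(1 − q) of the population for a variant allele of frequency q. The Python sketch below solves that relationship for q; the equilibrium assumption and the function name are illustrative choices rather than part of the cited data.

```python
import math


def allele_freq_from_heterozygotes(het_freq):
    """Solve 2q(1 - q) = het_freq for the variant allele frequency q.

    Assumes Hardy-Weinberg equilibrium and returns the smaller root,
    which is the appropriate one for an uncommon variant.
    """
    if not 0 <= het_freq <= 0.5:
        raise ValueError("heterozygote frequency must lie between 0 and 0.5")
    return (1 - math.sqrt(1 - 2 * het_freq)) / 2


# Approximately 1 in 20 individuals heterozygous, as quoted in the text
q = allele_freq_from_heterozygotes(1 / 20)
homozygote_freq = q ** 2  # expected homozygote frequency under the same assumption
print(f"allele frequency ~ {q:.4f}, expected homozygotes ~ {homozygote_freq:.5f}")
```

Under this assumption, a heterozygote frequency of 1 in 20 corresponds to an allele frequency of about 2.6 percent and an expected homozygote frequency of roughly 1 in 1500. In a population where the allele frequency is close to zero, both carrier classes are correspondingly rare, which is why genotyping adds little.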

Allele frequencies for various genes and populations of interest are available in the ALlele FREquency Database (ALFRED) or at the HapMap website [80,81]. Some gene-disease associations may be restricted to a very select subgroup. As an example, variants in the BRCA1 gene were identified in patients with early-onset breast cancer who had a strong family history [82]. This group, however, accounts for only approximately 5 percent of all breast cancers. Hence, testing for this genetic association is not warranted in those who present with typical late-onset breast cancer without a strong family history. However, in certain ancestry groups with a high prevalence of BRCA1 mutations, such as Ashkenazi Jews, testing may be appropriate in women with breast or ovarian cancer. (See "Genetic testing and management of individuals at risk of hereditary breast and ovarian cancer syndromes".)

Patient impact — Even if the magnitude of the genetic effect is not sufficiently high to be clinically useful in prognostication, it may still be useful clinically in terms of changing risk behavior [83]. Presenting personal genetic information may take advantage of the layperson's perception of DNA as their "life code" to prompt behavior change.

As an example, early evidence suggests that providing information about glutathione-S-transferase (GST) genotypes, which affect nicotine metabolism, may influence smoking cessation rates [84]. This potential benefit should be balanced against the potential worry of knowing that one's risk of health problems years in the future is increased, and against the potential for increased insurance premiums or denial of life or disability insurance.

IMPLICATIONS FOR RESEARCH — Apart from potential clinical uses, GWAS results can also shed light on new pathophysiological mechanisms. Principles have been proposed to move beyond identifying single nucleotide polymorphisms (SNPs) to defining their possible functional significance, which might include targeted resequencing, studying the SNP in other populations to further define and narrow linkage disequilibrium, exploring gene transcription functions and epigenetic regulation, and using model systems and cell models to further evaluate the proposed causal variants [3]. The challenge of dissecting the correlation between genotype and phenotype will require rigorous evaluation.

SUMMARY

Definitions – Genetic association studies seek to identify the genetic component of risk for non-Mendelian complex disorders. The basic assumption regarding genetic determinants of complex disease is that the overall risk of disease is determined by the combination of multiple common genetic variants, each with small effect, and environmental exposures. The candidate gene approach selects a genetic variant of potential interest and compares its frequency in cases and controls. The genome-wide approach (genome-wide association studies [GWAS]) looks for an association between millions of variants and the condition of interest (figure 1). (See 'Terminology and study design' above.)

Explaining the variance in complex traits – GWAS have identified some significant single nucleotide polymorphisms (SNPs) associated with disease, but there is still debate about the amount of variance explained by these (ie, whether there is "missing heritability"). (See 'Genetic determinants of complex traits' above.)

Evaluating GWAS validity – GWAS should be evaluated for validity, considering potential errors related to phenotyping, genotyping, ancestral differences between cases and controls, and multiple comparisons. Testing allele distributions for Hardy-Weinberg equilibrium and looking for consistency between studies are other important considerations. (See 'Judging study validity' above.)

Clinical implications – Understanding the magnitude of the risk from any given gene variant depends on the genetic model involved. Increasingly, genetic variants are being combined in polygenic risk scores to help predict risk of disease, prognosis, or likelihood of response; these risk scores can incorporate hundreds to millions of SNPs. Questions in applying results of genetic association studies to clinical practice include the effect on predictive power, the magnitude of absolute versus relative effects, generalizability to the given patient, and whether the genetic information will be beneficial to the patient. (See 'Interpreting GWAS results' above and 'Clinical application' above.)

  1. Dawn Teare M, Barrett JH. Genetic linkage studies. Lancet 2005; 366:1036.
  2. Visscher PM, Montgomery GW. Genome-wide association studies and human disease: from trickle to flood. JAMA 2009; 302:2028.
  3. Freedman ML, Monteiro AN, Gayther SA, et al. Principles for the post-GWAS functional characterization of cancer risk loci. Nat Genet 2011; 43:513.
  4. MacArthur J, Bowler E, Cerezo M, et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res 2017; 45:D896.
  5. Thakkinstian A, Han P, McEvoy M, et al. Systematic review and meta-analysis of the association between complement factor H Y402H polymorphisms and age-related macular degeneration. Hum Mol Genet 2006; 15:2784.
  6. Iyengar SK, Elston RC. The genetic basis of complex traits: rare variants or "common gene, common disease"? Methods Mol Biol 2007; 376:71.
  7. Estivill X, Bancells C, Ramos C. Geographic distribution and regional origin of 272 cystic fibrosis mutations in European populations. The Biomed CF Mutation Analysis Consortium. Hum Mutat 1997; 10:135.
  8. Freund MK, Burch KS, Shi H, et al. Phenotype-Specific Enrichment of Mendelian Disorder Genes near GWAS Regions across 62 Complex Traits. Am J Hum Genet 2018; 103:535.
  9. Samani NJ, Erdmann J, Hall AS, et al. Genomewide association analysis of coronary artery disease. N Engl J Med 2007; 357:443.
  10. Zeggini E, Weedon MN, Lindgren CM, et al. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science 2007; 316:1336.
  11. Matarín M, Brown WM, Scholz S, et al. A genome-wide genotyping study in patients with ischaemic stroke: initial analysis and data release. Lancet Neurol 2007; 6:414.
  12. International Multiple Sclerosis Genetics Consortium, Hafler DA, Compston A, et al. Risk alleles for multiple sclerosis identified by a genomewide study. N Engl J Med 2007; 357:851.
  13. Easton DF, Pooley KA, Dunning AM, et al. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature 2007; 447:1087.
  14. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007; 447:661.
  15. Plenge RM, Seielstad M, Padyukov L, et al. TRAF1-C5 as a risk locus for rheumatoid arthritis--a genomewide study. N Engl J Med 2007; 357:1199.
  16. Coon KD, Myers AJ, Craig DW, et al. A high-density whole-genome association study reveals that APOE is the major susceptibility gene for sporadic late-onset Alzheimer's disease. J Clin Psychiatry 2007; 68:613.
  17. Li H, Wetten S, Li L, et al. Candidate single-nucleotide polymorphisms from a genomewide association study of Alzheimer disease. Arch Neurol 2008; 65:45.
  18. Prokopenko I, Langenberg C, Florez JC, et al. Variants in MTNR1B influence fasting glucose levels. Nat Genet 2009; 41:77.
  19. Bouatia-Naji N, Bonnefond A, Cavalcanti-Proença C, et al. A variant near MTNR1B is associated with increased fasting plasma glucose levels and type 2 diabetes risk. Nat Genet 2009; 41:89.
  20. Weedon MN, Lango H, Lindgren CM, et al. Genome-wide association analysis identifies 20 loci that influence adult height. Nat Genet 2008; 40:575.
  21. Lettre G, Jackson AU, Gieger C, et al. Identification of ten loci associated with height highlights new biological pathways in human growth. Nat Genet 2008; 40:584.
  22. Maher B. Personal genomes: The case of the missing heritability. Nature 2008; 456:18.
  23. Hunt KA, Mistry V, Bockett NA, et al. Negligible impact of rare autoimmune-locus coding-region variants on missing heritability. Nature 2013; 498:232.
  24. Turkmen A, Lin S. Are rare variants really independent? Genet Epidemiol 2017; 41:363.
  25. Wainschtein P, Jain D, Zheng Z, et al. Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data. Nat Genet 2022; 54:263.
  26. Wain LV, Armour JA, Tobin MD. Genomic copy number variation, human health, and disease. Lancet 2009; 374:340.
  27. Weiss LA, Shen Y, Korn JM, et al. Association between microdeletion and microduplication at 16p11.2 and autism. N Engl J Med 2008; 358:667.
  28. Conrad DF, Pinto D, Redon R, et al. Origins and functional impact of copy number variation in the human genome. Nature 2010; 464:704.
  29. Wellcome Trust Case Control Consortium, Craddock N, Hurles ME, et al. Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature 2010; 464:713.
  30. Zuk O, Hechter E, Sunyaev SR, Lander ES. The mystery of missing heritability: Genetic interactions create phantom heritability. Proc Natl Acad Sci U S A 2012; 109:1193.
  31. Skinner MK, Manikkam M, Guerrero-Bosagna C. Epigenetic transgenerational actions of environmental factors in disease etiology. Trends Endocrinol Metab 2010; 21:214.
  32. Chen Q, Yan W, Duan E. Epigenetic inheritance of acquired traits through sperm RNAs and sperm RNA modifications. Nat Rev Genet 2016; 17:733.
  33. Garg P, Jadhav B, Rodriguez OL, et al. A Survey of Rare Epigenetic Variation in 23,116 Human Genomes Identifies Disease-Relevant Epivariations and CGG Expansions. Am J Hum Genet 2020; 107:654.
  34. Visscher PM, Hill WG, Wray NR. Heritability in the genomics era--concepts and misconceptions. Nat Rev Genet 2008; 9:255.
  35. Selzam S, Ritchie SJ, Pingault JB, et al. Comparing Within- and Between-Family Polygenic Score Prediction. Am J Hum Genet 2019; 105:351.
  36. Yang J, Benyamin B, McEvoy BP, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 2010; 42:565.
  37. Yang J, Manolio TA, Pasquale LR, et al. Genome partitioning of genetic variation for complex traits using common SNPs. Nat Genet 2011; 43:519.
  38. Zhang Y, Qi G, Park JH, Chatterjee N. Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits. Nat Genet 2018; 50:1318.
  39. Little J, Bradley L, Bray MS, et al. Reporting, appraising, and integrating data on genotype prevalence and gene-disease associations. Am J Epidemiol 2002; 156:300.
  40. Little J, Higgins JP, Ioannidis JP, et al. STrengthening the REporting of Genetic Association studies (STREGA): an extension of the STROBE Statement. Ann Intern Med 2009; 150:206.
  41. Attia J, Ioannidis JP, Thakkinstian A, et al. How to use an article about genetic association: B: Are the results of the study valid? JAMA 2009; 301:191.
  42. Attia J, Ioannidis JP, Thakkinstian A, et al. How to use an article about genetic association: C: What are the results and will they help me in caring for my patients? JAMA 2009; 301:304.
  43. Contopoulos-Ioannidis DG, Alexiou GA, Gouvias TC, Ioannidis JP. An empirical evaluation of multifarious outcomes in pharmacogenetics: beta-2 adrenoceptor gene polymorphisms in asthma treatment. Pharmacogenet Genomics 2006; 16:705.
  44. Berkovic SF, Scheffer IE. Genetics of the epilepsies. Epilepsia 2001; 42 Suppl 5:16.
  45. Barnholtz-Sloan JS, McEvoy B, Shriver MD, Rebbeck TR. Ancestry estimation and correction for population stratification in molecular epidemiologic association studies. Cancer Epidemiol Biomarkers Prev 2008; 17:471.
  46. Pritchard JK, Rosenberg NA. Use of unlinked genetic markers to detect population stratification in association studies. Am J Hum Genet 1999; 65:220.
  47. Kittles RA, Chen W, Panguluri RK, et al. CYP3A4-V and prostate cancer in African Americans: causal or confounding association because of population stratification? Hum Genet 2002; 110:553.
  48. Frayling TM, Timpson NJ, Weedon MN, et al. A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 2007; 316:889.
  49. Leek JT, Scharpf RB, Bravo HC, et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet 2010; 11:733.
  50. Clayton DG, Walker NM, Smyth DJ, et al. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nat Genet 2005; 37:1243.
  51. Akey JM, Zhang K, Xiong M, et al. The effect that genotyping errors have on the robustness of common linkage-disequilibrium measures. Am J Hum Genet 2001; 68:1447.
  52. Pompanon F, Bonin A, Bellemain E, Taberlet P. Genotyping errors: causes, consequences and solutions. Nat Rev Genet 2005; 6:847.
  53. www.tufts.edu/~mcourt01/Documents/Court%20lab%20-%20HW%20calculator.xls (Accessed on March 05, 2014).
  54. www.oege.org/software/hwe-mr-calc.shtml (Accessed on March 05, 2014).
  55. Cox DG, Kraft P. Quantification of the power of Hardy-Weinberg equilibrium testing to detect genotyping error. Hum Hered 2006; 61:10.
  56. McCarthy MI, Abecasis GR, Cardon LR, et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 2008; 9:356.
  57. Sham PC, Purcell SM. Statistical power and significance testing in large-scale genetic studies. Nat Rev Genet 2014; 15:335.
  58. Moonesinghe R, Khoury MJ, Liu T, Ioannidis JP. Required sample size and nonreplicability thresholds for heterogeneous genetic associations. Proc Natl Acad Sci U S A 2008; 105:617.
  59. Franke A, McGovern DP, Barrett JC, et al. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci. Nat Genet 2010; 42:1118.
  60. Moffatt MF, Gut IG, Demenais F, et al. A large-scale, consortium-based genomewide association study of asthma. N Engl J Med 2010; 363:1211.
  61. Yu W, Wulf A, Yesupriya A, et al. HuGE Watch: tracking trends and patterns of published studies of genetic association and human genome epidemiology in near-real time. Eur J Hum Genet 2008; 16:1155.
  62. www.hugenavigator.net (Accessed on August 04, 2015).
  63. Salanti G, Southam L, Altshuler D, et al. Underlying genetic models of inheritance in established type 2 diabetes associations. Am J Epidemiol 2009; 170:537.
  64. Gallagher MD, Chen-Plotkin AS. The Post-GWAS Era: From Association to Function. Am J Hum Genet 2018; 102:717.
  65. Cannon ME, Mohlke KL. Deciphering the Emerging Complexities of Molecular Mechanisms at GWAS Loci. Am J Hum Genet 2018; 103:637.
  66. Torkamani A, Wineinger NE, Topol EJ. The personal and clinical utility of polygenic risk scores. Nat Rev Genet 2018; 19:581.
  67. Zheng SL, Sun J, Wiklund F, et al. Cumulative association of five genetic variants with prostate cancer. N Engl J Med 2008; 358:910.
  68. Janssens AC, van Duijn CM. Genome-based prediction of common diseases: advances and prospects. Hum Mol Genet 2008; 17:R166.
  69. Irwig L, Bossuyt P, Glasziou P, et al. Designing studies to ensure that estimates of test accuracy are transferable. BMJ 2002; 324:669.
  70. Khera AV, Chaffin M, Aragam KG, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet 2018; 50:1219.
  71. Li YR, Glessner JT, Coe BP, et al. Rare copy number variants in over 100,000 European ancestry subjects reveal multiple disease associations. Nat Commun 2020; 11:255.
  72. McLeod HL, Krynetski EY, Relling MV, Evans WE. Genetic polymorphism of thiopurine methyltransferase and its clinical relevance for childhood acute lymphoblastic leukemia. Leukemia 2000; 14:567.
  73. Claassens DMF, Vos GJA, Bergmeijer TO, et al. A Genotype-Guided Strategy for Oral P2Y12 Inhibitors in Primary PCI. N Engl J Med 2019; 381:1621.
  74. Meigs JB, Shrader P, Sullivan LM, et al. Genotype score in addition to common risk factors for prediction of type 2 diabetes. N Engl J Med 2008; 359:2208.
  75. Jakobsdottir J, Gorin MB, Conley YP, et al. Interpretation of genetic association studies: markers with replicated highly significant odds ratios may be poor classifiers. PLoS Genet 2009; 5:e1000337.
  76. Emmerich J, Rosendaal FR, Cattaneo M, et al. Combined effect of factor V Leiden and prothrombin 20210A on the risk of venous thromboembolism--pooled analysis of 8 case-control studies including 2310 cases and 3204 controls. Study Group for Pooled-Analysis in Venous Thromboembolism. Thromb Haemost 2001; 86:809.
  77. Cushman M, Tsai AW, White RH, et al. Deep vein thrombosis and pulmonary embolism in two cohorts: the longitudinal investigation of thromboembolism etiology. Am J Med 2004; 117:19.
  78. Ridker PM, Hennekens CH, Lindpaintner K, et al. Mutation in the gene coding for coagulation factor V and the risk of myocardial infarction, stroke, and venous thrombosis in apparently healthy men. N Engl J Med 1995; 332:912.
  79. Koster T, Rosendaal FR, de Ronde H, et al. Venous thrombosis due to poor anticoagulant response to activated protein C: Leiden Thrombophilia Study. Lancet 1993; 342:1503.
  80. The ALlele FREquency Database. Available at: http://alfred.med.yale.edu/alfred/index.asp (Accessed on March 05, 2014).
  81. International HapMap Project. Available at: www.hapmap.org (Accessed on March 05, 2014).
  82. Miki Y, Swensen J, Shattuck-Eidens D, et al. A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science 1994; 266:66.
  83. Marteau TM, Lerman C. Genetic risk and behavioural change. BMJ 2001; 322:1056.
  84. Hamajima N, Suzuki K, Ito Y, Kondo T. Genotype announcement to Japanese smokers who attended a health checkup examination. J Epidemiol 2006; 16:45.
Topic 2902 Version 30.0
