INTRODUCTION — Mendelian randomization represents a novel epidemiologic study design that incorporates genetic information into traditional epidemiologic methods. Studies based on Mendelian randomization will likely become increasingly common as genetic knowledge of health and disease expands with data from genomewide association studies and genome sequencing. Mendelian randomization provides an approach to addressing questions of causality without many of the typical biases that impact the validity of traditional epidemiologic approaches.
While Mendelian randomization studies can provide important suggestive evidence for causal relations between risk factor and disease outcome, they are not true experiments and are dependent on several assumptions. Evidence from randomized controlled trials, when possible, should continue to guide clinical decisions. However, Mendelian randomization studies are increasingly being used to identify potential targets for new drugs prior to embarking on costly randomized controlled trials.
This topic will discuss the rationale and limitations of Mendelian randomization as a study design. The principles of Mendelian inheritance, which are the basis for randomization of this study design, are discussed separately. (See "Inheritance patterns of monogenic disorders (Mendelian and non-Mendelian)".)
RATIONALE — The Mendelian randomization design was first proposed in 1986 to evaluate whether low levels of LDL cholesterol increase cancer risk. Observational studies had reported a higher risk of cancer in individuals with low LDL levels, compared with individuals with normal or elevated LDL levels. However, biases inherent in observational studies could not be excluded as an explanation for the observed association.
Investigators proposed a natural experiment, suggesting that the effect of low LDL levels on cancer risk could be determined by comparing cancer rates in individuals with and without genotypes that predispose to low LDL levels. The inheritance of a particular genotype, based on Mendel's second law of independent assortment, was far less likely to be influenced by lifestyle or environmental factors than LDL levels themselves. This hypothetical study, finally performed more than 20 years later when genetic data became more readily available, found no increased risk of cancer among individuals with lifelong low LDL levels.
Mendelian randomization limits several potential biases that are often seen in observational studies using traditional epidemiologic study designs. These include:
●Confounding – Confounding occurs when factors (known or unknown) are associated with both the exposure of interest (eg, low LDL levels) and the outcome (eg, cancer) (figure 1). For example, an unknown carcinogen X that leads to both low LDL levels and cancer would be a potential confounder in the association between low LDL levels and cancer. (See "Proof, p-values, and hypothesis testing", section on 'Explanation for the results of a study'.)
Confounding leads to mixing of the effect of the exposure of interest and of the confounding factor on the outcome of interest. Confounding-related bias can be excluded by randomization, or minimized by adjustment for confounding variables in multivariable models. Randomization, if successful, excludes all possible confounders (including unknown or unmeasured confounders). However, randomized trials are difficult, and often not feasible or ethical, to conduct. Multivariable models can adjust only for confounders that are known and measured in the study population.
●Reverse causality – Reverse causality occurs when the outcome affects the level of the exposure of interest, which is the reverse of the expected direction of effect (figure 2). Reverse causality can be an important bias in studies in which the exposure variable is measured close to the time of the outcome. For example, cancer may reduce LDL levels (due to cachexia), which could explain the association between low LDL levels and cancer.
The Mendelian randomization approach limits both confounding and reverse causality, allowing for improved causal inferences, under specific conditions described below. (See 'Assumptions and limitations' below.)
STRUCTURE OF A MENDELIAN RANDOMIZATION STUDY — A Mendelian randomization study is a study (preferably in a prospective cohort) in which the exposure is defined based on the presence or absence of a specific allele ("risk allele") that influences the risk factor of interest [3,5]. Unlike most epidemiologic studies, in which a single exposure-outcome association is estimated, three separate associations are ascertained in a Mendelian randomization study (figure 3):
●Association 1: Between the risk allele (the instrumental variable) and the risk factor (the intermediate variable)
●Association 2: Between the risk allele and the outcome of interest
●Association 3: Between the intermediate variable (risk factor) and the outcome
Robust evidence for the first two associations is required for a Mendelian randomization study. Both qualitative and quantitative Mendelian randomization designs are possible; in most cases, a quantitative assessment is performed. By combining the estimates of association 1 and association 2, the investigator can estimate the causal effect of the intermediate risk factor on the outcome. Association 3 is the association that would normally be estimated in a traditional cohort study.
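One common way to combine associations 1 and 2 in a quantitative analysis is the ratio (Wald) estimator, which divides the allele-outcome effect by the allele-risk factor effect. The following is a minimal sketch under that assumption, not necessarily the method used in any study cited here; the function name, parameter names, and numbers are hypothetical.

```python
import math


def wald_ratio(beta_gx, beta_gy, se_gy):
    """Ratio estimate of the effect of the risk factor on the outcome.

    beta_gx: per-allele effect on the risk factor (association 1)
    beta_gy: per-allele effect on the outcome, eg, a log odds ratio (association 2)
    se_gy:   standard error of beta_gy
    """
    if beta_gx == 0:
        raise ValueError("the allele must be associated with the risk factor")
    estimate = beta_gy / beta_gx
    # First-order approximation that ignores uncertainty in beta_gx.
    se = se_gy / abs(beta_gx)
    return estimate, se


# Hypothetical inputs: each allele raises the risk factor by 0.5 units
# and raises the log odds of the outcome by 0.10.
estimate, se = wald_ratio(beta_gx=0.5, beta_gy=0.10, se_gy=0.02)
odds_ratio_per_unit = math.exp(estimate)
```

Here the estimate is expressed per one-unit increase in the risk factor. The simplified standard error is only reasonable when the allele-risk factor association is estimated precisely.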
CONCEPTUAL SIMILARITIES TO THE RANDOMIZED TRIAL — Randomized controlled trials (RCTs) provide the highest level of evidence for the causal effect of an intervention on the occurrence of the outcome [7-9]. However, randomized trials are not always feasible (eg, there may be no intervention available) and may be unethical in certain situations (eg, randomization to an intervention that may cause harm). Furthermore, the generalizability of findings from some RCTs may be questioned, particularly when the study population is not representative of the general population.
Alternative study designs, such as Mendelian randomization, have attempted to provide evidence of similar rigor to the RCT but in large population-based cohorts that are often more representative of the general population. The Mendelian randomization study design uses observational data and, unlike a randomized trial, is not a true experiment. However, there are several parallels with randomized trials:
●In an RCT, an individual is randomized to an intervention and outcomes are ascertained. Similarly, in Mendelian randomization, individuals are randomized (at conception) to a genetic variant and outcomes are ascertained. In both the RCT and Mendelian randomization, the biological inference is that the difference in outcomes between the groups is due to a change in an intermediate factor. In the RCT, the effect on the intermediate factor is assumed (ie, statins improve outcomes by lowering LDL). However, in Mendelian randomization, the effect of the variant on the intermediate is confirmed and related back to the outcomes, providing a causal relation between the intermediate and the outcome.
●In an RCT, randomization is achieved by allocating the intervention in a random manner. This step prevents confounding and classifies the RCT as a true experiment. In a Mendelian randomization study, the randomization process occurs during gamete formation prior to conception. In this way, Mendelian randomization studies can be viewed as a natural experiment in which individuals are randomly allocated to a certain allele at a given locus, as described by Mendel's second law of independent assortment of alleles. Mendel's second law states that alleles at different genetic loci are inherited independently of one another, akin to flipping a coin. Therefore, the allocation of a given allele is random in the study population and not influenced by confounding demographic factors. However, there are at least two possible confounders, "linkage disequilibrium" and "population stratification", that must be considered in such studies. (See 'Assumptions and limitations' below and "Genetics: Glossary of terms".)
●In an RCT, the timing of the intervention (eg, allocation of drug) is known and the follow-up for events is prospective, which limits the possibility of reverse causality. Similarly, reverse causality is unlikely to bias the observations from Mendelian randomization studies as the future outcome cannot affect the allocation of a specific allele.
Assuming that randomization is successful and losses to follow-up are minimized, inferences from an RCT are considered causal because RCTs are true experiments. Thus, causal statements can be made based on RCTs (eg, Drug A causes a reduction in events compared with placebo). Mendelian randomization studies are not as robust as RCTs with regard to ascribing causal effects, but represent the best available evidence for causal effects from observational data. (See 'Assumptions and limitations' below.)
ASSUMPTIONS AND LIMITATIONS — Technically, a Mendelian randomization study is a type of "instrumental variable" study. An instrumental variable is associated with variation in an intermediate factor but not with other factors that are known to influence the intermediate factor or the outcome (ie, not confounded by other factors) (figure 3). The genetic marker in Mendelian randomization acts as the instrumental variable. Due to the large number of false positive associations between candidate genetic markers and intermediate factors, Mendelian randomization requires careful selection of genetic markers; only replicated genetic markers with robust evidence for association with intermediate factors should be considered [13,14].
An important misconception of Mendelian randomization studies is that such studies provide evidence to support the use of risk alleles in risk prediction for disease. Mendelian randomization studies use genetic information solely as an epidemiologic tool to evaluate the causal effects of modifiable risk factors; they provide no evidence for the clinical use of genetic markers associated with disease. As an example, studies that have successfully used genetic variants associated with LDL to address the causal relation between LDL levels and myocardial infarction (MI) provide strong support that LDL is causally related to MI; however, they do not provide the required evidence that such genetic variants should be used in cardiovascular risk prediction or added to current risk calculators (eg, Framingham risk score). Improving risk prediction is not the objective of Mendelian randomization studies.
Assumptions that must be met in a Mendelian randomization study to ensure validity include:
●The risk allele is strongly associated with the intermediate factor and explains a significant proportion of the variation of the intermediate factor throughout the duration of the study (and preferably throughout life).
●The risk allele is not associated with other factors that can influence the intermediate factor.
●The risk allele acts on the outcome solely via its association with the intermediate factor. (See 'Pleiotropy' below.)
Based on these assumptions, several limitations to Mendelian randomization must be considered, including insufficient statistical power, confounding due to linkage disequilibrium or population stratification, pleiotropy, and canalization. In most Mendelian randomization studies, the current state of knowledge of genetics and biology makes it difficult to entirely exclude pleiotropy, linkage disequilibrium or canalization as possible biases in estimating causal effects. These potential biases therefore limit the evidence for (or against) causality from such studies.
Insufficient statistical power — Weak or inconsistent associations between the genetic marker and the intermediate factor may lead to negative Mendelian randomization studies due to insufficient statistical power to detect an association. Mendelian randomization studies need to be conducted in a population with a sufficiently large sample size to reliably detect the associations among the risk allele, the intermediate factor, and the outcome of interest. Likewise, genetic loci that explain only a small fraction of the variation in the intermediate factor would produce differences in the intermediate factor that are too small to have a detectable effect on the outcome.
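The dependence of instrument strength on both the variance explained and the sample size can be illustrated with the usual first-stage F statistic from regression of the intermediate factor on the genetic marker(s). This is a sketch with hypothetical inputs; the function name is illustrative, and the commonly cited rule of thumb that F should exceed roughly 10 is a convention from the instrumental variable literature rather than a threshold stated in this topic.

```python
def instrument_f_statistic(r_squared, n, k=1):
    """Approximate first-stage F statistic.

    r_squared: proportion of variance in the intermediate factor
               explained by the genetic instrument(s)
    n:         number of individuals
    k:         number of genetic instruments
    """
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    if n <= k + 1:
        raise ValueError("n must exceed k + 1")
    return (r_squared / k) * (n - k - 1) / (1 - r_squared)


# Hypothetical allele explaining 1 percent of the variance in the risk factor.
f_small_cohort = instrument_f_statistic(r_squared=0.01, n=1_000)
f_large_cohort = instrument_f_statistic(r_squared=0.01, n=10_000)
```

With the same allele, the tenfold larger cohort yields an F statistic roughly 10 times higher, which is why weak instruments call for very large study populations.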
Confounding — Although Mendelian randomization avoids typical confounding, linkage disequilibrium and population stratification are two forms of confounding specific to genetic epidemiology studies (figure 4).
Linkage disequilibrium — The independent assortment of alleles (Mendel's second law) does not apply uniformly to the entire genome. Therefore, the allocation of alleles at conception is not entirely random across all genetic loci. When two loci are in close proximity to each other on the same chromosome, the probability of allocation of these alleles from parents to offspring will be correlated. These loci are said to be "in linkage disequilibrium" (figure 5).
Confounding can arise when locus A, which is strongly associated with the intermediate factor, is in linkage disequilibrium (ie, in close physical proximity) with locus B, which is associated with the outcome. In this case, the risk allele at locus A will appear to be associated with both the intermediate factor and the outcome but the latter association occurs via locus B (and not through the intermediate factor). Therefore, the association between allele at locus A and disease would be confounded by locus B.
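The degree of linkage disequilibrium between two loci is often summarized by the coefficient D and the squared correlation r². The sketch below computes both from haplotype and allele frequencies using the standard definitions; the function name and example frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    # D is the excess haplotype frequency over that expected under independence.
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denominator


# Alleles A and B always travel together: complete linkage disequilibrium.
d_linked, r2_linked = linkage_disequilibrium(p_ab=0.5, p_a=0.5, p_b=0.5)
# Haplotype frequency equals the product of allele frequencies: independence.
d_indep, r2_indep = linkage_disequilibrium(p_ab=0.25, p_a=0.5, p_b=0.5)
```

An r² near 1 means that the allele at locus A is effectively a proxy for the allele at locus B, which is the situation in which confounding by locus B is of greatest concern.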
Population stratification — Population stratification represents confounding by ethnicity. For example, differences in allele frequencies and disease prevalence may exist among different ethnic groups unrelated to any true association between risk alleles and disease, leading to the faulty conclusion that risk alleles are associated with the disease outcome. In this case, the association between risk allele and disease would be confounded by ethnicity. Control for population stratification is possible using certain statistical techniques (eg, genomic control or principal component methods) [15,16].
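A small worked example shows how pooling ancestry groups can create a spurious allele-disease association. All counts below are hypothetical expected values, constructed so that the allele has no association with disease within either group.

```python
def expected_counts(n, carrier_freq, disease_risk):
    """Expected counts when allele and disease are independent within a group.

    Returns (carriers, carrier_cases, noncarriers, noncarrier_cases).
    """
    carriers = n * carrier_freq
    noncarriers = n - carriers
    return (carriers, carriers * disease_risk,
            noncarriers, noncarriers * disease_risk)


def risk_ratio(carriers, carrier_cases, noncarriers, noncarrier_cases):
    """Risk of disease in carriers divided by risk in noncarriers."""
    return (carrier_cases / carriers) / (noncarrier_cases / noncarriers)


# Group A: allele uncommon, disease uncommon. Group B: both more common.
group_a = expected_counts(n=1000, carrier_freq=0.2, disease_risk=0.1)
group_b = expected_counts(n=1000, carrier_freq=0.6, disease_risk=0.3)
pooled = tuple(a + b for a, b in zip(group_a, group_b))

rr_group_a = risk_ratio(*group_a)   # no association within group A
rr_group_b = risk_ratio(*group_b)   # no association within group B
rr_pooled = risk_ratio(*pooled)     # apparent association after pooling
```

The risk ratio is 1.0 within each group but 1.5 in the pooled data, solely because group B has both a higher allele frequency and a higher disease prevalence. Stratifying or adjusting for ancestry removes the apparent association.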
Pleiotropy — Another potential limitation of Mendelian randomization is the existence of pleiotropy. Pleiotropy occurs when a genetic variant influences multiple phenotypic traits or has multiple biological effects. The core consideration for Mendelian randomization is that the association between the risk allele and the outcome is mediated via the intermediate factor. Therefore, alternative pathways from the risk allele to the disease outcome would severely compromise the concept of Mendelian randomization (figure 6). The risk allele locus selected for a Mendelian randomization study should have no other functions that could produce an association between the allele and the disease outcome (ie, no pleiotropy). As an example, if a risk allele is associated with lower LDL cholesterol but also raises HDL (which may not be known), and this allele is used in a Mendelian randomization study evaluating the effects of LDL on myocardial infarction risk, the effect of LDL lowering will be overestimated because it includes the added effect of the HDL increase.
Canalization and gene-environment interactions — The validity of results from a Mendelian randomization study is limited when the effect of the risk allele on the intermediate factor or the outcome is modified via compensatory responses in utero or environmental influences during life. Canalization refers to the compensatory effects of mechanisms that act to buffer any genetic effect in utero or after birth. For example, a risk allele associated with an increase in LDL in utero could lead to the upregulation of mechanisms (eg, via gene expression) to protect the individual from increased LDL. This would render the association between the risk allele and the disease outcome invalid for the purposes of estimating the effect of LDL in the general population.
The absence of an association between a risk allele and an intermediate phenotype or a disease outcome also may result from the influence of multiple cumulative environmental effects during life, or from possible gene-environment interactions (eg, with age) that may nullify any genetic effects after a certain age.
MULTIPLE GENETIC MARKERS FOR MENDELIAN RANDOMIZATION — Mendelian randomization studies were initially performed with single risk alleles. Following these early studies, genetic risk scores that combine multiple risk alleles across several loci that are associated with the intermediate factor have become more common. Genetic risk scores may improve associations between the genetic marker and the intermediate risk factor, but they are more prone to confounding by linkage disequilibrium and to pleiotropic effects. However, consistent results across different instruments (ie, separate independent markers) that are strongly associated with the same risk factor can bolster the causal argument for the role of the risk factor by strengthening some of the assumptions. Such consistency reduces the possibility that the observed association is due to pleiotropy or confounding by linkage disequilibrium, since it would be very unlikely that independent markers in different regions of the genome would have similar pleiotropic effects or linkage disequilibrium patterns. Several methodologic advances (eg, MR-Egger [an adaptation of Egger] regression, multivariable Mendelian randomization, weighted median) have allowed pleiotropic variants to be included in Mendelian randomization studies and provide causal estimates unbiased by potential pleiotropy [19-21]. However, these methods require additional conditions and assumptions to remain valid. These methods are frequently used as sensitivity analyses to better understand how much pleiotropy may influence the results of a Mendelian randomization study.
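The logic of combining several markers, and of using MR-Egger regression as a sensitivity analysis, can be sketched with per-variant summary estimates. The code below is a simplified illustration using standard textbook forms of the inverse-variance weighted (IVW) and MR-Egger point estimates; it omits standard errors and the weighted median and multivariable methods, and all function names and data are hypothetical.

```python
def ivw_estimate(beta_gx, beta_gy, se_gy):
    """Inverse-variance weighted slope of beta_gy on beta_gx through the origin."""
    numerator = sum(bx * by / s ** 2 for bx, by, s in zip(beta_gx, beta_gy, se_gy))
    denominator = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_gx, se_gy))
    return numerator / denominator


def mr_egger(beta_gx, beta_gy, se_gy):
    """Weighted regression of beta_gy on beta_gx with an intercept.

    Returns (intercept, slope). A nonzero intercept suggests directional
    pleiotropy; the slope is the pleiotropy-adjusted causal estimate.
    """
    # Orient each variant so that its effect on the risk factor is positive.
    xs = [abs(bx) for bx in beta_gx]
    ys = [by if bx >= 0 else -by for bx, by in zip(beta_gx, beta_gy)]
    ws = [1 / s ** 2 for s in se_gy]
    total_w = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total_w
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total_w
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical variants: true effect of 0.5 per unit of the risk factor,
# plus a constant pleiotropic effect of 0.02 on the outcome.
beta_gx = [0.1, 0.2, 0.3, 0.4]
se_gy = [0.01, 0.02, 0.01, 0.02]
beta_gy = [0.02 + 0.5 * bx for bx in beta_gx]

ivw = ivw_estimate(beta_gx, beta_gy, se_gy)
egger_intercept, egger_slope = mr_egger(beta_gx, beta_gy, se_gy)
```

In this constructed example, the IVW estimate is pulled above 0.5 by the pleiotropic effect, whereas the MR-Egger intercept recovers the 0.02 offset and the slope recovers 0.5. That recovery depends on the additional assumptions these methods require.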
EXAMPLES OF MENDELIAN RANDOMIZATION IN THE LITERATURE — Several important Mendelian randomization studies have been published. Several examples that provide insights into cardiovascular risk factors and are excellent illustrations of this study design are reviewed here.
Lipoprotein(a) and myocardial infarction — Although numerous epidemiologic studies have demonstrated important associations between Lp(a) and cardiovascular events [22-29], it was unclear whether Lp(a) represented a risk marker or a causal factor for myocardial infarction. Using three large populations from Denmark, a Mendelian randomization study demonstrated that LPA gene variants influence Lp(a) plasma levels, and that LPA gene variants increase the risk of myocardial infarction (MI). Using an instrumental variable analysis, the investigators showed that a doubling in Lp(a) levels throughout life is associated with a 22 percent increase in the risk of MI. Because the allocation of the LPA alleles is randomized before conception, this provides an unconfounded estimate of the causal effect of Lp(a) levels on myocardial infarction. (See "Lipoprotein(a)".)
This Mendelian randomization study met the major requirements for this study design:
●LPA gene variants are strongly associated with Lp(a) plasma levels; such variants explain 30 to 60 percent of the variation of plasma Lp(a) levels and this remains consistent throughout life. Thus, LPA gene variants represent one of the best genetic markers for Mendelian randomization.
●Population stratification is unlikely to confound the relations between LPA gene variants, Lp(a) levels and myocardial infarction as the study was performed in a relatively ethnically homogeneous population.
●The LPA genetic variants used in this study are known to regulate expression of the LPA gene (ie, regulatory variants) and therefore are likely to only influence Lp(a) levels and not have other pleiotropic effects.
True absence of pleiotropic effects can rarely be proven, however. In addition, linkage disequilibrium with other loci which may have effects on Lp(a) levels or myocardial infarction cannot be fully excluded. These examples of unverifiable assumptions in Mendelian randomization may limit confidence in ascribing causal relationships from such a study design (as compared to an RCT).
Lipoprotein(a) and aortic valve disease — In a genome-wide association study (GWAS) of aortic valve calcium (AVC) performed by the Cohorts for Heart and Aging Research in Genetic Epidemiology (CHARGE) consortium, a genetic variant in the LPA locus was associated at genome-wide significance with AVC and with aortic stenosis in two independent cohorts. Formal Mendelian randomization analyses provided strong evidence for a causal association between circulating Lp(a) levels and AVC. These results were replicated by independent investigators in additional studies [33-35]. These data lend strong support to the notion that Lp(a) is a causal factor for the development and progression of aortic stenosis and provide a potential therapeutic target for this disease.
CRP gene variants and myocardial infarction — C-reactive protein (CRP) has been demonstrated to be a robust marker of increased cardiovascular risk in numerous populations. However, whether CRP plays a causal role in atherogenesis and cardiovascular disease remains unknown as no agents that specifically lower CRP exist (without affecting other risk factors).
Two studies evaluated the causal role of CRP using the Mendelian randomization approach, assessing the impact of a genetically elevated CRP level on cardiovascular outcomes [36,37]. Despite clinically important differences in CRP levels between individuals with and without CRP risk alleles, both of these large, adequately powered studies observed no differences in cardiovascular event rates in individuals with or without these CRP risk alleles. These studies provide important evidence that CRP is unlikely to be a causal factor in atherosclerosis and that proposed strategies to reduce cardiovascular disease risk by directly lowering levels of CRP are unlikely to be effective. However, although CRP may not be a causal factor in cardiovascular disease, this observation does not have an impact on the importance of the role of CRP as a risk marker in cardiovascular risk prediction. (See "C-reactive protein in cardiovascular disease".)
HDL gene variants and myocardial infarction — Low levels of high density lipoprotein (HDL) cholesterol are a known risk factor for cardiovascular disease; however, the causal nature of this association has never been definitively proven. Although efforts to pharmacologically increase HDL to reduce cardiovascular risk have become a major interest, this strategy would only be expected to be successful if HDL were, in fact, a causal mediator of cardiovascular disease. A large Mendelian randomization study evaluating HDL-increasing genetic variants demonstrated no reduction in myocardial infarction among those with HDL-increasing gene variants, as compared with those without, suggesting that lifelong genetic elevations in HDL may not be causally associated with reduced risk of myocardial infarction. (See "HDL cholesterol: Clinical aspects of abnormal values", section on 'Low HDL cholesterol as an ASCVD risk factor'.)
Interestingly, these genetic studies are also concordant with several randomized trials that have failed to reduce cardiovascular events with HDL-increasing pharmacological therapy. Taken as a whole, these studies have significantly weakened the hypothesis that HDL is a causal factor in myocardial infarction. (See "HDL cholesterol: Clinical aspects of abnormal values", section on 'Effect of increasing HDL cholesterol on clinical outcome'.)
Alcohol intake and cardiovascular disease — In observational studies, alcohol intake has been consistently associated with a reduced risk of cardiovascular disease and has led to public health recommendations in some parts of the world that light to moderate consumption of alcohol may have cardiovascular benefits. However, alcohol intake is frequently confounded by several other factors, such as socioeconomic status, lifestyle, and behavioral factors, which could confound the relation with cardiovascular disease.
Given the difficulties in performing a randomized trial to address this question, investigators have turned to Mendelian randomization approaches. By leveraging genetic variants that predispose to greater alcohol intake, several Mendelian randomization studies have demonstrated that greater alcohol intake is associated with a higher risk of developing cardiovascular risk factors (such as hypertension) and incident coronary artery disease [39,40]. These results, which are less prone to confounding and other observational biases, provide compelling evidence for a causal association between alcohol intake and increased risk of cardiovascular disease.
EVALUATING A MENDELIAN RANDOMIZATION ARTICLE — Clinicians interpreting evidence from a Mendelian randomization article in the literature can use the following checklist to grade the strength of such a study:
●Is the study design based on the standard prospective cohort study (ie, incident rather than prevalent events)?
●Has the risk allele used been reproducibly associated with the intermediate factor in several independent studies?
●Does the risk allele explain a reasonable amount of the variation in the intermediate factor? Is this consistent throughout the study follow-up?
●Is the mechanism underlying the association between the risk allele and the intermediate factor known (eg, is the genetic variant a known regulatory variant for the expression of the intermediate factor)?
●Is the risk allele known to have no other effects other than to influence the levels of the intermediate factor? Was this question evaluated in the cohort?
●Was a single risk allele used (as opposed to a gene score, which increases the possibility of pleiotropy and/or confounding by linkage disequilibrium)? If more than one risk allele is known, did the authors examine additional risk alleles (known to strongly associate with the risk factor of interest) to confirm their findings? Did the authors perform sensitivity analyses to evaluate whether results are robust to pleiotropy?
●Have the authors estimated the following associations in their study population?
•Genetic variant(s) with intermediate factor
•Genetic variant(s) with disease outcome
●Was an instrumental variable analysis used to provide estimates of causal effect?
●Was randomization successful? Demographic and clinical characteristics should be similarly distributed among individuals with and without the risk allele, with minimal differences between groups.
●Did the authors consider confounding by population stratification or linkage disequilibrium with other loci?
For negative Mendelian randomization studies, a few additional factors to be considered are:
●Was there a large enough sample size (ie, was power calculated)? This becomes increasingly important for weaker associations.
●Did the authors consider canalization?
●Did the authors consider gene-environment interactions (eg, age interactions) as an explanation?
SUMMARY AND RECOMMENDATIONS
●Definition – Mendelian randomization represents a novel epidemiologic study design that incorporates genetic information into traditional epidemiologic methods to investigate potential risk factors for specified outcomes. Mendelian randomization minimizes many of the major biases of traditional epidemiologic observational studies. (See 'Introduction' above.)
●Method – Mendelian randomization is a natural experiment in which the independent assortment of alleles at a given locus randomly sorts individuals into those who do or do not have a specified allele. A prospective cohort study is done to evaluate whether the outcome of interest occurs more commonly in individuals with the risk allele than in those without the allele. Because the risk allele is closely associated with a risk factor (intermediate variable) of interest, a higher rate of the outcome among individuals with the risk allele suggests that the risk factor is likely a causal factor for the outcome of interest. (See 'Rationale' above.)
●Limitations – The validity of Mendelian randomization is dependent upon random gene assortment at a polymorphic locus and a strong, direct, and unique association between the risk allele and risk factor. Studies are weakened by linkage disequilibrium, population stratification, inconsistent associations between allele and intermediate variable resulting in insufficient statistical power, canalization, and gene/environment interactions. (See 'Assumptions and limitations' above.)
●Questions to consider – Mendelian randomization relies on several assumptions, some unverifiable, which can invalidate inferences for a causal association between risk factor and disease. In reviewing a study employing Mendelian randomization design, questions regarding potential confounding factors need to be considered. (See 'Evaluating a Mendelian randomization article' above.)
ACKNOWLEDGMENT — The UpToDate editorial staff acknowledges Christopher J O'Donnell, MD, MPH, who contributed to an earlier version of this topic review.