Systematic review and meta-analysis
Literature review current through: Jan 2024.
This topic last updated: Aug 31, 2023.

INTRODUCTION — This topic review will provide an overview of how systematic reviews and meta-analyses are conducted and how to interpret them. In addition, it will provide a summary of methodologic terms commonly encountered in systematic reviews and meta-analyses.

A broader discussion of evidence-based medicine and a glossary of biostatistical and epidemiological terms are presented separately. (See "Evidence-based medicine" and "Glossary of common biostatistical and epidemiological terms".)

KEY DEFINITIONS — The terms systematic review and meta-analysis are often used together, but they are not interchangeable. Not all systematic reviews include meta-analyses, though many do.

These terms are defined here since they are used throughout this topic. A glossary of other relevant terms is provided at the end of this topic. (See 'Glossary of terms' below.)

Systematic review — A systematic review is a comprehensive summary of all available evidence that meets predefined eligibility criteria to address a specific clinical question or range of questions. It is based upon a rigorous process that incorporates [1,2]:

Determination of research question, including study eligibility and methodology

Systematic identification of studies that have evaluated the specific research question(s)

Critical appraisal of the studies

Meta-analyses (not always performed) (see 'Meta-analysis' below)

Presentation of key findings (eg, in a summary of findings table)

Explicit assessment of the limitations of the evidence (ie, rating the certainty [or quality] of the body of evidence)

Systematic reviews contrast with "narrative" reviews and textbook chapters which generally do not exhaustively review the literature. In addition, narrative reviews lack transparency in the selection and interpretation of supporting evidence, generally do not provide a quantitative synthesis of the data, and may be biased if the included evidence was selected to support a preconceived conclusion rather than deriving conclusions from the entire body of evidence.

Meta-analysis — Meta-analysis, which is commonly included in systematic reviews, is the statistical method of quantitatively combining or pooling results from different studies. It can be used to provide overall pooled effect estimates. For example, if a drug was evaluated in multiple placebo-controlled trials that all reported the same outcome, meta-analysis can be used to estimate a pooled relative risk for the drug's overall effect based upon all of the trials. Meta-analysis can also be used to pool other types of data such as studies on diagnostic accuracy (eg, pooled estimates on sensitivity and specificity) and epidemiologic studies (eg, pooled incidence or prevalence rates; pooled odds ratio for strength of association). Meta-regression and network meta-analysis (NMA) are enhancements to traditional meta-analysis. (See 'Meta-regression' below and 'Network meta-analysis' below.)

ADVANTAGES OF SYSTEMATIC REVIEW AND META-ANALYSIS — Clinical decisions in medicine ideally should be based upon guidance from a comprehensive assessment of the body of available knowledge. A single clinical trial, even a large one, is seldom sufficient to provide a confident answer to a clinical question. Indeed, one analysis suggested that most research claims are ultimately proven to be incorrect or inaccurate when additional studies have been performed [3]. At the same time, it is well established that large randomized controlled trials do not always confirm the results of prior meta-analyses [4-6]. The "truth" needs to be understood by examining all sources of data as critically and objectively as possible.

There are several potential benefits to performing a systematic review, which may also include meta-analysis:

Unique aspects of a single randomized trial, involving the participating patient population, protocol, setting in which the trial is performed, or expertise of the involved clinicians, may limit its generalizability to other settings or individual patients. The conclusions of systematic reviews are likely to be more generalizable than those of single studies.

Combining studies in meta-analyses increases the sample size and generally produces more precise estimates of the effect size (ie, estimates that have smaller confidence intervals) than a single randomized trial. Meta-analysis may also allow exploration of reasons for heterogeneity across studies to allow conclusions beyond what can be gleaned from individual studies. (See 'Exploration of heterogeneity' below.)

Clinicians rarely have the time or resources to critically evaluate the body of evidence relevant to a particular clinical question, and a systematic review can facilitate this investigation. (See "Evidence-based medicine", section on 'Categories of evidence'.)

In contrast with narrative review articles, most systematic reviews focus on narrow, clearly defined questions and include all eligible studies, not just studies chosen by the author. Systematic reviews are therefore less prone to bias since they derive conclusions from the entire body of evidence, whereas narrative reviews may be biased if the included evidence was selected to support a preconceived conclusion.

Systematic review and meta-analysis are methods to synthesize the available evidence using an explicit, transparent approach that considers the strengths and weaknesses of the individual studies, populations and interventions, and specific outcomes that were assessed. Individual practitioners, policymakers, and guideline developers can use well-conducted systematic reviews to determine best patient management decisions. Organizations that develop guidelines can use the results of systematic reviews and meta-analyses to provide evidence-based recommendations for care.

STEPS TO CONDUCTING A SYSTEMATIC REVIEW AND META-ANALYSIS

Overview — Several steps are essential for conducting a systematic review or meta-analysis. These include:

Formulating research questions (see 'Formulating research questions' below)

Developing a protocol (see 'Developing a protocol' below)

Searching for the evidence (see 'The literature search' below)

Assessing the quality of studies (often referred to as the risk of bias [RoB] assessment) (see 'Risk of bias assessment' below)

Summarizing and displaying results (eg, using forest plots and a summary of findings table, as shown in the figure (figure 1)) (see 'Forest plot' below)

Exploring reasons for heterogeneity across studies (see 'Exploration of heterogeneity' below)

The basic steps, along with limitations that should be considered, are discussed here. While this topic review focuses on meta-analysis of randomized controlled trials, many of the methods and issues apply equally to meta-analyses of other comparative studies, noncomparative (single group) and other observational studies, and studies of diagnostic tests. An overview of approaches to systematic review and meta-analysis is provided in a table (table 1).

The updated 2020 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement emphasizes that systematic reviews should provide the protocol, data, and assessments of RoB from individual studies with sufficient transparency to allow the reader to verify the results [1]. It underscores the basic questions that the clinician and investigator should ask when interpreting a systematic review. The PRISMA website provides checklists for the items that should be included in a systematic review. Several "extensions" to PRISMA have been developed for specific types of systematic reviews or meta-analyses (eg, PRISMA for diagnostic test accuracy, PRISMA for individual patient data analyses, PRISMA for network meta-analysis) [7]. In addition, readers of systematic reviews should assess the relevance to their own practice with regard to the populations, settings, interventions, and outcomes studied. (See 'Reading and interpreting a systematic review' below.)

In 2011, the Institute of Medicine published recommended standards for developing systematic reviews, which remain pertinent [8]. While these standards principally apply to publicly funded systematic reviews of comparative effectiveness research that focus specifically on treatments, most of the standards pertain to all systematic reviews. The United States Agency for Healthcare Research and Quality also has an ongoing series of articles that form a Methods Guide for Comparative Effectiveness Reviews for its Evidence-based Practice Center program and related reviews. This guide principally applies to large overarching systematic reviews but provides insights and recommendations for addressing a broad range of topics, studies, and methodological approaches.

The Cochrane Collaboration (an international organization that advances systematic reviews and meta-analyses) also provides guidance for conducting systematic reviews and meta-analyses specific to the effects of healthcare interventions [9].

Formulating research questions — Research questions in systematic reviews are analogous to the research hypotheses of primary research studies. They should be focused and defined clearly since they determine the scope of research the systematic review will address [10].

Broad questions that cover a range of topics may not be directly answerable and are not appropriate for systematic reviews or meta-analyses. As an example, the question "What is the best treatment for chronic hepatitis B?" would need to be broken down into several smaller well-focused questions that could be addressed in individual and complementary systematic reviews. Examples of appropriate key questions may include, "How does entecavir compare with placebo for achieving hepatitis B e antigen (HBeAg) seroconversion in patients with chronic HBeAg-positive hepatitis B?" and "What is the relationship between hepatitis B genotypes and response rates to entecavir?" These and other related questions would be addressed individually and then, ideally, considered together to answer the more general question.

Research questions for studies of the effectiveness of interventions are commonly formulated according to the "PICO" method, which fully defines the Population, Intervention, Comparator, and Outcomes of interest [10]. The acronym "PICOD" is sometimes used to indicate that investigators must also specify which study designs are appropriate to include (eg, all comparative studies versus only randomized trials). Other eligibility criteria may include the timing or setting of care. Variations of these criteria should be used for systematic reviews of other study designs, such as of cohort studies (without a comparator), studies of exposures (instead of interventions), studies of diagnostic tests, and qualitative research.

Developing a protocol — A written protocol serves to minimize bias and to ensure that the review is implemented according to reproducible steps. A systematic review should describe its research question (and its component PICOD elements) and the review methodology, including the search strategy and approaches to analyzing and summarizing the data. Ideally, the protocol should be a collaborative effort that includes both clinical and methodology experts.

Publication of protocols can be useful to prevent unnecessary duplication of efforts and to enhance transparency of the systematic review. A voluntary registry, PROSPERO, was established in 2011. The database contains protocol details for systematic reviews that have health-related outcomes.

The literature search

Performing the search — The literature search should be systematic and comprehensive to minimize error and bias. Most systematic reviews start with searches in at least two electronic databases of the literature. Medline is almost universally used (eg, through the PubMed interface); other commonly searched databases include Embase and the Cochrane Central Register of Controlled Trials (CENTRAL). Inclusion of additional databases should be considered for specialized topics such as mental health, complementary or alternative medicine, quality of care, and nursing. Electronic searches should be supplemented by searches of the bibliographies of retrieved articles and relevant review articles and by studies known to domain experts.

Some researchers attempt to incorporate unpublished data (so-called "grey literature") to diminish the risks of publication bias (selective publication of studies, possibly based on their results) and reporting bias (selective reporting of study results, possibly based on statistical significance), and to include data that are evolving rapidly and not yet published [11]. The importance of including unpublished data sources in systematic reviews and meta-analyses is uncertain [12]. There is no standard definition of grey literature, but it generally refers to information obtained from sources other than published, peer-reviewed articles. This may include conference abstracts and proceedings, clinical trial registries (eg, the ClinicalTrials.gov registry), adverse events databases, government agency databases and documents (eg, US Food and Drug Administration), unpublished industry data, dissertations, and online sites.

Publication and reporting bias — Reporting bias refers to bias that results from incomplete publishing or reporting of available research. This is a common concern and a potentially important limitation of systematic review since the missing data may affect the validity of systematic reviews [13]. There are two main categories of reporting bias:

Publication bias – Compared with positive studies, negative studies may take longer to be published or may not be published at all [14]. This is referred to as "publication bias."

Outcome reporting bias – "Outcome reporting bias" refers to the concern that a study may only include outcomes that are favorable and significant in the published report, while nonsignificant or unfavorable outcomes are selectively not reported. This may occur due to active suppression of "negative" findings or merely because of space limitations of the publication.

Several methods have been developed to evaluate whether publication bias is present. However, they all involve major assumptions about possible missing studies [15]. Any evaluation of publication bias should not be considered definitive, but rather only exploratory in nature.

A commonly used method for assessing publication bias is the funnel plot, which is a scatter plot displaying the relationship between the weight of the study (eg, study size or standard error) and the observed effect size (figure 2) [16]. An asymmetric appearance, especially due to the absence of smaller negative studies, can suggest unpublished data. However, this assessment is not definitive since asymmetry could be due to factors other than unpublished negative studies (such as population heterogeneity or study quality) [13,17-19]. Funnel plot assessments are generally considered unreliable when there are <10 studies included in a meta-analysis [20].
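
As a rough illustration of how a funnel plot is constructed from study-level summaries, the sketch below plots invented effect sizes against their standard errors; dedicated meta-analysis software would normally be used, and the values shown are assumptions for demonstration only.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical study-level data (log relative risks and their standard errors);
# values are invented solely to illustrate the shape of a funnel plot.
log_rr = np.array([-0.35, -0.10, -0.42, -0.25, -0.05, -0.55, -0.30, -0.18])
se = np.array([0.10, 0.30, 0.25, 0.15, 0.35, 0.40, 0.12, 0.20])

# Fixed-effect pooled estimate used as the funnel's center line.
weights = 1 / se**2
pooled = np.sum(weights * log_rr) / np.sum(weights)

plt.scatter(log_rr, se)
plt.axvline(pooled, linestyle="--")
plt.gca().invert_yaxis()              # smaller SE (larger studies) plotted at the top
plt.xlabel("Effect size (log relative risk)")
plt.ylabel("Standard error")
plt.title("Funnel plot (hypothetical data)")
plt.show()
```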

Other methods to evaluate reporting bias include the "trim and fill" method, "modeling selection process," and testing for an excess of significant findings [21-24]. These methods are beyond the scope of this topic.

Risk of bias assessment — The quality of an individual study has been defined as the "confidence that the trial design, conduct, and analysis has minimized or avoided biases" [25]. The risk of bias (RoB) assessment (sometimes referred to as "quality assessment") represents the extent to which trial design and methodology prevented systematic error and can help explain differences in the results of systematic reviews.

The primary value of the RoB assessment of individual studies in the meta-analysis is to determine the degree of confidence that the pooled effect estimate reflects the "truth" as best as it can be measured. One would be more likely to have high confidence in conclusions based upon "high-quality" (ie, low RoB) studies rather than "low-quality" (ie, high RoB) studies. Differences in RoB of individual studies can also be explored to help explain heterogeneity (eg, does the effect in low RoB studies differ from that in high RoB?).

The process of assessing study quality is not straightforward. Numerous RoB assessment systems are available. Different study designs have different methodological concerns and thus different RoB assessment tools are used depending on the methodology of the individual studies. Commonly used tools (among many others) include:

For RCTs:

Original Cochrane RoB tool for randomized controlled trials (with 7 questions [26])

More complex revision of this tool, RoB 2 (with 5 overarching questions and 22 subquestions [27])

CASP (Critical Appraisal Skills Programme) Randomised Controlled Trial Checklist (with 11 questions)

NHLBI (National Heart, Lung, and Blood Institute) Quality Assessment of Controlled Intervention Studies (with 14 questions)

For observational comparative studies (of various designs):

The ROBINS-I tool (Risk Of Bias In Non-randomized Studies of Interventions, with 7 overarching questions and 31 subquestions [28])

CASP Cohort Study Checklist (with 12 questions)

Joanna Briggs Institute (JBI) Checklist for Cohort Studies (with 11 questions)

For studies on diagnostic tests:

JBI Checklist for Diagnostic Test Accuracy Studies (with 10 questions) [29]

Different methodologists use different tools depending on available time and resources, needs and purpose of the given review, and philosophical differences among researchers about the relative importance of different "quality" factors. Importantly, the assessment of a study's RoB can be limited by the need to rely on information presented in the manuscript [30].

For randomized trials, the RoB assessment typically considers the following factors:

Randomization method – Some "randomization" methods are not truly random, which can be a source of bias. For example, a computer algorithm is generally preferred over a system based on day of the week or other nonrandom method.

Allocation concealment – Allocation is the assignment of study participants to a treatment group. It occurs between randomization and implementation of the intervention. Allocation should be adequately concealed from the study personnel. A study may be biased if allocation is not concealed. For example, if the study used unsealed envelopes corresponding to the randomization order to assign patients to each treatment arm, the study personnel could read the contents and thereby channel certain patients into the desired treatment (eg, if they believed the investigational treatment was effective, they may channel sicker patients into that arm). This would result in imbalance between the two arms of the study (ie, the intervention arm would have sicker patients while the control arm would have healthier people), resulting in the intervention appearing to be less effective than it truly is.

Blinding – Ideally, all relevant groups should be blinded to treatment assignment. This includes study participants, clinicians, data collectors, outcome assessors, and data analysts. Blinding is not always feasible. Some forms of surgery or behavioral modifications, for example, do not lend themselves to blinding of patients and providers. However, outcome assessors and data analysts can usually be blinded regardless of the type of treatment. "Double blinding" generally refers to blinding of the study participants and at least one of the study investigators, although it may not be clear who was blinded when only "double blinding" is reported. For adequate blinding, treatments with a noticeable side effect (eg, niacin) ideally should have an "active control" that mimics the side effect.

Differences between study groups – Differences in the treatment groups at baseline can lead to biased results. The goal of randomization is to balance important prognostic variables relevant to the outcome(s) of interest among the different treatment groups. However, randomization is not always successful. Differences in treatment groups typically occur in trials with relatively small numbers of subjects. Researchers can attempt to adjust for baseline differences in the statistical analysis, but it is far more preferable to have balanced groups at baseline.

Attrition and incomplete reporting – High rates of withdrawal of participants from a study may indicate a fundamental problem with the study design. Uneven withdrawal from different study groups can lead to bias, particularly if the reasons for withdrawal differ between, and are related to, the interventions (such as ascribing adverse events to the intervention or lack of effectiveness to the placebo). Reports should describe the reasons for patient withdrawal to allow assessment of their effect on bias and study applicability.

Early termination for benefit – Stopping a trial early for benefit will, on average, overestimate treatment effects [31]. However, the degree of overestimation varies. Small trials that are stopped early with few events can result in large overestimates. In larger trials with more events (ie, >200 to 300 events), early stopping is less likely to result in serious overestimation [32]. Early termination of a trial for harm can also introduce bias (ie, overestimation of the harm); however, it is generally considered ethically obligatory to stop the trial in such circumstances. Early termination for other reasons (eg, slow accrual) is not considered a source of bias per se, though it can sometimes indicate that there are other problems with the trial (eg, the eligibility criteria may be too strict and not reflective of the patient population seen in actual clinical practice).

Other factors that may be considered when assessing the methodologic quality of a study include the accuracy of reporting (eg, details of study methodology, patient characteristics, and study results) and the appropriateness of statistical analyses. For example, an intention to treat (ITT) analysis is appropriate for assessing efficacy of a treatment since it preserves the comparability of treatment groups achieved by randomization. In some cases, it may be appropriate to perform a per protocol analysis alongside the ITT analysis, but when performed alone, per protocol analyses can lead to biased results.

The RoB assessment involves judgement. For this reason, it should generally be performed independently by two separate reviewers and there should be a process for resolving disagreements.

Meta-analysis

Statistical methods for combining data — Meta-analysis combines results across studies to provide overall estimates and confidence intervals of treatment effects. For dichotomous outcomes (ie, outcomes with two possible states, such as death versus survival), results are summarized using an odds ratio (OR), relative risk (RR; also called risk ratio), or hazard ratio (HR). Essentially, any study metric can be meta-analyzed, including continuous variables (mean, mean difference, percent change) or proportions. However, meta-analysis is not feasible if the studies measured completely different outcomes (eg, one trial measured pain scores while the other measured functional ability).
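
As a minimal sketch of the pooling step, the example below combines relative risks from three hypothetical trials on the log scale using inverse-variance (fixed-effect) weights; the trial values are invented, and published meta-analyses typically use dedicated software (eg, RevMan or the R package metafor) rather than hand-rolled code.

```python
import numpy as np

# Hypothetical example: pooling relative risks from three placebo-controlled
# trials using fixed-effect inverse-variance weighting on the log scale.
rr = np.array([0.80, 0.72, 0.90])          # relative risks reported by each trial
ci_upper = np.array([1.05, 0.95, 1.20])    # upper bounds of each 95% CI

log_rr = np.log(rr)
# Approximate the standard error of log(RR) from the reported 95% CI width.
se = (np.log(ci_upper) - log_rr) / 1.96
weights = 1 / se**2                        # inverse-variance weights

pooled_log_rr = np.sum(weights * log_rr) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))

print(f"Pooled RR = {np.exp(pooled_log_rr):.2f} "
      f"(95% CI {np.exp(pooled_log_rr - 1.96 * pooled_se):.2f}"
      f"-{np.exp(pooled_log_rr + 1.96 * pooled_se):.2f})")
```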

There are numerous specific methodologic details of meta-analysis that are beyond the scope of this topic. The primary consideration is whether the summary effect estimate should be calculated under the assumption of a "random effects" or a "fixed effect" model [33]. For most of the medical literature, the random effects model is the more appropriate approach. These two approaches are discussed in detail below. (See 'Fixed versus random effects models' below.)

When to combine studies — The decision to combine studies should be based upon both qualitative and quantitative evaluations. Important qualitative features include the degree of similarity of populations, interventions, outcomes, study objectives, and study designs that incorporate both clinical and biologic plausibility. The systematic reviewers should provide a sufficient explanation of the rationale for combining studies to allow the readers to judge for themselves whether they agree that it was appropriate to combine the individual studies.

Some investigators also examine statistical heterogeneity (eg, the I2 index or Q statistic) to determine whether it is appropriate to combine data. However, it is not standard practice to avoid meta-analysis solely due to statistical heterogeneity. (See 'Statistical heterogeneity' below.)

Exploration of heterogeneity — Meta-analyses typically attempt to explore the reasons for statistical heterogeneity across studies (why the results or effect sizes differ from study to study). This is most commonly accomplished by performing subgroup analyses. Meta-regression and sensitivity analyses are also used.

These analyses may be exploratory post hoc examinations of differences between studies, or they may be performed to evaluate specific a priori hypotheses regarding factors that are thought to impact the effect size (eg, low- versus high-risk patients, earlier versus later treatment).

All explorations of heterogeneity carry the risks associated with data dredging and ecological fallacy.

Data dredging – This refers to analyzing a large number of variables regardless of clinical relevance, which often results in false-positive findings [34].

Ecological fallacy – It may be difficult to account properly for certain patient-level variables, such as age, when performing these analyses. For example, most studies report the average for such variables (eg, a mean age of 47 years) which does not reflect the range of values across the study population. Making an assumption about individual data based upon aggregated statistics (known as "ecological fallacy") can produce invalid results in subgroup analyses or meta-regression [35,36]. It is generally not appropriate to make inferences about specific individuals based upon aggregated statistics for groups of individuals. The only reliable way to address this is to analyze patient-level data. (See 'Individual patient data' below.)

Most findings from subgroup analyses and meta-regression should be considered hypothesis-generating, rather than conclusive.

Subgroup analyses — The primary method used to explore heterogeneity is subgroup analysis, which involves performing separate analyses based upon clinically relevant variables. To minimize the risk of drawing false conclusions, subgroup analyses in meta-analyses should be:

Specified a priori, including hypotheses for the direction of the differences (ie, they should be based upon prior evidence or knowledge)

Limited to only a few (ie, to avoid data dredging)

Analyzed by conducting statistical testing for interaction (ie, determining the p value for the between-group difference) rather than simply comparing the separate effect estimates

An example of a subgroup analysis is shown in panel C of the figure (figure 1), which is from a meta-analysis examining the effect of corticosteroids in patients with acute respiratory distress syndrome. The subgroup analysis explored whether the effect differed in studies in which patients received earlier (before day 14) or later (day 14 or later) corticosteroid therapy. In this case, the test for subgroup effect (ie, interaction) was statistically significant (p=0.003).
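
A minimal sketch of such a test for subgroup differences is shown below; it compares two pooled subgroup estimates on the log relative-risk scale with a z-test. The numbers are invented for illustration and are not the values from the corticosteroid meta-analysis.

```python
import numpy as np
from scipy import stats

# Hypothetical pooled subgroup estimates on the log relative-risk scale
# (invented numbers, not the values from the corticosteroid meta-analysis).
early_log_rr, early_se = np.log(0.70), 0.10   # earlier treatment subgroup
late_log_rr, late_se = np.log(1.05), 0.12     # later treatment subgroup

# Test for subgroup difference (interaction): compare the two pooled estimates.
diff = early_log_rr - late_log_rr
se_diff = np.sqrt(early_se**2 + late_se**2)
z = diff / se_diff
p_interaction = 2 * (1 - stats.norm.cdf(abs(z)))

print(f"Ratio of relative risks = {np.exp(diff):.2f}, p for interaction = {p_interaction:.4f}")
```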

The approach to evaluating findings from subgroup analyses in meta-analyses and clinical trials is discussed in greater detail separately. (See "Evidence-based medicine", section on 'Subgroup analyses'.)

Meta-regression — Regression analysis of primary studies may be used to account for potential confounding factors and explain differences in results among studies. This meta-analytic technique is commonly known as meta-regression. In this approach, the dependent variable in the regression is the estimate of treatment effect from each individual study and the independent variables are the aggregated characteristics in the individual studies (variables such as drug dose, treatment duration, study size). Instead of individual patients serving as the units of analysis, each individual study is considered to be one observation [37-39]. Meta-regression tests the statistical interaction between the study variable (eg, drug dose) and the treatment effect (eg, relative risk of death). It can include categorical and continuous variables which can be analyzed singly (univariable analysis) or together (multivariable analysis).

However, as previously discussed, a common pitfall in meta-regression is to use aggregate study-level data as proxies for patient-level data (the "ecological fallacy") [35,36]. For example, comparisons between men and women are valid only if each included study enrolled exclusively male or exclusively female participants. A meta-regression that assesses for differences in effect size according to sex by using the percentage of female or male patients in each study implicitly assumes that each individual in the study is that percentage male or female.

Results of meta-regression can be depicted graphically using a bubble plot as shown in the figure (figure 3), which is an example of a meta-regression of early trials of zidovudine monotherapy for HIV infection [40]. The meta-regression successfully explains the heterogeneity across studies, showing an association between treatment duration and mortality benefit that was not apparent within the individual trials.
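
The sketch below illustrates the basic idea of meta-regression as a weighted regression of study-level effect estimates on a study-level covariate; it uses a simplified fixed-effect weighting (full random-effects meta-regression also estimates a between-study variance term), and all values are invented for demonstration.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical meta-regression: does a study-level covariate (eg, mean treatment
# duration in weeks) explain differences in effect size across studies?
duration = np.array([4, 8, 12, 16, 24, 36])                     # study-level covariate
log_rr = np.array([-0.05, -0.12, -0.20, -0.30, -0.45, -0.60])   # effect per study
se = np.array([0.15, 0.12, 0.10, 0.14, 0.18, 0.20])             # standard errors

X = sm.add_constant(duration)           # intercept + covariate
weights = 1 / se**2                     # weight each study by its precision
model = sm.WLS(log_rr, X, weights=weights).fit()

print(model.params)    # slope = change in log RR per additional week of treatment
print(model.pvalues)
```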

Special adaptations of meta-analysis

Individual patient data — It is sometimes possible to obtain original patient-level data which can be reanalyzed in a meta-analysis [41]. Pooling individual patient data is the most rigorous form of meta-analysis. While more costly and time-consuming and limited by difficulties collecting the original data, there are several benefits, including:

It permits regressions of patient-level predictors (eg, age) without the risk of ecological fallacy

It allows time-to-event analyses

Network meta-analysis — When multiple different interventions are compared across trials, a network of studies can be established where all the studied interventions are linked to each other by individual trials. Network meta-analysis (NMA) evaluates all studies and all interventions simultaneously to produce multiple pairwise estimates of relative effects of each intervention compared with every other intervention [42,43].

A schematic representation of a network diagram is shown in the figure (figure 4). In reality, some network diagrams in NMAs are far more complex (figure 5).

The pairwise comparisons in NMAs are based upon both direct and indirect comparisons. For example, consider two drugs (drug A and drug B) that were each evaluated in placebo-controlled trials and directly compared with one another in a separate clinical trial (figure 4). NMA can be used to estimate the relative efficacy of drug A versus drug B based upon the direct comparison (ie, from the trial directly comparing drug A to drug B) and indirect comparisons (ie, from the placebo-controlled trials). The direct and indirect estimates are then pooled together to yield an overall estimate (or "network estimate") of the relative effect. Typically, the direct, indirect, and network estimates are reported separately in NMAs. Some of the comparisons in a NMA may be based entirely on indirect data.
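
A minimal numeric sketch of the indirect and network estimates described above is shown below; the log odds ratios and standard errors are invented, and real NMAs use more elaborate (often Bayesian) models rather than this simple inverse-variance combination.

```python
import numpy as np

# Hypothetical illustration of an indirect comparison in a network meta-analysis.
#   drug A vs placebo: log odds ratio -0.50 (SE 0.15)
#   drug B vs placebo: log odds ratio -0.20 (SE 0.18)
log_or_A_p, se_A = -0.50, 0.15
log_or_B_p, se_B = -0.20, 0.18

# Indirect estimate of A vs B: difference of the two comparisons against the
# common comparator; variances (not standard errors) add.
log_or_AB_indirect = log_or_A_p - log_or_B_p
se_AB_indirect = np.sqrt(se_A**2 + se_B**2)

# If a head-to-head trial of A vs B also exists (direct estimate), the network
# estimate pools direct and indirect with inverse-variance weights.
log_or_AB_direct, se_direct = -0.35, 0.20
w_ind, w_dir = 1 / se_AB_indirect**2, 1 / se_direct**2
log_or_network = (w_ind * log_or_AB_indirect + w_dir * log_or_AB_direct) / (w_ind + w_dir)

print(f"Indirect OR A vs B: {np.exp(log_or_AB_indirect):.2f}")
print(f"Network OR A vs B:  {np.exp(log_or_network):.2f}")
```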

When assessing the validity of an NMA, many of the same principles that are used for assessing conventional meta-analysis apply (eg, was the literature search comprehensive, were eligibility criteria for the studies clearly stated, were the individual studies assessed for RoB, how precise are the effect estimates, etc (table 2)). However, there are two concerns that are unique to NMAs [44,45]:

Intransitivity – The assumption of transitivity is fundamental to NMA because the network estimates rely upon indirect comparisons. For the transitivity assumption to hold, the individual studies must be sufficiently similar in all respects other than the treatments being compared (ie, similar participants, setting, ancillary treatments, and other relevant parameters). In the example above, if studies of drug A versus placebo are systematically different than studies of drug B versus placebo (eg, if they were conducted in an earlier era), then the indirect comparison of drug A versus drug B may be biased due to these differences (ie, the difference may be partly explained by differences in disease management over the intervening decades).

Incoherence – Incoherence (also called inconsistency) refers to differences between the direct and indirect estimates. Incoherence can be a consequence of bias due to methodologic limitations of the studies, publication bias, indirectness, or intransitivity. If the direct and indirect estimates are considerably different from each other, the network estimate may not be valid. Addressing incoherence and assessing its impact on the network estimate requires judgement [44].

Bayesian methods are commonly used to conduct NMA [46]. This approach has the advantage of allowing estimation of the probability of each intervention being best, which, in turn, allows interventions to be ranked. Such ranking, however, needs to be interpreted cautiously, as it can be unstable, depending on the network topology, and can have a substantial degree of imprecision [47].

READING AND INTERPRETING A SYSTEMATIC REVIEW — Key questions to consider when reading and interpreting a systematic review are summarized in the table (table 2). The reader should appraise the systematic review for its quality, potential sources of bias, and the extent to which the findings are applicable to their specific question. Systematic reviews and meta-analyses are subject to the same biases observed in all research. In addition, the value of a systematic review's conclusions may be limited by the quality and applicability of the individual studies included in the review.

GLOSSARY OF TERMS

Applicability (also called generalizability or directness) — The relevance of a study (or a group of studies) to a population of interest (or an individual patient). This requires an assessment of how similar the subjects of a study are to the population of interest, the relevance of the studied interventions and outcomes, and other PICO features. (See 'PICO method (PICOD, PICOS, PICOTS, others)' below.)

Ecological fallacy (ecological inference fallacy) — An error in interpreting data where inferences are made about specific individuals based upon aggregated statistics for groups of individuals.

Fixed versus random effects models — Most meta-analyses use random effects models; a fixed effect model is appropriate only in select circumstances.

Fixed effect model – The central assumption of a fixed effect model is that there is a single true effect and that all trials provide estimates of this one true effect. Meta-analysis thus provides a pooled estimate of the single true effect. A hypothetical model for a fixed effect model meta-analysis is shown in a figure (figure 6).

Since the assumption is that there is a single true effect, the fixed effect model assumes that estimates from each study differ solely because of random error. This assumes that all studies represent the same population, intervention, comparator, and outcome for which there is a single "true" effect size. Fixed effects models yield effect size estimates by assigning a weight to each individual study estimate that reflects the inherent variability in the results measured (ie, the "within-study variance" related to the standard error of the outcome).

There are limited instances when it is appropriate to use a fixed effects model for meta-analysis of clinical trials:

If there is extreme confidence that the studies are comparable (ie, characteristics of the enrolled patients, the type of intervention, comparators and outcome measures) such that any difference across studies is just due to random variation. Such an assumption is typically difficult to justify. One example of an appropriate use of the fixed effects model is meta-analysis of repeated, identical, highly controlled trials in a uniform setting, as may be done by pharmaceutical companies during early testing.

The studies are of rare events in which one form of a fixed effects model (the Peto odds ratio) may be less biased than other methods of pooling data [48].

Random effects model – The central assumption of a random effects model is that each study estimate represents a random sample from a distribution of different populations [49]. For most of the medical literature, the random effects model is the more appropriate approach. A hypothetical model for a random effects model meta-analysis is shown in the figure (figure 7). The model assumes there are multiple true treatment effects related to inherent differences in different populations or other factors, and that each trial provides an estimate of its own true effect. The meta-analysis provides a pooled estimate across (or an average of) a range of true effects. Thus, the random effects model assumes that there is not necessarily one "true" effect size but rather that the studies included have provided a glimpse of a range of "true" effects. The random effects model incorporates both "between-study variance" (to capture the variability in true effects across studies) and "within-study variance" (to capture the random error within each study) [33]. There are several methods for calculating the random effects model estimates. The optimal approaches continue to be debated [50].
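
The sketch below illustrates one common and relatively simple way to obtain a random effects estimate, the DerSimonian-Laird method cited above [33]; the study values are invented, and, as noted, other estimation methods exist and continue to be debated.

```python
import numpy as np

# Sketch of DerSimonian-Laird random-effects pooling (invented data).
log_rr = np.array([-0.10, -0.45, -0.25, -0.60, -0.05])
se = np.array([0.12, 0.20, 0.15, 0.25, 0.18])

w_fixed = 1 / se**2
pooled_fixed = np.sum(w_fixed * log_rr) / np.sum(w_fixed)

# Between-study variance (tau^2) via the DerSimonian-Laird moment estimator.
k = len(log_rr)
Q = np.sum(w_fixed * (log_rr - pooled_fixed) ** 2)
C = np.sum(w_fixed) - np.sum(w_fixed**2) / np.sum(w_fixed)
tau2 = max(0.0, (Q - (k - 1)) / C)

# Random-effects weights add tau^2 to each study's within-study variance.
w_random = 1 / (se**2 + tau2)
pooled_random = np.sum(w_random * log_rr) / np.sum(w_random)
pooled_se = np.sqrt(1 / np.sum(w_random))

print(f"tau^2 = {tau2:.3f}")
print(f"Random-effects pooled RR = {np.exp(pooled_random):.2f} "
      f"(95% CI {np.exp(pooled_random - 1.96 * pooled_se):.2f}"
      f"-{np.exp(pooled_random + 1.96 * pooled_se):.2f})")
```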

Forest plot — A forest plot is a graphical presentation of individual studies, typically displayed as point estimates with their associated 95% CIs on an appropriate scale, next to a description of the individual studies (figure 8). The forest plot allows the reader to see the estimate and the precision of the individual studies, appreciate the heterogeneity of results, and compare the estimates of the individual studies to the overall summary estimate.

Ideally, a forest plot should provide sufficient data for the reader to make some assessment of the individual studies in the context of the overall summary (eg, to compare sample sizes, any variations in treatments such as dose, baseline values, demographic features, and study quality).
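
As an illustration of the basic layout, the sketch below draws a minimal forest plot from invented relative risks and confidence intervals; published forest plots also display study details, weights, and heterogeneity statistics.

```python
import numpy as np
import matplotlib.pyplot as plt

# Minimal forest plot of hypothetical trials (relative risks with 95% CIs).
studies = ["Trial A", "Trial B", "Trial C", "Trial D", "Pooled"]
rr = np.array([0.85, 0.70, 1.05, 0.60, 0.78])
lower = np.array([0.65, 0.50, 0.80, 0.40, 0.68])
upper = np.array([1.10, 0.98, 1.38, 0.90, 0.90])

y = np.arange(len(studies))[::-1]     # pooled estimate at the bottom
plt.errorbar(rr, y, xerr=[rr - lower, upper - rr], fmt="s", capsize=3)
plt.axvline(1.0, linestyle="--")      # line of no effect
plt.xscale("log")                     # ratio measures are usually plotted on a log scale
plt.yticks(y, studies)
plt.xlabel("Relative risk (95% CI)")
plt.title("Forest plot (hypothetical data)")
plt.show()
```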

Funnel plot — A graphical technique, with related statistical tests, to examine the studies within a systematic review for the possibility of publication bias (figure 2). (See 'Publication and reporting bias' above.)

Grey literature — A term that is variably defined but generally includes sources of evidence beyond the peer-reviewed published literature. Examples include conference abstracts and proceedings, unpublished study results (eg, available on clinical trial registries such as ClinicalTrials.gov), press releases, adverse events databases, other online databases, government agency databases and policy documents (eg, US Food and Drug Administration), unpublished industry data, and dissertations.

Heterogeneity

Clinical heterogeneity — Qualitative differences in study features, such as study eligibility criteria, interventions, or methods of measuring outcomes, that may preclude appropriate meta-analysis. These features can be explicit (such as different drug doses used) or implicit (such as differences in populations depending on setting or country). Clinical heterogeneity may or may not result in statistical heterogeneity; for example, statistical heterogeneity may be absent if the effect size is similar regardless of the drug dose, of the individual drug within a class of drugs, or across different populations (eg, men and women, or Japanese and American patients).

Statistical heterogeneity — Quantitative differences in study results across studies examining similar questions. Statistical heterogeneity may be due to clinical heterogeneity or to chance. Statistical heterogeneity is measured with a variety of tests, most commonly I2 and the Q statistic. Other heterogeneity measures (eg, H2, R2, tau2) have also been described but are infrequently used.

I2 index — The I2 index represents the amount of variability in the effect sizes across studies that can be explained by between-study variability. For example, an I2 value of 75 percent means that 75 percent of the variability in the measured effect sizes across studies is caused by true heterogeneity among studies. By consensus, standard thresholds for the interpretation of I2 are 25, 50, and 75 percent to represent low, medium, and high heterogeneity, respectively [51]. However, the investigators who introduced the I2 statistic noted that naïve categorization of I2 values is not appropriate in all circumstances and that "the practical impact of heterogeneity in a meta-analysis also depends on the size and direction of treatment effects" [51]. The clinical implication and interpretability of a meta-analysis with a large I2 index will be different for studies with large statistically significant effects compared with studies with smaller inconsistent effects.

Q statistic — The Q statistic (or chi square test for heterogeneity) tests the hypothesis that results across studies are homogeneous. Its calculation involves summing the squared deviations of each study's effect from the overall pooled effect, weighting each study's contribution by the inverse of its variance. The Q statistic is usually interpreted to indicate heterogeneity if its p value is <0.10. A nonsignificant value suggests that the studies are homogeneous. However, the Q statistic has limited power to detect heterogeneity in meta-analyses with few studies, while it tends to over-detect heterogeneity in meta-analyses with many studies [52].
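
A minimal worked sketch of the Q statistic, and of the I2 index computed from it, is shown below, using invented study effects and standard errors.

```python
import numpy as np
from scipy import stats

# Computing the Q statistic and I^2 from hypothetical study effects.
log_rr = np.array([-0.10, -0.45, -0.25, -0.60, -0.05])
se = np.array([0.12, 0.20, 0.15, 0.25, 0.18])

w = 1 / se**2
pooled = np.sum(w * log_rr) / np.sum(w)

Q = np.sum(w * (log_rr - pooled) ** 2)        # weighted sum of squared deviations
df = len(log_rr) - 1
p_heterogeneity = 1 - stats.chi2.cdf(Q, df)   # heterogeneity usually flagged if p < 0.10

I2 = max(0.0, (Q - df) / Q) * 100             # percent of variability beyond chance

print(f"Q = {Q:.2f} (df = {df}), p = {p_heterogeneity:.3f}, I^2 = {I2:.0f}%")
```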

Meta-regression — A meta-analytic technique that permits adjustment for potential confounders and analysis of different variables to help explain differences in results across studies. Equivalent to patient-level regression, except that the unit of analysis is a study instead of a person. Additional details are provided above. (See 'Meta-regression' above.)

Network meta-analysis — A technique to simultaneously meta-analyze a network of studies that evaluated related, but different, specific comparisons. It permits quantitative inferences across studies that have made indirect comparisons of interventions. An example would be the comparison of two or more drugs to each other, when each was studied only in comparison to placebo. Additional details are provided above. (See 'Network meta-analysis' above.)

PICO method (PICOD, PICOS, PICOTS, others) — An acronym that stands for Population, Intervention(s), Comparator(s), Outcome(s); added letters include Study Design (PICOD), Setting (PICOS), Timing and Setting (PICOTS). Some factors may be used instead (eg, Exposure instead of Intervention) and other factors may also be important (eg, effect modifiers). PICO is the basis for a systematic approach in developing a research question and research protocol. While used extensively for systematic reviews, PICO is relevant to all medical research questions. Each feature is defined explicitly and comprehensively so that it is unambiguously evident which studies are eligible for inclusion in a systematic review. (See "Evidence-based medicine", section on 'Formulating a clinical question'.)

PRISMA — The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement and extensions are sets of guidelines for reporting systematic reviews and meta-analyses [1]. PRISMA is used as a standard by many researchers and journals.

PROSPERO — An international database of prospectively registered systematic reviews in health care. PROSPERO creates a permanent record of systematic review protocols to reduce unnecessary duplication of efforts and increase transparency. Researchers should ideally enter their protocols prospectively and update them as necessary.

Publication bias — One of several related biases in the available evidence being considered for inclusion in a systematic review. Conceptually, studies that have been published are systematically different than studies that have failed to be published, due to lack of acceptance by journals, lack of interest by authors or research grantors, or potentially, by deliberate withholding by funders. Theoretically, "positive" (statistically significant) results are more likely to be published than "negative" results.

Related biases include:

Selective outcome reporting bias, wherein published studies report only certain outcomes

Time-lag bias, wherein "negative" study results tend to be delayed in their publication compared with "positive" results

Location bias, wherein "positive" or more interesting results tend to be published in journals that are more easily accessible

Language bias, wherein results of studies published in non-English language journals differ from those of studies from the same countries or authors published in English

Multiple or duplicate publication bias, wherein certain studies may be overrepresented in the literature due to duplicate or overlapping publications (which may be difficult to tease apart).

Risk of bias assessment — The risk of bias (RoB) assessment (sometimes referred to as "quality assessment") represents the extent to which trial design and methodology prevented systematic error. Assessing RoB can help explain differences in the results of systematic reviews. The primary value of the RoB assessment of individual studies in the meta-analysis is to determine the degree of confidence that the pooled effect estimate reflects the "truth" as best as it can be measured. One would be more likely to have high confidence in conclusions based upon "high-quality" (ie, low RoB) studies rather than "low-quality" (ie, high RoB) studies. Additional details are provided above. (See 'Risk of bias assessment' above.)

Sensitivity analysis — A method of exploring heterogeneity in a meta-analysis by varying which studies are included to determine the effects of such changes. Used to explore how sensitive a meta-analysis finding is to inclusion of individual studies and to evaluate possible causes of heterogeneity; for example, whether exclusion of high RoB studies influences the size of the effect.

SUMMARY

Definitions

A systematic review is a comprehensive summary of all available evidence that meets predefined eligibility criteria to address a specific clinical question or range of questions. (See 'Systematic review' above.)

Meta-analysis, which is commonly included in systematic reviews, is a statistical method that quantitatively combines the results from different studies. It is commonly used to provide an overall pooled estimate of the benefit or harm of an intervention. (See 'Meta-analysis' above.)

Steps to conducting systematic review and meta-analysis – Several steps are essential for conducting a systematic review or meta-analysis. These include:

Formulating the research question(s) (see 'Formulating research questions' above)

Developing a protocol (see 'Developing a protocol' above)

Searching for the evidence (see 'The literature search' above)

Assessing the risk of bias of studies (see 'Risk of bias assessment' above)

Summarizing and displaying results (eg, using forest plots and a summary of findings table, as shown in the figure (figure 1)) (see 'Forest plot' above)

Exploring reasons for heterogeneity across studies (see 'Exploration of heterogeneity' above)

Reading and interpreting systematic reviews – When reading and interpreting a systematic review, the reader should appraise the methodologic quality, assess for potential sources of bias, and consider the extent to which the findings are applicable to their specific question. Key issues to consider are summarized in the table (table 2). The value of a systematic review's conclusions may be limited by the quality and applicability of the individual studies included in the review. (See 'Reading and interpreting a systematic review' above.)

REFERENCES

  1. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021; 372:n71.
  2. Page MJ, Moher D, Bossuyt PM, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ 2021; 372:n160.
  3. Ioannidis JP. Why most published research findings are false: author's reply to Goodman and Greenland. PLoS Med 2007; 4:e215.
  4. LeLorier J, Grégoire G, Benhaddad A, et al. Discrepancies between meta-analyses and subsequent large randomized, controlled trials. N Engl J Med 1997; 337:536.
  5. Cappelleri JC, Ioannidis JP, Schmid CH, et al. Large trials vs meta-analysis of smaller trials: how do their results compare? JAMA 1996; 276:1332.
  6. Villar J, Carroli G, Belizán JM. Predictive ability of meta-analyses of randomised controlled trials. Lancet 1995; 345:772.
  7. Extensions of the PRISMA Statement. Available at: www.prisma-statement.org/Extensions/Default.aspx (Accessed on August 30, 2023).
  8. Institute of Medicine. Finding what works in health care: Standards for systematic reviews. The National Academies Press, Washington, DC, 2011. Available at: http://www.iom.edu/Reports/2011/Finding-What-Works-in-Health-Care-Standards-for-Systematic-Reviews.aspx (Accessed on October 10, 2011).
  9. Cochrane. Available at: https://www.cochrane.org/ (Accessed on June 19, 2023).
  10. Counsell C. Formulating questions and locating primary studies for inclusion in systematic reviews. Ann Intern Med 1997; 127:380.
  11. Paez A. Grey literature: An important resource in systematic reviews. J Evid Based Med 2017.
  12. Schmucker CM, Blümle A, Schell LK, et al. Systematic review finds that study data not published in full text articles have unclear impact on meta-analyses results in medical research. PLoS One 2017; 12:e0176210.
  13. Thornton A, Lee P. Publication bias in meta-analysis: its causes and consequences. J Clin Epidemiol 2000; 53:207.
  14. Ioannidis JP. Effect of the statistical significance of results on the time to completion and publication of randomized efficacy trials. JAMA 1998; 279:281.
  15. Vevea JL, Woods CM. Publication bias in research synthesis: sensitivity analysis using a priori weight functions. Psychol Methods 2005; 10:428.
  16. Egger M, Davey Smith G, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997; 315:629.
  17. Tang JL, Liu JL. Misleading funnel plot for detection of bias in meta-analysis. J Clin Epidemiol 2000; 53:477.
  18. Terrin N, Schmid CH, Lau J, Olkin I. Adjusting for publication bias in the presence of heterogeneity. Stat Med 2003; 22:2113.
  19. Terrin N, Schmid CH, Lau J. In an empirical evaluation of the funnel plot, researchers could not visually identify publication bias. J Clin Epidemiol 2005; 58:894.
  20. Methods Guide for Effectiveness and Comparative Effectiveness Reviews, Agency for Healthcare Research and Quality (US).
  21. Duval S, Tweedie R. Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics 2000; 56:455.
  22. Copas J. What works?: Selectivity models and meta-analysis. Journal of the Royal Statistical Society Series A 1999; 162:95.
  23. Rosenthal R. The 'file drawer problem' and tolerance for null results. Psychol Bull 1979; 86:638.
  24. Ioannidis JP, Trikalinos TA. An exploratory test for an excess of significant findings. Clin Trials 2007; 4:245.
  25. Moher D, Jadad AR, Nichol G, et al. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials 1995; 16:62.
  26. Higgins JP, Altman DG, Gøtzsche PC, et al. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ 2011; 343:d5928.
  27. Sterne JAC, Savović J, Page MJ, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ 2019; 366:l4898.
  28. Sterne JA, Hernán MA, Reeves BC, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ 2016; 355:i4919.
  29. Campbell JM, Klugar M, Ding S, et al. Diagnostic test accuracy systematic reviews. In: JBI Manual for Evidence Synthesis, Aromataris E, Munn Z (Eds), JBI, 2020.
  30. Verhagen AP, de Vet HC, de Bie RA, et al. The art of quality assessment of RCTs included in systematic reviews. J Clin Epidemiol 2001; 54:651.
  31. Bassler D, Briel M, Montori VM, et al. Stopping randomized trials early for benefit and estimation of treatment effects: systematic review and meta-regression analysis. JAMA 2010; 303:1180.
  32. Walter SD, Guyatt GH, Bassler D, et al. Randomised trials with provision for early stopping for benefit (or harm): The impact on the estimated treatment effect. Stat Med 2019; 38:2524.
  33. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials 1986; 7:177.
  34. Schulz KF, Grimes DA. Multiplicity in randomised trials II: subgroup and interim analyses. Lancet 2005; 365:1657.
  35. Rothman KJ, Greenland S. Modern epidemiology, 2nd ed, Lippincott-Raven, Philadelphia 1998.
  36. Geissbühler M, Hincapié CA, Aghlmandi S, et al. Most published meta-regression analyses based on aggregate data suffer from methodological pitfalls: a meta-epidemiological study. BMC Med Res Methodol 2021; 21:123.
  37. Berkey CS, Hoaglin DC, Mosteller F, Colditz GA. A random-effects regression model for meta-analysis. Stat Med 1995; 14:395.
  38. Schmid CH. Exploring heterogeneity in randomized trials via metaanalysis. Drug Inf J 1999; 33:211.
  39. van Houwelingen HC, Arends LR, Stijnen T. Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat Med 2002; 21:589.
  40. Ioannidis JP, Cappelleri JC, Sacks HS, Lau J. The relationship between study design, results, and reporting of randomized clinical trials of HIV infection. Control Clin Trials 1997; 18:431.
  41. Clarke MJ, Stewart LA. Principles of and procedures for systematic reviews. In: Systematic reviews in health care: meta-analysis in context, Egger M, Smith G, Altman D (Eds), BMJ Publishing Group, London 2001. p.23.
  42. Jansen JP, Fleurence R, Devine B, et al. Interpreting indirect treatment comparisons and network meta-analysis for health-care decision making: report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices: part 1. Value Health 2011; 14:417.
  43. Mills EJ, Ioannidis JP, Thorlund K, et al. How to use an article reporting a multiple treatment comparison meta-analysis. JAMA 2012; 308:1246.
  44. Brignardello-Petersen R, Mustafa RA, Siemieniuk RAC, et al. GRADE approach to rate the certainty from a network meta-analysis: addressing incoherence. J Clin Epidemiol 2019; 108:77.
  45. Brignardello-Petersen R, Bonner A, Alexander PE, et al. Advances in the GRADE approach to rate the certainty in estimates from a network meta-analysis. J Clin Epidemiol 2018; 93:36.
  46. Salanti G. Indirect and mixed-treatment comparison, network, or multiple-treatments meta-analysis: many names, many benefits, many concerns for the next generation evidence synthesis tool. Res Synth Methods 2012; 3:80.
  47. Trinquart L, Attiche N, Bafeta A, et al. Uncertainty in Treatment Rankings: Reanalysis of Network Meta-analyses of Randomized Trials. Ann Intern Med 2016; 164:666.
  48. Bradburn MJ, Deeks JJ, Berlin JA, Russell Localio A. Much ado about nothing: a comparison of the performance of meta-analytical methods with rare events. Stat Med 2007; 26:53.
  49. Lau J, Ioannidis JP, Schmid CH. Summing up evidence: one answer is not always enough. Lancet 1998; 351:123.
  50. Jackson D, Law M, Stijnen T, et al. A comparison of seven random-effects models for meta-analyses that estimate the summary odds ratio. Stat Med 2018; 37:1059.
  51. Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ 2003; 327:557.
  52. Huedo-Medina TB, Sánchez-Meca J, Marín-Martínez F, Botella J. Assessing heterogeneity in meta-analysis: Q statistic or I2 index? Psychol Methods 2006; 11:193.
Topic 16293 Version 27.0
