A practical 9-step guide to conducting a meta-analysis, from defining your research question to publishing your results. No coding required.
A meta-analysis is a statistical method that combines the quantitative results of multiple independent studies addressing the same research question into a single pooled estimate. It sits at the top of the evidence hierarchy in evidence-based medicine, above individual randomized controlled trials (RCTs), cohort studies, and expert opinions.
Meta-analysis is not the same as a systematic review. A systematic review is the broader research process of identifying, evaluating, and synthesizing all relevant evidence on a topic. A meta-analysis is the statistical technique used within a systematic review to quantitatively combine results. You can have a systematic review without a meta-analysis (a narrative synthesis), but you should never perform a meta-analysis without a rigorous systematic review underpinning it.
| Level | Evidence Type | Strength |
|---|---|---|
| 1 | Systematic reviews and meta-analyses of RCTs | Highest |
| 2 | Individual randomized controlled trials | High |
| 3 | Cohort studies | Moderate |
| 4 | Case-control studies | Moderate-Low |
| 5 | Case series / Case reports | Low |
| 6 | Expert opinion / Editorials | Lowest |
Every meta-analysis begins with a clearly formulated research question. The PICO framework is the gold standard for structuring clinical questions:
| Element | Meaning | Example |
|---|---|---|
| P (Population) | Who are the patients or participants? | Adults with type 2 diabetes mellitus |
| I (Intervention) | What treatment or exposure is being studied? | GLP-1 receptor agonists (semaglutide, liraglutide) |
| C (Comparator) | What is the control or alternative? | Placebo or standard care |
| O (Outcome) | What outcome are you measuring? | Change in HbA1c, body weight, adverse events |
Your PICO question directly determines your eligibility criteria. Before searching the literature, define precisely which study designs, populations, interventions, comparators, outcomes, publication dates, and languages are eligible -- and which are not.
Before starting your search, register your protocol on PROSPERO (International Prospective Register of Systematic Reviews). Registration demonstrates that your review was planned before results were known, reducing the risk of outcome reporting bias. PROSPERO registration is free and increasingly required by journals.
A comprehensive, reproducible search strategy is the backbone of any meta-analysis. The goal is high sensitivity (recall) -- it is better to retrieve too many irrelevant articles than to miss relevant ones.
At minimum, search three databases: PubMed/MEDLINE, Embase, and the Cochrane Central Register of Controlled Trials (CENTRAL).
Depending on your topic, also consider: Web of Science, Scopus, PsycINFO (psychology), CINAHL (nursing), ClinicalTrials.gov (unpublished trial data), and conference proceedings.
Translate each PICO element into search terms. Combine synonyms with OR and PICO elements with AND:
("diabetes mellitus, type 2"[MeSH] OR "type 2 diabetes" OR "T2DM") AND ("GLP-1 receptor agonists"[MeSH] OR "glucagon-like peptide-1" OR "semaglutide" OR "liraglutide" OR "dulaglutide") AND ("randomized controlled trial"[pt] OR "controlled clinical trial"[pt])
Record the exact search string, database, date of search, and number of results for each database. PRISMA 2020 requires this level of transparency, and reviewers will ask for it.
Following the PRISMA 2020 flow diagram, study selection proceeds in distinct phases:
After searching multiple databases, you will have duplicate records. Use reference management software (Zotero, EndNote, or MetaReview's built-in deduplication) to identify and remove duplicates. Typically, 20-40% of combined results are duplicates.
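Although the tools above handle this for you, the mechanics are simple: collapse records that share a stable identifier. A minimal Python sketch, assuming hypothetical records with `doi` and `title` fields:

```python
import re

# Deduplication sketch: match on DOI when present, otherwise on a
# normalized title (lowercase, punctuation and spaces stripped).
def normalize_title(title: str) -> str:
    return re.sub(r"[^a-z0-9]", "", title.lower())

def deduplicate(records):
    seen, unique = set(), []
    for rec in records:
        key = rec.get("doi") or normalize_title(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

records = [
    {"doi": "10.1000/xyz1", "title": "Semaglutide in T2DM"},
    {"doi": "10.1000/xyz1", "title": "Semaglutide in T2DM."},   # duplicate DOI
    {"doi": None, "title": "Liraglutide and weight loss"},
    {"doi": None, "title": "Liraglutide and Weight Loss"},      # duplicate title
]
print(len(deduplicate(records)))  # 2 unique records remain
```

Real reference managers use fuzzier matching (author, year, journal), but DOI-plus-normalized-title catches the large majority of duplicates.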
Rapidly screen each unique record based on its title and abstract against your inclusion criteria. At this stage, be inclusive -- if in doubt, keep it for full-text review. Two independent reviewers should screen all records separately.
Retrieve the full text of all potentially eligible articles. Read each one carefully against your complete inclusion/exclusion criteria. Record the specific reason for excluding each article (PRISMA 2020 requirement).
Calculate Cohen's kappa coefficient to quantify agreement between the two reviewers:
| Kappa Value | Level of Agreement |
|---|---|
| < 0.20 | Poor |
| 0.21 - 0.40 | Fair |
| 0.41 - 0.60 | Moderate |
| 0.61 - 0.80 | Substantial |
| 0.81 - 1.00 | Almost perfect |
Disagreements should be resolved through discussion or by consulting a third reviewer. Aim for kappa ≥ 0.80.
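Kappa is easy to verify by hand or in a few lines of Python from the 2x2 agreement table. The screening counts below are hypothetical:

```python
# Cohen's kappa from two reviewers' include/exclude decisions.
# a = both include, b = reviewer 1 only, c = reviewer 2 only,
# d = both exclude (illustrative counts).
def cohens_kappa(a: int, b: int, c: int, d: int) -> float:
    n = a + b + c + d
    p_observed = (a + d) / n
    # Agreement expected by chance, from the marginal proportions
    p_include = ((a + b) / n) * ((a + c) / n)
    p_exclude = ((c + d) / n) * ((b + d) / n)
    p_expected = p_include + p_exclude
    return (p_observed - p_expected) / (1 - p_expected)

kappa = cohens_kappa(a=40, b=5, c=3, d=152)
print(round(kappa, 3))  # ~0.883 -- "almost perfect" agreement
```

Note that raw percent agreement here is 96%, yet kappa is lower because most records are easy exclusions that both reviewers would reject by chance.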
Data extraction is where you systematically pull the quantitative and qualitative information needed from each included study. Accuracy here is critical -- errors in data extraction propagate directly into your meta-analysis results.
Your extraction form should capture:
| Outcome Type | Data to Extract | Example |
|---|---|---|
| Binary (dichotomous) | Events and total N, for both intervention and control groups | Deaths: 15/200 (treatment) vs 30/198 (control) |
| Continuous | Mean, standard deviation (SD), and N for both groups | HbA1c change: -1.2 (SD 0.8, n=150) vs -0.4 (SD 0.7, n=148) |
| Time-to-event (survival) | Hazard ratio (HR), 95% CI, or data to reconstruct them | HR = 0.72 (95% CI: 0.58-0.89) |
Two reviewers should independently extract data from every study. After extraction, compare the results and resolve any discrepancies. This catches transcription errors, misread tables, and misinterpreted outcome definitions. Studies have shown that single-reviewer extraction has an error rate of 10-30%.
The effect size measure you choose determines how results are combined and interpreted. Choosing the wrong effect size is one of the most common mistakes in meta-analysis. Here is a decision framework:
Is your outcome binary (yes/no), continuous (numerical), or time-to-event? The answer narrows your choice to one or two candidates:
| Effect Size | Data Type | When to Use | Null Value | Interpretation Example |
|---|---|---|---|---|
| OR (Odds Ratio) | Binary | Case-control studies; logistic regression outputs | 1.0 | OR = 2.5: The odds of the event are 2.5 times higher in the intervention group |
| RR (Risk Ratio) | Binary | RCTs and cohort studies (preferred over OR) | 1.0 | RR = 0.70: 30% relative risk reduction in the intervention group |
| MD (Mean Difference) | Continuous | Same outcome scale across all studies | 0 | MD = -5.3 mmHg: Blood pressure is 5.3 mmHg lower in the intervention group |
| SMD (Standardized Mean Difference) | Continuous | Different scales measuring the same construct | 0 | SMD = -0.50: A medium effect favoring the intervention (Cohen's conventions) |
| HR (Hazard Ratio) | Time-to-event | Survival analysis, Cox regression data | 1.0 | HR = 0.65: 35% reduction in the instantaneous hazard of the event |
For a deeper dive into effect size selection, including formulas and conversion methods, see our dedicated guide: Choosing Effect Sizes: OR, RR, MD, SMD Guide.
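As a worked example, the binary-outcome counts from the extraction table earlier (15/200 deaths in treatment vs 30/198 in control) yield the following RR and OR, with 95% CIs computed on the log scale using the standard variance formulas:

```python
import math

# RR and OR with 95% CIs from a 2x2 table (illustrative counts).
e1, n1 = 15, 200   # events / total, intervention
e2, n2 = 30, 198   # events / total, control

rr = (e1 / n1) / (e2 / n2)
se_log_rr = math.sqrt(1/e1 - 1/n1 + 1/e2 - 1/n2)   # SE of log(RR)
ci_rr = (math.exp(math.log(rr) - 1.96 * se_log_rr),
         math.exp(math.log(rr) + 1.96 * se_log_rr))

odds1, odds2 = e1 / (n1 - e1), e2 / (n2 - e2)
or_ = odds1 / odds2
se_log_or = math.sqrt(1/e1 + 1/(n1 - e1) + 1/e2 + 1/(n2 - e2))
ci_or = (math.exp(math.log(or_) - 1.96 * se_log_or),
         math.exp(math.log(or_) + 1.96 * se_log_or))

print(f"RR = {rr:.2f} (95% CI {ci_rr[0]:.2f}-{ci_rr[1]:.2f})")
print(f"OR = {or_:.2f} (95% CI {ci_or[0]:.2f}-{ci_or[1]:.2f})")
```

Notice that the OR (~0.45) is more extreme than the RR (~0.50): with event rates this common, treating an OR as if it were an RR overstates the effect.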
This is the computational core of your meta-analysis. Three key decisions must be made: the analytical model, heterogeneity assessment, and how to visualize results.
| Feature | Fixed-Effect Model | Random-Effects Model |
|---|---|---|
| Assumption | All studies estimate the same single true effect | Each study estimates its own true effect; these effects follow a distribution |
| Source of variation | Within-study sampling error only | Within-study error + between-study variance (τ²) |
| Weighting | Based on study precision (inverse variance) | Adjusted weights that account for between-study heterogeneity |
| Confidence intervals | Narrower (can be falsely precise if heterogeneity exists) | Wider (more conservative, typically more realistic) |
| When to use | Studies are clinically and methodologically homogeneous; I² < 25% | Studies differ in populations, settings, or methods (most real-world scenarios) |
Heterogeneity refers to variability in study results beyond what is expected from sampling error alone. Three key statistics quantify it:
| Statistic | What It Measures | Interpretation |
|---|---|---|
| I² | Percentage of total variability due to true heterogeneity | 0-25% low, 25-50% moderate, 50-75% substantial, >75% considerable |
| Cochran's Q | Whether observed differences in results are compatible with chance alone | p < 0.10 suggests significant heterogeneity (uses a liberal threshold because the test has low power) |
| τ² (tau-squared) | Absolute between-study variance | Expressed in the same units as the effect size squared; larger values mean more heterogeneity |
The forest plot is the signature visualization of a meta-analysis: each study appears as a square (its effect estimate, sized by its weight) on a horizontal line (its 95% confidence interval), a vertical reference line marks the null effect, and a diamond at the bottom shows the pooled estimate.
When heterogeneity is substantial (I² > 50%), subgroup analysis can help identify sources. Divide studies into groups based on pre-specified characteristics:
Use the Q-between test (interaction test) to determine if the effect truly differs between subgroups (p < 0.05).
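The Q-between statistic is the total Q minus the within-subgroup Q values, on (number of subgroups - 1) degrees of freedom. A sketch for two subgroups (so df = 1, where the chi-square tail probability reduces to `erfc`); the effect sizes below are illustrative:

```python
import math

# Q-between (interaction) test for two pre-specified subgroups.
# y = log effect sizes, v = variances (made-up values).
def fixed_pool(y, v):
    w = [1 / vi for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, y))
    return pooled, q

group_a = ([-0.70, -0.55, -0.45], [0.090, 0.110, 0.040])
group_b = ([-0.10, -0.15, -0.05], [0.025, 0.060, 0.050])

all_y = group_a[0] + group_b[0]
all_v = group_a[1] + group_b[1]

_, q_total = fixed_pool(all_y, all_v)
_, q_a = fixed_pool(*group_a)
_, q_b = fixed_pool(*group_b)

# Q_between = Q_total - sum of within-subgroup Q; df = subgroups - 1
q_between = q_total - (q_a + q_b)
# Chi-square survival function for df = 1: P(X > x) = erfc(sqrt(x/2))
p_between = math.erfc(math.sqrt(q_between / 2))
print(f"Q_between = {q_between:.2f}, p = {p_between:.4f}")
```

Here p < 0.05, so the two subgroups' effects genuinely differ; with more than two subgroups you would need a general chi-square distribution function (e.g. from scipy) rather than `erfc`.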
Publication bias occurs because studies with positive or statistically significant results are more likely to be published than those with null or negative findings. This means the available literature may overestimate the true effect, and your meta-analysis could inherit that bias.
A funnel plot graphs each study's effect size (x-axis) against its precision, typically standard error (y-axis, inverted). In the absence of publication bias, the points form a symmetric inverted funnel: large, precise studies cluster near the top around the pooled effect, while smaller studies scatter progressively wider toward the bottom.
Asymmetry in the funnel plot -- typically a gap in the bottom-right or bottom-left corner -- suggests that small studies with unfavorable results may be missing.
| Test | Method | When to Use | Significance Threshold |
|---|---|---|---|
| Egger's test | Linear regression of effect size on standard error | Continuous outcomes (MD, SMD); works well with ≥10 studies | p < 0.10 |
| Begg's test | Rank correlation between effect size and variance | Binary outcomes (OR, RR); less powerful than Egger's | p < 0.10 |
| Peters' test | Regression of effect size on inverse of total sample size | Binary outcomes; less affected by mathematical coupling than Egger's | p < 0.10 |
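Egger's test is just a linear regression of the standardized effect (effect / SE) on precision (1 / SE), where a non-zero intercept indicates asymmetry. A sketch with illustrative data, reporting the intercept and its t statistic (the p-value, from a t distribution with n - 2 df, is omitted to keep this standard-library only):

```python
import math

# Egger's regression test sketch. Small studies (large SE) here show
# larger effects, so the funnel is asymmetric. Data are illustrative.
y  = [-0.80, -0.60, -0.45, -0.35, -0.30, -0.28, -0.25, -0.22, -0.20, -0.18]
se = [ 0.40,  0.35,  0.28,  0.22,  0.18,  0.15,  0.12,  0.10,  0.08,  0.07]

z = [yi / si for yi, si in zip(y, se)]   # standardized effects
x = [1 / si for si in se]                # precisions

n = len(y)
mx, mz = sum(x) / n, sum(z) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
slope = sxz / sxx
intercept = mz - slope * mx

# Residual variance and the intercept's standard error (OLS formulas)
resid = [zi - (intercept + slope * xi) for xi, zi in zip(x, z)]
s2 = sum(r ** 2 for r in resid) / (n - 2)
se_intercept = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))

t = intercept / se_intercept   # compare to t distribution, df = n - 2
print(f"intercept = {intercept:.3f} (SE {se_intercept:.3f}), t = {t:.2f}")
```

An intercept well away from zero, as in this fabricated example, is the regression analogue of the missing-corner asymmetry you would see in the funnel plot.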
If publication bias is detected, the trim-and-fill method provides an adjusted estimate. It works by trimming the asymmetric small studies, re-estimating the center of the funnel, then filling in mirror-image counterparts of the trimmed studies and recomputing the pooled effect.
The adjusted estimate shows what the pooled effect might be if publication bias were absent. A large shift from the original estimate is concerning.
Sensitivity analysis tests the robustness of your meta-analysis results. The question it answers: "Would the conclusions change if we made different analytical decisions?"
The most common approach is leave-one-out analysis: remove one study at a time, re-run the meta-analysis on the remaining studies, and record how the pooled estimate and its confidence interval change.
If removing any single study causes the pooled effect to change direction (e.g., from significant to non-significant, or from favoring intervention to favoring control), that study is influential and must be discussed explicitly.
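The leave-one-out procedure is a short loop over the included studies. A fixed-effect sketch with hypothetical study names and log-scale effect sizes:

```python
import math

# Leave-one-out sensitivity analysis: recompute the fixed-effect
# pooled estimate with each study removed in turn.
# Study names, effect sizes (log scale), and variances are made up.
studies = {
    "Smith 2019":  (-0.70, 0.090),
    "Lee 2020":    (-0.45, 0.040),
    "Garcia 2021": (-0.10, 0.025),
    "Chen 2022":   (-0.55, 0.110),
    "Patel 2023":  (-0.30, 0.060),
}

def pool(data):
    """Inverse-variance fixed-effect pooled estimate and its SE."""
    w = [1 / v for _, v in data]
    est = sum(wi * y for wi, (y, _) in zip(w, data)) / sum(w)
    return est, math.sqrt(1 / sum(w))

full_est, full_se = pool(list(studies.values()))
print(f"All studies: {full_est:.3f}")

for name in studies:
    rest = [val for key, val in studies.items() if key != name]
    est, _ = pool(rest)
    print(f"Without {name}: {est:.3f} (shift {est - full_est:+.3f})")
```

In this fabricated dataset, dropping the most precise study shifts the pooled estimate far more than dropping any other, which is exactly the kind of influential study the paragraph above says must be discussed.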
The PRISMA 2020 (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement provides a 27-item checklist for transparent reporting. Most medical journals require PRISMA compliance.
Here is a standard template for reporting your primary meta-analysis result:
| Section | Key Items to Report |
|---|---|
| Title | Identify the report as a systematic review, meta-analysis, or both |
| Registration | Registration number and registry name (e.g., PROSPERO CRD42025xxxxx) |
| Search strategy | Full search strings for all databases (typically in a supplementary file) |
| Study selection | PRISMA flow diagram with numbers at each stage |
| Effect measures | Specify effect measure (OR, RR, MD, SMD, HR) and why it was chosen |
| Synthesis methods | Model (fixed/random), software used, method for pooling |
| Certainty assessment | GRADE framework for overall quality of evidence (optional but recommended) |
For a complete PRISMA 2020 flow diagram guide, see: PRISMA 2020 Flow Diagram Guide.
Choosing the right software can make or break your meta-analysis experience. Here is an honest comparison of the main options available today:
| Feature | MetaReview | RevMan (Cochrane) | R (meta/metafor) | Stata | Covidence |
|---|---|---|---|---|---|
| Price | Free | Free (Cochrane authors) / Paid | Free | Paid ($$$) | Paid ($$) |
| Installation | None (browser-based) | Desktop download required | Install R + packages | Desktop license | None (browser-based) |
| Coding required | No | No | Yes (R scripts) | Yes (do-files) | No |
| Effect sizes | OR, RR, MD, SMD | OR, RR, MD, SMD | All types + custom | All types + custom | No statistical analysis |
| Forest plot | Yes (SVG, publication-quality) | Yes | Yes (customizable) | Yes (customizable) | No |
| Funnel plot | Yes | Yes | Yes | Yes | No |
| Subgroup analysis | Yes | Yes | Yes | Yes | No |
| Sensitivity analysis | Leave-one-out | Limited | Full suite | Full suite | No |
| Literature search | Built-in PubMed search | Cochrane Library | No | No | Import only |
| AI screening | Yes (LLM-powered) | No | No | No | No |
| PDF data extraction | Yes (AI-powered) | No | No | No | No |
| Auto-generated results text | Yes | No | No | No | No |
| Best for | Researchers who want an all-in-one free tool | Cochrane review authors | Statisticians who want full control | Biostatisticians with Stata access | Screening and collaboration only |
For a detailed feature-by-feature comparison, see: Meta-Analysis Software Comparison.
After reviewing thousands of published meta-analyses and their peer review feedback, these are the most frequent errors that lead to rejection or revision requests:
Combining OR from one study with RR from another without proper conversion produces meaningless pooled estimates. Always convert to a common metric or recalculate from raw data.
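When a study reports only an OR but you are pooling RRs, one commonly used conversion is the Zhang & Yu approximation, which requires an estimate of the baseline (control-group) risk:

```python
# Zhang & Yu approximation: RR ~= OR / (1 - p0 + p0 * OR),
# where p0 is the control-group risk. Values are illustrative.
def or_to_rr(odds_ratio: float, baseline_risk: float) -> float:
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)

# With a common outcome (30% baseline risk), an OR of 2.5
# corresponds to a much smaller RR (~1.72)...
print(or_to_rr(2.5, 0.30))
# ...but with a rare outcome (1% baseline risk), OR and RR nearly
# coincide (~2.46), which is why the distinction matters most for
# common outcomes.
print(or_to_rr(2.5, 0.01))
```

When raw event counts are available, recalculating the RR directly from the 2x2 table is preferable to any conversion formula.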
Reporting a pooled effect with I² = 85% and no attempt to explore or explain the heterogeneity is a red flag for reviewers. High heterogeneity demands subgroup analysis, meta-regression, or a narrative approach.
Excluding studies without pre-specified, transparent criteria is scientific misconduct. Every exclusion must be documented with a clear reason. This is why protocol registration on PROSPERO matters.
Without prospective registration, reviewers cannot verify that your methods, outcomes, and analyses were not changed after seeing the results. PROSPERO registration takes 30 minutes and prevents months of reviewer questions.
If studies come from different populations, settings, and time periods, a fixed-effect model will underestimate the uncertainty. When in doubt, use random-effects.
Searching only PubMed is not sufficient. Cochrane recommends at least three databases. Missing Embase alone can mean missing 20-30% of relevant studies.
Failing to perform and report sensitivity analysis (at minimum, leave-one-out) leaves your conclusions unverified. Reviewers expect to see evidence that results are robust.
Subgroup analyses not specified in the protocol should be explicitly labeled as exploratory. Treating data-driven subgroups as definitive findings is misleading.
Formal tests (Egger's, Begg's) lack statistical power with fewer than 10 studies. Acknowledge this limitation rather than claiming "no publication bias detected" based on an underpowered test.
For observational studies, always prefer the most adjusted (multivariable) estimates. Unadjusted estimates may be confounded and produce biased pooled results.
MetaReview is a free online tool. Go from data entry to a publication-quality forest plot in under 5 minutes. No installation, no coding, no cost.
A systematic review is the entire process of systematically identifying, evaluating, and synthesizing all relevant research on a topic. It follows a structured protocol with explicit inclusion/exclusion criteria. A meta-analysis is specifically the statistical method used within a systematic review to quantitatively pool results from multiple studies into a single effect estimate. You can conduct a systematic review without a meta-analysis (presenting a narrative synthesis), but a meta-analysis should always be embedded within a systematic review framework. Think of systematic review as the research method and meta-analysis as the statistical technique.
There is no absolute minimum, but practical considerations matter. With 2 studies, you can technically compute a pooled estimate, but the result will be driven almost entirely by sample size differences and provides limited insight. With 5 or more studies, heterogeneity statistics (I², Q) become more meaningful. With 10 or more studies, you can reliably perform publication bias tests (Egger's, Begg's) and funnel plot analysis. Most reviewers consider 5 studies a reasonable minimum for a credible meta-analysis, and will accept fewer only if the topic is narrow and the studies are high-quality.
MetaReview is a completely free, browser-based meta-analysis tool that requires no installation, no account, and no coding knowledge. It supports OR, RR, MD, and SMD effect sizes, fixed and random-effects models, forest plots, funnel plots, subgroup analysis, leave-one-out sensitivity analysis, and auto-generated results paragraphs. Other free options include the R statistical language with the "meta" and "metafor" packages, which are powerful but require programming skills. RevMan is free for Cochrane review authors but requires desktop installation. OpenMeta-Analyst is another free option but is no longer actively maintained.
A forest plot displays each study as a row. The square represents the study's effect estimate (e.g., OR, RR, or MD), with the square size proportional to the study's weight. The horizontal line through the square is the 95% confidence interval. The diamond at the bottom represents the pooled (combined) effect. A vertical reference line shows the null effect (1.0 for ratio measures like OR/RR, or 0 for difference measures like MD/SMD). If a study's confidence interval crosses this null line, that study alone did not find a statistically significant effect. If the diamond does not touch the null line, the pooled result is statistically significant.
I² tells you what percentage of the observed variation across study results is due to genuine differences between studies (true heterogeneity) rather than random sampling variation. An I² of 0% means all variation is due to chance; an I² of 75% means three-quarters of the observed variability reflects true differences in underlying effects. The Cochrane Handbook provides rough benchmarks: 0-40% might not be important, 30-60% may represent moderate heterogeneity, 50-90% may represent substantial heterogeneity, and 75-100% indicates considerable heterogeneity. When I² is high, explore sources through subgroup analysis or meta-regression rather than simply reporting the pooled estimate.
Yes. Point-and-click tools like MetaReview are designed for researchers who do not have programming or advanced biostatistics training. You enter your extracted data (event counts, sample sizes, means, standard deviations), select your effect size type and model, and the tool computes everything: pooled estimates, confidence intervals, heterogeneity statistics, forest plots, funnel plots, and sensitivity analyses. That said, understanding what these statistics mean and how to interpret them is essential for writing a defensible manuscript. We recommend reading the relevant chapters of the Cochrane Handbook for Systematic Reviews even if you use a no-code tool.
A realistic timeline for a focused meta-analysis is 3 to 12 months from protocol registration to manuscript submission. Protocol development and PROSPERO registration takes 1-2 weeks. The literature search typically takes 1-3 weeks. Screening can take 2-8 weeks depending on volume (tools like MetaReview's AI screening can compress this significantly). Data extraction takes 2-6 weeks for 15-30 studies. Quality assessment takes 1-2 weeks. Statistical analysis and figure generation can be done in 1-3 days using the right tools. Writing the manuscript takes 2-4 weeks. Peer review and revisions add another 2-6 months. The most common bottleneck is screening and data extraction, which together account for roughly half of the total time.
PRISMA stands for Preferred Reporting Items for Systematic Reviews and Meta-Analyses. The PRISMA 2020 update consists of a 27-item checklist covering everything that should be reported in a systematic review or meta-analysis: title, abstract, rationale, objectives, protocol registration, eligibility criteria, information sources, search strategy, selection process, data extraction, effect measures, synthesis methods, risk of bias, results of syntheses, reporting biases, certainty of evidence, and conclusions. It also includes a standardized flow diagram template. Most biomedical journals require authors to submit a completed PRISMA checklist alongside their manuscript. The checklist is freely available at prisma-statement.org.