

Guidelines for Systematic Review in Conservation and Environmental Management

ANDREW S. PULLIN AND GAVIN B. STEWART
Centre for Evidence-Based Conservation, School of Biosciences, The University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom

Abstract: An increasing number of applied disciplines are utilizing evidence-based frameworks to review
and disseminate the effectiveness of management and policy interventions. The rationale is that increased
accessibility of the best available evidence will provide a more efficient and less biased platform for decision
making. We argue that there are significant benefits for conservation in using such a framework, but the
scientific community needs to undertake and disseminate more systematic reviews before the full benefit can
be realized. We devised a set of guidelines for undertaking formalized systematic review, based on a health
services model. The guideline stages include planning and conducting a review, including protocol formation,
search strategy, data inclusion, data extraction, and analysis. Review dissemination is addressed in terms
of current developments and future plans for a Web-based open-access library. By the use of case studies we
highlight critical modifications to guidelines for protocol formulation, data-quality assessment, data extraction,
and data synthesis for conservation and environmental management. Ecological data presented significant
but soluble challenges for the systematic review process, particularly in terms of the quantity, accessibility, and
diverse quality of available data. In the field of conservation and environmental management there needs
to be further engagement of scientists and practitioners to develop and take ownership of an evidence-based
framework.

Keywords: conservation policy, conservation practice, decision making, evidence-based knowledge transfer
Directrices para la Revisión Sistemática en Gestión Ambiental y de Conservación

Resumen: Un mayor número de disciplinas está utilizando marcos de referencia basados en evidencias para revisar y diseminar la efectividad de las intervenciones de gestión y política. El fundamento es que la mayor accesibilidad de la evidencia mejor disponible proporcionará una plataforma de toma de decisiones menos sesgada y más eficiente. Argumentamos que hay beneficios significativos para la conservación al utilizar tal marco de referencia, pero la comunidad científica debe emprender y diseminar revisiones más sistemáticas antes de que se pueda comprender el beneficio completo. Diseñamos un conjunto de directrices para realizar revisiones sistemáticas formales, basado en un modelo de servicios de salud. Las etapas de las directrices incluyen la planificación y conducción de una revisión, incluyendo formación del protocolo, estrategias de búsqueda, inclusión de datos, extracción y análisis de datos. La diseminación de revisiones es abordada en términos del desarrollo actual y los planes futuros para una biblioteca de acceso abierto en la Web. Al utilizar estudios de caso resaltamos modificaciones críticas a las directrices para la formulación del protocolo, evaluación de la calidad de los datos, extracción de datos y síntesis de datos para la gestión ambiental y de conservación. Los datos ecológicos presentaron retos significativos, pero solucionables, para el proceso de revisión sistemática, particularmente en términos de la cantidad, accesibilidad y calidad de los datos disponibles. Se requiere un mayor compromiso de científicos y profesionales de la gestión ambiental y de conservación para desarrollar y apropiarse de un marco de referencia basado en evidencias.

Palabras Clave: política de la conservación, práctica de la conservación, toma de decisiones, transferencia de conocimiento basado en evidencia

Paper submitted August 4, 2005; revised manuscript accepted January 25, 2006.

Conservation Biology, Volume 20, No. 6, 1647–1656. © 2006 Society for Conservation Biology. DOI: 10.1111/j.1523-1739.2006.00485.x

Introduction

In response to problems of accessing scientific information to support decision making, many applied disciplines are utilizing an evidence-based framework for knowledge transfer involving systematic review and dissemination of evidence on effectiveness of interventions at the practical and policy levels (Stevens & Milne 1997; Khan et al. 2003). The framework is most fully developed in the health services sector, where global review and dissemination units have been established and are linked by networks such as the Cochrane Collaboration (e.g., www.cochrane.org). Within these networks systematic reviews are undertaken following set guidelines that include peer review to ensure that they meet required standards before dissemination. The need for such a framework in conservation has been argued elsewhere (Pullin & Knight 2001; Fazey et al. 2004; Pullin et al. 2004; Sutherland et al. 2004). Here we present a summary of newly developed guidelines for systematic review and dissemination in conservation and environmental management (more detailed guidance can be obtained from www.cebc.bham.ac.uk).

We used established guidelines from the health services sector (NHS CRD 2001; Higgins & Green 2005) as our models, undertook our own systematic reviews to test these models, and modified the guidelines through analysis of procedures and outcomes for their application to conservation and environmental management. Although the basic ethos of systematic review remains unchanged, ecological data are often fundamentally different in nature from data on human health (Fazey et al. 2004; Pullin et al. 2004), and this is reflected in our guidelines. At first glance many of the guidelines may seem routine and common sense, but the rigor and objectivity applied at key stages, and the underlying philosophy of transparency and independence, set them apart from the majority of traditional reviews published recently in the field of applied ecology (Roberts et al. 2006). Pullin and Knight (2001), Fazey et al. (2004), Pullin et al. (2004), and Sutherland et al. (2004) argue that, once established, systematic review methodology will significantly improve the identification and provision of evidence to support practice and policy in conservation and environmental management. For this methodology to have an impact on conservation effectiveness, more conservation biologists need to undertake reviews, and we encourage this community to use (and improve) these guidelines and help establish an evidence-based framework for our discipline.

Systematic Review Guidelines

For clarity the guidelines are split into three stages and key phases within each. We use examples of our own reviews to highlight key issues for reviews in conservation and environmental management.

Stage 1—Planning the Review

QUESTION FORMULATION

A systematic review starts with a specific question, clearly defined with subject, intervention, and outcome elements (Table 1), that is answerable in scientific terms (Jackson 1980; Cooper 1984; Hedges 1994). The question is critical to the process because it generates the literature search terms and determines relevance criteria (NHS CRD 2001). Finding the right question is a compromise (probably more so in ecology than in medicine) between taking a holistic approach (thus increasing realism by involving a large number of variables but limiting the number of relevant studies) and a reductionist approach (which may limit the review's relevance, utility, and value) (Stewart et al. 2005a). The question should be practice or policy relevant and should therefore be generated by, or at least in collaboration with, relevant decision makers (or organizations) for whom the question is real. It may also be important for the question to be seen as neutral to stakeholder groups. Ideally meetings should be held with key stakeholders to try to reach consensus on the nature of the question. This may be more critical for ecological review than medical review because, unlike the benefit of improving human health, the benefit of conserving biodiversity is often contested (Fazey et al. 2004).

Table 1. Elements of a reviewable question; normally a permutation of "Does intervention x on subject y produce outcome z?"

Subject: unit of study (e.g., ecosystem, habitat, species) that should be defined in terms of the subject(s) on whom the intervention will be applied.
Intervention: proposed management regime, policy, or action.
Outcome: all relevant objectives of the proposed management intervention that can be measured reliably, with particular consideration given to the most important management outcome and to any outcome critical to whether the proposed intervention has greater benefits or disadvantages than any other alternatives (i.e., the outcome desired).
Comparator: is the intervention being compared with no intervention, or are alternative interventions being compared with each other?
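The question elements in Table 1 lend themselves to a structured representation. The following sketch (illustrative only, not part of the published guidelines) shows how a review question might be recorded so that its elements can later drive search-term generation; the field names and the synonym terms are assumptions made for this example.

```python
from dataclasses import dataclass, field

@dataclass
class ReviewQuestion:
    """Structured form of a reviewable question (Table 1):
    'Does intervention x on subject y produce outcome z?'"""
    subject: str        # unit of study, e.g., an ecosystem, habitat, or species
    intervention: str   # proposed management regime, policy, or action
    outcome: str        # measurable objective of the intervention
    comparator: str = "no intervention"           # or an alternative intervention
    synonyms: dict = field(default_factory=dict)  # element -> search synonyms

# One of the paper's worked examples, expressed in this form; the synonym
# list is hypothetical.
blanket_bog = ReviewQuestion(
    subject="U.K. blanket bog",
    intervention="burning",
    outcome="favorable ecological condition (relative abundance of key species)",
    synonyms={"intervention": ["burn*", "fire", "management burning"]},
)
```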


EXAMPLES OF QUESTION FORMULATION

We use four examples of the systematic review process throughout and introduce each of them here.

Example 1. English Nature, a U.K. statutory conservation agency, was concerned about the ecological impacts of burning management carried out by landowners in upland areas of England. Discussion with English Nature personnel enabled this general concern to be "unpacked," allowing definition of subject, intervention, and outcome elements of two specific review questions (Stewart et al. 2005a): "Does burning of U.K. submontane, dry dwarf-shrub heath maintain vegetation diversity?" and "Does burning degrade blanket bog?" Identification of these two related questions allowed specific hypotheses to be tested while retaining broader policy relevance. These also provided examples of habitat-based reviews.

Example 2. The Royal Society for the Protection of Birds (RSPB) was concerned about the impact of wind farms on bird populations, which led to a systematic review (Stewart et al. 2005b). This review was a test case for measuring impact of interventions arising from specific development activity with policy relevance.

Example 3. Tyler and Pullin (2005) and Tyler et al. (2006) examined the effectiveness of Rhododendron ponticum control methods as a test case for review of control methods for an invasive plant species.

Example 4. Tyler et al. (2005) investigated the impact of control methodologies on introduced populations of the American mink (Mustela vison) in Europe, as an invasive animal species test case.

Although discussions with the proposers of a review proved effective in formulation of a review question, other stakeholders may disagree. In example 1, a key stakeholder disagreed with the outcome measure (a measure of favorable ecological condition based on the relative abundance of key species) used in the "blanket bog" review. To avoid postreview problems such as this we advocate involvement of multiple stakeholders early in the review process.

DEVELOPING A REVIEW PROTOCOL

The review protocol acts as a document that all stakeholders agree upon, after which the review itself can be conducted (see www.cebc.bham.ac.uk/protocols.htm for examples).

A review protocol is developed as a document that guides the review. As in any scientific endeavor, methodology should be established and made available for scrutiny and comment at an early stage. Because reviews are retrospective by nature, the protocol is essential to make the review process as rigorous, transparent, and well defined as possible (Light & Pillemer 1984). Besides a formal presentation of the question and its background, a review protocol sets out the strategy for obtaining data and defines relevance criteria for data inclusion or exclusion (NHS CRD 2001). The subject, intervention, and outcome elements defined in the question-setting stage provide a priori inclusion criteria. If the relevant population, intervention, or outcome measures are present, then the data are included, although data-quality thresholds may result in the subsequent exclusion of otherwise relevant material either from quantitative analysis or from the review in entirety (see below).

The search strategy is constructed from search terms extracted from the subject, intervention, and outcome elements of the question. It is important that the search is sufficiently rigorous and broad so that all studies eligible for inclusion are identified. Search protocols must balance sensitivity (getting all information of relevance) and specificity (the proportion of hits that are relevant) (NHS CRD 2001). In ecology resource-intensive searches of high sensitivity are required, even though this is at the expense of specificity, because ecology lacks the mesh-heading indexes and integrated databases of medicine and public health. A high-sensitivity and low-specificity approach is necessary to reduce bias and increase repeatability (see below). Typically, large numbers of references are therefore rejected. For example, of 317 articles with relevant titles concerning the impact of burning on blanket bog, only 8 (2.5%) had comparators (Stewart et al. 2005a). Similarly, reviews regarding burning of dry heath and the impact of wind farms on bird abundance resulted in meta-analysis of 1.7% and 12% of material with relevant titles, respectively.

In a review of the effectiveness of control methodologies on introduced populations of the American mink (Mustela vison) in Europe, Tyler et al. (2005) searched the following electronic databases: Agricola, BIOSIS Previews, CAB Abstracts, Copac, Digital Dissertations, Index to Theses Online, ISI Current Contents, ISI Proceedings, ISI Web of Knowledge, ISI Web of Science, JSTOR, ScienceDirect, Scirus, Scopus, and Wildlink; the World Wide Web (first 100 "hits" from www.alltheweb.com, www.google.co.uk, the U.K. Department for the Environment, Food and Rural Affairs, Scottish Natural Heritage, Oxford University's Wildlife Conservation Research Unit, The Royal Society for the Protection of Birds, The National Trust, British Wildlife, The Mammal Society, Mammals Trust, and The British Trust for Ornithology); and bibliographies of relevant articles (search terms: Mustela AND vison, Mustela AND vison AND trap*, Mustela AND vison AND control*, Mustela AND vison AND management, Mustela AND vison AND pest, Mink AND trap*, Mink AND control*, Mink AND management, Mink AND pest). The specificity of this search was low, with many references identified multiple times. The grey-literature search was largely U.K. based due to resource limitations, although the inclusion of non-U.K. theses was possible. The low specificity of the review (only 1% of retrieved material was judged relevant), however, limits the potential for bias notwithstanding the geographical scope of the grey-literature search. The documented search is fully repeatable and transparent; thus, readers can judge its validity.
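Because repeatability of the search is the standard against which a review is judged, it can help to generate and log the search strings programmatically rather than typing them into each database by hand. A minimal sketch follows, using the mink review's published terms; the pairing logic and the title-based deduplication are illustrative assumptions, not the procedure of Tyler et al. (2005).

```python
from itertools import product

# Subject and modifier terms from the published mink search strategy.
subjects = ["Mustela AND vison", "Mink"]
modifiers = ["trap*", "control*", "management", "pest"]

# Base subject query plus every subject x modifier combination, which
# reproduces the nine strings listed in the text.
search_strings = ["Mustela AND vison"] + [
    f"{s} AND {m}" for s, m in product(subjects, modifiers)
]

def deduplicate(records):
    """Collapse hits returned by several databases, keying on a normalized
    title; keeps the first record seen so the search log stays auditable."""
    seen, unique = set(), []
    for rec in records:  # rec: {"title": ..., "source": ..., "year": ...}
        key = "".join(rec["title"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```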


Stage 2—Conducting the Review

SEARCHING FOR DATA

It is perhaps self-evident that the widest possible range of sources should be accessed to capture information. The following are useful general sources: multiple electronic databases (general databases and databases with specific foci), professional networks and organizations (special-interest groups may have personal literature collections or libraries), the Internet (a method of contacting or searching for information from the above two groups), bibliographies (data sources cited in literature obtained from the above), and subject experts (direct personal contact may yield new data sets). To minimize the problem of publication bias (e.g., Leimu & Koricheva 2005), both published and unpublished data must be included, a standard rarely satisfied in traditional reviews. Hand searching of specific sources and visits to libraries and museums are likely to be necessary to extract all relevant material. It may be necessary to search local databases for questions with a regional focus. At each stage of the review it is essential that the numbers and identities of articles retrieved, accepted, and rejected be recorded. The maintenance of a database or collection of bibliographic software libraries is recommended. The repeatability of search methods is a key characteristic of systematic reviews (NHS CRD 2001).

SELECTION OF RELEVANT DATA

Once searching is complete, relevant articles must be efficiently selected without wasting resources examining irrelevant articles in detail. Selecting only relevant articles from a potentially large body of initial literature requires the reviewer to use inclusion and exclusion criteria stated a priori in the protocol to impose a number of filters of increasing rigor. First, if a long list of articles or data sources is acquired (1000s rather than 100s) and the list of relevant sources is likely to be much shorter, it may be efficient to eliminate some material on title only (especially if obviously spurious hits arise from ambiguity in the use of words in the literature). The second filter should examine title and abstract to determine relevance. The approach should be conservative so as to retain data if there is reasonable doubt over its relevance. It is good practice at this stage to employ a second reviewer to go through the same process on a random subsample of abstracts from the original list and to ensure decisions are comparable by performing a kappa analysis, which adjusts the proportion of records for which there was agreement by the amount of agreement expected by chance alone (Cohen 1960; Edwards et al. 2002). If comparability is not achieved, then the criteria should be further developed and the process repeated.
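Cohen's kappa is simple to compute for the binary include/exclude case: with p_o the observed proportion of agreements and p_e the agreement expected by chance from the two reviewers' marginal inclusion rates, kappa = (p_o - p_e)/(1 - p_e). A minimal sketch, assuming each reviewer's decisions are coded True (include) or False (exclude); the screening decisions shown are invented for the example.

```python
def cohens_kappa(reviewer_a, reviewer_b):
    """Cohen's (1960) kappa for two reviewers' binary include/exclude
    decisions on the same subsample of abstracts."""
    n = len(reviewer_a)
    observed = sum(a == b for a, b in zip(reviewer_a, reviewer_b)) / n
    # Chance agreement from each reviewer's marginal inclusion rate.
    pa = sum(reviewer_a) / n
    pb = sum(reviewer_b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    return (observed - expected) / (1 - expected)

# Ten abstracts screened independently by two reviewers.
a = [True, True, False, False, True, False, True, False, False, True]
b = [True, False, False, False, True, False, True, False, True, True]
print(round(cohens_kappa(a, b), 2))  # 0.6: agreement well above chance
```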
Remaining articles should be viewed in full to determine whether they contain relevant and usable data. Obtaining the full text of all articles can be very time consuming, and a realistic deadline may have to be used and a record kept of those not obtained. The conservative approach and independent checking of a subsample by kappa analysis should be repeated at this stage. Short lists of articles and data sets should be made available for scrutiny by stakeholders and subject experts. All should be invited, within a set deadline, to identify relevant data sources they believe are missing from the list. Reviewers should be aware that investigators often selectively cite studies with positive results (Gotzsche 1987; Ravnskov 1992); thus, checking bibliographies and direct contacts must be used only to augment the search.

ASSESSING QUALITY OF METHODOLOGY

To determine the level of confidence that may be placed in selected data sets, each should be critically appraised to determine the extent to which its research methodology is likely to prevent systematic errors or bias (Moher et al. 1995). In the health services, a hierarchy of research is recognized that scores the value of the data in terms of the scientific rigor of the methodology used (Stevens & Milne 1997). The hierarchy of methodology can be viewed as generic and has been transferred from medicine to ecology (Pullin & Knight 2003; see www.cebc.bham.ac.uk for full details). Where a number of well-designed, high-quality studies are available, others with inferior methodology may be rejected. Alternatively, the effects of individual studies can be weighted according to their position in the "quality hierarchy." However, there are dangers in the rigid application of this hierarchy in ecology. Hypothetically, a rigorous methodology, such as a randomized controlled trial, could be viewed as superior, even though it was applied over inadequately short time and small spatial scales, to a time-series experiment providing data over longer time and larger spatial scales more appropriate to the question. This problem carries with it the threat of misinterpretation of evidence. Potential pitfalls of this kind need to be considered at this stage and addressed by more pragmatic quality weightings and judicious use of sensitivity analysis (see below).

Four sources of systematic bias are routinely considered in healthcare (Feinstein & Horwitz 1985; Moher et al. 1995; Moher et al. 1996; Khan et al. 2003), of which three have, to date, required consideration in ecological systematic reviews. Selection bias results from the way that comparison (e.g., treatment and control) groups are assembled (Kunz & Oxman 1998) and is a primary reason for randomization. Performance bias refers to systematic differences in the care provided to subjects in the comparison groups and is dealt with by the experimenter being unaware of which are treatments and which are controls (blinding) (Schulz et al. 1995).


We postulate that the ecological equivalents of performance bias arise from biased baseline comparisons and failure to consider the impact of covariables along with the intervention of interest. However, it is not possible to account for variables that are not known to be confounders or that were not measured, and for those that are known, difficulties can arise in extracting standardized information for analysis. Measurement or detection bias refers to systematic differences between the comparison groups in outcome assessment and is also addressed by blinding (Schulz et al. 1995). Blinding is generally not possible in ecology, but detection bias nevertheless varies, depending on the rigor and objectivity of sampling methodology (e.g., percent cover assessed by eye is subject to greater potential detection bias than frequency). The fourth, attrition bias (systematic differences between the comparison groups in the loss of samples), has not been an issue in ecological systematic review to date.

Assessing the quality of methodology is a critical part of the systematic review process and requires a number of subjective decisions about the relative importance of different sources of bias and data-quality elements specific to ecology, particularly the appropriateness of variable temporal and spatial scales. It is therefore vital that the assessment process be standardized and be as transparent and repeatable as possible. At least 25 scales and 9 checklists have been used to assess the validity of randomized controlled trials in medicine (Moher et al. 1995; Moher et al. 1996), and various similar criteria have been used to critically appraise the validity of observational studies (Horwitz et al. 1979; Feinstein et al. 1982; Levine et al. 1994; Bero et al. 1999). These checklists do not consider specific ecological criteria. We therefore suggest that review-specific a priori assessment forms and two or more assessors should be used to assess study quality in ecological reviewing. The subjective decisions may be a focus of criticism; thus, we advocate consultation with stakeholders to try to reach consensus before moving on to data extraction.

Finally, at this stage it may be necessary to reject articles that are seemingly relevant but do not present data in extractable format. If possible, authors of such articles should be contacted and asked whether they can provide data in a suitable format.

Stewart et al. (2005a) used the hierarchy of methodology to separate randomized controlled trials and site comparisons addressing the question, "Does burning degrade blanket bog?" This reflected a major data-quality schism; therefore, further data-quality assessment was inappropriate given the very small number of studies. This approach enabled a simple, but discriminatory, vote count of studies with results showing positive, neutral, or negative effects.

When reviewing the impact of wind farms on bird populations, the standard hierarchy of evidence was considered inadequate by itself due to variation in other critical data-quality elements, particularly the widespread occurrence of confounding factors resulting from variation between treatment and control at baseline or from changes concurrent with wind-farm operation (ecological performance bias). The rigor of observations was also variable as measured in terms of replication and objectivity (ecological detection bias). To test for the impact of these factors, data-quality scores, summing the different aspects of data quality outlined above, were added as a meta-regression covariable. The data-quality score was not significant, suggesting that bifurcation of the data into high- and low-quality evidence was not necessary, possibly because the low-quality studies (low replication, imprecise estimates of abundance, high intratreatment variation coupled with confounded baselines) had a high variance and therefore a low weighting in meta-analysis by inverse variance. Sensitivity analyses were used to explore the impact of including low-quality unreplicated data, but the impact of individual data-quality elements other than time was not examined because a large number of environmental and wind-farm correlates were of interest and the potential for Type I errors would have been increased. Although this pragmatic approach is easy to apply, there is no measure of a study's "true" validity (Emerson et al. 1990; Schulz et al. 1995; Jüni et al. 1999). Caution should be exercised in interpreting study validity, especially if different quality elements are combined in a single data-quality sum.

A review of the effectiveness of Rhododendron control methods considered study hierarchy and potential for bias, providing a subjective summary of data quality (Table 2). In this instance the number of environmental variables with sufficient data for analysis was low, and sample sizes were sufficient to examine the impact of some individual study-quality variables such as length of experiment and whether results were generated in the field or a glasshouse. There were statistically significant differences in effectiveness of control, with glasshouse trials showing greater control than field-based experimentation or monitoring, raising questions about the ecological relevance of glasshouse work and the likely modifying variables. This approach has the merit of objectivity, although there is choice about which variables are included in the analysis, and caution must be exercised to avoid Type I errors, data mining, and overinterpreting results, especially when sample sizes are small.


Table 2. Data-quality assessment of an article included in a systematic review of the effectiveness of methods for the control of Rhododendron ponticum (Tyler et al. 2004).

Methods: site comparison based on sites treated with different interventions, no control, comparison of methods only.
Population: no stand-age detail; site located on lowland heath.
Intervention and cointerventions: drilled holes filled with herbicide, compared with stumps painted with herbicide.
Outcomes: painted stumps, 30–40% killed; drilled holes, 95% killed.
Study design: site comparison.
Baseline comparison: no information regarding the sites prior to treatment, thus not possible to validate baseline.
Intratreatment variation: no information describing intratreatment variation.
Measurement of intervention and cointerventions: no information regarding the sites provided, thus not possible to comment on other management within the area.
Replication and parameter of abundance: no replication or measure of abundance other than percent kill.
Notes: study appears to comment on the use of techniques rather than providing the reader with scientific evidence, resulting in a high potential for bias and subsequently low data quality.
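One way to standardize such an appraisal is to encode the review-specific assessment form directly, so that every study is scored against the same elements and unassessed elements are caught rather than silently skipped. The sketch below is illustrative: the element list echoes the categories of Table 2, but the 0–2 scale and the scores shown are invented for the example, and the caution above about combining different quality elements in a single sum applies.

```python
# A priori, review-specific quality elements (cf. the categories of Table 2).
QUALITY_ELEMENTS = [
    "study_design",         # e.g., randomized trial vs. site comparison
    "baseline_comparison",  # treatment and control comparable at baseline?
    "replication",          # true replicates, or pseudoreplication?
    "cointerventions",      # other concurrent management reported?
    "detection_rigor",      # objective sampling vs. assessment by eye
]

def quality_score(assessments):
    """Sum element scores (0 = absent/poor, 1 = partial, 2 = adequate);
    raises if any a priori element was left unassessed."""
    missing = [e for e in QUALITY_ELEMENTS if e not in assessments]
    if missing:
        raise ValueError(f"unassessed elements: {missing}")
    return sum(assessments[e] for e in QUALITY_ELEMENTS)

# The Table 2 article would score low: no baseline validation, no replication.
article = {"study_design": 1, "baseline_comparison": 0, "replication": 0,
           "cointerventions": 0, "detection_rigor": 1}
print(quality_score(article))  # 2 of a possible 10
```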

DATA EXTRACTION

Data extracted from articles should be recorded on carefully designed spreadsheets, and extraction should be undertaken with synthesis in mind. Narrative synthesis requires the construction of tables that provide details of the study or population characteristics, data quality, and relevant outcomes, all of which are defined a priori. Quantitative analysis follows the same model, but care must be taken to extract information pertinent to subsequent analysis (e.g., should binary or continuous outcomes be extracted?). In contrast to medicine, consideration of the appropriate spatial scale(s) and level of replication is necessary prior to extracting the variance measures required to weight meta-analyses. Great care must be taken to standardize and document the process of data extraction, the details of which should be recorded in tables of included studies to increase the transparency of the process. To some extent data extraction can be guided by a priori rules, but the complexity of the operation means a degree of flexibility must be maintained. Sensitivity analyses can be used to investigate the impact of extracting data in different ways when there is doubt about the optimum extraction method.

Reviewing the impact of burning on the ecological condition of blanket bog required extraction of data showing changes in floristic composition and structure. Two reviewers extracted data after reaching a consensus regarding which subsets were relevant within the full data set of each article. A priori rules increased the repeatability of data-set formation. For example, sites within an experiment were pooled to prevent pseudoreplication, avoiding post hoc justifications for deriving more than one data set from an experiment and combining unreplicated, pseudoreplicated, and replicated data. Pooled treatment and control sites were included once to maintain independence and avoid bias, with the exception of data on rotational burning, which was scarce and therefore admitted to the review provided there was a comparator, irrespective of further potential for bias. Where there was a choice of times since burning, priority was given to the longest time range to maintain independence and maximize predictive power. Similarly, grazed sites received priority over ungrazed sites when the maintenance of independence demanded a choice, because grazing and burning are carried out concurrently over most of the British uplands (Stewart et al. 2005a). If sample sizes had been larger and a quantitative generic outcome measure identified, the impact of these decisions could have been explored with sensitivity analyses. Given the nature of the data, qualitative discussion of the issues was more appropriate.
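An a priori rule such as "pool sites within an experiment" is easy to make mechanical, which is the point: the same rule then produces the same data set on a second pass or from a second reviewer. A minimal sketch, assuming the outcome is a per-site mean and that each experiment should contribute a single data point; the numbers are hypothetical.

```python
from statistics import mean, variance

def pool_sites(site_means):
    """Collapse site-level outcome means from one experiment into a single
    study-level record, so pseudoreplicated sites are not treated as
    independent studies. Returns (pooled mean, variance of the mean, n)."""
    n = len(site_means)
    if n < 2:
        # A lone site yields no internal variance estimate; flag it so the
        # study can be down-weighted or handled in a sensitivity analysis.
        return site_means[0], None, n
    return mean(site_means), variance(site_means) / n, n

# Burned sites from one hypothetical experiment (e.g., species richness).
print(pool_sites([4.2, 3.8, 5.1, 4.6]))  # one record instead of four "replicates"
```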
DATA SYNTHESIS

This stage includes both qualitative synthesis and quantitative analysis with statistical methods as appropriate. Qualitative synthesis allows informal evaluation of the effect of the intervention and the manner in which it may be influenced by measured study characteristics and data quality. Data from the data-extraction spreadsheet are tabulated to form a summary of the number of data sets providing a yes, no, or neutral answer to each question (vote counting).

More formal quantitative analysis can be undertaken to generate overall point estimates of the effect size and to analyze reasons for heterogeneity in the effect of the intervention where appropriate data exist. Meta-analysis is now commonly used in ecology (e.g., Arnqvist & Wooster 1995; Osenberg et al. 1999; Gates 2002), so we have not treated it in detail here. Meta-analysis provides summary effect sizes with each data set weighted according to some measure of its importance, with more weight given to large studies with precise effect estimates and less to small studies with imprecise effect estimates. Generally each study is weighted in inverse proportion to the variance of its effect. Pooling of individual effects can be undertaken with fixed-effects or random-effects statistical models. Fixed-effects models estimate the average effect and assume there is a single, true underlying effect, whereas random-effects models assume there is a distribution of effects that depend on study characteristics. Random-effects models include interstudy variability (assuming a normal distribution); thus, when there is heterogeneity, a random-effects model has wider confidence intervals on its summary effect than a fixed-effect model.


In medicine both statistical models are used to assess the robustness of statistical synthesis, with an a priori decision about which is most germane (NHS CRD 2001; Khan et al. 2003). Results of our initial reviews suggest that random-effects models are most appropriate for the analysis of ecological data because the numerous complex interactions common in ecology are likely to result in heterogeneity between studies.

Relationships between differences in characteristics of individual studies and heterogeneity in results can be investigated as part of the meta-analysis, thus aiding the interpretation of ecological relevance of the findings. Exploration of these differences is facilitated by construction of tables that group studies with similar characteristics and outcomes together. Data sets can be stratified into subgroups based on populations, interventions, outcomes, and methodology. Important factors that could produce variation in effect size should be defined a priori (see stage 1 above) and their relative importance considered prior to data extraction to make the most efficient use of data. Differences in subgroups of studies can then be explored.

If sufficient data exist, meta-analysis can be undertaken on subgroups and the significance of differences assessed. Such analyses must be interpreted with caution because statistical power may be limited (Type II errors possible) and multiple analyses of numerous subgroups could result in spurious significance (Type I errors possible). Alternatively, a meta-regression approach can be adopted whereby linear regression models are fitted for each covariate, with studies weighted according to the precision of the estimate of treatment effect in a random-effects model (Sharp 1998).
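For concreteness, the following sketch implements the two pooling models with textbook estimators: inverse-variance weighting for the fixed-effects summary, and the DerSimonian–Laird moment estimator of the between-study variance (tau^2) for the random-effects summary. It is a generic illustration, not the code used in the reviews described here, and the effect sizes and variances are hypothetical.

```python
import math

def pool(effects, variances, model="random"):
    """Inverse-variance pooling of study effect sizes (e.g., standardized
    mean differences). 'fixed' assumes one true effect; 'random' adds the
    DerSimonian-Laird between-study variance (tau^2) to each study's
    variance before weighting. Returns (pooled effect, 95% CI half-width)."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    if model == "random":
        # Cochran's Q and the DL moment estimator of tau^2.
        q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (len(effects) - 1)) / c)
        w = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    return pooled, 1.96 * math.sqrt(1.0 / sum(w))

smd = [-0.8, -0.3, -0.5, -1.1]        # hypothetical effects from four wind farms
var = [0.10, 0.05, 0.20, 0.08]
print(pool(smd, var, model="fixed"))   # narrower confidence interval
print(pool(smd, var, model="random"))  # wider interval under heterogeneity
```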
Despite the attempt to achieve objectivity in reviewing scientific data, considerable subjective judgment is required when undertaking meta-analyses. These judgments include decisions about the choice of effect measure, how data are combined to form data sets, which data sets are relevant and which are methodologically sound enough to be included, methods of meta-analysis, and the issue of whether and how to investigate sources of heterogeneity (Thompson 1994). Reviewers should explicitly state and distinguish between the a priori and post hoc rationales behind these decisions to minimize bias and increase transparency.

A review of the impact of wind turbines on bird abundance utilized standardized mean difference meta-analysis with weighting by inverse variance to combine data from 19 globally distributed wind farms. Sensitivity analyses were used to explore the effect of including data from unreplicated studies and to assess bias arising from data extraction of pseudoreplicated or aggregated data. Pooled effect sizes remained negative and statistically significant regardless of how the effect sizes were generated, indicating that the patterns in the data were robust. A priori and post hoc reasons for heterogeneity were explored with meta-regression. Of the a priori variables only bird taxon appeared to modify the result, with relationships between turbine number and power being too weak to have biological significance. Post hoc analysis revealed that the impact of wind farms became more pronounced over time, a finding not reported by any of the original research or previously assessed in the literature. This has important implications because declines in local bird abundance are more likely to have deleterious population-level impacts if they worsen over time. It also suggests that current wind-farm monitoring programs are of inadequate duration to detect deleterious effects.

Stage 3—Reporting and Dissemination of Results

Before reports are disseminated they should be subjected to expert scrutiny or peer review, including assessment of scientific quality and completeness. This process requires the development of an editorial panel equivalent to that of a journal or grant board, but with a more supportive role in helping reviewers achieve the necessary quality rather than rejecting large numbers outright.

The recommended format for reporting is a short summary that highlights the main review outcomes. This should be written so as to enable effective communication with managers and policy formers. A full report, written for the commissioning body, and internal records will normally include too much detail for wider dissemination but should nevertheless be available, along with the summary, to all who want more information on the conduct of the review process. Commonly, the review will also be submitted, at the authors' discretion, for publication in a peer-reviewed journal. We have developed separate guidelines and a format for presentation of reviews (www.cebc.bham.ac.uk/gettinginvolved.htm).

A full consideration of dissemination and implementation activities is beyond the scope of this paper, but a few general comments are pertinent. Wide dissemination and open access are key requirements of the evidence-based framework. However, standards of review have to be ensured; therefore, a central Web site administered by a collaboration of stakeholders is recommended, following the Cochrane Collaboration model with its emphasis on transparency of the review process and independence from bias (Fazey et al. 2004). On acceptance through peer review, summaries of reviews should be posted on the Web site with free access. Such a Web resource will be of limited use until many more systematic reviews have been undertaken.

Requirement for Further Work

To date, no systematic reviews have been published in ecology without involvement of the authors. There is therefore potential for bias in development of appropriate methodology.


For example, all reviews to date have incorporated comparators, although work in progress involves synthesizing experience and evidence with Bayesian methodologies (Morris & Normand 1992; Louis & Zelterman 1993). It could be argued that this is an excessively reductionist approach, applying a narrow definition of evidence (Fox 2005), and that further methodological development might be necessary to integrate different types of evidence (Dixon-Woods et al. 2004) or to assess ecological information of types beyond the experience of the authors.

Other issues require consideration to strengthen the ecological guidelines presented above. Medical systematic review methodology is developing rapidly, with new techniques being developed to handle the variable levels of data quality in fields such as diagnostic testing. The utility of these techniques for ecological purposes requires further investigation. Likewise, techniques for economic cost-benefit evaluation and for disseminating evidence to different audiences (political, scientific, practitioner, and stakeholder groups) (NHS CRD 2001) warrant consideration. Addressing all these issues is beyond the scope of this paper, but they require further development if an ecological evidence base is to be fully established. The ecological guidelines presented evolved from the existing medical model. Table 3 highlights key differences between ecological and medical guidelines at present, but as experience with ecological systematic review grows, the guidelines should be revised and updated as is standard practice in medicine.

As was the experience in the medical field, it will take time for systematic reviews to be recognized and valued as equivalent to other scientific papers in conservation. Key steps forward in encouraging more systematic reviews will be for journals to encourage their submission and publication and for funders to see systematic reviews as a valid form of research. We call on the conservation and environmental management communities to engage with us to further develop the ecological systematic review and create the accessible evidence base that the subject urgently requires.

Table 3. Differences between the medical systematic review guidelines and the ecological review guidelines advocated by the authors.

Question formulation
Medical: question formulation generally not limited by complexity and study numbers. Ecological: question formulation usually limited by information availability and complexity, requiring a balance between holism (more realistic) and reductionism (more studies).
Medical: stakeholder engagement useful but not generally critical. Ecological: stakeholder engagement may be critical because conservation actions often result in conflicts in objectives.

Developing review protocol: search strategy
Medical: complex searches balancing sensitivity and specificity are possible and recommended. Ecological: high-sensitivity, low-specificity searches are recommended to reduce bias and increase repeatability because ecology lacks the sophisticated search infrastructure of medicine.

Assessing quality of methodology
Medical: clear hierarchy of evidence generally applicable and often used to define a minimum quality threshold. Ecological: pragmatic quality weightings and sensitivity analyses must augment data-quality hierarchies to avoid misinterpretation, particularly when combining data across the hierarchy to increase sample sizes.
Medical: performance bias and detection bias addressed by blinding; methodology is easy to assess with published quality weightings; attrition bias is common. Ecological: performance bias and detection bias addressed by experimental design but are hard to assess, especially in a standardized manner, necessitating the use of review-specific quality weightings; attrition bias rare.
Medical: numerous off-the-shelf checklists available to assess the validity of medical research. Ecological: no off-the-shelf checklists, hence the need for a priori review-specific criteria, preferably validated by consensus with stakeholders.

Data extraction
Medical: data extraction often relatively straightforward, except for missing data and data-hygiene problems. Ecological: data extraction complex, especially with respect to variance measures for weighting; a priori rules must be developed in order to extract data in a repeatable, standardized manner; independence and (pseudo)replication are common problems.

Data synthesis: meta-analysis
Medical: fixed- and random-effects models are applicable. Ecological: random-effects models are generally more useful than fixed-effect models because the complex interactions in ecology generally result in ecologically important heterogeneity between studies.


Acknowledgments

We thank the many conservation managers and scientists who have given us constructive feedback on the review process. Medical colleagues, particularly T. Knight, R. Taylor, and K. Khan, have contributed guidance. We would also like to thank Ioan Fazey and two anonymous reviewers for their comments on an earlier draft of this work. This work was supported through grants from English Nature, the U.K. Natural Environment Research Council, and the Royal Society for the Protection of Birds.

Literature Cited

Arnqvist, G., and D. Wooster. 1995. Meta-analysis: synthesizing research findings in ecology and evolution. Trends in Ecology & Evolution 10:236–240.
Bero, L., R. Grilli, J. Grimshaw, G. Mowatt, A. Oxman, and M. Zwarenstein, editors. 1999. Effective practice and organisation of care module of the Cochrane Database of Systematic Reviews. Issue 2. Update Software, The Cochrane Library, Oxford, United Kingdom.
Cohen, J. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20:37–46.
Cooper, H. M. 1984. Integrating research: a guide for literature reviews. Sage Publications, Newbury Park, California.
Dixon-Woods, M., S. Agarwal, B. Young, D. Jones, and A. Sutton. 2004. Integrative approaches to qualitative and quantitative evidence. National Health Service Health Development Agency, London.
Edwards, P., M. Clarke, C. DiGuiseppi, S. Pratap, I. Roberts, and R. Wentz. 2002. Identification of randomized controlled trials in systematic reviews: accuracy and reliability of screening records. Statistics in Medicine 21:1635–1640.
Emerson, J. D., E. Burdick, D. C. Hoaglin, F. Mosteller, and T. C. Chalmers. 1990. An empirical study of the possible relation of treatment differences to quality scores in controlled randomized clinical trials. Controlled Clinical Trials 11:339–352.
Fazey, I., J. G. Salisbury, D. B. Lindenmayer, J. Maindonald, and R. Douglas. 2004. Can methods applied in medicine be used to summarize and disseminate conservation research? Environmental Conservation 31:190–198.
Feinstein, A. R. 1985. Clinical epidemiology: the architecture of clinical research. Saunders, Philadelphia, Pennsylvania.
Feinstein, A. R., and R. I. Horwitz. 1982. Double standards, scientific methods, and epidemiological research. New England Journal of Medicine 307:1611–1617.
Fox, D. M. 2005. Evidence of evidence-based health policy: the politics of systematic reviews in coverage decisions. Health Affairs 24:114–122.
Gates, S. 2002. Review of methodology of quantitative reviews using meta-analysis in ecology. Journal of Animal Ecology 71:547–557.
Gotzsche, P. C. 1987. Reference bias in reports of drug trials. British Medical Journal 295:654–656.
Hedges, L. V. 1994. Statistical considerations. Pages 30–33 in H. Cooper and L. V. Hedges, editors. The handbook of research synthesis. Russell Sage Foundation, New York.
Higgins, J. P. T., and S. Green, editors. 2005. Cochrane handbook for systematic reviews of interventions 4.2.5. John Wiley & Sons, Chichester, United Kingdom.
Horwitz, R. I., and A. R. Feinstein. 1979. Methodological standards and contradictory results in case-control research. American Journal of Medicine 66:556–564.
Jackson, G. B. 1980. Methods for integrative reviews. Review of Educational Research 50:438–460.
Jüni, P., A. Witschi, R. Bloch, and M. Egger. 1999. The hazards of scoring the quality of clinical trials for meta-analysis. Journal of the American Medical Association 282:1054–1060.
Khan, K. S., R. Kunz, J. Kleijnen, and G. Antes. 2003. Systematic reviews to support evidence-based medicine: how to apply findings of healthcare research. Royal Society of Medicine Press, London.
Kunz, R., and A. D. Oxman. 1998. The unpredictability paradox: review of empirical comparisons of randomised and nonrandomised trials. British Medical Journal 317:1185–1190.
Leimu, R., and J. Koricheva. 2005. What determines the citation frequency of ecological papers? Trends in Ecology & Evolution 20:28–32.
Levine, M., S. Walter, H. Lee, T. Haines, A. Holbrook, and V. Moyer, for the Evidence-Based Medicine Working Group. 1994. Users' guides to the medical literature IV: how to use an article about harm. Journal of the American Medical Association 271:1615–1619.
Light, R. J., and D. B. Pillemer. 1984. Summing up: the science of reviewing research. Harvard University Press, Cambridge, Massachusetts.
Louis, T., and D. Zelterman. 1993. Bayesian approaches to research synthesis. Pages 411–422 in H. Cooper and L. V. Hedges, editors. The handbook of research synthesis. Russell Sage Foundation, New York.
Moher, D., A. R. Jadad, G. Nichol, M. Penman, P. Tugwell, and S. Walsh. 1995. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Controlled Clinical Trials 16:62–73.
Moher, D., A. R. Jadad, and P. Tugwell. 1996. Assessing the quality of randomized controlled trials: current issues and future directions. International Journal of Technology Assessment in Health Care 12:195–208.
Morris, C. N., and S. L. Normand. 1992. Hierarchical models for combining information and for meta-analyses. Pages 321–344 in J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, editors. Bayesian statistics 4. Oxford University Press, New York.
NHS CRD (National Health Service Centre for Reviews and Dissemination). 2001. Undertaking systematic review of research on effectiveness. NHS CRD, University of York, York, United Kingdom.
Osenberg, C. W., O. Sarnelle, S. D. Cooper, and R. D. Holt. 1999. Resolving ecological questions through meta-analysis: goals, metrics and models. Ecology 80:1105–1117.
Pullin, A., and T. Knight. 2001. Effectiveness in conservation practice: pointers from medicine and public health. Conservation Biology 15:50–54.
Pullin, A., and T. Knight. 2003. Support for decision-making in conservation practice: an evidence-based approach. Journal for Nature Conservation 11:83–90.
Pullin, A., T. Knight, D. Stone, and K. Charman. 2004. Do conservation managers use scientific evidence to support their decision-making? Biological Conservation 119:245–252.
Ravnskov, U. 1992. Cholesterol lowering trials in coronary heart disease: frequency of citation and outcome. British Medical Journal 305:9–15.
Roberts, P. D., G. B. Stewart, and A. S. Pullin. 2006. Are review articles a reliable source of evidence to support conservation management? A comparison with medicine. Biological Conservation: in press.
Sharp, S. 1998. Meta-analysis regression: statistics, biostatistics, and epidemiology. Stata Technical Bulletin 42:16–22.
Schulz, K. F., I. Chalmers, R. J. Hayes, and D. G. Altman. 1995. Empirical evidence of bias: dimensions of methodological quality associated with estimates of treatment effects in controlled trials. Journal of the American Medical Association 273:408–412.
Stevens, A., and R. Milne. 1997. The effectiveness revolution and public health. Pages 197–225 in G. Scally, editor. Progress in public health. Royal Society of Medicine Press, London.
Stewart, G. B., C. F. Coles, and A. S. Pullin. 2005a. Applying evidence-based practice in conservation management: lessons from the first systematic review and dissemination projects. Biological Conservation 126:270–278.
Stewart, G. B., A. S. Pullin, and C. F. Coles. 2005b. Effects of wind turbines on bird abundance. Systematic Review 4. Centre for Evidence-Based Conservation, Birmingham, United Kingdom. (Also available from www.cebc.bham.ac.uk/systematicreviews.htm.)


Sutherland, W., A. Pullin, P. Dolman, and T. Knight. 2004. The need for evidence-based conservation. Trends in Ecology & Evolution 19:305–308.
Thompson, S. 1994. Systematic review: why sources of heterogeneity in meta-analysis should be investigated. British Medical Journal 309:1351–1355.
Tyler, C., and A. S. Pullin. 2005. Do commonly used interventions effectively control Rhododendron ponticum? Systematic Review 6. Centre for Evidence-Based Conservation, Birmingham, United Kingdom. (Also available from www.cebc.bham.ac.uk/systematicreviews.htm.)
Tyler, C., E. Clark, and A. S. Pullin. 2005. Do management interventions effectively reduce or eradicate populations of the American mink, Mustela vison? Systematic Review 7. Centre for Evidence-Based Conservation, Birmingham, United Kingdom. (Also available from www.cebc.bham.ac.uk/systematicreviews.htm.)
Tyler, C., A. S. Pullin, and G. B. Stewart. 2006. Effectiveness of management interventions to control invasion by Rhododendron ponticum. Environmental Management 37:513–522.
