
C54-A
Vol. 28 No. 19
Replaces C54-P
Vol. 27 No. 25

Verification of Comparability of Patient Results Within One Health Care System;
Approved Guideline

This document provides guidance on how to verify comparability of quantitative laboratory results for
individual patients within a health care system.

A guideline for global application developed through the Clinical and Laboratory Standards Institute
consensus process.

(Formerly NCCLS)


Clinical and Laboratory Standards Institute
Advancing Quality in Health Care Testing

Clinical and Laboratory Standards Institute (CLSI, formerly NCCLS) is an international,
interdisciplinary, nonprofit, standards-developing, and educational organization that promotes the
development and use of voluntary consensus standards and guidelines within the health care community.
It is recognized worldwide for the application of its unique consensus process in the development of
standards and guidelines for patient testing and related health care issues. Our process is based on the
principle that consensus is an effective and cost-effective way to improve patient testing and health
care services.

In addition to developing and promoting the use of voluntary consensus standards and guidelines, we
provide an open and unbiased forum to address critical issues affecting the quality of patient testing
and health care.

PUBLICATIONS

A document is published as a standard, guideline, or committee report.

Standard A document developed through the consensus process that clearly identifies specific, essential
requirements for materials, methods, or practices for use in an unmodified form. A standard may, in
addition, contain discretionary elements, which are clearly identified.

Guideline A document developed through the consensus process describing criteria for a general
operating practice, procedure, or material for voluntary use. A guideline may be used as written or
modified by the user to fit specific needs.

Report A document that has not been subjected to consensus review and is released by the Board of
Directors.

CONSENSUS PROCESS

The CLSI voluntary consensus process is a protocol establishing formal criteria for:

• the authorization of a project
• the development and open review of documents
• the revision of documents in response to comments by users
• the acceptance of a document as a consensus standard or guideline.

Most documents are subject to two levels of consensus—“proposed” and “approved.” Depending on the
need for field evaluation or data collection, documents may also be made available for review at an
intermediate consensus level.

Proposed A consensus document undergoes the first stage of review by the health care community as a
proposed standard or guideline. The document should receive a wide and thorough technical review,
including an overall review of its scope, approach, and utility, and a line-by-line review of its
technical and editorial content.

Approved An approved standard or guideline has achieved consensus within the health care community.
It should be reviewed to assess the utility of the final document, to ensure attainment of consensus
(ie, that comments on earlier versions have been satisfactorily addressed), and to identify the need for
additional consensus documents.

Our standards and guidelines represent a consensus opinion on good practices and reflect the substantial
agreement by materially affected, competent, and interested parties obtained by following CLSI’s
established consensus procedures. Provisions in CLSI standards and guidelines may be more or less
stringent than applicable regulations. Consequently, conformance to this voluntary consensus document
does not relieve the user of responsibility for compliance with applicable regulations.

COMMENTS

The comments of users are essential to the consensus process. Anyone may submit a comment, and all
comments are addressed, according to the consensus process, by the committee that wrote the document.
All comments, including those that result in a change to the document when published at the next
consensus level and those that do not result in a change, are responded to by the committee in an
appendix to the document. Readers are strongly encouraged to comment in any form and at any time on
any document. Address comments to Clinical and Laboratory Standards Institute, 940 West Valley Road,
Suite 1400, Wayne, PA 19087, USA.

VOLUNTEER PARTICIPATION

Health care professionals in all specialties are urged to volunteer for participation in CLSI projects.
Please contact us at [email protected] or +610.688.0100 for additional information on committee
participation.


C54-A
ISBN 1-56238-671-9
Volume 28 Number 19 ISSN 0273-3099
Verification of Comparability of Patient Results Within One Health Care
System; Approved Guideline
Christopher M. Lehman, MD
John Rex Astles, PhD, FACB
Renze Bais, PhD
Sterling Bennett, MD
Ellis Jacobs, PhD, DABCC, FACB
Stan Johnson, MA
W. Gregory Miller, PhD
Jeffrey Vaks, PhD
Harvey Lipman, PhD
Amit R. Phansalkar, MS
Kenneth Sikaris, MD
Dietmar Stöckl, PhD
Greg Cooper, CLS, MHA

Abstract
Clinical and Laboratory Standards Institute document C54-A—Verification of Comparability of Patient Results Within One
Health Care System; Approved Guideline provides guidance on how to verify comparability of quantitative laboratory results for
individual patients across a health care system. For the purpose of this document, a health care system is defined as a system of
physician offices, clinics, hospitals, and reference laboratories, under one administrative entity, where a patient may present for
laboratory testing, and whose results may be reviewed by any health care provider within the system for the purpose of providing
medical care. This document does not provide guidance on how to correct method noncomparability that may be identified.

Clinical and Laboratory Standards Institute (CLSI). Verification of Comparability of Patient Results Within One Health Care
System; Approved Guideline. CLSI document C54-A (ISBN 1-56238-671-9). Clinical and Laboratory Standards Institute, 940
West Valley Road, Suite 1400, Wayne, Pennsylvania 19087-1898 USA, 2008.

The Clinical and Laboratory Standards Institute consensus process, which is the mechanism for moving a document through
two or more levels of review by the health care community, is an ongoing process. Users should expect revised editions of any
given document. Because rapid changes in technology may affect the procedures, methods, and protocols in a standard or
guideline, users should replace outdated editions with the current editions of CLSI/NCCLS documents. Current editions are
listed in the CLSI catalog and posted on our website at www.clsi.org. If your organization is not a member and would like to
become one, and to request a copy of the catalog, contact us at: Telephone: 610.688.0100; Fax: 610.688.0700; E-Mail:
[email protected]; Website: www.clsi.org

(Formerly NCCLS)


Copyright ©2008 Clinical and Laboratory Standards Institute. Except as stated below, neither this
publication nor any portion thereof may be adapted, copied, or otherwise reproduced, by any means
(electronic, mechanical, photocopying, recording, or otherwise) without prior written permission from
Clinical and Laboratory Standards Institute (“CLSI”).

CLSI hereby grants permission to each individual member or purchaser to make a single reproduction of
this publication for use in its laboratory procedure manual at a single site. To request permission to use
this publication in any other manner, contact the Executive Vice President, Clinical and Laboratory
Standards Institute, 940 West Valley Road, Suite 1400, Wayne, Pennsylvania 19087-1898, USA.

Suggested Citation

(CLSI. Verification of Comparability of Patient Results Within One Health Care System; Approved
Guideline. CLSI document C54-A. Wayne, PA: Clinical and Laboratory Standards Institute; 2008.)

Proposed Guideline
October 2007

Approved Guideline
May 2008

ISBN 1-56238-671-9
ISSN 0273-3099


Committee Membership

Area Committee on Clinical Chemistry and Toxicology

David A. Armbruster, PhD, DABCC, FACB
Chairholder
Abbott Diagnostics
Abbott Park, Illinois

Christopher M. Lehman, MD
Vice-Chairholder
Univ. of Utah Health Sciences Center
Salt Lake City, Utah

John Rex Astles, PhD, FACB
Centers for Disease Control and Prevention
Atlanta, Georgia

David M. Bunk, PhD
National Institute of Standards and Technology
Gaithersburg, Maryland

David G. Grenache, PhD, MT(ASCP), DABCC
University of Utah
Salt Lake City, Utah

Steven C. Kazmierczak, PhD, DABCC, FACB
Oregon Health and Science University
Portland, Oregon

Linda Thienpont, PhD
University of Ghent
Ghent, Belgium

Jeffrey E. Vaks, PhD
Roche Molecular Diagnostics
Pleasanton, California

Hubert Vesper, PhD
Centers for Disease Control and Prevention
Atlanta, Georgia

Jack Zakowski, PhD, FACB
Beckman Coulter, Inc.
Brea, California

Advisors

Mary F. Burritt, PhD
Mayo Clinic
Scottsdale, Arizona

Paul D’Orazio, PhD
Instrumentation Laboratory
Lexington, Massachusetts

Carl C. Garber, PhD, FACB
Quest Diagnostics, Incorporated
Lyndhurst, New Jersey

Uttam Garg, PhD, DABCC
Children’s Mercy Hospitals & Clinics
Kansas City, Missouri

Neil Greenberg, PhD
Ortho-Clinical Diagnostics, Inc.
Rochester, New York

Harvey W. Kaufman, MD
Quest Diagnostics, Incorporated
Lyndhurst, New Jersey

W. Gregory Miller, PhD
Virginia Commonwealth University
Richmond, Virginia

Gary L. Myers, PhD
Centers for Disease Control and Prevention
Atlanta, Georgia

David Sacks, MD
Brigham and Women’s Hospital and Harvard Medical School
Boston, Massachusetts

Bette Seamonds, PhD
Mercy Health Laboratory
Swarthmore, Pennsylvania

Dietmar Stöckl, PhD
STT Consulting
Horebeke, Belgium

Thomas L. Williams, MD
Nebraska Methodist Hospital
Omaha, Nebraska

Subcommittee on Verification of Comparability of Patient Results

Christopher M. Lehman, MD
Chairholder
Univ. of Utah Health Sciences Center
Salt Lake City, Utah

John Rex Astles, PhD, FACB
Centers for Disease Control and Prevention
Atlanta, Georgia

Renze Bais, PhD
Pacific Laboratory Medicine Services
Sydney, Australia

Sterling Bennett, MD
LDS Hospital
Salt Lake City, Utah

Ellis Jacobs, PhD, DABCC, FACB
NYU/Bellevue
New York, New York

Stan R. Johnson, MA
Beckman Coulter, Inc.
Brea, California

W. Gregory Miller, PhD
Virginia Commonwealth University
Richmond, Virginia

Jeffrey E. Vaks, PhD
Roche Molecular Diagnostics
Pleasanton, California

Ian S. Young, MD, FRCP
Queen’s University Belfast
Belfast, United Kingdom

Advisors

J. David Bessman, MD
Univ. of Texas Medical Branch
Galveston, Texas

Elma Kamari Bidkorpeh
Kaiser Permanente
North Hollywood, California

Harvey B. Lipman, PhD
Centers for Disease Control and Prevention
Atlanta, Georgia

Amit Phansalkar, MS
ARUP Laboratories
Salt Lake City, Utah

Kenneth A. Sikaris, MD
Melbourne Pathology
Victoria, Australia

Dietmar Stöckl, PhD
STT Consulting
Horebeke, Belgium


Staff

Clinical and Laboratory Standards Institute
Wayne, Pennsylvania

Lois M. Schmidt, DA
Vice President, Standards Development and Marketing

Jane M. Oates, MT(ASCP)
Staff Liaison

Melissa A. Lewis
Editor

Acknowledgments

This guideline was prepared by CLSI, as part of a cooperative effort with IFCC to work toward the
advancement and dissemination of laboratory standards on a worldwide basis. CLSI gratefully
acknowledges the participation of IFCC experts Ian S. Young, MD, FRCP; and Renze Bais, PhD, on this
project.

In addition, CLSI and the Subcommittee on Verification of Comparability of Patient Results gratefully
acknowledge the following volunteer for his important contributions to the development and/or
completion of this document: Greg Cooper, CLS, MHA, Bio-Rad Laboratories, Inc.


Contents

Abstract ....................................................................................................................................................i

Committee Membership........................................................................................................................ iii

Foreword .............................................................................................................................................. vii

1 Scope .......................................................................................................................................... 1

2 Introduction ................................................................................................................................ 1

3 Standard Precautions.................................................................................................................. 2

4 Definitions ................................................................................................................................. 2

5 Practical Considerations for Designing a Comparability Monitoring Protocol ......................... 5


5.1 Causes of Noncomparability of Results ........................................................................ 5
5.2 Scope of Comparisons .................................................................................................. 6
5.3 Risk Assessment for Noncomparable Results .............................................................. 6
5.4 Frequency and Complexity of Comparability Assessment Protocols ........................... 6
5.5 General Approaches to Comparability Testing............................................................. 7
5.6 Triggers for Special Cause Comparability Testing ....................................................... 7
6 Samples for Comparability Testing ........................................................................................... 9
6.1 Commutability .............................................................................................................. 9
6.2 Analyte Concentrations for Testing ............................................................................ 12
6.3 Storage and Transport ................................................................................................. 12
7 Acceptance Criteria for Comparability Testing of Patient Results .......................................... 13
7.1 Evaluation of Comparability Based on Clinical Outcomes ........................................ 13
7.2 Evaluation of Comparability Based on Clinician’s Questionnaire ............................. 14
7.3 Evaluation of Comparability Based on Biological Variability ................................... 14
7.4 Evaluation of Analytical Performance Based on Published Professional Recommendations ............ 15
7.5 Evaluation of Analytical Performance Based on Goals Set by Accrediting Agencies ....................... 15
7.6 Evaluation of Analytical Performance Based on the General Capability ................... 15
8 Statistical Evaluation of Comparability Data........................................................................... 16
8.1 Hypothesis Testing ..................................................................................................... 16
8.2 Statistical Analysis of Comparability Data ................................................................. 16
8.3 Fixed Limit Evaluation ............................................................................................... 18
9 Point-of-Care Testing (POCT) ................................................................................................. 19
9.1 Specimen Selection ..................................................................................................... 19
9.2 Specimen Acquisition ................................................................................................. 19
9.3 Range of Specimen Values ......................................................................................... 20
9.4 Multiple Devices of the Same Make and Model......................................................... 20
9.5 Statistical Considerations for POC Comparability Testing ........................................ 20
10 Range Test Comparability Protocol ......................................................................................... 20
10.1 Select an Analyte for Comparison .............................................................................. 20
10.2 Select the Instruments to Be Compared ...................................................................... 21
10.3 Identify an Approximate Analyte Concentration for Comparison Testing ................. 21
10.4 Calculate the Desired Concentration or Activity to Be Used for Comparison Sample Selection ...... 21
10.5 Select a Sample for Comparison Testing .................................................................................... 21
10.6 Select the Appropriate Level of Acceptance Criteria That Can Be Applied to the Comparison Test (from Section 7) .................................................................................... 22
10.7 Set the Critical Difference for the Comparability Test at the Recommended Total Error or Bias Limit Determined in Section 10.6 ............................................................... 22
10.8 Determine the Number of Replicates to Be Run......................................................... 22
10.9 Perform the Comparison ............................................................................................. 22
10.10 Evaluate the Clinical Relevance of the Comparison Results ...................................... 23
10.11 Troubleshooting Noncomparability ............................................................................ 23
References ............................................................................................................................................. 24

Appendix A. Worked Examples ........................................................................................................... 26

Appendix B. Table of Critical Differences (%) for the Range Test...................................................... 34

Appendix C. Statistical Concepts.......................................................................................................... 36

Appendix D. Biological Variation ........................................................................................................ 43

Summary of Comments and Subcommittee Responses ........................................................................ 45

The Quality Management System Approach ........................................................................................ 46

Related CLSI Reference Materials ....................................................................................................... 47


Foreword
Patients may present for laboratory testing at multiple locations within a health care system. Continuity of
medical care requires that the comparability of test results produced by different measurement systems be
verified periodically. This document provides guidance on how to verify the comparability of quantitative
laboratory results for analytes tested on different measurement systems. The document addresses causes
of noncomparability, risk assessment of comparability failure, frequency of comparison testing,
concentrations to be compared, commutability of comparability testing materials, a comparability testing
protocol, and acceptance criteria for interpretation of comparability testing. The comparability testing
protocol described in this document is an intuitive, simple approach that balances the need for a
statistically valid, clinically relevant methodology with practical limitations on laboratory resources.
Other valid procedures for comparability evaluation can be developed by a laboratory, and it is not the
intent of this document to exclude their use. This protocol can also be used to validate reagent lot
changes.

Key Words

accuracy, bias, coefficient of variation, commutability, comparability, imprecision, studentized range test


Verification of Comparability of Patient Results Within One Health Care System;
Approved Guideline

1 Scope
This document provides guidance on how to verify comparability of quantitative laboratory results for
individual patients within a health care system. For the purpose of this document, a health care system is
defined as a system of physician offices, clinics, hospitals, and reference laboratories, under one
administrative entity, where a patient may present for laboratory testing, and whose results may be
reviewed by any health care provider within the system for the purpose of providing medical care.

The document provides a simple approach to be used for the assessment of patient laboratory result
comparability across a maximum of 10 instruments, and assumes that a more comprehensive validation of
quantitative measurement system comparability has been undertaken when the measurement systems
were initially introduced into the laboratory. A more comprehensive comparison among measurement
procedure results can follow a methodology such as that described in CLSI/NCCLS document EP09.1
Comparability testing is just one facet of a program for assuring quality laboratory performance and is not
intended to be a substitute for other quality monitors. This document does not address corrective action
should method noncomparability be identified.

The approach described can also be used to verify comparability of patients’ results in situations such as
those following reagent or calibrator lot changes, instrument component changes or maintenance
procedures, alerts from quality control (QC) or external quality assessment (EQA) (proficiency testing
[PT]) events, or other special cause events.

2 Introduction
Out of necessity, or for their own convenience, patients may interface with health care systems for the
purpose of laboratory testing in a variety of settings and/or locations. Results of these tests may be
compiled and reviewed by the clinicians providing care at any of the patient care locations. In addition, larger
laboratories may have multiple instruments within one location (eg, backup instruments, point-of-care
[POC] instruments) that may provide laboratory results for an individual patient during a health care
episode. Over time, lots of calibrator and reagents change, calibration and maintenance procedures are
performed, and other events may occur that can affect patient test results. The diagnostic value of patient
test results is maximized if the measurement systems providing such results are in a state of statistical
control (ie, are producing stable and consistent results). Maintaining comparability may involve
standardization and calibration of instruments, forced agreement of results among different measurement
systems through mathematical transformation, or adoption of different reference intervals and/or
therapeutic or diagnostic cutoffs that are clearly indicated in the patient report. Regardless of the approach
used to achieve comparable results among different measurement systems, or to accommodate known
differences, periodic verification of assay comparability is necessary to provide optimal patient care.

There is no consensus procedure for demonstrating comparability of laboratory results for patient
samples among measurement procedures. A survey of the participants involved in the preparation of this
document demonstrated a variety of approaches to testing frequency, number and type of samples tested
(eg, random, high and low concentrations, or concentrations spanning the analytical measurement range),
evaluation and acceptance criteria for the results of comparison testing, and method of dealing with
known bias between methods. The intent of this document is to review the salient issues surrounding
verification of comparability of patient results among measurement procedures, and to provide a practical,
statistically valid approach that laboratories of varying size and resources can use to satisfy this quality
requirement. Other valid procedures for comparability evaluation can be developed by a laboratory, and it
is not the intent of this document to exclude their use.

This guideline addresses evaluation and monitoring of comparability of patient results. Recommendations
on monitoring stability of the analytical process are provided in CLSI document C24.2 Other clinical
laboratory procedures are in place to address calibration traceability of routine measurement procedures
to reference systems that are intended to ensure long-term consistency of calibration and uniformity of
results among providers of in vitro diagnostic (IVD) measurement systems (see CLSI document X053 and
ISO 175114 for further information).

3 Standard Precautions
Because it is often impossible to know what isolates or specimens might be infectious, all patient and
laboratory specimens are treated as infectious and handled according to “standard precautions.” Standard
precautions are guidelines that combine the major features of “universal precautions and body substance
isolation” practices. Standard precautions cover the transmission of all infectious agents and thus are
more comprehensive than universal precautions, which are intended to apply only to transmission of
blood-borne pathogens. Standard and universal precaution guidelines are available from the US Centers
for Disease Control and Prevention.5 For specific precautions for preventing the laboratory transmission
of all infectious agents from laboratory instruments and materials and for recommendations for the
management of exposure to all infectious disease, refer to CLSI document M29.6

4 Definitions
accuracy (of measurement) – closeness of the agreement between the result of a measurement and a true
value of the measurand (VIM93).7

alpha error – probability of falsely rejecting the null hypothesis when it is true.

analyte – component represented in the name of a measurable quantity (ISO 17511).4

analytical measurement range (AMR) – the range of analyte values that a method can directly measure
on the sample without any dilution, concentration, or other pretreatment that is not part of the typical
assay process.

beta error – probability of failing to reject the null hypothesis when the alternative hypothesis is true.

bias – difference between the expectation of the test results and an accepted reference value (ISO 5725-1,8
ISO 3534-1)9; NOTE 1: Bias is the total systematic error as contrasted to random error. There may be
one or more systematic error components contributing to the bias. A larger systematic difference from the
accepted reference value is reflected by a larger bias value (ISO 5725-1)8; NOTE 2: The measure of
trueness is usually expressed in terms of bias (ISO 3534-1).9

calibration – set of operations that establish, under specified conditions, the relationship between values
of quantities indicated by a measuring instrument or measuring system, or values represented by a
material measure or a reference material, and the corresponding values realized by standards (VIM93).7

calibrator – a substance, material, or article intended by its manufacturer to be used to establish the
measurement relationships of an in vitro diagnostic medical device.

coefficient of variation (CV) – for a non-negative characteristic, the ratio of the standard deviation to the
average (ISO 3534-1)9; NOTE: The ratio may be expressed as a percentage.


commutable – interassay properties of a reference material, calibrator material, or quality control
material that are comparable with those demonstrated by authentic clinical specimens; NOTE:
Commutability of a material is defined as the “degree to which a material yields the same numerical
relationships between results of measurements by a given set of measurement procedures, purporting to
measure the same quantity, as those between the expectations of the relationships obtained when the same
procedures are applied to other relevant types of material” (CEN prEN 12287).10

comparability – agreement between patient results obtained for an analyte using different measurement
procedures within a health care system; NOTE: The results are considered to be comparable if the
differences do not exceed a critical value established based on defined acceptance criteria.

external quality assessment//proficiency testing (PT) – a program in which multiple samples are
periodically sent to members of a group of laboratories for analysis and/or identification, in which each
laboratory’s results are compared with those of other laboratories in the group and/or with an assigned
value.

imprecision – the random dispersion of a set of replicate measurements and/or values expressed
quantitatively by a statistic, such as standard deviation or coefficient of variation; NOTE: It is defined in
terms of repeatability and reproducibility (see CLSI/NCCLS document EP05).11

measurand – particular quantity subject to measurement (VIM93).7

measurement procedure – set of operations, described specifically, used in the performance of particular
measurements according to a given method (VIM93)7; NOTE 1: A measurement procedure is usually
recorded in a document that is sometimes itself called a “measurement procedure” (or a measurement
method) and is usually in sufficient detail to enable an operator to carry out a measurement without
additional information (VIM93)7; NOTE 2: This term pertains to specific procedures as marketed by
specific manufacturers; NOTE 3: In other documents and in CLSI document EP15,12 equivalent terms
were method, device, and assay; NOTE 4: A measurement procedure is based on a measurement method.

measurement system – a unit or device used to measure or assess the presence or absence of a particular
substance, or to quantitate that substance, found in blood or body fluids; NOTE: A measurement system
includes instructions and all of the instrumentation, equipment, reagents, and/or supplies needed to
perform an assay or examination and generate test results.

point-of-care testing (POCT)//bedside, near-patient testing – testing performed in an alternate site,
outside a central laboratory environment, generally nearer to, or at the site of, the patient.

power – probability of accepting the alternative hypothesis when it is true; NOTE: The probability is
usually denoted as a percentage, 100(1-β) %.

precision (of measurement) – closeness of agreement between independent test results obtained under
stipulated conditions (ISO 3534-1)9; NOTE: The measure of precision usually is expressed in terms of
imprecision and computed as a standard deviation of the test results. Less precision is reflected by a larger
standard deviation (ISO 3534-1).9

proficiency testing//external quality assessment (PT/EQA) – determination of laboratory testing
performance by means of interlaboratory comparisons; NOTE 1: Commonly, a program periodically
sends multiple specimens to members of a group of laboratories for analysis and/or identification; the
program then compares each laboratory’s results with those of other laboratories in the group and/or with
an assigned value, and reports the results to the participating laboratory and others; NOTE 2: Other forms
of PT/EQA include: data transformation exercises, single-item testing (where one item is sent to a number
of laboratories sequentially and returned to the program at intervals), and one-off exercises (where
laboratories are provided with a test item on a single occasion); NOTE 3: The results are summarized,
analyzed, and, with some tests, graded by the program and provided to the participating site, which can
compare its results with those of other sites that use a similar method.

quality control – 1) the operational techniques and activities that are used to fulfill requirements for
quality (ISO 9000)13; 2) In health care testing, the set of procedures designed to monitor the test method
and the results to assure test system performance; NOTE: QC includes testing control materials, charting
the results and analyzing them to identify sources of error, and evaluating and documenting any remedial
action taken as a result of this analysis.

regression analysis – the process of estimating the parameters of a model by optimizing the value of an
objective function and then testing the resulting predictions for statistical significance against an
appropriate null hypothesis model; the process of describing mathematically the relationship between two
or more variables; NOTE: This can include the parametric testing of the statistical significance of the
relationship, if random errors are assumed to be normal.

risk – combination of the probability of occurrence of harm and the severity of that harm (ISO 1519014;
ISO/IEC Guide 5115).

sample – one or more parts taken from a system, and intended to provide information on the system,
often to serve as a basis for decision on the system or its production (ISO 15189)16; NOTE 1: For
example, a volume of serum taken from a larger volume of serum (ISO 15189)16; NOTE 2: A sample is
prepared from the patient specimen and used to obtain information by means of a specific laboratory test;
NOTE 3: For the purposes of this guideline, readers can consider the terms “sample” and “specimen” to
be equivalent; NOTE 4: The term “specimen” has been used in laboratory medicine as a synonym for a
sample, as defined here, of biological origin, or for an entire macroscopic parasite.

statistic – a function of a set of observations from a random variable; NOTE: A statistic is also a random
variable; thus, it also has statistics, such as mean, standard deviation, etc.

total error – the sum of any set of defined errors that can affect the accuracy of an analytical result;
NOTE: Total error can be defined as the sum of bias and imprecision.

trueness (of measurement) – closeness of agreement between the average value obtained from a large
series of test results and an accepted reference value (ISO 3534-1)9; NOTE: The measure of trueness is
usually expressed in terms of bias (ISO 3534-1).9

Type I error – an incorrect judgment or conclusion that occurs when an association is found between
variables where, in fact, no association exists; NOTE 1: For example, if the experimental procedure does
not really have any effect, chance or random error may cause the researcher to conclude that the
experimental procedure did have an effect; NOTE 2: Also known as “false positive” or “alpha error.”

Type II error – an incorrect judgment or conclusion that occurs when no association is found between
variables where, in fact, an association does exist; NOTE 1: In a medical screening, for example, a
negative test result may occur by chance in a subject who possesses the attribute for which the test is
conducted; NOTE 2: Also known as “false negative” or “beta error.”

validation – confirmation, through the provision of objective evidence, that requirements for a specific
intended use or application have been fulfilled (ISO 9000).13


5 Practical Considerations for Designing a Comparability Monitoring Protocol


A number of factors should be considered when designing a comparability protocol. The laboratory
director must determine the appropriate protocol for monitoring each analyte that is measured by more
than one instrument in the health care system. Applicable regulatory and/or accreditation requirements
(eg, frequency of comparison testing) should be incorporated into the design of any protocol.
Comparability verification is considered good laboratory practice even if it is not a regulatory or
accreditation requirement.

5.1 Causes of Noncomparability of Results

In designing a plan for routine assessment of measurement system comparability, potential causes of
noncomparability of results for patients’ samples should be considered. Reasons for differences between
results from more than one instrument or method include:

• different analytical methodologies;

• differences in calibration between measurement procedures;

• differences in imprecision between measurement procedures;

• existence of value assignment errors and variation of commutability between lots of calibrators;

• simultaneous use of calibrator lots of different ages/stages of time-dependent degradation in different
laboratory locations;

• differences in commutability of calibrators with different measurement procedures from different
IVD manufacturers;

• reagent on-instrument degradation after calibration;

• instrument drift/failure;

• use of different reagent lots, or differences in packaging, shipping, or storage conditions when the
same method is used on more than one instrument;

• differences in instrument analytical parameters, such as dilution ratios and incubation times between
different instruments that use the same reagents; and

• preanalytical effects on the sample, including differences in sample handling between different types
of measurement systems.

Differences due to calibration, reagent lots, and instrument parameters are more easily managed by the
laboratory to achieve comparable results for patient samples, while differences between results caused by
fundamental differences in analytical methodology are more difficult to manage. For example, antibodies
may be directed against different epitopes of a polypeptide hormone, in which case the substance actually
measured is different depending upon the method. When there are fundamental differences in analytical
methodologies used within a health care system, it may be impossible to force patient results to agree
through a calibration process or by adjusting reported results using a mathematical correction factor. In
general, such differences are more frequently addressed by defining and reporting different reference
intervals.


5.2 Scope of Comparisons

Ideally, every measurement system measuring the same analyte in the health care system should be
included in comparability testing. For unstable analytes, it may be possible to stabilize patient specimens
collected for comparability testing (eg, glucose may be stabilized in whole blood with fluoride ion,
ammonia may be stabilized by freezing plasma aliquots). In exceptional cases, materials other than patient
specimens may be required for comparability testing (see Section 6).

5.3 Risk Assessment for Noncomparable Results

For comparability testing, risk equals the product of the potential for harm caused by the degree of
measurement system noncomparability and the potential frequency of occurrence of noncomparable
results for a specific analyte.
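To make this definition concrete, here is a minimal sketch in Python (the analytes, the 1-to-5 scoring scales, and the scores themselves are hypothetical assumptions for illustration, not values from this guideline):

# Hypothetical sketch: risk = potential harm from noncomparability x
# expected frequency of noncomparable results (both scored 1 to 5 here).
analytes = {
    "tumor marker (monitoring)": {"harm": 5, "frequency": 3},
    "point-of-care glucose": {"harm": 3, "frequency": 4},
    "sodium": {"harm": 4, "frequency": 1},
}

def risk_score(harm: int, frequency: int) -> int:
    """Product of severity and frequency, per the definition above."""
    return harm * frequency

# Rank analytes so higher-risk assays get more frequent/rigorous assessment.
for name, s in sorted(analytes.items(),
                      key=lambda kv: risk_score(**kv[1]), reverse=True):
    print(f"{name}: risk = {risk_score(**s)}")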

5.3.1 Clinical Impact of Noncomparability

The laboratory director must assess the impact of noncomparability of a measurement system on patient
care. Input from practicing clinicians who order the test should be solicited when necessary. Harm to the
patient may result when diagnosis and/or treatment is delayed due to clinician confusion about
noncomparable results that may generate an additional “tie-breaker” test (eg, noncomparability of
emergency department and outpatient clinic results). In addition, physicians directing the care of patients
who are being monitored for various biomarker or therapeutic drug levels, which are measured on
instruments that give dissimilar results, may be confused about the outcomes of treatments or dosing
regimens (eg, tumor markers). Analytes that are at risk for noncomparability and that pose a significant
risk to patient outcome may warrant more frequent and/or more rigorous assessment.

5.3.2 Probability of Noncomparability

The laboratory director must assess the likelihood that two assays will demonstrate noncomparability
given the inherent limitations of the measurement systems being compared (see Section 7). Evidence of
calibration instability or prior problems maintaining comparability, for example, may indicate a need for
more frequent comparisons, and/or more frequent calibrations.

5.4 Frequency and Complexity of Comparability Assessment Protocols

In designing a comparability protocol for an analyte, the laboratory director must consider the risk to
patients of noncomparability of assays, as well as practical considerations. Approaches to comparability
testing can vary significantly in terms of reagents consumed; time spent procuring, storing, transporting,
and measuring samples for comparative analysis; and time spent evaluating the comparability of results. It
may be useful to begin monitoring comparability with as high a frequency of comparisons as indicated by
risk assessment and cost effectiveness, making improvements based on comparison data. The frequency
of monitoring can then be reduced based on improvements in performance and revisions in risk
assessment (if applicable).

5.4.1 Statistical Considerations

From a statistical standpoint, the tolerance for falsely detecting a difference between assays (Type I error)
must be balanced against the tolerance for failure to detect a true difference that is clinically significant
(Type II error). Practical considerations frequently limit the sample size of a comparison, increasing the
probability of a Type II error. In addition, frequent comparisons of stable assays increase the probability
of a Type I error (see Section 8 and Appendix C).
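A sketch of this tradeoff in Python (using SciPy; the bias of interest, the analytical SD, and the replicate counts are hypothetical assumptions): power to detect a fixed bias falls as the number of replicates falls, raising the Type II error, while repeated comparisons of stable assays raise the chance of at least one false alarm.

import math
from scipy.stats import norm

def power_for_bias(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided z-test for a bias of size delta
    between two instruments, each measuring the same material n times
    with analytical SD sigma."""
    se = sigma * math.sqrt(2.0 / n)        # SE of the difference in means
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(delta / se - z_crit) + norm.cdf(-delta / se - z_crit)

for n in (2, 5, 10, 20):                   # fewer replicates -> more Type II error
    print(f"n={n}: power={power_for_bias(delta=4.0, sigma=3.0, n=n):.2f}")

# Repeated comparisons inflate the family-wise Type I error:
for k in (1, 12, 52):                      # eg, one-off, monthly, weekly testing
    print(f"{k} comparisons: P(>=1 false alarm) = {1 - 0.95 ** k:.2f}")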

©
6 Clinical and Laboratory Standards Institute. All rights reserved.

This document is protected by copyright. Downloaded on 2/23/2009


Product Name: Infobase 2009 - Release Date: March 2009

Volume 28 C54-A

5.4.2 Operational and Cost Considerations

There are a number of practical considerations that the laboratory director should address when deciding
how frequently to verify comparability and the number of replicates to be tested. Operational factors that
may influence the frequency of testing include staffing availability (staff may need to spend considerable
time acquiring, storing, and transporting specimens); availability and stability of appropriate samples for
testing; capacity for storing patient specimens; geographic locations of testing sites; cost of reagents; and
the opportunity to combine comparability testing with other quality assurance testing, such as verification
of the analytical measurement range (AMR) (however, see discussion of appropriate sample selection in
Section 6). Ultimately, the laboratory director must balance the risk to the patient of noncomparable
results against the cost in materials and labor to the laboratory when designing a protocol for evaluation
and maintenance of interassay comparability. If the monitoring of comparability is accompanied by
process improvements, the initial cost of implementing comparability monitoring and the cost of
subsequent process improvements should be mitigated by cost savings due to improved performance and
cost reductions due to less frequent monitoring.

5.5 General Approaches to Comparability Testing

Interinstrument comparability testing can be categorized as: (1) frequent (eg, daily, weekly) monitoring;
(2) periodic monitoring (eg, quarterly, biannual) that is performed when frequent monitoring is deemed
unnecessary because the measurement systems involved are stable and the risk of errors in clinical
interpretation due to noncomparable results is low; and (3) special cause testing that is performed in
response to an alert from a monitoring procedure or other triggering event (see Section 5.6) when a
greater degree of statistical confidence in the results is desired. Frequent monitoring may be set up to have
more or less power to detect a difference, depending upon the requirements of the assay, but generally
involves comparing fewer samples or running fewer replicates of a single specimen. This approach is
relatively low-cost in terms of number of patient samples tested, and time and reagents consumed per
comparison event. Alternatively, frequent monitoring can be accomplished through automated, statistical
monitoring of patient results (eg, weighted moving averages).17 Frequent monitoring provides the
opportunity to evaluate trends in comparability of results over time, and allows for better understanding
and improvement of the measurement procedure. Periodic monitoring should be designed to have greater
power to detect a difference (ie, a larger number of patient samples or replicate analysis of individual
samples) due to the lower frequency of comparisons. Consequently, periodic monitoring is generally
more costly in terms of time and reagents per comparison event. Special cause testing often requires the
largest number of patient samples or replicate testing of individual samples to provide increased statistical
power to detect a clinically significant difference between assays. Special cause testing should be used for
troubleshooting and follow-up to resolve comparability issues identified by a monitoring procedure. It is
important to note, however, that sample sizes for special cause assessments are generally expected to be
smaller than what is required for an initial method validation.
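As a sketch of the automated statistical monitoring mentioned above, the following Python implements a generic exponentially weighted moving average (EWMA) chart on patient results; the target, SD, smoothing parameter, and control-limit multiplier are assumptions chosen for illustration, not recommendations of this guideline.

import math

def ewma_alerts(results, target, sd, lam=0.2, L=3.0):
    """Return (index, ewma) pairs where the EWMA of sequential patient
    results drifts outside +/- L-sigma control limits. target and sd
    describe the expected distribution of the results being monitored."""
    z = target
    alerts = []
    for i, x in enumerate(results, start=1):
        z = lam * x + (1 - lam) * z
        # standard EWMA chart variance after i observations
        var = sd ** 2 * (lam / (2 - lam)) * (1 - (1 - lam) ** (2 * i))
        if abs(z - target) > L * math.sqrt(var):
            alerts.append((i, round(z, 2)))
    return alerts

# Hypothetical glucose results (mg/dL) with a shift after the 5th value:
print(ewma_alerts([98, 102, 99, 101, 100, 108, 110, 109, 111, 112],
                  target=100.0, sd=3.0))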

Comparability testing should only be conducted when all measurement systems that are being compared
are functioning according to the manufacturer’s recommendations and are judged to be in control.
However, comparability testing is a component of the quality assurance process and may provide an
indication that a measurement procedure needs to be reviewed for possible corrective action.

5.6 Triggers for Special Cause Comparability Testing

5.6.1 Failure of a Frequent or Periodic Monitor

When frequent or periodic comparison testing fails to pass acceptance criteria, it may be appropriate to
follow up with special cause testing to confirm noncomparability and to document conformance after
analytical issues have been resolved.


5.6.2 Proficiency Testing (PT)/External Quality Assessment (EQA) Failure

Comparability testing among methods or instruments may be useful to investigate and resolve a PT/EQA
failure. After correcting the analytical source of the PT/EQA failure, repeat testing between instruments
may be necessary to confirm comparability.

5.6.3 Shift in a Statistical Monitoring Parameter

Hematology instruments generally have a built-in software feature that provides a weighted moving
average of patient results for various parameters. Some chemistry instruments and laboratory information
systems (LIS) offer similar capability using various statistical procedures. If a change in the composition
of the patient population being monitored has been ruled out, a shift in the moving average, or other
statistical trend test, may be an indication for comparability testing, once any analytical issues have been
resolved.

5.6.4 Quality Control (QC) Result Failure

Results produced from the analysis of QC samples are used to monitor and verify that a measurement
system is performing within expectations for a stable measurement process. QC result acceptance criteria
are designed to detect unacceptable imprecision and bias that exceed the expectations for stable
measurement system performance. An unacceptable QC result or trend of results may be an indication for
follow-up with patient-sample-based comparability testing. It is important to note that QC materials are
manufactured to simulate properties of patient samples, but the processing required to produce QC
materials may cause them to be noncommutable with native clinical samples (see Section 6.1.3).
Consequently, in most situations, the results for QC samples cannot be reliably compared between
different instruments and methods as a surrogate for the comparability of patient results. However, when
identical analyzers—being monitored with identical QC materials—produce QC results that begin to
deviate from each other, that may suggest the need for comparability testing.

5.6.5 Reagent or Calibrator Lot Change

Reagent and calibrator lot changes are a commonly occurring special cause for comparability testing.
Good laboratory practice includes verification that patient results are comparable to those from a previous
lot when a new lot of reagents or a new lot of calibrator is put into service. In some countries, regulatory
or accreditation requirements dictate verification of performance following reagent or calibrator lot
changes.

5.6.5.1 Reagent Lot Change

The principal consideration when introducing a new reagent lot is the choice of material to use to verify
comparability of patient results with those from the prior lot. QC materials have been used for this
purpose, but have inherent commutability limitations that may confound conclusions based on results
following reagent lot changes. QC materials may have a different commutability characteristic (causing a
different noncommutability bias) between two reagent lots, which can cause an apparent difference
inconsistent with results for native patient samples, or can cause an apparent agreement or “false-negative
result” when a real difference exists for native patient samples (see Section 6 for recommendations on
materials to use for comparability testing, and limitations and verification practices necessary when using
QC or other materials with unknown commutability properties).

5.6.5.2 Calibrator Lot Change

If a calibrator lot is changed at the same time as a reagent lot change, then the precautions in Section
5.6.5.1 for reagents are applicable.

If a calibrator lot change is made and the same reagent lot(s) continues in use, then the choice of materials
to use for comparability testing is simplified because QC materials can be used without additional
qualification. Commutability is a property of a non-native sample material that exists between a particular
material and reagent combination. When there is no change in reagent lot, there is no change in the
commutability property for a given material, and differences in results for a QC material between a new
vs a prior lot of calibrator are expected to reflect the relationship for native patient samples.

5.6.6 Ad Hoc Comparability Testing

Ad hoc verification of comparability may be indicated in the following circumstances: following
resolution of an underlying problem in one or more instruments, major maintenance, component
replacement, software update, or clinician inquiry regarding the accuracy of results.

6 Samples for Comparability Testing

6.1 Commutability

The selection of materials for comparability testing should take into account the commutability of the
material. Commutability is the equivalence of the mathematical relationships among the results of
different measurement procedures for a reference material and for representative samples of the type
intended to be measured. Freshly obtained patient samples represent the “ideal” material for comparison
testing, because they are the intended samples to be analyzed by measurement systems under routine
circumstances. When any other type of sample is used, its commutability with native patient samples
must be verified. Commercial materials intended for use in routine QC, demonstration of linearity,
external PT, and instrument calibration have, in most cases, had their matrices modified in ways that may
significantly affect commutability with native clinical samples.18,19 A manufacturer’s product calibrators
are typically intended only for use with a specific routine measurement procedure and are not
commutable for use with other manufacturers’ measurement procedures.

6.1.1 Patient Specimens

The optimal samples for comparability testing are native patient samples collected in an appropriate,
validated collection container and processed (and stored, if necessary) according to the stability
requirements of the analyte. Samples containing substances known to interfere with the assay being
compared should be excluded, because the purpose of the comparison testing in this guideline is to verify
comparability of results for typical samples, not to verify the specificity of the methods. Because routine
measurement procedures may not be completely specific for the analyte, there will be a distribution of
results for native patient samples when measured by two or more procedures. The distribution of results
will have contributions from the imprecision of measurement due to reproducibility and repeatability, and
from sample-specific nonspecificity effects. The statistical criteria to determine if two or more routine
measurement procedures have equivalent results should consider both sources of variability. However, as
noted previously, if two different methods have different specificities for the analyte, it may not be
possible to achieve comparable results for patient samples.
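One way to account for both sources of variability when setting a limit for differences between procedures is sketched below in Python (the CVs, the concentration, and the 95% coverage factor are hypothetical assumptions, not values from this guideline): the SD of paired differences combines each procedure's imprecision with a sample-specific component in quadrature.

import math

def difference_limit(cv_a, cv_b, cv_sample, concentration, z=1.96):
    """Acceptance limit for the difference between two procedures' results
    on the same patient sample: each procedure's imprecision plus a
    sample-specific (nonspecificity) component, combined in quadrature.
    CVs in percent; returns the limit in concentration units."""
    sds = [cv / 100.0 * concentration for cv in (cv_a, cv_b, cv_sample)]
    return z * math.sqrt(sum(s * s for s in sds))

# Hypothetical: glucose methods with 2.0% and 2.5% CVs and 1.5%
# sample-specific variability, evaluated at 100 mg/dL:
print(f"{difference_limit(2.0, 2.5, 1.5, 100.0):.1f} mg/dL")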

A second generally acceptable material for comparability testing is a pool of patient samples. Pooled
samples should be used when the number of measurement systems to be compared requires more sample
than is available from a single phlebotomy collection at one point in time. Pooled samples have the
limitation of not adequately representing individual samples, because differences between individual
patient samples may be masked. Interactions among donor samples may cause precipitation of some
serum proteins and protein-bound molecules that are important as analytes or as potential interferents in
the measurement procedures. Pooling may also dilute unspecified interfering substances to levels at which
they no longer interfere with a method. Thus, pooling native samples may be an advantage for the
purpose of comparability testing intended primarily to evaluate calibration differences (bias) among
measurement systems.

Collecting and processing samples for preparation of a pool require careful consideration of analyte
stability on storage prior to pooling and during the pooling process. CLSI/NCCLS document C3720
includes information on handling blood and serum for preparing large pools of reference material, but the
principles are applicable to smaller pools that would be used for comparability evaluation in one health
care system.

It is often difficult for laboratories to locate patient samples containing analyte concentrations of interest
for comparability testing. It may be necessary to add purified or partially purified analyte to native
clinical samples or pools of native samples to achieve higher levels of an analyte. The additive, or an
impurity in the additive, may have an unexpected influence on the matrix that would compromise the
“native” characteristics of the resultant sample. For some analytes that are normally not detectable in
healthy individuals, it may be possible to add a small amount of a sample with a very high amount of an
analyte (eg, human chorionic gonadotropin) to a pool. In some cases, there may be a metabolite of an
analyte that is also measured in a measurement procedure, in which case adding only unmetabolized
analyte to a pool may not be appropriate for comparison testing. For example, a comparison of a gas
chromatography-mass spectrometry method with an immunoassay for measurement of cyclosporin would
demonstrate better comparability between methods for a sample that was spiked with pure drug than for
native patient samples that contain a combination of drug and drug metabolites.

6.1.2 Commutable Reference and Control Materials

Reference materials, control materials, and PT/EQA materials that have been demonstrated to be
commutable with patient specimens for the method(s) being compared are suitable for comparability testing.

6.1.3 Quality Control Materials

Under some circumstances, QC results may be used to verify comparability of results among different
instruments and methods. When identical instruments use the same lot of reagents, there is a good
probability that the relationship between the QC material results on each instrument will be very similar
to the relationship for patient samples. This situation occurs because any matrix biases associated with
noncommutable QC sample results are expected to be the same when identical instruments and reagents
are used. However, when there are differences in instrument platforms (even from the same
manufacturer), different lots of reagents are used (even on the same instrument model), or different
measurement systems are used, it becomes increasingly likely that results produced from the analysis of
QC materials will not have the same numerical relationship among methods as do results from native clinical
samples, and erroneous conclusions regarding the comparability of patient sample results may be made.

Because QC materials are frequently noncommutable with native clinical samples, QC results may give
an apparent numeric relationship between measurement systems that does not reflect the true relationship
observed for patient samples. The lack of a difference observed using QC samples can be a “false
negative,” in which case a true difference for patient samples may be masked by an offsetting matrix bias
that gives the false impression that the measurement systems produce comparable results. Conversely, an
apparent difference observed using QC samples may be due only to a matrix bias, and the results for
patient samples may in fact be equivalent between the instruments.

6.1.3.1 Using QC Results From Different Measurement Systems

For a situation in which different instruments and/or reagents are used to measure the same analyte, the
relationship between results for QC samples from different measurement systems can be trended to
determine if any changes in the relationship have occurred. When a measuring system is performing
within the expectations for a stable measurement process, the QC results should be stable and consistent
for a given instrument/reagent combination. If the numeric relationship between the means for the QC
results from two, or more, measuring systems is known, and the results for native clinical samples have
been verified to be comparable between those measuring systems, the numeric relationship between the
means of the QC results should remain constant as long as the performance of the measuring systems
remains unchanged. The numeric relationship between the means for the QC results can be monitored
over time (moving means) as an indicator that the previously established comparability based on native
clinical samples has remained unchanged. However, the numeric relationship between the moving means
for QC results may change every time the lot of QC material changes and every time the lot of reagents
changes on any of the measuring systems. Consequently, it is necessary to reestablish the numeric
relationship between means of the QC results, with verification that results for native clinical samples
have remained comparable among the measurement systems following lot changes. It may be more
practical to perform frequent or periodic comparability monitoring using native patient samples than to
conduct the validation necessary to base monitoring on QC results.
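
As a rough illustration of the trending described above, the following Python sketch computes moving
means of QC results from two measurement systems and flags drift in their established relationship. It is a
minimal sketch only: the window length, acceptance limit, and QC values are hypothetical, and a real
implementation would use limits derived from the laboratory's own validation.

```python
import numpy as np

def moving_mean(values, window=20):
    """Moving mean over the most recent `window` QC results."""
    v = np.asarray(values, dtype=float)
    return np.convolve(v, np.ones(window) / window, mode="valid")

# Hypothetical daily QC results for the same QC lot on two analyzers.
rng = np.random.default_rng(1)
qc_a = rng.normal(100.0, 2.0, 120)  # analyzer A: mean 100, SD 2
qc_b = rng.normal(103.0, 2.0, 120)  # analyzer B: runs about 3 units higher

# Difference between the moving means of the two systems.
diff = moving_mean(qc_a) - moving_mean(qc_b)

# The established relationship (here, -3.0) should remain constant while both
# systems are stable; flag any window that drifts beyond a validated limit.
established, limit = -3.0, 1.5  # hypothetical values
flagged = np.abs(diff - established) > limit
print(f"windows flagged: {flagged.sum()} of {flagged.size}")
```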

6.1.3.2 Use of QC Materials When Preanalytic Stability of Patient Samples Is Limited

There may be circumstances when the preanalytic stability of an analyte is a limiting factor for
comparison testing (ie, stability is less than the time required to transport a sample aliquot to each of the
instruments to be compared). There may also be situations when analyte concentration or activity at
appropriate levels for testing cannot be realistically achieved with patient samples. In these cases, QC
materials (or other reference or PT/EQA materials) may be the only samples available for comparison
testing. The commutability limitations of QC, PT/EQA, and reference materials described above must be
considered when making conclusions regarding the comparability of results for patient samples.
Differences between measuring systems observed using noncommutable samples may be due to a true
bias, a bias caused by the sample matrix, or a combination of both. The QC, reference, or PT/EQA
materials used should be validated for suitability in evaluating comparability of patient sample results for
the measurement systems involved.

6.1.3.3 Use of QC Materials, or Verification Materials, Provided by the Measurement System Manufacturer

An IVD manufacturer may provide control materials specifically designed and validated to verify that the
performance of their measurement systems meets the claims in the product labeling. Although these
materials may not be commutable with patient samples, the manufacturer may have designed the
materials to have approximately constant performance among the measurement systems identified in the
product labeling. Such materials may be used as samples for comparability testing among the
measurement systems that are specifically identified in the product labeling. Review of the claims for
such control materials and confirmation that the comparability limits meet the laboratory’s clinical
requirements are recommended.

A control material provided by one measurement system manufacturer will not be suitable for use with a
measurement system from another manufacturer, because the material will not have been validated to be
commutable with patient samples among the different measurement systems.

6.1.4 Materials Used for External Quality Assessment, Proficiency Testing, or External Group
Quality Control Evaluation

If preanalytic sample stability is a limiting factor such that patient samples cannot be used, materials used
for external evaluation of performance may be considered for use as comparison materials. These types of
materials are typically not validated for commutability with native clinical samples. The materials are,
however, typically analyzed by a large number of laboratories using the same instruments and methods,
and the mean value within a group of peer instruments and methods will include bias components
attributable to calibration and to noncommutability. The noncommutability can be
assumed to be approximately constant within the peer group. The mean value for a peer group with a
sufficiently large number of participants (usually considered to be ≥ 10), and an acceptable among-
participant SD,21,22 can be assumed to represent a value consistent with use of the measurement system in
compliance with the manufacturer’s instructions. However, there may be reagent-lot-specific matrix
biases within the peer group data that require wider acceptance limits than would be applicable for native
patient samples.

An individual laboratory can use the mean value for the appropriate peer group to determine that an
instrument/method combination has remained stable and continues to meet the performance verified at the
time of the external assessment event. Many assessment programs allow participants to purchase extra
vials of the materials to use for internal verification procedures. It is also possible to store residual
quantities of the external assessment materials under storage conditions (usually frozen at −70 °C) that
will prevent degradation of the analyte. However, caution must be used when storing extra vials, or
storing and reusing residual external assessment materials, because they may not have been validated for
this purpose. In particular for residual material, there may be deterioration of the analyte during its open
vial use period, during storage, or caused by a freeze-thaw cycle; in addition, the matrix may be altered by
the storage conditions.

The results from the analysis of an external-assessment material cannot be used to directly compare
different measurement procedures (ie, those that are not considered members of the same peer group).
The material is not likely to be commutable with native clinical samples, and it is not possible to
determine if the numeric relationship between different measurement procedures is influenced by the
presence of a matrix bias (see Section 6.1.2). Consequently, results that appear to agree could have a
calibration bias that is offset by a matrix bias.

6.1.5 Other Nonpatient Materials

Linearity verification materials and routine measurement calibration materials (ie, manufacturer’s product
calibrators) are not recommended for verification of comparability of patients’ results, because these
materials are not intended to be commutable with native patient samples.

6.2 Analyte Concentrations for Testing

The laboratory director must consider the clinical use of an assay and practical limitations on the
laboratory when designing comparison testing procedures. To apply the comparability method described
in this document, comparisons must be made at analyte concentrations where there is a reliable estimate
of the imprecision of measurement (see Section 8.2.2). Most laboratories have insufficient resources to
test across the full AMR of a measurement system at each comparison testing event, although this would
be the ideal approach. Laboratories may choose to perform comparability testing near the mean
concentration(s) of QC material(s), because imprecision is known at these concentrations; near significant
clinical decision values or upper and lower reference interval limits; or some combination of these across
consecutive comparability testing events.

6.3 Storage and Transport

A laboratory may choose to store native patient samples containing specific analyte concentrations for
future comparison testing (eg, samples with concentrations that are infrequently encountered). Preanalytic
variables and appropriate storage conditions must be taken into consideration when storing samples.
Caution should be exercised when storing frozen samples. Freezers with automatic defrost cycles should
not be used, because these operate by periodically warming the compartment, partially thawing the
contents, then refreezing. Samples should be stored below −70 °C23 to ensure that water in the sample
remains completely solid and the frozen sample remains stable.


Use care when transporting samples between measurement systems to ensure appropriate stability of the
analyte of interest and to prevent evaporation. When there is a transportation delay between locations that
need to measure the same sample, it is recommended to prepare aliquots and coordinate the testing so all
aliquots are measured at approximately the same time. It is generally not recommended to base an
evaluation on a value measured at a significantly different time from other measurements. Determination of
what constitutes a significantly different time depends on the stability characteristics of a given analyte.

7 Acceptance Criteria for Comparability Testing of Patient Results


There are no universally accepted criteria for evaluating the results of comparability testing; therefore, the
laboratory director must determine the limits of acceptable differences for results produced by different
measurement systems for the same analyte. The choice of criteria may vary from analyte to analyte and
heavily depends on the availability of published information for each analyte (eg, clinical studies,
biological variability data, and data from external PT/EQA programs). The primary objective is
agreement between results from different measurement systems that is acceptable for the clinical
situations in which the results will be interpreted. However, the inherent performance characteristics of
the measurement systems should be taken into consideration when establishing acceptance criteria. If
system capability is insufficient to meet desired comparability criteria, the frequency of comparability
monitoring event failures will be impracticably high. Under those circumstances, improvement of
measurement system performance (ie, replacement of a measurement system or optimization of current
system operation) would be required to meet the desired comparability criteria. Circumstances may also
arise when criteria based on the imprecision of the measurement systems being compared may be more
stringent than necessary for clinical requirements, and the laboratory director may choose to base the
acceptance criteria on clinical requirements.

Analytical goals for comparability of results can be defined using clinical approaches, expert opinion, or
statistical approaches. The goal for comparability may vary depending upon the clinical use of an assay.
Greater agreement among results is required when a result is used to identify changes in an individual
patient over time vs use of a result as a component of an initial diagnostic workup. Therefore,
comparability testing of a measurement system across different laboratory instruments that may be used
to monitor the same patient over time should use comparability criteria consistent with those required for
serial patient monitoring. A consensus hierarchy of approaches to establish criteria for analytical
performance and measurement system comparability has been proposed by the International Organization for
Standardization (ISO) Technical Committee 212 Working Group on Analytical Performance Goals Based on
Medical Needs and members of the International Federation of Clinical Chemistry and Laboratory Medicine
(IFCC).24 The following approaches are listed in order of preference.

7.1 Evaluation of Comparability Based on Clinical Outcomes

Acceptance criteria based on well-designed, clinical outcomes studies are the highest standard for
evaluating comparability testing. The strength of this approach is that the clinical impact of analytical
performance cannot be ignored. The weakness of the approach is that clinical outcome studies are difficult
to perform; therefore, there are very few examples in laboratory medicine.25-29

Example 7.1

An example of an assay with clinical outcomes-based data that can be used to make comparability
recommendations is the use of the hemoglobin A1c assay (HbA1c) for monitoring an individual’s diabetes
control. Data on clinical outcomes related to HbA1c from the Diabetes Control and Complications Trial
indicated that an HbA1c of 8.0% has a poorer clinical outcome compared to an HbA1c of 7.0%, and
should therefore be accompanied by a change in patient management. Total error between methods should be kept to
below ±1% (ie, an absolute change of 1% reporting unit) so a patient with a poor clinical outcome
(HbA1c ≥ 8.0%) cannot be misclassified as a well-controlled diabetic (HbA1c ≤ 7.0%). As total error
includes both the imprecision and bias, when both measurement systems have low imprecision, some bias
between methods may be tolerated.

7.2 Evaluation of Comparability Based on Clinician’s Questionnaire

An alternative approach is to survey clinicians by questionnaire in order to determine their expectations of
analytical quality that would give them confidence in managing patients.30-33 The goal is a clinical
consensus of the magnitude of change in an individual patient's results that would result in a change in
clinical management. The advantage of this approach is that it is based on clinical experience and is
therefore acceptable to clinicians. The disadvantage of the approach is that clinicians' expectations may be
based on prevailing standards of analytical comparability, and that the method still requires a rigorous
methodology to define the clinical scenario and its analytical correlate.

Example 7.2

An example would be a survey that indicated that clinicians interpreted a 10% change (eg, a change in
HbA1c concentration from 8.0% to 7.2%) in the HbA1c result as a significant change in a patient’s
clinical condition.34 A total error goal less than 10% would consist of the known imprecision of both
methods as well as the acceptable bias between the methods.

7.3 Evaluation of Comparability Based on Biological Variability

When monitoring a patient using two methods for the same analyte, each method may have analytical
imprecision that is within desirable limits (ie, <0.5 CVI), but bias between the methods may significantly
reduce comparability. Therefore, for two methods that individually achieve desirable analytical
imprecision as defined above, a quality specification for the allowable difference between two methods
measuring the same analyte can be defined as: allowable difference < 0.33 x CVI.35

Appendix D shows the derivation of the desirable limits for difference between two methods for the same
analyte using the principles of biological variability. Values for CVG and CVI are available for most
common analytes.36,37 Detailed discussions of this approach are available in numerous publications.35,38-43

The strength of the biological variability approach is that it uses a defined statistical approach, uses
measurable biological variability parameters, and takes into account the impact of measurement error on
clinical interpretation of results. The weaknesses of this approach include that it is not based on clinical
outcomes, and the necessary parameters for the calculations are not available for all analytes. In addition,
the analytical performance of currently available methods is insufficient to meet the goals for desirable
bias defined by the biological variability approach for several common analytes (eg, serum/plasma
sodium and calcium measurements).

Example 7.3

For the monitoring of HbA1c, where the within-subject biological variability (CVI) is 5.6%, the allowable
difference between two methods that have achieved desirable imprecision is: allowable difference < 0.33
x CVI or allowable difference < 1.8% (eg, if the HbA1c measurement on a patient sample by one method
is equal to 8.0%, HbA1c values for comparison methods would need to be between 7.9% and 8.1% to
meet the requirement).
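
A small Python sketch of the Section 7.3 arithmetic may be helpful; it is illustrative only, with the CVI
value of 5.6% taken from Example 7.3 and the comparison values chosen arbitrarily.

```python
def allowable_difference_pct(cv_i):
    """Allowable difference between two methods (Section 7.3): < 0.33 x CVI."""
    return 0.33 * cv_i

def comparable(result_1, result_2, cv_i):
    """True if the percent difference between two results is within the limit."""
    mean = (result_1 + result_2) / 2.0
    pct_diff = abs(result_1 - result_2) / mean * 100.0
    return pct_diff < allowable_difference_pct(cv_i)

# HbA1c: CVI = 5.6% gives an allowable difference of about 1.8%.
print(round(allowable_difference_pct(5.6), 2))  # 1.85
print(comparable(8.0, 8.1, 5.6))                # True  (~1.2% difference)
print(comparable(8.0, 8.3, 5.6))                # False (~3.7% difference)
```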


7.4 Evaluation of Analytical Performance Based on Published Professional Recommendations

National or international professional expert bodies make judgments regarding what is an acceptable bias.
The guidelines produced by these bodies are based on the combined expert understanding of the
profession, often expressed as a consensus finding.

The strength of this approach is that it includes the understanding of experienced and reputable experts of
analytical differences that could result in differing clinical interpretation of results. The weakness of this
approach is that it is neither statistically rigorous nor based on clinical outcome studies. Another possible
weakness is that these bias goals may refer to the difference between a method and a traceable target value
rather than the bias between two methods.

Example 7.4

The National Cholesterol Education Program states that acceptable bias between cholesterol methods is
≤ 3% at 200 and 240 mg/dL (5.2 and 6.2 mmol/L).

7.5 Evaluation of Analytical Performance Based on Goals Set by Accrediting Agencies

Accrediting and regulatory agencies may define acceptable goals for imprecision or inaccuracy that are
based on a combination of observed performance from PT/EQA data and advice obtained from industry
and professional leaders or advisors.

The strength of this approach is that it takes into account both the capability observed in industry as a
whole and the informed advice from industry and professional advisors regarding what should be
achievable in all laboratories. The weakness of this approach is that it reflects what can be achieved rather
than what is clinically required. Another possible weakness is that these bias goals may refer to the
difference between a method and a traceable target value rather than the bias between two methods.

Example 7.5

The Royal College of Pathologists of Australasia and the Australian Association of Clinical Biochemists
state that in the Australian Quality Assurance Program in Chemical Pathology, the allowable limits of
performance for total cholesterol are ±0.5 mmol/L up to 10 mmol/L and ±5% above 10 mmol/L (±19 mg/dL
up to 387 mg/dL and ±5% above 387 mg/dL).

7.6 Evaluation of Analytical Performance Based on the General Capability

In this approach, performance that is similar to that of peers is defined as acceptable. Biases between
measurement systems that are within the usual range of differences observed for similar measurement
systems are defined as acceptable, because the industry and profession already accept those differences
existing between laboratories and measurement systems.

The strength of this approach is that the information is readily accessible from PT/EQA results. The
weakness of the approach is that large differences may often be seen in PT/EQA schemes and some may
be due to matrix errors. There is also no assessment made of the possible differences in clinical
interpretation that could result from the differences observed. In addition, differences between laboratories
may be corrected for by differences in reference intervals and decision limits that are not evident in results
from PT/EQA schemes.


Example 7.6

PT/EQA testing shows that the bias from a target value for HbA1c is ≤ 0.13% (an absolute difference of
0.13% reporting units) for the best 20% of laboratories. Fifty percent of laboratories have a bias of
≤ 0.29% (an absolute difference of 0.29% reporting units) and 10% of laboratories have a bias of > 0.77%
(an absolute difference of 0.77% reporting units) that may be considered less acceptable.

8 Statistical Evaluation of Comparability Data


Analysis of comparability data does not always require sophisticated statistical analysis. Inspection of a
simple plot or table of comparison data should be the first step in any evaluation of comparability data.
This may be sufficient to assure the medical director that assays are performing in a comparable manner.
It is left to the laboratory director’s discretion to determine when a more rigorous analysis is required.

8.1 Hypothesis Testing

This document’s statistical procedure for assessing the comparability of laboratory methods employs
hypothesis testing. Hypothesis testing involves stating a null hypothesis (usually that the laboratory
methods produce equivalent results), calculating a statistic, and rejecting the null hypothesis if the value
of the statistic is highly unlikely when the null hypothesis is true. The significance level of a hypothesis
test is the probability of incorrectly rejecting the null hypothesis when it is actually true, also called Type I
error. The significance level is usually selected prior to conducting a hypothesis test. Power is the
probability of correctly rejecting the null hypothesis when it is false. Incorrectly accepting the null
hypothesis when it is actually false is called Type II error. Power is a property of the hypothesis test
design and is useful for understanding the reliability of a hypothesis test.

An assumption of the hypothesis testing procedure in this document is that for any given specimen and
laboratory method, replicate results are characterized by a normal distribution with some mean and
variance. In reality, the mean and variance are typically unknown, but can be estimated from replicate
measurements. The standard deviation, s, is defined as the square root of the variance.

Alternatively, the standard deviation can be estimated using data derived from long-term QC testing (see
Section 8.2.2). Refer to Appendix C for a more detailed discussion of statistical concepts.

8.2 Statistical Analysis of Comparability Data

Traditional approaches to method comparison (eg, Student’s t-test, linear regression) are not easily
adaptable to the simultaneous statistical comparison of multiple instruments, and typically require large
sample sizes. This document presents an intuitive, simple, and statistically valid approach for the
collection and simultaneous analysis of method comparison data from multiple instruments. The method
utilizes comparisons at preselected analyte concentrations through parallel, replicate testing of a single
specimen on two or more instruments. All of the results produced by the instruments are then compared
using the studentized range test. This approach minimizes the impact on the laboratory of comparability
testing, yet provides adequate detection of clinically significant differences between instruments.
Comparisons are made at analyte concentrations where reliable estimates of measurement imprecision are
available to the laboratory (see Sections 8.2.2 and 8.2.3).

8.2.1 The Studentized Range Test

The studentized range test evaluates the hypothesis that two or more methods yield equivalent values, on
average, using the range of results (ie, the difference between the maximum and minimum values, or the
maximum and minimum mean values) obtained by testing a single sample (or pooled sample) on two or
more instruments. If the range is sufficiently small, the conclusion is that the results are equivalent among
the measurement systems. If the range is too large to attribute plausibly to chance, the conclusion is that
the results are not equivalent. It is reasonably assumed that each measurement system's results include
random error that follows a normal distribution. The variance of random error is assumed to be the same
for all measurement systems; however, the test statistic distribution is sufficiently robust to tolerate
inequality of method variances up to a factor of four.44 The power of the test (the ability to detect a
difference) can be increased or decreased by increasing or decreasing the number of replicate tests
performed on the sample (or pooled sample) on each measurement system being compared (see Appendix
C8). If the maximum difference between results from any two measurement systems is statistically
significant, the next largest difference can be tested, and then the next largest, etc., until a nonsignificant
difference is obtained. This iterative process allows the laboratory to identify measurement systems that
do not meet the specified requirement for comparability. Multiple applications of a hypothesis test
increase the power, but also increase the aggregate probability of a Type I error (ie, falsely concluding the
methods are not comparable when they truly are). See Appendix C for additional discussion.

A statistical table has been provided to assist the laboratory in designing a comparison experiment by
determining the number of replicates to be run to attain a power of 80% and detect a defined difference
between instrument results at an alpha level of 0.05.
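
For readers who prefer to compute rather than look up critical values, the following Python sketch shows
one way to obtain a critical range from the studentized range distribution (available in SciPy 1.7 and
later). The design parameters are hypothetical, and the table referenced above (Appendix B) remains the
authoritative source for this protocol.

```python
from scipy.stats import studentized_range

# Hypothetical design: k systems, n replicates each, pooled within-system SD s.
k, n, s, alpha = 4, 3, 1.2, 0.05
df = k * (n - 1)  # error degrees of freedom for the pooled SD estimate

# Critical value of the studentized range statistic at significance level alpha.
q_crit = studentized_range.ppf(1 - alpha, k, df)

# Largest acceptable difference between the highest and lowest system means.
critical_range = q_crit * s / n ** 0.5
print(f"q_crit = {q_crit:.3f}, critical range = {critical_range:.2f} units")
```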

8.2.2 Estimating Variance Over the Analytical Measurement Range Using Quality Control Data

Data from statistical QC monitoring can be used to estimate the variance, the standard deviation (SD), or
the coefficient of variation (CV) at concentrations or activities near those of the QC materials. Data
should be collected for at least six months to ensure most sources of variability in the measurement
procedure are represented in the imprecision estimate. Sources of variability include calibration events;
changes in lot of calibrator material; reagent changes during use; changes in lot of reagent; changes in
components such as pipettes, temperature control, washing systems, detection devices, etc.; maintenance
cycles; environmental factors such as temperature and humidity; and fluctuations in electrical power.

The QC materials used to estimate variance may deteriorate during their open bottle use period and during
long-term storage. If these conditions occur, the estimate of the SD or CV applicable to a patient sample
may be artifactually increased. Despite this potential limitation, QC data are generally the best available
sources for estimating the variance (imprecision) of a method. Data from PT/EQA suggest that estimates
of imprecision derived from commercially prepared control materials are comparable to estimates derived
from frozen serum.45
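
As a simple illustration of the estimate described above, the long-term SD and CV at a QC level can be
computed directly from accumulated QC results; the values below are hypothetical and far fewer than the
six months of data recommended here.

```python
import statistics

# Hypothetical QC results accumulated at one QC level (truncated example).
qc = [4.1, 4.3, 4.0, 4.2, 4.4, 4.1, 4.2, 4.3, 3.9, 4.2]

mean = statistics.mean(qc)
sd = statistics.stdev(qc)   # sample standard deviation, s
cv = sd / mean * 100.0      # coefficient of variation, in percent
print(f"mean = {mean:.2f}, SD = {sd:.3f}, CV = {cv:.1f}%")
```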

8.2.2.1 Estimation of a Measurement System Precision Profile

Available QC materials often do not challenge the full AMR of a measurement system. When variance
estimates are needed for concentrations or activities that differ from those of the QC materials, additional
procedures are necessary. Consultation with measurement system manufacturers about available
imprecision estimates at other concentrations or activities may be helpful. Laboratories may also produce
estimates of imprecision using pools of patient samples, calibrator, linearity, or PT/EQA materials
following CLSI/NCCLS document EP0511 (20-day protocol), at one or more concentrations or activities
in the extended range likely to be encountered in samples to be used for verification of comparability
between methods. However, a 20-day protocol is unlikely to account for a number of the variability
sources described above, and cumulative data over a longer time period (eg, six months) are preferred.
CLSI document EP1512 (five-day protocol) is not adequate in this situation, because that procedure is
intended primarily to verify manufacturers’ claims, not to estimate the imprecision as a basis for
acceptance criteria for another statistical test.

It is recommended to determine the SD at an adequate number of concentrations or activities over the
range that will be used for comparison evaluation among measurement systems. In many cases, the SD
may be approximately constant over a range of concentrations or activities, or may be proportional to the
concentration or activity, in which case the CV will be reasonably constant. In either case, development
of a table or graph of the SD at different concentrations or activities to use in the range test statistic is
suggested. When there is a consistent change in SD with concentration or activity, interpolation between
the values determined will allow good estimates of SD at intermediate concentrations or activities.
Particular attention should be paid to low concentrations or activities, where the SD may increase
substantially as the concentration or activity approaches the lower limit of the measurement range. An
“imprecision profile” for an analyte measured on a given measurement system needs to be established
only once. It is reasonable to assume the variance will be approximately the same at subsequent time
periods as long as QC indicates the measurement system continues to meet its specifications.
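
The interpolation step can be expressed in a few lines of Python; this is a minimal sketch with an entirely
hypothetical precision profile, intended only to show the table-plus-interpolation idea.

```python
import numpy as np

# Hypothetical precision profile: long-term SD estimated at several levels.
conc = np.array([2.0, 5.0, 10.0, 20.0, 40.0])  # analyte concentration
sd = np.array([0.30, 0.32, 0.45, 0.80, 1.60])  # long-term SD at each level

def sd_at(level):
    """Linearly interpolate the SD at an intermediate concentration."""
    return float(np.interp(level, conc, sd))

print(sd_at(15.0))  # 0.625, between the SDs at the 10 and 20 unit levels
```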

For concentrations or activities that approach the lower limit of the AMR, it may be necessary to base
acceptance criteria on an absolute difference in concentration or activity units between results for two or
more measurement systems. The absolute difference may be based on medical usefulness criteria rather
than statistical performance criteria. The laboratory director will need to establish the medical usefulness
criteria based on the population served by the laboratory.

8.2.3 Identifying Concentrations Suitable for Use in the Comparison Evaluation

For the range test comparison protocol, patient samples are selected with concentrations or activities that
are within the range of the “precision profile” for an analyte measured on a given measurement system
(see Section 8.2.2). If a “precision profile” has not been established, the laboratory should select patient
samples with concentrations or activities that are “reasonably close” to the nominal values for QC
materials, or other values for which the SD is known. “Reasonably close” is difficult to specify but, in
general, for values reported to at least two significant digits, values within 20% will likely have an SD
similar to that of the QC or other material. A “precision profile” is recommended for samples reported to a single
digit or that approach the lower limit of the AMR.

A special procedure is needed when different measurement systems are to be compared and they use QC
materials that have different nominal analyte values and, thus, different SDs at those values. In this
situation, it is recommended to prepare pools of patient samples and measure aliquots from each pool
according to CLSI/NCCLS document EP0511 to establish an SD value for each measurement system at
the nominal concentration of the pool.

8.3 Fixed Limit Evaluation

In some situations, it may be desirable to establish a fixed limit for the agreement between the largest and
smallest numeric values observed among a group of measurement systems being compared. Analyte
concentrations that approach the lower limit of quantitation are a common situation in which a criterion
based on a percent difference is not suitable, because the small magnitude of the numeric result causes a
small absolute difference to be a large percent difference. In such situations, the concepts described in
Section 7 should be considered, but the fixed limit generally requires judgment of the clinical impact of
differences at numeric values for which a percent criterion is not realistically achievable.

For example, CLSI/NCCLS document C3046 states: “Ideally, 95% of the individual results from the
POCT glucose monitoring system should agree within ±15 mg/dL (±0.83 mmol/L) of the laboratory
analyzer values at glucose concentrations below 75 mg/dL (4.2 mmol/L) and within ±20% of the
laboratory analyzer values at glucose concentrations at or above 75 mg/dL (4.2 mmol/L).” In this
example, the ±15 mg/dL (±0.83 mmol/L) fixed criterion below 75 mg/dL (4.2 mmol/L) reflects the state
of the art in POCT glucose devices at the time the guideline was published.
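
Expressed in code, a mixed fixed-limit/percent criterion of this kind is straightforward; the following
Python sketch simply restates the C30 example quoted above and is not a validated acceptance rule.

```python
def glucose_within_limits(poct_mgdl, lab_mgdl):
    """C30 example criterion: +/-15 mg/dL below 75 mg/dL; +/-20% at or above
    75 mg/dL, with the laboratory analyzer value as the reference."""
    if lab_mgdl < 75:
        return abs(poct_mgdl - lab_mgdl) <= 15
    return abs(poct_mgdl - lab_mgdl) <= 0.20 * lab_mgdl

print(glucose_within_limits(68, 60))    # True  (difference of 8 mg/dL)
print(glucose_within_limits(130, 100))  # False (difference of 30%)
```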


9 Point-of-Care Testing (POCT)


Point-of-care testing (POCT) presents unique challenges for comparability testing. Large numbers of
instruments may have to be compared against a laboratory instrument (reference instrument), and reagents
are frequently expensive. Methods to be compared often analyze different sample types (eg, whole blood
vs plasma or serum), require different sample acquisition techniques (eg, venipuncture vs fingerstick), or
employ nontraditional analysis (eg, measurement of transcutaneous bilirubin). On the other hand,
laboratory testing and POCT are frequently performed on specimens collected in close temporal
proximity to each other (eg, result confirmation testing), providing an opportunity for comparison.

9.1 Specimen Selection

Whenever possible, comparisons should be performed with simultaneously obtained specimens of the
proper type for the measurement systems involved (eg, fingerstick [capillary] whole blood for a POCT
glucose measurement system vs venous plasma measurement system), unless both systems use the same
sample type. Otherwise, artifactual differences may be detected (due to a sample being inappropriate for
one of the measurement systems), or differences that should be evident (due to the difference in sample
sources normally used in the measurement systems) may be masked. For example, venous specimens may
not be appropriate for analysis with POC glucose testing devices that employ a methodology that is
dependent upon a minimum sample oxygen content. Conversely, other POC glucose testing
methodologies may be suppressed by elevated oxygen concentrations and would not be appropriate for
the analysis of arterial specimens.

9.2 Specimen Acquisition

POC comparison testing can often be accomplished by performing POCT analysis at the same time
venous (or arterial) specimens are collected for laboratory analysis via phlebotomy. This approach has
two major advantages: 1) reduction of POCT performance variability by limiting the number of testing
personnel; and 2) elimination of the possibility for a change in the patient’s condition (eg, glucose or
insulin administration in the case of POC glucose testing) between the collection of the POCT and
laboratory specimens. Disadvantages include: 1) additional discomfort for the patient (if a capillary
specimen is required for POCT); 2) more complex planning and coordination to execute; 3) more
difficulty in assessing the full range of test results; and 4) failure to assess comparability across all testing
personnel. Laboratories should follow local policies with respect to requirements for institutional review
board approval and patient consent for the collection of additional specimens for method validation or
comparison testing.

A variation of this approach that avoids additional sample collection (and possible patient discomfort)
involves comparison testing performed on specimens that have been collected in vacuum tubes at the
bedside and transported to the laboratory for analysis. This approach is only applicable if the
manufacturer has validated testing of anticoagulated, noncapillary specimens on POCT instruments. This
approach overcomes the disadvantages noted above, except it still fails to assess comparability across all
testing personnel. In addition, POCT measurement of an analyte in venous (or arterial) anticoagulated
whole blood may not mimic measurements made on capillary specimens, and influences of a capillary
specimen on analyte values are not evaluated. Note that safety concerns generally limit testing of vacuum
tube specimens to the laboratory, where tubes can be safely decapped behind a protective screen to
acquire an aliquot of blood for POCT.

Another approach monitors the agreement between results from POCT performed at the bedside and
results from near-simultaneously collected blood sent to the laboratory for analysis. Laboratory analysis
of blood collected proximal in time to a POCT measurement generally gives acceptable agreement, provided the laboratory-
analyzed specimens have been properly collected and handled to ensure analyte stability prior to analysis.
How close in time the two specimens must be collected depends upon the analyte in question and the
probability of a change in analyte concentration between the samplings. Advantages of this approach
include: 1) relatively frequent assessment of comparability; 2) assessment of the entire testing process;
3) assessment of a greater range of test results; 4) assessment of all testing personnel on all shifts and
days of testing; and 5) assessment of all measurement systems. Disadvantages to this approach include:
1) potential introduction of additional variability due to changes in the patient’s condition between
collection of the laboratory and POCT specimens (eg, insulin or glucose administration for glucose
POCT); 2) introduction of variability due to a lack of synchronization of clocks used to identify sample
collection times; and 3) inability to differentiate noncomparability due to operator error from that due to
analytical error. In addition, to evaluate the comparability of individual POCT measurement systems,
results must be tracked by individual devices in use. Potential noncomparability identified using
this approach should be confirmed using one of the simultaneous testing protocols described previously.

9.3 Range of Specimen Values

Test results ideally should span the AMR of the system in question. At a minimum, low and high
concentrations (eg, upper and lower reference interval limits) or those around clinical decision points
should be analyzed.

9.4 Multiple Devices of the Same Make and Model

For POCT programs with a large number of devices of the same make and model in use (eg, glucose
meters), an alternative approach is acceptable for documenting comparability of patient results between
the POCT devices and the laboratory. When all of the POCT measurement systems of the same make and
model to be evaluated are using one lot of reagent strips/cartridges, comparability testing can be
performed with a representative subset of POCT devices while simultaneously evaluating QC and/or
PT/EQA results among all of the POCT devices. If comparability of patient results among the subset of
POCT devices included is confirmed, the comparability of the other POCT devices of the same make and
model can be inferred from acceptable agreement of QC and/or PT results for the same lots of reagent
strips/cartridges and QC or PT materials. If more than one device type or lot of reagents is used, this
process must be repeated for each combination. In subsequent comparability evaluation events, different
subsets of instruments should be tested so, over time, all POCT devices will have comparability testing
performed on them.

9.5 Statistical Considerations for POC Comparability Testing

For circumstances where the range test protocol is not applicable (eg, sample types required by laboratory
and POCT measurement systems are different; simultaneous comparison of multiple specimens),
alternative statistical approaches for comparison of measurement systems have been described.47,48

10 Range Test Comparability Protocol


This protocol is designed for studies comparing up to 10 measurement systems with a maximum of five
replicates per system. If the criteria for the range test comparability protocol cannot be met, alternative
approaches should be considered, such as the protocols described in CLSI document EP1512 or
CLSI/NCCLS document EP09.1 The following protocol should be performed for at least two analyte
concentrations on each measurement system per comparability testing event.

10.1 Select an Analyte for Comparison

Any analyte measured by more than one test system in a health care system should be considered for
comparability testing.


10.2 Select the Instruments to Be Compared

Ideally, all measurement systems that are currently in use for measuring patient samples for the analyte
should be compared.

10.3 Identify an Approximate Analyte Concentration for Comparison Testing

(1) Select an analyte concentration for which an estimate of the imprecision of the assay (ie, CV) is known
for each of the measurement systems. Imprecision estimates may be derived from testing of control
materials or other appropriate materials (see Section 6). Long-term CVs should be determined from at
least six months of data. Record the CVs of the assays to be tested.

(2) Compare the magnitudes of the CVs of the measurement systems to determine if the greatest and
smallest CVs differ by less than twofold (ie, 2x). If so, proceed to the next step. If the CVs differ by
greater than twofold, consider protocols from CLSI document EP1512 or CLSI/NCCLS document
EP091 for demonstration of measurement system comparability.

(3) Calculate a pooled CV from the measurement system CVs:

Pooled CV = ([CV1² + CV2² + … + CVn²]/n)^½

NOTE: Calculation of a pooled CV using this equation assumes the long-term CVs for the measurement
systems (CV1, CV2, …, CVn) are all calculated from approximately equal numbers of measurements (ie,
equal sample sizes).
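
A minimal sketch of the pooled CV calculation in step (3), assuming, as the note requires, that the
long-term CVs were derived from approximately equal numbers of measurements; the CV values are
hypothetical.

```python
import math

def pooled_cv(cvs):
    """Pooled CV = ([CV1^2 + CV2^2 + ... + CVn^2]/n)^(1/2) -- Section 10.3."""
    return math.sqrt(sum(cv ** 2 for cv in cvs) / len(cvs))

# Hypothetical long-term CVs (%) for three measurement systems; the largest
# (2.4) and smallest (1.8) differ by less than twofold, as step (2) requires.
cvs = [1.8, 2.1, 2.4]
print(round(pooled_cv(cvs), 2))  # 2.11
```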

10.4 Calculate the Desired Concentration or Activity to Be Used for Comparison Sample
Selection

Determine the mean concentration or activity value of the control (or other) material that will be used to
estimate imprecision for each of the measurement systems. Calculate the grand mean of the measurement
system mean values for the material and record it. Use the grand mean as the comparison sample desired
concentration or activity for Section 10.5:

• Comparison sample desired value = grand mean of control material means = (mean control material
concentration for analyzer A + mean control material concentration for analyzer B + … + mean control
material concentration for analyzer J)/number of analyzers to be compared.

If a precision profile has been determined, the desired concentration is within the range of values over
which the SD or CV is approximately constant (see Section 8.2.2.1).

10.5 Select a Sample for Comparison Testing

Identify a specimen that: 1) meets the stability requirements of the analyte for all assays; 2) does not
contain substances that interfere with the assays being compared; 3) has sufficient volume for testing; and
4) has an estimated value (based on testing on any one of the measurement systems to be compared)
within 20% of the test sample target value calculated in Section 10.4 (see Section 8.2.3 for exceptions to
this approach). If a large number of measurement systems are to be compared, pooled samples may be
used (see Section 6.1.1 for potential limitations of pooled specimens). If measurement systems to be
evaluated are located remotely from each other, be sure to stabilize the specimen appropriately for
transportation.


10.6 Select the Appropriate Level of Acceptance Criteria That Can Be Applied to the
Comparison Test (from Section 7)

(1) Determine if there are recommendations based on clinical outcomes studies that are within the
performance specifications of the measurement systems being compared (ie, the long-term CVs of the
assays to be compared are less than the recommended acceptance criteria); if not, proceed to the next
level of evidence.

(2) Determine if the clinicians at the institution(s) have specific recommendations based on their clinical
experience that are within the performance specifications of the methods being compared; if not,
proceed to the next level of evidence.

(3) Determine if there are recommendations based on biological variability that are within the performance
specifications of the methods being compared; if not, proceed to the next level of evidence.

(4) Determine if there are minimal requirements set by an accreditation agency; if not, proceed to the
next level of evidence.

(5) Determine the analytical capability of the measurement system based on external PT (EQA) data; if
no data are available, proceed to the next level of evidence.

(6) If no external comparability criteria are applicable, determine the analytical capability of the
measurement system based on internal imprecision data.

10.7 Set the Critical Difference for the Comparability Test at the Recommended Total
Error or Bias Limit Determined in Section 10.6

See Section 7 and Appendix A for examples.

10.8 Determine the Number of Replicates to Be Run

Using the pooled CV calculated in Section 10.3, determine from Appendix B the number of replicate
analyses to be performed on each instrument. (See Appendix B for instructions.)

10.9 Perform the Comparison

(1) Analyze the specimen selected in Section 10.5 on each of the measurement systems to be compared,
performing the number of replicates specified in Section 10.8.

(2) If replicate analyses are not indicated, the individual results from each measurement system will be
compared directly. Calculate the mean of the individual results from all of the measurement systems.

(3) If replicate analyses are performed, calculate the mean value from the replicate analyses of the
specimen separately for each measurement system. Calculate the grand mean of the measurement
system mean values.

(4) Calculate the range as the difference between the most disparate measurement system means divided
by the mean of the measurement system means:

When replicate measurements are made: [(highest measurement system mean − lowest measurement
system mean)/grand mean of all measurement system means] × 100%.

When replicate measurements are not made: [(highest measurement system value − lowest
measurement system value)/mean of all measurement system values] × 100%.

(5) Compare the calculated range with the critical difference determined in Section 10.7.

(6) If the calculated range is less than or equal to the critical difference, conclude that all measurement
systems perform comparably at the analyte level evaluated.

(7) If the calculated range is greater than the critical difference, conclude that the measurement systems
with the most disparate mean values (or individual values) perform significantly different from each
other. Identify the two measurement systems with the next most disparate means or individual values
(one of the previous means or individual values from the pair just tested should be included in the
new pair to be tested) and perform the comparison test. Continue the comparisons until a
nonsignificant difference is obtained, or until all measurement systems are determined to perform
significantly different from each other.

(8) If a measurement system is known to produce results with an expected bias vs another measurement
system, calculate the range as noted below and compare the resulting value with the critical difference
to determine if the known and expected bias is greater than expected (see Example 3 in Appendix A):

When replicate measurements are made: [(highest measurement system mean − lowest measurement
system mean)/mean of all measurement system means] × 100% − expected difference (in %).

When replicate measurements are not made: [(highest measurement system value − lowest measurement
system value)/mean of all measurement system values] × 100% − expected difference (in %).
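
The calculations in steps (3) through (8) can be sketched in Python as follows; the replicate means are
hypothetical, and the critical difference would in practice be the value selected in Section 10.7.

```python
def percent_range(system_means, expected_diff_pct=0.0):
    """Percent range per Section 10.9, step (4): (max mean - min mean) divided
    by the grand mean, x 100%, less any known, expected bias (step 8)."""
    grand_mean = sum(system_means) / len(system_means)
    spread = (max(system_means) - min(system_means)) / grand_mean * 100.0
    return spread - expected_diff_pct

# Hypothetical replicate means for four measurement systems.
means = [98.6, 100.2, 101.1, 99.4]
critical_difference = 3.0  # % limit selected in Section 10.7 (hypothetical)

observed = percent_range(means)
print(f"observed range = {observed:.2f}%")  # 2.50%
print("comparable" if observed <= critical_difference
      else "most disparate systems differ significantly")
```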

10.10 Evaluate the Clinical Relevance of the Comparison Results


The medical director must assess the medical significance of any statistically significant comparison
differences (see Section 8 and Appendix C for a discussion of Type I and Type II errors).

10.11 Troubleshooting Noncomparability

If a measurement system is determined to be noncomparable, troubleshoot any analytical problems and
repeat the comparison.


References

1. CLSI/NCCLS. Method Comparison and Bias Estimation Using Patient Samples; Approved Guideline—Second Edition. CLSI/NCCLS document EP9-A2. Wayne, PA: NCCLS; 2002.

2. CLSI. Statistical Quality Control for Quantitative Measurement Procedures: Principles and Definitions; Approved Guideline—Third Edition. CLSI document C24-A3. Wayne, PA: Clinical and Laboratory Standards Institute; 2006.

3. CLSI. Metrological Traceability and Its Implementation; A Report. CLSI document X5-R. Wayne, PA: Clinical and Laboratory Standards Institute; 2006.

4. ISO. In vitro diagnostic medical devices – Measurement of quantities in biological samples – Metrological traceability of values assigned to calibrators and control materials. ISO 17511. Geneva: International Organization for Standardization; 2003.

5. Guideline for Isolation Precautions: Preventing Transmission of Infectious Agents in Healthcare Settings 2007. Available at: http://www.cdc.gov/ncidod/dhqp/gl_isolation.html. Accessed 7 March 2008.

6. CLSI. Protection of Laboratory Workers From Occupationally Acquired Infections; Approved Guideline—Third Edition. CLSI document M29-A3. Wayne, PA: Clinical and Laboratory Standards Institute; 2005.

7. ISO. International Vocabulary of Basic and General Terms in Metrology. Geneva: International Organization for Standardization; 1993.

8. ISO. Accuracy (trueness and precision) of measurement methods and results – Part 1: General principles and definitions. ISO 5725-1. Geneva: International Organization for Standardization; 1994.

9. ISO. Statistics – Vocabulary and symbols – Part 1: Probability and general statistical terms. ISO 3534-1. Geneva: International Organization for Standardization; 1993.

10. ISO. In vitro diagnostic medical devices – Measurement of quantities in samples of biological origin – Description of reference materials. ISO 15194. Geneva: International Organization for Standardization; 2002.

11. CLSI/NCCLS. Evaluation of Precision Performance of Quantitative Measurement Methods; Approved Guideline—Second Edition. CLSI/NCCLS document EP5-A2. Wayne, PA: NCCLS; 2004.

12. CLSI. User Verification of Performance for Precision and Trueness; Approved Guideline—Second Edition. CLSI document EP15-A2. Wayne, PA: Clinical and Laboratory Standards Institute; 2005.

13. ISO. Quality management systems – Fundamentals and vocabulary. ISO 9000. Geneva: International Organization for Standardization; 2000.

14. ISO. Medical laboratories – Requirements for safety. ISO 15190. Geneva: International Organization for Standardization; 2003.

15. ISO. Safety aspects – Guidelines for their inclusion in standards. ISO/IEC Guide 51. Geneva: International Organization for Standardization; 1999.

16. ISO. Medical laboratories – Particular requirements for quality and competence. ISO 15189. Geneva: International Organization for Standardization; 2003.

17. Linnet K. The exponentially weighted moving average (EWMA) rule compared with traditionally used quality control rules. Clin Chem Lab Med. 2006;44:396-399.

18. Miller WG. Specimen materials, target values and commutability for external quality assessment (proficiency testing) schemes. Clin Chim Acta. 2003;327:25-37.

19. Miller WG, Myers GL, Rej R. Why commutability matters. Clin Chem. 2006;52:553-554 (editorial).

20. CLSI/NCCLS. Preparation and Validation of Commutable Frozen Human Serum Pools as Secondary Reference Materials for Cholesterol Measurement Procedures; Approved Guideline. CLSI/NCCLS document C37-A. Wayne, PA: NCCLS; 1999.

21. Lawson NS, Williams TL, Long T. Matrix effects and accuracy assessment. Identifying matrix-sensitive methods from real-time proficiency testing data. Arch Pathol Lab Med. 1993;117:401-411.

22. Long T. Statistical power in the detection of matrix effects. Arch Pathol Lab Med. 1993;117:387-392.

23. Jennings TA. Effect of formulation on lyophilization, part 2. IVD Technology, March 1998. http://www.devicelink.com/ivdt/archive/97/03/007.html. Accessed 7 March 2008.

24. Kenny D, Fraser CG, Hyltoft Petersen P, Kallner A. Strategies to set global analytical quality specifications in laboratory medicine. Consensus agreement. Scand J Clin Lab Invest. 1999;59:585.

25. Larsen ML, Fraser CG, Petersen PH. A comparison of analytical goals for haemoglobin A1c assays derived using different strategies. Ann Clin Biochem. 1991;28:272-278.


26. Jenny RW. Analytical goals for determination of theophylline concentration in serum. Clin Chem. 1991;37:154-158.
27. Petersen PH, Hørder M. Ways of assessing quality goals for diagnostic tests in clinical situations. Arch Pathol Lab Med. 1988;112:435-443.
28. Hyltoft Petersen P, Hørder M. Influence of analytical quality on test results. Scand J Clin Lab Invest. 1992;52 Suppl 208:65-87.
29. de Verdier CH, Groth T, Hyltoft Petersen P, eds. Medical need for quality specifications in clinical laboratories. Upsala J Med Sci. 1993;98:189-491.
30. Barnett RN. Medical significance of laboratory results. Am J Clin Pathol. 1968;50:671-676.
31. Elion-Gerritzen WE. Analytical precision in clinical chemistry and medical decisions. Am J Clin Pathol. 1980;73:183-195.
32. Skendzel LP, Barnett RN, Platt R. Medically useful criteria for analytic performance of laboratory tests. Am J Clin Pathol. 1985;83:200-205.
33. Thue G, Sandberg S, Fugelli P. Clinical assessment of haemoglobin values by general practitioners related to analytical and biological variation. Scand J Clin Lab Invest. 1991;51:453-459.
34. Petersen PH, Larsen ML, Hørder M. Prerequisites for the maintenance of a certain state of health by biochemical monitoring. In: Harris EK, Yasada T, eds. Maintaining a Healthy State Within the Individual. Amsterdam: Elsevier; 1987:147-158.
35. Petersen PH, Fraser CG, Westgard JO, Larsen ML. Analytical goal-setting for monitoring patients when two analytical methods are used. Clin Chem. 1992;38:2256-2260.
36. Fraser CG. Biological Variation: From Principles to Practice. Washington, DC: AACC Press; 2001.
37. Westgard QC. Desirable Specifications for Total Error, Imprecision, and Bias, Derived From Biologic Variation. http://www.westgard.com/biodatabase1.htm. Accessed 7 March 2008.
38. Cotlove E, Harris EK, Williams GZ. Components of variation in long-term studies of serum constituents in normal subjects. III. Physiological and medical implications. Clin Chem. 1970;16:1028-1032.
39. Fraser CG, Hyltoft Petersen P. Desirable standards for laboratory tests if they are to fulfill medical needs. Clin Chem. 1993;39:1447-1455.
40. Fraser CG, Hyltoft Petersen P, Ricos C, Haeckel R. Proposed quality specifications for the imprecision and inaccuracy of analytical systems for clinical chemistry. Eur J Clin Chem Clin Biochem. 1992;30:311-317.
41. Klee GG. Tolerance limits for short-term analytical bias and analytical imprecision derived from clinical assay specificity. Clin Chem. 1993;39:1514-1518.
42. Stöckl D, Baadenhuijsen H, Fraser CG, Libeer JC, Hyltoft Petersen P, Ricós C. Desirable routine analytical goals for quantities assayed in serum (Discussion paper from the members of the EQA Working Group A on analytical goals in laboratory medicine). Eur J Clin Chem Clin Biochem. 1995;33:157-169.
43. Ricos C, Iglesias N, Garcia-Lario JV, et al. Within-subject biological variation in disease: collated data and clinical consequences. Ann Clin Biochem. 2007;44:343-352.
44. Ramseyer GC, Tcheng T. The robustness of the studentized range statistic to violations of the normality and homogeneity of variance assumptions. Am Educ Res J. 1973;10:235-240.
45. Ross JW, Miller WG, Myers GL, Praestgaard J. The accuracy of laboratory measurements in clinical chemistry: a study of eleven routine analytes in the College of American Pathologists Chemistry Survey with fresh frozen serum, definitive methods and reference methods. Arch Pathol Lab Med. 1998;122:587-608.
46. CLSI/NCCLS. Point-of-Care Blood Glucose Testing in Acute and Chronic Care Facilities; Approved Guideline—Second Edition. CLSI/NCCLS document C30-A2. Wayne, PA: NCCLS; 2002.
47. Stöckl D, Dewitte K, Fierens C, Thienpont LM. Evaluating clinical accuracy of systems for self-monitoring of blood glucose by error grid analysis. Diabetes Care. 2000;23:1711.
48. Parkes JL, Slatin SL, Pardo S, Ginsberg BH. A new consensus error grid to evaluate the clinical significance of inaccuracies in the measurement of blood glucose. Diabetes Care. 2000;23:1143-1148.


Appendix A. Worked Examples

Example 1

A laboratory director wanted to evaluate the comparability of AST measurements between two analyzers.

• Precision Estimates

Precision estimates were derived from long-term QC statistics, as follows:

Analyzer      Control 1: Mean, U/L   CV, %     Control 2: Mean, U/L   CV, %
Analyzer A    40.37                  1.22%     208.89                 1.90%
Analyzer B    41.52                  1.74%     211.74                 2.00%
              Mean = 40.95; Pooled CV = 1.50%*  Mean = 210.32; Pooled CV = 1.95%†

* Control 1 pooled CV = ([(1.22)^2 + (1.74)^2]/2)^0.5 = √2.26 = 1.50%
† Control 2 pooled CV = ([(1.90)^2 + (2.00)^2]/2)^0.5 = √3.81 = 1.95%

Since the respective CVs of the two analyzers differed by less than a factor of 2 (Control 1: 1.22% vs
1.74%; Control 2: 1.90% vs 2.00%), the range test protocol was used.
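
The pooled CV and the factor-of-2 screening check can be written out as follows (an illustrative Python sketch; the function names are hypothetical and not part of this guideline):

    import math

    def pooled_cv(cvs_pct):
        # Root mean square of the per-analyzer CVs, as in the footnotes above.
        return math.sqrt(sum(cv ** 2 for cv in cvs_pct) / len(cvs_pct))

    def within_factor_of_two(cvs_pct):
        # The range test protocol is appropriate when the analyzer CVs
        # differ by less than a factor of 2.
        return max(cvs_pct) < 2 * min(cvs_pct)

    print(round(pooled_cv([1.22, 1.74]), 2))    # 1.5  (Control 1)
    print(round(pooled_cv([1.90, 2.00]), 2))    # 1.95 (Control 2)
    print(within_factor_of_two([1.22, 1.74]))   # True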

• Sample Selection

Sample ranges were calculated as ± 20% of the respective pooled means of the controls, or 32.8 to 49.1
U/L and 168.3 to 252.4 U/L. Samples with initial values of 37.8 (sample 1) and 245.1 U/L (sample 2) on
analyzer A were selected for the comparison test.

• Acceptability Criteria

To determine acceptability criteria following the hierarchy in Section 7, the laboratory director noted that
recommendations based on clinical outcome studies do not exist and key clinicians in the institution did
not have specific recommendations, but within-subject, biological variability has been estimated to be
11.9%. Using the desirable bias goal, the allowable difference, or critical difference, was calculated as
approximately one-third of the within-subject variability, or 11.9%/3 = 3.97%.

• Number of Replicates

For both sample 1 (pooled CV = 1.50%) and sample 2 (pooled CV = 1.95%), the rows and columns of the critical differences table corresponding to a comparison of two methods with a pooled analytical CV between 1.0% and 2.0% were located.

From Appendix B. Critical Differences (%) for the Range Test

                             Analytical CV (%)
Methods    Replicates        1           2           3
2          2                 4.298953    8.597906    12.89686
2          3                 2.266968    4.533936    6.800903
2          4                 1.730228    3.460456    5.190683
2          5                 1.458445    2.91689     4.375335

The selected critical difference of 3.97% lies between the tabulated critical difference values for three
replicates (2.266968% < 3.97% < 4.533936%), indicating that running three replicate comparisons of the
two samples would provide a power of 80% for detecting a difference between the two analyzers.

• Comparison Data

The samples were analyzed in triplicate with the following results:

Replicate    Sample 1: Analyzer A    Analyzer B    Sample 2: Analyzer A    Analyzer B
1            37.8                    38.6          245.1                   240.2
2            38.5                    38.4          238.1                   243.7
3            38.2                    40.4          242.9                   246.0
Mean         38.17                   39.13         242.03                  243.30

Statistic              Sample 1                           Sample 2
Grand Mean             (38.17 + 39.13)/2 = 38.65          (242.03 + 243.30)/2 = 242.67
Range                  39.13 − 38.17 = 0.96               243.30 − 242.03 = 1.27
Range, %               (0.96/38.65) • 100 = 2.48%         (1.27/242.67) • 100 = 0.52%
Critical Difference    3.97%                              3.97%
Status                 Pass                               Pass
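
The range test computation for this example can be reproduced with a short script (illustrative Python, not part of this guideline; the function name is hypothetical):

    def range_test(replicates_by_system, critical_diff_pct):
        # replicates_by_system: one list of replicate results per analyzer.
        means = [sum(r) / len(r) for r in replicates_by_system]
        grand_mean = sum(means) / len(means)
        range_pct = (max(means) - min(means)) / grand_mean * 100.0
        return range_pct, range_pct <= critical_diff_pct

    # Sample 1 above:
    rng, ok = range_test([[37.8, 38.5, 38.2], [38.6, 38.4, 40.4]], 3.97)
    print(round(rng, 2), ok)   # 2.5 True (the table above rounds the means
                               # first and therefore reports 2.48%)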

• Conclusion

Because the observed ranges were less than the critical difference for both analyte concentrations, the
laboratory director concluded that the comparability of the methods is acceptable.


Example 2

A laboratory director wanted to evaluate the comparability of white blood cell (WBC) measurements
among three analyzers.

• Precision Estimates
Precision estimates were derived from long-term QC statistics, as follows:

Analyzer      Control 1: Mean, × 10³/μL   CV, %     Control 2: Mean, × 10³/μL   CV, %
Analyzer A    3.53                        2.29%     20.34                       1.62%
Analyzer B    2.61                        2.40%     17.78                       2.04%
Analyzer C    2.88                        3.46%     18.69                       2.33%
              Mean = 3.01; Pooled CV = 2.77%*        Mean = 18.94; Pooled CV = 2.02%†

NOTE: Analyzer A is the laboratory’s main analyzer. Due to differences in capabilities, analyzers B and C require a different set of controls than analyzer A, so the control means are not directly comparable.

* Control 1 pooled CV = ([(2.29)^2 + (2.40)^2 + (3.46)^2]/3)^0.5 = √7.66 = 2.77%
† Control 2 pooled CV = ([(1.62)^2 + (2.04)^2 + (2.33)^2]/3)^0.5 = √4.07 = 2.02%
Since the respective CVs of the three analyzers differed by less than a factor of 2 (Control 1: 2.29% vs
2.40% vs 3.46%; Control 2: 1.62% vs 2.04% vs 2.33%), the range test protocol was used.

• Sample Selection

Sample ranges were calculated as ±20% of the respective pooled means of the controls, or 2.4 to 3.6 × 10³/μL and 15.1 to 22.7 × 10³/μL. Samples with initial values of 3.4 (sample 1) and 20.3 × 10³/μL (sample 2) on analyzer A were selected for the comparison test.

• Acceptability Criteria
In the absence of acceptability criteria based on clinical outcomes or clinician consensus, the laboratory
director identified a criterion based on biological variability that specifies an allowable difference of
3.63% (CVi = 10.9%). At a WBC count of 3.0, this criterion would only tolerate a range of ±0.1 units. At
a WBC count of 20, the tolerable range would be ±0.7 units. This was considered too stringent for routine
hematology testing. The laboratory director determined that the only practical level of acceptance criteria
available was based on goals set by regulatory authorities. In the United States, CLIA regulations set a PT
performance goal of 15% of the target value, while the German Medical Association maximal permissible
deviation is 13%. The laboratory director selected 15% as the critical difference.

• Number of Replicates

Sample 1: From the critical differences table, it was determined that for a three-method comparison with a pooled CV of 2.77%, the critical difference of 15% falls between the tabulated value for two replicates at a CV of 3% (12.53629%) and the tabulated value for one replicate at a CV of 2% (16.66157%). Therefore, the laboratory director

had the option to perform singlet or duplicate testing. Selection of a singlet comparison would result in a
greater probability of falsely accepting method comparability (ie, less power to detect a difference).
Selection of a two-replicate comparison would result in a higher probability of falsely rejecting method
comparability, but a lower probability of falsely accepting method comparability (ie, improved power to
detect a difference). The laboratory director elected to conduct the comparison with singlet testing, since
the selected critical difference of 15% was closer to 16.66157%.

Sample 2: For the same reason as for sample 1, singlet testing was performed.

From Appendix B. Critical Differences (%) for the Range Test

                             Pooled CV (%)
Methods    Replicates        1           2           3
3          1                 8.330783    16.66157    24.99235
3          2                 4.178763    8.357526    12.53629
3          3                 2.505236    5.010471    7.515707
3          4                 1.974246    3.948492    5.922738
3          5                 1.687305    3.37461     5.061915

• Comparison Data

The samples were analyzed with the following results:

Replicate    Sample 1: Analyzer A   Analyzer B   Analyzer C    Sample 2: Analyzer A   Analyzer B   Analyzer C
1            3.4                    3.0          3.2           20.3                   19.5         24.4

Statistic              Sample 1                         Sample 2
Grand Mean             (3.4 + 3.0 + 3.2)/3 = 3.20       (20.3 + 19.5 + 24.4)/3 = 21.40
Range                  3.4 − 3.0 = 0.40                 24.4 − 19.5 = 4.90
Range, %               (0.40/3.20) • 100 = 12.50%       (4.90/21.40) • 100 = 22.90%
Critical Difference    15.00%                           15.00%
Status                 Pass                             Fail

Because the observed range with sample 2 exceeded the critical difference, the comparison was
recalculated after excluding the most extreme value (from analyzer C).

A-B Comparison

Statistic              Sample 2
Grand Mean             (20.3 + 19.5)/2 = 19.90
Range                  20.3 − 19.5 = 0.80
Range, %               (0.80/19.90) • 100 = 4.02%
Critical Difference    15.00%
Status                 Pass
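
The exclude-and-retest step used above can be sketched as follows (illustrative Python, not part of this guideline); it identifies the measurement system whose result lies farthest from the mean of all systems and repeats the range calculation without it:

    values = [20.3, 19.5, 24.4]                 # sample 2: analyzers A, B, C
    mean_all = sum(values) / len(values)
    extreme = max(range(len(values)), key=lambda i: abs(values[i] - mean_all))
    retest = [v for i, v in enumerate(values) if i != extreme]
    range_pct = (max(retest) - min(retest)) / (sum(retest) / len(retest)) * 100
    print(extreme, round(range_pct, 2))          # 2 4.02 (analyzer C excluded)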

• Conclusion

The comparability of all three methods was acceptable with sample 1. With sample 2, the comparability
of all three methods was unacceptable; however, when results from method C were excluded, the
comparability of methods A and B was acceptable. The laboratory director should institute further
evaluation or corrective actions with method C.

Example 3

A laboratory director wanted to evaluate the comparability of prothrombin time (PT) results between a
main analyzer and backup analyzer. Previous data analyses showed that at an approximate PT value of 13
seconds, the backup analyzer produces results that are about 10% longer than the main analyzer; and at an
approximate PT value of 25 seconds, the backup analyzer’s results are about 15% longer. Both analyzers
have an imprecision of 3% at both levels.

• Sample Selection

Sample ranges were calculated as 13 seconds ± 20%, or 10.4 to 15.6 seconds, and 25 seconds ± 20%, or
20.0 to 30.0 seconds. Samples with initial values of 14.2 (sample 1) and 27.1 seconds (sample 2) on the
main analyzer were selected for the comparison test.

• Acceptability Criteria

The laboratory director noted that recommendations based on clinical outcome studies do not exist. The
laboratory director consulted clinicians with expertise in the interpretation of coagulation tests, and their
consensus judgment was that a critical difference of 15% was appropriate.

• Number of Replicates

Samples 1 and 2: For a comparison of two methods, each with a CV of 3%, the acceptability criterion of
15% is greater than the critical difference table entry of 12.89686%; therefore, the laboratory director
determined that singlet testing was adequate for the comparisons.


From Appendix B. Critical Differences (%) for the Range Test

                             Analytical CV (%)
Methods    Replicates        1           2           3
2          2                 4.298953    8.597906    12.89686
2          3                 2.266968    4.533936    6.800903
2          4                 1.730228    3.460456    5.190683
2          5                 1.458445    2.91689     4.375335

• Comparison Data

The samples were analyzed with the following results:

Replicate    Sample 1: Analyzer A    Analyzer B    Sample 2: Analyzer A    Analyzer B
1            14.2                    15.1          27.1                    32.3

Statistic                  Sample 1                          Sample 2
Grand Mean                 (14.20 + 15.10)/2 = 14.65         (27.10 + 32.30)/2 = 29.70
Range                      15.10 − 14.20 = 0.90              32.30 − 27.10 = 5.20
Range, %                   (0.90/14.65) • 100 = 6.14%        (5.20/29.70) • 100 = 17.51%
Expected Difference        10.0%                             15.0%
Absolute Adjusted Range*   |6.14 − 10.0| = 3.86%             |17.51 − 15.0| = 2.51%
Critical Difference        15.0%                             15.0%
Status                     Pass                              Pass

* Note that the ranges were adjusted by subtracting the expected differences and taking the absolute value of the results.
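
When a known, documented difference exists between systems, the adjustment shown above is a one-line calculation (illustrative Python, not part of this guideline; the function name is hypothetical):

    def adjusted_range_pct(observed_range_pct, expected_diff_pct):
        # Subtract the expected difference, then take the absolute value
        # before comparing with the critical difference.
        return abs(observed_range_pct - expected_diff_pct)

    print(round(adjusted_range_pct(6.14, 10.0), 2))    # 3.86 (sample 1)
    print(round(adjusted_range_pct(17.51, 15.0), 2))   # 2.51 (sample 2)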

• Conclusion

Because the absolute value of the adjusted range was less than the critical difference, the comparability of
the methods was considered acceptable for both samples, given the known underlying differences.


Example 4

A laboratory director wanted to compare the measurement of mean cellular volume (MCV) by two
hematology analyzers of the same model.

• Precision Estimates

Precision estimates were derived from long-term QC statistics, as follows:

Analyzer      Control 1: Mean, fL   CV, %     Control 2: Mean, fL   CV, %
Analyzer A    69.96                 1.10%     88.50                 0.90%
Analyzer B    70.41                 1.00%     88.00                 1.00%
              Mean = 70.19; Pooled CV = 1.05%*  Mean = 88.25; Pooled CV = 0.95%†

* Control 1 pooled CV = ([(1.10)^2 + (1.00)^2]/2)^0.5 = √1.11 = 1.05%
† Control 2 pooled CV = ([(0.90)^2 + (1.00)^2]/2)^0.5 = √0.91 = 0.95%

Since the respective CVs of the two analyzers differed by less than a factor of 2 (Control 1: 1.10% vs 1.00%; Control 2: 0.90% vs 1.00%), the range test protocol was used.

• Sample Selection

Since the precision profiles of both instruments in the range of the two controls are essentially constant,
and use of the ±20% criterion would have produced overlapping sample ranges, samples as close as
possible to the mean control values were selected for the comparison tests. Samples with initial values of
70.4 fL (sample 1) and 87.5 fL (sample 2) on analyzer A were selected for the comparison.

• Acceptability Criteria

Following the hierarchy in Section 7, the laboratory director was unable to identify any acceptance
criteria based on clinical outcomes, and key clinicians in the institution did not have specific
recommendations. Within-subject, biological variability has been estimated to be 1.7%, which would
specify a critical difference of 1.7/3 = 0.6%. This criterion was determined to be impractical, since the
instrument CVs were greater than the allowable percent difference. Based on an internal study, the laboratory determined that the MCV values of an individual, nontransfused, hospitalized patient rarely varied by more than 3% during any hospital stay. Therefore, the laboratory director selected a
critical limit of 3.0% for the comparison test.

• Number of Replicates

A pooled CV of 1% was used for the comparisons, since the pooled CVs of the two controls were
approximately 1%. From the critical differences table, a critical cutoff of 3.0% falls between two and
three replicates for a comparison of two methods with a pooled CV of 1% (2.266968 < 3.0 < 4.298953).
Three replicates were chosen for the comparisons, since the critical limit of 3% was closer to the table
value of 2.266968; and the laboratory director desired greater power to detect a difference, since this was
a periodic comparison monitoring event.


From Appendix B. Critical Differences (%) for the Range Test

                             Analytical CV (%)
Methods    Replicates        1           2           3
2          2                 4.298953    8.597906    12.89686
2          3                 2.266968    4.533936    6.800903
2          4                 1.730228    3.460456    5.190683
2          5                 1.458445    2.91689     4.375335

• Comparison Data

The samples were analyzed with the following results:

Replicate    Sample 1: Analyzer A    Analyzer B    Sample 2: Analyzer A    Analyzer B
1            70.4                    70.5          87.5                    88.2
2            70.5                    70.6          87.4                    88.4
3            70.5                    70.9          87.4                    88.5
Mean         70.47                   70.67         87.43                   88.37

Statistic              Sample 1                         Sample 2
Grand Mean             (70.47 + 70.67)/2 = 70.57        (87.43 + 88.37)/2 = 87.90
Range                  70.67 − 70.47 = 0.20             88.37 − 87.43 = 0.94
Range, %               (0.20/70.57) • 100 = 0.28%       (0.94/87.90) • 100 = 1.07%
Critical Difference    3.00%                            3.00%
Status                 Pass                             Pass

• Conclusion

Because the ranges were less than the critical differences, the comparability of methods was considered
acceptable.

Appendix B. Table of Critical Differences (%) for the Range Test*
                    Analytical CV (%)
Methods Replicates 1 2 3 4 5 6 7 8 9 10 15 20 25

2 2 4.298953 8.597906 12.89686 17.19581 21.49476 25.79372 30.09267 34.39162 38.69058 42.98953 64.48429 85.97906 107.4738
2 3 2.266968 4.533936 6.800903 9.067871 11.33484 13.60181 15.86877 18.13574 20.40271 22.66968 34.00452 45.33936 56.67419
2 4 1.730228 3.460456 5.190683 6.920911 8.651139 10.38137 12.11159 13.84182 15.57205 17.30228 25.95342 34.60456 43.2557
2 5 1.458445 2.91689 4.375335 5.83378 7.292225 8.75067 10.20912 11.66756 13.12601 14.58445 21.87668 29.1689 36.46113

3 1 8.330783 16.66157 24.99235 33.32313 41.65391 49.9847 58.31548 66.64626 74.97704 83.30783 124.9617 166.6157 208.2696
3 2 4.178763 8.357526 12.53629 16.71505 20.89381 25.07258 29.25134 33.4301 37.60887 41.78763 62.68144 83.57526 104.4691
3 3 2.505236 5.010471 7.515707 10.02094 12.52618 15.03141 17.53665 20.04188 22.54712 25.05236 37.57853 50.10471 62.63089
3 4 1.974246 3.948492 5.922738 7.896984 9.871231 11.84548 13.81972 15.79397 17.76821 19.74246 29.61369 39.48492 49.35615
3 5 1.687305 3.37461 5.061915 6.749221 8.436526 10.12383 11.81114 13.49844 15.18575 16.87305 25.30958 33.7461 42.18263

4 1 6.824526 13.64905 20.47358 27.29811 34.12263 40.94716 47.77169 54.59621 61.42074 68.24526 102.3679 136.4905 170.6132
4 2 4.070855 8.14171 12.21257 16.28342 20.35428 24.42513 28.49599 32.56684 36.6377 40.70855 61.06283 81.4171 101.7714
4 3 2.614709 5.229419 7.844128 10.45884 13.07355 15.68826 18.30297 20.91768 23.53239 26.14709 39.22064 52.29419 65.36774
4 4 2.09933 4.19866 6.29799 8.39732 10.49665 12.59598 14.69531 16.79464 18.89397 20.9933 31.48995 41.9866 52.48325
4 5 1.809468 3.618936 5.428403 7.237871 9.047339 10.85681 12.66627 14.47574 16.28521 18.09468 27.14202 36.18936 45.2367

5 1 6.287027 12.57405 18.86108 25.14811 31.43513 37.72216 44.00919 50.29622 56.58324 62.87027 94.3054 125.7405 157.1757
5 2 4.011505 8.02301 12.03451 16.04602 20.05752 24.06903 28.08053 32.09204 36.10354 40.11505 60.17257 80.2301 100.2876
5 3 2.687157 5.374315 8.061472 10.74863 13.43579 16.12294 18.8101 21.49726 24.18442 26.87157 40.30736 53.74315 67.17893
5 4 2.183492 4.366985 6.550477 8.733969 10.91746 13.10095 15.28445 17.46794 19.65143 21.83492 32.75239 43.66985 54.58731
5 5 1.892544 3.785088 5.677632 7.570175 9.462719 11.35526 13.24781 15.14035 17.03289 18.92544 28.38816 37.85088 47.3136

6 1 6.032903 12.06581 18.09871 24.13161 30.16451 36.19742 42.23032 48.26322 54.29612 60.32903 90.49354 120.6581 150.8226
6 2 3.979847 7.959693 11.93954 15.91939 19.89923 23.87908 27.85893 31.83877 35.81862 39.79847 59.6977 79.59693 99.49617
6 3 2.742547 5.485095 8.227642 10.97019 13.71274 16.45528 19.19783 21.94038 24.68293 27.42547 41.13821 54.85095 68.56369
6 4 2.24721 4.49442 6.74163 8.98884 11.23605 13.48326 15.73047 17.97768 20.22489 22.4721 33.70815 44.9442 56.18025
6 5 1.955509 3.911018 5.866527 7.822036 9.777545 11.73305 13.68856 15.64407 17.59958 19.55509 29.33263 39.11018 48.88772

7 1 5.895309 11.79062 17.68593 23.58124 29.47655 35.37186 41.26717 47.16247 53.05778 58.95309 88.42964 117.9062 147.3827
7 2 3.963844 7.927687 11.89153 15.85537 19.81922 23.78306 27.7469 31.71075 35.67459 39.63844 59.45765 79.27687 99.09609
7 3 2.787998 5.575996 8.363994 11.15199 13.93999 16.72799 19.51599 22.30399 25.09198 27.87998 41.81997 55.75996 69.69995
7 4 2.298651 4.597302 6.895953 9.194603 11.49325 13.79191 16.09056 18.38921 20.68786 22.98651 34.47976 45.97302 57.46627
7 5 2.006231 4.012462 6.018694 8.024925 10.03116 12.03739 14.04362 16.04985 18.05608 20.06231 30.09347 40.12462 50.15578

8 1 5.815314 11.63063 17.44594 23.26126 29.07657 34.89188 40.7072 46.52251 52.33783 58.15314 87.22971 116.3063 145.3828
8 2 3.957097 7.914194 11.87129 15.82839 19.78549 23.74258 27.69968 31.65678 35.61387 39.57097 59.35646 79.14194 98.92743
8 3 2.826834 5.653668 8.480503 11.30734 14.13417 16.96101 19.78784 22.61467 25.44151 28.26834 42.40251 56.53668 70.67086
8 4 2.341876 4.683752 7.025628 9.367504 11.70938 14.05126 16.39313 18.73501 21.07688 23.41876 35.12814 46.83752 58.5469
8 5 2.048712 4.097424 6.146137 8.194849 10.24356 12.29227 14.34099 16.3897 18.43841 20.48712 30.73068 40.97424 51.2178

9 1 5.767266 11.53453 17.3018 23.06906 28.83633 34.6036 40.37086 46.13813 51.90539 57.67266 86.50899 115.3453 144.1817
9 2 3.956059 7.912118 11.86818 15.82424 19.78029 23.73635 27.69241 31.64847 35.60453 39.56059 59.34088 79.12118 98.90147
9 3 2.860891 5.721783 8.582674 11.44357 14.30446 17.16535 20.02624 22.88713 25.74802 28.60891 42.91337 57.21783 71.52228
9 4 2.3792 4.7584 7.137601 9.516801 11.896 14.2752 16.6544 19.0336 21.4128 23.792 35.688 47.584 59.48
9 5 2.085263 4.170526 6.255788 8.341051 10.42631 12.51158 14.59684 16.6821 18.76736 20.85263 31.27894 41.70526 52.13157

10 1 5.738386 11.47677 17.21516 22.95354 28.69193 34.43031 40.1687 45.90709 51.64547 57.38386 86.07579 114.7677 143.4596
10 2 3.958657 7.917314 11.87597 15.83463 19.79329 23.75194 27.7106 31.66926 35.62791 39.58657 59.37986 79.17314 98.96643
10 3 2.891302 5.782605 8.673907 11.56521 14.45651 17.34781 20.23912 23.13042 26.02172 28.91302 43.36954 57.82605 72.28256
10 4 2.412071 4.824141 7.236212 9.648283 12.06035 14.47242 16.88449 19.29657 21.70864 24.12071 36.18106 48.24141 60.30177
10 5 2.117339 4.234677 6.352016 8.469354 10.58669 12.70403 14.82137 16.93871 19.05605 21.17339 31.76008 42.34677 52.93346

* Instructions for using the table of critical differences to determine the number of replicate analyses to perform:

Start with the rows in the table that correspond to the number of measurement systems that will be compared. If the pooled CV calculated in Section 10.3 is a whole number, find the column for that CV value; then, locate the tabular value in that column that is closest to the critical difference set in Section 10.7. Identify the number of replicates that corresponds to that tabular value. If the pooled CV calculated in Section 10.3 is not a whole number (eg, 2.5), cross-reference the rows for the number of measurement systems to be compared with the columns that bracket the pooled CV (eg, columns with headings of 1.0 and 2.0 from Example 1). Next, identify the row that corresponds to the number of replicate analyses that produces critical differences that bracket the critical difference from Section 10.7 (eg, three replicates on two instruments for a pooled CV between 1.0 and 2.0 produces critical differences of 2.266968% and 4.533936%, which bracket the calculated critical difference of 3.97% from Example 1). Perform the number of replicate analyses indicated by that row. NOTE: The critical difference values in the table have been calculated assuming an alpha error (Type I error) of 0.05 and a power of 80% (ie, Type II error of 0.2).
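
One way to mechanize this lookup (an illustrative Python sketch, not part of this guideline): because each tabulated value equals the CV = 1% entry multiplied by the CV, only the 1% column need be stored, and the smallest number of replicates whose tabulated critical difference at the pooled CV does not exceed the chosen critical difference can be selected directly. The guideline also permits choosing the nearer bracketing row, as in Example 2 of Appendix A, so this strict rule is conservative.

    # Tabulated critical differences at CV = 1%, keyed by (methods, replicates);
    # only the rows needed for the worked examples are shown here.
    FACTORS = {
        (2, 2): 4.298953, (2, 3): 2.266968, (2, 4): 1.730228, (2, 5): 1.458445,
        (3, 1): 8.330783, (3, 2): 4.178763, (3, 3): 2.505236,
    }

    def replicates_needed(methods, pooled_cv_pct, critical_diff_pct):
        for reps in range(1, 6):
            factor = FACTORS.get((methods, reps))
            if factor is not None and factor * pooled_cv_pct <= critical_diff_pct:
                return reps
        return None   # criterion is tighter than the tabulated options

    print(replicates_needed(2, 1.5, 3.97))   # 3, as in Example 1 of Appendix A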


Appendix C. Statistical Concepts

C1. Hypothesis Testing

Hypothesis testing is a statistical tool for using data to make inferences or draw conclusions. For example,
in this document, procedures are described for using hypothesis testing to draw conclusions about the
comparability of two or more laboratory methods. For purposes of this discussion, methods are
instruments or assay systems that may or may not be of the same make and model, or may or may not be
in the same laboratory. In general terms, a hypothesis test has three components: 1) a statistic with a
known, estimated, or assumed probability distribution; 2) a hypothesis about a population or situation
represented by the statistic; and 3) a critical value, or decision limit, against which the statistic is
compared to make an inference about the validity of the hypothesis.

The statistic in a hypothesis test is a number calculated from data obtained through observations,
experiments, surveys, etc. For method comparison hypothesis tests, data are generated from analysis of
specimens by two or more laboratory methods. The statistics of interest are related to ranges, means, and
variances. Under sets of basic assumptions, the probability distributions of the statistics are well
characterized.

A hypothesis is a simple statement about a situation of interest. For method comparisons, the hypotheses
are statements such as the means of the underlying populations of the data sets are equal; the difference
between the population means is zero, etc. The hypothesis to be tested is often referred to as the null
hypothesis. Its converse is called the alternative hypothesis.

The critical value is a number (or pair of numbers) that defines the limit (or limits) beyond which it
would be unlikely to obtain a value of the statistic if the null hypothesis is true. If the statistic is beyond
(more extreme than) the critical value, the null hypothesis is rejected, or inferred not to be true. If the
statistic does not exceed the critical value, the null hypothesis is not rejected, or inferred not to be false.
The critical value is derived from two factors: 1) the probability distribution of the statistic; and 2) the
significance level. The significance level is the probability of falsely (incorrectly) rejecting the null
hypothesis when it is actually true. Commonly used, though somewhat arbitrary, significance levels are
5% and 1%.

Figure C1 illustrates the concepts of hypothesis testing. The upper curve illustrates the probability
distribution of a statistic when the null hypothesis is true. The critical values are selected so the area of the
shaded regions equals the significance level. If the value of the statistic generated by a method
comparison study falls between the critical values, then the null hypothesis is not rejected. If the value of
the statistic falls beyond the critical values, then the null hypothesis is rejected. The area of the shaded
region equals the probability of falsely rejecting the null hypothesis. Note that in this figure, the
probability distribution of the statistic is represented as normal. Many statistics have non-normal
distributions, but the same concepts apply.


Figure C1. Illustration of Hypothesis Testing and Power Determination. See text for details.

C2. Power

Power is defined as the probability of correctly rejecting the null hypothesis when it is false. Power
depends on both the critical value (which is related to the significance level, as discussed above) and the
“degree of incorrectness” of the null hypothesis. Figure C1 illustrates both of these points. The upper
curve shows the probability distribution of the statistic under the null hypothesis. The lower curve shows
the probability distribution of the statistic under one instance of the alternative hypothesis. Note that the
distribution of the statistic under the alternative hypothesis is “shifted,” but the critical values do not
change. Consequently, the probability of rejecting the null hypothesis is greater under the alternative
hypothesis than under the null hypothesis. This probability is the power.

Power is affected by the critical values. If the significance level is changed so the critical values are
moved farther into the tails of the distribution or closer to the center, the probability of rejection will
decrease or increase, respectively, under both the null and alternative hypotheses. Thus, significance level
and power change in the same direction.

Power is also affected by the “degree of incorrectness” of the null hypothesis. In the illustration, if the
distribution of the statistic under the alternative hypothesis is shifted left, then power decreases; if it is
shifted right, power increases. In other words, the greater the difference between the mean of the actual
distribution (alternative hypothesis) and the assumed distribution (null hypothesis), the higher the
probability of correctly rejecting the null hypothesis, or the greater the power.

It should be noted that power is also influenced by the variance of the test methods and the sample size.
The probability of detecting a true difference between test methods is higher for more precise methods
than for less precise methods, and power increases with the number of measurements used in the
hypothesis test. Power curves are produced by calculating the power for a range of values for the
alternative hypothesis and plotting the pairs. Examples of power curves are shown below in Section C7.


C3. Type I and Type II Errors

In hypothesis testing, two types of errors may be made. Type I error, or alpha error, occurs if the null
hypothesis is rejected when it is actually true. Type II error, or beta error, occurs if the null hypothesis is
accepted when it is actually false. The probability of a Type I error is controlled by the selection of the
significance level, usually represented as “α.” The probability of a Type II error is usually represented by
“β” and is a function of four factors: the significance level (α), the degree of incorrectness of the null
hypothesis, the variance of the test methods, and the sample size. These same factors influence power, as
described above; in fact, the power of a hypothesis test is 1 − β.

A hypothesis test may be designed to provide desired probabilities for Type I and Type II errors or, in
other words, for desired significance levels and power. Usually, in hypothesis testing, the probability of
Type I error is known because a particular significance level, α, is selected; but it is important to estimate
β as well, to avoid running a hypothesis test with little probability of detecting important differences
between methods.

Consider the following example: Suppose a specimen with an analyte concentration of 2.5 mmol/L is
tested once by each of two methods with analytical coefficients of variation of 2%. Suppose one sets α =
0.05; that is, one wants the probability of Type I error to be only 0.05 (ie, 1 in 20 chance). Suppose also
that one wants the probability of β error to be only 0.10 when the true difference between the methods is
0.139. What should be the decision rule? Without going through all the calculations, suffice it to say that
if the decision rule is based on the alpha error rate, the null hypothesis will be rejected if the difference
between the two measurements is greater than 0.139, but if the decision rule is based on the beta error
rate, the null hypothesis will be rejected if the difference between the two measurements is greater than
0.187. In this case, the Type I and Type II goals cannot both be achieved with this hypothesis test, so the relative consequences of Type I and Type II errors must be considered when selecting the decision rule.
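
The alpha-based decision limit in this example can be verified directly (illustrative Python, not part of this guideline; the beta-based limit of 0.187 additionally depends on the assumed alternative-hypothesis distribution and is not recomputed here):

    import math

    conc, cv = 2.5, 0.02                  # mmol/L; 2% analytical CV per method
    sd_diff = conc * cv * math.sqrt(2)    # SD of the difference of two singlets
    print(round(1.96 * sd_diff, 3))       # 0.139 mmol/L, the alpha-based limit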

C4. Multiple Trials

A trial consists of performing any of the hypothesis testing procedures one time. The significance levels
and power calculations shown with the procedures and examples apply to a single trial. It is often
desirable to perform multiple trials. For example, one may wish to compare two methods on a daily,
weekly, or monthly basis. If the probability of rejecting the null hypothesis in a single trial is p, then the
probability of rejecting the null hypothesis at least once in n trials is given by the formula 1 − (1 − p)^n.

Figure C2 illustrates the relationship between the probability of rejection in a single trial and the
probability of at least one rejection in multiple trials.

One important consequence of this relationship is that even when the null hypothesis is true (ie, the
methods are equivalent), the likelihood of rejection becomes high if enough trials are performed. For
example, if the significance level is 5% when there is no difference between the methods, the probability
of falsely rejecting the null hypothesis in a single trial is only 5%, but the probability of rejecting the null
hypothesis at least once in 10 trials is 40%.

A second important consequence is that the power to detect a difference between methods increases with
multiple trials, for the same reason. If a procedure has a power of 20% for detecting a certain difference,
then the probability of detecting this difference (ie, correctly rejecting the null hypothesis) in one trial is
20%; but the probability of detection (ie, correctly rejecting the null hypothesis at least once) in 10 trials
is 89%. Another way to view this result is that if a trial has power of 20%, it will take 10 trials to obtain
roughly a 90% chance of detecting a difference between methods.
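
The arithmetic behind these statements (illustrative Python, not part of this guideline):

    def p_at_least_one_rejection(p_single, n_trials):
        # Probability of at least one rejection in n independent trials.
        return 1 - (1 - p_single) ** n_trials

    print(round(p_at_least_one_rejection(0.05, 10), 2))   # 0.4  (false rejection)
    print(round(p_at_least_one_rejection(0.20, 10), 2))   # 0.89 (detection)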


[Figure omitted in this copy: line chart of the probability of at least one rejection (y-axis, 0.0 to 1.0) versus the number of trials (x-axis, 0 to 50), with one curve for each single-trial rejection probability p = 0.01, 0.05, 0.10, 0.20, and 0.30.]

Figure C2. Relationship Between the Probability of Rejection in a Single Trial, p, and the Probability of at Least One Rejection in Multiple Trials. See text for details.

C5. Multiple Specimens

The test procedures in Section 10 provide for comparing a single split-specimen between two or more
laboratory methods. Obviously, when only a single specimen is used, the methods are compared at only
one concentration of analyte. Even if the methods are shown to be equivalent at that one concentration, it
does not necessarily follow that they are equivalent at other concentrations. Use of multiple specimens
covering a range of analyte concentrations provides a more informative comparison of methods.

In a multiple-specimen comparison, the selected test procedure is applied separately to the data from each
specimen. For example, if five specimens are being compared between two methods, a range test is
conducted with the data from the first specimen and the null hypothesis is accepted or rejected. Next, a
range test is conducted on the second specimen and the null hypothesis is accepted or rejected, and so on
through the fifth specimen. If the null hypothesis is accepted in all five range tests, then the methods are
considered comparable over the range of analyte concentrations in the five specimens. On the other hand,
if the null hypothesis is rejected in any of the range tests, then the methods are not considered
comparable.

Using multiple specimens is the same as conducting multiple trials, described above. The effective (or
overall) significance level, or probability of false rejection, increases. For example, if five specimens are
compared and the significance level for each hypothesis test is 5%, then the effective significance level is
the probability of at least one rejection in five trials, or 23%, as shown in Figure C2. It is evident that the
probability of a false rejection in several comparisons of multiple specimens could be very high.

The effective power also increases when multiple specimens are used, for the same reason that the
effective significance level increases. For example, if a test procedure has power of 60% for detecting a
designated difference between methods when a single specimen is compared, the power increases to 99%
when five specimens are compared.

C6. Number of Replicates

The range test may be performed with either singlet or replicate analyses. The effect of the number of
replicates is illustrated by the power curves in Section C7. The potential benefit of using replicate
analyses is an increase in power for detecting an absolute difference between methods. For example,
assume that the SD of the methods to be compared is 0.5 units and that one is interested in detecting a
difference of 1.5 units between methods. Further assume that the SD was estimated with 20 degrees of
freedom. If singlet analyses are performed, then 1.5 units are 3 SDs and the power for detecting a 3-SD
difference is about 50%, as shown in the first set of power curves in Section C7. If triplicate analyses are
performed, the SD becomes 0.5/√3 = 0.29. The absolute difference of 1.5 units becomes 5.2 SDs, and
the power increases to >90%. The greater the number of replicates, the smaller the absolute difference
that can be reliably detected by the range test.
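
The scaling in this example (illustrative Python, not part of this guideline):

    import math

    sd = 0.5                                    # SD of each method
    print(round(1.5 / sd, 1))                   # 3.0 SDs with singlet analyses
    print(round(1.5 / (sd / math.sqrt(3)), 1))  # 5.2 SDs with triplicate analyses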

C7. Power Curves

This appendix shows selected power curves to illustrate the influence of various parameters on the power
of the range test procedure described in this document.

[Figure omitted in this copy: power curves for the range test as a function of the number of replicates (2 to 5), assuming two methods with 3.0% CV; power (y-axis, 0.0 to 1.0) versus difference between means in SDs (x-axis, 0.0 to 6.0).]

Figure C3. Range Test: Influence of Number of Replicates

Power increases with increasing number of replicates.


[Figure omitted in this copy: power curves for the range test as a function of analytical CV (2%, 3%, 4%, and 5%), assuming two methods and three replicates; power (y-axis, 0.0 to 1.0) versus difference between means in SDs (x-axis, 0.0 to 6.0).]

Figure C4. Range Test: Influence of CV

Power decreases modestly as the CV of the analytical methods increases.

[Figure omitted in this copy: power curves for the range test as a function of the number of methods (2 to 5), assuming 3% CV and three replicates; power (y-axis, 0.0 to 1.0) versus difference between means in SDs (x-axis, 0.0 to 6.0).]

Figure C5. Range Test: Influence of Number of Methods

Power increases when a greater number of methods are compared. These curves were generated under the assumption that all methods give equivalent results except one, which is biased relative to the others.


C8. Comparative Power of Test Procedures

The power of the range test is compared for selected conditions.

                                      Difference Between Means, SDs
Test Conditions                    1.0    2.0    3.0    4.0    5.0    6.0
Range Test  Methods=2,  df=8       0.10   0.24   0.46   0.70   0.87   0.96
Range Test  Methods=2,  df=60      0.11   0.29   0.55   0.80   0.93   0.99
Range Test  Methods=4,  df=8       0.08   0.18   0.38   0.61   0.81   0.93
Range Test  Methods=4,  df=60      0.09   0.26   0.54   0.80   0.95   0.99
Range Test  Methods=10, df=8       0.07   0.13   0.26   0.46   0.66   0.82
Range Test  Methods=10, df=60      0.07   0.20   0.47   0.75   0.93   0.99

Significance level α = 0.05
Reps = number of replicates performed per method
Methods = number of methods
df = degrees of freedom of the variance estimate


Appendix D. Biological Variation

D1. Analytical Difference Between Two Results

Two results are said to be analytically different if the difference between them is more than could be accounted for by the combined analytical imprecision that may be present in both results. The total analytical imprecision (CV_AT) in two results obtained using the same method, with an imprecision of CV_A, is defined as:

CV_AT = [CV_A^2 + CV_A^2]^0.5

CV_AT = [2 CV_A^2]^0.5

CV_AT = [2]^0.5 • [CV_A^2]^0.5

CV_AT = 1.41 • CV_A        (1)

In order to be 95% confident that such a combined imprecision has been exceeded, one multiplies the combined analytical CV_AT by the z value corresponding to a 95% probability (1.96) to derive the critical analytical difference (CD_A) that indicates a difference between two results greater than the combined analytical imprecision:

CD_A = 1.96 • 1.41 • CV_A

CD_A = 2.77 • CV_A        (2)

D2. Biological Difference Between Two Results

Two results are said to be biologically different if the difference between them is more than could be accounted for by the combined analytical imprecision that may be present in both results together with the combined biological variability that may occur in the parameter in a stable patient from day to day. The total analytical and within-subject variability (CV_AIT) in two results obtained using the same method, with an imprecision of CV_A and a biological within-subject day-to-day variability of CV_I, is defined as:

CV_AIT = [CV_A^2 + CV_I^2 + CV_A^2 + CV_I^2]^0.5

CV_AIT = [2 CV_A^2 + 2 CV_I^2]^0.5        (3)

CV_AIT = [2]^0.5 • [CV_A^2 + CV_I^2]^0.5

CV_AIT = 1.41 • [CV_A^2 + CV_I^2]^0.5

In order to be 95% confident that such a combined variability has been exceeded, one multiplies the combined CV_AIT by the z value corresponding to a 95% probability (1.96) to derive the critical difference (CD_AI) that indicates a difference between two results greater than the combined analytical imprecision and within-subject biological variability:

CD_AI = 1.96 • 1.41 • [CV_A^2 + CV_I^2]^0.5        (4)

CD_AI = 2.77 • [CV_A^2 + CV_I^2]^0.5        (5)

D3. Critical Biological Difference Between Two Results Obtained With a Method of Desirable Analytical Imprecision

Desirable analytical imprecision has been defined by Harris¹ as well as by Fraser and Petersen² as an imprecision that is less than half the within-subject biological variability:

CV_Ad ≤ 0.5 • CV_I

If one assumes the method has desirable imprecision and substitutes this requirement into equation (3), the total variability with desirable imprecision, CV_AdIT, becomes:

CV_AdIT = [CV_A^2 + CV_I^2 + CV_A^2 + CV_I^2]^0.5

CV_AdIT = [(0.5 CV_I)^2 + CV_I^2 + (0.5 CV_I)^2 + CV_I^2]^0.5

CV_AdIT = [0.25 CV_I^2 + CV_I^2 + 0.25 CV_I^2 + CV_I^2]^0.5

CV_AdIT = [2.5 CV_I^2]^0.5

CV_AdIT = [2.5]^0.5 • [CV_I^2]^0.5

CV_AdIT = 1.58 • CV_I

Similarly, in order to be 95% confident that such a combined variability has been exceeded, one multiplies the combined CV_AdIT by the z value corresponding to a 95% probability (1.96) to derive the critical difference (CD_AdI) that indicates a difference between two results greater than the combined within-subject biological variability and desirable analytical imprecision:

CD_AdI = 1.96 • 1.58 • CV_I

CD_AdI = 3.10 • CV_I        (6)

As analytical imprecision approaches zero, equation (5) becomes:

CD_I = 2.77 • CV_I

A simple goal is that the allowable bias between two methods (|Bias_2 − Bias_1| = Bias_T) used for monitoring subjects be no more than the increase in the critical difference between two results that is attributable to the assay variability:

Bias_T ≤ CD_AI − CD_I

Thus, for two methods that both meet desirable imprecision (|Bias_2d − Bias_1d| = Bias_Td), the allowable bias goal is:

Bias_Td ≤ 3.10 • CV_I − 2.77 • CV_I = 0.33 • CV_I
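
For convenience, the quantities derived in this appendix can be computed as follows (illustrative Python, not part of this guideline; the function names are hypothetical):

    import math

    def cd_analytical(cv_a):
        return 2.77 * cv_a                            # equation (2)

    def cd_analytical_biological(cv_a, cv_i):
        return 2.77 * math.sqrt(cv_a**2 + cv_i**2)    # equation (5)

    def allowable_bias_desirable(cv_i):
        # Bias goal when both methods meet desirable imprecision
        # (CV_A <= 0.5 CV_I): the difference between equation (6)
        # and CD_I, which works out to 0.33 CV_I.
        return 3.10 * cv_i - 2.77 * cv_i

    # Illustration with the within-subject CV of 11.9% used in Appendix A, Example 1:
    print(round(allowable_bias_desirable(11.9), 1))   # 3.9 (%)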

References for Appendix D

1. Harris EK. Statistical principles underlying analytical goal setting in clinical chemistry. Am J Clin Pathol. 1979;72:374-382.
2. Fraser CG, Petersen PH. The importance of imprecision. Ann Clin Biochem. 1991;28:207-211.


Clinical and Laboratory Standards Institute consensus procedures include an appeals process that
is described in detail in Section 8 of the Administrative Procedures. For further information,
contact CLSI or visit our website at www.clsi.org.

Summary of Comments and Subcommittee Responses


C54-P: Verification of Comparability of Patient Results Within One Health Care System; Proposed
Guideline

Section 4, Definitions

1. “PT/EQA” – There is no definition of “EQA.” Furthermore, does the committee really want to use “EQA” as
an abbreviation? First of all, “EQA” is only one letter different than “EQC,” and “EQC” is already creating a lot
of confusion in the laboratory community. (Does EQC mean ‘external’ vs ‘equivalent’ vs ‘electronic’ QC?)

Secondly, and assuming EQA means external quality assurance, does use of this term add any additional value
over and above PT alone? My recommendation would be to eliminate use of EQA.

• A definition of EQA (external quality assessment) has been added to the document. The subcommittee
has elected to retain the use of EQA because the document is intended for an international audience, and
the term “PT” (proficiency testing) is not in use universally throughout the world.

Section 7.1, Evaluation of Comparability Based on Clinical Outcomes

2. Example 7.1: I wish to point out a conceptual mistake in the example 7.1. After reporting the outcomes of the
DCCT study (<7% and >8%), the text says that the TE for HbA1c should be kept to below ±1% (absolute
HbA1c value) to avoid patient misclassification. However, considering the reported outcomes, the allowable TE
should be lower than ±0.5% in HbA1c absolute values. As a matter of fact, to correctly classify a subject with a
true HbA1c value of 7.5%, the measurement error should not exceed 0.5% (relative TE 6.66%) in order to avoid
the possibility after the HbA1c measurement to classify the individual both as a patient with a poor glycemic
control (>8%) or as a well-controlled diabetic patient (<7%).

• The subcommittee has elected not to modify the wording in the example. While the calculations of the
commenter are mathematically correct, the subcommittee believes that a strict mathematical approach
over-interprets the data presented in the Diabetes Control and Complications Trial (DCCT). The
subcommittee believes that the example, as presented, represents how a laboratory director would
interpret the results of the DCCT. In addition, the United States National Glycohemoglobin
Standardization Program (NGSP) requires a 95% confidence interval of 0.85% (absolute HbA1c value)
for the differences between a routine method and the reference method for the routine method to be
considered traceable to the DCCT reference method. The NGSP protocol specifies measurement of 40
samples run in duplicate. If this experiment was performed in singlet, as is typical in a standard
laboratory comparison, the comparable range for 95% of the values would be expected to be ~1.0%
(absolute HbA1c value).

Appendix A. Worked Examples

3. Example 3, Comparison data table: The number in the “range, %” row should be (0.90/14.65) • 100, not
(0.94/14.20) • 100.

• The entries in the table have been corrected.


The Quality Management System Approach


Clinical and Laboratory Standards Institute (CLSI) subscribes to a quality management system approach in the
development of standards and guidelines, which facilitates project management; defines a document structure via a
template; and provides a process to identify needed documents. The approach is based on the model presented in the
most current edition of CLSI/NCCLS document HS1—A Quality Management System Model for Health Care. The
quality management system approach applies a core set of “quality system essentials” (QSEs), basic to any
organization, to all operations in any health care service’s path of workflow (ie, operational aspects that define how
a particular product or service is provided). The QSEs provide the framework for delivery of any type of product or
service, serving as a manager’s guide. The QSEs are:

Documents & Records; Organization; Personnel; Equipment; Purchasing & Inventory; Process Control; Information Management; Occurrence Management; Assessments―External & Internal; Process Improvement; Customer Service; and Facilities & Safety.

C54-A addresses the QSEs indicated by an “X.” For a description of the other documents listed in the grid, please refer to the Related CLSI Reference Materials section at the end of this document.

[Grid not reproducible in this copy. In the original, the twelve QSEs head the columns and related documents appear beneath them; the “X” for C54-A, together with the related documents C24, C30, C37, EP05, EP09, EP15, M29, and X05, appears under the Process Control QSE, with C30 and M29 also listed under other QSEs.]

Adapted from CLSI/NCCLS document HS01—A Quality Management System Model for Health Care.

Path of Workflow

A path of workflow is the description of the necessary steps to deliver the particular product or service that the
organization or entity provides. For example, CLSI/NCCLS document GP26⎯Application of a Quality
Management System Model for Laboratory Services defines a clinical laboratory path of workflow which consists of
three sequential processes: preexamination, examination, and postexamination. All clinical laboratories follow these
processes to deliver the laboratory’s services, namely quality laboratory information.

C54-A addresses the clinical laboratory path of workflow steps indicated by an “X.” For a description of the other documents listed in the grid, please refer to the Related CLSI Reference Materials section at the end of this document.

[Grid not reproducible in this copy. In the original, the path of workflow steps (Examination ordering; Sample collection; Sample transport; Sample receipt/processing; Examination; Results review and follow-up; Interpretation; Results reporting and archiving; Sample management) head the columns; C54-A is marked (“X”) at several steps, and document C30 is also listed at several steps.]

Adapted from CLSI/NCCLS document HS01—A Quality Management System Model for Health Care.


Related CLSI Reference Materials∗


C24-A3 Statistical Quality Control for Quantitative Measurement Procedures: Principles and Definitions;
Approved Guideline—Third Edition (2006). This guideline provides definitions of analytical intervals,
planning of quality control procedures, and guidance for quality control applications.

C30-A2 Point-of-Care Blood Glucose Testing in Acute and Chronic Care Facilities; Approved Guideline—
Second Edition (2002). This document contains guidelines for performance of point-of-care (POC) blood
glucose testing that stress quality control, training, and administrative responsibility.

C37-A Preparation and Validation of Commutable Frozen Human Serum Pools as Secondary Reference
Materials for Cholesterol Measurement Procedures; Approved Guideline (1999). This guideline details
procedures for the manufacture and evaluation of human serum pools for cholesterol measurement.

EP05-A2 Evaluation of Precision Performance of Quantitative Measurement Methods; Approved Guideline—
Second Edition (2004). This document provides guidance for designing an experiment to evaluate the
precision performance of quantitative measurement methods; recommendations on comparing the resulting
precision estimates with manufacturers’ precision performance claims and determining when such
comparisons are valid; as well as manufacturers’ guidelines for establishing claims.

EP09-A2 Method Comparison and Bias Estimation Using Patient Samples; Approved Guideline—Second Edition
(2002). This document addresses procedures for determining the bias between two clinical methods, and the
design of a method comparison experiment using split patient samples and data analysis.

EP15-A2 User Verification of Performance for Precision and Trueness; Approved Guideline—Second Edition
(2005). This document describes the demonstration of method precision and trueness for clinical laboratory
quantitative methods utilizing a protocol designed to be completed within five working days or less.

M29-A3 Protection of Laboratory Workers From Occupationally Acquired Infections; Approved Guideline—
Third Edition (2005). Based on US regulations, this document provides guidance on the risk of transmission
of infectious agents by aerosols, droplets, blood, and body substances in a laboratory setting; specific
precautions for preventing the laboratory transmission of microbial infection from laboratory instruments and
materials; and recommendations for the management of exposure to infectious agents.

X05-R Metrological Traceability and Its Implementation; A Report (2006). This document provides guidance to
manufacturers for establishing and reporting metrological traceability.
∗ Proposed-level documents are being advanced through the Clinical and Laboratory Standards Institute consensus process;
therefore, readers should refer to the most current editions.


Active Membership
(as of 1 April 2008)
Sustaining Members FDA Center for Biologics Evaluation Fio Upside Endeavors, LLC
and Research Focus Diagnostics Vital Diagnostics S.r.l.
Abbott FDA Center for Devices and Radiological Future Diagnostics B.V. Watin-Biolife Diagnostics and Medicals
American Association for Clinical Health Genomic Health, Inc. Wellstat Diagnostics, LLC
Chemistry FDA Center for Veterinary Medicine Gen-Probe Wyeth Research
AstraZeneca Pharmaceuticals Health Canada Genzyme Diagnostics XDX, Inc.
Bayer Corporation Massachusetts Department of Public GlaxoSmithKline YD Consultant
BD Health Laboratories GlucoTec, Inc. ZIUR Ltd.
Beckman Coulter, Inc. Ministry of Health and Social Welfare – GR Micro LTD
bioMérieux, Inc. Tanzania Greiner Bio-One Inc. Trade Associations
CLMA National Center of Infectious and Habig Regulatory Consulting
College of American Pathologists Parasitic Diseases (Bulgaria) HistoGenex N.V. AdvaMed
GlaxoSmithKline National Health Laboratory Service Icon Laboratories, Inc. Japan Association of Clinical
Ortho-Clinical Diagnostics, Inc. (South Africa) Immunicon Corporation Reagents Industries (Tokyo, Japan)
Pfizer Inc National Institute of Standards and Indiana State Department of Health
Roche Diagnostics, Inc. Technology Instrumentation Laboratory Associate Active Members
National Pathology Accreditation Japan Assn. of Clinical Reagents Industries
Professional Members Advisory Council (Australia) Joanneum Research Forschungsgesellschaft mbH 3rd Medical Group (AK)
New York State Department of Health Johnson & Johnson Pharmaceutical 5th Medical Group/SGSL (ND)
American Academy of Family Ontario Ministry of Health Research and Development, L.L.C. 22 MDSS (KS)
Physicians Pennsylvania Dept. of Health Kaiser Permanente 36th Medical Group/SGSL (Guam)
American Association for Clinical Saskatchewan Health-Provincial K.C.J. Enterprises 48th Medical Group/MDSS (APO, AE)
Chemistry Laboratory Krouwer Consulting 55th Medical Group/SGSAL (NE)
American Association for Laboratory Scientific Institute of Public Health Laboratory Specialists, Inc. 59th MDW/859th MDTS/MTL Wilford Hall
Accreditation University of Iowa, Hygienic Lab LifeScan, Inc. (a Johnson & Johnson Medical Center (TX)
American Association for Company) Academisch Ziekenhuis-VUB
Respiratory Care Industry Members LipoScience Acadiana Medical Labs, Ltd
American Chemical Society Maine Standards Company, LLC ACL Laboratories (IL)
American College of Medical Genetics 3M Medical Division Medical Device Consultants, Inc. Adams County Hospital (OH)
American Medical Technologists AB Biodisk Merck & Company, Inc. Air Force Institute for Operational Health
American Society for Clinical Abbott Micromyx, LLC (TX)
Laboratory Science Abbott Diabetes Care MicroPhage Akron’s Children’s Hospital (OH)
American Society for Microbiology Abbott Molecular Inc. Monogen, Inc. Alameda County Medical Center
American Type Culture Collection Abbott Point of Care Inc. MultiPhase Solutions, Inc. Albany Medical Center Hospital (NY)
ASCP Access Genetics Nanogen Albemarle Hospital (NC)
Associazione Microbiologi Clinici Acupath Nanogen, Point-of-Care Diagnostics Div. Alfred I. du Pont Hospital for Children
Italiani (AMCLI) AdvaMed Nanosphere, Inc. All Children’s Hospital (FL)
Canadian Society for Medical Advancis Pharmaceutical Corporation Nihon Koden Corporation Allegheny General Hospital (PA)
Laboratory Science Advantage Bio Consultants, Inc. Nissui Pharmaceutical Co., Ltd. Alpena General Hospital (MI)
COLA Affymetrix, Inc. (Santa Clara, CA) NJK & Associates, Inc. Alta Bates Summit Medical Center (CA)
College of American Pathologists Affymetrix, Inc. (W. Sacramento, CA) NorDx – Scarborough Campus American University of Beirut Medical
College of Medical Laboratory Agilent Technologies/Molecular NovaBiotics (Aberdeen, UK) Center (NJ)
Technologists of Ontario Diagnostics Novartis Institutes for Biomedical Anne Arundel Medical Center (MD)
College of Physicians and Surgeons Ammirati Regulatory Consulting Research Antelope Valley Hospital District (CA)
of Saskatchewan Anapharm, Inc. Nucryst Pharmaceuticals Arkansas Children’s Hospital (AR)
Elkin Simson Consulting Services Anna Longwell, PC Olympus America, Inc. Arkansas Dept of Health
ESCMID Aptium Oncology Opti Scan Bio Medical Assoc. Public Health Laboratory (AR)
Family Health International Arpida Ltd. Optimer Pharmaceuticals, Inc. Arkansas Methodist Medical Center (AR)
Hong Kong Accreditation Service A/S Rosco Orion Genomics, LLC Asan Medical Center (Seoul)
Innovation and Technology Associate Regional & University Ortho-Clinical Diagnostics, Inc. Asante Health System (OR)
Commission Pathologists (Rochester, NY) Asiri Group of Hospitals Ltd.
International Federation of Astellas Pharma Ortho-McNeil, Inc. Asociacion Espanola Primera de Socorros
Biomedical Laboratory Science AstraZeneca Pharmaceuticals Oxonica (UK) Mutuos (Uruguay)
International Federation of Clinical Aviir, Inc. Panaceapharma Pharmaceuticals Aspirus Wausau Hospital (WI)
Chemistry Axis-Shield PoC AS Paratek Pharmaceuticals, Inc. Atlantic City Medical Center (NJ)
Italian Society of Clinical Bayer Corporation – West Haven, CT PathCare Atlantic Health Sciences Corp.
Biochemistry and Clinical Molecular Bayer HealthCare, LLC, Diagnostics Pathwork Diagnostics Auburn Regional Medical Center (WA)
Biology Div. – Elkhart, IN Pfizer Animal Health Augusta Medical Center (VA)
JCCLS BD Pfizer Inc Aultman Hospital (OH)
The Joint Commission BD Biosciences – San Jose, CA Phadia AB Avera McKennan (SD)
National Society for BD Diagnostic Systems PlaCor, Inc Az Sint-Jan
Histotechnology, Inc. BD Vacutainer Systems Powers Consulting Services Azienda Ospedale Di Lecco (Italy)
Ontario Medical Association Quality Beckman Coulter, Inc. PPD Baffin Regional Hospital (Canada)
Management Program-Laboratory Beth Goldstein Consultant (PA) ProSource Consulting, Inc. Baptist Hospital for Women (TN)
Service Bioanalyse, Ltd. QSE Consulting Baptist Hospital of Miami (FL)
RCPA Quality Assurance Programs Bio-Development S.r.l. Qualtek Clinical Laboratories Bassett Army Community Hospital (AK)
PTY Limited Biomedia Laboratories SDN BHD Quest Diagnostics, Incorporated Baton Rouge General (LA)
Serbian Society of Microbiology bioMérieux, Inc. (MO) Radiometer America, Inc. Baxter Regional Medical Center (AR)
SIMeL bioMérieux, Inc. (NC) RCC CIDA S. A. Bay Regional Medical Center (MI)
Sociedad Espanola de Bioquimica Bio-Rad Laboratories, Inc. – France Replidyne BayCare Health System (FL)
Clinica y Patologia Molecular Bio-Rad Laboratories, Inc. – Irvine, Rib-X Pharmaceuticals Baylor Health Care System (TX)
Sociedade Brasileira de Analises CA Roche Diagnostics GmbH Bayou Pathology, APMC (LA)
Clinicas Bio-Rad Laboratories, Inc. – Plano, Roche Diagnostics, Inc. Baystate Medical Center (MA)
Sociedade Brasileira de Patologia TX Roche Diagnostics Ltd B.B.A.G. Ve U. AS., Duzen Laboratories
Clinica Blaine Healthcare Associates, Inc. Roche Diagnostics Shanghai Ltd. (Turkey)
Turkish Society of Microbiology Braun Biosystems, Inc. Roche Molecular Systems Beebe Medical Center (DE)
Washington G2 Reports Canon U.S. Life Sciences, Inc. SAIC Frederick Inc. NCI-Frederick Cancer Belfast HSS Trust
World Health Organization Cempra Pharmaceuticals, Inc. Research & Development Center Beloit Memorial Hospital (WI)
Center for Measurement Standards/ITRI Sanofi Pasteur Ben Taub General Hospital (TX)
Government Members Centers for Disease Control and Sarstedt, Inc. The Bermuda Hospitals Board
Prevention Schering Corporation Bonnyville Health Center (Canada)
Armed Forces Institute of Pathology Central States Research Centre, Inc. Sequenom, Inc. Boston Medical Center (MA)
Association of Public Health Cepheid Siemens Healthcare Diagnostics Boulder Community Hospital (CO)
Laboratories Chen & Chen, LLC (IQUUM) Siemens Medical Solutions Diagnostics (CA) Brantford General Hospital (Canada)
BC Centre for Disease Control The Clinical Microbiology Institute Siemens Medical Solutions Diagnostics (DE) Bridgeport Hospital (CT)
Centers for Disease Control and Comprehensive Cytometric Consulting Siemens Medical Solutions Diagnostics (NY) Bronson Methodist Hospital (MI)
Prevention Control Lab Specialty Ranbaxy Ltd Broward General Medical Center (FL)
Centers for Disease Control and Copan Diagnostics Inc. Sphere Medical Holding Limited Calgary Laboratory Services (Calgary, AB,
Prevention - Namibia Cosmetic Ingredient Review State of Alabama Canada)
Centers for Disease Control and Cubist Pharmaceuticals Stirling Medical Innovations California Pacific Medical Center (CA)
Prevention – Nigeria Cumbre Inc. Streck Laboratories, Inc. Cambridge Health Alliance (MA)
Centers for Disease Control and Dade Behring Marburg GmbH – A Sysmex America, Inc. (Mundelein, IL) Camden Clark Memorial Hospital (WV)
Prevention – Tanzania Siemens Company Sysmex Corporation (Japan) Canadian Science Center for Human and
Centers for Medicare & Medicaid Dahl-Chase Pathology Associates PA Targanta Therapeutics, Inc Animal Health (Canada)
Services David G. Rhoads Associates, Inc. Tethys Bioscience, Inc. Cape Breton Healthcare Complex (Canada)
Centers for Medicare & Medicaid Diagnostic Products Corporation TheraDoc Cape Cod Hospital (MA)
Services/CLIA Program Diagnostica Stago Therapeutic Monitoring Services, LLC Cape Fear Valley Medical Center
Chinese Committee for Clinical Docro, Inc. Theravance Inc. Laboratory (NC)
Laboratory Standards DX Tech Third Wave Technologies, Inc. Capital Health/QE II Health Sciences
Department of Veterans Affairs Eiken Chemical Company, Ltd. Thrombodyne, Inc. Centre (Nova Scotia)
DFS/CLIA Certification Elanco Animal Health ThromboVision, Inc. Capital Health - Regional Laboratory
Emisphere Technologies, Inc. Transasia Bio-Medicals Limited Services (Canada)
Eurofins Medinet Trek Diagnostic Systems
Capital Health System Mercer Dundy County Hospital (NE) Indiana University – Chlamydia Medical Center of Louisiana at NO-
Campus (NJ) Durham VA Medical Center (NC) Laboratory (IN) Charity (LA)
Carilion Labs Charlotte DVA Laboratory Services (FL) Inova Fairfax Hospital (VA) Medical Center of McKinney (TX)
Carpermor S.A. de C.V. (Mexico) Dwight D. Eisenhower Medical Institut fur Stand. und Dok. im Med. Medical Centre Ljubljana (Slovenia)
Catholic Health Initiatives (KY) Center (KS) Lab. (Germany) Medical College of Virginia Hospital (VA)
Cavan General Hospital (Ireland) E. A. Conway Medical Center (LA) Institut National de Santé Publique du Quebec Medical Specialists (IN)
CDC/HIV (APO, AP) East Central Health (Canada) Centre de Doc. – INSPQ (Canada) Medical Univ. of South Carolina (SC)
Cedars-Sinai Medical Center (CA) East Georgia Regional Medical Institute Health Laboratories (PR) MediCorp - Mary Washington Hospital
Central Baptist Hospital (KY) Center (GA) Institute of Clinical Pathology and (VA)
Central Kansas Medical Center (KS) Eastern Health Pathology (Australia) Medical Research (Australia) Memorial Hermann Healthcare System (TX)
Central Texas Veterans Health Care Easton Hospital (PA) Institute of Laboratory Medicine Memorial Hospital at Gulfport (MS)
System (TX) Edward Hospital (IL) Landspitali Univ. Hospital (Iceland) Memorial Hospital Laboratory (CO)
Centralized Laboratory Services (NY) Effingham Hospital (GA) Institute of Medical & Veterinary Memorial Medical Center (IL)
Centre Hospitalier Anna-Laberge (Canada) Eliza Coffee Memorial Hospital (AL) Science (Australia) Memorial Medical Center (PA)
Centura – Villa Pueblo (CO) Emory University Hospital (GA) Integrated Regional Laboratories Memorial Regional Hospital (FL)
Chaleur Regional Hospital (Canada) Evangelical Community Hospital (PA) South Florida (FL) Mercy Franciscan Mt. Airy (OH)
Chang Gung Memorial Hospital (Taiwan) Evans Army Community Hospital (CO) International Health Management Mercy Hospital (ME)
Changhua Christian Hospital (Taiwan) Exeter Hospital (NH) Associates, Inc. (IL) Mercy Medical Center (CO)
The Charlotte Hungerford Hospital Federal Medical Center (MN) Ireland Army Community Hospital (KY) Mercy Medical Center (OR)
(CT) First Health of the Carolinas IWK Health Centre (Canada) Methodist Hospital (MN)
Chatham - Kent Health Alliance (Canada) Moore Regional Hospital (NC) Jackson County Memorial Hospital (OK) Methodist Hospital (TX)
Chesapeake General Hospital (VA) Flaget Memorial Hospital (KY) Jackson Health System (FL) Methodist Hospital Pathology (NE)
Chester County Hospital (PA) Fletcher Allen Health Care (VT) Jackson Purchase Medical Center MetroHealth Medical Center (OH)
Children’s Healthcare of Atlanta (GA) Fleury S.A. (Brazil) (KY) Metropolitan Hospital Center (NY)
The Children’s Hospital (CO) Florida Hospital (FL) Jacobi Medical Center (NY) Metropolitan Medical Laboratory, PLC (IA)
Children’s Hospital (OH) Florida Hospital Waterman (FL) John C. Lincoln Hospital (AZ) The Michener Inst. for Applied
Children’s Hospital and Medical Center Foote Hospital (MI) John Muir Medical Center (CA) Health Sciences (Canada)
(WA) Fort St. John General Hospital (Canada) John T. Mather Memorial Hospital (NY) Middelheim General Hospital
Children’s Hospital & Research Forum Health Northside Medical Johns Hopkins Medical Institutions Middletown Regional Hospital (OH)
Center at Oakland (CA) Center (OH) (MD) Mike O'Callaghan Federal Hospital (NV)
Children’s Hospital Medical Center Fox Chase Cancer Center (PA) Johns Hopkins University (MD) Mississippi Baptist Medical Center (MS)
(OH) Frankford Hospital (PA) Johnson City Medical Center (TN) Mississippi Public Health Lab (MS)
Children’s Hospital of Philadelphia Fraser Health Authority JPS Health Network (TX) Monmouth Medical Center (NJ)
(PA) Royal Columbian Hospital Site Kadlec Medical Center (WA) Montefiore Medical Center (NY)
Children’s Hospitals and Clinics (MN) (Canada) Kaiser Permanente (CA) Montreal General Hospital (Quebec)
Children’s Medical Center (OH) Fresenius Medical Care/Spectra East Kaiser Permanente (MD) Morton Plant Hospital (FL)
Children’s Medical Center (TX) (NJ) Kaiser Permanente (OH) Mt. Sinai Hospital - New York (NY)
Children’s Memorial Hospital (IL) Fundacio Joan Costa Roma Consorci Kaiser Permanente Medical Care (CA) Nassau County Medical Center (NY)
The Children’s Mercy Hospital (MO) Sanitari de Terrassa (Spain) Kantonsspital Aarau AG (Switzerland) National Cancer Center (S. Korea)
Childrens Hosp. – Kings Daughters (VA) Gamma-Dynacare Laboratories Keller Army Community Hospital (NY) National Cancer Institute (MD)
Childrens Hospital Los Angeles (CA) (Canada) Kenora-Rainy River Reg. Lab. National Healthcare Group (Singapore)
Childrens Hospital of Wisconsin (WI) Gamma Dynacare Medical Program (Canada) National Institutes of Health, Clinical
Chilton Memorial Hospital (NJ) Laboratories (Ontario, Canada) King Fahad National Guard Health Center (MD)
Christiana Care Health Services (DE) Garden City Hospital (MI) Affairs King Abdulaziz Medical City National Naval Medical Center (MD)
Christus St. John Hospital (TX) Garfield Medical Center (CA) (Saudi Arabia) National University Hospital Department of
CHU Sainte – Justine (Quebec) Geisinger Medical Center (Danville, PA) King Faisal Specialist Hospital (MD) Laboratory Medicine (Singapore)
City of Hope National Medical Genesis Healthcare System (OH) King Hussein Cancer Center Naval Hospital Great Lakes (IL)
Center (CA) George Washington University Kings County Hospital Center (NY) Naval Hospital Oak Harbor (WA)
Clarian Health – Clarian Pathology Hospital (DC) Kingston General Hospital (Canada) Naval Medical Center Portsmouth (VA)
Laboratory (IN) Ghent University Hospital (Belgium) Lab Medico Santa Luzia LTDA (Brazil) NB Department of Health
Cleveland Clinic Health System Good Samaritan Hospital (OH) Labette Health (KS) The Nebraska Medical Center (NE)
Eastern Region (OH) Good Shepherd Medical Center (TX) Laboratory Alliance of Central New New England Baptist Hospital (MA)
Clinical Labs of Hawaii (HI) Grana S.A. (Mexico) York (NY) New England Fertility Institute (CT)
CLSI Laboratories, Univ. Pittsburgh Grand Strand Reg. Medical Center (SC) LabPlus Auckland Healthcare Services New Lexington Clinic (KY)
Med. Ctr. (PA) Gundersen Lutheran Medical Center Limited (New Zealand) New York City Department of Health
Colchester East Hants Health Authority (WI) Labway Clinical Laboratory Ltd (China) and Mental Hygiene (NY)
(Canada) Guthrie Clinic Laboratories (PA) Lafayette General Medical Center (LA) New York-Presbyterian Hospital (NY)
Commonwealth of Virginia (DCLS) Haga Teaching Hospital (Netherlands) Lakeland Regional Laboratories (MI) New York University Medical Center (NY)
(VA) Hagerstown Medical Laboratory (MD) Lakeland Regional Medical Center (FL) Newark Beth Israel Medical Center (NJ)
Community Hospital (IN) Halton Healthcare Services (Canada) Lancaster General Hospital (PA) Newton Memorial Hospital (NJ)
The Community Hospital (OH) Hamad Medical Corporation (Qatar) Landstuhl Regional Medical Center North Bay Hospital (FL)
Community Hospital of the Monterey Hamilton Regional Laboratory Medicine Langley Air Force Base (VA) North Carolina Baptist Hospital (NC)
Peninsula (CA) Program (Canada) LeBonheur Children’s Medical Center North Coast Clinical Laboratory, Inc. (OH)
Community Medical Center (NJ) Hanover General Hospital (PA) (TN) North District Hospital (Hong Kong)
Community Memorial Hospital (WI) Harford Memorial Hospital (MD) Legacy Laboratory Services (OR) North Mississippi Medical Center (MS)
Consultants Laboratory of WI LLC Harris Methodist Fort Worth (TX) Lethbridge Regional Hospital (Canada) North Shore-Long Island Jewish Health
(WI) Health Network Lab (PA) Lewis-Gale Medical Center (VA) System Laboratories (NY)
Contra Costa Regional Medical Health Partners Laboratories Bon L’Hotel-Dieu de Quebec (Quebec, Northeast Pathologists, Inc. (MO)
Center (CA) Secours Richmond (VA) Canada) Northridge Hospital Medical Center (CA)
Cook Children’s Medical Center Health Sciences Research Institute Licking Memorial Hospital (OH) Northside Hospital (GA)
(TX) (Japan) LifeBridge Health Sinai Hospital (MD) Northwest Texas Hospital (TX)
Cork University Hospital (Ireland) Health Waikato (New Zealand) LifeLabs (Canada) Northwestern Memorial Hospital (IL)
Corpus Christi Medical Center (TX) Heartland Health (MO) Loma Linda University Medical (CA) Norton Healthcare (KY)
Covance CLS (IN) Heidelberg Army Hospital (APO, AE) Long Beach Memorial Medical Ochsner Clinic Foundation (LA)
Covance Evansville (IN) Helen Hayes Hospital (NY) Center (CA) Ohio State University Hospitals (OH)
The Credit Valley Hospital (Canada) Hema-Quebec (Canada) Los Angeles County Public Health Onze Lieve Vrouw Ziekenhuis (Belgium)
Creighton Medical Laboratories (NE) Hennepin Faculty Association (MN) Lab. (CA) Ordre Professionel des Technologistes
Creighton University Medical Center (NE) Henry Ford Hospital (MI) Louisiana Office of Public Health Medicaux du Quebec (Quebec)
Crozer-Chester Medical Center (PA) Henry M. Jackson Foundation (MD) Laboratory (LA) Orebro University Hospital
Darwin Library NT Territory Health Henry Medical Center, Inc. (GA) Louisiana State University Medical Ctr. Orlando Regional Healthcare System (FL)
Services (Australia) Hi-Desert Medical Center (CA) (LA) The Ottawa Hospital (Canada)
David Grant Medical Center (CA) Hoag Memorial Hospital Lourdes Hospital (KY) Our Lady of Lourdes Medical Center (NJ)
Daviess Community Hospital (IN) Presbyterian (CA) Maccabi Medical Care and Health Fund Our Lady of Lourdes Reg. Medical Ctr.
Deaconess Hospital Laboratory (IN) Holy Cross Hospital (MD) Madison Parish Hospital (LA) (LA)
Deaconess Medical Center (WA) Holy Family Medical Center (WI) Mafraq Hospital Our Lady of the Way Hospital (KY)
Dean Medical Center (WI) Holy Name Hospital (NJ) Magnolia Regional Health Center (MS) Our Lady’s Hospital for Sick Children
DeWitt Healthcare Network (USA Holy Spirit Hospital (PA) Main Line Clinical Laboratories, Inc. (PA) (Ireland)
Meddac) (VA) Hopital Cite de La Sante de Laval Maricopa Integrated Health System (AZ) Overlake Hospital Medical Center (WA)
DHHS NC State Lab of Public (Canada) Marquette General Hospital (MI) Palmetto Health Baptist Laboratory (SC)
Health (NC) Hopital du Haut-Richelieu (Canada) Marshfield Clinic (WI) Pathologists Associated (IN)
Diagnostic Laboratory Services, Inc. Hôpital Maisonneuve - Rosemont Martha Jefferson Hospital (VA) Pathology and Cytology Laboratories,
(HI) (Montreal, Canada) Martin Luther King, Jr. Harbor Hospital Inc. (KY)
Diagnostic Services of Manitoba Hôpital Sacré-Coeur de Montreal (CA) Pathology Associates Medical
(Canada) (Quebec, Canada) Martin Memorial Health Systems (FL) Laboratories (WA)
Diagnósticos da América S/A (Sao Hopital Santa Cabrini Ospedale Mary Imogene Bassett Hospital (NY) Penn State Hershey Medical Center (PA)
Paulo) (Canada) Marymount Medical Center (KY) Pennsylvania Hospital (PA)
DIANON Systems/Lab Corp. (OK) Hospital Albert Einstein (Brazil) Massachusetts General Hospital (MA) Penrose St. Francis Health Services (CO)
Diaz Gill-Medicina Laboratorial S.A. Hospital das Clinicas-FMUSP (Brazil) Massachusetts General Hospital The Permanente Medical Group (CA)
Dimensions Healthcare System Hospital de Dirino Espirito Santa Division of Laboratory Medicine (MA) Perry County Memorial Hospital (IN)
(MD) (Portugal) Maxwell Air Force Base (AL) Peterborough Regional Health Centre
Dr. Erfan & Bagedo General Hospital The Hospital for Sick Children Mayo Clinic (MN) (Canada)
(Saudi Arabia) (Canada) Mayo Clinic Scottsdale (AZ) Piedmont Hospital (GA)
Dr. Everette Chalmers Regional Hôtel Dieu Grace Hospital Library MDS Metro Laboratory Services Pitt County Memorial Hospital (NC)
Hospital (NB) (Windsor, ON, Canada) (BC, Canada) Prairie Lakes Hospital (SD)
DRAKE Center (OH) Hunter Area Pathology Service Meadows Regional Medical Center (GA) Presbyterian Hospital of Dallas (TX)
Driscoll Children’s Hospital (TX) (Australia) Mease Countryside Hospital (FL) Presbyterian/St. Luke’s Medical Center
DSI of Bucks County (PA) Imelda Hospital (Belgium) Medecin Microbiologiste (Canada) (CO)
DUHS Clinical Laboratories (NC) Medical Center Hospital (TX) Prince County Hospital
Princess Margaret Hospital (Hong Kong) St. Joseph’s Hospital & Health Sydney South West Pathology Service U.S.A. Meddac (Pathology Division)
Providence Alaska Medical Center (AK) Center (ND) (Australia) (MO)
Providence Health Care (Canada) St. Joseph’s Medical Center (CA) T.J. Samson Community Hospital (KY) UW Hospital (WI)
Providence Medford Medical Center St. Joseph’s Regional Medical Taipei Veterans General Hospital UZ-KUL Medical Center (Belgium)
(OR) Center (NJ) (Taiwan) VA (Asheville) Medical Center (NC)
Provincial Health Services Authority St. Jude Children’s Research Hospital Taiwan Society of Laboratory VA (Bay Pines) Medical Center (FL)
(Vancouver, BC, Canada) (TN) Medicine VA (Chillicothe) Medical Center (OH)
Provincial Laboratory for Public St. Louis University Hospital (MO) Tallaght Hospital VA (Cincinnati) Medical Center (OH)
Health (Edmonton, AB, Canada) St. Luke’s Hospital (FL) Tartu University Clinics (Tartu) VA (Dallas) Medical Center (TX)
Queen Elizabeth Hospital (Canada) St. Luke’s Hospital (IA) Temple Univ. Hospital - Parkinson VA (Dayton) Medical Center (OH)
Queensland Health Pathology Services St. Luke’s Hospital (PA) Pav. (PA) VA (Decatur) Medical Center (GA)
(Australia) St. Margaret Memorial Hospital (PA) Texas Children's Hospital (TX) VA (Hines) Medical Center (IL)
Quest Diagnostics, Inc St. Martha’s Regional Hospital (Canada) Texas Department of State Health VA (Indianapolis) Medical Center (IN)
Quest Diagnostics, Inc (San Juan St. Mary Medical Center (CA) Services VA (Iowa City) Medical Center (IA)
Capistrano, CA) St. Mary’s Health Center (MO) Thomason Hospital (TX) VA (Long Beach) Medical Center (CA)
Quest Diagnostics JV (IN, OH, PA) St Mary’s Healthcare (SD) Timmins and District Hospital (Canada) VA (Miami) Medical Center (FL)
Quest Diagnostics Laboratories (WA) St. Mary’s Medical Center (IN) The Toledo Hospital (OH) VA New Jersey Health Care System
Quincy Hospital (MA) St. Michael’s Hospital Diagnostic Touro Infirmary (LA) (NJ)
Rady Children’s Hospital San Diego Laboratories & Pathology (Canada) Tri-Cities Laboratory (WA) VA Outpatient Clinic (OH)
(CA) St. Tammany Parish Hospital (LA) Trident Medical Center (SC) VA (Phoenix) Medical Center (AZ)
Redington-Fairview General Hospital St. Thomas More Hospital (CO) Trinity Medical Center (AL) VA (San Diego) Medical Center (CA)
(ME) Sampson Regional Medical Center (NC) Tripler Army Medical Center (HI) VA (Seattle) Medical Center (WA)
Regional Health Authority Four (RHA4) Samsung Medical Center (Korea) Tufts New England Medical Center VA (Sheridan) Medical Center (WY)
(Canada) San Francisco General Hospital- Hospital (MA) VA (Tucson) Medical Center (AZ)
Regions Hospital (MN) University of California San Francisco Tulane Medical Center Hospital & Valley Health (VA)
Reid Hospital & Health Care Services (CA) Clinic (LA) Vancouver Hospital and Health
(IN) Sanford USP Medical Center (SD) Turku University Central Hospital Sciences Center (BC, Canada)
Renown Regional Medical Center (NV) SARL Laboratoire Carron (France) UCI Medical Center (CA) Vancouver Island Health Authority
Research Medical Center (MO) Saudi Aramco Medical (Saudi Arabia) UCLA Medical Center Clinical (Canada)
Riverside Regional Medical Center Scott Air Force Base (IL) Laboratories (CA) Vanderbilt University Medical Center
(VA) Scott & White Memorial Hospital (TX) UCSD Medical Center (CA) (TN)
Riyadh Armed Forces Hospital, Seoul National University Hospital UCSF Medical Center China Basin Via Christi Regional Medical Center
Sulaymainia (Korea) (CA) (KS)
Robert Wood Johnson University Seton Medical Center (CA) UMass Memorial Medical Center (MA) Virga Jessezieukenhuis (Belgium)
Hospital (NJ) Shamokin Area Community Hospital UMC of Southern Nevada (NV) ViroMed Laboratories (LabCorp) (MN)
Roxborough Memorial Hospital (PA) UNC Hospitals (NC) Virtua - West Jersey Hospital (NJ)
(PA) Sheik Kalifa Medical City (UAE) Union Clinical Laboratory (Taiwan) WakeMed (NC)
Royal Victoria Hospital (Canada) Shore Memorial Hospital (NJ) United Christian Hospital (Hong Kong) Walter Reed Army Medical Center (DC)
Rush North Shore Medical Center Shriners Hospitals for Children (SC) United Clinical Laboratories (IA) Warren Hospital (NJ)
(IL) Singapore General Hospital Unity HealthCare (IA) Washington Hospital Center (DC)
SAAD Specialist Hospital (Saudi (Singapore) Universita Campus Bio-Medico (Italy) Waterbury Hospital (CT)
Arabia) SJRMC Plymouth Laboratory (IN) Universitair Ziekenhuis Antwerpen Waterford Regional Hospital (Ireland)
Sacred Heart Hospital (FL) Sky Lakes Medical Center (OR) (Belgium) Wayne Memorial Hospital (NC)
Sacred Heart Hospital (WI) South Bend Medical Foundation (IN) University College Hospital (Ireland) Weirton Medical Center (WV)
Sahlgrenska Universitetssjukhuset South County Hospital (RI) University Medical Center at Princeton Wellstar Douglas Hospital Laboratory
(Sweden) South Dakota State Health Laboratory (NJ) (GA)
Saint Elizabeth Regional Medical (SD) University of Alabama-Birmingham Wellstar Paulding Hospital (GA)
Center (NE) South Miami Hospital (FL) Hospital (AL) Wellstar Windy Hill Hospital Laboratory
Saint Francis Hospital & Medical Southern Health Care Network University of Arkansas for Medical Sci. (GA)
Center (CT) (Australia) (AR) West China Second University Hospital,
Saint Mary's Regional Medical Southern Maine Medical Center (ME) University of Chicago Hospitals (IL) Sichuan University (P.R. China)
Center (NV) Southwest Nova District Health University of Colorado Health Sciences West Valley Medical Center Laboratory
Saints Memorial Medical Center (MA) Authority (Canada) Center (CO) (ID)
St. Agnes Healthcare (MD) Speare Memorial Hospital (NH) University of Colorado Hospital Westchester Medical Center (NY)
St. Anthony Hospital (OK) Spectrum Health - Blodgett Campus University of Iowa Hospitals and Clinics Western Baptist Hospital (KY)
St. Anthony Hospital Central Laboratory (MI) (IA) Western Healthcare Corporation
(CO) Stanford Hospital and Clinics (CA) University of Kentucky Med. Ctr. (KY) (Canada)
St. Anthony’s Hospital (FL) State of Connecticut Department of University of Maryland Medical System Wheaton Franciscan & Midwest Clinical
St. Barnabas Medical Center (NJ) Public Health (CT) (MD) Laboratories (WI)
St. Christopher’s Hospital for State of Hawaii Department of Health University of Medicine & Dentistry, NJ Wheeling Hospital (WV)
Children (PA) (HI) University Hosp. (NJ) Whitehorse General Hospital (Canada)
St. Elizabeth Community Hospital (CA) State of Washington-Public Health Labs University of Miami (FL) William Beaumont Army Medical
St. Francis Hospital (SC) (WA) University of Missouri Hospital (MO) Center (TX)
St. Francis Medical Center (MN) Steele Memorial Hospital (ID) University of MN Medical Center - William Beaumont Hospital (MI)
St. John Hospital and Medical Stillwater Medical Center (OK) Fairview William Osler Health Centre (Canada)
Center (MI) Stony Brook University Hospital (NY) University of MS Medical Center (MS) Winchester Hospital (MA)
St. John’s Hospital (IL) Stormont-Vail Regional Medical University of So. Alabama Children’s Winn Army Community Hospital (GA)
St. John’s Hospital & Health Ctr. (CA) Center (KS) and Women’s Hospital (AL) Wisconsin State Laboratory of Hygiene
St. John’s Mercy Medical Center (MO) Sudbury Regional Hospital (Canada) University of Texas Health Center (TX) (WI)
St. John’s Regional Health Center (MO) Suncoast Medical Clinic (FL) The University of Texas Medical Wishard Health Sciences (IN)
St. Joseph Medical Center (MD) Sunnybrook Health Science Center Branch (TX) Womack Army Medical Center (NC)
St. Joseph Mercy – Oakland (MI) (ON, Canada) University of the Ryukyus (Japan) Woodlawn Hospital (IN)
St. Joseph Mercy Hospital (MI) Sunrise Hospital and Medical Center University of Virginia Medical Center York Hospital (PA)
St. Joseph’s Hospital (FL) (NV) University of Washington
Swedish Medical Center (CO) UPMC Bedford Memorial (PA)
OFFICERS

Gerald A. Hoeltge, MD, President
Cleveland Clinic

Janet K.A. Nicholson, PhD, President-Elect
Centers for Disease Control and Prevention

Mary Lou Gantzer, PhD, Secretary
Siemens Medical Solutions Diagnostics

W. Gregory Miller, PhD, Treasurer
Virginia Commonwealth University

Robert L. Habig, PhD, Immediate Past President
Habig Regulatory Consulting

Glen Fine, MS, MBA, Executive Vice President

BOARD OF DIRECTORS

Maria Carballo, Health Canada
Russel K. Enns, PhD, Cepheid
Prof. Naotaka Hamasaki, MD, PhD, Nagasaki International University
Valerie Ng, PhD, MD, Alameda County Medical Center/Highland General Hospital
Luann Ochs, MS, BD Diagnostics – TriPath
Timothy J. O’Leary, MD, PhD, Department of Veterans Affairs
Robert Rej, PhD, New York State Department of Health
Donald St.Pierre, FDA Center for Devices and Radiological Health
Michael Thein, PhD, Roche Diagnostics GmbH
James A. Thomas, ASTM International

940 West Valley Road, Suite 1400, Wayne, PA 19087 USA
PHONE 610.688.0100 FAX 610.688.0700
E-MAIL: [email protected] WEBSITE: www.clsi.org
ISBN 1-56238-671-9
