# ASTM D7372-17

Designation: D7372 − 17

An American National Standard

Standard Guide for Analysis and Interpretation of Proficiency Test Program Results¹

This standard is issued under the fixed designation D7372; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

1. Scope*

1.1 This guide covers the evaluation and interpretation of proficiency test program (PTP) results. For proficiency test program participants, this guide describes procedures for assessing participants' results relative to the collective PT program results and for potentially improving the laboratory's testing performance based on the findings and insights of that assessment. For the committees responsible for the test methods included in PT programs, this guide describes procedures for assessing industry's ability to perform test methods and for potentially identifying opportunities for improvement.

1.2 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.

1.3 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.

2. Referenced Documents

2.1 ASTM Standards:²

- D6259 Practice for Determination of a Pooled Limit of Quantitation for a Test Method
- D6299 Practice for Applying Statistical Quality Assurance and Control Charting Techniques to Evaluate Analytical Measurement System Performance
- D6617 Practice for Laboratory Bias Detection Using Single Test Result from Standard Material
- D6792 Practice for Quality Management Systems in Petroleum Products, Liquid Fuels, and Lubricants Testing Laboratories
- E177 Practice for Use of the Terms Precision and Bias in ASTM Test Methods
- E456 Terminology Relating to Quality and Statistics
- E2655 Guide for Reporting Uncertainty of Test Results and Use of the Term Measurement Uncertainty in ASTM Test Methods

2.2 ASTM standards used only in Appendix X3 are also listed in X3.1.

3. Terminology

3.1 Definitions:

3.1.1 accuracy, n—closeness of agreement between an observed value and an accepted reference value. E177, E456

3.1.2 analytical measurement system, n—a collection of one or more components or subsystems, such as sample handling and preparation, test equipment, instrumentation, display devices, data handlers, printouts or output transmitters, that are used to determine a quantitative value of a specific property for an unknown sample in accordance with a standard test method.

3.1.3 assignable cause, n—factor that contributes to variation and that is feasible to detect and identify. E456

3.1.4 bias, n—systematic error that contributes to the difference between a population mean of the measurements or test results and an accepted reference or true value. E177, E456

3.1.5 control limits, n—limits on a control chart that are used as criteria for signaling the need for action or for judging whether a set of data does or does not indicate a state of statistical control. E456

3.1.6 in-statistical-control, adj—process, analytical measurement system, or function that exhibits variations that can only be attributed to common cause. D6299

3.1.7 out-of-statistical-control, adj—a process, analytical measurement system, or function that exhibits variations in addition to those that can be attributed to common cause, and the magnitude of these additional variations exceeds specified limits. D6299

¹ This guide is under the jurisdiction of ASTM Committee D02 on Petroleum Products, Liquid Fuels, and Lubricants and is the direct responsibility of Subcommittee D02.94 on Coordinating Subcommittee on Quality Assurance and Statistics. Current edition approved Oct. 1, 2017. Published October 2017. Originally approved in 2007. Last previous edition approved in 2012 as D7372 – 12. DOI: 10.1520/D7372-17.

² For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard's Document Summary page on the ASTM website.

* A Summary of Changes section appears at the end of this standard.

Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

3.1.8 proficiency testing, n—determination of a laboratory's testing capability by participation in an interlaboratory proficiency test program. D6299

3.1.9 proficiency test program (PTP), n—statistical quality assurance activities that enable laboratories to assess their performance in conducting test methods within their own laboratory when their data are compared against other laboratories that participate in the same program cycle using the same test method.

3.1.9.1 Discussion—Proficiency test programs are also known as crosscheck programs and check schemes. The term Interlaboratory Crosscheck Program (ILCP) was previously used by ASTM for its PTP with Committee D02.

3.1.10 test performance index—industry ($\mathrm{TPI}_{\mathrm{IND}}$), n—an approximate measure of a PT program's testing capability for a specific test method, defined as the ratio of the ASTM reproducibility ($R_{\mathrm{ASTM}}$) to these data reproducibility ($R_{\text{these data}}$):

$$\mathrm{TPI}_{\mathrm{IND}} = \frac{R_{\mathrm{ASTM}}}{R_{\text{these data}}}$$

3.1.11 uncertainty, n—an indication of the magnitude of error associated with a value that takes into account both systematic errors and random errors associated with the measurement or test process. E2655

3.1.12 Z-score, n—standardized and dimensionless measure of the difference between an individual result in a data set and the arithmetic mean of the data set, re-expressed in units of standard deviation of the data set (by dividing the actual difference from the mean by the standard deviation for the data set). D6299

3.1.12.1 Discussion—The Z-score term described here is equivalent to Eq. A1.3 in Practice D6299.

3.1.13 Z′-score, n—measure similar to the Z-score except that the PT program standard deviation is replaced with one that takes into account the site precision of the laboratory. Z′ is a valid approach when the laboratory's site precision standard deviation is less than that for the PT program (that is, these data standard deviation) or, stated otherwise, when TPI > 1.

$$Z' = \frac{X_i - \bar{X}}{\sqrt{\sigma_{R'}^{2} + \dfrac{\sigma_{\text{these data}}^{2}}{n}}}$$

where:

- $Z'$ = site precision adjusted Z-score,
- $X_i$ = laboratory's result,
- $\bar{X}$ = PT average value,
- $\sigma_{R'}$ = site precision standard deviation estimate,
- $\sigma_{\text{these data}}$ = PT program standard deviation estimate, and
- $n$ = number of non-outlier data.

3.1.13.1 Discussion—The Z′-score described here is equivalent to Eq. 2 in Practice D6299 for pre-treated results, when the "standard error of ARV" is expressed as "standard deviation of ARV/√n."

3.2 Definitions of Terms Specific to This Standard:

3.2.1 common (chance, random) cause, n—for quality assurance programs, one of generally numerous factors, individually of relatively small importance, that contributes to variation, and that is not feasible to detect or control. D6299

3.2.2 site precision (R′), n—value below which the absolute difference between two individual test results obtained under site precision conditions may be expected to occur with a probability of approximately 0.95 (95 %). It is calculated as 2.77 times the standard deviation of results obtained under site precision conditions. D6299

3.2.3 site precision conditions, n—conditions under which test results are obtained by one or more operators in a single site location practicing the same test method on a single measurement system, which may comprise multiple instruments, using test specimens taken at random from the same sample of material, over an extended period of time spanning at least a 15-day interval. D6299

3.2.4 these data, n—term used by the ASTM International D02 PT program to identify statistical results calculated from the data submitted by program participants.

3.3 Symbols:

3.3.1 I—individual observation (as in I-chart).

3.3.2 PTP or PT program—proficiency test program.

3.3.3 QC—quality control.

3.3.4 R′—site precision.

3.3.5 $R_{\text{these data}}$—reproducibility determined in PT program.

3.3.6 $r_{\text{these data}}$—repeatability determined in PT program.

3.3.7 $R_{\mathrm{ASTM}}$—published ASTM reproducibility.

4. Summary of Guide

4.1 Petroleum product, liquid fuel, and lubricant samples are regularly analyzed by specified standard test methods as part of a proficiency test program. This guide provides a laboratory with the tools and procedures for evaluating its results from a PT program. Techniques are presented to screen, plot, and interpret test results in accordance with industry-accepted practices.

5. Significance and Use

5.1 This guide can be used to evaluate the performance of a laboratory or group of laboratories participating in a proficiency test (PT) program involving petroleum and petroleum products.

5.2 Data accrued using the techniques included in this guide provide the ability to monitor analytical measurement system precision and bias. These data are useful for updating standard test methods, as well as for indicating areas of potential measurement system improvement for action by the laboratory. This guide serves both the individual participating laboratory and the responsible standards development group as follows:

5.2.1 Tools and Approaches for Participating Laboratories:

- Administrative Reviews
- Flagged Data and Investigations
- Data Normality Checks
- QQ Plots
- Histograms
- Bias (Deviation from Mean)
- Z-Scores, Z′-Scores Trends
- Precision Performance—$\mathrm{TPI}_{\mathrm{IND}}$, F-test
- Comparison of PTP and Individual Laboratory Site Precision

5.2.2 Tools and Approaches for Responsible Standards Development Groups:

- TPI and precision trends
- Bias and precision comparisons via box and […]

[…] includes the mean and the 1st and 99th percentile limits on the histogram for data sets with n […] 100. These limits are based on "median ± 2.33 · Standard Deviation," where −2.33 and +2.33 are, respectively, the 1st and 99th percentiles of the standard normal distribution.

6.5.2 PT program participants should review histograms when available and note unusual data distributions. Participants should locate where their result falls within the histogram bins. Depending on the histogram, the location of data in certain bins could indicate a potential issue such as bias. Consider reviewing the histogram in parallel with corresponding statistics such as the Z-score, AD statistic, TPI (Industry), and the normal probability (or deviate) plot.
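The "median ± 2.33 · Standard Deviation" limits described above can be sketched in a few lines of Python. This is a minimal illustration, not the PT program's own computation: it assumes the standard deviation is the ordinary sample standard deviation of the supplied (non-outlier) results, and the helper names `percentile_limits` and `locate` are hypothetical.

```python
import statistics


def percentile_limits(results, k=2.33):
    """Approximate 1st and 99th percentile limits as median -/+ k * standard deviation.

    k = 2.33 is the rounded 99th percentile of the standard normal distribution,
    so median - k*SD approximates the 1st percentile and median + k*SD the 99th.
    Assumption: SD is the ordinary sample standard deviation (n - 1 denominator).
    """
    med = statistics.median(results)
    sd = statistics.stdev(results)
    return med - k * sd, med + k * sd


def locate(result, results, k=2.33):
    """Hypothetical helper: report where one laboratory result falls relative to the limits."""
    lower, upper = percentile_limits(results, k)
    if result < lower:
        return "below 1st percentile limit"
    if result > upper:
        return "above 99th percentile limit"
    return "within limits"


# Illustrative, made-up cycle results and one laboratory's result.
cycle = [9.0, 10.0, 11.0]
print(percentile_limits(cycle), locate(13.0, cycle))
```

A result outside these limits is only a prompt to look more closely, in the same way the guide suggests reading the histogram alongside the Z-score and the normal probability plot.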
See X3.2 for examples.

6.6 Single Laboratory Bias (Deviation from Mean):

6.6.1 As mentioned in Practice D6299, subsection 7.6, it is appropriate to evaluate proficiency test results by plotting the signed deviations from the mean for each result for each test cycle. Practice D6299 suggests plotting the signed deviations on control charts. Laboratories would then apply the strategies outlined in that standard to identify outliers and other issues such as long-term biases. The recommended control chart is a chart of individual observations (called an I-chart) with an exponentially weighted moving average (EWMA) overlaid on the data. See X3.3 for examples.

6.6.2 Another graphical approach for monitoring bias involves the use of box and whisker graphs. As with histograms, laboratories should use box and whisker graphs to observe where their particular result lies relative to the general distribution of results for the test method they used. Consider investigating any data outside the whisker ends, if those data were not already flagged for other causes. A review of the apparent distribution of results for each test method measuring the same parameter may provide valuable insight regarding overall biases between methods. See 7.2 for more information on box and whisker plots.

6.6.3 Another statistical approach for evaluating bias is described in Practice D6617. That practice estimates whether or not a single test result is biased compared to the consensus value from the PT program.

6.7 Z-score, Z′-score Trends—The Z-score or Z′-score, or both, calculated for each datum submitted by the laboratory should be reviewed with respect to the following:

6.7.1 Sign and Magnitude of Z-score—The sign (that is, "+" or "–") of the statistic reflects the relative bias of the individual result versus the mean of the sample group (standardized to the standard deviation of that data set). Z-score values falling in the ranges of plus or minus 0 to 1, 1 to 2, 2 to 3, and beyond 3 can be compared to control chart values falling in the ranges between the mean and 1-sigma, 1 to 2-sigma, 2 to 3-sigma, and beyond 3-sigma. For normally distributed data, about 68 % of the data are expected to lie in the –1 sigma to +1 sigma range, about 95 % in the –2 sigma to +2 sigma range, and about 99.7 % in the –3 sigma to +3 sigma range. The further a laboratory's Z-score is from zero, the greater the relative bias and the lower the probability that the result is consistent with a state of statistical control. Conduct investigations to determine the cause of any perceived bias as needed.

6.7.2 Z-score and/or Z′-score Trends Using Data from Multiple PTP Cycles—Collect the Z-score or Z′-score values for each test method (parameter) for successive PT program cycles on a control chart to show the trend over time. Plotting Z-scores or Z′-scores is more practical than plotting the signed deviations from the mean (as in 6.2.1), especially when the magnitude of the means can vary considerably from PT cycle to cycle. It is recommended to use the run rules promulgated in Practice D6299 to evaluate any observed trends. Conduct investigations to determine causes as needed. According to Practice D6299, Z-score and Z′-score data for a PT program cycle and test method parameter are acceptable for trend analysis via control charts when two conditions are met: first, there are at least 16 non-outlier data for the parameter, and second, the PT cycle standard deviation is not statistically greater than the reproducibility standard deviation for the test method (see F-test).

6.7.3 Average Z-score and Average Z′-score—Calculate the average Z-score or Z′-score for a series over a selected time period. The sign and magnitude of this result are an indication of the long-term relative bias.
Conduct investigations to determine the cause of any perceived bias as needed.

6.8 Precision Performance:

6.8.1 TPI (Industry)—Assess the general capability of a test method using $\mathrm{TPI}_{\mathrm{IND}}$ alone or along with other tool