# ASTM E3080-17 14.05

Designation: E3080 − 17

An American National Standard

Standard Practice for Regression Analysis¹

This standard is issued under the fixed designation E3080; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

## 1. Scope

1.1 This practice covers regression analysis methodology for estimating, evaluating, and using the simple linear regression model to define the statistical relationship between two numerical variables.

1.2 The system of units for this practice is not specified. Dimensional quantities in the practice are presented only as illustrations of calculation methods. The examples are not binding on products or test methods treated.

1.3 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.

1.4 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.

## 2. Referenced Documents

2.1 ASTM Standards:²

- E178 Practice for Dealing With Outlying Observations
- E456 Terminology Relating to Quality and Statistics
- E2586 Practice for Calculating and Using Basic Statistics

¹ This practice is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.10 on Sampling / Statistics. Current edition approved Nov. 1, 2017. Published January 2018. Originally approved in 2016. Last previous edition approved in 2016 as E3080 – 16. DOI: 10.1520/E3080-17.

² For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard's Document Summary page on the ASTM website.

Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

## 3. Terminology

3.1 Definitions: Unless otherwise noted, terms relating to quality and statistics are as defined in Terminology E456.

3.1.1 coefficient of determination, $r^2$, n: square of the correlation coefficient.

3.1.2 degrees of freedom, n: the number of independent data points minus the number of parameters that have to be estimated before calculating the variance. **E2586**

3.1.3 residual, n: observed value minus fitted value, when a model is used.

3.1.4 predictor variable, X, n: a variable used to predict a response variable using a regression model.

3.1.4.1 Discussion: Also called an independent or explanatory variable.

3.1.5 regression analysis, n: a statistical procedure used to characterize the association between two numerical variables for prediction of the response variable from the predictor variable.

3.1.6 response variable, Y, n: a variable predicted from a regression model.

3.1.6.1 Discussion: Also called a dependent variable.

3.1.7 sample correlation coefficient, r, n: a dimensionless measure of association between two variables estimated from the data.

3.1.8 sample covariance, $s_{XY}$, n: an estimate of the association of the response variable and predictor variable calculated from the data.

3.2 Definitions of Terms Specific to This Standard:

3.2.1 intercept, n: of a regression model, $\beta_0$, the value of the response variable when the predictor variable is zero.

3.2.2 regression model parameter, n: a descriptive constant defining a regression model that is to be estimated.

3.2.3 residual standard deviation, n: of a regression model, $\sigma$, the square root of the residual variance.

3.2.4 residual variance, n: of a regression model, $\sigma^2$, the variance of the residuals (see residual).

3.2.5 slope, n: of a regression model, $\beta_1$, the incremental change in the response variable due to a unit change in the predictor variable.

3.3 Symbols:

- $b_0$ = intercept estimate (5.2.2)
- $b_1$ = slope estimate (5.2.2)
- $\beta_0$ = intercept parameter in model (5.1.2)
- $\beta_1$ = slope parameter in model (5.1.2)
- $E$ = general point estimate of a parameter (5.4.2)
- $e_i$ = residual for data point i (5.2.5)
- $\varepsilon$ = residual parameter in model (5.1.3)
- $F$ = F statistic (X1.3.2)
- $h$ = index for any value in data range (5.4.5)
- $i$ = index for a data point (5.2.1)
- $n$ = number of data points (5.2.1)
- $r$ = sample correlation coefficient (5.3.2.1)
- $r^2$ = coefficient of determination (5.3.2.2)
- $S(b_0, b_1)$ = sum of squared deviations of $Y_i$ to the regression line (X1.1.2)
- $s_{b_1}$ = standard error of slope estimate (5.4.3)
- $s_{b_0}$ = standard error of intercept estimate (5.4.4)
- $s_E$ = general standard error of a point estimate (5.4.2)
- $\sigma$ = residual standard deviation (5.1.3)
- $s$ = estimate of $\sigma$ (5.2.6)
- $\sigma^2$ = residual variance (5.1.3)
- $s^2$ = estimate of $\sigma^2$ (5.2.6)
- $s_X^2$ = variance of X data (X1.2.1)
- $s_Y^2$ = variance of Y data (X1.2.1)
- $S_{XX}$ = sum of squares of deviations of X data from average (5.2.3)
- $S_{XY}$ = sum of cross products of X and Y from their averages (5.2.3)
- $s_{XY}$ = sample covariance of X and Y (X1.2.1)
- $s_{\hat{Y}_h}$ = standard error of $\hat{Y}_h$ (5.4.5)
- $s_{\hat{Y}_{h(\mathrm{ind})}}$ = standard error of future individual Y value (5.4.6)
- $S_{YY}$ = sum of squares of deviations of Y data from average (5.2.3)
- $t$ = Student's t distribution (5.4.2)
- $X$ = predictor variable (5.1.1)
- $\bar{X}$ = average of X data (5.2.3)
- $X_h$ = general value of X in its range (5.4.5)
- $X_i$ = value of X for data point i (5.2.1)
- $Y$ = response variable (5.1.1)
- $\bar{Y}$ = average of Y data (5.2.3)
- $\hat{Y}_{h(\mathrm{ind})}$ = predicted future individual Y for a value $X_h$ (5.4.6)
- $Y_i$ = value of Y for data point i (5.2.1)
- $\hat{Y}_h$ = predicted value of Y for any value $X_h$ (5.4.5)
- $\hat{Y}_i$ = predicted value of Y for data point i (5.2.4)

3.4 Acronyms:

3.4.1 ANOVA, n: Analysis of Variance

3.4.2 df, n: Degrees of Freedom

3.4.3 LOF, n: Lack of Fit

3.4.4 MS, n: Mean Square

3.4.5 MSE, n: Mean Square Error

3.4.6 MSR, n: Mean Square Regression

3.4.7 MST, n: Mean Square Total

3.4.8 PE, n: Pure Error

3.4.9 SS, n: Sum of Squares

3.4.10 SSE, n: Sum of Squares Error

3.4.11 SSR, n: Sum of Squares Regression

3.4.12 SST, n: Sum of Squares Total

## 4. Significance and Use

4.1 Regression analysis is a statistical procedure that studies the statistical relationships between two or more variables (**1**, **2**).³ In general, one of these variables is designated as a response variable and the rest of the variables are designated as predictor variables. The objective of the model is then to predict the response from the predictor variables.

4.1.1 This standard considers a numerical response variable and only a single numerical predictor variable.

4.1.2 The regression model consists of: (1) a mathematical function that relates the mean values of the response variable distribution to fixed values of the predictor variable, and (2) a description of the statistical distribution that describes the variability in the response variable at fixed levels of the predictor variable.

4.1.3 The regression procedure utilizes experimental or observational data to estimate the parameters defining a regression model and their precision.
Diagnostic procedures are utilized to assess the resulting model fit and can suggest other models for improved prediction performance.

4.1.4 The regression model can be useful for developing process knowledge through description of the variable relationship, in making predictions of future values, and in developing control methods for the process generating values of the variables.

4.2 Section 5 in this standard deals with the simple linear regression model using a straight line mathematical relationship between the two variables where variability of the response variable over the range of values of the predictor variable is described by a normal distribution with constant variance. Appendix X1 provides supplemental information.

## 5. Simple Linear Regression Analysis

5.1 Simple Linear Regression Model:

5.1.1 Select the response variable Y and the predictor variable X. The predictor X is assumed to have known values with little or no measurement error. The response Y has a distribution of values for a given X value, and this distribution is defined for all X values in a given range.

5.1.2 The regression function for the straight line relationship is $Y = \beta_0 + \beta_1 X$. The two parameters for the function are the intercept $\beta_0$ and the slope $\beta_1$. The intercept is the value of Y when X = 0, but this parameter may not be of practical interest when the range of X is far removed from zero. The slope is the amount of incremental change in Y units for a unit change in X.

5.1.3 The statistical distribution for Y is assumed to be a normal (Gaussian) distribution having a mean of $\beta_0 + \beta_1 X$ with a standard deviation $\sigma$. The simple linear regression model is then stated as $Y = \beta_0 + \beta_1 X + \varepsilon$, where $\varepsilon$ is a random error that is normally distributed with mean zero and standard deviation $\sigma$ (variance $\sigma^2$).

5.1.4 An example of a linear regression model is depicted in Fig. 1 over a range of X from 0 to 40 X units.
Normal distributions of response Y with $\sigma$ = 1.3 Y units are depicted at X = 10, 20, and 30 X units.

³ The boldface numbers in parentheses refer to a list of references at the end of this standard.

5.2 Estimating Regression Model Parameters:

5.2.1 The model parameters $\beta_0$ and $\beta_1$ are estimated from a sample of data consisting of n pairs of values designated as $(X_i, Y_i)$, with the sample number i ranging from 1 through n. The data can arise in two different ways. Observational data consist of X and Y values measured on a set of n random samples. Experimental data consist of Y values measured on n experimental units with X values set at fixed values. In both cases the Y values may have measurement error, but the X values are assumed known with negligible measurement error.

5.2.2 The regression line parameters $\beta_0$ and $\beta_1$ are estimated by the method of least squares, which finds their corresponding estimates $b_0$ and $b_1$ that minimize the sum of the squares of the vertical distances between the $Y_i$ values and their respective line values at $X_i$. (For a further discussion of the least squares method, see X1.1.2.)

5.2.3 Calculate the following statistics from the X and Y values in the data set.

5.2.3.1 Calculate the averages of X and Y:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \tag{1}$$

$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n} \tag{2}$$

5.2.3.2 Calculate the sums of squared deviations $S_{XX}$ and $S_{YY}$ of X and Y from their respective averages and the sum of cross products $S_{XY}$ of the X and Y deviations from their averages:

$$S_{XX} = \sum_{i=1}^{n} (X_i - \bar{X})^2 \tag{3}$$

$$S_{YY} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \tag{4}$$

$$S_{XY} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \tag{5}$$

$S_{XX}$ is a known fixed constant.
$S_{YY}$ and $S_{XY}$ are random variables.

5.2.3.3 The least squares solution gives the parameter estimates:

$$b_1 = S_{XY} / S_{XX} \tag{6}$$

$$b_0 = \bar{Y} - b_1 \bar{X} \tag{7}$$

[$S_{YY}$ is not used here but will be used in subsequent sections.]

5.2.4 The fitted values $\hat{Y}_i$ for each data point $Y_i$ are calculated from the estimated regression function as:

$$\hat{Y}_i = b_0 + b_1 X_i \tag{8}$$

5.2.5 The residual $e_i$ is the difference between the response data point $Y_i$ and its fitted value $\hat{Y}_i$:

$$e_i = Y_i - \hat{Y}_i \tag{9}$$

Graphically, residuals are the vertical distances on the scatter plot between the response data points $Y_i$ and the estimated regression line.

5.2.6 The estimate $s^2$ of the variance $\sigma^2$ of the Y distribution is calculated as the sum of the squared residuals divided by their degrees of freedom, and the estimate $s$ of the standard deviation $\sigma$ is its square root:

$$s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2} = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2} \tag{10}$$

$$s = \sqrt{s^2} \tag{11}$$

These estimates have n – 2 degrees of freedom because of prior estimation of two parameters, the slope and intercept of the line, which removed two degrees of freedom from the data set of n data points prior to calculation of the residuals.

FIG. 1 Graphical Depiction of a Straight Line Regression Model

5.2.7 Regression Analysis Procedure with Example: The steps in the regression analysis procedure for the simple linear model, which are illustrated in the example below, are as follows:

(1) Choose the predictor variable X and response variable Y.

(2) Obtain data pairs of X and Y from available data or by conducting an experiment.

(3) Evaluate the distribution of the predictor variable and the XY relationship using plots.

(4) If the model is supported by the data plots, estimate the model parameters from the data.

(5) Evaluate the fitted model against the model assumptions.

(6) Use the regression model for future prediction of Y from X.

5.2.7.1 A data set from Duncan (**3**) lists measurements of shear strength (inch-pounds) and weld diameter (mils) measured on 10 random test specimens, so this is an observational data set with n = 10 pairs.
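The calculations in 5.2.3 through 5.2.6 can be sketched in a few lines of code. The following is a minimal illustration only, not part of the practice: the function name `fit_simple_linear`, the returned dictionary keys, and the five data pairs are all hypothetical and were chosen for easy hand checking. The data are not the weld measurements of Table 1.

```python
import math


def fit_simple_linear(x, y):
    """Least squares fit of Y = b0 + b1*X, following Eq. (1) through (11)."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 3:
        raise ValueError("need at least 3 pairs so that n - 2 > 0")

    x_bar = sum(x) / n                                   # Eq. (1)
    y_bar = sum(y) / n                                   # Eq. (2)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)            # Eq. (3)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)            # Eq. (4), not used in the fit
    s_xy = sum((xi - x_bar) * (yi - y_bar)
               for xi, yi in zip(x, y))                  # Eq. (5)
    if s_xx == 0:
        raise ValueError("all X values are equal; the slope is undefined")

    b1 = s_xy / s_xx                                     # Eq. (6)
    b0 = y_bar - b1 * x_bar                              # Eq. (7)
    fitted = [b0 + b1 * xi for xi in x]                  # Eq. (8)
    residuals = [yi - fi for yi, fi in zip(y, fitted)]   # Eq. (9)
    s2 = sum(e ** 2 for e in residuals) / (n - 2)        # Eq. (10)
    s = math.sqrt(s2)                                    # Eq. (11)

    return {"b0": b0, "b1": b1, "s_xx": s_xx, "s_yy": s_yy, "s_xy": s_xy,
            "fitted": fitted, "residuals": residuals, "s2": s2, "s": s}


# Hypothetical illustration data (NOT the Table 1 measurements).
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]
result = fit_simple_linear(x_demo, y_demo)
print(f"b0 = {result['b0']:.4f}, b1 = {result['b1']:.4f}, s = {result['s']:.4f}")
```

A useful sanity check on any such implementation is that the least squares residuals sum to zero (up to rounding), which follows from Eq. (7).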
Regression analysis will be used to investigate the relationship between weld diameter and shear strength, with the objective of predicting shear strength Y from weld diameter X. The weld diameters are considered to be measured with small error. The data are listed in Table 1.

5.2.7.2 A dot plot of the X data is shown as Fig. 2. The plot indicates that the data are spread fairly evenly across the range of 190–270 mils and that some of the parts have the same diameter.

5.2.7.3 A scatter plot of the data is recommended as a first or concurrent step for a visual look at the relationship, and most computer packages have this as an option. This is a plot of Y (on the vertical axis) versus X (on the horizontal axis) for each data pair. If a straight line relationship exists, the cluster of points will appear to be elongated in a particular direction along a straight line, and the plot will visually reveal any curvature or any other deviations from a straight line relationship, as well as any outlying data points. The estimated regression line can also be included on the plot to give a visual impression of the fit of the model to the data.

The scatter plot for this example is shown in Fig. 3. The shear strength appears to be increasing in a linear fashion with weld diameter. There is some scatter but no apparent outlying data points.

5.2.7.4 The calculations, with equation numbers for each calculation, are shown in Table 1. The averages of X and Y are respectively 223.9 mils and 975.0 inch-pounds. The deviations of X and Y from their averages are listed for each observation, and these are used to calculate values of the statistics $S_{XX}$, $S_{YY}$, and $S_{XY}$. The least squares estimates of the slope and intercept are calculated, resulting in the estimated model equation giving fitted values $\hat{Y}_i = -569.47 + 6.898\,X_i$, and these values are listed for each observation. The residuals $e_i = Y_i - \hat{Y}_i$ are also listed for each observation.
Estimates of the variance and standard deviation of the Y distribution are calculated from the squares of the residuals. The estimated standard deviation is 99.90 inch-pounds.

5.2.7.5 The least squares straight line is depicted wi