  | pdf    = http://archives.datapages.com/data/specpubs/methodo1/images/a095/a0950001/0300/03430.pdf
}}
 
[[File:Correlation-and-regression-analysis fig1.png|300px|thumb|{{figure number|1}}Linear regression of x-on-y. Note the negative slope corresponding to a negative correlation. The regression line is determined so as to minimize the sum of squared deviations: <math>\sum_i{e_i^2}</math>]]
 
Correlation analysis, and its cousin, regression analysis, are well-known statistical approaches used in the study of relationships among multiple physical properties. The investigation of [[permeability]]-[[porosity]] relationships is a typical example of the use of correlation in geology.
 
The term ''correlation'' most often refers to the linear association between two quantities or variables, that is, the tendency for one variable, x, to increase or decrease as the other, y, increases or decreases, in a straight-line trend or relationship.<ref name=Draper_etal_1966>Draper, N. R., and H. Smith, 1966, Applied regression analysis, 2nd ed.: New York, John Wiley, 709 p.</ref> <ref name=Snedecor_etal_1967>Snedecor, G. W., and W. G. Cochran, 1967, Statistical methods, 6th ed.: Ames, Iowa State Univ. Press, 593 p.</ref> The ''correlation coefficient'' (also called the Pearson correlation coefficient), r, is a dimensionless numerical index of the strength of that relationship. The sample value of r, which can range from -1 to +1, is computed using the following formula:
 
:<math>r = \frac{\displaystyle \sum_{i} (x_{i}-\bar{x})(y_{i}-\bar{y})}{\sqrt{\displaystyle \sum_i (x_{i}-\bar{x})^2 \times \displaystyle \sum_i (y_{i}-\bar{y})^2}}</math>
    
where the summation is made over the n sample values available and where
 
Whereas correlation describes the linear association among variables, regression involves the prediction of one quantity from the others. Regression analysis is that broad class of statistics and statistical methods that comprises line, curve, and surface fitting, as well as other kinds of prediction and modeling techniques.
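
The sample correlation coefficient defined earlier can be evaluated directly from its formula. The Python function below is a minimal sketch rather than a reference implementation: the name <code>pearson_r</code> is hypothetical, and it assumes two equal-length lists of paired measurements (for example, porosity and the logarithm of permeability) with no missing values.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient r of two paired sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-products of deviations from the sample means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)
```

For instance, <code>pearson_r([1, 2, 3, 4], [1, 3, 2, 4])</code> returns 0.8. Because r measures only linear association, a value near zero does not rule out a strong nonlinear relationship between the two variables.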
 
The simplest type of regression analysis involves fitting a straight line to paired observations of two variables ([[:file:Correlation-and-regression-analysis fig1.png|Figure 1]]). In this case, one of the quantities is called the ''independent or predictor variable'' (usually denoted x), while the other is called the ''dependent or predicted variable'' (usually denoted y). This approach is often referred to as ''simple linear regression'' or y-on-x regression. It yields an empirical straight-line relationship between the two variables of the following form:
    
:<math>\widehat{y} = ax + b</math>
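
The coefficients that minimize the sum of squared deviations have a closed form: the slope is <math>a = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2}</math> and the intercept is <math>b = \bar{y} - a\bar{x}</math>, so the fitted line always passes through the point of means <math>(\bar{x}, \bar{y})</math>. The following Python sketch of this y-on-x fit is illustrative only; the function names are hypothetical, and it assumes paired numeric lists containing at least two distinct x values.

```python
def fit_line(x, y):
    """Least-squares y-on-x line y_hat = a*x + b; returns (a, b).

    a is the slope and b the intercept, as in the equation above.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx          # slope minimizing the sum of squared residuals
    b = y_bar - a * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


def residuals(x, y, a, b):
    """Deviations e_i = y_i - y_hat_i of the observations from the fitted line."""
    return [yi - (a * xi + b) for xi, yi in zip(x, y)]
```

For example, <code>fit_line([1, 2, 3, 4], [1, 3, 2, 4])</code> gives a slope of 0.8 and an intercept of 0.5. The residuals of a least-squares line fitted with an intercept always sum to zero, which is a quick check on any implementation.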
 
Some important extensions of two-variable linear regression analysis of particular interest to geologists include the following:
 
* Fitting through the origin, or forcing the regression line through any fixed point
* Regression in reverse, which consists of rewriting the regression of y-on-x as
:<math>x = \frac{y - b}{a}</math>
* Modeling nonlinear relationships, such as
:<math>\widehat{y} = a + bx + cx^2</math>
    
Various other functions of the x variable can be included in the previous relationship, such as polynomials and logarithms. Again, the regression parameters are determined so as to minimize the corresponding error variance.
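
The first two extensions in the list above can be sketched in a few lines. Forcing the line through a fixed point <math>(x_0, y_0)</math> amounts to fitting the slope to deviations measured from that point rather than from the means, which gives <math>a = \frac{\sum_i (x_i - x_0)(y_i - y_0)}{\sum_i (x_i - x_0)^2}</math> and <math>b = y_0 - a x_0</math>; fitting through the origin is the special case <math>x_0 = y_0 = 0</math>. Regression in reverse simply solves the fitted equation for x. The Python below is an illustrative sketch, and the function and parameter names are hypothetical.

```python
def fit_line_through_point(x, y, x0=0.0, y0=0.0):
    """Least-squares line y_hat = a*x + b forced through the fixed point (x0, y0).

    With the defaults x0 = y0 = 0 this is regression through the origin (b = 0).
    Returns (a, b).
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be paired and non-empty")
    # Deviations measured from the fixed point rather than from the means
    dx = [xi - x0 for xi in x]
    dy = [yi - y0 for yi in y]
    sxx = sum(d * d for d in dx)
    if sxx == 0:
        raise ValueError("every x value equals x0, so the slope is undetermined")
    a = sum(p * q for p, q in zip(dx, dy)) / sxx
    b = y0 - a * x0
    return a, b


def reverse_predict(y, a, b):
    """Regression in reverse: solve y = a*x + b for x, i.e. x = (y - b) / a."""
    if a == 0:
        raise ValueError("a zero slope cannot be inverted")
    return (y - b) / a
```

Inverting the y-on-x line is not the same as fitting an x-on-y regression by least squares: the two lines coincide only when |r| = 1.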
 
==Multiple and multivariate regression==
 
The most important extension of the two-variable case is to situations involving more than two variables. When there is still one dependent variable but several predictor variables, the fitting technique is called ''multiple linear regression.'' When there is also more than one dependent variable, the approach is called ''multivariate regression'' (see [[Multivariate data analysis]]). The methods of simple bivariate regression extend directly to these multivariate situations. A typical geological application of multiple regression is the prediction of [[fold]] thickness from various geometric attributes, as given by the following equation:
:<math>\text{Thickness } = a+b~(\text{attitude}) + c~(\text{tightness}) + d~(\text{asymmetry}) </math>
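
The coefficients a, b, c, and d in the fold-thickness equation are again chosen to minimize the sum of squared residuals, which leads to the normal equations <math>(\mathbf{X}^T\mathbf{X})\,\beta = \mathbf{X}^T\mathbf{y}</math>, where each row of X holds a 1 (for the intercept) followed by one observation's predictor values. The standard-library Python sketch below forms and solves these equations by Gaussian elimination; the function names are hypothetical, and no actual fold measurements are implied.

```python
def solve(A, rhs):
    """Solve the square system A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | rhs], copied so the caller's lists are not modified
    M = [[float(v) for v in row] + [float(r_i)] for row, r_i in zip(A, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations (collinear predictors?)")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the entries below the pivot
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def multiple_regression(predictors, y):
    """Least-squares coefficients [a, b, c, ...] of y_hat = a + b*p1 + c*p2 + ...

    predictors: one list of predictor values per observation, e.g. the
    hypothetical rows [attitude, tightness, asymmetry] for the fold example.
    """
    if len(predictors) != len(y):
        raise ValueError("need one row of predictors per observation")
    # Design matrix X: a leading 1 for the intercept, then the predictors
    X = [[1.0] + [float(p) for p in row] for row in predictors]
    k = len(X[0])
    # Normal equations (X^T X) beta = X^T y
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)
```

Because polynomial regression is a special case of multiple regression, the same function fits the quadratic <math>\widehat{y} = a + bx + cx^2</math> when each row is <code>[x, x * x]</code>: <code>multiple_regression([[x, x * x] for x in xs], ys)</code> returns <code>[a, b, c]</code>. For multivariate regression, the same design matrix is reused for each dependent variable in turn. Forming <math>\mathbf{X}^T\mathbf{X}</math> can be numerically ill-conditioned when predictors are nearly collinear, which is why library least-squares routines usually rely on QR or singular value decompositions instead.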
    
''Polynomial regression,'' sometimes referred to as ''curve fitting,'' is a special type of multiple regression frequently used in the earth sciences to model nonlinear relationships. Spline fitting is another type of nonlinear curve fitting. A close relative of polynomial curve fitting is surface fitting, which has one or more spatial coordinates as predictor variables.
 
The fitting of surfaces by least squares is an important component in most automated contouring software packages and is commonly used in computer generation of geological maps. Trend surface analysis, another mapping technique, is also based on the principles of least-squares fitting. Finally, some of the more specialized geostatistical techniques, such as kriging, are likewise rooted in the basic principles of least squares and multiple regression.
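
The simplest trend surface is a plane, <math>\widehat{z} = b_0 + b_1 u + b_2 v</math>, where u and v are map coordinates (for example, easting and northing) and z is the mapped value; fitting it is a multiple regression with the two spatial coordinates as predictors. Centering the coordinates on their means reduces the normal equations to a 2 × 2 system, which the sketch below solves by Cramer's rule. This Python code is illustrative only: the symbols u, v, b0, b1, b2 and the function name are hypothetical, and higher-order surfaces (adding terms such as u², uv, and v²) would need a general least-squares solver.

```python
def linear_trend_surface(u, v, z):
    """Fit the planar trend surface z_hat = b0 + b1*u + b2*v by least squares.

    u, v: map coordinates of the control points; z: values being mapped.
    Returns (b0, b1, b2).
    """
    n = len(z)
    if not (len(u) == len(v) == n) or n < 3:
        raise ValueError("need at least 3 control points with u, v and z each")
    u_bar, v_bar, z_bar = sum(u) / n, sum(v) / n, sum(z) / n
    # Deviations from the means (centering removes b0 from the normal equations)
    du = [ui - u_bar for ui in u]
    dv = [vi - v_bar for vi in v]
    dz = [zi - z_bar for zi in z]
    suu = sum(p * p for p in du)
    svv = sum(q * q for q in dv)
    suv = sum(p * q for p, q in zip(du, dv))
    suz = sum(p * w for p, w in zip(du, dz))
    svz = sum(q * w for q, w in zip(dv, dz))
    # Cramer's rule for the system [[suu, suv], [suv, svv]] times [b1, b2] = [suz, svz]
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("control points are collinear; the plane is undetermined")
    b1 = (suz * svv - suv * svz) / det
    b2 = (suu * svz - suv * suz) / det
    b0 = z_bar - b1 * u_bar - b2 * v_bar
    return b0, b1, b2
```

The residuals <math>z - \widehat{z}</math> from such a fit are what trend surface analysis examines for local departures from the regional trend.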
 
==References==
 
[[Category:Geological methods]] [[Category:Test content]][[Category:Pages with unformatted equations]]
[[Category:Methods in Exploration 10]]
