Changes

Line 9: Line 9:  
  | pdf    = http://archives.datapages.com/data/specpubs/methodo1/images/a095/a0950001/0300/03430.pdf
 
}}
 +
 +
[[File:Correlation-and-regression-analysis fig1.png|300px|thumb|{{figure number|1}}Linear regression of x-on-y. Note the negative slope corresponding to a negative correlation. The regression line is determined so as to minimize the sum of squared deviations: <math>\sum_i{e_i^2}</math>]]
 +
 
Correlation analysis, and its cousin, regression analysis, are well-known statistical approaches used in the study of relationships among multiple physical properties. The investigation of [[permeability]]-[[porosity]] relationships is a typical example of the use of correlation in geology.
    
The term ''correlation'' most often refers to the linear association between two quantities or variables, that is, the tendency for one variable, x, to increase or decrease as the other, y, increases or decreases, in a straight-line trend or relationship.<ref name=Draper_etal_1966>Draper, N. R., and H. Smith, 1966, Applied regression analysis, 2nd ed.: New York, John Wiley, 709 p.</ref> <ref name=Snedecor_etal_1967>Snedecor, G. W., and W. G. Cochran, 1967, Statistical methods, 6th ed.: Ames, Iowa State Univ. Press, 593 p.</ref> The ''correlation coefficient'' (also called the Pearson correlation coefficient), r, is a dimensionless numerical index of the strength of that relationship. The sample value of r, which can range from -1 to +1, is computed using the following formula:
   −
:<math>r = \frac{\displaystyle \sum_{i} (x_{i}-\bar{x})(y_{i}-\bar{y})}{\sqrt{\displaystyle \sum_i (x_{i}-\bar{x})^2 \cdot \displaystyle \sum_i (y_{i}-\bar{y})^2}}</math>
+
:<math>r = \frac{\displaystyle \sum_{i} (x_{i}-\bar{x})(y_{i}-\bar{y})}{\sqrt{\displaystyle \sum_i (x_{i}-\bar{x})^2 \times \displaystyle \sum_i (y_{i}-\bar{y})^2}}</math>
    
where the summation is made over the n sample values available and where
Line 25: Line 28:  
Whereas correlation describes the linear association among variables, regression involves the prediction of one quantity from the others. Regression analysis is that broad class of statistics and statistical methods that comprises line, curve, and surface fitting, as well as other kinds of prediction and modeling techniques.
   −
The simplest type of regression analysis involves fitting a straight line between two variables (Figure 1). In this case, one of the quantities is called the ''independent or predictor variable'' (usually denoted x), while the other is called the ''dependent or predicted variable'' (usually denoted y). This approach is often referred to as ''simple linear regression,'' or y-on-x regression. It leads to the development of an empirical straight-line relationship between the two variables and has the following form:
+
The simplest type of regression analysis involves fitting a straight line between two variables ([[:file:Correlation-and-regression-analysis fig1.png|Figure 1]]). In this case, one of the quantities is called the ''independent or predictor variable'' (usually denoted x), while the other is called the ''dependent or predicted variable'' (usually denoted y). This approach is often referred to as ''simple linear regression,'' or y-on-x regression. It leads to the development of an empirical straight-line relationship between the two variables and has the following form:
    
:<math>\widehat{y} = ax + b</math>
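As a minimal sketch of the two formulas above, the following Python computes the sample correlation coefficient r and the least-squares slope a and intercept b of the y-on-x line. The function names and the data values are hypothetical and chosen only for illustration. The slope and intercept use the standard least-squares expressions, which are assumed here rather than quoted from this page.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient r, term by term as in the formula above."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope a and intercept b for y_hat = a*x + b (y-on-x)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a = sxy / sxx              # slope that minimizes the sum of squared deviations
    b = y_bar - a * x_bar      # the fitted line passes through (x_bar, y_bar)
    return a, b


# Hypothetical measurements with a negative trend (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [9.7, 8.2, 6.1, 4.3, 1.9]

r = pearson_r(x, y)
a, b = fit_line(x, y)
print(f"r = {r:.3f}, y_hat = {a:.3f} * x + {b:.3f}")
```

A negative slope a always goes with a negative r, because both share the same numerator, the sum of cross products of deviations.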
Line 65: Line 68:  
==Multiple and multivariate regression==
   −
The most important extension of the two-variable case is to situations involving more than two variables. When there is still one dependent variable but many predictor variables, the fitting technique is called ''multiple linear regression.'' When there is also more than one dependent variable, the approach is called ''multivariate regression'' (see [[Multivariate data analysis]]). The methods of simple bivariate regression extend directly to these multivariate situations. A typical geological application of multiple regression is the prediction of fold thickness from various geometric attributes, as given by the following equation:
+
The most important extension of the two-variable case is to situations involving more than two variables. When there is still one dependent variable but many predictor variables, the fitting technique is called ''multiple linear regression.'' When there is also more than one dependent variable, the approach is called ''multivariate regression'' (see [[Multivariate data analysis]]). The methods of simple bivariate regression extend directly to these multivariate situations. A typical geological application of multiple regression is the prediction of [[fold]] thickness from various geometric attributes, as given by the following equation:
    
:<math>\text{Thickness } = a+b~(\text{attitude}) + c~(\text{tightness}) + d~(\text{asymmetry}) </math>
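One way to estimate the coefficients a, b, c, and d in an equation of this form is ordinary least squares through the normal equations. The sketch below is an illustration under that assumption, not the method of any particular software package; the function names and the fold measurements are hypothetical. In practice a library routine such as NumPy's <code>numpy.linalg.lstsq</code> would normally be preferred for numerical stability.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * sol = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; coefficients are not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (aug[i][n] - tail) / aug[i][i]
    return sol


def fit_multiple_regression(predictors, response):
    """Return [intercept, coef_1, ..., coef_k] minimizing squared residuals."""
    # Design matrix: a leading 1 for the intercept, then the predictor values.
    design = [[1.0] + list(row) for row in predictors]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, response))
           for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical fold measurements: (attitude, tightness, asymmetry) per fold.
attributes = [
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
    (1.0, 1.0, 0.0),
    (1.0, 0.0, 1.0),
    (2.0, 1.0, 3.0),
]
thickness = [2.6, 0.9, 5.1, 1.4, 5.5, 11.2]  # illustrative values only

a, b, c, d = fit_multiple_regression(attributes, thickness)
print(f"Thickness = {a:.2f} + {b:.2f}(attitude) + {c:.2f}(tightness) + {d:.2f}(asymmetry)")
```

With a single predictor column, the same routine gives the intercept and slope of the simple straight-line fit, which is the sense in which the bivariate method extends directly to the multiple case.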
Line 72: Line 75:     
The fitting of surfaces by least squares is an important component in most automated contouring software packages and is commonly used in computer generation of geological maps. Trend surface analysis, another mapping technique, is also based on the principles of least-squares fitting. Finally, some of the more specialized geostatistical techniques, such as kriging, are likewise rooted in the basic principles of least squares and multiple regression.
  −
[[File:Correlation-and-regression-analysis fig1.png|thumb|Linear regression of x-on-y. Note the negative slope corresponding to a negative correlation. The regression line is determined so as to minimize the sum of squared deviations: [equation]]]
      
==References==
Line 84: Line 85:     
[[Category:Geological methods]] [[Category:Test content]][[Category:Pages with unformatted equations]]
 +
[[Category:Methods in Exploration 10]]
