Changes

not fully compiled; reduced See also
Line 11:
  | pdf    = http://archives.datapages.com/data/specpubs/methodo1/images/a095/a0950001/0300/03450.pdf
 
}}
 
}}
Most geological phenomena are multivariate in nature; for example, a porous medium is characterized by a set of interdependent quantities or attributes such as grain size, porosity, permeability, and saturation. Although univariate statistical analysis can characterize the distribution of each attribute separately, an understanding of porous media calls for unraveling the interrelationships among their various attributes. Multivariate statistical analysis studies the joint distribution of all attributes, in which the distribution of any single variable is analyzed as a function of the distributions of the other attributes.
+
Most geological phenomena are multivariate in nature; for example, a porous medium is characterized by a set of interdependent quantities or attributes such as grain size, [[porosity]], [[permeability]], and saturation. Although univariate statistical analysis can characterize the distribution of each attribute separately, an understanding of porous media calls for unraveling the interrelationships among their various attributes. Multivariate statistical analysis studies the joint distribution of all attributes, in which the distribution of any single variable is analyzed as a function of the distributions of the other attributes.
    
Multivariate observations are best organized and manipulated as a matrix of sample values, of size (n × P), where n is the number of samples and P is the number of attributes or variables. For example, a (5 × 3) matrix might represent five core samples at different depths on which frequencies of occurrence of three different fossils are recorded. The purposes of multivariate data analysis are to study the relationships among the P attributes, classify the n collected samples into homogeneous groups, and make inferences about the underlying populations from the sample.
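The (n × P) layout described above can be sketched in a few lines of Python. The fossil counts below are invented purely for illustration, and the helper names <code>column</code> and <code>pearson</code> are hypothetical; the sketch only shows one way to hold five samples of three attributes and to examine the pairwise relationships among the attributes.

```python
import math

# Hypothetical data: 5 core samples (rows) x 3 fossil types (columns).
# The counts are made up for illustration only.
X = [
    [12, 3, 7],
    [15, 1, 9],
    [8, 6, 4],
    [20, 0, 11],
    [10, 5, 6],
]
n, P = len(X), len(X[0])  # n samples, P attributes


def column(matrix, j):
    """Return attribute j across all samples."""
    return [row[j] for row in matrix]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# P x P correlation matrix summarizing relationships among the attributes.
R = [[pearson(column(X, i), column(X, j)) for j in range(P)] for i in range(P)]

for row in R:
    print(["%.2f" % r for r in row])
```

Each row of <code>X</code> is one sample and each column one attribute, so studying relationships among attributes works on columns, while classifying samples works on rows.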
Line 34:
==A note about data representativeness==
As mentioned in the chapter on "Statistics Overview" (in Part 6), the available sample data are an incomplete image of the underlying population. Statistical features and relationships seen in the data may not be representative of  
+
As mentioned in the chapter on [[Statistics overview]] (in Part 6), the available sample data are an incomplete image of the underlying population. Statistical features and relationships seen in the data may not be representative of  
    
the characteristics of the underlying population if there are biases in the data. Sources of bias are many, ranging from obvious measurement biases to imposed spatial clustering. Spatial clustering results from the preferential location of data (drilling patterns and core plugs), a prevalent problem in exploration and development. Such preferential selection of data points can severely bias one's image of the reservoir, usually in a nonconservative way. Remedies include defining representative subsets of the data, weighting the data, and carefully interpreting the results of the data analysis.
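As an illustration of the "weighting the data" remedy, the following minimal sketch uses cell declustering, one common weighting scheme; the text above does not say which scheme is intended, so this choice is an assumption. The function name <code>cell_declustering_weights</code>, the cell size, and all sample values are hypothetical.

```python
from collections import Counter


def cell_declustering_weights(locations, cell_size):
    """Weights that sum to 1, sharing each occupied cell's weight equally.

    Samples crowded into the same grid cell each receive a smaller weight,
    which damps the influence of preferentially sampled areas.
    """
    cells = [(int(x // cell_size), int(y // cell_size)) for x, y in locations]
    counts = Counter(cells)
    n_occupied = len(counts)
    return [1.0 / (n_occupied * counts[c]) for c in cells]


# Hypothetical samples: three clustered in a high-porosity zone, two isolated.
locations = [(10, 10), (12, 11), (11, 13), (120, 40), (60, 160)]
porosity = [0.28, 0.30, 0.29, 0.12, 0.15]

weights = cell_declustering_weights(locations, cell_size=50)
naive_mean = sum(porosity) / len(porosity)
declustered_mean = sum(w * p for w, p in zip(weights, porosity))

print("naive mean:       %.3f" % naive_mean)
print("declustered mean: %.3f" % declustered_mean)
```

In this made-up case the declustered mean is lower than the naive mean because the three clustered high-porosity samples share the weight of a single cell. In practice the result depends on the chosen cell size, so it would normally be tested over a range of sizes.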
Line 92:
* Certain other techniques are specific to binary variables.
   −
The problem of preferential sampling in high pay zones, which may lead to more samples having high porosity and saturation values, is particularly critical when performing cluster analysis. If spatial declustering is not done properly before CA, all results can be mere artifacts of that preferential sampling. A related problem is linked to sample locations ''u''<sub>l</sub> and ''u''<sub>l&prime;</sub> not being accounted for in the definition of, say, the Euclidean distance between two samples l and l&prime;.
+
The problem of preferential sampling in high pay zones, which may lead to more samples having high [[porosity]] and saturation values, is particularly critical when performing cluster analysis. If spatial declustering is not done properly before CA, all results can be mere artifacts of that preferential sampling. A related problem is linked to sample locations ''u''<sub>l</sub> and ''u''<sub>l&prime;</sub> not being accounted for in the definition of, say, the Euclidean distance between two samples l and l&prime;.
    
:<math>\mathbf{Equation}</math>
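The point about sample locations can be made concrete with a short sketch. It assumes the standard Euclidean distance in attribute space, the square root of the sum of squared attribute differences; the sample values, the dictionary layout, and the function name <code>euclidean_distance</code> are hypothetical.

```python
import math


def euclidean_distance(z_l, z_lp):
    """Euclidean distance between two samples in attribute space.

    Only attribute values enter the calculation; sample locations do not.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z_l, z_lp)))


# Hypothetical samples: "u" is the location, "z" the attribute vector
# (porosity, saturation). Values are invented for illustration.
sample_l = {"u": (100.0, 250.0), "z": (0.25, 0.60)}
sample_lp = {"u": (4800.0, 90.0), "z": (0.25, 0.60)}

d = euclidean_distance(sample_l["z"], sample_lp["z"])
print(d)  # the two samples are far apart in space, yet identical in attributes
```

Because the locations never enter the calculation, two samples that are kilometers apart can appear identical to a clustering algorithm, which is the related problem noted above.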
Line 105:
==See also==
 +
* [[Introduction to geological methods]]
 +
* [[Monte Carlo and stochastic simulation methods]]
 +
* [[Correlation and regression analysis]]
 +
* [[Statistics overview]]
    
==References==
