47 bytes added, 19:28, 19 January 2022
m
Line 13: Line 13:  
  | isbn    = 0891816607
}}
The purpose of statistics is to project or infer, from limited samples, the character of a population. In most cases, particularly in oil and gas investigations, geological information is not derived from carefully designed sampling schemes but, by design, represents anomalies. What successful company would drill on a regional trend as opposed to the top of a structure, on a bright spot, or at the crest of a [[reef]]? Statistical procedures presume that sufficient data are randomly sampled from a population and that the average sample value approximates the population average. This is possible only if both high and low values are sampled without bias and enough samples are taken to stabilize the calculations. Although proper sampling techniques are essential to formal statistical inference, geological samples are too difficult and costly to obtain to be discarded simply because they were not collected at random. Therefore, hypothesis tests and confidence intervals for statistical projections must be interpreted in light of the limitations of geological data. Nonetheless, quantitative descriptions and inferences about relationships can still be made, provided the constraints imposed by data quality are kept in mind.
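The effect of drilling only the most attractive locations can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the population of 1,000 thickness values, its normal distribution, and the sample size of 30 are all hypothetical and chosen only for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: reservoir thickness (m) at 1,000 possible locations.
population = [random.gauss(20.0, 5.0) for _ in range(1000)]

# Unbiased design: 30 locations chosen at random.
random_sample = random.sample(population, 30)

# Exploration-style sampling: only the 30 thickest locations are drilled.
drilled_sample = sorted(population, reverse=True)[:30]

population_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
drilled_mean = statistics.mean(drilled_sample)

print(f"population mean:       {population_mean:.2f}")
print(f"random sample mean:    {random_mean:.2f}")
print(f"drilled (biased) mean: {drilled_mean:.2f}")
```

The randomly drawn sample should land close to the population mean, whereas the mean of the preferentially drilled locations overstates it, which is why inferences from such data need to be treated with caution.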
    
It is also important to remember the effect of resolution and precision in analyzing quantitative geological data. J. C. Davis put it eloquently in his introduction to his classic text.<ref name=pt06r24>Davis, J. C., 1986, Statistics and data analysis in geology: New York, John Wiley, 646 p.</ref>
Line 32: Line 32:  
The simplest and most commonly overlooked statistical procedure is to plot the data.<ref name=pt06r7>Atkinson, A. C., 1985, Plots, transformations, and regression: Oxford, U.K., Oxford Press, 282 p.</ref> Often a simple crossplot reveals the essential characteristics of a data set and guides both interpretation and the selection of additional methods. In most cases, a plot shows the nature of the data set, indicates the spread or variability of the data, and exposes outliers or anomalous points that should be reviewed for accuracy or measurement error. Measurement error is not uncommon, even in commercial data sets. For example, in a data set composed of well information, if the kelly bushing elevation is unknown or is not subtracted uniformly from all wells, the resulting map will develop a severe case of volcanoes.
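The kelly bushing example can be sketched numerically. The well names, elevations, and depths below are hypothetical, vertical wells are assumed, and `subsea_elevation` is an illustrative helper rather than part of any library.

```python
# Hypothetical wells (illustrative values only): name, kelly bushing (KB)
# elevation above sea level, and measured depth from KB to one formation top,
# all in metres. Vertical wells are assumed.
wells = [
    ("A-1", 312.0, 2150.0),
    ("A-2", 45.0, 1890.0),
    ("A-3", 298.0, 2140.0),
]


def subsea_elevation(kb_elevation, measured_depth):
    """Elevation of a pick relative to sea level (negative means below)."""
    return kb_elevation - measured_depth


def spread(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


# Datum-corrected picks versus picks where the KB elevation was ignored.
corrected = [subsea_elevation(kb, md) for _, kb, md in wells]
uncorrected = [-md for _, _, md in wells]

for (name, _, _), good, bad in zip(wells, corrected, uncorrected):
    print(f"{name}: corrected {good:8.1f} m, uncorrected {bad:8.1f} m")

print(f"relief after correction:  {spread(corrected):.1f} m")
print(f"relief without KB datum:  {spread(uncorrected):.1f} m")
```

In this made-up case the corrected picks differ by only a few metres, while the uncorrected picks differ by hundreds; contoured on a map, that artificial relief would appear as isolated highs and lows around individual wells.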
Three measures characterize a population by describing its average value, or central tendency. The most familiar is the ''arithmetic mean,'' which is simply the sum of the values divided by their number. The ''mode'' is the value that occurs with the greatest frequency, and the ''median'' is the value that has as many values above it as below it ([[:file:statistics-overview_fig1.png|Figure 1]]). As an example of comparing some of the statistics discussed in previous chapters, consider the following values of [[porosity]] (in percent) measured on ten different [[sandstone]] samples: 15.1, 16.5, 18.8, 19.0, 22.0, 23.0, 25.0, 24.9, 31.9, and 43.0. Of the measures of central tendency, the arithmetic mean is the sum of all these numbers divided, in this case, by 10, or 239.2 ÷ 10 = 23.92. The median is 22.5 (halfway between 22.0 and 23.0), the value below which half the porosity values fall. The mid-range value, halfway between the smallest and largest values, is 29.05. The mode is the most frequently occurring value; because no value repeats in this sample, it has no single mode. Of the measures of dispersion, the range is 27.9, the variance (computed with a divisor of ''n'' = 10) is 61.77, and the standard deviation (the square root of the variance) is 7.86.
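The porosity example can be checked with Python's standard `statistics` module. This is a sketch that assumes the variance is the population form (divisor ''n'' = 10) and requires Python 3.8 or later for `multimode`. Evaluated on the ten values as listed, it gives a mean of 23.92 and a population variance of about 61.77.

```python
import statistics

# Porosity (percent) measured on the ten sandstone samples quoted above.
porosity = [15.1, 16.5, 18.8, 19.0, 22.0, 23.0, 25.0, 24.9, 31.9, 43.0]

# Measures of central tendency.
mean = statistics.mean(porosity)
median = statistics.median(porosity)
mid_range = (min(porosity) + max(porosity)) / 2
modes = statistics.multimode(porosity)  # every value when none repeats

# Measures of dispersion (population form: divisor n, not n - 1).
value_range = max(porosity) - min(porosity)
variance = statistics.pvariance(porosity)
std_dev = statistics.pstdev(porosity)

print(f"mean      = {mean:.2f}")
print(f"median    = {median:.2f}")
print(f"mid-range = {mid_range:.2f}")
print(f"range     = {value_range:.2f}")
print(f"variance  = {variance:.2f}")
print(f"std dev   = {std_dev:.2f}")
print(f"distinct most-frequent values: {len(modes)}")
```

Replacing `pvariance` and `pstdev` with `statistics.variance` and `statistics.stdev` gives the sample form (divisor ''n'' − 1), which is larger for the same data.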
    
Although the mean, median, and mode convey the same general notion of centrality, their values are often different, as just demonstrated, because they represent different functions of the same data. Statistically, each has its strengths and weaknesses. Although it is sensitive to extreme values, the arithmetic mean is most generally used, partially because of convention and partially because of its computational versatility in other statistical calculations.
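The sensitivity of the arithmetic mean to extreme values can be seen by removing the largest porosity value (43.0) from the ten-sample example and recomputing. This is only an illustration of the arithmetic, not a recommendation to discard the sample.

```python
import statistics

# Porosity (percent) for the ten sandstone samples in the example.
porosity = [15.1, 16.5, 18.8, 19.0, 22.0, 23.0, 25.0, 24.9, 31.9, 43.0]

# Same data with the single largest value left out.
largest = max(porosity)
trimmed = [value for value in porosity if value != largest]

# How far each measure moves when one extreme value is removed.
mean_shift = statistics.mean(porosity) - statistics.mean(trimmed)
median_shift = statistics.median(porosity) - statistics.median(trimmed)

print(f"mean:   {statistics.mean(porosity):.2f} -> {statistics.mean(trimmed):.2f}")
print(f"median: {statistics.median(porosity):.2f} -> {statistics.median(trimmed):.2f}")
print(f"shift in mean = {mean_shift:.2f}, shift in median = {median_shift:.2f}")
```

Dropping one value moves the mean by about 2.1 porosity units but the median by only 0.5, which is the sense in which the median is less sensitive to extremes.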
Line 146: Line 146:     
[[Category:Geological methods]]
 +
[[Category:Methods in Exploration 10]]
