The simplest and most commonly overlooked statistical procedure is to plot the data.<ref name=pt06r7>Atkinson, A. C., 1985, Plots, transformations, and regression: Oxford, U.K., Oxford Press, 282 p.</ref> Often a simple crossplot reveals the essential characteristics of a data set and allows for interpretation as well as proper selection of additional methods. In most cases, a plot reveals the nature of the data set, highlights outliers or anomalous data points that should be reviewed for accuracy or measurement error, and indicates the spread or variability of the data. Measurement error is not uncommon, even in commercial data sets. For example, in a data set composed of well information, if the kelly bushing elevation is not known or is not uniformly subtracted from all wells, the resulting map will develop a severe case of volcanoes, that is, spurious bull's-eye anomalies around individual wells.
 
Three measures are commonly used to characterize a population by its average value, or central tendency. The most familiar is the ''arithmetic mean,'' which is the sum of the values divided by their number. The ''mode'' is the value that occurs with the greatest frequency, and the ''median'' is the value that has as many values above it as below it ([[:file:statistics-overview_fig1.png|Figure 1]]). As an example comparing some of the statistics discussed in previous chapters, consider the following values of [[porosity]] (in percent) measured on ten different [[sandstone]] samples: 15.1, 16.5, 18.8, 19.0, 22.0, 23.0, 25.0, 24.9, 31.9, and 43.0. Of the measures of central tendency, the arithmetic mean is the sum of these numbers divided by 10, or 239.2 ÷ 10 = 23.92. The median is 22.5 (halfway between 22.0 and 23.0), the value below which half the porosity values fall. The mid-range value, halfway between the smallest and largest values, is 29.05. The mode is the most frequently occurring value; in this sample every value occurs only once, so there is no distinct mode. Of the measures of dispersion, the range is 27.9, the variance is 61.77 (dividing by the number of samples), and the standard deviation (the square root of the variance) is 7.86.
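The porosity example can be checked with a short script. The following is a minimal sketch using Python's standard <code>statistics</code> module; the variable names are illustrative and not part of the original text. Because the text does not say which divisor was used for the variance, both the population form (divide by ''n'') and the sample form (divide by ''n'' − 1) are computed.

```python
import statistics

# Porosity values (percent) for the ten sandstone samples quoted in the text.
porosity = [15.1, 16.5, 18.8, 19.0, 22.0, 23.0, 25.0, 24.9, 31.9, 43.0]

# Measures of central tendency.
mean = statistics.fmean(porosity)
median = statistics.median(porosity)
mid_range = (min(porosity) + max(porosity)) / 2
# multimode returns every value tied for the highest count; when all
# values are unique it returns the whole list, i.e. no distinct mode.
modes = statistics.multimode(porosity)

# Measures of dispersion.
value_range = max(porosity) - min(porosity)
pop_variance = statistics.pvariance(porosity)    # divides by n
sample_variance = statistics.variance(porosity)  # divides by n - 1
pop_std = statistics.pstdev(porosity)            # square root of pop_variance

print(f"mean            = {mean:.2f}")
print(f"median          = {median:.2f}")
print(f"mid-range       = {mid_range:.2f}")
print(f"distinct mode?    {len(modes) < len(set(porosity))}")
print(f"range           = {value_range:.1f}")
print(f"variance (n)    = {pop_variance:.2f}")
print(f"variance (n-1)  = {sample_variance:.2f}")
print(f"std. dev. (n)   = {pop_std:.2f}")
```

For these ten values, dividing by ''n'' gives a variance near 61.77 and a standard deviation near 7.86, while dividing by ''n'' − 1 gives a variance near 68.63.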
    
Although the mean, median, and mode convey the same general notion of centrality, their values are often different, as just demonstrated, because they represent different functions of the same data. In the porosity example, the single high value of 43.0 pulls the mean above the median of 22.5. Statistically, each measure has its strengths and weaknesses. Although it is sensitive to extreme values, the arithmetic mean is the most widely used, partly because of convention and partly because of its computational versatility in other statistical calculations.
 
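The sensitivity of the mean to extreme values can be illustrated with the same porosity data. This is a rough sketch, not a recommended outlier treatment: it simply drops the largest value (43.0) and compares how far the mean and the median move. The names <code>trimmed</code>, <code>mean_shift</code>, and <code>median_shift</code> are hypothetical.

```python
import statistics

# Porosity values (percent) for the ten sandstone samples quoted in the text.
porosity = [15.1, 16.5, 18.8, 19.0, 22.0, 23.0, 25.0, 24.9, 31.9, 43.0]

# Drop the single largest value to mimic removing an extreme observation.
largest = max(porosity)
trimmed = [v for v in porosity if v != largest]

# How much each measure of central tendency changes when the extreme is removed.
mean_shift = statistics.fmean(porosity) - statistics.fmean(trimmed)
median_shift = statistics.median(porosity) - statistics.median(trimmed)

print(f"mean shifts by   {mean_shift:.2f}")
print(f"median shifts by {median_shift:.2f}")
```

Removing the one extreme sample moves the mean by roughly two porosity percent but moves the median by only half a percent, which is the sense in which the median is the more resistant measure.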
