Changes

Line 35:
Although the mean, median, and mode convey the same general notion of centrality, their values are often different, as just demonstrated, because they represent different functions of the same data. Statistically, each has its strengths and weaknesses. Although it is sensitive to extreme values, the arithmetic mean is most generally used, partially because of convention and partially because of its computational versatility in other statistical calculations.
 
+
+ [[file:statistics-overview_fig2.png|thumb|400px|{{figure number|2}}A symmetrical data set. The three measures of central tendency are identical.]]
    
The differences among these measures are a function of the frequency distribution of the samples. The frequency distribution is simply a plot of each value against the number of times it occurs, and it is often depicted as a histogram. Typically, most values cluster around some central value, and the frequency of occurrence declines toward the extremes. Several shapes of frequency distribution commonly occur in nature. Data sets that are symmetrical about a central value develop the familiar “bell-shaped” ''normal'' distribution ([[:file:statistics-overview_fig2.png|Figure 2]]). Data sets that have numerous small values and a few large values develop an asymmetrical (skewed) distribution. Comparison of histograms plays a vital role in the study of various geological properties. For example, a histogram might be constructed to determine whether a particular oil field exhibits a multimodal porosity distribution, which would indicate the presence of multiple lithologies. Another application is comparing the distributions of petroleum field sizes discovered worldwide in foreland and rift basins.
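
As a rough illustration of how a frequency distribution is tabulated, the following Python sketch counts values per class interval and prints a text histogram. The porosity values, the 2% bin width, and the function name <code>frequency_distribution</code> are invented for this example and are not taken from the article or from any real field.

```python
from collections import Counter


def frequency_distribution(values, bin_width):
    """Count how many values fall in each bin of width bin_width.

    Returns a dict mapping each bin's lower edge to its count,
    ordered by lower edge.
    """
    counts = Counter()
    for v in values:
        # Lower edge of the bin that contains v
        lower = int(v // bin_width) * bin_width
        counts[lower] += 1
    return dict(sorted(counts.items()))


# Hypothetical porosity measurements (percent), invented for illustration
porosity = [6, 7, 8, 8, 9, 9, 9, 10, 11, 18, 19, 20, 20, 21, 21, 21, 22, 23]

freq = frequency_distribution(porosity, bin_width=2)

# Text histogram: one '#' per sample in each bin
for lower, count in freq.items():
    print(f"{lower:>2}-{lower + 2:<2} {'#' * count}")
```

With these made-up values the counts peak in two separate bins (8 to 10 and 20 to 22), which is the kind of multimodal pattern that would prompt a check for more than one lithology.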
 
− [[file:statistics-overview_fig2.png|thumb|{{figure number|2}}A symmetrical data set. The three measures of central tendency are identical.]]
      
The three measures of central tendency are identical in symmetrical data sets ([[:file:statistics-overview_fig2.png|Figure 2]]) and are very different in asymmetrical data sets ([[:file:statistics-overview_fig1.png|Figure 1]]). This difference is crucial in arriving at essential estimates. For example, what is the ''most likely'' value for reserves for the next well we drill? If, as in most producing basins, there are a few huge fields and many subcommercial small fields, the most likely discovery is not the mean but the mode. Determining the shape of the frequency distribution is critical to understanding which statistic to use. (For an excellent discussion of the characteristics of petroleum data population distributions, see Harbaugh et al.<ref name=pt06r47 />)
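
The contrast between symmetrical and asymmetrical data sets can be checked numerically. The sketch below uses Python's standard <code>statistics</code> module on two small data sets that are invented for illustration; the "field sizes" are hypothetical and carry no real units.

```python
from statistics import mean, median, mode

# Symmetrical data set (hypothetical): values mirror around 3
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Asymmetrical data set (hypothetical field sizes):
# many small values and a few very large ones
skewed = [1, 1, 1, 1, 2, 2, 3, 5, 10, 100]

for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    print(f"{name:>9}: mean={mean(data)}, median={median(data)}, mode={mode(data)}")
```

For the symmetrical set all three measures equal 3. For the skewed set the mode (1) lies below the median (2), which lies well below the mean (12.6), so the mean overstates the most likely outcome of the next draw.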
 
