Changes

no edit summary
Line 50: Line 50:     
==Discriminant analysis (classification)==
−
[[File:Charles-l-vavra-john-g-kaldi-robert-m-sneider capillary-pressure 1.jpg|thumbnail|'''Figure 1.''' Plot of two bivariate distributions, showing overlap between groups a and b along both variables ''x''<sub>1</sub> and ''x''<sub>2</sub>. Groups can be distinguished by projecting members of the two groups onto the discriminant function line.<ref name=Davis_1986 />]]
+
[[File:Multivariate-data-analysis fig1.png|300px|thumbnail|'''Figure 1.''' Plot of two bivariate distributions, showing overlap between groups a and b along both variables ''x''<sub>1</sub> and ''x''<sub>2</sub>. Groups can be distinguished by projecting members of the two groups onto the discriminant function line.<ref name=Davis_1986 />]]
    
''Discriminant analysis'' (DA) attempts to determine an allocation rule that classifies multivariate data vectors into a set of predefined classes with a minimum probability of misclassification.<ref name=Davis_1986>Davis, J. C., 1986, Statistics and data analysis in geology: New York, John Wiley, 646 p.</ref> Consider a set of n samples with P quantities measured on each, and suppose that the n samples are divided into m classes or groups. Discriminant analysis consists of two steps:
Line 75: Line 75:  
One can also define a distance ''d''<sub>ii&prime;</sub> between any two variables ''x''<sub>''i''</sub> and ''x''<sub>i&prime;</sub> by taking the previous summations over all n samples. Such distances between variables lead to the definition of classes of variables having similar sample values. Such classes (clusters) of variables can help define subsets of the P variables for further study, with reduced dimensionality.
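The distance between variables can be illustrated with a short sketch. The exact form of the summation is not shown in this excerpt, so the example assumes a plain Euclidean distance taken over all n samples, and it assumes the variables are already on comparable scales. The function name <code>variable_distances</code> and the sample values are hypothetical.

```python
import math


def variable_distances(data):
    """Return a P x P matrix of distances between variables (columns).

    Assumption: Euclidean distance summed over all n samples (rows).
    """
    n = len(data)
    p = len(data[0])
    dist = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for k in range(i + 1, p):
            # Sum squared differences between variables i and k over the samples.
            total = sum((data[s][i] - data[s][k]) ** 2 for s in range(n))
            dist[i][k] = dist[k][i] = math.sqrt(total)
    return dist


# Hypothetical data: 3 samples (rows), 3 variables (columns).
samples = [
    [1.0, 1.1, 5.0],
    [2.0, 2.1, 3.0],
    [3.0, 2.9, 1.0],
]
d = variable_distances(samples)
# Variables 0 and 1 take similar values on every sample, so they are close
# to each other and far from variable 2.
print(d[0][1] < d[0][2])
```

In this toy data set, variables 0 and 1 would fall into the same class of variables, so one of them could be dropped in a study with reduced dimensionality.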
−
[[File:Charles-l-vavra-john-g-kaldi-robert-m-sneider capillary-pressure 2.jpg|thumbnail|'''Figure 2.''' Dendrogram (by aggregation). Starting from n samples, combine the two most similar samples (here 2 and 3). Then, combine the two nearest groups by either joining two samples or aggregating a third sample to the previous group of two (1 is aggregated to 2 and 3). At the next step, 4 and 5 constitute a new group, which is then aggregated to the former group (1, 2, 3). The aggregation process stops when there is only one group left. In the last step, group (1, 2, 3, 4, 5) is aggregated to group (6, 7, 8, 9).]]
+
[[File:Multivariate-data-analysis fig2.png|300px|thumbnail|'''Figure 2.''' Dendrogram (by aggregation). Starting from n samples, combine the two most similar samples (here 2 and 3). Then, combine the two nearest groups by either joining two samples or aggregating a third sample to the previous group of two (1 is aggregated to 2 and 3). At the next step, 4 and 5 constitute a new group, which is then aggregated to the former group (1, 2, 3). The aggregation process stops when there is only one group left. In the last step, group (1, 2, 3, 4, 5) is aggregated to group (6, 7, 8, 9).]]
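The aggregation procedure described in the Figure 2 caption can be sketched in a few lines. The caption does not say how the distance between two groups is measured, so this example assumes single linkage (the smallest distance between any two members). The function name <code>agglomerate</code>, the one-dimensional sample values, and the zero-based sample indices are all hypothetical.

```python
def agglomerate(values):
    """Agglomerative clustering of 1-D values; returns the merge history.

    Assumption: single linkage, i.e. the distance between two groups is the
    smallest distance between any pair of their members.
    """
    clusters = [[i] for i in range(len(values))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the two nearest groups.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = min(
                    abs(values[i] - values[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((clusters[a], clusters[b], dist))
        merged = sorted(clusters[a] + clusters[b])
        # Replace the two groups by their union and repeat.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Hypothetical samples, indexed from 0.
values = [0.0, 0.5, 0.7, 5.0, 6.0]
merges = agglomerate(values)
for left, right, dist in merges:
    print(left, "+", right, "at distance", round(dist, 2))
```

With these values the merge order mirrors the first steps of the caption: two samples are joined, a third is aggregated to them, two further samples form a new group, and the two groups are combined last. Each merge distance gives the height of the corresponding node in the dendrogram.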
    
There is a large (and growing) variety of types of cluster analysis techniques:<ref name=Hartigan_1975 />
