The multivariate analyses were run in either R or SAS. The appendix link at the end of each section leads to the appendix where the code and output for that analysis can be viewed.
To begin the identification process, it was important to see whether the species whose succession classification
was known grouped together when their variables were analyzed and compared. If the known species do not group together,
then analyzing the species' succession based on frequency, clustering and the slope of the dbh distribution in the 3
successional stands would be irrelevant. Grouping was visualized by running a PCA and a
cluster analysis on all species and all variables in the test Data Table.
Since the range of values amongst the variables is large, a correlation matrix was used in the PCA to ensure
that each variable has equal weight. If the PCA is run without a correlation matrix, the frequencies
in the Young, Mid and Old Growth stands would carry more weight than the other variables, since the values of those
variables are much larger than the clustering and slope values. The princomp function in R was used to run the PCA.
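The effect of the correlation matrix can be illustrated with a minimal sketch. It is written in Python rather than R, uses made-up values (not the project's data table), and the helper names `variance_shares` and `standardize` are hypothetical. A PCA on the correlation matrix is equivalent to a PCA on variables rescaled to unit variance, which in R's princomp corresponds to `cor = TRUE`.

```python
from statistics import pvariance

def variance_shares(columns):
    """Return each variable's share of the total variance."""
    variances = {name: pvariance(values) for name, values in columns.items()}
    total = sum(variances.values())
    return {name: v / total for name, v in variances.items()}

def standardize(columns):
    """Rescale each variable to mean 0 and variance 1 (z-scores)."""
    out = {}
    for name, values in columns.items():
        mean = sum(values) / len(values)
        sd = pvariance(values) ** 0.5
        out[name] = [(x - mean) / sd for x in values]
    return out

# Hypothetical values for four species (assumed, for illustration only).
data = {
    "freq_young": [120.0, 15.0, 60.0, 3.0],
    "clustering": [0.8, 1.2, 1.0, 1.5],
    "slope": [-0.05, -0.02, -0.03, -0.01],
}

raw_shares = variance_shares(data)               # dominated by freq_young
std_shares = variance_shares(standardize(data))  # equal share per variable
```

On the raw scale the frequency variable accounts for nearly all of the variance, so it would dominate the first principal component. After standardization every variable contributes the same share.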
The values in the test data are per species; therefore, the method chosen for the cluster analysis will not affect
the results of the analysis. For this reason, the centroid method in
PROC CLUSTER in SAS was used for the cluster analysis.
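As a rough illustration of what a centroid method does, the sketch below repeatedly merges the two clusters whose centroids are closest. It is a simplified Python stand-in, not the SAS procedure itself. The function names, the use of plain Euclidean distance, and the two-variable points are all assumptions made for the example.

```python
from math import dist

def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def centroid_linkage(points, n_clusters):
    """Merge the two clusters with the closest centroids until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters

# Hypothetical species described by two variables each (assumed values).
species = [(0.1, 0.2), (0.2, 0.1), (2.0, 2.1), (2.1, 2.0), (4.0, 0.1), (4.1, 0.2)]
groups = centroid_linkage(species, 3)
```

With these well-separated points the three resulting groups are the three obvious pairs.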
To test the data set for significant differences between the group means for any pair of succession classifications,
the F-test in PROC DISCRIM was used.
The null hypothesis for this test is that the group centroids for each succession classification are equal.
Since 3 succession classifications were compared, the total number of possible pairwise comparisons is 3. To prevent
inflation of the experimentwise type I error
due to multiple comparisons, the standard significance level (alpha = 0.05) for the F-test of the T^2 statistic was
adjusted using the Bonferroni adjustment
for multiple comparisons. This adjustment changed the significance level from 0.05 to
alpha/c = 0.0166667, where c is the number of possible comparisons.
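The arithmetic behind the adjustment can be checked with a short Python sketch. The function name `bonferroni_alpha` is hypothetical; the number of comparisons is taken to be the number of distinct pairs among the groups.

```python
from math import comb

def bonferroni_alpha(alpha, n_groups):
    """Return (number of pairwise comparisons, Bonferroni-adjusted alpha)."""
    c = comb(n_groups, 2)  # distinct pairs of groups: k * (k - 1) / 2
    return c, alpha / c

# Three succession classifications: early, mid and late.
c, adjusted = bonferroni_alpha(0.05, 3)
```

For 3 groups this gives c = 3 comparisons and an adjusted level of 0.05/3, which rounds to the 0.0166667 quoted above.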
The primary objective of this project is to predict the succession classification of the unknown species in the test
Data Table.
PROC DISCRIM
in SAS was used to predict the succession classification of the unknown species. The species with known classifications
were used as the model for the remaining species, and the SAS program automatically output the predicted
classifications of the unknown species.
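The train-on-known, predict-unknown workflow can be sketched with a deliberately simplified nearest-centroid classifier in Python. This is not the discriminant function that PROC DISCRIM fits, which takes the covariance among variables into account. Plain Euclidean distance is swapped in here only to keep the example short, and the species names, feature values and function names are all hypothetical.

```python
from math import dist

def class_centroids(known):
    """Average the feature vectors of the known species in each class."""
    grouped = {}
    for features, label in known:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(centroids, features):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical known species: (two feature values, succession class).
known = [
    ((0.9, 0.1), "early"), ((0.8, 0.2), "early"),
    ((0.5, 0.5), "mid"), ((0.4, 0.6), "mid"),
    ((0.1, 0.9), "late"), ((0.2, 0.8), "late"),
]
# Hypothetical unknown species to classify.
unknown = {"species_x": (0.85, 0.1), "species_y": (0.15, 0.95)}

centroids = class_centroids(known)
predictions = {name: predict(centroids, f) for name, f in unknown.items()}
```

Each unknown species is assigned to the class of the nearest group centroid, which mirrors the idea of using the known species as the model for the rest.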
To visualize the PROC DISCRIM predictions of species succession classification,
PROC CANDISC and PROC GPLOT
in SAS were used to generate a plot illustrating how the early, mid and late successional species
cluster, for the known and predicted species combined.
To predict the succession classification of the unknown species, it is important to know which of the 8 variables used
in the multivariate analyses are significant in differentiating between early, mid and late successional species.
PROC STEPDISC
in SAS was used with the STEPWISE method to calculate an F-value for each variable, based on an analysis
of covariance, and to step through the variables, testing whether each was significant in classifying the unknown species.
CLUSTER Appendix (Appendix C)
Multivariate Methods: Test the Difference Amongst Group Means
Multivariate Methods: Prediction of the Succession Classification for Unknown Species
CANDISC Appendix (Appendix E)
Multivariate Methods: Test the Significance of Each Variable in Predicting the Succession Classification for Unknown Species