The multivariate analyses were run in either R or SAS. The appendix link at the end of each section leads to the appendix where the code and output for each analysis can be viewed.

To begin the identification process, it was important to check whether the species with known successional classifications grouped together when their variables were analyzed and compared. If the known species did not group together, then classifying species' succession based on frequency, clustering and the slope of the dbh distribution in three successional stands would not be meaningful. Grouping was visualized by running a PCA and a CLUSTER analysis on all species and all variables in the test Data Table.

Because the ranges of the variables differ widely, the PCA was run on the correlation matrix rather than the covariance matrix so that each variable carries equal weight. Without this standardization, the frequencies in the Young, Mid and Old Growth stands would dominate the components, because their values are much larger than the clustering and slope values. The princomp function in R was used to run the PCA.
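The effect of working on the correlation matrix can be illustrated with a short NumPy sketch. This is not a reproduction of the princomp output used in the project; the helper name `pca_correlation` is hypothetical. Each column is converted to z-scores before the components are computed, so rescaling a variable does not change its influence.

```python
import numpy as np

def pca_correlation(X):
    """Hypothetical helper: PCA of the columns of X on the correlation matrix.

    Each variable is converted to z-scores (mean 0, SD 1), so variables with
    large values, such as stand frequencies, cannot dominate the components.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized variables
    R = np.corrcoef(X, rowvar=False)                 # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)             # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]                # largest component first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                             # principal component scores
    return eigvals, eigvecs, scores
```

Multiplying any column by a constant (for example, a frequency column by 1000) leaves `eigvals` unchanged, which is the equal-weighting property described above.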

Because the test data contain a single set of values per species, the choice of method for the CLUSTER analysis should not affect the results. For this reason, the centroid method of PROC CLUSTER in SAS was used for the CLUSTER analysis.
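The centroid method can be sketched in pure Python as an illustration only: it repeatedly merges the two clusters whose centroids (mean vectors) are closest and then recomputes the merged centroid from all of its members. The function `centroid_linkage` is hypothetical and uses plain Euclidean distance; distances printed by SAS may be on a squared scale, which does not change the merge order.

```python
from math import dist  # Python 3.8+

def centroid_linkage(points, labels):
    """Hypothetical helper: agglomerative clustering with the centroid method.

    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    clusters = {i: [i] for i in range(len(points))}
    n_vars = len(points[0])

    def centroid(members):
        # Mean vector of all original points in the cluster
        return [sum(points[m][j] for m in members) / len(members) for j in range(n_vars)]

    history = []
    while len(clusters) > 1:
        keys = sorted(clusters)
        best = None
        # Find the pair of clusters with the closest centroids
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                d = dist(centroid(clusters[a]), centroid(clusters[b]))
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        history.append(([labels[m] for m in clusters[a]], [labels[m] for m in clusters[b]], d))
        clusters[a] = clusters[a] + clusters.pop(b)  # merge b into a
    return history
```

For example, `centroid_linkage([(0, 0), (0, 1), (10, 10), (10, 11)], ["A", "B", "C", "D"])` first joins A with B and C with D, then joins the two pairs.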

PCA Appendix (Appendix B)
CLUSTER Appendix (Appendix C)

To test for significant differences between the mean values of any pair of succession classifications, the F-test in PROC DISCRIM was used. The null hypothesis for this test is that the group centroids of the succession classifications are equal. Since three succession classifications (early, mid and late) were compared, there are 3 × 2 / 2 = 3 possible pairwise comparisons. To prevent inflation of the experimentwise Type I error from these multiple comparisons, the significance level used to obtain the critical T^2-value for the F-test was adjusted with Bonferroni's correction for multiple comparisons: the standard alpha = 0.05 was replaced by alpha/c = 0.05/3 = 0.0166667, where c is the number of possible comparisons.
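The arithmetic of the Bonferroni adjustment can be checked with a small standard-library Python sketch: it lists the pairwise comparisons among the classes and divides alpha by their count. The second function converts a two-sample Hotelling T^2 to an F statistic with the standard textbook formula; whether PROC DISCRIM's printed F-values use exactly this form is an assumption, and both function names and the sample sizes in the example are hypothetical.

```python
from itertools import combinations

def bonferroni_alpha(groups, alpha=0.05):
    """Hypothetical helper: per-comparison alpha for all pairwise group tests."""
    pairs = list(combinations(groups, 2))  # c = g(g - 1)/2 comparisons
    return pairs, alpha / len(pairs)

def hotelling_t2_to_f(t2, n1, n2, p):
    """Hypothetical helper: two-sample Hotelling T^2 -> F.

    Under equal mean vectors (multivariate normal, common covariance),
    F has p and n1 + n2 - p - 1 degrees of freedom.
    """
    df1, df2 = p, n1 + n2 - p - 1
    f = df2 / (p * (n1 + n2 - 2)) * t2
    return f, df1, df2

pairs, adj = bonferroni_alpha(["early", "mid", "late"])
print(len(pairs), round(adj, 7))  # 3 0.0166667
```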

DISCRIM Appendix (Appendix D)

The primary objective of this project is to predict the succession classification of the unknown species in the test Data Table. PROC DISCRIM in SAS was used for this prediction: the species whose classifications were known served as the model, which was then applied to the remaining species, and SAS automatically output the predicted classification for each unknown species.
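The classification step can be sketched with NumPy as a linear discriminant rule: class means and a pooled within-class covariance matrix are estimated from the species with known classifications, and each unknown species is assigned to the class whose mean is nearest in squared Mahalanobis distance. This assumes a pooled covariance matrix and equal prior probabilities, which may not match the options actually used in the project; `fit_lda`, `predict_lda` and the toy data are hypothetical.

```python
import numpy as np

def fit_lda(X, y):
    """Hypothetical helper: class means and inverse pooled covariance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = sorted(set(y.tolist()))
    means = {c: X[y == c].mean(axis=0) for c in classes}
    n, p = X.shape
    S = np.zeros((p, p))
    for c in classes:
        D = X[y == c] - means[c]
        S += D.T @ D                 # within-class SSCP
    S /= n - len(classes)            # pooled covariance, n - g degrees of freedom
    return classes, means, np.linalg.inv(S)

def predict_lda(model, X_new):
    """Assign each row to the class with the smallest squared Mahalanobis distance
    (equivalent to the highest posterior under normality and equal priors)."""
    classes, means, S_inv = model
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    d2 = np.column_stack([
        np.einsum("ij,jk,ik->i", X_new - means[c], S_inv, X_new - means[c])
        for c in classes
    ])
    return [classes[k] for k in d2.argmin(axis=1)]

# Toy usage with hypothetical two-variable data
X = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 5], [6, 5], [5, 6], [6, 6],
     [10, 0], [11, 0], [10, 1], [11, 1]]
y = ["early"] * 4 + ["mid"] * 4 + ["late"] * 4
print(predict_lda(fit_lda(X, y), [[0.2, 0.3], [5.5, 5.2], [10.4, 0.9]]))
```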

To visualize the PROC DISCRIM predictions of species succession classification, PROC CANDISC and PROC GPLOT in SAS were used to generate an image illustrating how the early, mid and late successional species cluster, with the known and predicted species combined.
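The axes of such a canonical discriminant plot can be sketched with NumPy by solving the generalized eigenproblem B a = lambda W a, where W and B are the within-class and between-class sums of squares and cross-products matrices. With three classes there are at most two canonical variables, which is what allows a two-dimensional plot. The function name is hypothetical; the coefficients here are scaled so that each canonical variable has a pooled within-class variance of 1, and signs or scaling may still differ from SAS output.

```python
import numpy as np

def canonical_discriminant(X, y):
    """Hypothetical helper: canonical discriminant scores, eigenvalues, correlations."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = sorted(set(y.tolist()))
    n, p = X.shape
    grand = X.mean(axis=0)
    W = np.zeros((p, p))  # within-class SSCP
    B = np.zeros((p, p))  # between-class SSCP
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        W += (Xc - mc).T @ (Xc - mc)
        d = (mc - grand)[:, None]
        B += len(Xc) * (d @ d.T)
    # With W = L L^T, B a = lam W a becomes a symmetric problem in v = L^T a
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    lam, V = np.linalg.eigh(L_inv @ B @ L_inv.T)
    k = min(p, len(classes) - 1)           # number of canonical variables
    order = np.argsort(lam)[::-1][:k]
    lam = lam[order]
    A = L_inv.T @ V[:, order] * np.sqrt(n - len(classes))  # within variance = 1
    canon_corr = np.sqrt(lam / (1 + lam))
    scores = (X - grand) @ A               # plot column 0 against column 1
    return scores, lam, canon_corr
```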

DISCRIM Appendix (Appendix D)
CANDISC Appendix (Appendix E)

To predict the succession classification of the unknown species, it is important to know which of the 8 variables used in the multivariate analyses are significant in differentiating between early, mid and late successional species. PROC STEPDISC in SAS was run with the STEPWISE method, which steps through the variables and, at each step, calculates an F-value for each variable from an analysis of covariance to test whether it contributes significantly to discriminating between the successional classes.
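The entry step of a stepwise discriminant selection can be sketched with NumPy using partial Wilks' lambda: the partial F for a candidate variable is the analysis-of-covariance F for that variable, with the already-selected variables acting as covariates. This is a simplified sketch under stated assumptions: it only adds variables (a full STEPWISE search also tests selected variables for removal), and it stops at a raw F threshold `f_to_enter` rather than the significance levels SAS uses. All names are hypothetical.

```python
import numpy as np

def wilks_lambda(X, y, cols):
    """Hypothetical helper: Wilks' lambda det(W)/det(T) for the columns in `cols`."""
    if not cols:
        return 1.0
    Xs = X[:, cols]
    D_all = Xs - Xs.mean(axis=0)
    T = D_all.T @ D_all                    # total SSCP
    W = np.zeros_like(T)
    for c in np.unique(y):
        D = Xs[y == c] - Xs[y == c].mean(axis=0)
        W += D.T @ D                       # within-class SSCP
    return np.linalg.det(W) / np.linalg.det(T)

def forward_select(X, y, f_to_enter):
    """Hypothetical helper: forward (entry-only) selection by partial F."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, g = len(X), len(np.unique(y))
    chosen, steps = [], []
    while len(chosen) < X.shape[1]:
        p = len(chosen)                    # variables already in the model
        base = wilks_lambda(X, y, chosen)
        best = None
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            partial = wilks_lambda(X, y, chosen + [j]) / base
            # Partial F with g - 1 and n - g - p degrees of freedom
            f = (1 - partial) / partial * (n - g - p) / (g - 1)
            if best is None or f > best[1]:
                best = (j, f)
        if best[1] < f_to_enter:
            break
        chosen.append(best[0])
        steps.append(best)
    return chosen, steps
```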

STEPDISC Appendix (Appendix F)