The Spence Laboratory
INVERTEBRATE ECOLOGY
ualberta.ca | Faculty of Agricultural, Environmental and Life Sciences | Renewable Resources
Multivariate Regression Trees (MRT) is a technique originally described by De'ath (2002). This is a record of my personal journey in learning how to apply this technique to my data. I am sure there are better ways, and I hope that by sharing my experiences others will share theirs and make this technique more useful. If you have any questions or suggestions, feel free to contact me by email: Josh.Jacobs[at]ualberta.ca

Overview
I) Installing R and the mvpart package from CRAN
II) Preparing Data and importing into R
III) Creating an MRT in R

IV) Hints and tricks

Interpretation of the Relative Error (RE) & Cross-Validated Error (CV-Error)
The Relative Error (RE) describes the fit of the tree, so the variance explained by the tree is 1 - RE. However, the RE gives an over-optimistic view of how well the tree will predict new data. Predictive accuracy is better described by the CV-Error, which varies from 0 for a perfect predictor to 1 for a poor predictor (De'ath 2002).

Variance explained by each node
To find the variance explained by each node, save the results of the tree to an object and then look at the summary of that object.
1) To save the results of the tree, precede the command line with name<- , just as when initially reading the csv.
> mrtspider <- mvpart(gdist(spider[,1:12],meth="bray",full=TRUE,sq=TRUE)~herbs+reft+moss+sand+twigs+water,spider,method="mrt",xv="1se",which="4")
2) Then find the summary of this object:
> summary(mrtspider)
3) The variance explained can be calculated from the table in this summary, which lists the relative error ("rel error") for each number of splits ("nsplit").
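As a minimal sketch of step 3, and assuming the fitted object keeps its complexity table in `mrtspider$cptable` with the rpart-style columns `nsplit` and `rel error` (an assumption about the object's layout, not something shown on this page), the calculation could look like the following. The numbers in `cp` are invented purely for illustration and are not results from the spider data.

```r
# Illustrative complexity table using rpart-style column names.
# These values are made up for the example; with a real fit, the
# assumption is that you would use: cp <- mrtspider$cptable
cp <- cbind(
  nsplit      = c(0, 1, 2, 3),
  "rel error" = c(1.00, 0.70, 0.55, 0.45)
)

# Cumulative variance explained by the tree at each row of the table.
var_explained <- 1 - cp[, "rel error"]

# Variance explained by each step down the table: the drop in relative
# error from the previous row (0 for the root, where nsplit is 0).
per_step <- c(0, -diff(cp[, "rel error"]))

result <- data.frame(
  nsplit        = cp[, "nsplit"],
  var_explained = var_explained,
  per_step      = per_step
)
print(result)
```

If a row of the table adds more than one split at once, `per_step` is the combined gain of those splits rather than the gain of a single node.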
When "nsplit" is 0 the relative error is 1, so the variance explained (1 - rel error) is 0.

Identifying Indicator Species
The Indicator Species Analysis (ISA) (Dufrêne and Legendre 1997) is a helpful tool for characterizing the species at each node. To do this, I go back to the Excel sheet and, by sorting the data by the environmental variable for each split, assign each site to a node of the MRT. I then do an ISA using node as the grouping variable. I still do this analysis in PC-ORD, but one day I will figure out how to do it in R and share it with the rest of the world.

Benefits of distance based MRTs (db-MRT)
Straight from De'ath (2002)

V) Notes and Problems
- To make this method reliable, I feel that a large number of trees should be run. For my own application of this method I set xv to "1se", so that R picks the best tree within one SE of the overall best, get R to create a large number of trees (>50), and then pick the tree that is most consistently produced. There is probably a way to make R run 100 trees and give a summary of the results.
- To make this method more useful, you need to be able to create a table with the information seen in Table 1 of De'ath's paper. I cannot figure out how to reproduce this table and would really like to. I will keep trying and will update this page when I can. If you know how to do this, please let me know: Josh.Jacobs[at]ualberta.ca
- The graphs produced by the MRT using Euclidean distance have species along the x-axis and abundance on the y-axis. The graphs produced using the Bray-Curtis distance measure are different, with species along the x-axis and, I believe, the sum of squares on the y-axis.

VI) References
De'ath, G. 2002. Multivariate regression trees: a new technique for modeling species-environment relationships. Ecology 83:1105–1117.
Dufrêne, M., and P. Legendre. 1997. Species assemblages and indicator species: the need for a flexible asymmetrical approach. Ecological Monographs 67:345–366.

Last updated: 16 Jan 2007