Skip to main content

fitting - Mathematica Implementations of the Random Forest algorithm


Is anyone aware of Mathematica use/implementation of Random Forest algorithm?



Answer



Here I will attempt to provide a basic implementation of the random forest algorithm for classification. This is by no means fast and doesn't scale very well but otherwise is a nice classifier. I recommend reading Breiman and Cutler's page for information about random forests.



The following are some helper functions that allow us to compute entropy and ultimately information gain easily.


condEnt = Statistics`Library`NConditionalEntropy;
ent = Statistics`Library`NEntropy;
maxI[v_] := Ordering[v, -1][[1]]

Now for the decision trees that will be used in the forest. This is the biggest bottleneck in the whole process and I would love to see a much faster implementation in Mathematica! The trees I'm working with are called CART trees.


The basic idea is to select at random m of the possible k variables to separate the response y into classes. Once the best variable is chosen the value of that variable which bests separates the classes is chosen to create a split in the tree. This process continues until all responses have been classified or we aren't able to split things based on the remaining data.


cart[x_, y_ /; ent[y] == 0, m_] := First[y]

cart[x_, y_, m_] :=

Block[{k = Length[x], h, drop, bestVar, ub, best, mask},
h = ent[y];
drop = RandomSample[Range[k], k - m];
bestVar =
maxI[Table[If[FreeQ[drop, i], h - condEnt[x[[i]], y], 0], {i, k}]];
ub = Union[x[[bestVar]]];
mask = UnitStep[x[[bestVar]] - #] & /@ ub;
best = maxI[(h - condEnt[#, y] & /@ mask)];
If[Min[#] == Max[#],
RandomChoice[y],

{bestVar, ub[[best]],
cart[Pick[x\[Transpose], #, 1]\[Transpose], Pick[y, #, 1], m],
cart[Pick[x\[Transpose], #, 0]\[Transpose], Pick[y, #, 0], m]
}
] &[mask[[best]]]
]

To demo these things as I go lets use the famous iris data built in to Mathematica selecting about 80% for training and 20% for testing.


data = ExampleData[{"Statistics", "FisherIris"}];


rs = RandomSample[Range[Length[data]]];
train = rs[[1 ;; 120]];
test = rs[[121 ;;]];

Now lets create a CART tree from this data letting m be 3. We can read the result easily. The first element is the variable that bests splits the data, in this case variable 3. The next value 3.3 is the critical value of variable 3 to use. If a value is below that it branches to the right and if it is above it branches to the left. The leaves are the next two elements.


tree = cart[Transpose@data[[train, 1 ;; -2]], data[[train, -1]], 3]

(* {3, 3.3, {3, 4.9, {4, 1.8, "virginica", {3, 5.1, {4, 1.6, "versicolor","virginica"},
"versicolor"}}, {4, 1.7, {4, 1.8, "versicolor", "virginica"},"versicolor"}}, "setosa"} *)


So far so good. Now we need a classifier that can take a new input and a tree to make a classification.


classify[x_, Except[_List, d_]] := d
classify[x_, {best_, value_, d1_, d2_}] := classify[x, If[x[[best]] < value, d2, d1]]

Lets try it out with the first element from our training data. It correctly classifies the iris as species 2 (virginica).


classify[{6.4, 2.8, 5.6, 2.1}, tree]

(* "virginica *)

A random forest is nothing but an ensemble of such trees, created from a bootstrap sample of the data, that allows each one to vote. The function rf takes some input data x, a response vector y, the number of variables to select for splitting m and the number of trees to grow ntree. The function rfClassify takes a new input and a trained forest and makes a classification.



rf[x_, y_, m_, ntree_] := 
Table[boot = RandomChoice[Range[Length[y]], Length[y]];
cart[x[[All, boot]], y[[boot]], m], {ntree}]

rfClassify[input_, forest_] :=
First[#][[maxI[Last[#]]]] &[
Transpose[Tally[classify[input, #] & /@ forest]]]

Lets try it on the iris data. First we fit a forest with our training set. And test it to make sure it works well on its own training data. In this case we get perfect classification of our training data so it looks good.


f = rf[data[[train, 1 ;; -2]]\[Transpose], data[[train, -1]], 3, 500];


N@
Mean[Boole[
First[#] === Last[#] & /@
Transpose[{rfClassify[#, f] & /@ data[[train, 1 ;; -2]],
data[[train, -1]]}]]]

(* 1. *)

Now lets try it out on the test data it has never seen before.



 N@
Mean[Boole[
First[#] === Last[#] & /@
Transpose[{rfClassify[#, f] & /@ data[[test, 1 ;; -2]],
data[[test, -1]]}]]]

(* 0.966667 *)

I'd say 97% correct classification isn't bad for a relatively small data set and forest.


EDIT:



It is worth showing how one might use RLink and the randomForest package in Mathematica with the data I have here. The following code will fit a random forest to the iris data and return the prediction for a particular input newX.


(*Enable RLink*)
<< RLink`
InstallR[]

(*Set the training data and response*)
RSet["x", data[[train, 1 ;; -2]]];
RSet["y", data[[train, -1]]];
RSet["newX", data[[test, 1 ;; -2]][[1]]];


(*install the package. Note: you only need to do this once*)
REvaluate["install.packages(\"randomForest\")"];

(*fit a random forest and make one classification*)
REvaluate["{
library(randomForest)
f = randomForest(x,as.factor(y))
predict(f,newX)
}"
]


(* RFactor[{1}, RFactorLevels["setosa", "versicolor", "virginica"],
RAttributes["names" :> {"1"}]] *)

Comments

Popular posts from this blog

functions - Get leading series expansion term?

Given a function f[x] , I would like to have a function leadingSeries that returns just the leading term in the series around x=0 . For example: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x)] x and leadingSeries[(1/x + 2 + (1 - 1/x^3)/4)/(4 + x)] -(1/(16 x^3)) Is there such a function in Mathematica? Or maybe one can implement it efficiently? EDIT I finally went with the following implementation, based on Carl Woll 's answer: lds[ex_,x_]:=( (ex/.x->(x+O[x]^2))/.SeriesData[U_,Z_,L_List,Mi_,Ma_,De_]:>SeriesData[U,Z,{L[[1]]},Mi,Mi+1,De]//Quiet//Normal) The advantage is, that this one also properly works with functions whose leading term is a constant: lds[Exp[x],x] 1 Answer Update 1 Updated to eliminate SeriesData and to not return additional terms Perhaps you could use: leadingSeries[expr_, x_] := Normal[expr /. x->(x+O[x]^2) /. a_List :> Take[a, 1]] Then for your examples: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x), x] leadingSeries[Exp[x], x] leadingSeries[(1/x + 2 + (1 - 1/x...

How to thread a list

I have data in format data = {{a1, a2}, {b1, b2}, {c1, c2}, {d1, d2}} Tableform: I want to thread it to : tdata = {{{a1, b1}, {a2, b2}}, {{a1, c1}, {a2, c2}}, {{a1, d1}, {a2, d2}}} Tableform: And I would like to do better then pseudofunction[n_] := Transpose[{data2[[1]], data2[[n]]}]; SetAttributes[pseudofunction, Listable]; Range[2, 4] // pseudofunction Here is my benchmark data, where data3 is normal sample of real data. data3 = Drop[ExcelWorkBook[[Column1 ;; Column4]], None, 1]; data2 = {a #, b #, c #, d #} & /@ Range[1, 10^5]; data = RandomReal[{0, 1}, {10^6, 4}]; Here is my benchmark code kptnw[list_] := Transpose[{Table[First@#, {Length@# - 1}], Rest@#}, {3, 1, 2}] &@list kptnw2[list_] := Transpose[{ConstantArray[First@#, Length@# - 1], Rest@#}, {3, 1, 2}] &@list OleksandrR[list_] := Flatten[Outer[List, List@First[list], Rest[list], 1], {{2}, {1, 4}}] paradox2[list_] := Partition[Riffle[list[[1]], #], 2] & /@ Drop[list, 1] RM[list_] := FoldList[Transpose[{First@li...

front end - keyboard shortcut to invoke Insert new matrix

I frequently need to type in some matrices, and the menu command Insert > Table/Matrix > New... allows matrices with lines drawn between columns and rows, which is very helpful. I would like to make a keyboard shortcut for it, but cannot find the relevant frontend token command (4209405) for it. Since the FullForm[] and InputForm[] of matrices with lines drawn between rows and columns is the same as those without lines, it's hard to do this via 3rd party system-wide text expanders (e.g. autohotkey or atext on mac). How does one assign a keyboard shortcut for the menu item Insert > Table/Matrix > New... , preferably using only mathematica? Thanks! Answer In the MenuSetup.tr (for linux located in the $InstallationDirectory/SystemFiles/FrontEnd/TextResources/X/ directory), I changed the line MenuItem["&New...", "CreateGridBoxDialog"] to read MenuItem["&New...", "CreateGridBoxDialog", MenuKey["m", Modifiers-...