Decision trees are among the most widely used machine learning algorithms today. In this part, let us learn, at a very high level, how they work.
PART 1: A visual guide to how they work
Example Outline:
Suppose we are trying to find out whether a patient is a positive case of a certain disease, using the length and width of the suspect virus found in his or her blood. We obtained test results from 16 patients. Since a decision tree is a form of 'supervised learning', we first have to train the decision tree model by providing it with the sets of virus lengths and widths that resulted in positive cases and the ones that did not, before it is able to make predictions on its own. Upon plotting the data obtained from each patient, let us assume we ended up with the graph shown below:
[Figure: Decision Tree Construction, a scatter plot of the 16 patients' virus Length vs. Width]
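To make the supervised-learning setup concrete, here is a minimal sketch of training such a model, assuming scikit-learn is available. The 16 length/width pairs below are made-up placeholder values, not the actual patient data from the plot above, and scikit-learn's DecisionTreeClassifier stands in for the tree we will construct step by step in this series.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical placeholder data: each row is [virus Length, virus Width].
# Labels: 0 = negative case, 1 = positive case.
X = [
    [1.0, 0.8], [1.2, 1.1], [1.3, 0.9], [1.4, 1.2],   # Length < 1.5: negative
    [1.1, 1.0], [1.35, 0.85],
    [1.6, 0.7], [1.9, 0.8], [2.2, 0.9], [1.75, 0.6],  # longer but thin: negative
    [1.7, 1.2], [1.9, 1.4], [2.1, 1.1], [2.3, 1.3],   # longer and wide: positive
    [1.8, 1.5], [2.0, 1.2],
]
y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

# 'Training' means letting the algorithm discover boundaries like L < 1.5.
model = DecisionTreeClassifier().fit(X, y)

# Prediction on a new, unseen patient.
print(model.predict([[1.2, 0.9]]))  # -> [0] on this toy data, i.e. negative
```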
Parts of a Decision Tree
Root Node: This is the first node of a decision tree. It typically splits on the most important attribute, as proposed to the tree by a separate algorithm(s). We will see what this 'separate algorithm(s)' is later on in the series. In this case, the root node attribute is 'Length'.
Leaf Node: A node at which further branching of the decision tree is not required is called a 'leaf node'. This happens when all or most values on one side of the decision boundary belong to only one class. Look at step 2 in the illustration below, for instance. All values below a Length of 1.5 are negative instances. Thus, if L < 1.5, we can say with great confidence that the given case will be negative, even if we don't know the outcome in advance. Such a leaf node is said to be a 'pure node', or one bearing 'low variance', as it contains cases of a single class. In reality, however, since the data is larger and more complicated, we often set a threshold: say, declare a leaf node when more than 90% of the data points are of the same class (the small sketch after this definition makes that rule concrete). We will dig deeper into the 'decision tree stopping criterion' later on in this series.
Child Node: This is the kind of node at which we are not sure what to conclude, because we are presented with a mixture of results. Going back to step 2 again, for instance, L > 1.5 results in a mixture of positive and negative cases. We therefore require more branching to be able to arrive at a decisive conclusion. We will see exactly how this demarcation is done later on in this series.
Decision Boundary: This is the imaginary line that demarcates cases belonging to one class from cases belonging to the other class(es).
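Putting these parts together: a finished decision tree is just nested if/else checks against such decision boundaries. Below is a minimal sketch of that idea. The L < 1.5 rule is the one derived in step 2 above; the Width threshold of 1.0 is purely hypothetical, since the post has not derived the second split yet.

```python
def predict(length, width):
    """Walk a hand-built decision tree for one patient."""
    if length < 1.5:       # root node splits on Length (step 2 above)
        return "negative"  # pure leaf: all such training cases were negative
    else:                  # child node: mixed cases, so branch again
        if width < 1.0:    # HYPOTHETICAL second boundary, for illustration only
            return "negative"
        return "positive"

print(predict(1.2, 0.9))  # "negative": falls in the pure L < 1.5 leaf
print(predict(2.0, 1.3))  # "positive" under the hypothetical Width split
```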
Let the Tree Begin!
Batool Arhamna Haider, 13 Feb 2016
About the author: The author is a graduate of Stanford University with an MS in Petroleum Engineering. She is currently working as a Data Scientist in the Talent Analytics Research team at UnitedHealth Group.
The author would like to dedicate this series to her beloved late aunt, Fouzia Raza.