Machine Learning II: Decision Trees


Decision Trees


Trees Represent Disjunctive Concepts

     outlook=sunny and humidity=high
     outlook=rain and windy=true

if outlook=sunny and humidity=high then classify as 'N'.
if outlook=sunny and humidity=normal then classify as 'P'.
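The rules above can be sketched as executable code; the dict encoding and the behaviour outside the branches shown are assumptions, not from the slide:

```python
def classify(example):
    """Apply the disjunctive rules read off the tree's sunny branch."""
    if example["outlook"] == "sunny" and example["humidity"] == "high":
        return "N"  # if outlook=sunny and humidity=high then 'N'
    if example["outlook"] == "sunny" and example["humidity"] == "normal":
        return "P"  # if outlook=sunny and humidity=normal then 'P'
    return None  # branches of the tree not shown on this slide

print(classify({"outlook": "sunny", "humidity": "high"}))    # N
print(classify({"outlook": "sunny", "humidity": "normal"}))  # P
```

Each rule is one conjunctive path from root to leaf; the tree as a whole is the disjunction of its paths.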

Induction of Decision Trees

     outlook=(sunny overcast rain)
     temperature=(cool mild hot)
     humidity=(high normal)
     windy=(true false)
     outlook=overcast
     temperature=cool
     humidity=normal
     windy=false
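The attribute space and the example instance above can be sketched as Python data (the dict encoding is an assumption):

```python
from math import prod

# The four attributes and their value domains, as listed on the slide.
attributes = {
    "outlook": ("sunny", "overcast", "rain"),
    "temperature": ("cool", "mild", "hot"),
    "humidity": ("high", "normal"),
    "windy": (True, False),
}

# The example instance from the slide.
instance = {"outlook": "overcast", "temperature": "cool",
            "humidity": "normal", "windy": False}

# Every attribute value in the instance comes from its domain.
assert all(instance[a] in dom for a, dom in attributes.items())

# Size of the full instance space: 3 * 3 * 2 * 2 = 36 distinct instances.
print(prod(len(dom) for dom in attributes.values()))  # 36
```

Induction must pick, from the many trees consistent with the training examples, one that classifies the rest of this instance space well.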

ID3



outlook=sunny        outlook=sunny
temperature=cool     temperature=cool
humidity=high        humidity=high
windy=true           windy=true
classify as 'N'      classify as 'P'

How do we choose which attributes to split on?



 

Entropy Graph


     I(p,n) = I(9,5)
            = 0.940 bits
     E(outlook) = 0.694 bits
     gain(outlook) = 0.940 - 0.694 = 0.246 bits
     gain(temperature) = 0.029 bits
     gain(humidity) = 0.151 bits
     gain(windy) = 0.048 bits
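These figures can be reproduced in a few lines. The per-branch class counts for outlook (sunny 2P/3N, overcast 4P/0N, rain 3P/2N) are the standard ones from Quinlan's 14-example weather data and are not shown on this slide:

```python
from math import log2

def I(p, n):
    """Entropy (in bits) of a class split with p positive, n negative examples."""
    if p == 0 or n == 0:
        return 0.0
    t = p + n
    return -(p / t) * log2(p / t) - (n / t) * log2(n / t)

p, n = 9, 5  # class totals from the slide
branches = {"sunny": (2, 3), "overcast": (4, 0), "rain": (3, 2)}

# Expected entropy after splitting on outlook: weighted entropy of branches.
E_outlook = sum((bp + bn) / (p + n) * I(bp, bn) for bp, bn in branches.values())

print(f"{I(p, n):.3f}")     # 0.940
print(f"{E_outlook:.3f}")   # 0.694
gain = I(p, n) - E_outlook  # about 0.247; the slide's 0.246 subtracts
print(gain)                 # the rounded intermediate values
```

ID3 computes this gain for every attribute and splits on the one with the largest value, here outlook.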

Estimating Accuracy of Tree
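The slide gives no detail; a minimal sketch, assuming accuracy is estimated as the fraction of held-out examples the tree classifies correctly (the held-out examples and the stand-in classifier below are hypothetical):

```python
def accuracy(predict, examples):
    """Fraction of held-out examples the classifier labels correctly."""
    correct = sum(1 for attrs, label in examples if predict(attrs) == label)
    return correct / len(examples)

# Hypothetical held-out weather examples: (attributes, class).
holdout = [
    ({"outlook": "sunny", "humidity": "high"}, "N"),
    ({"outlook": "sunny", "humidity": "normal"}, "P"),
    ({"outlook": "overcast", "humidity": "high"}, "P"),
    ({"outlook": "rain", "humidity": "high"}, "N"),
]

def rule(attrs):
    # The sunny-branch rule from the earlier slide, used as the classifier.
    if attrs["outlook"] == "sunny" and attrs["humidity"] == "high":
        return "N"
    return "P"

print(accuracy(rule, holdout))  # 3 of 4 correct -> 0.75
```

The key point is that the examples used to score the tree must be disjoint from those used to grow it; accuracy on the training set itself is optimistically biased.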



Accuracy of Tree Versus Size


 

Noise and Overfitting of Tree
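A hand-built sketch of the overfitting effect: with one mislabeled (noisy) training example, a tree deep enough to memorize every example fits the noise, while a simpler tree does not. The dataset and both classifiers below are assumptions for illustration, not from the slide:

```python
train = [
    (("sunny", "high"), "N"),
    (("sunny", "normal"), "P"),
    (("rain", "high"), "P"),
    (("rain", "normal"), "P"),
    (("overcast", "high"), "N"),   # noise: the true class is 'P'
]
test = [
    (("overcast", "high"), "P"),
    (("overcast", "normal"), "P"),
    (("sunny", "high"), "N"),
]

memorized = dict(train)  # a maximally deep "tree": one leaf per example

def deep(x):
    return memorized.get(x, "P")

def pruned(x):
    # A smaller tree that ignores the noisy example.
    return "N" if x == ("sunny", "high") else "P"

def acc(f, data):
    return sum(f(x) == y for x, y in data) / len(data)

print(acc(deep, train), acc(pruned, train))  # 1.0 0.8 - deep tree fits the noise
print(acc(deep, test), acc(pruned, test))    # but generalizes worse than pruned
```

The deep tree is perfect on the noisy training data yet wrong on the clean test case it memorized incorrectly; the pruned tree trades a little training accuracy for better test accuracy, which is the motivation for pruning.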




