ID3 algorithm
In decision tree learning, ID3 (Iterative Dichotomiser 3) is an algorithm, invented by Ross Quinlan, used to generate a decision tree from a dataset.[1] ID3 is the precursor to the C4.5 algorithm.
Algorithm
The ID3 algorithm can be summarized as follows:
- Take all unused attributes and compute their entropy with respect to the test samples
- Choose the attribute for which entropy is minimum (or, equivalently, information gain is maximum)
- Make a node containing that attribute
The algorithm is as follows:
ID3 (Examples, Target_Attribute, Attributes)
- Create a root node for the tree
- If all examples are positive, Return the single-node tree Root, with label = +.
- If all examples are negative, Return the single-node tree Root, with label = -.
- If the set of predicting attributes is empty, then Return the single-node tree Root, with label = most common value of the target attribute in the examples.
- Otherwise Begin
- A = the attribute in Attributes that best classifies Examples (the one with the highest information gain).
- Decision Tree attribute for Root = A.
- For each possible value, vi, of A,
- Add a new tree branch below Root, corresponding to the test A = vi.
- Let Examples(vi) be the subset of examples that have the value vi for A
- If Examples(vi) is empty
- Then below this new branch add a leaf node with label = most common target value in the examples
- Else below this new branch add the subtree ID3 (Examples(vi), Target_Attribute, Attributes – {A})
- End
- Return Root
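As an illustration, here is a minimal Python sketch of the procedure above (the function names, dictionary-based tree representation, and toy data set are illustrative assumptions, not part of the original description); it relies on the entropy and information gain metrics defined in the next section:

import math
from collections import Counter

def entropy(examples, target):
    # Information entropy of the target attribute over a list of examples (dicts).
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute, target):
    # Reduction in entropy obtained by splitting the examples on `attribute`.
    total = len(examples)
    remainder = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, target, attributes):
    # Returns either a class label (leaf) or a dict {attribute: {value: subtree}}.
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:          # all examples share one label
        return labels[0]
    if not attributes:                 # no attributes left to split on
        return Counter(labels).most_common(1)[0][0]
    # Choose the attribute with maximum information gain (minimum entropy).
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, target, remaining)
    return tree

# Toy usage with hypothetical attributes:
data = [
    {"outlook": "sunny",    "windy": False, "play": "no"},
    {"outlook": "sunny",    "windy": True,  "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "rain",     "windy": False, "play": "yes"},
    {"outlook": "rain",     "windy": True,  "play": "no"},
]
print(id3(data, "play", ["outlook", "windy"]))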
The ID3 metrics
The algorithm is based on Occam's razor: it prefers smaller decision trees (simpler theories) over larger ones. However, it does not always produce the smallest tree, and is therefore a heuristic. Occam's razor is formalized using the concept of information entropy:
Entropy
E(S) = -\sum_{j=1}^{n} f_S(j) \log_2 f_S(j)
Where:
- E(S) is the information entropy of the set S;
- n is the number of different values of the attribute in S (entropy is computed for one chosen attribute);
- f_S(j) is the frequency (proportion) of the value j in the set S;
- log_2 is the binary logarithm.
An entropy of 0 identifies a perfectly classified set.
Entropy is used to determine which node to split next in the algorithm: the higher the entropy, the greater the potential to improve the classification by splitting at that node.
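For example (a hypothetical set, not taken from the references), a set S of 14 samples containing 9 positive and 5 negative examples has entropy
E(S) = -(9/14) \log_2(9/14) - (5/14) \log_2(5/14) \approx 0.940
while a set whose samples all share the same class has entropy 0.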
Gain
Gain is computed to estimate the reduction in entropy produced by a split over an attribute:
G(S,A) = E(S) - \sum_{i=1}^{m} f_S(A_i) E(S_{A_i})
Where:
- G(S,A) is the gain of the set S after a split over the attribute A;
- E(S) is the information entropy of the set S;
- m is the number of different values of the attribute A in S;
- f_S(A_i) is the frequency (proportion) of the items possessing A_i as value for A in S;
- A_i is the ith possible value of A;
- S_{A_i} is the subset of S containing all items where the value of A is A_i.
Gain quantifies the entropy improvement by splitting over an attribute: higher is better.
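Continuing the hypothetical 14-sample set above: if an attribute A splits it into a subset of 8 samples (6 positive, 2 negative, entropy ≈ 0.811) and a subset of 6 samples (3 positive, 3 negative, entropy 1), the gain is
G(S,A) = 0.940 - (8/14)(0.811) - (6/14)(1.0) \approx 0.048
so splitting on A removes only a small amount of uncertainty, and an attribute with higher gain would be preferred.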
References
- Quinlan, J. R. (1986). "Induction of Decision Trees". Machine Learning 1 (1): 81-106.
- Mitchell, Tom M. (1997). Machine Learning. McGraw-Hill. pp. 55-58.
External links
- Seminars - http://www2.cs.uregina.ca/
- Description and examples - http://www.cise.ufl.edu/
- Description and examples - http://www.cis.temple.edu/
- An implementation of ID3 in Python
- An implementation of ID3 in Ruby
- An implementation of ID3 in Common Lisp
- An implementation of ID3 algorithm in C#
- An implementation of ID3 in Perl
- An implementation of ID3 in Prolog
- An implementation of ID3 in C