ID3 Decision Tree Classifier for Machine Learning along with Reduced Error Pruning and Random Forest to avoid overfitting

ID3-Decision-Tree-Classifier-in-Java

Classes: 1 = >50K, 2 = <=50K

Attributes:

  * age: continuous.
  * workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
  * fnlwgt: continuous.
  * education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
  * education-num: continuous.
  * marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
  * occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
  * relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
  * race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
  * sex: Female, Male.
  * capital-gain: continuous.
  * capital-loss: continuous.
  * hours-per-week: continuous.
  * native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
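The schema above mixes continuous and nominal attributes, and a tree builder needs to know which is which in order to choose between a threshold split and one branch per value. A minimal sketch of that mapping in Java (class and field names are illustrative, not the repository's actual API):

```java
// Illustrative sketch, not the repository's code: marks which of the
// 14 Adult-dataset attributes are continuous, so the tree builder
// knows whether to threshold-split or branch on each distinct value.
public class AttributeSchema {
    static final String[] NAMES = {
        "age", "workclass", "fnlwgt", "education", "education-num",
        "marital-status", "occupation", "relationship", "race", "sex",
        "capital-gain", "capital-loss", "hours-per-week", "native-country"
    };
    // true = continuous, false = nominal (same order as NAMES)
    static final boolean[] CONTINUOUS = {
        true,  false, true,  false, true,
        false, false, false, false, false,
        true,  true,  true,  false
    };

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            System.out.println(NAMES[i]
                    + (CONTINUOUS[i] ? " (continuous)" : " (nominal)"));
        }
    }
}
```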

Procedure:

  1. The decision tree was generated from the provided data using the ID3 algorithm described in Tom M. Mitchell's Machine Learning.
  2. Missing values were filled with the value that appeared most frequently in the corresponding attribute column.
  3. Continuous values were handled as described in Section 3.7.2 of Mitchell: the values were first sorted in ascending order, the information gain was calculated at each point where the value changed, and the column was split at the point of maximum gain.
  4. Reduced Error Pruning was performed by removing nodes one at a time and checking the resulting accuracy: if accuracy increased, the node stayed removed; otherwise it was kept and the next node was checked.
  5. A random forest of 10 trees was generated, each tree trained on a random 50% of the attributes and 33% of the training data; accuracy improved compared to the plain ID3 tree.
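The gain calculation driving steps 1 and 3 can be sketched as follows. This is a minimal illustration of ID3's entropy and information-gain formulas, not the repository's code; the class and method names are assumptions:

```java
import java.util.*;

// Sketch of the entropy and information-gain computations used by ID3.
// Hypothetical helper, not the repository's actual API.
public class Id3Math {

    // Entropy (in bits) of a label distribution given as raw class counts.
    static double entropy(int[] counts) {
        int total = Arrays.stream(counts).sum();
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue; // 0 * log(0) is taken as 0
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2)); // log base 2
        }
        return h;
    }

    // Information gain of splitting a parent node into child partitions:
    // parent entropy minus the size-weighted entropy of the children.
    static double gain(int[] parentCounts, List<int[]> childCounts) {
        int total = Arrays.stream(parentCounts).sum();
        double remainder = 0.0;
        for (int[] child : childCounts) {
            int n = Arrays.stream(child).sum();
            remainder += ((double) n / total) * entropy(child);
        }
        return entropy(parentCounts) - remainder;
    }

    public static void main(String[] args) {
        // A 50/50 class split has entropy 1 bit.
        System.out.println(entropy(new int[]{5, 5})); // 1.0
        // A split that separates the classes perfectly gains the full bit.
        System.out.println(gain(new int[]{5, 5},
                Arrays.asList(new int[]{5, 0}, new int[]{0, 5}))); // 1.0
    }
}
```

For a continuous attribute (step 3), the same `gain` is evaluated at each candidate threshold and the maximizing threshold wins.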

Output:

    Start...
    Prepocessing Training data
    Prepocessing Testing data

    Generating Decision Tree using ID3 Algorithm
    Training Time=1.979secs
    Accuracy=0.807874209200909
    Precision=0.8762364294330519
    Recall=0.8727272727272727
    F-Score=0.874478330658106
    No of nodes in tree = 33223

    Applying Reduced Error Pruning on the decision tree generated
    Training Time=10.7secs
    Accuracy=0.8404889134574043
    Precision=0.9467631684760756
    Recall=0.8588415523781733
    F-Score=0.9006617450177867
    No of nodes in tree = 2640

    Initializing Random Forest with 10 trees, 0.5 fraction of attributes and 0.33 fraction of training instances in each tree
    Training Time=1.618secs
    Accuracy=0.8313371414532277
    Precision=0.944270205066345
    Recall=0.8511779630300834
    F-Score=0.8953107129241327

    End...
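The Accuracy, Precision, Recall, and F-Score figures above are the standard binary-classification metrics. A hypothetical helper computing them from confusion-matrix counts (not part of the repository's code):

```java
// Sketch of the evaluation metrics reported in the output, computed
// from confusion-matrix counts: tp/fp/fn/tn = true positives, false
// positives, false negatives, true negatives. Names are illustrative.
public class Metrics {
    static double accuracy(int tp, int fp, int fn, int tn) {
        return (double) (tp + tn) / (tp + fp + fn + tn);
    }
    static double precision(int tp, int fp) {
        return (double) tp / (tp + fp);
    }
    static double recall(int tp, int fn) {
        return (double) tp / (tp + fn);
    }
    // F-Score here is the F1 score: harmonic mean of precision and recall.
    static double fScore(double p, double r) {
        return 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        double p = precision(90, 10); // 0.9
        double r = recall(90, 30);    // 0.75
        System.out.println(accuracy(90, 10, 30, 70)); // 0.8
        System.out.println(fScore(p, r));
    }
}
```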
