

Showing posts from September, 2017

Decision tree for predicting diabetes based on diagnostic measures

Decision tree learners are powerful classifiers that use a tree structure to model the relationship between the features and the potential outcomes. The tree has a root node and decision nodes where choices are made. The choices split the data across branches that indicate the potential outcomes of a decision. The tree is terminated by leaf nodes (or terminal nodes) that denote the action to be taken as the result of the series of decisions. In the case of a predictive model, the leaf nodes provide the expected result given the series of events in the tree. After the model is created, many decision tree algorithms output the resulting structure in a human-readable format. This provides tremendous insight into how and why the model works or doesn’t work well for a particular task. This also makes decision trees particularly appropriate for applications in which the classification mechanism needs to be transparent for legal reasons, or in case the results need to be
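The post's own code is not part of this excerpt, so the following is only a minimal pure-Python sketch of the idea described above, not the author's implementation. It assumes a CART-style learner that chooses each split by Gini impurity; the nested dicts play the role of root, decision and leaf nodes, and `to_text` prints the fitted tree as human-readable rules. The function names, the feature names `glucose` and `bmi`, and the toy values are all hypothetical illustrations, not records from the Pima data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature_index, threshold) with the lowest weighted Gini, or None."""
    best, best_score, n = None, gini(labels), len(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send rows down both branches
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow the tree recursively; a leaf stores the majority class of its rows."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    f, t = split
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in li], [labels[i] for i in li], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in ri], [labels[i] for i in ri], depth + 1, max_depth),
    }


def predict(node, row):
    """Follow the decision nodes from the root until a leaf is reached."""
    while "leaf" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


def to_text(node, names, indent=""):
    """Render the tree as nested if/else rules (the human-readable output)."""
    if "leaf" in node:
        return f"{indent}-> {node['leaf']}\n"
    name, t = names[node["feature"]], node["threshold"]
    return (f"{indent}if {name} <= {t}:\n" + to_text(node["left"], names, indent + "  ")
            + f"{indent}else:  # {name} > {t}\n" + to_text(node["right"], names, indent + "  "))


# Hypothetical toy rows [glucose, bmi] with labels; illustrative only.
X = [[85, 26.6], [89, 28.1], [90, 25.0], [183, 23.3],
     [137, 43.1], [168, 38.0], [100, 30.0], [150, 35.0]]
y = ["neg", "neg", "neg", "pos", "pos", "pos", "neg", "pos"]
names = ["glucose", "bmi"]

if __name__ == "__main__":
    tree = build_tree(X, y)
    print(to_text(tree, names), end="")
    print(predict(tree, [120, 30.0]))
```

On these toy rows the learner finds a single split on `glucose` at 100, so the printed rules read `if glucose <= 100: -> neg`, `else: -> pos`. Printing rules like this is exactly the transparency the paragraph describes: each prediction can be traced back to the comparisons that produced it.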

Random Forest for predicting diabetes based on diagnostic measures

Random forests (or decision tree forests) focus only on ensembles of decision trees. This method combines the base principles of bagging with random feature selection to add diversity to the decision tree models. After the ensemble of trees (the forest) is generated, the model uses a vote to combine the trees’ predictions. Because each tree considers only a small, random subset of the full feature set at each split, random forests can handle extremely large datasets, where the so-called “curse of dimensionality” might cause other models to fail. At the same time, their error rates for most learning tasks are on par with nearly any other method. It is an all-purpose model that performs well on most problems, but unlike a single decision tree, it is not easily interpretable. It can handle noisy or missing data as well as categorical or continuous features, and it selects only the most important features. Here we will work with the Pima Indians Diabetes database to predict the onset of diabetes base
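The paragraph above names three ingredients: bagging, random feature selection, and a vote. The following is a minimal pure-Python sketch of how they fit together. It is not the author's code and not a library API; in practice one would use an existing implementation such as R's `randomForest` package or scikit-learn's `RandomForestClassifier`. The helper names, `n_trees=25`, the `sqrt(p)` feature-subset size (a common default for classification), and the toy rows are all assumptions made for illustration. Missing-value handling is omitted.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Best (feature, threshold) among the candidate features only, or None."""
    best, best_score, n = None, gini(labels), len(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / n
                if score < best_score:
                    best, best_score = (f, t), score
    return best


def grow(rows, labels, rng, mtry, depth=0, max_depth=4):
    """Grow one tree; each split only sees `mtry` randomly chosen features."""
    features = rng.sample(range(len(rows[0])), mtry)  # random feature selection
    split = best_split(rows, labels, features) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class label
    f, t = split
    left = [(r, lab) for r, lab in zip(rows, labels) if r[f] <= t]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow([r for r, _ in left], [lab for _, lab in left], rng, mtry, depth + 1, max_depth),
            grow([r for r, _ in right], [lab for _, lab in right], rng, mtry, depth + 1, max_depth))


def predict_tree(node, row):
    """Decision nodes are (feature, threshold, left, right) tuples; leaves are labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def random_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    mtry = max(1, int(math.sqrt(p)))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bagging: bootstrap sample with replacement
        forest.append(grow([rows[i] for i in idx], [labels[i] for i in idx], rng, mtry))
    return forest


def predict_forest(forest, row):
    """Combine the trees' predictions by majority vote."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Hypothetical toy rows [glucose, bmi]; illustrative values, not Pima records.
X = [[85, 22.0], [89, 24.0], [90, 25.0], [100, 27.0],
     [137, 33.0], [150, 35.0], [168, 38.0], [183, 43.0]]
y = ["neg", "neg", "neg", "neg", "pos", "pos", "pos", "pos"]

if __name__ == "__main__":
    forest = random_forest(X, y, n_trees=25, seed=42)
    print(predict_forest(forest, [60, 18.0]), predict_forest(forest, [200, 50.0]))
```

Each tree sees a different bootstrap sample and a different random feature at every split, so the trees disagree in slightly different ways. The vote then smooths those individual errors out. This is also why the fitted forest, unlike the single tree, no longer reduces to one readable set of rules.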