Posts

Supervised and Unsupervised Learning. Classification, Regression and Clustering.

Data is growing faster than ever before, and it comes from every sector: business, biology, economics, and so on. Technology and artificial intelligence allow us to process the large amount of information these sectors produce. Data mining refers to the study of pre-existing databases in order to gain new insights or information about the data. It uses different techniques to discover patterns and establish relationships to solve problems. Machine Learning uses data mining techniques and other learning algorithms to build a model of what is happening behind the data so that it can be used to predict future outcomes. The main focus of Machine Learning is the study and design of systems or algorithms that can learn from data. Supervised learning and Unsupervised Learning: Supervised learning is when the algorithm works with input (x) and output (y) variables. Using input and output variables the supervised machine lea
Recent posts

Decision tree for predicting diabetes based on diagnostic measures

Decision tree learners are powerful classifiers that use a tree structure to model the relationship between the features and the potential outcomes. The tree has a root node and decision nodes where choices are made. The choices split the data across branches that indicate the potential outcomes of a decision. The tree is terminated by leaf nodes (or terminal nodes) that denote the action to be taken as the result of the series of decisions. In the case of a predictive model, the leaf nodes provide the expected result given the series of events in the tree. After the model is created, many decision tree algorithms output the resulting structure in a human-readable format. This provides considerable insight into how and why the model works or doesn’t work well for a particular task. It also makes decision trees particularly appropriate for applications in which the classification mechanism needs to be transparent for legal reasons, or in case the results need to be
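
The structure described in this excerpt (root node, decision nodes, leaf nodes, and a human-readable printout) can be sketched in a few lines of plain Python. This is only an illustration, not the model built in the post: the feature names `glucose` and `bmi`, the thresholds, and the helper names `Leaf`, `Decision`, `predict`, and `render` are all assumptions made up for the example, and the tree is written by hand rather than learned from data.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Terminal node: holds the predicted outcome."""
    prediction: str


@dataclass
class Decision:
    """Root or internal node: tests one feature against a threshold."""
    feature: str
    threshold: float
    left: "Node"    # branch taken when value <= threshold
    right: "Node"   # branch taken when value > threshold


Node = Union[Leaf, Decision]


def predict(node, sample):
    """Follow the branches from the root until a leaf is reached."""
    while isinstance(node, Decision):
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.prediction


def render(node, indent=""):
    """Return the tree as indented, human-readable if/else rules."""
    if isinstance(node, Leaf):
        return f"{indent}predict: {node.prediction}\n"
    return (
        f"{indent}if {node.feature} <= {node.threshold}:\n"
        + render(node.left, indent + "    ")
        + f"{indent}else:\n"
        + render(node.right, indent + "    ")
    )


# Hand-written, hypothetical tree; the thresholds are illustrative, not fitted.
tree = Decision(
    "glucose", 127.0,
    Leaf("negative"),
    Decision("bmi", 30.0, Leaf("negative"), Leaf("positive")),
)

print(render(tree))
print(predict(tree, {"glucose": 150, "bmi": 35}))
```

Because every prediction is just a path of explicit comparisons, the printed rules are the whole model, which is what makes this kind of classifier easy to audit.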

Random Forest for predicting diabetes based on diagnostic measures

Random forests, or decision tree forests, focus only on ensembles of decision trees. This method combines the base principles of bagging with random feature selection to add diversity to the decision tree models. After the ensemble of trees (the forest) is generated, the model uses a vote to combine the trees’ predictions. Because each tree uses only a small, random portion of the full feature set, random forests can handle extremely large datasets, where the so-called “curse of dimensionality” might cause other models to fail. At the same time, their error rates for most learning tasks are on par with nearly any other method. It is an all-purpose model that performs well on most problems, but unlike a decision tree, the model is not easily interpretable. It can handle noisy or missing data as well as categorical or continuous features, but selects only the most important features. Here we will work with the Pima Indians Diabetes database to predict the onset of diabetes base
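
The three ingredients named in this excerpt (bagging, random feature selection, and a vote) can be sketched in plain Python. This is a deliberately simplified illustration and not the method used in the post: each "tree" is a one-split stump rather than a full decision tree, the data is a made-up toy set, and the names `fit_stump`, `fit_forest`, `predict_forest`, `glucose`, and `bmi` are assumptions introduced for the example.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # not a real split
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # No valid split in this sample: always predict the majority label.
        majority = Counter(labels).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    all_features = list(rows[0].keys())
    forest = []
    for _ in range(n_trees):
        # Bagging: draw a bootstrap sample (same size, with replacement).
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random feature selection: this tree only sees a random subset.
        feats = rng.sample(all_features, n_features)
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)
        )
    return forest


def predict_forest(forest, row):
    """Combine the trees' predictions with a majority vote."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Made-up toy data, for illustration only.
rows = [
    {"glucose": 90, "bmi": 22}, {"glucose": 95, "bmi": 24},
    {"glucose": 100, "bmi": 26}, {"glucose": 150, "bmi": 33},
    {"glucose": 160, "bmi": 35}, {"glucose": 170, "bmi": 37},
]
labels = ["negative", "negative", "negative", "positive", "positive", "positive"]

forest = fit_forest(rows, labels)
print(predict_forest(forest, {"glucose": 180, "bmi": 40}))
```

A full random forest grows deeper trees and typically re-draws the feature subset at every split; the stump version keeps the example short while still showing why the ensemble is harder to read than a single tree: the answer is a tally over many small models rather than one path of rules.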