Nowadays, data is growing faster than ever before, and it comes from every sector: business, biology, economics, and more. Technology and artificial intelligence allow us to process the large amount of information produced across all of these sectors.
Data mining refers to the analysis of pre-existing databases in order to extract new insights or information from the data. It uses a variety of techniques to discover patterns and establish relationships that help solve problems.
Machine learning uses data mining techniques and other learning algorithms to build a model of the process behind the data, so that the model can be used to predict future outcomes.
The main focus of machine learning is the study and design of systems or algorithms that can learn from data.
Supervised Learning and Unsupervised Learning:
Supervised learning is when the algorithm works with input (x) and output (y) variables.
Using the input and output variables, the supervised machine learning algorithm produces a function or model that best fits the data.
It is called supervised learning because the algorithm fits the model while knowing the correct output for each input.
Examples of supervised learning are classification and regression.
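As a minimal sketch of "a function that best fits our data", the Python snippet below fits a straight line y = a*x + b to a few known (x, y) pairs using ordinary least squares. The linear model, the helper name `fit_line`, and the sample values are illustrative assumptions, not something prescribed by the text; real problems usually use a library and richer models.

```python
# Illustrative sketch only: the linear model and the data are assumptions.

def fit_line(xs, ys):
    """Return slope a and intercept b minimizing the squared error of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

# Known inputs (x) paired with known outputs (y): this is what makes it "supervised".
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # made-up values generated by y = 2x + 1

a, b = fit_line(xs, ys)
print(a, b)         # 2.0 1.0
print(a * 5.0 + b)  # prediction for the unseen input x = 5: 11.0
```

Once fitted, the model can be applied to inputs it has never seen, which is exactly the "predict future outcomes" use described earlier.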
Unsupervised learning is when the algorithm works only with input data (x); there is no output data to work with.
The unsupervised learning algorithm uses the data to model its underlying structure or distribution, which gives us more information and insight into the data we are working with.
It is called unsupervised learning because, unlike supervised learning, there is no output. In this case, the algorithms are left on their own to discover and identify the structure in the data.
An example of unsupervised learning is clustering.
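One simple way to "model the distribution of the data" without any outputs is to assume the inputs follow a single normal (Gaussian) distribution and estimate its parameters from the data alone. The sketch below does that; the Gaussian assumption, the helper names, and the sample values are illustrative choices, not part of the original text.

```python
import math

# Unlabeled observations: there is only x, no y. Values are made up for illustration.
data = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]

def fit_gaussian(xs):
    """Estimate mean and standard deviation of an assumed normal distribution (maximum likelihood)."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return mu, sigma

def density(x, mu, sigma):
    """Probability density of the fitted normal distribution at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = fit_gaussian(data)
print(mu, sigma)
# Under the fitted model, values near the mean are typical and distant values are unusual.
print(density(5.0, mu, sigma) > density(9.0, mu, sigma))  # True
```

The fitted model says something about the data (where it is centred, how spread out it is, which new values look unusual) even though no output variable was ever provided.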
Differences between Classification, Regression and Clustering:
As we have seen, classification and regression are supervised learning algorithms, while clustering is an unsupervised learning algorithm.
Classification and regression have both input and output variables to work with.
- Classification: the algorithm identifies the category an object belongs to. Example: given specific input variables, such as physical measurements of a tumor (radius, shape, etc.), the algorithm classifies the tumor into one of two categories: benign or malignant.
- Regression: the algorithm predicts the value of a continuous variable. Example: given specific input variables, such as body measurements of participants (height, waist circumference, etc.), the algorithm predicts each participant's weight.
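The classification case can be sketched with a k-nearest-neighbours classifier, which is only one of many possible classification algorithms. The features (`radius_mm`, `irregularity_score`), the helper name `classify`, and every value below are hypothetical and invented for illustration; they are not real clinical data.

```python
import math
from collections import Counter

# Hypothetical labeled training data: ((radius_mm, irregularity_score), label).
training = [
    ((10.0, 0.10), "benign"),
    ((11.0, 0.15), "benign"),
    ((12.0, 0.12), "benign"),
    ((20.0, 0.40), "malignant"),
    ((22.0, 0.45), "malignant"),
    ((21.0, 0.50), "malignant"),
]

def classify(point, data, k=3):
    """Return the majority label among the k training points closest to `point`."""
    # Rank training examples by Euclidean distance to the query point (math.dist needs Python 3.8+).
    nearest = sorted(data, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((11.5, 0.13), training))  # benign
print(classify((19.0, 0.42), training))  # malignant
```

Because the features are not rescaled here, the radius dominates the distance; in practice, features are usually standardized before a distance-based classifier is applied.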
Clustering is an unsupervised learning algorithm: it works only with input variables.
- Clustering: the algorithm identifies groups of objects that have similar characteristics. Example: given specific input variables describing the individuals in a city's population, the algorithm groups those individuals into sets (clusters or groups) whose members are similar to one another.
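A common clustering technique is k-means, sketched below in plain Python. The choice of k-means, the helper name `k_means`, the simple "first k points" initialization, and the hypothetical (age, weekly exercise hours) data are all assumptions made for illustration; note that no labels are given to the algorithm.

```python
import math

def k_means(points, k, iterations=10):
    """Group points into k clusters using Lloyd's k-means algorithm (illustrative sketch)."""
    # Simple deterministic initialization: use the first k points as starting centroids.
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            if clusters[c]:  # keep the previous centroid if a cluster ends up empty
                size = len(clusters[c])
                centroids[c] = tuple(sum(coord) / size for coord in zip(*clusters[c]))
    return centroids, clusters

# Hypothetical individuals described by (age, hours of exercise per week); no output labels.
people = [(25, 6), (60, 1), (27, 5), (62, 2), (24, 7), (58, 1)]
centroids, clusters = k_means(people, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

On this toy data the algorithm separates the younger, more active individuals from the older, less active ones. Results of k-means depend on the initialization and on feature scaling, so real implementations typically use smarter initialization and standardized features.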