
Naive Bayes algorithm using iris dataset

This algorithm is based on probability: a probability captures the chance that an event will occur in light of the available evidence. The lower the probability, the less likely the event is to occur. A probability of 0 indicates that the event will definitely not occur, while a probability of 1 indicates that the event will occur with 100 percent certainty.
This classifier uses training data to calculate an observed probability of each outcome based on the evidence provided by the feature values. When the classifier is later applied to unlabeled data, it uses these observed probabilities to predict the most likely class for the new example.
This classifier has been used for text classification, intrusion detection in computer networks, and diagnosing medical conditions.
The relationship between dependent events can be described using Bayes' theorem, which provides a way of thinking about how to revise an estimate of the probability of one event in light of the evidence provided by another event.
The Naive Bayes algorithm describes a simple method for applying Bayes' theorem to classification problems.
This algorithm is named as such because it makes some 'naive' assumptions about the data. In particular, Naive Bayes assumes that all the features are equally important and independent of one another given the class. These assumptions are rarely true in real-world applications; however, even when they are violated, Naive Bayes still tends to perform well. This holds even in extreme circumstances where strong dependencies exist among the features. Because of the algorithm's versatility and accuracy across many types of conditions, Naive Bayes is often a strong first candidate for classification learning tasks.
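As a minimal sketch of how this works, the posterior probability of each class is proportional to the class prior multiplied by the product of the per-feature conditional probabilities. The values below are made up for illustration (a small text-classification example), not taken from the iris data:
# Bayes' theorem combined with the naive independence assumption:
# P(class | features) is proportional to P(class) * P(feature1 | class) * P(feature2 | class)
prior      = c(spam = 0.2, ham = 0.8)                # class prior probabilities
likelihood = rbind(spam = c(word1 = 0.40, word2 = 0.10),
                   ham  = c(word1 = 0.05, word2 = 0.30))
unnormalized = prior * apply(likelihood, 1, prod)    # prior times product of likelihoods
posterior = unnormalized / sum(unnormalized)         # rescale so the posteriors sum to 1
posterior                                            # spam = 0.4, ham = 0.6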
The Laplacian correction (or Laplace estimator) is a way of dealing with zero probability values. The trick is simple: we assume that the training set is so large that adding one to each count we need makes only a negligible difference in the estimated probabilities, yet it avoids probability values of zero.
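A minimal sketch with hypothetical counts (not taken from this dataset) shows the effect of the correction:
count_level_in_class = 0    # a feature level never observed together with this class
count_class = 40            # training examples belonging to this class
n_levels = 3                # number of levels the feature can take

count_level_in_class / count_class                      # uncorrected estimate: 0
(count_level_in_class + 1) / (count_class + n_levels)   # Laplace-corrected estimate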

STEP1. Collecting, exploring, and preparing the data.

Here we will work with the iris dataset:
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

# Boxplots of each feature grouped by species (plot() draws boxplots when x is a factor)
par(mfrow = c(1, 4))
plot(iris$Species, iris$Sepal.Length, col = 'purple', las = 2)
plot(iris$Species, iris$Sepal.Width, col = 'pink', las = 2)
plot(iris$Species, iris$Petal.Length, col = 'gold', las = 2)
plot(iris$Species, iris$Petal.Width, col = 'orange', las = 2)


STEP2. Creating training and testing datasets

We will divide our data into two different sets: a training dataset that will be used to build the model and a test dataset that will be used to estimate the predictive accuracy of the model.
The dataset will be divided into training (80%) and testing (20%) sets; we create these sets using the caret package:
library(caret)
set.seed(1244)

# Stratified split: sample 80% of the rows within each species
train_ind = createDataPartition(y = iris$Species, p = 0.8, list = FALSE)
train = iris[train_ind, ]
iris_train = train[-5]   # features only (drop the Species column)
head(iris_train)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1          5.1         3.5          1.4         0.2
## 2          4.9         3.0          1.4         0.2
## 3          4.7         3.2          1.3         0.2
## 4          4.6         3.1          1.5         0.2
## 5          5.0         3.6          1.4         0.2
## 7          4.6         3.4          1.4         0.3
test = iris[-train_ind, ]
iris_test = test[-5]     # features only (drop the Species column)
head(iris_test)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width
## 6           5.4         3.9          1.7         0.4
## 8           5.0         3.4          1.5         0.2
## 26          5.0         3.0          1.6         0.2
## 27          5.0         3.4          1.6         0.4
## 30          4.7         3.2          1.6         0.2
## 31          4.8         3.1          1.6         0.2
iris_train_labels = train[,5]
head(iris_train_labels)
## [1] setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica
length(iris_train_labels)
## [1] 120
iris_test_labels = test[,5]
head(iris_test_labels)
## [1] setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica
length(iris_test_labels)
## [1] 30
The training set has 120 samples, and the testing set has 30 samples.
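Because createDataPartition samples within each level of Species, the class proportions are preserved in both sets. A quick check (output not shown) is:
table(iris_train_labels)   # expect 40 examples of each species
table(iris_test_labels)    # expect 10 examples of each species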

STEP3. Training a model on the data

The Naive Bayes classifier is typically described for categorical features. The iris features are numeric, so the naiveBayes() function from the e1071 package models each feature within each class with a normal (Gaussian) distribution.
library(e1071)
# Train the classifier on the training features and labels;
# laplace = 1 applies the Laplace correction described above
classifier = naiveBayes(iris_train, iris_train_labels, laplace = 1)
# Predict the class of each test example
predictions = predict(classifier, iris_test, type = "class")
library(gmodels)
model1 = CrossTable(predictions, iris_test_labels, prop.chisq = FALSE, prop.t = FALSE)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Row Total |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  30 
## 
##  
##              | iris_test_labels 
##  predictions |     setosa | versicolor |  virginica |  Row Total | 
## -------------|------------|------------|------------|------------|
##       setosa |         10 |          0 |          0 |         10 | 
##              |      1.000 |      0.000 |      0.000 |      0.333 | 
##              |      1.000 |      0.000 |      0.000 |            | 
## -------------|------------|------------|------------|------------|
##   versicolor |          0 |         10 |          1 |         11 | 
##              |      0.000 |      0.909 |      0.091 |      0.367 | 
##              |      0.000 |      1.000 |      0.100 |            | 
## -------------|------------|------------|------------|------------|
##    virginica |          0 |          0 |          9 |          9 | 
##              |      0.000 |      0.000 |      1.000 |      0.300 | 
##              |      0.000 |      0.000 |      0.900 |            | 
## -------------|------------|------------|------------|------------|
## Column Total |         10 |         10 |         10 |         30 | 
##              |      0.333 |      0.333 |      0.333 |            | 
## -------------|------------|------------|------------|------------|
## 
## 
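The classifier object created above stores the class counts used for the prior probabilities and, for each feature, the per-class mean and standard deviation of the fitted Gaussian distributions. They can be inspected (output not shown) with:
classifier$apriori               # class counts used to form the prior probabilities
classifier$tables$Petal.Length   # per-class mean and standard deviation of Petal.Length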

STEP4. Evaluating model performance.

library(caret)
confu = confusionMatrix(predictions, iris_test_labels, positive = 'setosa')
confu
## Confusion Matrix and Statistics
## 
##             Reference
## Prediction   setosa versicolor virginica
##   setosa         10          0         0
##   versicolor      0         10         1
##   virginica       0          0         9
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9667          
##                  95% CI : (0.8278, 0.9992)
##     No Information Rate : 0.3333          
##     P-Value [Acc > NIR] : 2.963e-13       
##                                           
##                   Kappa : 0.95            
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: setosa Class: versicolor Class: virginica
## Sensitivity                 1.0000            1.0000           0.9000
## Specificity                 1.0000            0.9500           1.0000
## Pos Pred Value              1.0000            0.9091           1.0000
## Neg Pred Value              1.0000            1.0000           0.9524
## Prevalence                  0.3333            0.3333           0.3333
## Detection Rate              0.3333            0.3333           0.3000
## Detection Prevalence        0.3333            0.3667           0.3000
## Balanced Accuracy           1.0000            0.9750           0.9500
The accuracy of the model is 96.67%, with an error rate of 3.33%.
The kappa statistic of the model is 0.95.
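The same figures can be recovered directly from the predictions made above:
accuracy = mean(predictions == iris_test_labels)   # proportion of correct predictions
error_rate = 1 - accuracy
accuracy
error_rate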
