Artificial Neural Network in R (neuralnet package)

An Artificial Neural Network (ANN) models the relationship between a set of input signals and an output signal using a model derived from our understanding of how a biological brain responds to stimuli from sensory inputs. An ANN uses a network of artificial neurons, or nodes, to solve learning problems.
At first, ANNs were used to simulate simple functions such as the logical AND or OR functions. As computers have become more powerful, the complexity of ANNs has increased so much that they are now frequently applied to more practical problems, including speech and handwriting recognition, automation of smart devices, and sophisticated models of weather and climate patterns.
ANNs are very versatile learners that can be applied to nearly any learning task: classification, numeric prediction, and even unsupervised pattern recognition.
ANNs are best applied to problems where the input data and output data are well defined, yet the process that relates the input to the output is extremely complex. The result is a complex black-box model that is difficult, if not impossible, to interpret.
There are many variants of neural networks, but all of them can be defined in terms of the following characteristics:
  1. Activation function: transforms a neuron’s combined input signals into a single output signal to be broadcast further in the network.
  2. Network topology: describes the number of neurons and number of layers in the model and the manner in which they are connected.
  3. Training algorithm: specifies how connection weights are set in order to inhibit or excite neurons in proportion to the input signal.
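To make the first characteristic concrete, here is a minimal sketch of a single artificial neuron in R. The inputs, weights and bias are made-up values chosen only for illustration, and the logistic function is used as the activation because it is a common choice (the neuralnet package uses a logistic activation by default, as far as we know).

```r
# Logistic (sigmoid) activation: squashes any real number into the interval (0, 1)
logistic <- function(z) 1 / (1 + exp(-z))

# A single neuron: weighted sum of the inputs plus a bias, passed through the activation
neuron_output <- function(x, w, b) logistic(sum(w * x) + b)

x <- c(0.5, -1.2, 3.0)   # input signals (illustrative values)
w <- c(0.8, 0.1, -0.4)   # connection weights (illustrative values)
b <- 0.2                 # bias term (illustrative value)

out <- neuron_output(x, w, b)
out
```

Training a network amounts to adjusting values such as w and b for every neuron so that the final outputs match the known answers as closely as possible.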
Here we will use the Wisconsin breast cancer data (https://data.world/health/breast-cancer-wisconsin/workspace/file?filename=breast-cancer-wisconsin-data%2Fdata.csv) with an ANN to predict whether a tumor is benign or malignant.

1. Collecting data. Exploring and preparing the data.

breastcancer = read.csv("C:/Users/ester/Downloads/breast-cancer-wisconsin-data-data.csv", sep = "," , dec = ".", header = TRUE)
head(breastcancer)
##         id diagnosis radius_mean texture_mean perimeter_mean area_mean
## 1   842302         M       17.99        10.38         122.80    1001.0
## 2   842517         M       20.57        17.77         132.90    1326.0
## 3 84300903         M       19.69        21.25         130.00    1203.0
## 4 84348301         M       11.42        20.38          77.58     386.1
## 5 84358402         M       20.29        14.34         135.10    1297.0
## 6   843786         M       12.45        15.70          82.57     477.1
##   smoothness_mean compactness_mean concavity_mean concave.points_mean
## 1         0.11840          0.27760         0.3001             0.14710
## 2         0.08474          0.07864         0.0869             0.07017
## 3         0.10960          0.15990         0.1974             0.12790
## 4         0.14250          0.28390         0.2414             0.10520
## 5         0.10030          0.13280         0.1980             0.10430
## 6         0.12780          0.17000         0.1578             0.08089
##   symmetry_mean fractal_dimension_mean radius_se texture_se perimeter_se
## 1        0.2419                0.07871    1.0950     0.9053        8.589
## 2        0.1812                0.05667    0.5435     0.7339        3.398
## 3        0.2069                0.05999    0.7456     0.7869        4.585
## 4        0.2597                0.09744    0.4956     1.1560        3.445
## 5        0.1809                0.05883    0.7572     0.7813        5.438
## 6        0.2087                0.07613    0.3345     0.8902        2.217
##   area_se smoothness_se compactness_se concavity_se concave.points_se
## 1  153.40      0.006399        0.04904      0.05373           0.01587
## 2   74.08      0.005225        0.01308      0.01860           0.01340
## 3   94.03      0.006150        0.04006      0.03832           0.02058
## 4   27.23      0.009110        0.07458      0.05661           0.01867
## 5   94.44      0.011490        0.02461      0.05688           0.01885
## 6   27.19      0.007510        0.03345      0.03672           0.01137
##   symmetry_se fractal_dimension_se radius_worst texture_worst
## 1     0.03003             0.006193        25.38         17.33
## 2     0.01389             0.003532        24.99         23.41
## 3     0.02250             0.004571        23.57         25.53
## 4     0.05963             0.009208        14.91         26.50
## 5     0.01756             0.005115        22.54         16.67
## 6     0.02165             0.005082        15.47         23.75
##   perimeter_worst area_worst smoothness_worst compactness_worst
## 1          184.60     2019.0           0.1622            0.6656
## 2          158.80     1956.0           0.1238            0.1866
## 3          152.50     1709.0           0.1444            0.4245
## 4           98.87      567.7           0.2098            0.8663
## 5          152.20     1575.0           0.1374            0.2050
## 6          103.40      741.6           0.1791            0.5249
##   concavity_worst concave.points_worst symmetry_worst
## 1          0.7119               0.2654         0.4601
## 2          0.2416               0.1860         0.2750
## 3          0.4504               0.2430         0.3613
## 4          0.6869               0.2575         0.6638
## 5          0.4000               0.1625         0.2364
## 6          0.5355               0.1741         0.3985
##   fractal_dimension_worst  X
## 1                 0.11890 NA
## 2                 0.08902 NA
## 3                 0.08758 NA
## 4                 0.17300 NA
## 5                 0.07678 NA
## 6                 0.12440 NA
summary(breastcancer)
##        id            diagnosis  radius_mean      texture_mean  
##  Min.   :     8670   B:357     Min.   : 6.981   Min.   : 9.71  
##  1st Qu.:   869218   M:212     1st Qu.:11.700   1st Qu.:16.17  
##  Median :   906024             Median :13.370   Median :18.84  
##  Mean   : 30371831             Mean   :14.127   Mean   :19.29  
##  3rd Qu.:  8813129             3rd Qu.:15.780   3rd Qu.:21.80  
##  Max.   :911320502             Max.   :28.110   Max.   :39.28  
##  perimeter_mean     area_mean      smoothness_mean   compactness_mean 
##  Min.   : 43.79   Min.   : 143.5   Min.   :0.05263   Min.   :0.01938  
##  1st Qu.: 75.17   1st Qu.: 420.3   1st Qu.:0.08637   1st Qu.:0.06492  
##  Median : 86.24   Median : 551.1   Median :0.09587   Median :0.09263  
##  Mean   : 91.97   Mean   : 654.9   Mean   :0.09636   Mean   :0.10434  
##  3rd Qu.:104.10   3rd Qu.: 782.7   3rd Qu.:0.10530   3rd Qu.:0.13040  
##  Max.   :188.50   Max.   :2501.0   Max.   :0.16340   Max.   :0.34540  
##  concavity_mean    concave.points_mean symmetry_mean   
##  Min.   :0.00000   Min.   :0.00000     Min.   :0.1060  
##  1st Qu.:0.02956   1st Qu.:0.02031     1st Qu.:0.1619  
##  Median :0.06154   Median :0.03350     Median :0.1792  
##  Mean   :0.08880   Mean   :0.04892     Mean   :0.1812  
##  3rd Qu.:0.13070   3rd Qu.:0.07400     3rd Qu.:0.1957  
##  Max.   :0.42680   Max.   :0.20120     Max.   :0.3040  
##  fractal_dimension_mean   radius_se        texture_se      perimeter_se   
##  Min.   :0.04996        Min.   :0.1115   Min.   :0.3602   Min.   : 0.757  
##  1st Qu.:0.05770        1st Qu.:0.2324   1st Qu.:0.8339   1st Qu.: 1.606  
##  Median :0.06154        Median :0.3242   Median :1.1080   Median : 2.287  
##  Mean   :0.06280        Mean   :0.4052   Mean   :1.2169   Mean   : 2.866  
##  3rd Qu.:0.06612        3rd Qu.:0.4789   3rd Qu.:1.4740   3rd Qu.: 3.357  
##  Max.   :0.09744        Max.   :2.8730   Max.   :4.8850   Max.   :21.980  
##     area_se        smoothness_se      compactness_se      concavity_se    
##  Min.   :  6.802   Min.   :0.001713   Min.   :0.002252   Min.   :0.00000  
##  1st Qu.: 17.850   1st Qu.:0.005169   1st Qu.:0.013080   1st Qu.:0.01509  
##  Median : 24.530   Median :0.006380   Median :0.020450   Median :0.02589  
##  Mean   : 40.337   Mean   :0.007041   Mean   :0.025478   Mean   :0.03189  
##  3rd Qu.: 45.190   3rd Qu.:0.008146   3rd Qu.:0.032450   3rd Qu.:0.04205  
##  Max.   :542.200   Max.   :0.031130   Max.   :0.135400   Max.   :0.39600  
##  concave.points_se   symmetry_se       fractal_dimension_se
##  Min.   :0.000000   Min.   :0.007882   Min.   :0.0008948   
##  1st Qu.:0.007638   1st Qu.:0.015160   1st Qu.:0.0022480   
##  Median :0.010930   Median :0.018730   Median :0.0031870   
##  Mean   :0.011796   Mean   :0.020542   Mean   :0.0037949   
##  3rd Qu.:0.014710   3rd Qu.:0.023480   3rd Qu.:0.0045580   
##  Max.   :0.052790   Max.   :0.078950   Max.   :0.0298400   
##   radius_worst   texture_worst   perimeter_worst    area_worst    
##  Min.   : 7.93   Min.   :12.02   Min.   : 50.41   Min.   : 185.2  
##  1st Qu.:13.01   1st Qu.:21.08   1st Qu.: 84.11   1st Qu.: 515.3  
##  Median :14.97   Median :25.41   Median : 97.66   Median : 686.5  
##  Mean   :16.27   Mean   :25.68   Mean   :107.26   Mean   : 880.6  
##  3rd Qu.:18.79   3rd Qu.:29.72   3rd Qu.:125.40   3rd Qu.:1084.0  
##  Max.   :36.04   Max.   :49.54   Max.   :251.20   Max.   :4254.0  
##  smoothness_worst  compactness_worst concavity_worst  concave.points_worst
##  Min.   :0.07117   Min.   :0.02729   Min.   :0.0000   Min.   :0.00000     
##  1st Qu.:0.11660   1st Qu.:0.14720   1st Qu.:0.1145   1st Qu.:0.06493     
##  Median :0.13130   Median :0.21190   Median :0.2267   Median :0.09993     
##  Mean   :0.13237   Mean   :0.25427   Mean   :0.2722   Mean   :0.11461     
##  3rd Qu.:0.14600   3rd Qu.:0.33910   3rd Qu.:0.3829   3rd Qu.:0.16140     
##  Max.   :0.22260   Max.   :1.05800   Max.   :1.2520   Max.   :0.29100     
##  symmetry_worst   fractal_dimension_worst    X          
##  Min.   :0.1565   Min.   :0.05504         Mode:logical  
##  1st Qu.:0.2504   1st Qu.:0.07146         NA's:569      
##  Median :0.2822   Median :0.08004                       
##  Mean   :0.2901   Mean   :0.08395                       
##  3rd Qu.:0.3179   3rd Qu.:0.09208                       
##  Max.   :0.6638   Max.   :0.20750
breastcan = breastcancer[-c(1,33)] #we don't need the first and last columns
summary(breastcan)
##  diagnosis  radius_mean      texture_mean   perimeter_mean  
##  B:357     Min.   : 6.981   Min.   : 9.71   Min.   : 43.79  
##  M:212     1st Qu.:11.700   1st Qu.:16.17   1st Qu.: 75.17  
##            Median :13.370   Median :18.84   Median : 86.24  
##            Mean   :14.127   Mean   :19.29   Mean   : 91.97  
##            3rd Qu.:15.780   3rd Qu.:21.80   3rd Qu.:104.10  
##            Max.   :28.110   Max.   :39.28   Max.   :188.50  
##    area_mean      smoothness_mean   compactness_mean  concavity_mean   
##  Min.   : 143.5   Min.   :0.05263   Min.   :0.01938   Min.   :0.00000  
##  1st Qu.: 420.3   1st Qu.:0.08637   1st Qu.:0.06492   1st Qu.:0.02956  
##  Median : 551.1   Median :0.09587   Median :0.09263   Median :0.06154  
##  Mean   : 654.9   Mean   :0.09636   Mean   :0.10434   Mean   :0.08880  
##  3rd Qu.: 782.7   3rd Qu.:0.10530   3rd Qu.:0.13040   3rd Qu.:0.13070  
##  Max.   :2501.0   Max.   :0.16340   Max.   :0.34540   Max.   :0.42680  
##  concave.points_mean symmetry_mean    fractal_dimension_mean
##  Min.   :0.00000     Min.   :0.1060   Min.   :0.04996       
##  1st Qu.:0.02031     1st Qu.:0.1619   1st Qu.:0.05770       
##  Median :0.03350     Median :0.1792   Median :0.06154       
##  Mean   :0.04892     Mean   :0.1812   Mean   :0.06280       
##  3rd Qu.:0.07400     3rd Qu.:0.1957   3rd Qu.:0.06612       
##  Max.   :0.20120     Max.   :0.3040   Max.   :0.09744       
##    radius_se        texture_se      perimeter_se       area_se       
##  Min.   :0.1115   Min.   :0.3602   Min.   : 0.757   Min.   :  6.802  
##  1st Qu.:0.2324   1st Qu.:0.8339   1st Qu.: 1.606   1st Qu.: 17.850  
##  Median :0.3242   Median :1.1080   Median : 2.287   Median : 24.530  
##  Mean   :0.4052   Mean   :1.2169   Mean   : 2.866   Mean   : 40.337  
##  3rd Qu.:0.4789   3rd Qu.:1.4740   3rd Qu.: 3.357   3rd Qu.: 45.190  
##  Max.   :2.8730   Max.   :4.8850   Max.   :21.980   Max.   :542.200  
##  smoothness_se      compactness_se      concavity_se    
##  Min.   :0.001713   Min.   :0.002252   Min.   :0.00000  
##  1st Qu.:0.005169   1st Qu.:0.013080   1st Qu.:0.01509  
##  Median :0.006380   Median :0.020450   Median :0.02589  
##  Mean   :0.007041   Mean   :0.025478   Mean   :0.03189  
##  3rd Qu.:0.008146   3rd Qu.:0.032450   3rd Qu.:0.04205  
##  Max.   :0.031130   Max.   :0.135400   Max.   :0.39600  
##  concave.points_se   symmetry_se       fractal_dimension_se
##  Min.   :0.000000   Min.   :0.007882   Min.   :0.0008948   
##  1st Qu.:0.007638   1st Qu.:0.015160   1st Qu.:0.0022480   
##  Median :0.010930   Median :0.018730   Median :0.0031870   
##  Mean   :0.011796   Mean   :0.020542   Mean   :0.0037949   
##  3rd Qu.:0.014710   3rd Qu.:0.023480   3rd Qu.:0.0045580   
##  Max.   :0.052790   Max.   :0.078950   Max.   :0.0298400   
##   radius_worst   texture_worst   perimeter_worst    area_worst    
##  Min.   : 7.93   Min.   :12.02   Min.   : 50.41   Min.   : 185.2  
##  1st Qu.:13.01   1st Qu.:21.08   1st Qu.: 84.11   1st Qu.: 515.3  
##  Median :14.97   Median :25.41   Median : 97.66   Median : 686.5  
##  Mean   :16.27   Mean   :25.68   Mean   :107.26   Mean   : 880.6  
##  3rd Qu.:18.79   3rd Qu.:29.72   3rd Qu.:125.40   3rd Qu.:1084.0  
##  Max.   :36.04   Max.   :49.54   Max.   :251.20   Max.   :4254.0  
##  smoothness_worst  compactness_worst concavity_worst  concave.points_worst
##  Min.   :0.07117   Min.   :0.02729   Min.   :0.0000   Min.   :0.00000     
##  1st Qu.:0.11660   1st Qu.:0.14720   1st Qu.:0.1145   1st Qu.:0.06493     
##  Median :0.13130   Median :0.21190   Median :0.2267   Median :0.09993     
##  Mean   :0.13237   Mean   :0.25427   Mean   :0.2722   Mean   :0.11461     
##  3rd Qu.:0.14600   3rd Qu.:0.33910   3rd Qu.:0.3829   3rd Qu.:0.16140     
##  Max.   :0.22260   Max.   :1.05800   Max.   :1.2520   Max.   :0.29100     
##  symmetry_worst   fractal_dimension_worst
##  Min.   :0.1565   Min.   :0.05504        
##  1st Qu.:0.2504   1st Qu.:0.07146        
##  Median :0.2822   Median :0.08004        
##  Mean   :0.2901   Mean   :0.08395        
##  3rd Qu.:0.3179   3rd Qu.:0.09208        
##  Max.   :0.6638   Max.   :0.20750
str(breastcan)
## 'data.frame':    569 obs. of  31 variables:
##  $ diagnosis              : Factor w/ 2 levels "B","M": 2 2 2 2 2 2 2 2 2 2 ...
##  $ radius_mean            : num  18 20.6 19.7 11.4 20.3 ...
##  $ texture_mean           : num  10.4 17.8 21.2 20.4 14.3 ...
##  $ perimeter_mean         : num  122.8 132.9 130 77.6 135.1 ...
##  $ area_mean              : num  1001 1326 1203 386 1297 ...
##  $ smoothness_mean        : num  0.1184 0.0847 0.1096 0.1425 0.1003 ...
##  $ compactness_mean       : num  0.2776 0.0786 0.1599 0.2839 0.1328 ...
##  $ concavity_mean         : num  0.3001 0.0869 0.1974 0.2414 0.198 ...
##  $ concave.points_mean    : num  0.1471 0.0702 0.1279 0.1052 0.1043 ...
##  $ symmetry_mean          : num  0.242 0.181 0.207 0.26 0.181 ...
##  $ fractal_dimension_mean : num  0.0787 0.0567 0.06 0.0974 0.0588 ...
##  $ radius_se              : num  1.095 0.543 0.746 0.496 0.757 ...
##  $ texture_se             : num  0.905 0.734 0.787 1.156 0.781 ...
##  $ perimeter_se           : num  8.59 3.4 4.58 3.44 5.44 ...
##  $ area_se                : num  153.4 74.1 94 27.2 94.4 ...
##  $ smoothness_se          : num  0.0064 0.00522 0.00615 0.00911 0.01149 ...
##  $ compactness_se         : num  0.049 0.0131 0.0401 0.0746 0.0246 ...
##  $ concavity_se           : num  0.0537 0.0186 0.0383 0.0566 0.0569 ...
##  $ concave.points_se      : num  0.0159 0.0134 0.0206 0.0187 0.0188 ...
##  $ symmetry_se            : num  0.03 0.0139 0.0225 0.0596 0.0176 ...
##  $ fractal_dimension_se   : num  0.00619 0.00353 0.00457 0.00921 0.00511 ...
##  $ radius_worst           : num  25.4 25 23.6 14.9 22.5 ...
##  $ texture_worst          : num  17.3 23.4 25.5 26.5 16.7 ...
##  $ perimeter_worst        : num  184.6 158.8 152.5 98.9 152.2 ...
##  $ area_worst             : num  2019 1956 1709 568 1575 ...
##  $ smoothness_worst       : num  0.162 0.124 0.144 0.21 0.137 ...
##  $ compactness_worst      : num  0.666 0.187 0.424 0.866 0.205 ...
##  $ concavity_worst        : num  0.712 0.242 0.45 0.687 0.4 ...
##  $ concave.points_worst   : num  0.265 0.186 0.243 0.258 0.163 ...
##  $ symmetry_worst         : num  0.46 0.275 0.361 0.664 0.236 ...
##  $ fractal_dimension_worst: num  0.1189 0.089 0.0876 0.173 0.0768 ...
The data we are going to work with has 569 rows and 31 columns.
In order to work with the neuralnet package we create the following logical variables based on the previous variable diagnosis (type of tumor):
breastcan$Benign
breastcan$Benign[breastcan$diagnosis == "B"] = TRUE
breastcan$Benign[breastcan$diagnosis != "B"] = FALSE
breastcan$Malignant
breastcan$Malignant[breastcan$diagnosis == "M"] = TRUE
breastcan$Malignant[breastcan$diagnosis != "M"] = FALSE

#this new variable, based on the diagnosis variable, will be used later on to evaluate model performance
breastcan$diag0[breastcan$diagnosis == "B"] = "1"
breastcan$diag0[breastcan$diagnosis == "M"] = "2"
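The same encoding can be checked on a small toy example. This sketch uses an invented diagnosis vector (not the real data) and builds the indicator columns by direct comparison, which gives the same result as the index-based assignments above; it then verifies that every row has exactly one TRUE.

```r
# Toy data frame with an invented diagnosis column
toy <- data.frame(diagnosis = factor(c("B", "M", "M", "B", "B")))

# One logical indicator column per class
toy$Benign <- toy$diagnosis == "B"
toy$Malignant <- toy$diagnosis == "M"

# Character class code: "1" for benign, "2" for malignant
toy$diag0 <- ifelse(toy$diagnosis == "B", "1", "2")

# Each row should belong to exactly one class
one_hot_ok <- all(toy$Benign + toy$Malignant == 1)
toy
```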

2. Creating training and testing datasets

We will divide our data into two different sets: a training dataset that will be used to build the model and a test dataset that will be used to estimate the predictive accuracy of the model.
The dataset will be divided into training (67%) and testing (33%) sets. We create the data sets using the caret package:
library(caret)
set.seed(123)

train_ind= createDataPartition(y = breastcan$diagnosis,p = 0.67,list = FALSE)
train = breastcan[train_ind,]
head(train)
##    diagnosis radius_mean texture_mean perimeter_mean area_mean
## 2          M       20.57        17.77         132.90    1326.0
## 4          M       11.42        20.38          77.58     386.1
## 5          M       20.29        14.34         135.10    1297.0
## 8          M       13.71        20.83          90.20     577.9
## 9          M       13.00        21.82          87.50     519.8
## 10         M       12.46        24.04          83.97     475.9
##    smoothness_mean compactness_mean concavity_mean concave.points_mean
## 2          0.08474          0.07864        0.08690             0.07017
## 4          0.14250          0.28390        0.24140             0.10520
## 5          0.10030          0.13280        0.19800             0.10430
## 8          0.11890          0.16450        0.09366             0.05985
## 9          0.12730          0.19320        0.18590             0.09353
## 10         0.11860          0.23960        0.22730             0.08543
##    symmetry_mean fractal_dimension_mean radius_se texture_se perimeter_se
## 2         0.1812                0.05667    0.5435     0.7339        3.398
## 4         0.2597                0.09744    0.4956     1.1560        3.445
## 5         0.1809                0.05883    0.7572     0.7813        5.438
## 8         0.2196                0.07451    0.5835     1.3770        3.856
## 9         0.2350                0.07389    0.3063     1.0020        2.406
## 10        0.2030                0.08243    0.2976     1.5990        2.039
##    area_se smoothness_se compactness_se concavity_se concave.points_se
## 2    74.08      0.005225        0.01308      0.01860           0.01340
## 4    27.23      0.009110        0.07458      0.05661           0.01867
## 5    94.44      0.011490        0.02461      0.05688           0.01885
## 8    50.96      0.008805        0.03029      0.02488           0.01448
## 9    24.32      0.005731        0.03502      0.03553           0.01226
## 10   23.94      0.007149        0.07217      0.07743           0.01432
##    symmetry_se fractal_dimension_se radius_worst texture_worst
## 2      0.01389             0.003532        24.99         23.41
## 4      0.05963             0.009208        14.91         26.50
## 5      0.01756             0.005115        22.54         16.67
## 8      0.01486             0.005412        17.06         28.14
## 9      0.02143             0.003749        15.49         30.73
## 10     0.01789             0.010080        15.09         40.68
##    perimeter_worst area_worst smoothness_worst compactness_worst
## 2           158.80     1956.0           0.1238            0.1866
## 4            98.87      567.7           0.2098            0.8663
## 5           152.20     1575.0           0.1374            0.2050
## 8           110.60      897.0           0.1654            0.3682
## 9           106.20      739.3           0.1703            0.5401
## 10           97.65      711.4           0.1853            1.0580
##    concavity_worst concave.points_worst symmetry_worst
## 2           0.2416               0.1860         0.2750
## 4           0.6869               0.2575         0.6638
## 5           0.4000               0.1625         0.2364
## 8           0.2678               0.1556         0.3196
## 9           0.5390               0.2060         0.4378
## 10          1.1050               0.2210         0.4366
##    fractal_dimension_worst Benign Malignant diag0
## 2                  0.08902  FALSE      TRUE     2
## 4                  0.17300  FALSE      TRUE     2
## 5                  0.07678  FALSE      TRUE     2
## 8                  0.11510  FALSE      TRUE     2
## 9                  0.10720  FALSE      TRUE     2
## 10                 0.20750  FALSE      TRUE     2
test = breastcan[-train_ind,]
head(test)
##    diagnosis radius_mean texture_mean perimeter_mean area_mean
## 1          M       17.99        10.38         122.80    1001.0
## 3          M       19.69        21.25         130.00    1203.0
## 6          M       12.45        15.70          82.57     477.1
## 7          M       18.25        19.98         119.60    1040.0
## 15         M       13.73        22.61          93.60     578.3
## 19         M       19.81        22.15         130.00    1260.0
##    smoothness_mean compactness_mean concavity_mean concave.points_mean
## 1          0.11840           0.2776         0.3001             0.14710
## 3          0.10960           0.1599         0.1974             0.12790
## 6          0.12780           0.1700         0.1578             0.08089
## 7          0.09463           0.1090         0.1127             0.07400
## 15         0.11310           0.2293         0.2128             0.08025
## 19         0.09831           0.1027         0.1479             0.09498
##    symmetry_mean fractal_dimension_mean radius_se texture_se perimeter_se
## 1         0.2419                0.07871    1.0950     0.9053        8.589
## 3         0.2069                0.05999    0.7456     0.7869        4.585
## 6         0.2087                0.07613    0.3345     0.8902        2.217
## 7         0.1794                0.05742    0.4467     0.7732        3.180
## 15        0.2069                0.07682    0.2121     1.1690        2.061
## 19        0.1582                0.05395    0.7582     1.0170        5.865
##    area_se smoothness_se compactness_se concavity_se concave.points_se
## 1   153.40      0.006399        0.04904      0.05373           0.01587
## 3    94.03      0.006150        0.04006      0.03832           0.02058
## 6    27.19      0.007510        0.03345      0.03672           0.01137
## 7    53.91      0.004314        0.01382      0.02254           0.01039
## 15   19.21      0.006429        0.05936      0.05501           0.01628
## 19  112.40      0.006494        0.01893      0.03391           0.01521
##    symmetry_se fractal_dimension_se radius_worst texture_worst
## 1      0.03003             0.006193        25.38         17.33
## 3      0.02250             0.004571        23.57         25.53
## 6      0.02165             0.005082        15.47         23.75
## 7      0.01369             0.002179        22.88         27.66
## 15     0.01961             0.008093        15.03         32.01
## 19     0.01356             0.001997        27.32         30.88
##    perimeter_worst area_worst smoothness_worst compactness_worst
## 1            184.6     2019.0           0.1622            0.6656
## 3            152.5     1709.0           0.1444            0.4245
## 6            103.4      741.6           0.1791            0.5249
## 7            153.2     1606.0           0.1442            0.2576
## 15           108.8      697.7           0.1651            0.7725
## 19           186.8     2398.0           0.1512            0.3150
##    concavity_worst concave.points_worst symmetry_worst
## 1           0.7119               0.2654         0.4601
## 3           0.4504               0.2430         0.3613
## 6           0.5355               0.1741         0.3985
## 7           0.3784               0.1932         0.3063
## 15          0.6943               0.2208         0.3596
## 19          0.5372               0.2388         0.2768
##    fractal_dimension_worst Benign Malignant diag0
## 1                  0.11890  FALSE      TRUE     2
## 3                  0.08758  FALSE      TRUE     2
## 6                  0.12440  FALSE      TRUE     2
## 7                  0.08368  FALSE      TRUE     2
## 15                 0.14310  FALSE      TRUE     2
## 19                 0.07615  FALSE      TRUE     2
The training set has 383 samples, and the testing set has 186 samples.
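To see where these sizes come from, the following base R sketch samples 67% of the rows within each class, so that the benign/malignant proportions are preserved in both subsets. It is not caret's implementation, only an illustration of the stratified idea; the label vector is rebuilt from the class counts shown in the summary (357 benign, 212 malignant), and rounding up within each class is our assumption, one that happens to reproduce the 383/186 split.

```r
set.seed(123)

# Class labels with the same counts as the dataset: 357 benign, 212 malignant
diagnosis <- factor(c(rep("B", 357), rep("M", 212)))
p <- 0.67

# Sample a fraction p of the row indices within each class, rounding up
idx_by_class <- split(seq_along(diagnosis), diagnosis)
train_idx <- sort(unlist(lapply(idx_by_class, function(idx) {
  sample(idx, size = ceiling(p * length(idx)))
}), use.names = FALSE))

n_train <- length(train_idx)               # 240 benign + 143 malignant
n_test <- length(diagnosis) - n_train

# Class proportions in the training and testing subsets
prop.table(table(diagnosis[train_idx]))
prop.table(table(diagnosis[-train_idx]))
```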

3. Training a model on the data

We first train the model with one hidden node:
#install.packages("neuralnet")
#install.packages("NeuralNetTools")
library(neuralnet)
## Warning: package 'neuralnet' was built under R version 3.4.1
library(NeuralNetTools)
## Warning: package 'NeuralNetTools' was built under R version 3.4.1
model1 = neuralnet(Benign + Malignant ~ radius_mean + texture_mean + perimeter_mean + area_mean + smoothness_mean + compactness_mean + concavity_mean + concave.points_mean + symmetry_mean + fractal_dimension_mean + radius_se + texture_se + perimeter_se + area_se + smoothness_se + compactness_se + concavity_se + concave.points_se + symmetry_se + fractal_dimension_se + radius_worst + texture_worst + perimeter_worst + area_worst + smoothness_worst + compactness_worst + concavity_worst + concave.points_worst + symmetry_worst + fractal_dimension_worst, data = train, hidden = 1)

predictions1 = compute(model1, test[2:31]) 
par(mar = numeric(4))
plotnet(model1)
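Typing thirty predictor names by hand is error-prone. As a sketch, the same formula can be assembled programmatically. Here the column names are rebuilt from the ten measurements and three suffixes seen in the str() output; with the real data, names(train)[2:31] should give the same vector. The object name nn_formula is ours.

```r
measurements <- c("radius", "texture", "perimeter", "area", "smoothness",
                  "compactness", "concavity", "concave.points", "symmetry",
                  "fractal_dimension")
suffixes <- c("mean", "se", "worst")

# outer() pairs every measurement with every suffix; as.vector() reads the
# result column by column: all *_mean names, then *_se, then *_worst
predictors <- as.vector(outer(measurements, suffixes, paste, sep = "_"))

# Two output variables on the left-hand side, all predictors on the right
nn_formula <- as.formula(paste("Benign + Malignant ~",
                               paste(predictors, collapse = " + ")))
nn_formula
```

The resulting formula object can be passed as the first argument to neuralnet() in place of the hand-written one.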
Then, we train the model with three hidden nodes:
model2 = neuralnet(Benign + Malignant ~ radius_mean + texture_mean + perimeter_mean + area_mean + smoothness_mean + compactness_mean + concavity_mean + concave.points_mean + symmetry_mean + fractal_dimension_mean + radius_se + texture_se + perimeter_se + area_se + smoothness_se + compactness_se + concavity_se + concave.points_se + symmetry_se + fractal_dimension_se + radius_worst + texture_worst + perimeter_worst + area_worst + smoothness_worst + compactness_worst + concavity_worst + concave.points_worst + symmetry_worst + fractal_dimension_worst, data = train, hidden = 3)

predictions2 = compute(model2, test[2:31]) 
par(mar = numeric(4))
plotnet(model2)
In the plots, positive weights between layers are drawn as black lines and negative weights as grey lines; line thickness is proportional to the relative magnitude of each weight.

4. Evaluating model performance.

First, we evaluate the model with one hidden node:
evalu1 = as.vector(apply(predictions1$net.result, 1, which.max))
confu1 = confusionMatrix(evalu1, test$diag0, positive = "1")
confu1
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   1   2
##          1 114   9
##          2   3  60
##                                                 
##                Accuracy : 0.9354839             
##                  95% CI : (0.8900157, 0.9662222)
##     No Information Rate : 0.6290323             
##     P-Value [Acc > NIR] : < 0.00000000000000022 
##                                                 
##                   Kappa : 0.8592509             
##  Mcnemar's Test P-Value : 0.1489147             
##                                                 
##             Sensitivity : 0.9743590             
##             Specificity : 0.8695652             
##          Pos Pred Value : 0.9268293             
##          Neg Pred Value : 0.9523810             
##              Prevalence : 0.6290323             
##          Detection Rate : 0.6129032             
##    Detection Prevalence : 0.6612903             
##       Balanced Accuracy : 0.9219621             
##                                                 
##        'Positive' Class : 1                     
## 
The accuracy of the model is 93.55 %, with an error rate of 6.45 %.
The kappa statistic of the model is 0.86.
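These figures can be recomputed by hand from the four cells of the confusion matrix. The sketch below copies the counts from the output above and applies the usual definitions of accuracy, Cohen's kappa, sensitivity and specificity; the variable names are ours.

```r
# Confusion matrix for the one-hidden-node model
# (rows = prediction, columns = reference; class 1 = benign, class 2 = malignant)
cm <- matrix(c(114, 9,
               3, 60),
             nrow = 2, byrow = TRUE,
             dimnames = list(Prediction = c("1", "2"), Reference = c("1", "2")))
n <- sum(cm)

# Observed agreement: share of cases on the diagonal
accuracy <- sum(diag(cm)) / n

# Agreement expected by chance, from the row and column totals
expected <- sum(rowSums(cm) * colSums(cm)) / n^2

# Kappa: agreement beyond chance, rescaled so that 1 means perfect agreement
kappa_stat <- (accuracy - expected) / (1 - expected)

# With class 1 as the positive class
sensitivity <- cm["1", "1"] / sum(cm[, "1"])
specificity <- cm["2", "2"] / sum(cm[, "2"])

round(c(accuracy = accuracy, kappa = kappa_stat,
        sensitivity = sensitivity, specificity = specificity), 4)
```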
Then, we evaluate the model with three hidden nodes:
evalu2 = as.vector(apply(predictions2$net.result, 1, which.max))
confu2 = confusionMatrix(evalu2, test$diag0, positive = '1')
## Warning in confusionMatrix.default(evalu2, test$diag0, positive = "1"):
## Levels are not in the same order for reference and data. Refactoring data
## to match.
confu2
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   1   2
##          1 117  69
##          2   0   0
##                                                   
##                Accuracy : 0.6290323               
##                  95% CI : (0.5553044, 0.6985759)  
##     No Information Rate : 0.6290323               
##     P-Value [Acc > NIR] : 0.5328217               
##                                                   
##                   Kappa : 0                       
##  Mcnemar's Test P-Value : 0.0000000000000002695185
##                                                   
##             Sensitivity : 1.0000000               
##             Specificity : 0.0000000               
##          Pos Pred Value : 0.6290323               
##          Neg Pred Value :       NaN               
##              Prevalence : 0.6290323               
##          Detection Rate : 0.6290323               
##    Detection Prevalence : 1.0000000               
##       Balanced Accuracy : 0.5000000               
##                                                   
##        'Positive' Class : 1                       
## 
The accuracy of the model is 62.9 %, with an error rate of 37.1 %.
The kappa statistic of the model is 0.
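Comparing both models, the network with one hidden node performs better on this test set. The network with three hidden nodes classified all 186 test cases as benign (class 1), so its accuracy equals the no-information rate and its kappa is 0.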
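The sketch below illustrates two points from this evaluation. First, it shows on a toy matrix (invented values) how which.max turns the two network outputs into a class code. Second, it recomputes accuracy and kappa for the case where every prediction is class 1, using the counts from the second confusion matrix above.

```r
# Toy network outputs (invented values):
# column 1 = score for Benign, column 2 = score for Malignant
net_result <- matrix(c(0.92, 0.08,
                       0.15, 0.85,
                       0.60, 0.40),
                     ncol = 2, byrow = TRUE)

# which.max returns the index of the larger score in each row:
# 1 = benign, 2 = malignant
pred_class <- as.vector(apply(net_result, 1, which.max))
pred_class

# Degenerate case: all 186 test cases predicted as class 1
# (rows = prediction, columns = reference)
cm <- matrix(c(117, 69,
               0, 0),
             nrow = 2, byrow = TRUE)
n <- sum(cm)
accuracy <- sum(diag(cm)) / n
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_stat <- (accuracy - expected) / (1 - expected)
c(accuracy = accuracy, expected = expected, kappa = kappa_stat)
```

Because the observed agreement is the same as the agreement expected by chance, kappa is zero even though the accuracy looks moderate.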
