An Artificial Neural Network (ANN) models the relationship between a set of input signals and an output signal. It does so with a model derived from our understanding of how a biological brain responds to stimuli from sensory inputs. An ANN uses a network of artificial neurons, or nodes, to solve learning problems.
At first, ANNs were used to learn simple functions such as the logical AND function or the logical OR function. Computers have since become far more powerful, and ANNs have grown so much in complexity that they are now frequently applied to practical problems. These include speech and handwriting recognition, automation of smart devices, and sophisticated models of weather and climate patterns.
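The following minimal R sketch shows a single threshold neuron that reproduces AND and OR. The weights and biases are hypothetical values chosen by hand for illustration. A training algorithm would have to learn them.

```r
# Step (threshold) activation: output 1 when the weighted sum reaches 0
heaviside = function(z) as.integer(z >= 0)

# A neuron with two inputs: weighted sum plus bias, then the step activation
threshold_unit = function(x1, x2, w1, w2, bias) heaviside(w1 * x1 + w2 * x2 + bias)

# All four combinations of two binary inputs
truth = expand.grid(x1 = 0:1, x2 = 0:1)

# Same weights, different bias: -1.5 needs both inputs on (AND),
# -0.5 needs at least one input on (OR)
truth$AND = threshold_unit(truth$x1, truth$x2, w1 = 1, w2 = 1, bias = -1.5)
truth$OR  = threshold_unit(truth$x1, truth$x2, w1 = 1, w2 = 1, bias = -0.5)
truth
```

Only the bias differs between the two units. It sets how much total input the neuron needs before it fires.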
ANNs are very versatile learners that can be applied to nearly any learning task: classification, numeric prediction, and even unsupervised pattern recognition.
ANNs are best applied to problems where the input and output data are well defined, yet the process that relates the input to the output is extremely complex. The downside is a complex black-box model that is difficult, if not impossible, to interpret.
There are many variants of neural networks, but all of them can be defined in terms of the following characteristics:
- Activation function: transforms a neuron’s combined input signals into a single output signal to be broadcast further in the network.
- Network topology: describes the number of neurons and number of layers in the model and the manner in which they are connected.
- Training algorithm: specifies how connection weights are set in order to inhibit or excite neurons in proportion to the input signal.
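The activation function and the network topology can be sketched in a few lines of R. This is a minimal illustration, not how neuralnet is implemented internally. It uses the logistic function (neuralnet's default act.fct) in every layer for simplicity. The 3-2-2 topology is hypothetical, and its random weights stand in for the values a training algorithm would set.

```r
# Logistic (sigmoid) activation
sigmoid = function(z) 1 / (1 + exp(-z))

# One layer: each neuron combines its inputs as a weighted sum plus a bias,
# then the activation turns that sum into a single output signal
layer = function(x, W, b) sigmoid(as.vector(W %*% x) + b)

set.seed(1)
x  = c(0.5, -1.2, 2.0)                # hypothetical input signals
W1 = matrix(rnorm(2 * 3), nrow = 2)   # 3 inputs -> 2 hidden neurons
b1 = rnorm(2)
W2 = matrix(rnorm(2 * 2), nrow = 2)   # 2 hidden -> 2 output neurons
b2 = rnorm(2)

hidden = layer(x, W1, b1)        # signals broadcast from the hidden layer
output = layer(hidden, W2, b2)   # one value per output neuron
output
```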
Here we will use the Breast Cancer Wisconsin data (https://data.world/health/breast-cancer-wisconsin/workspace/file?filename=breast-cancer-wisconsin-data%2Fdata.csv) with an ANN to predict whether a tumor is benign or malignant.
1. Collecting, exploring, and preparing the data
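#adjust the path to your local copy; stringsAsFactors = TRUE keeps diagnosis a factor (the default changed to FALSE in R 4.0)
breastcancer = read.csv("C:/Users/ester/Downloads/breast-cancer-wisconsin-data-data.csv", sep = "," , dec = ".", header = TRUE, stringsAsFactors = TRUE)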
head(breastcancer)
## id diagnosis radius_mean texture_mean perimeter_mean area_mean
## 1 842302 M 17.99 10.38 122.80 1001.0
## 2 842517 M 20.57 17.77 132.90 1326.0
## 3 84300903 M 19.69 21.25 130.00 1203.0
## 4 84348301 M 11.42 20.38 77.58 386.1
## 5 84358402 M 20.29 14.34 135.10 1297.0
## 6 843786 M 12.45 15.70 82.57 477.1
## smoothness_mean compactness_mean concavity_mean concave.points_mean
## 1 0.11840 0.27760 0.3001 0.14710
## 2 0.08474 0.07864 0.0869 0.07017
## 3 0.10960 0.15990 0.1974 0.12790
## 4 0.14250 0.28390 0.2414 0.10520
## 5 0.10030 0.13280 0.1980 0.10430
## 6 0.12780 0.17000 0.1578 0.08089
## symmetry_mean fractal_dimension_mean radius_se texture_se perimeter_se
## 1 0.2419 0.07871 1.0950 0.9053 8.589
## 2 0.1812 0.05667 0.5435 0.7339 3.398
## 3 0.2069 0.05999 0.7456 0.7869 4.585
## 4 0.2597 0.09744 0.4956 1.1560 3.445
## 5 0.1809 0.05883 0.7572 0.7813 5.438
## 6 0.2087 0.07613 0.3345 0.8902 2.217
## area_se smoothness_se compactness_se concavity_se concave.points_se
## 1 153.40 0.006399 0.04904 0.05373 0.01587
## 2 74.08 0.005225 0.01308 0.01860 0.01340
## 3 94.03 0.006150 0.04006 0.03832 0.02058
## 4 27.23 0.009110 0.07458 0.05661 0.01867
## 5 94.44 0.011490 0.02461 0.05688 0.01885
## 6 27.19 0.007510 0.03345 0.03672 0.01137
## symmetry_se fractal_dimension_se radius_worst texture_worst
## 1 0.03003 0.006193 25.38 17.33
## 2 0.01389 0.003532 24.99 23.41
## 3 0.02250 0.004571 23.57 25.53
## 4 0.05963 0.009208 14.91 26.50
## 5 0.01756 0.005115 22.54 16.67
## 6 0.02165 0.005082 15.47 23.75
## perimeter_worst area_worst smoothness_worst compactness_worst
## 1 184.60 2019.0 0.1622 0.6656
## 2 158.80 1956.0 0.1238 0.1866
## 3 152.50 1709.0 0.1444 0.4245
## 4 98.87 567.7 0.2098 0.8663
## 5 152.20 1575.0 0.1374 0.2050
## 6 103.40 741.6 0.1791 0.5249
## concavity_worst concave.points_worst symmetry_worst
## 1 0.7119 0.2654 0.4601
## 2 0.2416 0.1860 0.2750
## 3 0.4504 0.2430 0.3613
## 4 0.6869 0.2575 0.6638
## 5 0.4000 0.1625 0.2364
## 6 0.5355 0.1741 0.3985
## fractal_dimension_worst X
## 1 0.11890 NA
## 2 0.08902 NA
## 3 0.08758 NA
## 4 0.17300 NA
## 5 0.07678 NA
## 6 0.12440 NA
summary(breastcancer)
## id diagnosis radius_mean texture_mean
## Min. : 8670 B:357 Min. : 6.981 Min. : 9.71
## 1st Qu.: 869218 M:212 1st Qu.:11.700 1st Qu.:16.17
## Median : 906024 Median :13.370 Median :18.84
## Mean : 30371831 Mean :14.127 Mean :19.29
## 3rd Qu.: 8813129 3rd Qu.:15.780 3rd Qu.:21.80
## Max. :911320502 Max. :28.110 Max. :39.28
## perimeter_mean area_mean smoothness_mean compactness_mean
## Min. : 43.79 Min. : 143.5 Min. :0.05263 Min. :0.01938
## 1st Qu.: 75.17 1st Qu.: 420.3 1st Qu.:0.08637 1st Qu.:0.06492
## Median : 86.24 Median : 551.1 Median :0.09587 Median :0.09263
## Mean : 91.97 Mean : 654.9 Mean :0.09636 Mean :0.10434
## 3rd Qu.:104.10 3rd Qu.: 782.7 3rd Qu.:0.10530 3rd Qu.:0.13040
## Max. :188.50 Max. :2501.0 Max. :0.16340 Max. :0.34540
## concavity_mean concave.points_mean symmetry_mean
## Min. :0.00000 Min. :0.00000 Min. :0.1060
## 1st Qu.:0.02956 1st Qu.:0.02031 1st Qu.:0.1619
## Median :0.06154 Median :0.03350 Median :0.1792
## Mean :0.08880 Mean :0.04892 Mean :0.1812
## 3rd Qu.:0.13070 3rd Qu.:0.07400 3rd Qu.:0.1957
## Max. :0.42680 Max. :0.20120 Max. :0.3040
## fractal_dimension_mean radius_se texture_se perimeter_se
## Min. :0.04996 Min. :0.1115 Min. :0.3602 Min. : 0.757
## 1st Qu.:0.05770 1st Qu.:0.2324 1st Qu.:0.8339 1st Qu.: 1.606
## Median :0.06154 Median :0.3242 Median :1.1080 Median : 2.287
## Mean :0.06280 Mean :0.4052 Mean :1.2169 Mean : 2.866
## 3rd Qu.:0.06612 3rd Qu.:0.4789 3rd Qu.:1.4740 3rd Qu.: 3.357
## Max. :0.09744 Max. :2.8730 Max. :4.8850 Max. :21.980
## area_se smoothness_se compactness_se concavity_se
## Min. : 6.802 Min. :0.001713 Min. :0.002252 Min. :0.00000
## 1st Qu.: 17.850 1st Qu.:0.005169 1st Qu.:0.013080 1st Qu.:0.01509
## Median : 24.530 Median :0.006380 Median :0.020450 Median :0.02589
## Mean : 40.337 Mean :0.007041 Mean :0.025478 Mean :0.03189
## 3rd Qu.: 45.190 3rd Qu.:0.008146 3rd Qu.:0.032450 3rd Qu.:0.04205
## Max. :542.200 Max. :0.031130 Max. :0.135400 Max. :0.39600
## concave.points_se symmetry_se fractal_dimension_se
## Min. :0.000000 Min. :0.007882 Min. :0.0008948
## 1st Qu.:0.007638 1st Qu.:0.015160 1st Qu.:0.0022480
## Median :0.010930 Median :0.018730 Median :0.0031870
## Mean :0.011796 Mean :0.020542 Mean :0.0037949
## 3rd Qu.:0.014710 3rd Qu.:0.023480 3rd Qu.:0.0045580
## Max. :0.052790 Max. :0.078950 Max. :0.0298400
## radius_worst texture_worst perimeter_worst area_worst
## Min. : 7.93 Min. :12.02 Min. : 50.41 Min. : 185.2
## 1st Qu.:13.01 1st Qu.:21.08 1st Qu.: 84.11 1st Qu.: 515.3
## Median :14.97 Median :25.41 Median : 97.66 Median : 686.5
## Mean :16.27 Mean :25.68 Mean :107.26 Mean : 880.6
## 3rd Qu.:18.79 3rd Qu.:29.72 3rd Qu.:125.40 3rd Qu.:1084.0
## Max. :36.04 Max. :49.54 Max. :251.20 Max. :4254.0
## smoothness_worst compactness_worst concavity_worst concave.points_worst
## Min. :0.07117 Min. :0.02729 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.11660 1st Qu.:0.14720 1st Qu.:0.1145 1st Qu.:0.06493
## Median :0.13130 Median :0.21190 Median :0.2267 Median :0.09993
## Mean :0.13237 Mean :0.25427 Mean :0.2722 Mean :0.11461
## 3rd Qu.:0.14600 3rd Qu.:0.33910 3rd Qu.:0.3829 3rd Qu.:0.16140
## Max. :0.22260 Max. :1.05800 Max. :1.2520 Max. :0.29100
## symmetry_worst fractal_dimension_worst X
## Min. :0.1565 Min. :0.05504 Mode:logical
## 1st Qu.:0.2504 1st Qu.:0.07146 NA's:569
## Median :0.2822 Median :0.08004
## Mean :0.2901 Mean :0.08395
## 3rd Qu.:0.3179 3rd Qu.:0.09208
## Max. :0.6638 Max. :0.20750
breastcan = breastcancer[-c(1,33)] #drop id (column 1) and the empty X column (column 33, all NA)
summary(breastcan)
## diagnosis radius_mean texture_mean perimeter_mean
## B:357 Min. : 6.981 Min. : 9.71 Min. : 43.79
## M:212 1st Qu.:11.700 1st Qu.:16.17 1st Qu.: 75.17
## Median :13.370 Median :18.84 Median : 86.24
## Mean :14.127 Mean :19.29 Mean : 91.97
## 3rd Qu.:15.780 3rd Qu.:21.80 3rd Qu.:104.10
## Max. :28.110 Max. :39.28 Max. :188.50
## area_mean smoothness_mean compactness_mean concavity_mean
## Min. : 143.5 Min. :0.05263 Min. :0.01938 Min. :0.00000
## 1st Qu.: 420.3 1st Qu.:0.08637 1st Qu.:0.06492 1st Qu.:0.02956
## Median : 551.1 Median :0.09587 Median :0.09263 Median :0.06154
## Mean : 654.9 Mean :0.09636 Mean :0.10434 Mean :0.08880
## 3rd Qu.: 782.7 3rd Qu.:0.10530 3rd Qu.:0.13040 3rd Qu.:0.13070
## Max. :2501.0 Max. :0.16340 Max. :0.34540 Max. :0.42680
## concave.points_mean symmetry_mean fractal_dimension_mean
## Min. :0.00000 Min. :0.1060 Min. :0.04996
## 1st Qu.:0.02031 1st Qu.:0.1619 1st Qu.:0.05770
## Median :0.03350 Median :0.1792 Median :0.06154
## Mean :0.04892 Mean :0.1812 Mean :0.06280
## 3rd Qu.:0.07400 3rd Qu.:0.1957 3rd Qu.:0.06612
## Max. :0.20120 Max. :0.3040 Max. :0.09744
## radius_se texture_se perimeter_se area_se
## Min. :0.1115 Min. :0.3602 Min. : 0.757 Min. : 6.802
## 1st Qu.:0.2324 1st Qu.:0.8339 1st Qu.: 1.606 1st Qu.: 17.850
## Median :0.3242 Median :1.1080 Median : 2.287 Median : 24.530
## Mean :0.4052 Mean :1.2169 Mean : 2.866 Mean : 40.337
## 3rd Qu.:0.4789 3rd Qu.:1.4740 3rd Qu.: 3.357 3rd Qu.: 45.190
## Max. :2.8730 Max. :4.8850 Max. :21.980 Max. :542.200
## smoothness_se compactness_se concavity_se
## Min. :0.001713 Min. :0.002252 Min. :0.00000
## 1st Qu.:0.005169 1st Qu.:0.013080 1st Qu.:0.01509
## Median :0.006380 Median :0.020450 Median :0.02589
## Mean :0.007041 Mean :0.025478 Mean :0.03189
## 3rd Qu.:0.008146 3rd Qu.:0.032450 3rd Qu.:0.04205
## Max. :0.031130 Max. :0.135400 Max. :0.39600
## concave.points_se symmetry_se fractal_dimension_se
## Min. :0.000000 Min. :0.007882 Min. :0.0008948
## 1st Qu.:0.007638 1st Qu.:0.015160 1st Qu.:0.0022480
## Median :0.010930 Median :0.018730 Median :0.0031870
## Mean :0.011796 Mean :0.020542 Mean :0.0037949
## 3rd Qu.:0.014710 3rd Qu.:0.023480 3rd Qu.:0.0045580
## Max. :0.052790 Max. :0.078950 Max. :0.0298400
## radius_worst texture_worst perimeter_worst area_worst
## Min. : 7.93 Min. :12.02 Min. : 50.41 Min. : 185.2
## 1st Qu.:13.01 1st Qu.:21.08 1st Qu.: 84.11 1st Qu.: 515.3
## Median :14.97 Median :25.41 Median : 97.66 Median : 686.5
## Mean :16.27 Mean :25.68 Mean :107.26 Mean : 880.6
## 3rd Qu.:18.79 3rd Qu.:29.72 3rd Qu.:125.40 3rd Qu.:1084.0
## Max. :36.04 Max. :49.54 Max. :251.20 Max. :4254.0
## smoothness_worst compactness_worst concavity_worst concave.points_worst
## Min. :0.07117 Min. :0.02729 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.11660 1st Qu.:0.14720 1st Qu.:0.1145 1st Qu.:0.06493
## Median :0.13130 Median :0.21190 Median :0.2267 Median :0.09993
## Mean :0.13237 Mean :0.25427 Mean :0.2722 Mean :0.11461
## 3rd Qu.:0.14600 3rd Qu.:0.33910 3rd Qu.:0.3829 3rd Qu.:0.16140
## Max. :0.22260 Max. :1.05800 Max. :1.2520 Max. :0.29100
## symmetry_worst fractal_dimension_worst
## Min. :0.1565 Min. :0.05504
## 1st Qu.:0.2504 1st Qu.:0.07146
## Median :0.2822 Median :0.08004
## Mean :0.2901 Mean :0.08395
## 3rd Qu.:0.3179 3rd Qu.:0.09208
## Max. :0.6638 Max. :0.20750
str(breastcan)
## 'data.frame': 569 obs. of 31 variables:
## $ diagnosis : Factor w/ 2 levels "B","M": 2 2 2 2 2 2 2 2 2 2 ...
## $ radius_mean : num 18 20.6 19.7 11.4 20.3 ...
## $ texture_mean : num 10.4 17.8 21.2 20.4 14.3 ...
## $ perimeter_mean : num 122.8 132.9 130 77.6 135.1 ...
## $ area_mean : num 1001 1326 1203 386 1297 ...
## $ smoothness_mean : num 0.1184 0.0847 0.1096 0.1425 0.1003 ...
## $ compactness_mean : num 0.2776 0.0786 0.1599 0.2839 0.1328 ...
## $ concavity_mean : num 0.3001 0.0869 0.1974 0.2414 0.198 ...
## $ concave.points_mean : num 0.1471 0.0702 0.1279 0.1052 0.1043 ...
## $ symmetry_mean : num 0.242 0.181 0.207 0.26 0.181 ...
## $ fractal_dimension_mean : num 0.0787 0.0567 0.06 0.0974 0.0588 ...
## $ radius_se : num 1.095 0.543 0.746 0.496 0.757 ...
## $ texture_se : num 0.905 0.734 0.787 1.156 0.781 ...
## $ perimeter_se : num 8.59 3.4 4.58 3.44 5.44 ...
## $ area_se : num 153.4 74.1 94 27.2 94.4 ...
## $ smoothness_se : num 0.0064 0.00522 0.00615 0.00911 0.01149 ...
## $ compactness_se : num 0.049 0.0131 0.0401 0.0746 0.0246 ...
## $ concavity_se : num 0.0537 0.0186 0.0383 0.0566 0.0569 ...
## $ concave.points_se : num 0.0159 0.0134 0.0206 0.0187 0.0188 ...
## $ symmetry_se : num 0.03 0.0139 0.0225 0.0596 0.0176 ...
## $ fractal_dimension_se : num 0.00619 0.00353 0.00457 0.00921 0.00511 ...
## $ radius_worst : num 25.4 25 23.6 14.9 22.5 ...
## $ texture_worst : num 17.3 23.4 25.5 26.5 16.7 ...
## $ perimeter_worst : num 184.6 158.8 152.5 98.9 152.2 ...
## $ area_worst : num 2019 1956 1709 568 1575 ...
## $ smoothness_worst : num 0.162 0.124 0.144 0.21 0.137 ...
## $ compactness_worst : num 0.666 0.187 0.424 0.866 0.205 ...
## $ concavity_worst : num 0.712 0.242 0.45 0.687 0.4 ...
## $ concave.points_worst : num 0.265 0.186 0.243 0.258 0.163 ...
## $ symmetry_worst : num 0.46 0.275 0.361 0.664 0.236 ...
## $ fractal_dimension_worst: num 0.1189 0.089 0.0876 0.173 0.0768 ...
The data we are going to work with has 569 rows and 31 columns.
To use diagnosis (type of tumor) as the target in the neuralnet package, which fits one output neuron per response variable, we create one logical indicator variable per class:
breastcan$Benign = breastcan$diagnosis == "B"
breastcan$Malignant = breastcan$diagnosis == "M"
#this new variable, based on diagnosis, will be used later on to evaluate model performance (1 = benign, 2 = malignant)
breastcan$diag0 = factor(ifelse(breastcan$diagnosis == "B", "1", "2"), levels = c("1", "2"))
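The "1"/"2" coding of diag0 is meant to line up with the columns of the network output. The model formula used below lists Benign before Malignant, so the first output neuron corresponds to benign and the second to malignant. Taking the index of the largest output in each row therefore gives a label that can be compared directly with diag0. A small sketch with hypothetical output values:

```r
# Hypothetical network outputs for three tumors:
# column 1 = Benign neuron, column 2 = Malignant neuron
net_result = matrix(c(0.92, 0.08,
                      0.15, 0.81,
                      0.55, 0.40), ncol = 2, byrow = TRUE)

# Index of the strongest output neuron per row -> "1" (benign) or "2" (malignant)
pred = factor(apply(net_result, 1, which.max), levels = 1:2)

# The same coding applied to the true diagnoses, as done for diag0
diagnosis = c("B", "M", "B")
reference = factor(ifelse(diagnosis == "B", "1", "2"), levels = c("1", "2"))

table(Prediction = pred, Reference = reference)
```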
2. Creating training and testing datasets
We will divide our data into two different sets: a training dataset that will be used to build the model and a test dataset that will be used to estimate the predictive accuracy of the model.
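The dataset will be divided into training (67%) and testing (33%) sets. We create them with the createDataPartition() function from the caret package. It samples within each class, so both sets keep roughly the same proportion of benign and malignant tumors:
library(caret)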
set.seed(123)
train_ind= createDataPartition(y = breastcan$diagnosis,p = 0.67,list = FALSE)
train = breastcan[train_ind,]
head(train)
## diagnosis radius_mean texture_mean perimeter_mean area_mean
## 2 M 20.57 17.77 132.90 1326.0
## 4 M 11.42 20.38 77.58 386.1
## 5 M 20.29 14.34 135.10 1297.0
## 8 M 13.71 20.83 90.20 577.9
## 9 M 13.00 21.82 87.50 519.8
## 10 M 12.46 24.04 83.97 475.9
## smoothness_mean compactness_mean concavity_mean concave.points_mean
## 2 0.08474 0.07864 0.08690 0.07017
## 4 0.14250 0.28390 0.24140 0.10520
## 5 0.10030 0.13280 0.19800 0.10430
## 8 0.11890 0.16450 0.09366 0.05985
## 9 0.12730 0.19320 0.18590 0.09353
## 10 0.11860 0.23960 0.22730 0.08543
## symmetry_mean fractal_dimension_mean radius_se texture_se perimeter_se
## 2 0.1812 0.05667 0.5435 0.7339 3.398
## 4 0.2597 0.09744 0.4956 1.1560 3.445
## 5 0.1809 0.05883 0.7572 0.7813 5.438
## 8 0.2196 0.07451 0.5835 1.3770 3.856
## 9 0.2350 0.07389 0.3063 1.0020 2.406
## 10 0.2030 0.08243 0.2976 1.5990 2.039
## area_se smoothness_se compactness_se concavity_se concave.points_se
## 2 74.08 0.005225 0.01308 0.01860 0.01340
## 4 27.23 0.009110 0.07458 0.05661 0.01867
## 5 94.44 0.011490 0.02461 0.05688 0.01885
## 8 50.96 0.008805 0.03029 0.02488 0.01448
## 9 24.32 0.005731 0.03502 0.03553 0.01226
## 10 23.94 0.007149 0.07217 0.07743 0.01432
## symmetry_se fractal_dimension_se radius_worst texture_worst
## 2 0.01389 0.003532 24.99 23.41
## 4 0.05963 0.009208 14.91 26.50
## 5 0.01756 0.005115 22.54 16.67
## 8 0.01486 0.005412 17.06 28.14
## 9 0.02143 0.003749 15.49 30.73
## 10 0.01789 0.010080 15.09 40.68
## perimeter_worst area_worst smoothness_worst compactness_worst
## 2 158.80 1956.0 0.1238 0.1866
## 4 98.87 567.7 0.2098 0.8663
## 5 152.20 1575.0 0.1374 0.2050
## 8 110.60 897.0 0.1654 0.3682
## 9 106.20 739.3 0.1703 0.5401
## 10 97.65 711.4 0.1853 1.0580
## concavity_worst concave.points_worst symmetry_worst
## 2 0.2416 0.1860 0.2750
## 4 0.6869 0.2575 0.6638
## 5 0.4000 0.1625 0.2364
## 8 0.2678 0.1556 0.3196
## 9 0.5390 0.2060 0.4378
## 10 1.1050 0.2210 0.4366
## fractal_dimension_worst Benign Malignant diag0
## 2 0.08902 FALSE TRUE 2
## 4 0.17300 FALSE TRUE 2
## 5 0.07678 FALSE TRUE 2
## 8 0.11510 FALSE TRUE 2
## 9 0.10720 FALSE TRUE 2
## 10 0.20750 FALSE TRUE 2
test = breastcan[-train_ind,]
head(test)
## diagnosis radius_mean texture_mean perimeter_mean area_mean
## 1 M 17.99 10.38 122.80 1001.0
## 3 M 19.69 21.25 130.00 1203.0
## 6 M 12.45 15.70 82.57 477.1
## 7 M 18.25 19.98 119.60 1040.0
## 15 M 13.73 22.61 93.60 578.3
## 19 M 19.81 22.15 130.00 1260.0
## smoothness_mean compactness_mean concavity_mean concave.points_mean
## 1 0.11840 0.2776 0.3001 0.14710
## 3 0.10960 0.1599 0.1974 0.12790
## 6 0.12780 0.1700 0.1578 0.08089
## 7 0.09463 0.1090 0.1127 0.07400
## 15 0.11310 0.2293 0.2128 0.08025
## 19 0.09831 0.1027 0.1479 0.09498
## symmetry_mean fractal_dimension_mean radius_se texture_se perimeter_se
## 1 0.2419 0.07871 1.0950 0.9053 8.589
## 3 0.2069 0.05999 0.7456 0.7869 4.585
## 6 0.2087 0.07613 0.3345 0.8902 2.217
## 7 0.1794 0.05742 0.4467 0.7732 3.180
## 15 0.2069 0.07682 0.2121 1.1690 2.061
## 19 0.1582 0.05395 0.7582 1.0170 5.865
## area_se smoothness_se compactness_se concavity_se concave.points_se
## 1 153.40 0.006399 0.04904 0.05373 0.01587
## 3 94.03 0.006150 0.04006 0.03832 0.02058
## 6 27.19 0.007510 0.03345 0.03672 0.01137
## 7 53.91 0.004314 0.01382 0.02254 0.01039
## 15 19.21 0.006429 0.05936 0.05501 0.01628
## 19 112.40 0.006494 0.01893 0.03391 0.01521
## symmetry_se fractal_dimension_se radius_worst texture_worst
## 1 0.03003 0.006193 25.38 17.33
## 3 0.02250 0.004571 23.57 25.53
## 6 0.02165 0.005082 15.47 23.75
## 7 0.01369 0.002179 22.88 27.66
## 15 0.01961 0.008093 15.03 32.01
## 19 0.01356 0.001997 27.32 30.88
## perimeter_worst area_worst smoothness_worst compactness_worst
## 1 184.6 2019.0 0.1622 0.6656
## 3 152.5 1709.0 0.1444 0.4245
## 6 103.4 741.6 0.1791 0.5249
## 7 153.2 1606.0 0.1442 0.2576
## 15 108.8 697.7 0.1651 0.7725
## 19 186.8 2398.0 0.1512 0.3150
## concavity_worst concave.points_worst symmetry_worst
## 1 0.7119 0.2654 0.4601
## 3 0.4504 0.2430 0.3613
## 6 0.5355 0.1741 0.3985
## 7 0.3784 0.1932 0.3063
## 15 0.6943 0.2208 0.3596
## 19 0.5372 0.2388 0.2768
## fractal_dimension_worst Benign Malignant diag0
## 1 0.11890 FALSE TRUE 2
## 3 0.08758 FALSE TRUE 2
## 6 0.12440 FALSE TRUE 2
## 7 0.08368 FALSE TRUE 2
## 15 0.14310 FALSE TRUE 2
## 19 0.07615 FALSE TRUE 2
The training set has 383 samples, and the testing set has 186 samples.
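createDataPartition() samples within each class. The following base-R sketch illustrates the same idea of stratified sampling, but it is not caret's actual code, and the rows it picks will differ from caret's. It draws ceiling(p * n) cases from each class. With the class counts shown in the summary (357 benign, 212 malignant) and p = 0.67, that gives 240 + 143 = 383 training cases and 117 + 69 = 186 test cases, the same sizes as above.

```r
# Base-R sketch of stratified sampling: draw the same fraction p from each class
stratified_split = function(y, p) {
  by_class = split(seq_along(y), y)              # row indices grouped by class
  picked = lapply(by_class, function(idx) {
    idx[sample.int(length(idx), ceiling(p * length(idx)))]
  })
  sort(unname(unlist(picked)))
}

set.seed(123)
y = factor(rep(c("B", "M"), times = c(357, 212)))  # class counts of diagnosis
train_idx = stratified_split(y, p = 0.67)
table(y[train_idx])   # B 240, M 143
table(y[-train_idx])  # B 117, M  69
```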
3. Training a model on the data
We first train the model with one hidden node:
#install.packages("neuralnet")
#install.packages("NeuralNetTools")
library(neuralnet)
## Warning: package 'neuralnet' was built under R version 3.4.1
library(NeuralNetTools)
## Warning: package 'NeuralNetTools' was built under R version 3.4.1
model1 = neuralnet(Benign + Malignant ~ radius_mean + texture_mean + perimeter_mean + area_mean + smoothness_mean + compactness_mean + concavity_mean + concave.points_mean + symmetry_mean + fractal_dimension_mean + radius_se + texture_se + perimeter_se + area_se + smoothness_se + compactness_se + concavity_se + concave.points_se + symmetry_se + fractal_dimension_se + radius_worst + texture_worst + perimeter_worst + area_worst + smoothness_worst + compactness_worst + concavity_worst + concave.points_worst + symmetry_worst + fractal_dimension_worst, data = train, hidden = 1)
predictions1 = compute(model1, test[2:31])
par(mar = numeric(4))
plotnet(model1)
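Typing all 30 predictors into the formula by hand is error-prone. The following sketch builds the same formula programmatically. With the real data you could simply take the names from names(train)[2:31]. Here they are rebuilt from the naming pattern of the columns so that the block runs on its own.

```r
# The 30 predictor names: 10 measurements x (mean, se, worst)
predictors = paste(rep(c("radius", "texture", "perimeter", "area", "smoothness",
                         "compactness", "concavity", "concave.points",
                         "symmetry", "fractal_dimension"), times = 3),
                   rep(c("mean", "se", "worst"), each = 10), sep = "_")

# Two responses (one output neuron each) on the left, all predictors on the right
f = as.formula(paste("Benign + Malignant ~", paste(predictors, collapse = " + ")))
length(all.vars(f))  # 32: two responses plus 30 predictors

# Usage with the data prepared above (not run here):
# model1 = neuralnet(f, data = train, hidden = 1)
```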
Then, we train the model with three hidden nodes:
model2 = neuralnet(Benign + Malignant ~ radius_mean + texture_mean + perimeter_mean + area_mean + smoothness_mean + compactness_mean + concavity_mean + concave.points_mean + symmetry_mean + fractal_dimension_mean + radius_se + texture_se + perimeter_se + area_se + smoothness_se + compactness_se + concavity_se + concave.points_se + symmetry_se + fractal_dimension_se + radius_worst + texture_worst + perimeter_worst + area_worst + smoothness_worst + compactness_worst + concavity_worst + concave.points_worst + symmetry_worst + fractal_dimension_worst, data = train, hidden = 3)
predictions2 = compute(model2, test[2:31])
par(mar = numeric(4))
plotnet(model2)
In the plots, positive weights between layers are drawn as black lines and negative weights as grey lines. Line thickness is proportional to the relative magnitude of each weight.
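Each line in these plots is one weight, so counting them shows how fast the model grows with the number of hidden nodes. This sketch assumes a fully connected network with one bias term feeding every hidden and output neuron. neuralnet fits such intercepts by default, and plotnet draws them as bias nodes.

```r
# Number of weights in a fully connected feed-forward network:
# every neuron in one layer connects to every neuron in the next,
# plus one bias weight per receiving neuron
n_weights = function(layer_sizes) {
  from = head(layer_sizes, -1)   # sending layers
  to   = tail(layer_sizes, -1)   # receiving layers
  sum((from + 1) * to)           # +1 for the bias feeding each layer
}

n_weights(c(30, 1, 2))  # model1: 30 inputs, 1 hidden node, 2 outputs -> 35
n_weights(c(30, 3, 2))  # model2: 30 inputs, 3 hidden nodes, 2 outputs -> 101
```

The three-node model therefore has almost three times as many weights to fit from the same 383 training cases.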
4. Evaluating model performance
First, we evaluate the model with one hidden node:
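#index of the largest output per row = predicted class (1 = benign, 2 = malignant); recent caret versions require factors with matching levels
evalu1 = factor(apply(predictions1$net.result, 1, which.max), levels = 1:2)
confu1 = confusionMatrix(evalu1, factor(test$diag0, levels = c("1", "2")), positive = "1")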
confu1
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 2
## 1 114 9
## 2 3 60
##
## Accuracy : 0.9354839
## 95% CI : (0.8900157, 0.9662222)
## No Information Rate : 0.6290323
## P-Value [Acc > NIR] : < 0.00000000000000022
##
## Kappa : 0.8592509
## Mcnemar's Test P-Value : 0.1489147
##
## Sensitivity : 0.9743590
## Specificity : 0.8695652
## Pos Pred Value : 0.9268293
## Neg Pred Value : 0.9523810
## Prevalence : 0.6290323
## Detection Rate : 0.6129032
## Detection Prevalence : 0.6612903
## Balanced Accuracy : 0.9219621
##
## 'Positive' Class : 1
##
The accuracy of the model is 93.55%, with an error rate of 6.45%.
The kappa statistic of the model is 0.86.
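Both figures can be recomputed from the confusion matrix. Accuracy is the share of cases on the diagonal. Cohen's kappa compares that observed agreement with the agreement expected by chance from the row and column totals.

```r
# Confusion matrix of the one-hidden-node model (rows = prediction, columns = reference)
cm1 = matrix(c(114, 3, 9, 60), nrow = 2,
             dimnames = list(Prediction = c("1", "2"), Reference = c("1", "2")))

cohen_kappa = function(cm) {
  n   = sum(cm)
  p_o = sum(diag(cm)) / n                      # observed agreement (= accuracy)
  p_e = sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
  (p_o - p_e) / (1 - p_e)
}

accuracy = sum(diag(cm1)) / sum(cm1)
round(c(accuracy = accuracy, error = 1 - accuracy, kappa = cohen_kappa(cm1)), 4)
# accuracy 0.9355, error 0.0645, kappa 0.8593
```

If every prediction falls into a single class, p_o equals p_e and kappa is exactly 0, whatever the accuracy.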
Then, we evaluate the model with three hidden nodes:
#same decoding as for model1; fixing both factor levels to c("1", "2") avoids the level-mismatch warning when only one class is predicted
evalu2 = factor(apply(predictions2$net.result, 1, which.max), levels = 1:2)
confu2 = confusionMatrix(evalu2, factor(test$diag0, levels = c("1", "2")), positive = "1")
confu2
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 2
## 1 117 69
## 2 0 0
##
## Accuracy : 0.6290323
## 95% CI : (0.5553044, 0.6985759)
## No Information Rate : 0.6290323
## P-Value [Acc > NIR] : 0.5328217
##
## Kappa : 0
## Mcnemar's Test P-Value : 0.0000000000000002695185
##
## Sensitivity : 1.0000000
## Specificity : 0.0000000
## Pos Pred Value : 0.6290323
## Neg Pred Value : NaN
## Prevalence : 0.6290323
## Detection Rate : 0.6290323
## Detection Prevalence : 1.0000000
## Balanced Accuracy : 0.5000000
##
## 'Positive' Class : 1
##
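The accuracy of the model is 62.9%, with an error rate of 37.1%.
The kappa statistic of the model is 0.
Comparing both models, we get better performance using 1 hidden node. Note that the three-node model classifies every test tumor as benign. Its accuracy therefore equals the no-information rate (62.9%), and its kappa is 0, so in this run it did not learn a useful decision rule. The comparison reflects this particular training run rather than a general rule that fewer hidden nodes perform better.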
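One plausible contributor to the failed three-node fit is that the inputs were used on their raw scales. These range from values below 1 (e.g. smoothness_mean) to over 4000 (area_worst), which can push logistic hidden units into saturation. Rescaling each feature to [0, 1] is a common preprocessing step for neural networks. The following sketch assumes min-max scaling, with the ranges taken from the training set only. This step is not part of the analysis above, so its effect on these results is untested. The toy values are hypothetical.

```r
# Learn each feature's range on the training data only
fit_minmax = function(df) list(min = sapply(df, min), max = sapply(df, max))

# Map each feature to [0, 1] using the stored training ranges
apply_minmax = function(df, ranges) {
  as.data.frame(Map(function(col, lo, hi) (col - lo) / (hi - lo),
                    df, ranges$min, ranges$max))
}

# Toy example with two features on very different scales
train_x = data.frame(area = c(143.5, 654.9, 2501.0), smoothness = c(0.053, 0.096, 0.163))
test_x  = data.frame(area = c(400, 1000), smoothness = c(0.08, 0.12))

ranges = fit_minmax(train_x)
scaled_train = apply_minmax(train_x, ranges)
scaled_test  = apply_minmax(test_x, ranges)
scaled_train

# Hypothetical usage with the real data (not run here):
# ranges = fit_minmax(train[2:31])
# train[2:31] = apply_minmax(train[2:31], ranges)
# test[2:31]  = apply_minmax(test[2:31], ranges)
```

Using the training ranges for the test set keeps test information out of the preprocessing. As a result, test values may fall slightly outside [0, 1].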