Ant Colony Optimization (part 2): Graph optimization using ACO

The Travelling Salesman Problem (TSP) is one of the most famous optimization problems in computer science. The objective is to find a complete route that connects all the nodes of a network, visiting each node exactly once and returning to the starting point, while minimizing the total distance of the route.

An important variation of the TSP depends on whether the distances between nodes are symmetric, that is, whether the distance from A to B is equal to the distance from B to A, which in practice is often not the case.
The number of possible routes in a network of 𝑛 nodes, with the starting node fixed, is given by: (𝒏−𝟏)!
This means that in a network of 5 nodes the number of possible routes is (5−1)! = 24, and as the number of nodes increases, the number of possible routes grows factorially.
If the problem is symmetric, each route and its reverse have the same length, so the number of distinct routes is reduced by half:
( (𝒏−𝟏)! ) / 𝟐
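To get a feel for how fast these counts grow, here is a minimal Python sketch of the two formulas above. The function name route_count is ours, chosen only for illustration.

```python
import math

def route_count(n, symmetric=False):
    """Number of distinct closed routes through n nodes with a fixed start node."""
    if n < 3:
        raise ValueError("the formulas assume at least 3 nodes")
    total = math.factorial(n - 1)          # (n-1)! orderings of the other nodes
    return total // 2 if symmetric else total  # a route and its reverse coincide

for n in (5, 10, 15):
    print(n, route_count(n), route_count(n, symmetric=True))
```

Even for 15 nodes the count is already in the tens of billions, which is why checking every route quickly stops being practical.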

The computational cost of the TSP has motivated many approaches to make route calculation more efficient. The most basic method, known as 'brute force', evaluates every possible route, which becomes extremely inefficient and practically impossible for large networks. Because computing optimal solutions for large networks is so expensive, heuristics such as nearest neighbor and cheapest insertion have been developed; they are fast, but they do not guarantee the best route.
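As a rough illustration of the difference between brute force and a heuristic, the sketch below implements both for a handful of made-up points. The helper names (tour_length, brute_force, nearest_neighbor) and the coordinates are our own assumptions, and math.dist requires Python 3.8 or later.

```python
import itertools
import math

def tour_length(tour, dist):
    """Length of the closed tour, including the return to the start."""
    return sum(dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))

def brute_force(nodes, dist):
    """Exact search: fix the first node and try every ordering of the rest."""
    start, rest = nodes[0], nodes[1:]
    best = min(itertools.permutations(rest),
               key=lambda p: tour_length((start,) + p, dist))
    return [start, *best]

def nearest_neighbor(nodes, dist):
    """Greedy heuristic: always move to the closest unvisited node."""
    tour, unvisited = [nodes[0]], list(nodes[1:])
    while unvisited:
        nxt = min(unvisited, key=lambda c: dist(tour[-1], c))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

points = [(0, 0), (0, 3), (4, 3), (4, 0), (2, 5)]  # made-up coordinates

exact = brute_force(points, math.dist)
greedy = nearest_neighbor(points, math.dist)
print("brute force     :", tour_length(exact, math.dist))
print("nearest neighbor:", tour_length(greedy, math.dist))
```

The brute-force tour is never longer than the greedy one, but its running time grows factorially with the number of nodes, while nearest neighbor only needs on the order of n² distance evaluations.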

Finally, there are metaheuristics such as the ACO algorithm, which search for good, near-optimal solutions without guaranteeing the optimum. The basic idea underlying all ant-based algorithms is a positive feedback mechanism based on laying pheromone. The pheromone allows the best solutions found so far to be kept in memory, so that they can be used to build better solutions. To avoid stagnation of the algorithm, a form of negative feedback is implemented through pheromone evaporation, but the pheromone must not evaporate too fast, otherwise cooperative behavior cannot emerge.

In the TSP the goal is to find the tour of minimal length connecting 𝑛 given cities, visiting each city only once. The distance between cities can be defined by the Euclidean distance or by other distance functions. In the graph, the cities are the nodes and the connections between cities are the edges. The graph does not need to be fully connected: not every node has to be connected to every other node. The distances also need not be symmetric: distance 𝑖𝑗 may be different from distance 𝑗𝑖.

To solve the TSP using ACO, the transition of an ant from city to city depends on the following:
▪ Whether or not the city has already been visited. Each ant has a memory or tabu list to make sure each city is visited once per tour.
▪ The inverse of the distance between two nodes (visibility). Visibility is based on local information and represents the heuristic desirability of choosing city 𝑗 when in city 𝑖.
▪ The amount of virtual pheromone on the edges. This is global information and represents the learned desirability of choosing city 𝑗 when in city 𝑖.
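These three ingredients are usually combined in the classic Ant System transition rule, where the probability of going from city 𝑖 to an unvisited city 𝑗 is proportional to (pheromone on 𝑖𝑗)^alpha × (visibility of 𝑖𝑗)^beta. The following is a minimal sketch of that rule, not the actual ACO-Pants code; the function names and the tiny three-city example are our own assumptions.

```python
import random

def transition_probabilities(current, unvisited, pheromone, distance,
                             alpha=1.0, beta=3.0):
    """Probability of moving from `current` to each city in `unvisited`."""
    weights = {}
    for j in unvisited:                       # visited cities are simply excluded
        tau = pheromone[(current, j)]         # learned desirability (pheromone)
        eta = 1.0 / distance[(current, j)]    # visibility (inverse distance)
        weights[j] = (tau ** alpha) * (eta ** beta)
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def choose_next(probabilities):
    """Draw the next city at random according to the probabilities."""
    cities = list(probabilities)
    return random.choices(cities, weights=[probabilities[c] for c in cities])[0]

# Hypothetical example: from A, city B is twice as close as city C.
pheromone = {("A", "B"): 0.01, ("A", "C"): 0.01}
distance = {("A", "B"): 1.0, ("A", "C"): 2.0}
probs = transition_probabilities("A", ["B", "C"], pheromone, distance,
                                 alpha=1, beta=1)
print(probs, choose_next(probs))
```

With equal pheromone on both edges and alpha = beta = 1, the closer city is chosen about two thirds of the time; raising beta makes the ants greedier, while raising alpha makes them follow the pheromone more strongly.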

Other things to take into account are:
▪ The transition rule: the probability for an ant to go from city 𝑖 to city 𝑗 while building a tour.
▪ Pheromone decay: without pheromone decay the algorithm would amplify the initial random fluctuations, which would produce suboptimal solutions.
▪ Total number of ants: this is an important parameter, since too many ants will reinforce suboptimal solutions, while too few ants will not produce a cooperative effect because of pheromone decay. A common suggestion is to use as many ants as there are cities in the graph.
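Pheromone decay can be sketched as a two-step update at the end of every iteration: first every edge loses a fraction rho of its pheromone, then each ant deposits an amount inversely proportional to the length of its tour. This is a simplified version of the Ant System update under our own assumptions (directed edges stored in a dictionary, hypothetical function name), not the ACO-Pants implementation.

```python
def update_pheromone(pheromone, tours, lengths, rho=0.8, q=1.0):
    """Evaporate pheromone on every edge, then deposit along each ant's tour."""
    # Evaporation: every edge keeps a fraction (1 - rho) of its pheromone.
    updated = {edge: (1.0 - rho) * tau for edge, tau in pheromone.items()}
    # Deposit: each ant adds q / tour_length to every edge of its closed tour.
    for tour, length in zip(tours, lengths):
        for i in range(len(tour)):
            edge = (tour[i], tour[(i + 1) % len(tour)])
            updated[edge] += q / length
    return updated

# Hypothetical three-city example with one ant.
cities = ["A", "B", "C"]
pheromone = {(i, j): 1.0 for i in cities for j in cities if i != j}
new_pheromone = update_pheromone(pheromone, tours=[["A", "B", "C"]],
                                 lengths=[6.0], rho=0.5, q=3.0)
print(new_pheromone)
```

Edges on the ant's tour end up with more pheromone than the unused ones, and shorter tours deposit more per edge, which is the positive feedback that evaporation keeps in check.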


CODE using Jupyter Notebook:


!pip install ACO-Pants
import pants
import math
import random
The input for ACO-Pants is a list of nodes, here given as (x, y) coordinates, together with a length function that the algorithm uses to calculate the distance from node 𝑖 to node 𝑗.
Here we use a .csv file that contains information about cities all around the world, downloaded from: http://simplemaps.com/data/world-cities.
We will work with the cities from the USA, using their coordinates in decimal degrees (lat and lng).
import pandas as pd
import numpy as np
cities = pd.read_csv('C:/Desktop/BLOG/worldcities.csv', decimal=".")
USAcities = cities.loc[cities['country'] == 'United States of America'] #only the cities that belong to USA
print('Dimension USAcities:', USAcities.shape) #dimension of the USAcities dataset
UScities = USAcities.sample(100) #to get a random sample of 100 rows to work with
print('Dimension UScities:', UScities.shape)  #dimension of the UScities dataset
UScities.head() #first rows of the new dataset
Dimension UScities: (100, 9)
      city         city_ascii   lat        lng          pop        country                   iso2  iso3  province
6739  Bloomington  Bloomington  39.165657  -86.526409   85781.5    United States of America  US    USA   Indiana
7039  Pittsburgh   Pittsburgh   40.429999  -79.999985   1535267.5  United States of America  US    USA   Pennsylvania
6527  St. Cloud    St. Cloud    45.561210  -94.162222   85974.0    United States of America  US    USA   Minnesota
6866  Odessa       Odessa       31.845561  -102.367225  98655.0    United States of America  US    USA   Texas
6604  Albany       Albany       44.620492  -123.086942  48066.5    United States of America  US    USA   Oregon
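To calculate the distance from node 𝑖 to node 𝑗, we are going to use the Euclidean distance, which is the straight-line distance between two points or nodes. Note that, applied directly to latitude and longitude, the result is expressed in degrees, not in kilometers.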
def euclidean(a, b):
    return math.sqrt(pow(a[1] - b[1], 2) + pow(a[0] - b[0], 2))
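If distances in kilometers are preferred, one common alternative (not the one used in this post) is the haversine great-circle distance. The sketch below assumes nodes are (lat, lng) pairs in decimal degrees and uses a mean Earth radius of 6371 km; the function name haversine and the constant are our own choices.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation

def haversine(a, b):
    """Great-circle distance in km between two (lat, lng) pairs in decimal degrees."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    dlat, dlng = lat2 - lat1, lng2 - lng1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

# Two cities from the table above: Bloomington and Pittsburgh.
print(haversine((39.165657, -86.526409), (40.429999, -79.999985)))
```

Such a function has the same two-argument form as euclidean, so it could in principle be supplied as the length function instead.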
Since the input is a list of nodes (x, y), we build it from the latitude and longitude columns:
x = UScities['lat']
y = UScities['lng']
DD = list(zip(x,y)) #UScities represented in decimal degrees
print(DD)
[(39.165657160000002, -86.52640873), (40.429998600000005, -79.999985390000006), (45.561209939999998, -94.162221720000005), (31.84556134, -102.3672248), (44.620492169999999, -123.08694199999999), (41.24000083, -96.009990070000001), (41.661086240000003, -91.52997929), (33.410375389999999, -91.061687460000002), (39.178731299999995, -78.166634770000002), (36.060807920000002, -102.5186109), (42.833004369999998, -108.73259850000001), (43.750829490000001, -87.714424070000007), (36.754150610000003, -108.18609440000002), (42.090129820000001, -76.808035520000004), (44.163620829999999, -93.999156740000004), (45.37375368, -84.955186810000001), (37.975213029999999, -100.86408659999999), (37.550019349999999, -77.449985999999996), (46.188380960000003, -123.82999740000001), (39.154236670000003, -123.2108621), (42.448195820000002, -73.259828330000005), (59.547307150000002, -139.72721830000003), (45.822460139999997, -88.064092650000006), (62.085524309999997, -163.72900900000002), (45.520023819999999, -122.67999009999998), (33.423914609999997, -111.73608440000001), (43.09482302, -79.036943399999998), (37.104155089999999, -113.58333600000002), (35.369971540000002, -119.01998090000001), (63.733098159999997, -148.9140994), (32.31261293, -106.77780829999999), (40.193759790000001, -85.386374959999998), (38.280388200000004, -104.6300066), (64.506100079999996, -165.4063744), (43.012864200000003, -83.687538090000004), (39.820009990000003, -89.650016519999994), (47.474219789999999, -115.9268881), (42.101395279999998, -102.8701915), (41.490398990000003, -71.31335799), (43.208071920000002, -71.538047120000002), (39.158086570000002, -75.524703000000002), (40.885190450000003, -124.08822450000001), (33.220464499999999, -117.3349675), (27.51595481, -97.855846400000004), (42.329960139999997, -83.080055790000003), (31.603741469999999, -94.655266560000001), (36.747200130000003, -95.980586180000003), (61.004143290000002, -159.9404806), (61.58173077, -149.43944199999999), (43.549989029999999, 
-96.729997800000007), (64.787995010000003, -141.19999659999999), (57.060397690000002, -135.32754939999998), (57.564559959999997, -157.56912659999998), (35.761937279999998, -119.24306809999999), (45.672598489999999, -118.78748859999999), (41.790664899999996, -107.234292), (61.578707700000002, -159.52218569999999), (36.685808530000003, -101.4795012), (29.53800193, -81.223295739999998), (34.949428730000001, -81.932270549999998), (34.257920550000001, -88.703330120000004), (46.906011579999998, -98.702978150000007), (31.57873008, -84.155829920000002), (32.820023820000003, -96.840016930000004), (59.070361009999999, -160.37832340000003), (61.53108787, -166.09656480000001), (41.080398170000002, -85.129982339999998), (36.070006329999998, -79.800023440000004), (39.091113909999997, -94.415281210000003), (62.079684870000001, -150.07276250000001), (47.12729006, -88.580805299999994), (39.59979087, -110.81001689999999), (37.760058209999997, -100.01819499999999), (39.65317263, -78.762774089999994), (34.940126970000001, -120.43663859999999), (32.537457089999997, -82.918282719999993), (41.493396220000001, -90.53461369), (46.495261450000001, -84.345275720000004), (41.070398779999998, -81.519995969999997), (40.793723159999999, -77.860245200000008), (26.303186459999999, -98.159962199999995), (36.07731854, -75.704717860000002), (45.165988589999998, -67.242392010000003), (39.050005310000003, -95.669984990000003), (42.670016910000001, -73.819949179999995), (42.439540020000003, -123.3271857), (66.60387901, -160.00939109999999), (46.003896099999999, -112.53383940000001), (34.12038373, -117.3000342), (29.819974380000001, -95.339979290000002), (61.135995710000003, -146.348287), (44.529980899999998, -88.000013879999997), (35.47004295, -97.518683510000002), (32.671945010000002, -117.09800520000002), (30.18971926, -82.63974675), (47.038044859999999, -122.89943400000001), (35.612876610000001, -77.366683599999988), (32.030718, -102.09749959999999), (32.50001752, -93.770023440000003), 
(32.850383729999997, -83.630048059999993)]
Optional arguments:
-a A, --alpha A    relative importance placed on pheromones; default=1
-b B, --beta B     relative importance placed on distances; default=3
-l L, --limit L    number of iterations to perform; default=100
-p P, --rho P      ratio of evaporated pheromone (0 <= P <= 1); default=0.8
-e E, --elite E    ratio of elite ant's pheromone; default=0.5
-q Q, --Q Q        total pheromone capacity of each ant (Q > 0); default=1
-t T, --t0 T       initial amount of pheromone on every edge (T > 0); default=0.01
-c N, --count N    number of ants used in each iteration (N > 0); default=10
These arguments are very important and can affect the result. Usually the number of ants (N) is set equal to the number of nodes. It is also better to give beta (distance) a higher value than alpha (pheromone).
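#Here we will use fewer ants than nodes (ant_count = 5).
#Number of iterations: limit = 5 instead of 100.
#Alpha and beta with the same relative importance (alpha = beta = 1).
#Note: as far as we can tell from the ACO-Pants source, World() only takes the nodes and the
#length function, and the settings are keyword arguments of Solver() (alpha, beta, limit, ant_count).

world = pants.World(DD, euclidean)
solver = pants.Solver(ant_count = 5, limit = 5, alpha = 1, beta = 1)
solution = solver.solve(world)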
print('DISTANCE:', solution.distance) #total distance of the tour performed
tour = solution.tour    #nodes visited in order
print(tour)
DISTANCE: 486.7331895990335
[(59.070361009999999, -160.37832340000003), (61.578707700000002, -159.52218569999999), (61.004143290000002, -159.9404806), (62.085524309999997, -163.72900900000002), (61.53108787, -166.09656480000001), (64.506100079999996, -165.4063744), (63.733098159999997, -148.9140994), (64.787995010000003, -141.19999659999999), (59.547307150000002, -139.72721830000003), (57.060397690000002, -135.32754939999998), (39.154236670000003, -123.2108621), (40.885190450000003, -124.08822450000001), (47.038044859999999, -122.89943400000001), (46.188380960000003, -123.82999740000001), (45.520023819999999, -122.67999009999998), (44.620492169999999, -123.08694199999999), (42.439540020000003, -123.3271857), (46.003896099999999, -112.53383940000001), (42.833004369999998, -108.73259850000001), (41.790664899999996, -107.234292), (38.280388200000004, -104.6300066), (36.060807920000002, -102.5186109), (37.760058209999997, -100.01819499999999), (37.975213029999999, -100.86408659999999), (36.685808530000003, -101.4795012), (32.030718, -102.09749959999999), (31.84556134, -102.3672248), (36.747200130000003, -95.980586180000003), (35.47004295, -97.518683510000002), (31.603741469999999, -94.655266560000001), (32.50001752, -93.770023440000003), (29.819974380000001, -95.339979290000002), (32.820023820000003, -96.840016930000004), (39.091113909999997, -94.415281210000003), (39.050005310000003, -95.669984990000003), (41.24000083, -96.009990070000001), (44.163620829999999, -93.999156740000004), (45.561209939999998, -94.162221720000005), (43.549989029999999, -96.729997800000007), (41.661086240000003, -91.52997929), (41.493396220000001, -90.53461369), (43.750829490000001, -87.714424070000007), (44.529980899999998, -88.000013879999997), (45.822460139999997, -88.064092650000006), (47.12729006, -88.580805299999994), (40.193759790000001, -85.386374959999998), (39.165657160000002, -86.52640873), (32.850383729999997, -83.630048059999993), (32.537457089999997, -82.918282719999993), (26.303186459999999, 
-98.159962199999995), (27.51595481, -97.855846400000004), (32.31261293, -106.77780829999999), (37.104155089999999, -113.58333600000002), (33.423914609999997, -111.73608440000001), (39.59979087, -110.81001689999999), (34.12038373, -117.3000342), (35.761937279999998, -119.24306809999999), (35.369971540000002, -119.01998090000001), (32.671945010000002, -117.09800520000002), (33.220464499999999, -117.3349675), (34.940126970000001, -120.43663859999999), (36.754150610000003, -108.18609440000002), (42.101395279999998, -102.8701915), (33.410375389999999, -91.061687460000002), (34.257920550000001, -88.703330120000004), (29.53800193, -81.223295739999998), (30.18971926, -82.63974675), (31.57873008, -84.155829920000002), (34.949428730000001, -81.932270549999998), (36.070006329999998, -79.800023440000004), (36.07731854, -75.704717860000002), (35.612876610000001, -77.366683599999988), (37.550019349999999, -77.449985999999996), (39.178731299999995, -78.166634770000002), (39.65317263, -78.762774089999994), (40.793723159999999, -77.860245200000008), (40.429998600000005, -79.999985390000006), (41.070398779999998, -81.519995969999997), (42.329960139999997, -83.080055790000003), (43.012864200000003, -83.687538090000004), (39.158086570000002, -75.524703000000002), (43.208071920000002, -71.538047120000002), (45.165988589999998, -67.242392010000003), (41.490398990000003, -71.31335799), (42.670016910000001, -73.819949179999995), (42.448195820000002, -73.259828330000005), (42.090129820000001, -76.808035520000004), (43.09482302, -79.036943399999998), (46.495261450000001, -84.345275720000004), (45.37375368, -84.955186810000001), (39.820009990000003, -89.650016519999994), (41.080398170000002, -85.129982339999998), (46.906011579999998, -98.702978150000007), (45.672598489999999, -118.78748859999999), (47.474219789999999, -115.9268881), (61.135995710000003, -146.348287), (62.079684870000001, -150.07276250000001), (61.58173077, -149.43944199999999), (57.564559959999997, -157.56912659999998), 
(66.60387901, -160.00939109999999)]
To get the names of the cities visited from the node values:
UScities.set_index(['lat','lng'])['city'].loc[tour].tolist()
['Togiak',
 'Aniak',
 'Nyac',
 'Mountain Village',
 'Hooper Bay',
 'Nome',
 'Denali Park',
 'Eagle',
 'Yakutat',
 'Sitka',
 'Ukiah',
 'Arcata',
 'Olympia',
 'Astoria',
 'Portland',
 'Albany',
 'Grants Pass',
 'Butte',
 'Lander',
 'Rawlins',
 'Pueblo',
 'Dalhart',
 'Dodge City',
 'Garden City',
 'Guymon',
 'Midland',
 'Odessa',
 'Bartlesville',
 'Oklahoma City',
 'Nacogdoches',
 'Shreveport',
 'Houston',
 'Dallas',
 'Independence',
 'Topeka',
 'Omaha',
 'Mankato',
 'St. Cloud',
 'Sioux Falls',
 'Iowa City',
 'Rock Island',
 'Sheboygan',
 'Green Bay',
 'Iron Mountain',
 'Hancock',
 'Muncie',
 'Bloomington',
 'Macon',
 'Dublin',
 'Edinburg',
 'Kingsville',
 'Las Cruces',
 'St. George',
 'Mesa',
 'Price',
 'San Bernardino',
 'Delano',
 'Bakersfield',
 'National City',
 'Oceanside',
 'Santa Maria',
 'Farmington',
 'Alliance',
 'Greenville',
 'Tupelo',
 'Palm Coast',
 'Lake City',
 'Albany',
 'Spartanburg',
 'Greensboro',
 'Kitty Hawk',
 'Greenville',
 'Richmond',
 'Winchester',
 'Cumberland',
 'State College',
 'Pittsburgh',
 'Akron',
 'Detroit',
 'Flint',
 'Dover',
 'Concord',
 'Calais',
 'Newport',
 'Albany',
 'Pittsfield',
 'Elmira',
 'Niagara Falls',
 'Sault Ste. Marie',
 'Petoskey',
 'Springfield',
 'Fort Wayne',
 'Jamestown',
 'Pendleton',
 'Wallace',
 'Valdez',
 'Montana',
 'Wasilla',
 'Pilot Point',
 'Selawik']
#Here we will use more ants than nodes (ant_count = 150).
#Number of iterations: limit = 150.
#Alpha and beta with different relative importance; distance (beta) will be more important (alpha = 2, beta = 3).
#Settings are passed to Solver() as keyword arguments (our understanding of the ACO-Pants API).

world = pants.World(DD, euclidean)
solver = pants.Solver(ant_count = 150, limit = 150, alpha = 2, beta = 3)
solution = solver.solve(world)
print('DISTANCE:', solution.distance) #total distance of the tour performed
tour1 = solution.tour    #nodes visited in order
print(tour1)
DISTANCE: 477.686085193725
[(62.085524309999997, -163.72900900000002), (61.53108787, -166.09656480000001), (64.506100079999996, -165.4063744), (66.60387901, -160.00939109999999), (61.004143290000002, -159.9404806), (61.578707700000002, -159.52218569999999), (59.070361009999999, -160.37832340000003), (57.564559959999997, -157.56912659999998), (63.733098159999997, -148.9140994), (61.58173077, -149.43944199999999), (62.079684870000001, -150.07276250000001), (61.135995710000003, -146.348287), (57.060397690000002, -135.32754939999998), (39.59979087, -110.81001689999999), (36.754150610000003, -108.18609440000002), (42.833004369999998, -108.73259850000001), (46.003896099999999, -112.53383940000001), (47.474219789999999, -115.9268881), (45.672598489999999, -118.78748859999999), (44.620492169999999, -123.08694199999999), (45.520023819999999, -122.67999009999998), (46.188380960000003, -123.82999740000001), (47.038044859999999, -122.89943400000001), (42.439540020000003, -123.3271857), (40.885190450000003, -124.08822450000001), (39.154236670000003, -123.2108621), (35.761937279999998, -119.24306809999999), (35.369971540000002, -119.01998090000001), (34.12038373, -117.3000342), (33.220464499999999, -117.3349675), (32.671945010000002, -117.09800520000002), (34.940126970000001, -120.43663859999999), (37.104155089999999, -113.58333600000002), (33.423914609999997, -111.73608440000001), (38.280388200000004, -104.6300066), (36.685808530000003, -101.4795012), (36.060807920000002, -102.5186109), (37.760058209999997, -100.01819499999999), (36.747200130000003, -95.980586180000003), (41.24000083, -96.009990070000001), (39.050005310000003, -95.669984990000003), (39.091113909999997, -94.415281210000003), (41.080398170000002, -85.129982339999998), (40.193759790000001, -85.386374959999998), (31.57873008, -84.155829920000002), (32.850383729999997, -83.630048059999993), (32.537457089999997, -82.918282719999993), (30.18971926, -82.63974675), (29.53800193, -81.223295739999998), (34.949428730000001, -81.932270549999998), 
(36.070006329999998, -79.800023440000004), (42.090129820000001, -76.808035520000004), (43.09482302, -79.036943399999998), (40.793723159999999, -77.860245200000008), (39.178731299999995, -78.166634770000002), (39.65317263, -78.762774089999994), (40.429998600000005, -79.999985390000006), (41.070398779999998, -81.519995969999997), (43.012864200000003, -83.687538090000004), (42.329960139999997, -83.080055790000003), (45.37375368, -84.955186810000001), (46.495261450000001, -84.345275720000004), (44.163620829999999, -93.999156740000004), (45.561209939999998, -94.162221720000005), (33.410375389999999, -91.061687460000002), (34.257920550000001, -88.703330120000004), (39.165657160000002, -86.52640873), (39.820009990000003, -89.650016519999994), (41.493396220000001, -90.53461369), (41.661086240000003, -91.52997929), (45.822460139999997, -88.064092650000006), (44.529980899999998, -88.000013879999997), (43.750829490000001, -87.714424070000007), (47.12729006, -88.580805299999994), (36.07731854, -75.704717860000002), (35.612876610000001, -77.366683599999988), (37.550019349999999, -77.449985999999996), (39.158086570000002, -75.524703000000002), (42.670016910000001, -73.819949179999995), (42.448195820000002, -73.259828330000005), (43.208071920000002, -71.538047120000002), (41.490398990000003, -71.31335799), (45.165988589999998, -67.242392010000003), (31.603741469999999, -94.655266560000001), (29.819974380000001, -95.339979290000002), (32.50001752, -93.770023440000003), (35.47004295, -97.518683510000002), (32.820023820000003, -96.840016930000004), (27.51595481, -97.855846400000004), (26.303186459999999, -98.159962199999995), (31.84556134, -102.3672248), (32.030718, -102.09749959999999), (32.31261293, -106.77780829999999), (37.975213029999999, -100.86408659999999), (46.906011579999998, -98.702978150000007), (43.549989029999999, -96.729997800000007), (42.101395279999998, -102.8701915), (41.790664899999996, -107.234292), (59.547307150000002, -139.72721830000003), (64.787995010000003, 
-141.19999659999999)]
UScities.set_index(['lat','lng'])['city'].loc[tour1].tolist()
['Mountain Village',
 'Hooper Bay',
 'Nome',
 'Selawik',
 'Nyac',
 'Aniak',
 'Togiak',
 'Pilot Point',
 'Denali Park',
 'Wasilla',
 'Montana',
 'Valdez',
 'Sitka',
 'Price',
 'Farmington',
 'Lander',
 'Butte',
 'Wallace',
 'Pendleton',
 'Albany',
 'Portland',
 'Astoria',
 'Olympia',
 'Grants Pass',
 'Arcata',
 'Ukiah',
 'Delano',
 'Bakersfield',
 'San Bernardino',
 'Oceanside',
 'National City',
 'Santa Maria',
 'St. George',
 'Mesa',
 'Pueblo',
 'Guymon',
 'Dalhart',
 'Dodge City',
 'Bartlesville',
 'Omaha',
 'Topeka',
 'Independence',
 'Fort Wayne',
 'Muncie',
 'Albany',
 'Macon',
 'Dublin',
 'Lake City',
 'Palm Coast',
 'Spartanburg',
 'Greensboro',
 'Elmira',
 'Niagara Falls',
 'State College',
 'Winchester',
 'Cumberland',
 'Pittsburgh',
 'Akron',
 'Flint',
 'Detroit',
 'Petoskey',
 'Sault Ste. Marie',
 'Mankato',
 'St. Cloud',
 'Greenville',
 'Tupelo',
 'Bloomington',
 'Springfield',
 'Rock Island',
 'Iowa City',
 'Iron Mountain',
 'Green Bay',
 'Sheboygan',
 'Hancock',
 'Kitty Hawk',
 'Greenville',
 'Richmond',
 'Dover',
 'Albany',
 'Pittsfield',
 'Concord',
 'Newport',
 'Calais',
 'Nacogdoches',
 'Houston',
 'Shreveport',
 'Oklahoma City',
 'Dallas',
 'Kingsville',
 'Edinburg',
 'Odessa',
 'Midland',
 'Las Cruces',
 'Garden City',
 'Jamestown',
 'Sioux Falls',
 'Alliance',
 'Rawlins',
 'Yakutat',
 'Eagle']
We have to take into account that adjusting the parameters leads to different results. For instance, running ACO several times as in the first example, where the parameters were not well adjusted, gives: 479, 484, 474, 458, 475..., while the second example, where the parameters are well adjusted, gives: 461, 457, 470, 472, 470... The results in the first case are more spread out (458-484), while in the second case they are more consistent (457-472).
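The comparison can be made a little more concrete by summarizing the two series. The sketch below uses only the five values listed for each setup (the full series are not shown), so the numbers are indicative at best; the function name summarize is ours.

```python
import statistics

first_run = [479, 484, 474, 458, 475]    # distances listed for the first setup
second_run = [461, 457, 470, 472, 470]   # distances listed for the second setup

def summarize(results):
    """Basic spread statistics for a list of tour distances."""
    return {
        "min": min(results),
        "max": max(results),
        "range": max(results) - min(results),
        "mean": statistics.mean(results),
        "stdev": statistics.stdev(results),   # sample standard deviation
    }

print("first setup :", summarize(first_run))
print("second setup:", summarize(second_run))
```

On these values the second setup has both a lower mean and a smaller range and standard deviation. Since ACO is stochastic, a larger number of runs would be needed before drawing firm conclusions.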
