Visualizing data prior to any analysis is a basic and important step. Univariate plots are those that take into account one variable; they include histograms, density plots and similar graphs.
Here are some examples using the airquality dataset, which is included with R in the datasets package.
A histogram is a very well-known univariate plot:
hist(airquality$Wind)
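Besides drawing the plot, hist() returns an object of class "histogram" that holds the breakpoints and counts it computed. The minimal sketch below uses plot = FALSE to inspect what the default call does without drawing anything. The variable name h is only illustrative.

```r
# Compute the histogram without drawing it
h <- hist(airquality$Wind, plot = FALSE)

h$breaks  # bin edges chosen by default (Sturges' rule, rounded via pretty())
h$counts  # number of observations falling in each bin

# Every observation lands in exactly one bin (Wind has no missing values)
sum(h$counts) == nrow(airquality)
```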
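To construct a histogram, the range of the data is divided into bins along the horizontal axis, so the number of bins and their spacing must be chosen. The number of bins can be set with the breaks argument. When breaks is a single number, R treats it as a suggestion and rounds the breakpoints to "pretty" values:
par(mfrow = c(1, 3))  # three plots side by side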
hist(airquality$Wind, xlab = "Wind", col = "red", main = "Histogram Wind (default breaks)")
hist(airquality$Wind, xlab = "Wind", col = "orange", main = "Histogram Wind (breaks = 20)", breaks = 20)
hist(airquality$Wind, xlab = "Wind", col = "gold", main = "Histogram Wind (breaks = 40)", breaks = 40)
par(mfrow = c(1, 1))  # restore the single-plot layout
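Because a single number in breaks is only a suggestion, the number of bins you get may differ from the number you ask for. The sketch below compares requested and actual bin counts, shows the default rule (nclass.Sturges), and passes an explicit vector of breakpoints for exact control. The width-1 grid from 0 to 21 is an arbitrary choice that happens to cover the range of Wind.

```r
# Number of bins actually produced for a few requested values
for (k in c(10, 20, 40)) {
  h <- hist(airquality$Wind, breaks = k, plot = FALSE)
  cat("requested:", k, " actual:", length(h$counts), "\n")
}

# Default rule used by hist(): Sturges' formula, ceiling(log2(n) + 1)
nclass.Sturges(airquality$Wind)

# Exact control: supply the breakpoints themselves (they must span the data)
h_exact <- hist(airquality$Wind, breaks = seq(0, 21, by = 1),
                xlab = "Wind", col = "gold", main = "Histogram Wind (width-1 bins)")
length(h_exact$counts)  # 21 bins, one per unit interval
```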
To set the axis limits, use the xlim and ylim arguments:
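par(mfrow = c(1, 2))
# Wider limits: the whole range of Wind is visible
hist(airquality$Wind, xlab = "Wind", col = "red", main = "Hist. Wind (bigger axis)", breaks = 20, xlim = c(0, 30), ylim = c(0, 30))
# Narrower limits: values above 15 fall outside the plotting region
hist(airquality$Wind, xlab = "Wind", col = "orange", main = "Hist. Wind (smaller axis)", breaks = 20, xlim = c(0, 15), ylim = c(0, 15))
par(mfrow = c(1, 1))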
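Note that xlim and ylim only change the visible window; they do not change the bins. The quick check below counts how many Wind values lie beyond 15 mph and are therefore hidden by xlim = c(0, 15). It also finds the tallest bar, to compare against ylim = c(0, 15).

```r
# Observations outside the narrower x-axis window c(0, 15)
hidden <- airquality$Wind[airquality$Wind > 15]
length(hidden)          # how many values are not shown
range(airquality$Wind)  # full range of the data, for comparison

# Tallest bar with breaks = 20, to compare against the y-axis limit
max(hist(airquality$Wind, breaks = 20, plot = FALSE)$counts)
```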
Because histograms group values into bins, the number and position of the bins can obscure features of the data (for example, by merging or splitting peaks). A histogram alone can therefore be a poor guide to the shape of a distribution.
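One way to see this sensitivity is to keep the bin width fixed and only shift where the bins start. The minimal sketch below uses 2-mph bins starting at 0 and at 1. Both grids are arbitrary choices that cover the data.

```r
par(mfrow = c(1, 2))
# Same bin width (2 mph), different starting points
h0 <- hist(airquality$Wind, breaks = seq(0, 22, by = 2),
           xlab = "Wind", col = "red", main = "Bins start at 0")
h1 <- hist(airquality$Wind, breaks = seq(1, 23, by = 2),
           xlab = "Wind", col = "orange", main = "Bins start at 1")
par(mfrow = c(1, 1))

# Same data and same total count, but the bar heights can differ
rbind(start0 = h0$counts, start1 = h1$counts)
```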
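Histogram with a fitted normal curve (freq = FALSE draws the bars on the density scale, so they are comparable with the curve given by dnorm()):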
hist(airquality$Wind, xlab = "Wind", col = "red1", main=" Histogram Wind", breaks= 20, freq = FALSE, border = "pink", col.main = "red1")
curve(dnorm(x, mean=mean(airquality$Wind), sd=sd(airquality$Wind)), add=TRUE, col= "purple", lty = 3, lwd = 3)
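With freq = FALSE, the bar heights are densities, so the bar areas add up to 1, just like the area under the normal curve. The short sketch below checks this property using the density and breaks components returned by hist().

```r
h <- hist(airquality$Wind, breaks = 20, plot = FALSE)
# Density-scale bar heights times bin widths give the bar areas
total_area <- sum(h$density * diff(h$breaks))
total_area  # 1, up to floating-point error
```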
DENSITY PLOT:
Kernel density plots reduce this problem, because they do not depend on bin edges (although the result still depends on the chosen bandwidth):
plot(density(airquality$Wind))
plot(density(airquality$Wind), main = "Density plot", col.main= "red1")
polygon(density(airquality$Wind), col="red1", border="purple", lty = 1, lwd = 3)
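A kernel density estimate has its own smoothing choice, the bandwidth. By default density() uses bw = "nrd0" (Silverman's rule of thumb), and the adjust argument multiplies that bandwidth. The minimal sketch below overlays the default estimate with a narrower and a wider one. The factors 0.5 and 2 are arbitrary.

```r
d      <- density(airquality$Wind)                # default bandwidth
d_half <- density(airquality$Wind, adjust = 0.5)  # half bandwidth: bumpier
d_wide <- density(airquality$Wind, adjust = 2)    # double bandwidth: smoother
d$bw                                              # bandwidth actually used

plot(d, main = "Density plot: effect of bandwidth",
     ylim = range(0, d$y, d_half$y, d_wide$y))
lines(d_half, col = "red1", lty = 2)
lines(d_wide, col = "purple", lty = 3)
legend("topright", legend = c("default", "adjust = 0.5", "adjust = 2"),
       col = c("black", "red1", "purple"), lty = 1:3)

# The estimate integrates to approximately 1 over its evaluation grid
area <- sum(d$y) * diff(d$x[1:2])
area
```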
SORTED PLOT:
We can also plot every observation individually after sorting the values in increasing order. This shows how the data are distributed and makes potential outliers at either end easy to spot.
plot(sort(airquality$Wind), main = "Sorted plot" , ylab = "Sorted Wind", pch = 3)
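To make the outlier check less subjective, the sorted plot can be paired with an explicit rule. The minimal sketch below uses boxplot.stats(), which flags values lying more than 1.5 times the box length beyond the hinges (the same rule boxplot() uses), and highlights those values on the sorted plot. Choosing this particular rule is our assumption and is not part of the sorted plot itself.

```r
wind_sorted <- sort(airquality$Wind)
# Values flagged by the boxplot rule (coef = 1.5 by default)
out <- boxplot.stats(wind_sorted)$out
out

plot(wind_sorted, main = "Sorted plot", ylab = "Sorted Wind", pch = 3)
is_out <- wind_sorted %in% out
points(which(is_out), wind_sorted[is_out], col = "red1", pch = 19)  # highlight flagged values
```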