To estimate the parameters of a linear model, we use matrix algebra.
The data we are working with can be represented in tabular form, with one row per observation. For example, with three predictors:
y1 x11 x12 x13
y2 x21 x22 x23
y3 x31 x32 x33
... ... ... ...
yn xn1 xn2 xn3
Each row represents one of the n observations in our data; y is
the response and x1, x2, x3 are the predictors.
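As a minimal sketch (assuming NumPy, with made-up numbers and only n = 4 rows), the table above maps directly onto a 2-D array: one row per observation, the first column holding the response and the remaining three the predictors.

```python
import numpy as np

# Hypothetical data: n = 4 observations; columns are y, x1, x2, x3
data = np.array([
    [3.1, 1.0, 2.0, 0.5],
    [4.0, 2.0, 1.5, 1.0],
    [5.2, 3.0, 1.0, 1.5],
    [6.1, 4.0, 0.5, 2.0],
])

y = data[:, 0]        # response, one value per observation
preds = data[:, 1:]   # predictors x1, x2, x3

n = data.shape[0]     # number of observations (rows)
print(n, y.shape, preds.shape)  # 4 (4,) (4, 3)
```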
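We put this data into a matrix representation:

$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix}
1 & x_{11} & x_{12} & x_{13} \\
1 & x_{21} & x_{22} & x_{23} \\
\vdots & \vdots & \vdots & \vdots \\
1 & x_{n1} & x_{n2} & x_{n3}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
$$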
In matrix form, the model is y = Xβ + ε: it divides the
response into two components, Xβ (the systematic component)
and ε (the random component). Note that the column of 1s in X
represents the intercept term.
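A small simulation makes the decomposition concrete. This is a sketch assuming NumPy, hypothetical sizes (50 observations, 3 predictors) and made-up "true" coefficients: it prepends a column of 1s to the predictors to form X, then builds the response as the systematic part Xβ plus a random part ε.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 50, 3                              # observations, predictors (hypothetical sizes)
predictors = rng.normal(size=(n, p))

# Design matrix: a leading column of 1s for the intercept, then the predictors
X = np.column_stack([np.ones(n), predictors])

beta = np.array([2.0, 0.5, -1.0, 3.0])    # assumed "true" coefficients (beta_0 is the intercept)
eps = rng.normal(scale=0.1, size=n)       # random component

systematic = X @ beta                     # X beta, the systematic component
y = systematic + eps                      # response = systematic + random
```

Because the first column of X is all 1s, the first entry of β is added unchanged to every observation, which is exactly the role of the intercept.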
The design matrix (or model matrix), denoted by X, is the matrix built
from the values of the explanatory variables. Each row represents one
observation in the dataset, and each column holds the values of one
explanatory variable across all observations (the first column, of 1s,
corresponds to the intercept).
The design matrix contains the data on the explanatory (independent)
variables, which are used to explain the observed values of the response
(dependent) variable. The theory of linear models relies heavily on matrix
operations involving the design matrix.
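A standard example of such matrix operations is the ordinary least squares estimate, which solves the normal equations (XᵀX)β̂ = Xᵀy, giving β̂ = (XᵀX)⁻¹Xᵀy when XᵀX is invertible. The sketch below assumes NumPy and simulated (hypothetical) data; it solves the linear system with `np.linalg.solve` instead of forming the inverse explicitly, and cross-checks the result against `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 100
x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 5, size=n)
X = np.column_stack([np.ones(n), x1, x2])        # intercept, x1, x2

beta_true = np.array([1.0, 2.0, -0.5])           # hypothetical coefficients
y = X @ beta_true + rng.normal(scale=0.2, size=n)

# Solve the normal equations (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same estimate via a least-squares solver (numerically more robust)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ beta_hat        # fitted values y_hat = X beta_hat
residuals = y - fitted       # residuals are orthogonal to the columns of X

print(beta_hat)
```

Solving the system rather than inverting XᵀX is the usual choice because it is cheaper and less sensitive to rounding error.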
Example: simple linear model with 5 observations:
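For a simple linear model y_i = β0 + β1 x_i + ε_i, the design matrix has one row per observation and two columns: the column of 1s and the single predictor. A hypothetical sketch, assuming NumPy and made-up x values:

```python
import numpy as np

# Hypothetical predictor values for 5 observations (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Simple linear model y_i = b0 + b1 * x_i + e_i:
# one column of 1s (intercept) and one column for x
X = np.column_stack([np.ones_like(x), x])
print(X.shape)  # (5, 2)
```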
Example: multiple linear model with 5 observations and 6 predictors:
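With 6 predictors plus the intercept, the design matrix for 5 observations is 5 × 7. A hypothetical sketch (NumPy assumed, random values) also shows a consequence worth keeping in mind: X has only 5 rows, so its rank is at most 5, and since rank(XᵀX) = rank(X), the 7 × 7 matrix XᵀX is singular and the least squares coefficients are not uniquely determined.

```python
import numpy as np

rng = np.random.default_rng(2)

n, p = 5, 6                                    # 5 observations, 6 predictors
predictors = rng.normal(size=(n, p))           # hypothetical values
X = np.column_stack([np.ones(n), predictors])  # intercept + 6 predictors

print(X.shape)                   # (5, 7)
print(np.linalg.matrix_rank(X))  # at most 5, since X has only 5 rows
```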