In this hands-on session, we will review the concept of projections onto vector subspaces. Specifically, we aim to find the best approximate solution of a system of linear equations \(Ax = b\) that has no exact solution, derive and apply the least squares formula in R, and use it, together with the tidyverse, to fit a regression line on the gapminder dataset.
This hands-on session comprises some challenging concepts. We believe that all students at your level are able to grasp them with time and dedication, but it is also to be expected if you don't finish this document or don't understand everything in the classroom. You will keep this document to study after the class, and we will be available to answer any questions.
We want to find the best approximation to the solution of a system of linear equations \(Ax = b\) that has no exact solution in the traditional sense. We can approach this problem from two different perspectives, which Gilbert Strang refers to as the "row and column pictures" of a system of equations.
Let us illustrate the problem with the following system of equations:
\[ \begin{cases} x=6\\x+y=0\\x+2y=0 \end{cases} \]
Which can be represented in matrix form as:
\[ Ax = b \]
\[ \begin{bmatrix}1&0 \\1&1 \\1&2 \\\end{bmatrix} \begin{bmatrix}x \\y \\\end{bmatrix} = \begin{bmatrix}6 \\0 \\0 \\\end{bmatrix} \]
Where \(A\) is the coefficient matrix, \(x\) is the vector that contains the unknowns and \(b\) is the vector that contains the constants.
Geometrically, the row picture tells us that a system of linear equations has no solution when the lines specified by the equations do not all meet in a single point.
This is typically the case in systems of linear equations with more equations than unknowns. We can think of these as systems with "tall and thin" matrices (more rows than columns).
The term "row picture" reflects the fact that each row of the system represents an equation, and each linear equation in the plane typically has not just one solution but a whole straight line of them.
Understanding \(Ax = b\) using the column picture is the ultimate goal of linear algebra. We do not think of equations and unknowns anymore. Instead, we represent the equations as a matrix-vector multiplication:
\[ Ax = b \]
We can think of the vector \(b\) as a linear combination of the column vectors of \(A\). The scalars of this linear combination are the elements of the vector \(x\).
Considering our system of linear equations we have:
\[ \begin{bmatrix}1&0 \\1&1 \\1&2 \\\end{bmatrix} \begin{bmatrix}x_1 \\x_2 \\\end{bmatrix} = \begin{bmatrix}6 \\0 \\0 \\\end{bmatrix} \]
\[ x_1 \begin{bmatrix}1 \\1 \\1 \\\end{bmatrix} + x_2 \begin{bmatrix}0 \\1 \\2 \\\end{bmatrix} = \begin{bmatrix}6 \\0 \\0 \\\end{bmatrix} \] *Remember that the column space of \(A\) is the vector space generated by the columns of \(A\) considered as vectors.
Thus, \(Ax = b\) has no exact solution whenever \(b\) is not contained in \(C(A)\). In other words, there are no scalars \(x_1\) and \(x_2\) that specify a linear combination of the columns \(a_1\) and \(a_2\) that produces \(b\).
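To see the column picture in action, here is a small sketch in R (the vector \(x\) below is an arbitrary choice made for illustration): multiplying \(A\) by \(x\) gives exactly the linear combination \(x_1 a_1 + x_2 a_2\) of its columns.
# Coefficient matrix A; its columns a1 and a2 span C(A)
A <- matrix(c(1, 1, 1, 0, 1, 2), nrow = 3, ncol = 2, byrow = FALSE)
x <- c(2, -1)                    # arbitrary scalars x1 and x2
A %*% x                          # matrix-vector product
x[1] * A[, 1] + x[2] * A[, 2]    # the same vector, written as a combination of the columns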
Exercise
To geometrically visualize this, enter the shiny app associated with this tutorial by clicking the following link or copying it into your browser:
https://massonix.shinyapps.io/projections_onto_subspaces/
Then, in the left hand side of the app enter the matrix \(A\) and the vector \(b\). The app will render a visualization of \(C(A)\) as a plane in \(\mathbb{R}^3\), and \(b\) as a vector in \(\mathbb{R}^3\), which need not be contained in the plane.
As we have seen above, in our system of linear equations there is in general no vector \(x\) that provides an exact solution of \(Ax = b\). However, we can provide "the best solution we can" by finding \(\widehat{x}\) such that the vector \(p = \widehat{x}_1 a_1 + \cdots + \widehat{x}_n a_n\) is as close as possible to the vector \(b\).
To make the "as close as possible" more precise, we will consider the closest vector to \(b\) that belongs to the column space \(C(A)\) by moving perpendicularly with respect to \(C(A)\), i.e., \(p=p(b)\) will be the vector obtained with an orthogonal projection of \(b\) onto \(C(A)\). You can visualize this in the shiny app.
As \(p(b)\) is in \(C(A)\), we can find a vector \(\widehat{x}\) that solves the new system:
\[ A\widehat{x} = p(b) \]
We will first find \(\widehat{x}\), and then we will calculate \(p(b)\), that is the projection of \(b\) onto the column space \(C(A)\).
We can also calculate the projection matrix \(P\) that projects any vector onto the column space \(C(A)\).
We have seen that, if \(p(b)\) is the projection of \(b\) onto the column space \(C(A)\), then \(p(b)-b\) belongs to the orthogonal complement of \(C(A)\), which is the same as the null space of the transpose of \(A\), i.e., \(N(A^t)\) (see Chapter 5 of the course notes).
\[ p(b) - b = A\widehat{x}-b \hspace{0.1cm} \textrm{ belongs to } C(A)^\perp = N(A^t) \] means that:
\[ A^t(A\widehat{x}-b)=\vec{0} \]
Let us distribute \(A^t\):
\[ A^t A\widehat{x}-A^tb = 0 \hspace{0.1cm} \Longrightarrow \hspace{0.1cm} A^t A\widehat{x} = A^tb \]
We know that the symmetric matrix \(A^tA\) is invertible if the columns of \(A\) are linearly independent. If that's the case, we can multiply both sides by \((A^tA)^{-1}\) and isolate \(\widehat{x}\):
\[ (A^t A)^{-1}( A^t A)\widehat{x} = (A^t A)^{-1} A^tb \] \[ \textrm{Id }\widehat{x} = (A^t A)^{-1} A^tb \]
\[ \widehat{x} = (A^t A)^{-1} A^tb \]
Let us apply this formula to the example above. To make computations easier, we will use the operations in R that we learned in the previous hands-on session. First, we compute \(A^tA\):
A <- matrix(c(1, 1, 1, 0, 1, 2), nrow = 3, ncol = 2, byrow = FALSE)
AT_A <- t(A) %*% A
AT_A
## [,1] [,2]
## [1,] 3 3
## [2,] 3 5
\[ A^tA = \begin{bmatrix}1&1&1 \\0&1&2 \\\end{bmatrix} \begin{bmatrix}1&0 \\1&1 \\1&2 \\\end{bmatrix} = \begin{bmatrix}3&3 \\3&5 \\\end{bmatrix} \]
Now we find the inverse \((A^t A)^{-1}\):
library(matlib)
library(MASS)
AT_A_inv <- inv(AT_A)
AT_A_inv <- fractions(AT_A_inv)
AT_A_inv
## [,1] [,2]
## [1,] 5/6 -1/2
## [2,] -1/2 1/2
\[ (A^t A)^{-1} = \begin{bmatrix} 5/6 & -1/2\\ -1/2 & 1/2 \end{bmatrix} \]
Finally, we can solve for \(\widehat{x} = (A^t A)^{-1} A^tb\):
b <- matrix(c(6, 0, 0), nrow = 3)
x_hat <- AT_A_inv %*% t(A) %*% b
x_hat
## [,1]
## [1,] 5
## [2,] -3
\[ \widehat{x} = (A^t A)^{-1} A^tb = \begin{bmatrix} 5/6 & -1/2\\ -1/2 & 1/2 \end{bmatrix} \begin{bmatrix}1&1&1 \\0&1&2 \\\end{bmatrix} \begin{bmatrix}6 \\0 \\0 \\\end{bmatrix} = \begin{bmatrix} 5 \\ -3 \end{bmatrix} \]
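As a side note (an alternative not used in this document), for larger problems it is preferable to solve the normal equations \(A^tA\widehat{x} = A^tb\) directly rather than forming the inverse explicitly; base R's solve() does exactly that:
# Solve A^t A x_hat = A^t b without computing (A^t A)^{-1}
x_hat_alt <- solve(t(A) %*% A, t(A) %*% b)
x_hat_alt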
Let us find \(p(b)\), which is the projection of \(b\) onto \(C(A)\):
\[ p(b) = A \widehat{x} = A (A^t A)^{-1} A^tb \]
p <- A %*% x_hat
p
## [,1]
## [1,] 5
## [2,] 2
## [3,] -1
\[ p(b) = \begin{bmatrix}1&0 \\1&1 \\1&2 \\\end{bmatrix} \begin{bmatrix}5 \\-3 \\\end{bmatrix} = \begin{bmatrix}5 \\2 \\-1 \\\end{bmatrix} \]
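As a quick numerical check of the derivation above, the error vector \(p(b) - b\) should belong to \(N(A^t)\), i.e. \(A^t(p - b) = \vec{0}\):
# The residual p - b is orthogonal to both columns of A
t(A) %*% (p - b)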
Now we can pursue one more step to find the matrix \(P\) encoding the orthogonal projection onto the column space \(C(A)\), which describes a linear transformation that takes any vector and projects it onto \(C(A)\).
If we look at what happened with \(b\), it is easy to spot the matrix that did the job: \(P = A (A^t A)^{-1} A^t\).
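Here is a short sketch (using the \(A\), \(b\) and \(p\) computed above, with base R's solve() instead of matlib::inv()) that builds \(P\) explicitly, confirms that it sends \(b\) to \(p\), and numerically checks the two properties listed next:
# Projection matrix onto C(A)
P <- A %*% solve(t(A) %*% A) %*% t(A)
P %*% b               # same as p = (5, 2, -1)
all.equal(t(P), P)    # the transpose of P equals P
all.equal(P %*% P, P) # applying P twice equals applying it once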
Projection matrices have two interesting properties:
\[ P^t = P \]
\[ P^2 = P \]
Can you guess why?
Hint: write \(P = A(A^tA)^{-1}A^t\) and compute \(P^t\) and \(P^2\) directly from this expression.
In this second part of the session, we move our focus from matrices to dataframes. Remember that dataframes are two-dimensional (tabular) and heterogeneous data structures (they can handle numerics, characters and logicals together). Intuitively, you can think of them as spreadsheets.
Here, we will introduce the tidyverse. As defined by the authors: "The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures".
You can learn more about the tidyverse in this video and this link. There is an entire book on how to use this ecosystem, which at its core aims to simplify the data analysis life cycle:
(Image obtained from this link)
In particular, we will focus on two crucial steps, transform and visualize, which are accomplished with the dplyr and ggplot2 packages, respectively.
To introduce these packages, we will work with the iconic gapminder dataset, which contains information on population size, per-capita gross domestic product (GDP) and life expectancy for each country across several years.
Let us start by loading it into memory:
library(tidyverse)
gapminder <- as.data.frame(gapminder::gapminder)
head(gapminder)
## country continent year lifeExp pop gdpPercap
## 1 Afghanistan Asia 1952 28.801 8425333 779.4453
## 2 Afghanistan Asia 1957 30.332 9240934 820.8530
## 3 Afghanistan Asia 1962 31.997 10267083 853.1007
## 4 Afghanistan Asia 1967 34.020 11537966 836.1971
## 5 Afghanistan Asia 1972 36.088 13079460 739.9811
## 6 Afghanistan Asia 1977 38.438 14880372 786.1134
tail(gapminder)
## country continent year lifeExp pop gdpPercap
## 1699 Zimbabwe Africa 1982 60.363 7636524 788.8550
## 1700 Zimbabwe Africa 1987 62.351 9216418 706.1573
## 1701 Zimbabwe Africa 1992 60.377 10704340 693.4208
## 1702 Zimbabwe Africa 1997 46.809 11404948 792.4500
## 1703 Zimbabwe Africa 2002 39.989 11926563 672.0386
## 1704 Zimbabwe Africa 2007 43.487 12311143 469.7093
dplyr is centered around five verbs:

filter(): get rows that fulfill a specific condition.
select(): select specific columns.
arrange(): sort rows in a certain order.
mutate(): create new variables from existing ones.
summarise(): collapse many values down to a single one.

All these functions can be used together with the group_by() function to apply the verbs grouped by a categorical variable.

In addition, dplyr aims to maximize code readability. To this end, it uses the pipe operator (%>%) from the magrittr package to create so-called pipelines.
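As a minimal illustration (a made-up snippet, not from the original text), x %>% f(y) is just another way of writing f(x, y), so a pipeline reads left to right instead of inside out:
# The pipe passes its left-hand side as the first argument of the call on its right
identical(
  gapminder %>% filter(year == 2002),
  filter(gapminder, year == 2002)
)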
(Image obtained from this link)
You can learn more about dplyr in this link.
Example pipeline:
"Starting with the gapminder dataset, filter all rows from 2002, and then select the continent and population. With that, group the rows by continent and summarise the dataset to the mean population for each continent. Finally, arrange the results in descending order".
pop_by_continent <- gapminder %>%
filter(year == 2002) %>%
select(continent, pop) %>%
group_by(continent) %>%
summarise(mean_pop = mean(pop)) %>%
arrange(desc(mean_pop))
pop_by_continent
## # A tibble: 5 × 2
## continent mean_pop
## <fct> <dbl>
## 1 Asia 109145521.
## 2 Americas 33990910.
## 3 Europe 19274129.
## 4 Africa 16033152.
## 5 Oceania 11727414.
As we can see, dplyr makes it easy to translate a thought into clear code.
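For contrast, here is a base R sketch (not part of the original text) that computes the same summary without dplyr; it works, but the intent is harder to read at a glance:
# Base R equivalent of the dplyr pipeline above
gap_2002 <- gapminder[gapminder$year == 2002, c("continent", "pop")]
means <- aggregate(pop ~ continent, data = gap_2002, FUN = mean)
means[order(means$pop, decreasing = TRUE), ]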
The input to ggplot2 is a tidy dataset, which is defined by the following properties: each variable forms a column, each observation forms a row, and each value forms its own cell.
The underlying philosophy of ggplot2 is the grammar of graphics. This grammar entails that plots are formed with several components, whose relationships are structured in a layered hierarchy.
This hierarchy is better visualized as a pyramid:
(Image obtained from this link)
The important concept is the layered hierarchy, which means that a plot is built incrementally: each component is added as a layer on top of the previous ones.
The main layers, in order, are the following: data, aesthetics, geometries, facets, statistics, coordinates and theme.
As an example, using the same dataset as in the previous example, we can plot the relationship between life expectancy and per-capita GDP for each country using the geom_point() and geom_smooth() functions:
gapminder_2002 <- gapminder %>%
filter(year == 2002)
gapminder_2002 %>%
ggplot(aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "black") +
scale_x_log10() +
labs(x = "Income", y = "Life Expectancy") +
theme_classic()
Exercise:
Use the tools that we have seen until now to find, for the year 2002, the least squares regression line that predicts life expectancy from the (log-transformed) per-capita GDP, and to recreate the previous plot with that line drawn on top.
Hereafter, we will develop the reasoning and solution for this exercise. You are all highly encouraged to try the problem on your own and then compare the results. Otherwise, you can follow our explanation, and try to code each chunk by yourself before looking at ours.
Solution
First, we will inspect the structure of the dataframe. Then, we will subset it to keep only the variables of interest (country, life expectancy and income) and visualize the first 3 rows with the head() function:
# Visualize the structure of the dataframe
str(gapminder::gapminder)
## tibble [1,704 Ć 6] (S3: tbl_df/tbl/data.frame)
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num [1:1704] 28.8 30.3 32 34 36.1 ...
## $ pop : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
# Character subsetting
gapminder_sub <- gapminder_2002[, c("country", "lifeExp", "gdpPercap")]
head(gapminder_sub, 3)
## country lifeExp gdpPercap
## 1 Afghanistan 42.129 726.7341
## 2 Albania 75.651 4604.2117
## 3 Algeria 70.994 5288.0404
Note that gdpPercap in the previous scatter plot is on a logarithmic scale. Let us log-transform it:
gapminder_sub$log10_gdpPercap <- log10(gapminder_sub$gdpPercap)
head(gapminder_sub, 3)
## country lifeExp gdpPercap log10_gdpPercap
## 1 Afghanistan 42.129 726.7341 2.861376
## 2 Albania 75.651 4604.2117 3.663155
## 3 Algeria 70.994 5288.0404 3.723295
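Equivalently, the new column could be added with the mutate() verb introduced earlier (an alternative to the base R assignment above, producing the same result):
# dplyr equivalent of the base R column assignment
gapminder_sub <- gapminder_sub %>%
  mutate(log10_gdpPercap = log10(gdpPercap))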
Our purpose is to fit a line in the form of
\[ \textrm{lifeExp} = m \cdot \log_{10}(\textrm{gdpPercap}) + n, \]
where lifeExp (life expectancy) is the dependent (or response) variable, gdpPercap (per-capita Gross domestic product) is the independent (or explanatory) variable, \(m\) is the slope and \(n\) is the \(y\)-intercept.
Hence, if we represent each of the rows of the previous head() output in this form, we would get the following toy system of linear equations:
\[ \begin{cases} 2.86m+n=42.13\\3.66m+n=75.65\\3.72m+n=70.99 \end{cases} \]
which, from the column perspective, we would express as \(Ax = b\):
\[ \begin{bmatrix}2.86&1 \\3.66&1 \\3.72&1 \\\end{bmatrix} \begin{bmatrix}m \\n \\\end{bmatrix} = \begin{bmatrix}42.13 \\75.65 \\70.99 \\\end{bmatrix} \]
In our real problem, we have \(142\) equations for only two unknowns (slope and \(y\)-intercept). Thus, the coefficient matrix is \(142 \times 2\). As you will have guessed, to find the least squares regression line we will need to use orthogonal projections to find \(m\) and \(n\).
Challenge:
To get a sense of how \(b\), \(p\) and \(p-b\) translate in the context of a least squares problem try the shiny app.
To solve for \(m\) and \(n\), we will first create the matrix \(A\) and the vector \(b\). Note that the first column of \(A\) contains the log-transformed gdpPercap values, the second column is all ones, and the vector \(b\) contains the lifeExp values:
num_rows <- nrow(gapminder_sub)
A <- matrix(
c(gapminder_sub$log10_gdpPercap, rep(1, num_rows)),
nrow = num_rows,
ncol = 2,
byrow = FALSE
)
head(A)
## [,1] [,2]
## [1,] 2.861376 1
## [2,] 3.663155 1
## [3,] 3.723295 1
## [4,] 3.442995 1
## [5,] 3.944366 1
## [6,] 4.486965 1
b <- as.matrix(gapminder_sub$lifeExp)
head(b)
## [,1]
## [1,] 42.129
## [2,] 75.651
## [3,] 70.994
## [4,] 41.003
## [5,] 74.340
## [6,] 80.370
We can solve the problem with the following equation (seen above):
\[ \widehat{x} = (A^t A)^{-1} A^tb \]
AT_A <- t(A) %*% A
AT_A_inv <- inv(AT_A)
x_hat <- AT_A_inv %*% t(A) %*% b
m <- x_hat[1, 1]
n <- x_hat[2, 1]
out <- str_c("The slope is ", round(m, 4), " and the y-intercept is ", round(n, 4))
print(out)
## [1] "The slope is 17.5394 and the y-intercept is 1.2807"
Finally, we can recreate the plot. This time, you should manually provide the \(m\) and \(n\) that you calculated yourselves.
gapminder_2002 %>%
ggplot(aes(gdpPercap, lifeExp, color = continent, size = pop)) +
geom_point() +
geom_abline(slope = m, intercept = n, color = "black") +
scale_x_log10() +
labs(x = "Income", y = "Life Expectancy") +
theme_classic()
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=es_ES.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=es_ES.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Europe/Madrid
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] lubridate_1.9.3 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.3
## [5] purrr_1.0.2 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1
## [9] tidyverse_2.0.0 MASS_7.3-60 matlib_0.9.6 ggplot2_3.4.4
## [13] BiocStyle_2.28.1
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.7 utf8_1.2.3 generics_0.1.3
## [4] lattice_0.21-9 stringi_1.7.12 hms_1.1.3
## [7] digest_0.6.33 magrittr_2.0.3 rgl_1.2.1
## [10] timechange_0.2.0 evaluate_0.22 grid_4.3.1
## [13] bookdown_0.35 fastmap_1.1.1 Matrix_1.6-1.1
## [16] jsonlite_1.8.7 BiocManager_1.30.22 mgcv_1.9-0
## [19] fansi_1.0.5 scales_1.2.1 jquerylib_0.1.4
## [22] abind_1.4-5 cli_3.6.1 rlang_1.1.1
## [25] splines_4.3.1 munsell_0.5.0 base64enc_0.1-3
## [28] withr_2.5.1 cachem_1.0.8 yaml_2.3.7
## [31] tools_4.3.1 tzdb_0.4.0 colorspace_2.1-0
## [34] vctrs_0.6.4 R6_2.5.1 lifecycle_1.0.3
## [37] car_3.1-2 htmlwidgets_1.6.2 pkgconfig_2.0.3
## [40] pillar_1.9.0 bslib_0.5.1 gtable_0.3.4
## [43] glue_1.6.2 xfun_0.40 tidyselect_1.2.0
## [46] rstudioapi_0.15.0 knitr_1.44 farver_2.1.1
## [49] xtable_1.8-4 nlme_3.1-163 gapminder_1.0.0
## [52] htmltools_0.5.6.1 labeling_0.4.3 rmarkdown_2.25
## [55] carData_3.0-5 compiler_4.3.1