1 Introduction and objectives

In this hands-on session, we will show how principal component analysis (PCA) is used in common bioinformatics workflows. We will use a toy single-cell RNA-seq (scRNA-seq) dataset and a real NCI microarray data to illustrate it.

The main objectives are the following:

Understand the intuition behind PCA.
Learn to compute PCA in 2 ways in R: by eigendecomposition of the covariance matrix and using the prcomp function.
Perform a case-study using scRNA-seq data.

2 Introduction to PCA

Analyzing tabular datasets, specially high-dimensional data, involves identifying which features carry more information and whether some features can be recovered from other features. Principal component analysis (PCA) is a dimensionality reduction method that transforms high-dimensions data into lower-dimensions while retaining as much information as possible.

We will work tabular data with numerical entries, using an $n x m$ matrix to represent feature values (columns) for experimental samples (rows).

PCA offers several advantages when analyzing biological data:

Reduce the noise of the dataset: by finding the orthogonal axis that maximize the variance, PCA eliminates noise that may have been introduced during the experiments.
Reduce the redundancy: as we will see below, when analyzing gene expression data, there is a vast degree of redundancy across gene sets. Thus, principal components can be interpreted as a “metagene”: a score that summarizes information from a correlated set of genes.
Reduce computational complexity: since we reduce the dimensionality, operations downstream such as clustering will be executed faster.

3 Case-study scRNA-seq data

3.1 Introduction to scRNA-seq

Single-cell RNA-seq allows the gene expression profiling of thousands of cells, one a time. It is often exemplified using the following analogy:

Bulk RNA-seq would be analogous to a fruit smoothie, in which all we get is an average taste out of several fruits. Analogously, in bulk RNA-seq all we get is an average transcriptome across a population of cells.
scRNA-seq, on the other hand, is analogous to a fruit salad, in which you can taste one fruit at a time. Here, we get a transcriptome for each single cell, so we can understand cell-to-cell variability in gene expression.

3.2 Why use PCA to analyze scRNA-seq data

scRNA-seq is a double-edged sword. On the one hand, scRNA-seq provides unprecedented discriminative power to chart all the cell types and states in a given tissue. This has catalyzed the creation of the Human Cell Atlas (HCA), which aims to classify all cells in the human body using scRNA-seq; and has been coined as “the periodic table of human cells”.

On the other hand, however, scRNA-seq has a high degree of technical variability, which can be introduced at multiple points of the process. Some examples include the following (Definitions extracted from table 1 of this paper):

Dissociation artifacts: enzymatic treatment is frequently used to dissociate tissues into single-cells. This can induce cellular stress that can be confounded by the variable of interest. Described in detail in this paper.
Doublets: Two cells that are processed together in a reaction volume and receive the same single-cell barcode.
Dropouts: Transcripts that are not detected in the final dataset even though the gene is expressed in the cell, leading to false zero values in the expression matrix.

3.3 PCA with scRNA-seq data

As an example, we will create an expression matrix with the following cells:

4 T-cells (identified by high expression of CD3D and CD3E).
3 monocytes (identified by high expression of LYZ and S100A8).
3 naive B-cells (identified by high expression of CD79A, CD79B and BLNK).
2 plasma cells (identified by B-cell and proliferation markers, such as TOP2A or MKI67).
2 poor-quality cells (identified by high mitochondrial expression). If a cell has pores in the membrane due to cell lysis, the cytosolic mRNA will leak out of the cell; but if the diameter of mitochondria is higher than the pores, they will get trapped inside the cell.

In the following image you can visualize the redundancy between CD79A and CD79B. Together, they form the heterodimer CD79; which means that for every molecule of CD79A we need one of CD79B. Under the hood they are the same variable, which should be captured by a single principal component.

Image obtained from this link

3.4 Create and visualize toy dataset

Load packages

library(pheatmap)
library(tidyverse)

We will create a toy dataset that contains the gene expression of each of the cells. Cells are represented as rows and genes as columns.

# Create toy dataset
toy <- data.frame(
  CD3D = c(4, 5, 4, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  CD3E = c(5, 3, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  LYZ = c(0, 0, 0, 0, 5, 7, 6, 0, 0, 0, 0, 0, 0, 0),
  S100A8 = c(0, 0, 0, 0, 5, 7, 6, 0, 0, 0, 0, 0, 0, 0),
  CD79A = c(0, 0, 0, 0, 0, 0, 0, 6, 4, 4, 3, 5, 0, 0),
  CD79B = c(0, 0, 0, 0, 0, 0, 0, 5, 5, 3, 4, 6, 0, 0),
  BLNK = c(0, 0, 0, 0, 0, 0, 0, 6, 4, 5, 5, 4, 0, 0),
  TOP2A = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 9, 0, 0),
  MKI67 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 10, 0, 0),
  MT_CO1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 23),
  MT_ND5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 25) 
)
rownames(toy) <- paste("cell", as.character(1:nrow(toy)))
toy <- as.matrix(toy)


# Visualize
pheatmap(toy, cluster_cols = FALSE, cluster_rows = FALSE, angle_col = 45)

3.5 PCA in R: 2 ways

Two ways to perform PCA in R:

Eigendecomposition of the covariance matrix.
prcomp function.

Regardless of the method, we always need to normalize the data (center and scale). We will do it by using the function scale.

The function scale takes as arguments the matrix and two parameters: center and scale, that can be either a logical value or a numeric-alike vector of length equal to the number of columns of the matrix.

center: if center is a numeric-alike vector with length equal to the number of columns of the matrix, then the output will have the result of substracting these values to each of the values in the corresponding columns. If center is TRUE then centering is done by subtracting the column means (omitting NAs) of the matrix from their corresponding columns, and if center is FALSE, no centering is done.
scale: if scale is a numeric-alike vector with length equal to the number of columns of the matrix, then, after centering, each column of the matrix is divided by the corresponding value from scale. If scale is TRUE then scaling is done by dividing the (centered) columns of the matrix by their standard deviations, and if scale is FALSE, no scaling is done.

toy_centered <- scale(toy, center = TRUE, scale = FALSE)
toy_normalized <- scale(toy, center = TRUE, scale = rep((sqrt(nrow(toy)-1)), ncol(toy)))

toy_normalized[1:5,1:5]

##              CD3D       CD3E       LYZ    S100A8      CD79A
## cell 1  0.7924289  1.0697789 -0.356593 -0.356593 -0.4358359
## cell 2  1.0697789  0.5150788 -0.356593 -0.356593 -0.4358359
## cell 3  0.7924289  0.7924289 -0.356593 -0.356593 -0.4358359
## cell 4  0.5150788  0.7924289 -0.356593 -0.356593 -0.4358359
## cell 5 -0.3169715 -0.3169715  1.030158  1.030158 -0.4358359

Now we will compute the exact same thing (PCA) in the 2 ways.

In each case we will be tracking the same three pieces of information:

the principal components (pdir), also known as loadings.
the variance of the data along each principal component (var).
the scores of each data sample in the principal components (score).

3.5.1 Eigendecompostion of the covariance matrix

We will be computing an eigenbasis for the covariance matrix $\Omega = A^tA$ where $A$ is the normalized data (centered and scaled). We will use the function eigen.

The eigenvectors of the covariance matrix are the loadings (principal components):

cov_matrix <-  (t(toy_normalized) %*% toy_normalized)
eigen_cov_matrix <- eigen(cov_matrix)

pdir_eigen <- eigen_cov_matrix$vectors
pdir_eigen[1:5,1:5]

##             [,1]       [,2]       [,3]        [,4]       [,5]
## [1,] -0.03621728 -0.1281224 -0.2892754  0.37018269 -0.1942142
## [2,] -0.03621728 -0.1281224 -0.2892754  0.37018269 -0.1942142
## [3,] -0.04319892 -0.1949563  0.5855971 -0.07543384 -0.1254525
## [4,] -0.04319892 -0.1949563  0.5855971 -0.07543384 -0.1254525
## [5,] -0.06565032  0.2467930 -0.1395316 -0.42583898 -0.3020970

The eigenvalues contains the variance of the data along each principal component.

var_eigen <- eigen_cov_matrix$values
head(var_eigen)

## [1] 133.1235625  36.7772802  14.4225675  10.9965640   0.4843188   0.2349899

The scores of each data sample in the principal components can be obtained by multiplying the original centered matrix with the matrix with the eigenvectors as columns:

scores_eigen <- toy_centered %*% pdir_eigen
scores_eigen[1:5, 1:5]

##             [,1]      [,2]      [,3]      [,4]        [,5]
## cell 1 -3.951320 -3.948608 -3.661022  3.654521 -0.25850086
## cell 2 -3.915103 -3.820486 -3.371747  3.284338 -0.06428666
## cell 3 -3.915103 -3.820486 -3.371747  3.284338 -0.06428666
## cell 4 -3.878885 -3.692364 -3.082472  2.914155  0.12992755
## cell 5 -4.057353 -4.745070  4.798428 -0.431462  0.23490185

3.5.2 PCA

Now we will use a stand-alone PCA function provided by R, prcomp, to calculate the principal components.

For the prcomp function we don’t need to scale the data, it is already implemented within the function. We do need to center the data.

pca_out <- prcomp(toy, center = TRUE, scale = FALSE)

We can use the following keywords to parse the output instance:

Principal components (loadings): “rotation”

pdir_pca <- pca_out$rotation
pdir_pca[1:5,1:5]

##                PC1        PC2        PC3         PC4       PC5
## CD3D   -0.03621728 -0.1281224  0.2892754  0.37018269 0.1942142
## CD3E   -0.03621728 -0.1281224  0.2892754  0.37018269 0.1942142
## LYZ    -0.04319892 -0.1949563 -0.5855971 -0.07543384 0.1254525
## S100A8 -0.04319892 -0.1949563 -0.5855971 -0.07543384 0.1254525
## CD79A  -0.06565032  0.2467930  0.1395316 -0.42583898 0.3020970

Standard deviations : “sdev”

(We need to multiply by $n-1$ since, in prcomp, variances are computed with the usual divisor $n-1$. See documentation.)

var_pca <- (pca_out$sdev)** 2 

head(var_pca)

## [1] 133.1235625  36.7772802  14.4225675  10.9965640   0.4843188   0.2349899

Principal components scores: “x”

scores_pca <- pca_out$x
scores_pca[1:5,1:5]

##              PC1       PC2       PC3       PC4         PC5
## cell 1 -3.951320 -3.948608  3.661022  3.654521  0.25850086
## cell 2 -3.915103 -3.820486  3.371747  3.284338  0.06428666
## cell 3 -3.915103 -3.820486  3.371747  3.284338  0.06428666
## cell 4 -3.878885 -3.692364  3.082472  2.914155 -0.12992755
## cell 5 -4.057353 -4.745070 -4.798428 -0.431462 -0.23490185

In summary, we obtain the following:

Gene loadings: pca_out$rotation
Variance associated to each principal component: pca_out$sdev ** 2
PC scores: pca_out$x

3.6 Infer dimensionality of the dataset

When using prcomp function can get the proportion of variance explained (PVE) and the cumulative proportion with the summary function (more info in this book):

summary(pca_out)

## Importance of components:
##                            PC1    PC2     PC3     PC4     PC5    PC6     PC7
## Standard deviation     11.5379 6.0644 3.79771 3.31611 0.69593 0.4848 0.48038
## Proportion of Variance  0.6777 0.1872 0.07342 0.05598 0.00247 0.0012 0.00117
## Cumulative Proportion   0.6777 0.8649 0.93829 0.99427 0.99674 0.9979 0.99911
##                            PC8     PC9    PC10      PC11
## Standard deviation     0.33153 0.24758 0.06164 4.519e-16
## Proportion of Variance 0.00056 0.00031 0.00002 0.000e+00
## Cumulative Proportion  0.99967 0.99998 1.00000 1.000e+00

To reduce the dimensionality, we will choose the number of PC that explains most of the variance in the dataset. To this end, we use a so-called elbow plot:

# Calculate percentage of variance explained (PVE):
pve <- pca_out$sdev ** 2 / sum(pca_out$sdev ** 2) * 100
pve <- round(pve, 2)
pve_df <- data.frame(principal_component = 1:length(pve), pve = pve)
pve_gg <- pve_df %>%
  ggplot(aes(principal_component, pve)) +
    geom_point() +
    geom_vline(xintercept = 4.5, linetype = "dashed", color = "darkblue") +
    scale_x_continuous(breaks = 1:length(pve)) +
    labs(x = "Principal Component", y = "Percentage of Variance Explained (%)") +
    theme_bw()
pve_gg

Some people prefer to plot the cumulative proportion of explained variance:

summ_pca <- summary(pca_out)
cum_pct <- summ_pca$importance["Cumulative Proportion", ] * 100
cum_pct_df <- data.frame(principal_component = 1:length(pve), cum_pct = cum_pct)
cum_pct_gg <- cum_pct_df %>%
  ggplot(aes(principal_component, cum_pct)) +
    geom_point() +
    geom_vline(xintercept = 4.5, linetype = "dashed", color = "darkblue") +
    scale_x_continuous(breaks = 1:length(cum_pct)) +
    scale_y_continuous(limits = c(0, 100)) +
    labs(x = "Principal Component", y = "Cumulative Percentage of Variance Explained (%)") +
    theme_bw()
cum_pct_gg

Thus, we conclude that the first 4 PC are significant. Let us express each cell as a vector of the first 4 PC (scores):

toy_reduced <- pca_out$x[, c("PC1", "PC2", "PC3", "PC4")]
pheatmap(toy_reduced, cluster_rows = FALSE, cluster_cols = FALSE, angle_col = 45)

3.7 Visualize gene loadings

Now we have reduced the dimensionality of the dataset. To interpret each principal component, let us inspect the gene loadings:

gene_loads_reduced <- pca_out$rotation[, c("PC1", "PC2", "PC3", "PC4")]
significant_pcs <- c("PC1", "PC2", "PC3", "PC4")
loadings_gg <- map(significant_pcs, function(x) {
  loadings <- gene_loads_reduced[, x]
  df <- data.frame(gene = names(loadings), score = loadings)
  p <- df %>%
    ggplot(aes(fct_reorder(gene, score), score)) +
      geom_point() +
      labs(x = "", y = x) +
      theme_bw() +
      coord_flip()
  return(p)
})
loadings_gg

## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

3.8 Cluster cells

Clustering means classifying observations into groups that minimize and maximize intra-group and inter-group distances, respectively.

3.8.1 Calculate all pairwise Euclidean distances

To that end we need a concept of distance, which basically measures how similar or different two vectors are. In our case we will use Euclidean distance, but be aware that there are a plethora of different options that can yield different clustering results.

Let us start by computing all pairwise distances (distance between cells, considering their expression profiles):

dist_mat <- dist(toy_reduced, method = "euclidean")
pheatmap(dist_mat, cluster_rows = FALSE, cluster_cols = FALSE)

3.9 Perform hierarchical clustering

hclust_average <- hclust(dist_mat, method = "average")
plot(
  hclust_average,
  labels = rownames(toy),
  main = "Average Linkage",
  xlab = "",
  sub = "",
  ylab = "Cophenetic distance"
)

3.9.1 Cut the dendrogram and visualize clusters

plot(
  hclust_average,
  labels = rownames(toy),
  main = "Average Linkage",
  xlab = "",
  sub = "",
  ylab = "Cophenetic distance"
)
abline(h = 7.5, col = "red")

Visualize the correspondent cluster to each of the cells:

hc_clusters <- cutree(hclust_average, h = 7.5)
hc_clusters

##  cell 1  cell 2  cell 3  cell 4  cell 5  cell 6  cell 7  cell 8  cell 9 cell 10 
##       1       1       1       1       2       2       2       3       3       3 
## cell 11 cell 12 cell 13 cell 14 
##       4       4       5       5

Visualize the number of cells in each cluster:

table(hc_clusters)

## hc_clusters
## 1 2 3 4 5 
## 4 3 3 2 2

3.9.2 Annotation

Cluster	Cell type
1	T-cells
2	Monocytes
3	Naive B-cells
4	Plasma Cells
5	poor-quality cells

Visualize the centered gene expression across cells and their corresponding cluster:

annotation_rows <- data.frame(cluster = as.character(hc_clusters))
rownames(annotation_rows) <- names(hc_clusters)
annotated_clusters <- c("T-cells", "Monocytes", "Naive B-cells", "Plasma Cells", "poor-quality cells")
names(annotated_clusters) <- as.character(1:5)
annotation_rows$annotation <- annotated_clusters[annotation_rows$cluster]
pheatmap(
  toy_normalized,
  cluster_rows = FALSE,
  cluster_cols = FALSE,
  angle_col = 315,
  annotation_row = annotation_rows
)

4 Exercise microarray data

We will use the data from NCI 60 Data from the ISLR package. This data set contains NCI microarray data. The data contains expression levels on 6830 genes from 64 cancer cell lines.

Exercise

Find the principal components of this dataset and cluster the cell lines according to their similarities in gene expression.

We will load the data and keep the information of the table and the tumor types of the cell lines:

library(ISLR)
data_NCI60 <- NCI60$data
labels_NCI60 <- NCI60$labs
rownames(data_NCI60) <- labels_NCI60

We can look at the dimension of the data set which contains information about 64 cell lines of different tumor types and 6830 genes.

dim(data_NCI60)

## [1]   64 6830

We can see the list of the tumor types:

labels_NCI60

##  [1] "CNS"         "CNS"         "CNS"         "RENAL"       "BREAST"     
##  [6] "CNS"         "CNS"         "BREAST"      "NSCLC"       "NSCLC"      
## [11] "RENAL"       "RENAL"       "RENAL"       "RENAL"       "RENAL"      
## [16] "RENAL"       "RENAL"       "BREAST"      "NSCLC"       "RENAL"      
## [21] "UNKNOWN"     "OVARIAN"     "MELANOMA"    "PROSTATE"    "OVARIAN"    
## [26] "OVARIAN"     "OVARIAN"     "OVARIAN"     "OVARIAN"     "PROSTATE"   
## [31] "NSCLC"       "NSCLC"       "NSCLC"       "LEUKEMIA"    "K562B-repro"
## [36] "K562A-repro" "LEUKEMIA"    "LEUKEMIA"    "LEUKEMIA"    "LEUKEMIA"   
## [41] "LEUKEMIA"    "COLON"       "COLON"       "COLON"       "COLON"      
## [46] "COLON"       "COLON"       "COLON"       "MCF7A-repro" "BREAST"     
## [51] "MCF7D-repro" "BREAST"      "NSCLC"       "NSCLC"       "NSCLC"      
## [56] "MELANOMA"    "BREAST"      "BREAST"      "MELANOMA"    "MELANOMA"   
## [61] "MELANOMA"    "MELANOMA"    "MELANOMA"    "MELANOMA"

Solution

We will use the function prcomp to calculate the principal components.

4.1 Perform PCA

pr_output <- prcomp(data_NCI60, center = TRUE)

As in the previous example, we can calculate the loadings:

pdir_pca <- pr_output$rotation
pdir_pca[1:6,1:6]

##            PC1           PC2         PC3          PC4          PC5
## 1 -0.005096247  0.0009839929 0.002116058  0.007628801 -0.012118316
## 2 -0.001642354  0.0034355664 0.008049350  0.004910196 -0.007412249
## 3 -0.002509243 -0.0015838271 0.004746350  0.008769557 -0.012426296
## 4  0.004940063  0.0078435347 0.013716214  0.011378816 -0.014851587
## 5 -0.003365039 -0.0002680479 0.010677453 -0.005249648 -0.003317312
## 6  0.001382038 -0.0034431320 0.003167842  0.007425971 -0.002879251
##             PC6
## 1  0.0061392242
## 2  0.0114429879
## 3 -0.0002860562
## 4 -0.0065009935
## 5 -0.0003513787
## 6  0.0003210365

The standard deviations :

var_pca <- pr_output$sdev ** 2
head(var_pca)

## [1] 633.2156 352.9278 279.9189 183.0830 163.5573 149.0968

And the principal components scores

scores_pca <- pr_output$x
scores_pca[1:6,1:6]

##              PC1         PC2        PC3       PC4        PC5      PC6
## CNS    -19.79578   0.1152691  -5.968917  4.753293  -4.882164 18.92591
## CNS    -21.54610  -1.4573503  -9.019584  6.767942  -2.247604 17.07273
## CNS    -25.05662   1.5260929  -6.959653  2.785913 -10.819648 16.45389
## RENAL  -37.40954 -11.3894784  -5.407097 15.442094 -16.011475 33.09651
## BREAST -50.21864  -1.3461737 -17.599944 15.099862 -13.852847 16.94340
## CNS    -26.43520   0.4629819 -16.931456 11.389195  -6.742920 11.85838

4.2 Plot the scores

We will plot the first 2 principal components score vectors with a dot plot, in order to visualize the data. We will color the observations (cell lines) according to their cancer type. In this way we can see to what extent the observations within a cancer type are similar to each other.

scores <- as.data.frame.array(pr_output$x)
scores$tumor_type <- labels_NCI60
scores %>% 
  ggplot(aes(x = PC1, y = PC2, color = labels_NCI60)) +
    geom_point() +
    labs(x = "PC1", y = "PC2") +
    theme_classic()

4.3 Visualize proportion of the variance explained

summary(pr_output)$importance[,c(1:14)]

##                             PC1      PC2      PC3      PC4      PC5      PC6
## Standard deviation     25.16378 18.78637 16.73078 13.53082 12.78895 12.21052
## Proportion of Variance  0.14893  0.08301  0.06584  0.04306  0.03847  0.03507
## Cumulative Proportion   0.14893  0.23194  0.29777  0.34083  0.37930  0.41437
##                             PC7      PC8      PC9     PC10     PC11    PC12
## Standard deviation     11.05840 10.94492 10.59140 9.576574 9.449313 9.22659
## Proportion of Variance  0.02876  0.02817  0.02638 0.021570 0.021000 0.02002
## Cumulative Proportion   0.44313  0.47130  0.49769 0.519260 0.540260 0.56028
##                            PC13     PC14
## Standard deviation     8.821954 8.668634
## Proportion of Variance 0.018300 0.017670
## Cumulative Proportion  0.578580 0.596260

As in the previous example we will represent the cumulative proportion of variance explained:

pve <- pr_output$sdev ** 2 / sum(pr_output$sdev ** 2) * 100
pve <- round(pve, 2)
pve_df <- data.frame(principal_component = 1:length(pve), pve = pve)
summ_pca <- summary(pr_output)
cum_pct <- summ_pca$importance["Cumulative Proportion", ] * 100
cum_pct_df <- data.frame(principal_component = 1:length(pve), cum_pct = cum_pct)
cum_pct_gg <- cum_pct_df %>%
  ggplot(aes(principal_component, cum_pct)) +
    geom_point() +
    scale_x_continuous(breaks = 1:length(cum_pct)) +
    scale_y_continuous(limits = c(0, 100)) +
    labs(x = "Principal Component", y = "Cumulative Percentage of Variance Explained (%)") +
    theme_bw()
cum_pct_gg

4.4 Cluster the samples

In this case, we will select the first 8 PC. We can cluster the samples within the heatmap:

pheatmap(scores[, c("PC1", "PC2", "PC3","PC4","PC5", "PC6", "PC7","PC8")], cluster_rows = TRUE, cluster_cols = FALSE, angle_col = 45, labels_row = labels_NCI60, fontsize_row = 5)

5 Session Information

sessionInfo()

## R version 4.3.2 (2023-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/Madrid
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] ISLR_1.4         lubridate_1.9.3  forcats_1.0.0    stringr_1.5.0   
##  [5] dplyr_1.1.3      purrr_1.0.2      readr_2.1.4      tidyr_1.3.0     
##  [9] tibble_3.2.1     ggplot2_3.4.4    tidyverse_2.0.0  pheatmap_1.0.12 
## [13] BiocStyle_2.28.1
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.7          utf8_1.2.3          generics_0.1.3     
##  [4] stringi_1.7.12      hms_1.1.3           digest_0.6.33      
##  [7] magrittr_2.0.3      evaluate_0.22       grid_4.3.2         
## [10] timechange_0.2.0    RColorBrewer_1.1-3  bookdown_0.35      
## [13] fastmap_1.1.1       jsonlite_1.8.7      BiocManager_1.30.22
## [16] fansi_1.0.5         scales_1.2.1        jquerylib_0.1.4    
## [19] cli_3.6.1           rlang_1.1.1         munsell_0.5.0      
## [22] withr_2.5.1         cachem_1.0.8        yaml_2.3.7         
## [25] tools_4.3.2         tzdb_0.4.0          colorspace_2.1-0   
## [28] vctrs_0.6.4         R6_2.5.1            lifecycle_1.0.3    
## [31] pkgconfig_2.0.3     pillar_1.9.0        bslib_0.5.1        
## [34] gtable_0.3.4        glue_1.6.2          xfun_0.40          
## [37] tidyselect_1.2.0    rstudioapi_0.15.0   knitr_1.44         
## [40] farver_2.1.1        htmltools_0.5.6.1   rmarkdown_2.25     
## [43] labeling_0.4.3      compiler_4.3.2

Principal Component Analysis in Bioinformatics

30/10/2023