Seurat – Guided Clustering Tutorial
Adapted from https://satijalab.org/seurat/v3.0/pbmc3k_tutorial.html.
For this tutorial, we will analyze a dataset of Peripheral Blood Mononuclear Cells (PBMC) freely available from 10X Genomics. It contains 2,700 single cells that were sequenced on the Illumina NextSeq 500.
We start by loading the required libraries:
library(dplyr)
library(ggplot2)
library(Seurat)
Seurat requires the data files to be in a common directory, while Nextjournal's data versioning stores them separately. We can work around this using symbolic links.
mkdir -p /data/pbmc3k/hg19/
ln -sf matrix.mtx /data/pbmc3k/hg19/
ln -sf matrix.mtx /data/pbmc3k/hg19/
ln -sf matrix.mtx /data/pbmc3k/hg19/
Next, we read in the data. The Read10X function reads in the output of the cellranger pipeline from 10X, returning a unique molecular identifier (UMI) count matrix. The values in this matrix represent the number of molecules for each feature (i.e. gene; row) that are detected in each cell (column).
pbmc.data <- Read10X(data.dir = "/data/pbmc3k/hg19/")
We next use the count matrix to create a Seurat object. The object serves as a container for both data (like the count matrix) and analysis results (like PCA or clustering) for a single-cell dataset. For a technical discussion of the Seurat object structure, see the Seurat GitHub Wiki. For example, the count matrix is stored in pbmc[["RNA"]]@counts.
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc
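The min.cells and min.features arguments drop sparsely detected features and low-complexity cells when the object is created. The following is a minimal base R sketch of that idea on a made-up toy matrix, not Seurat's own code. It assumes cells are filtered first and features second, and it uses small hypothetical thresholds (min_features = 2, min_cells = 3) so that the effect is visible on four genes and five cells.

```r
# Toy count matrix: 4 genes (rows) x 5 cells (columns); values are made up.
counts <- matrix(
  c(1, 0, 2, 0, 0,
    0, 0, 1, 0, 0,
    3, 1, 4, 2, 1,
    0, 2, 1, 0, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5))
)

# Hypothetical small thresholds for illustration only
# (the tutorial itself uses min.cells = 3 and min.features = 200).
min_features <- 2
min_cells <- 3

# Keep cells in which at least `min_features` genes are detected (count > 0).
features_per_cell <- colSums(counts > 0)
kept <- counts[, features_per_cell >= min_features, drop = FALSE]

# Then keep genes detected in at least `min_cells` of the remaining cells.
cells_per_gene <- rowSums(kept > 0)
kept <- kept[cells_per_gene >= min_cells, , drop = FALSE]

dim(kept)
```

In this toy case cell4 is dropped because only one gene is detected in it, and gene1 and gene2 are dropped because they are detected in fewer than three of the remaining cells.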
What does data in a count matrix look like?
pbmc.data[c("CD3D", "TCL1A", "MS4A1"), 1:30]
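Most entries in a single-cell count matrix are zero, which is why such matrices are usually held in a sparse format; when a sparse matrix from the Matrix package is printed, zeros appear as dots. As a rough illustration, the sketch below builds a made-up matrix with about 95% zeros and compares the memory used by a dense and a sparse representation. The dimensions and values are assumptions chosen for the example and have nothing to do with the PBMC data.

```r
library(Matrix)  # sparse matrix classes; ships with standard R installations

set.seed(1)
# Toy matrix: 200 genes x 100 cells, about 95% zeros; values are made up.
dense <- matrix(0, nrow = 200, ncol = 100)
nonzero <- sample(length(dense), size = 1000)
dense[nonzero] <- rpois(1000, lambda = 2) + 1

# Convert to a sparse representation that stores only the non-zero entries.
sparse <- Matrix(dense, sparse = TRUE)

# Both objects hold the same values; only the storage differs.
dense_size <- as.numeric(object.size(dense))
sparse_size <- as.numeric(object.size(sparse))
dense_size / sparse_size
```

The ratio printed at the end shows how many times larger the dense version is; the sparser the data, the larger the saving.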
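A common QC metric is the percentage of counts that map to mitochondrial genes. We compute it over all features whose names start with MT- and store it in the object metadata as percent.mt:
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")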
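To make the metric concrete, here is a small base R sketch of the same kind of calculation on a made-up matrix: for each cell, the counts of features matching the pattern are divided by the cell's total counts and multiplied by 100. The counts below are invented for illustration, and this is a hand-written equivalent rather than the Seurat implementation.

```r
# Toy count matrix: 2 mitochondrial genes and 2 other genes x 3 cells (made-up values).
counts <- matrix(
  c( 1,  0,  5,
     1,  2,  5,
    40, 50, 45,
    58, 48, 45),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "MT-ND1", "CD3D", "MS4A1"),
                  c("cell1", "cell2", "cell3"))
)

# Select features whose names match the same pattern used above.
mt_genes <- grep("^MT-", rownames(counts), value = TRUE)

# Percentage of each cell's total counts that comes from those features.
percent_mt <- 100 * colSums(counts[mt_genes, , drop = FALSE]) / colSums(counts)
percent_mt
```

Here cell1 and cell2 come out at 2% and cell3 at 10%, so only cell3 would exceed a 5% mitochondrial threshold.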
In the example below, we visualize QC metrics, and use these to filter cells.
- We filter out cells that have unique feature counts over 2,500 or less than 200
- We filter out cells that have >5% mitochondrial counts
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
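The two filtering rules listed above can be sketched as a single logical condition. The example below applies it to a hypothetical per-cell QC table in base R; the column names mirror the Seurat metadata, but the five cells and their values are made up. The commented line at the end shows how the same condition is typically passed to subset() on the Seurat object.

```r
# Hypothetical per-cell QC table; values are made up for illustration.
qc <- data.frame(
  nFeature_RNA = c(150, 800, 1200, 2600, 900),
  percent.mt   = c(1.0, 2.5, 7.0, 3.0, 4.9),
  row.names    = paste0("cell", 1:5)
)

# Keep cells with more than 200 and fewer than 2,500 detected features
# and less than 5% mitochondrial counts.
keep <- qc$nFeature_RNA > 200 & qc$nFeature_RNA < 2500 & qc$percent.mt < 5
filtered <- qc[keep, ]
rownames(filtered)

# On the Seurat object itself, the equivalent call would be:
# pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
```

In the toy table, cell1 fails the lower feature bound, cell4 the upper bound, and cell3 the mitochondrial threshold, so only cell2 and cell5 remain.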