Making science reproducible @nextjournal

Seurat – Guided Clustering Tutorial

For this tutorial, we will be analyzing a dataset of Peripheral Blood Mononuclear Cells (PBMC) freely available from 10X Genomics. There are 2,700 single cells that were sequenced on the Illumina NextSeq 500.

This is the raw data:


We start by loading the required libraries:
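A minimal sketch of that step, assuming the tutorial relies on the dplyr and Seurat packages (an assumption; adjust the list to your setup). It checks that each package is installed before attaching it, so the cell reports what is missing rather than stopping with an error.

```r
# Packages assumed by this sketch.
required <- c("dplyr", "Seurat")

# TRUE for every package that is installed and loadable.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

if (!all(installed)) {
  message("Missing packages: ", paste(required[!installed], collapse = ", "))
} else {
  library(dplyr)
  library(Seurat)
}
```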


Seurat requires the data files to be in a common directory, while Nextjournal's data versioning stores them separately. We can work around this using symbolic links.

mkdir -p /data/pbmc3k/hg19/
ln -sf … /data/pbmc3k/hg19/
ln -sf … /data/pbmc3k/hg19/
ln -sf … /data/pbmc3k/hg19/
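The same workaround can be sketched from R with base functions. This is an illustration only: it runs in a temporary directory, and the file names barcodes.tsv, genes.tsv and matrix.mtx, along with the separate storage folders, are assumptions made for the example rather than the notebook's actual paths.

```r
# Hypothetical stand-ins for files that are stored in separate locations.
store <- file.path(tempdir(), "store")
file_names <- c("barcodes.tsv", "genes.tsv", "matrix.mtx")
sources <- file.path(store, c("a", "b", "c"), file_names)
for (src in sources) {
  dir.create(dirname(src), recursive = TRUE, showWarnings = FALSE)
  file.create(src)
}

# Gather them in one common directory through symbolic links.
target_dir <- file.path(tempdir(), "pbmc3k", "hg19")
dir.create(target_dir, recursive = TRUE, showWarnings = FALSE)
links <- file.path(target_dir, basename(sources))
unlink(links)  # mimic `ln -sf`: remove any stale link first
linked <- file.symlink(sources, links)

list.files(target_dir)
```

Because the directory only holds links, the data itself stays where the versioning system put it.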

We start by reading in the data. The Read10X function reads in the output of the cellranger pipeline from 10X, returning a unique molecular identifier (UMI) count matrix. The values in this matrix represent the number of molecules for each feature (i.e. gene; row) that are detected in each cell (column).

<- Read10X(data.dir = "/data/pbmc3k/hg19/")

We next use the count matrix to create a Seurat object. The object serves as a container for both data (like the count matrix) and analysis results (like PCA or clustering) for a single-cell dataset. For a technical discussion of the Seurat object structure, check out our GitHub Wiki. For example, the count matrix is stored in pbmc[["RNA"]]@counts.

pbmc <- CreateSeuratObject(counts =, project = "pbmc3k", min.cells = 3, min.features = 200)
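To make the min.cells and min.features arguments concrete, here is a base R sketch on a toy matrix. The matrix, the small thresholds, and the order in which the two filters are applied are assumptions made for illustration; this is not Seurat's own code.

```r
# Toy UMI count matrix: 4 genes (rows) x 5 cells (columns).
counts <- matrix(
  c(1, 0, 2, 0, 3,
    0, 0, 1, 0, 0,
    4, 1, 0, 0, 2,
    0, 2, 5, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5))
)

# Hypothetical small thresholds; the tutorial uses min.cells = 3 and
# min.features = 200 on the real data.
min_cells <- 3
min_features <- 2

# Keep cells in which at least `min_features` genes are detected.
features_per_cell <- colSums(counts > 0)
filtered <- counts[, features_per_cell >= min_features, drop = FALSE]

# Then keep genes detected in at least `min_cells` of the remaining cells.
cells_per_gene <- rowSums(filtered > 0)
filtered <- filtered[cells_per_gene >= min_cells, , drop = FALSE]

dim(filtered)
```

In this toy case cell4 has no detected genes and gene2 is seen in a single cell, so both are dropped.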

What does data in a count matrix look like?

[c("CD3D", "TCL1A", "MS4A1"), 1:30]

pbmc[[""]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
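The pattern "^MT-" selects genes whose names start with MT-, i.e. mitochondrial genes. As a rough sketch of the quantity being computed, the following base R snippet derives, for each cell, the percentage of counts that come from such genes. The toy counts and the variable name percent_mt are invented for the example, and the arithmetic is an approximation of what PercentageFeatureSet does rather than its actual implementation.

```r
# Toy count matrix: two nuclear genes, two mitochondrial genes, three cells.
counts <- matrix(
  c(10, 0, 5,
     5, 5, 0,
     3, 1, 0,
     2, 4, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("CD3D", "MS4A1", "MT-CO1", "MT-ND1"),
                  paste0("cell", 1:3))
)

# Rows whose gene name starts with "MT-".
is_mt <- grepl("^MT-", rownames(counts))

# Share of each cell's total counts that comes from those rows, in percent.
percent_mt <- 100 * colSums(counts[is_mt, , drop = FALSE]) / colSums(counts)
percent_mt
```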

In the example below, we visualize QC metrics, and use these to filter cells.

  • We filter out cells that have unique feature counts over 2,500 or less than 200
  • We filter out cells that have >5% mitochondrial counts

VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", ""), ncol = 3)
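The two filtering rules listed above can be sketched on a plain data frame. The QC values are invented, and percent_mt is a stand-in name for the mitochondrial percentage column; in a real analysis these values live in the Seurat object's metadata and the filter would be applied to the object itself.

```r
# Hypothetical per-cell QC table.
qc <- data.frame(
  cell         = paste0("cell", 1:5),
  nFeature_RNA = c(150, 800, 1200, 3000, 900),
  percent_mt   = c(2.0, 3.5, 7.0, 1.0, 4.9),
  stringsAsFactors = FALSE
)

# Keep cells with 200 to 2,500 detected features and at most 5% mitochondrial
# counts, i.e. drop cells outside the thresholds stated in the text.
keep <- qc$nFeature_RNA >= 200 &
  qc$nFeature_RNA <= 2500 &
  qc$percent_mt <= 5
qc_filtered <- qc[keep, ]

qc_filtered$cell
```

Here cell1 (too few features), cell3 (high mitochondrial share) and cell4 (too many features) are removed.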