HRM analysis with PCA and clustering

Principal Component Analysis (PCA) of High Resolution Melting (HRM) Analysis for high throughput genotyping

Dependencies installation

install.packages("tidyverse")
install.packages("mclust")
install.packages("plot3D")
library(mclust)
library(plot3D)
library(tidyverse)
R

1. Import raw data

(1) File upload

If you choose to try this script on NextJournal, simply remix this notebook and drag your file with corresponding format as shown below.

Let's take a glance at the example file with the name of "PCA_HRM.example.csv".

PCA_HRM.example.csv
0 items

(2) Create a data frame to store normalized data

exported.data.file = read.csv(
PCA_HRM.example.csv
, header = T)
head(exported.data.file)
R
0 items

2. Select melting region

# max_lim = readline('Enter the upper limit of the melt region: ')
# min_lim = readline('Enter the lower limit of the melt region: ')
max_lim = 83.9
min_lim = 77.6
exported.data.file <- subset(exported.data.file, Temperature>min_lim & Temperature<max_lim)  # Excluding the pre- and post-melt region
normfluor = exported.data.file #to store the normalized data
head(normfluor)
R
0 items

3. Make absolute fluorescence into percentage

for (i in 2:length(exported.data.file[1,]))
{sample.fluorescence.norm <-(exported.data.file[,i])/max(exported.data.file[,i])
normfluor[,i]=sample.fluorescence.norm}
normfluor
R
0 items

4. Make the medium curve

medcurve <- apply(normfluor, 1, median)
R

5. Use median curve to normalize percentage melt curves

for(i in 2:length(normfluor[1,])){
  normfluor[,i]=normfluor[,i]-medcurve}
normfluor
R
0 items

6. Principal component analysis (PCA)

PCA.analysis.file <-prcomp(normfluor[,2:28], scale = FALSE) 
plot(PCA.analysis.file,type="line",  main="PCs of HRM") # Pareto plot
R

7. Clustering

cluster.data <- Mclust(PCA.analysis.file$rotation[,1:3], G = 3)##CHANGE HOW MANY PCs TO BE SELECTED HERE!!!!!!!!!!!
df.PCA.MM1 <- as.data.frame(PCA.analysis.file$rotation[,1:3])##CHANGE HOW MANY PCs TO BE SELECTED HERE!!!!!!!!!!!
## report the variant call
cluster.data$classification
## preview plots
#PC1 vs PC2
plot(df.PCA.MM1[,1:2], bg=cluster.data$classification,pch=21,xlab='PC1', ylab='PC2')
identify(df.PCA.MM1[,1:2], labels = rownames(df.PCA.MM1))
#PC2 vs PC3, ignore this if only 2 PCs are selected
plot(df.PCA.MM1[,2:3], bg=cluster.data$classification,pch=21,xlab='PC2', ylab='PC3')
identify(df.PCA.MM1[,2:3], labels = rownames(df.PCA.MM1))
#PC1 vs PC3, ignore this if only 2 PCs are selected
plot(df.PCA.MM1[,c(1,3)], bg=cluster.data$classification, pch=21, xlab='PC1', ylab='PC3')
identify(df.PCA.MM1[,c(1,3)], labels = rownames(df.PCA.MM1))
#3D plot
points3D(x=df.PCA.MM1$PC1,y=df.PCA.MM1$PC2,z=df.PCA.MM1$PC3, 
         colvar=cluster.data$classification, 
         xlab = 'PC1', ylab = 'PC2', zlab = 'PC3', pch = 16,
         colkey = F, col=c('black', 'red', 'green')
         )
identify(df.PCA.MM1[,1:3], labels = rownames(df.PCA.MM1))
R

Link to github

This NextJournal notebook is used to show the functionality and reproducibility of the source code. The full version R script with interactive plotting is available on GitHub, (https://github.com/choulin2/PCA_HRM.git).

Citation

Chou, L., S.J. Huang, C. Hsieh, M.T. Lu, C.W. Song, F.C. Hsu. A High Resolution Melting Analysis (HRM)-based Genotyping Toolkit for Chilling Requirement in Peach (Prunus persica). Submitted to International Journal of Molecular Sciences on 11/29/2019.

Runtimes (1)