HRM analysis with PCA and clustering
Principal Component Analysis (PCA) of High Resolution Melting (HRM) Analysis for high throughput genotyping
Dependencies installation
install.packages("tidyverse")
install.packages("mclust")
install.packages("plot3D")
library(mclust)
library(plot3D)
library(tidyverse)
1. Import raw data
(1) File upload
If you choose to try this script on NextJournal, simply remix this notebook and drag your file with corresponding format as shown below.
Let's take a glance at the example file with the name of "PCA_HRM.example.csv".
(2) Create a data frame to store normalized data
exported.data.file = read.csv(PCA_HRM.example.csv, header = T)
head(exported.data.file)
2. Select melting region
# max_lim = readline('Enter the upper limit of the melt region: ')
# min_lim = readline('Enter the lower limit of the melt region: ')
max_lim = 83.9
min_lim = 77.6
exported.data.file <- subset(exported.data.file, Temperature>min_lim & Temperature<max_lim) # Excluding the pre- and post-melt region
normfluor = exported.data.file #to store the normalized data
head(normfluor)
3. Make absolute fluorescence into percentage
for (i in 2:length(exported.data.file[1,]))
{sample.fluorescence.norm <-(exported.data.file[,i])/max(exported.data.file[,i])
normfluor[,i]=sample.fluorescence.norm}
normfluor
4. Make the medium curve
medcurve <- apply(normfluor, 1, median)
5. Use median curve to normalize percentage melt curves
for(i in 2:length(normfluor[1,])){
normfluor[,i]=normfluor[,i]-medcurve}
normfluor
6. Principal component analysis (PCA)
PCA.analysis.file <-prcomp(normfluor[,2:28], scale = FALSE)
plot(PCA.analysis.file,type="line", main="PCs of HRM") # Pareto plot
7. Clustering
cluster.data <- Mclust(PCA.analysis.file$rotation[,1:3], G = 3)##CHANGE HOW MANY PCs TO BE SELECTED HERE!!!!!!!!!!!
df.PCA.MM1 <- as.data.frame(PCA.analysis.file$rotation[,1:3])##CHANGE HOW MANY PCs TO BE SELECTED HERE!!!!!!!!!!!
## report the variant call
cluster.data$classification
## preview plots
#PC1 vs PC2
plot(df.PCA.MM1[,1:2], bg=cluster.data$classification,pch=21,xlab='PC1', ylab='PC2')
identify(df.PCA.MM1[,1:2], labels = rownames(df.PCA.MM1))
#PC2 vs PC3, ignore this if only 2 PCs are selected
plot(df.PCA.MM1[,2:3], bg=cluster.data$classification,pch=21,xlab='PC2', ylab='PC3')
identify(df.PCA.MM1[,2:3], labels = rownames(df.PCA.MM1))
#PC1 vs PC3, ignore this if only 2 PCs are selected
plot(df.PCA.MM1[,c(1,3)], bg=cluster.data$classification, pch=21, xlab='PC1', ylab='PC3')
identify(df.PCA.MM1[,c(1,3)], labels = rownames(df.PCA.MM1))
#3D plot
points3D(x=df.PCA.MM1$PC1,y=df.PCA.MM1$PC2,z=df.PCA.MM1$PC3,
colvar=cluster.data$classification,
xlab = 'PC1', ylab = 'PC2', zlab = 'PC3', pch = 16,
colkey = F, col=c('black', 'red', 'green')
)
identify(df.PCA.MM1[,1:3], labels = rownames(df.PCA.MM1))
Link to github
This NextJournal notebook is used to show the functionality and reproducibility of the source code. The full version R script with interactive plotting is available on GitHub, (https://github.com/choulin2/PCA_HRM.git).
Citation
Chou, L., S.J. Huang, C. Hsieh, M.T. Lu, C.W. Song, F.C. Hsu. A High Resolution Melting Analysis (HRM)-based Genotyping Toolkit for Chilling Requirement in Peach (Prunus persica). Submitted to International Journal of Molecular Sciences on 11/29/2019.