HRM analysis with PCA and clustering
Principal Component Analysis (PCA) of High Resolution Melting (HRM) Analysis for high throughput genotyping
Dependencies installation
install.packages("tidyverse")install.packages("mclust")install.packages("plot3D")library(mclust)library(plot3D)library(tidyverse)1. Import raw data
(1) File upload
If you choose to try this script on NextJournal, simply remix this notebook and drag your file with corresponding format as shown below.
Let's take a glance at the example file with the name of "PCA_HRM.example.csv".
(2) Create a data frame to store normalized data
exported.data.file = read.csv(PCA_HRM.example.csv, header = T)head(exported.data.file)2. Select melting region
# max_lim = readline('Enter the upper limit of the melt region: ')# min_lim = readline('Enter the lower limit of the melt region: ')max_lim = 83.9min_lim = 77.6exported.data.file <- subset(exported.data.file, Temperature>min_lim & Temperature<max_lim) # Excluding the pre- and post-melt regionnormfluor = exported.data.file #to store the normalized datahead(normfluor)3. Make absolute fluorescence into percentage
for (i in 2:length(exported.data.file[1,])){sample.fluorescence.norm <-(exported.data.file[,i])/max(exported.data.file[,i])normfluor[,i]=sample.fluorescence.norm}normfluor4. Make the medium curve
medcurve <- apply(normfluor, 1, median)5. Use median curve to normalize percentage melt curves
for(i in 2:length(normfluor[1,])){ normfluor[,i]=normfluor[,i]-medcurve}normfluor6. Principal component analysis (PCA)
PCA.analysis.file <-prcomp(normfluor[,2:28], scale = FALSE) plot(PCA.analysis.file,type="line", main="PCs of HRM") # Pareto plot7. Clustering
cluster.data <- Mclust(PCA.analysis.file$rotation[,1:3], G = 3)##CHANGE HOW MANY PCs TO BE SELECTED HERE!!!!!!!!!!!df.PCA.MM1 <- as.data.frame(PCA.analysis.file$rotation[,1:3])##CHANGE HOW MANY PCs TO BE SELECTED HERE!!!!!!!!!!!## report the variant callcluster.data$classification## preview plots#PC1 vs PC2plot(df.PCA.MM1[,1:2], bg=cluster.data$classification,pch=21,xlab='PC1', ylab='PC2')identify(df.PCA.MM1[,1:2], labels = rownames(df.PCA.MM1))#PC2 vs PC3, ignore this if only 2 PCs are selectedplot(df.PCA.MM1[,2:3], bg=cluster.data$classification,pch=21,xlab='PC2', ylab='PC3')identify(df.PCA.MM1[,2:3], labels = rownames(df.PCA.MM1))#PC1 vs PC3, ignore this if only 2 PCs are selectedplot(df.PCA.MM1[,c(1,3)], bg=cluster.data$classification, pch=21, xlab='PC1', ylab='PC3')identify(df.PCA.MM1[,c(1,3)], labels = rownames(df.PCA.MM1))#3D plotpoints3D(x=df.PCA.MM1$PC1,y=df.PCA.MM1$PC2,z=df.PCA.MM1$PC3, colvar=cluster.data$classification, xlab = 'PC1', ylab = 'PC2', zlab = 'PC3', pch = 16, colkey = F, col=c('black', 'red', 'green') )identify(df.PCA.MM1[,1:3], labels = rownames(df.PCA.MM1))Link to github
This NextJournal notebook is used to show the functionality and reproducibility of the source code. The full version R script with interactive plotting is available on GitHub, (https://github.com/choulin2/PCA_HRM.git).
Citation
Chou, L., S.J. Huang, C. Hsieh, M.T. Lu, C.W. Song, F.C. Hsu. A High Resolution Melting Analysis (HRM)-based Genotyping Toolkit for Chilling Requirement in Peach (Prunus persica). Submitted to International Journal of Molecular Sciences on 11/29/2019.