Chen Hsieh / Jan 07 2020

HRM analysis with PCA and clustering

Principal Component Analysis (PCA) of High Resolution Melting (HRM) Analysis for high throughput genotyping

Dependencies installation

install.packages("tidyverse")install.packages("mclust")install.packages("plot3D")library(mclust)library(plot3D)library(tidyverse)

1. Import raw data

(1) File upload

If you choose to try this script on NextJournal, simply remix this notebook and drag your file with corresponding format as shown below.

Let's take a glance at the example file with the name of "PCA_HRM.example.csv".

PCA_HRM.example.csv

0 items

(2) Create a data frame to store normalized data

exported.data.file=read.csv(PCA_HRM.example.csv
,header=T)head(exported.data.file)

0 items

2. Select melting region

# max_lim = readline('Enter the upper limit of the melt region: ')# min_lim = readline('Enter the lower limit of the melt region: ')max_lim = 83.9min_lim = 77.6exported.data.file <- subset(exported.data.file, Temperature>min_lim & Temperature<max_lim)  # Excluding the pre- and post-melt regionnormfluor = exported.data.file #to store the normalized datahead(normfluor)

0 items

3. Make absolute fluorescence into percentage

for (i in 2:length(exported.data.file[1,])){sample.fluorescence.norm <-(exported.data.file[,i])/max(exported.data.file[,i])normfluor[,i]=sample.fluorescence.norm}normfluor

0 items

4. Make the medium curve

medcurve <- apply(normfluor, 1, median)

5. Use median curve to normalize percentage melt curves

for(i in 2:length(normfluor[1,])){  normfluor[,i]=normfluor[,i]-medcurve}normfluor

0 items

6. Principal component analysis (PCA)

PCA.analysis.file <-prcomp(normfluor[,2:28], scale = FALSE) plot(PCA.analysis.file,type="line",  main="PCs of HRM") # Pareto plot

7. Clustering

cluster.data <- Mclust(PCA.analysis.file$rotation[,1:3], G = 3)##CHANGE HOW MANY PCs TO BE SELECTED HERE!!!!!!!!!!!df.PCA.MM1 <- as.data.frame(PCA.analysis.file$rotation[,1:3])##CHANGE HOW MANY PCs TO BE SELECTED HERE!!!!!!!!!!!## report the variant callcluster.data$classification## preview plots#PC1 vs PC2plot(df.PCA.MM1[,1:2], bg=cluster.data$classification,pch=21,xlab='PC1', ylab='PC2')identify(df.PCA.MM1[,1:2], labels = rownames(df.PCA.MM1))#PC2 vs PC3, ignore this if only 2 PCs are selectedplot(df.PCA.MM1[,2:3], bg=cluster.data$classification,pch=21,xlab='PC2', ylab='PC3')identify(df.PCA.MM1[,2:3], labels = rownames(df.PCA.MM1))#PC1 vs PC3, ignore this if only 2 PCs are selectedplot(df.PCA.MM1[,c(1,3)], bg=cluster.data$classification, pch=21, xlab='PC1', ylab='PC3')identify(df.PCA.MM1[,c(1,3)], labels = rownames(df.PCA.MM1))#3D plotpoints3D(x=df.PCA.MM1$PC1,y=df.PCA.MM1$PC2,z=df.PCA.MM1$PC3,          colvar=cluster.data$classification,          xlab = 'PC1', ylab = 'PC2', zlab = 'PC3', pch = 16,         colkey = F, col=c('black', 'red', 'green')         )identify(df.PCA.MM1[,1:3], labels = rownames(df.PCA.MM1))

Link to github

This NextJournal notebook is used to show the functionality and reproducibility of the source code. The full version R script with interactive plotting is available on GitHub, (https://github.com/choulin2/PCA_HRM.git).

Citation

Chou, L., S.J. Huang, C. Hsieh, M.T. Lu, C.W. Song, F.C. Hsu. A High Resolution Melting Analysis (HRM)-based Genotyping Toolkit for Chilling Requirement in Peach (Prunus persica). Submitted to International Journal of Molecular Sciences on 11/29/2019.

HRM analysis with PCA and clustering

Dependencies installation

1. Import raw data

(1) File upload

(2) Create a data frame to store normalized data

2. Select melting region

3. Make absolute fluorescence into percentage

4. Make the medium curve

5. Use median curve to normalize percentage melt curves

6. Principal component analysis (PCA)

7. Clustering

Link to github

Citation

Runtimes (1)