Rotation of principal components

To make the relationships between variables and PCs easier to interpret, the PCs can be rotated so that each one has high loadings for a few variables and low loadings for the rest. In other words, each PC becomes strongly correlated with only a small number of variables. The most common form of rotation is varimax rotation, a generalized form of which is implemented in the PCAmixdata package for mixed data.
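As a rough illustration of what rotation does to loadings, here is a minimal sketch that uses only base R (`prcomp()` and `stats::varimax()`) on the built-in `mtcars` data. It is not the generalized procedure that PCAmixdata applies to mixed data; the choice of dataset and columns, and the names `pca`, `raw_load`, `rot`, and `rot_load`, are assumptions made purely for illustration.

```r
## Toy example with base R only (not PCAmixdata): classical varimax on two PCs
## The dataset and the selected columns are arbitrary, illustrative choices
pca <- prcomp(mtcars[, c("mpg", "disp", "hp", "wt", "qsec")], scale. = TRUE)

## Loadings of the first two PCs: eigenvectors scaled by the PC standard deviations
raw_load <- pca$rotation[, 1:2] %*% diag(pca$sdev[1:2])

## varimax() returns the rotated loadings and the orthogonal rotation matrix
rot <- varimax(raw_load)
rot_load <- unclass(rot$loadings)

## Squared loadings before and after rotation, side by side
print(round(cbind(raw_load^2, rot_load^2), 3))

## The rotation is orthogonal, so the total squared loading of each variable
## across the two PCs stays the same; only its split between them changes
print(round(cbind(before = rowSums(raw_load^2),
                  after  = rowSums(rot_load^2)), 3))
```

The point of the sketch is that rotation only redistributes each variable's squared loading between the retained components. The overall amount of variation explained by the two components together stays the same.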

Here we visualize how varimax rotation changes the relationships between the variables and the first two PCs.

## Import libraries
library(PCAmixdata)
library(FactoMineR)
library(factoextra)

## Import data
df <- read.csv('https://github.com/nchelaru/data-prep/raw/master/telco_cleaned_renamed.csv')

## Drop the TotalCharges variable, as it is a product of MonthlyCharges and Tenure
df <- within(df, rm('TotalCharges'))

## Split quantitative and qualitative variables
split <- splitmix(df)

## FAMD
res.pcamix <- PCAmix(X.quanti=split$X.quanti,  
                     X.quali=split$X.quali, 
                     rename.level=TRUE, 
                     graph=FALSE, 
                     ndim=25)

## Add "Churn" as a supplementary variable
res.sup <- supvar(res.pcamix,  
                  X.quanti.sup = NULL, 
                  X.quali.sup = df[19], 
                  rename.level=TRUE)

## Apply varimax rotation to the first two PCs
res.pcarot <- PCArot(res.sup,
                     dim=2,
                     graph=FALSE)

## Visualize squared loadings before rotation
plot(res.sup, 
     choice="sqload", 
     coloring.var=TRUE, 
     axes=c(1, 2),
     leg=TRUE, posleg="topleft", main="Variables before rotation",
     xlim=c(0,1), ylim=c(0,1))

## Visualize squared loadings after rotation
plot(res.pcarot, 
     choice="sqload", 
     coloring.var=TRUE,
     axes=c(1, 2),
     leg=TRUE, posleg="topright", main="Variables after rotation", 
     xlim=c(0,1), ylim=c(0,1))

After rotation, MonthlyCharges and InternetService have higher loadings on the rotated PC1, and Tenure and Contract have higher loadings on the rotated PC2 (their projections are more closely aligned with one axis or the other). This suggests that these four variables contribute most to the variation captured by the first two PCs.

Interestingly, the correlation between Churn and PC2 has decreased after rotation, while its loading on PC1 has increased.