Variable contributions
Whereas factor loading and squared loading measure how well a given PC "describes" the variation captured in a variable, contribution describes the converse: how much a variable accounts for the total variation captured by a given PC. It is important to compare the squared loading and contribution of each variable to critically assess its relationship with a given PC. A variable that is important for a PC may not be well represented by that same PC, which warrants a very different interpretation from the converse case.
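To make the distinction concrete, here is a minimal sketch that computes both quantities by hand. It assumes an ordinary standardized PCA (base R `prcomp`) on the built-in `mtcars` data as a stand-in, not FAMD on the telco data, and the object names `coord`, `cos2` and `contrib` are just illustrative.

```r
## Standardized PCA on a built-in dataset (a stand-in for illustration only)
pca <- prcomp(mtcars, scale. = TRUE)

## Variable coordinates (loadings): unit eigenvectors scaled by each PC's standard deviation
coord <- sweep(pca$rotation, 2, pca$sdev, `*`)

## Squared loading: share of a variable's variance that each PC captures
## (each row sums to 1 across all PCs)
cos2 <- coord^2

## Contribution: share (in %) of a PC's variance that each variable accounts for
## (each column sums to 100 across all variables)
contrib <- 100 * sweep(cos2, 2, colSums(cos2), `/`)

## Side-by-side comparison for the first PC
round(cbind(cos2 = cos2[, 1], contrib = contrib[, 1]), 2)
```

Reading across a row of `cos2` shows how a variable's variance is spread over the PCs, while reading down a column of `contrib` shows how a PC's variance is shared among the variables. A variable can score high on one and low on the other, particularly for PCs with small eigenvalues.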
Top contributing variables to the first few PCs can provide insights into which variables underlie variations in the dataset, and may help with feature selection for downstream analyses. The factoextra package, a companion to FactoMineR, can be used to visualize the top contributing variables to each PC. In these plots, the red dashed line indicates the expected average contribution (100% contribution divided by the total number of variables available in the dataset). Variables whose contribution exceeds this cut-off can be considered important contributors to the PC.
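The cut-off itself is simple arithmetic. As a rough sketch, again assuming a plain standardized PCA on the built-in `mtcars` data rather than the FAMD results, the expected average contribution and the variables that exceed it for the first PC could be computed like this (`contrib_pc1` and `cutoff` are illustrative names):

```r
## Standardized PCA on a built-in dataset (a stand-in for illustration only)
pca <- prcomp(mtcars, scale. = TRUE)

## Contributions (in %) of each variable to PC1; the squared unit eigenvector sums to 1
contrib_pc1 <- 100 * pca$rotation[, 1]^2

## Expected contribution if every variable contributed equally
cutoff <- 100 / length(contrib_pc1)

## Variables contributing more than expected, largest first
sort(contrib_pc1[contrib_pc1 > cutoff], decreasing = TRUE)
```

With 11 variables in `mtcars`, the cut-off is 100 / 11, or roughly 9.1%.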
Here we plot the top 10 contributing variables to each of the first two PCs separately, and then look at both PCs together. Note that, as varimax rotation of FAMD results is not possible in the FactoMineR package, the PCs visualized here are unrotated.
## Import libraries
library(FactoMineR)
library(factoextra)

## Import data
df <- read.csv('https://github.com/nchelaru/data-prep/raw/master/telco_cleaned_renamed.csv')

## FAMD, set the target variable "Churn" as a supplementary variable, so it is not included in the analysis for now
res.famd <- FAMD(df, sup.var = 19, graph = FALSE, ncp = 25)

fviz_contrib(res.famd, choice = "var", axes = 1, top = 10)
fviz_contrib(res.famd, choice = "var", axes = 2, top = 10)
fviz_contrib(res.famd, choice = "var", axes = 1:2, top = 20)
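When two PCs are looked at together, the per-PC contributions have to be combined somehow. One natural way is to weight each PC's contributions by its eigenvalue, which gives the share of the variance captured by both PCs that each variable accounts for. The sketch below illustrates this on a plain standardized PCA of the built-in `mtcars` data. That `fviz_contrib` combines axes in exactly this way is my assumption from reading the factoextra documentation, so treat this as an illustration rather than a description of its internals.

```r
## Standardized PCA on a built-in dataset (a stand-in for illustration only)
pca <- prcomp(mtcars, scale. = TRUE)

## Eigenvalues: variance captured by each PC
eig <- pca$sdev^2

## Contributions (in %) of each variable to each PC
contrib <- 100 * pca$rotation^2

## Eigenvalue-weighted average of the contributions to PC1 and PC2
## (assumed, not confirmed, to match how fviz_contrib handles axes = 1:2)
combined <- (contrib[, 1] * eig[1] + contrib[, 2] * eig[2]) / (eig[1] + eig[2])

## Top three contributors to the first two PCs taken together
head(sort(combined, decreasing = TRUE), 3)
```

Because PC1 usually has the larger eigenvalue, the combined ranking tends to resemble the PC1 ranking more than the PC2 one.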
These figures suggest that the variables MonthlyCharges, InternetService and Tenure play a substantial part in accounting for the overall variation in this dataset, and so may warrant further investigation.
In the next notebook, we will delve deeper into examining relationships between variables.
Til then! :)