Interactive circle packing plots
Introduction
I was looking for something suited to visualizing hierarchical categorical data that goes beyond regular bar graphs. This D3 zoomable circle packing visualization, made with the circlepackeR package, uses a series of nested circles that you can click on to zoom in and out.
To learn more, please see the official documentation by the package author.
If you want to try this yourself, click "Remix" in the upper right corner to get a copy of the notebook in your own workspace. Please remember to import both the Python (circle_packing_Python) and R (circle_packing_R) runtimes from this notebook under "Runtime Settings" so that all the required packages are installed and you can start right away.
Import and pre-process data
As usual, we will use the IBM Telco customer churn dataset, which I have cleaned up in a previous post.
Since I'm quite a bit more comfortable with data wrangling in Python, I will first count the customers in each level of every categorical variable using pandas:
## Import data
import pandas as pd
df = pd.read_csv("https://github.com/nchelaru/data-prep/raw/master/telco_cleaned_renamed.csv")
## Get categorical column names (columns stored with object dtype)
cat_list = []
for col in df.columns:
    if df[col].dtype == object:
        cat_list.append(col)
## Get all possible levels of every categorical variable and number of data points in each level
cat_levels = {}
for col in cat_list:
    levels = df[col].value_counts().to_dict()
    cat_levels[col] = levels
## Convert nested dictionary to long-format dataframe: one row per (level, variable) pair
nestdict = pd.DataFrame(cat_levels).stack().reset_index()
nestdict.columns = ['Level', 'Category', 'Population']
## Output data to file (index=False keeps the row index out of the CSV)
nestdict.to_csv("./results/nested_dict.csv", index=False)
## Preview dataframe
nestdict.head()
| | Level | Category | Population |
|---|---|---|---|
| 0 | Bank transfer (automatic) | PaymentMethod | 1542.0 |
| 1 | Churn | Churn | 1869.0 |
| 2 | Credit card (automatic) | PaymentMethod | 1521.0 |
| 3 | DSL | InternetService | 2416.0 |
| 4 | Dependents | Dependents | 2099.0 |
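The counting step above boils down to tallying every (level, variable) pair. As a minimal sketch of that logic using only the standard library (not the author's pandas code), the helper below takes a hypothetical list of row dictionaries and an explicit list of categorical columns; `level_counts` and the toy records are illustrative assumptions, not part of the original notebook.

```python
from collections import Counter

def level_counts(records, categorical):
    """Return sorted (Level, Category, Population) tuples for every level
    of every listed categorical column (hypothetical helper)."""
    counts = Counter()
    for row in records:
        for col in categorical:
            # Key on (level, column) so a level name such as "Yes" that
            # appears in several columns is counted separately per column
            counts[(row[col], col)] += 1
    return sorted((level, cat, n) for (level, cat), n in counts.items())

# Hypothetical toy records standing in for rows of the cleaned Telco data;
# the numeric "tenure" column is simply not listed as categorical
toy = [
    {"Churn": "Churn", "InternetService": "DSL", "tenure": 1},
    {"Churn": "No churn", "InternetService": "DSL", "tenure": 5},
    {"Churn": "No churn", "InternetService": "Fiber optic", "tenure": 12},
]
for row in level_counts(toy, ["Churn", "InternetService"]):
    print(row)
```

Because every row contributes exactly one level per variable, the populations within each variable always sum to the number of rows, which is why each variable's circle in the final plot ends up roughly the same overall size.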
Create circle packing visualization
Now we will take the prepared data and move to R to make the plot:
## List installed packages that are not part of base R or the recommended set
ip <- as.data.frame(installed.packages()[, c(1, 3:4)])
rownames(ip) <- NULL
ip <- ip[is.na(ip$Priority), 1:2, drop = FALSE]
print(ip, row.names = FALSE)
## Import libraries
library(tidyverse)
library(circlepackeR)
library(hrbrthemes)
library(htmlwidgets)
library(data.tree)
## Import data: the CSV written by the Python step (adjust the path if needed)
nestdict <- read.csv("nested_dict.csv")
## Prepare data format: each row's path in the tree is world/<Category>/<Level>
nestdict$pathString <- paste("world", nestdict$Category, nestdict$Level, sep = "/")
population <- as.Node(nestdict)
## Make the plot, sizing circles by the Population column
x <- circlepackeR(population, size = "Population",
                  color_min = "hsl(56,80%,80%)", color_max = "hsl(341,30%,40%)")
## Save widget to HTML file for display
saveWidget(x, 'widget.html')
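The pathString column is what tells data.tree where each row sits: under the root "world", then under its variable, then as a leaf for its level. As an illustration only (this is not how data.tree or circlepackeR work internally), the Python sketch below nests the same slash-separated paths into the kind of name/children structure that D3 hierarchy layouts consume. The key names `name`, `children` and `size`, the `paths_to_tree` helper, and the sizes are all assumptions chosen for illustration.

```python
import json

def paths_to_tree(rows, root="world"):
    """Nest (path, size) pairs into {"name", "children", "size"} dicts.

    Each path is slash-separated, like "world/Churn/No churn"; the last
    segment becomes a leaf carrying its size. Hypothetical helper.
    """
    tree = {"name": root, "children": []}
    for path, size in rows:
        parts = path.split("/")
        if parts[0] != root:
            raise ValueError(f"path {path!r} does not start at {root!r}")
        node = tree
        for part in parts[1:-1]:
            # Reuse an existing child with this name, or create it
            child = next((c for c in node["children"] if c["name"] == part), None)
            if child is None:
                child = {"name": part, "children": []}
                node["children"].append(child)
            node = child
        node["children"].append({"name": parts[-1], "size": size})
    return tree

# Paths built the same way as paste("world", Category, Level, sep = "/");
# the sizes are hypothetical
rows = [
    ("world/Churn/Churn", 30),
    ("world/Churn/No churn", 70),
    ("world/InternetService/DSL", 40),
]
print(json.dumps(paths_to_tree(rows), indent=2))
```

One practical consequence of the path-string approach: a level name that itself contains a "/" would be split into an extra tree level, so such values would need to be cleaned before building the paths.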
Finally, move the HTML file to the results folder so we can visualize it. Try clicking on the circles!
mv widget.html ./results
At a glance, the relative sizes of the circles nested inside each variable's circle give a quick overview of how customers are distributed across that variable's levels. Click on the circles to zoom in and out!
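When reading the plot, keep in mind that circle packing encodes counts as area rather than radius. By default, D3's pack layout (which circlepackeR builds on) sizes leaf circles so that their area is proportional to their value, i.e. the radius grows with the square root of the count. The sketch below, using hypothetical counts for one variable, shows what that means: a level with a quarter of the customers of another gets half the radius.

```python
import math

def relative_radii(counts):
    """Radius of each circle relative to the largest one, assuming
    circle area is proportional to the count."""
    largest = max(counts.values())
    return {name: math.sqrt(n / largest) for name, n in counts.items()}

def shares(counts):
    """Fraction of the variable's customers that fall in each level."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

# Hypothetical counts for the levels of one categorical variable
counts = {"Level A": 100, "Level B": 25}
print(relative_radii(counts))
print(shares(counts))
```

Because the eye tends to compare diameters, differences between levels can look smaller than they are, so this kind of plot is best for a rough overview rather than precise comparisons.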
When the occasion is right, this could be a really fun way to add some pizzazz to your visualizations. :)