Fairness.jl - Fairness Toolkit in Julia
Fairness.jl is a new bias audit and mitigation toolkit in Julia, designed to solve the practical problems practitioners face with existing fairness toolkits.
This notebook introduces Fairness.jl, its power, and its uniqueness by means of a real-life example: the COMPAS dataset.
But before we begin the introduction, let us be clear about why we need this package.
Why should I care about fairness, ethics, etc.?
Machine learning is involved in many crucial decision-support tools, whose use ranges from granting parole and shortlisting job applications to accepting credit applications. A number of recent political and policy developments have pointed out the transparency issues and data bias in these tools, so it has become crucial for the machine learning community to think about fairness and bias. Eliminating bias is not as easy as it might seem at first glance. This toolkit helps you easily audit and minimize bias through a collection of fairness metrics and algorithms.
In this example, we will use the COMPAS dataset to predict whether a criminal defendant will recidivate (re-offend). A neural network classifier is used for classification. It is wrapped with the Reweighing algorithm to preprocess the data. This wrapped model is then wrapped again with the Equalized Odds algorithm for postprocessing of predictions.
Downloading Required Packages
To use the package, you first have to install it through Julia's package manager.
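For example, the installation can be done from the Julia REPL (package names as used throughout this notebook):

```julia
using Pkg

Pkg.add("Fairness")  # the bias audit and mitigation toolkit
Pkg.add("MLJ")       # used below for model composition and evaluation
```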
Now we import the required packages. Note that this will take about 5 minutes on the first run, as is the case with all Julia packages: Julia pre-compiles the code to make it more efficient. This is a one-time cost; from the second run onwards, everything will be fast!
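The imports for this notebook amount to just two statements:

```julia
using Fairness  # fairness datasets, metrics, and algorithm wrappers
using MLJ       # machine learning framework for composition and evaluation
```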
There is an issue with curl in the Nextjournal environment, so curl needs to be uninstalled.
Load the COMPAS dataset using the macro provided by Fairness.jl
This dataset has 8 features and 6907 rows. The protected attribute here is race. Using the 8 features, the task is to predict whether a criminal defendant will recidivate (re-offend).
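Assuming the `@load_compas` macro exported by Fairness.jl, loading the dataset is a one-liner:

```julia
using Fairness

# X is a table with the 8 features (including the protected attribute :race);
# y is the binary recidivism target
X, y = @load_compas
```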
Multi-valued Protected Attribute
Notice in the output of the previous cell that the column race has 6 different possible values: "Native American", "African-American", "Caucasian", "Hispanic", "Asian", and "Other".
We support multi-valued protected attributes in both fairness algorithms and metrics. The fairness algorithms from the research literature have been generalized to handle multiple values of a protected attribute.
Load Neural Network Classifier
We will use MLJFlux to load the neural network classifier. MLJFlux is an interface between Flux and MLJ. You don't need to explicitly import MLJFlux; MLJ does all that for you!
We use the @load macro to load NeuralNetworkClassifier into Main. Then we use @pipeline to chain a ContinuousEncoder with the neural network. ContinuousEncoder converts categorical strings to continuous values, which makes a much wider range (~50) of models usable!
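A sketch of these two steps, using the MLJ macros available at the time of writing (the exact behavior of `@load` varies across MLJ versions, and `@pipeline` has since been superseded by the `|>` syntax in newer releases):

```julia
using MLJ

# Load the Flux-based classifier; MLJ pulls in MLJFlux behind the scenes
model = @load NeuralNetworkClassifier pkg=MLJFlux

# Chain a ContinuousEncoder in front of the network so that categorical
# features are converted to continuous values before training
pipe = @pipeline ContinuousEncoder model
```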
Fairness Algorithm Wrappers
We first wrap the neural network classifier with the Reweighing algorithm. This wrapped model is then wrapped again with the LinProg postprocessing algorithm.
Notice how the use of wrappers provides composability and enables you to apply any number of algorithms to a single classifier.
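The double wrapping might look as follows; the wrapper names and keyword arguments are taken from the Fairness.jl documentation, and `ConstantClassifier` is used here only as a stand-in for the neural-network pipeline built above:

```julia
using MLJ, Fairness

# Stand-in base model; in the notebook this is the neural-network pipeline
base = ConstantClassifier()

# Preprocessing wrapper: resamples/reweighs training data to balance
# the protected groups
wrappedModel = ReweighingSamplingWrapper(classifier=base, grp=:race)

# Postprocessing wrapper: adjusts predictions via a linear program to
# optimize the chosen fairness measure
wrappedModel2 = LinProgWrapper(classifier=wrappedModel, grp=:race,
                               measure=false_positive_rate)
```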
Automatic evaluation using MLJ.evaluate
Using the evaluate function from MLJ, you only need to pass your model, data, and the metrics of interest. MLJ handles the rest of the work internally. Note that you need to wrap the metrics to specify the protected attribute.
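A sketch of such a call, assuming the `MetricWrapper` helper from Fairness.jl (again with a stand-in classifier in place of the wrapped model):

```julia
using MLJ, Fairness

X, y = @load_compas
model = ConstantClassifier()  # stand-in for the wrapped model

# MetricWrapper attaches the protected attribute to a metric so that
# evaluate can report group-wise values
evaluate(model, X, y,
         measures=MetricWrapper(false_positive_rate, grp=:race))
```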
Finer Control (Advanced)
You can get finer control than what evaluate provides by performing each step manually.
First, we need to get the train and test indices. This will be provided by the partition function.
A machine is used to package the dataset together with the wrapped model (reused from before).
The machine is then fitted on the training rows.
Now we call the predict function on the machine, restricted to the rows specified by test.
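These steps might look as follows; `ConstantClassifier` again stands in for the wrapped model, and the split parameters are illustrative:

```julia
using MLJ

X, y = @load_compas

# Split row indices 70/30 into train and test
train, test = partition(eachindex(y), 0.7, shuffle=true, rng=123)

model = ConstantClassifier()  # stand-in for the wrapped model

# A machine packages the model together with the data
mach = machine(model, X, y)
fit!(mach, rows=train)        # fit only on the training rows

ŷ = predict(mach, rows=test)  # predictions on the held-out rows
```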
We use the concept of fairness tensors to avoid redundant calculations. Refer to https://www.ashrya.in/Fairness.jl/dev/fairtensor/ to learn more about fairness tensors.
We pass the predictions, ground truth, and protected-attribute values to the fair_tensor function.
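A self-contained sketch of the fairness-tensor construction, assuming the `fair_tensor` constructor from Fairness.jl (the stand-in classifier and split are illustrative):

```julia
using MLJ, Fairness

X, y = @load_compas
train, test = partition(eachindex(y), 0.7, shuffle=true, rng=123)

mach = machine(ConstantClassifier(), X, y)  # stand-in for the wrapped model
fit!(mach, rows=train)

# fair_tensor expects deterministic (categorical) predictions, so take
# the mode of the probabilistic predictions
ŷ = mode.(predict(mach, rows=test))

# Build the fairness tensor from predictions, ground truth, and the
# protected-attribute values for the test rows
ft = fair_tensor(ŷ, y[test], X.race[test])
```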
Disparity can be calculated by passing the following to the disparity function:
- An array of fairness metrics from the ones listed in the README
- The fairness tensor that we calculated in the previous step
- func: the disparity value for a metric M, group A, and reference group B is func(M(A), M(B)). The default value for func is division, so this argument is optional.
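A sketch of the call, with argument names taken from the Fairness.jl documentation; `ft` is the fairness tensor computed in the previous step:

```julia
using Fairness

# Disparity of the false positive rate for every group relative to the
# reference group "Caucasian"; func defaults to division and is shown
# here only for illustration
df_disparity = disparity([false_positive_rate], ft;
                         refGrp="Caucasian", func=(/))
```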
The values above show that Asians and African-Americans have a higher False Positive Rate than the reference group, Caucasian. On the other hand, Native Americans have a lower False Positive Rate than Caucasians. But these disparity values are still better than they would be if the neural network classifier were used directly.
To calculate parity, we pass the disparity values computed in the previous step to the parity function:
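A minimal sketch, assuming `parity` accepts the DataFrame returned by `disparity`:

```julia
# df_disparity is the DataFrame returned by disparity in the previous step;
# parity appends a Boolean column per metric indicating whether each group
# satisfies the parity constraint
df_parity = parity(df_disparity)
```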
Scroll the output to the right to see the column for parity values.
The above parity outputs show that parity constraints for False Positive Rate are satisfied only by the groups: Other and Caucasian.
Visualizing improvement by Fairness Algorithm
Now we will use VegaLite to visualize the improvement in fairness metrics due to the fairness algorithms added in the form of wrappers. We shall also visualize the drop in accuracy due to the trade-off between accuracy and fairness.
wrappedModel2 is the ML model we previously wrapped with the Reweighing and LinProg algorithms.
Summary of what the following code does:
- Evaluate metric values using MLJ.evaluate for both the wrapped model and the original model
- Collect the metric values from the result of the evaluate function
- Create a DataFrame from the collected values, which will later be used with VegaLite to plot the graphs
Now, let us add the values of fairness metrics corresponding to scores by Northpointe’s COMPAS algorithm.
Improvement in False Positive Rate Disparity Values
The above plot shows that there was a high bias against the group "African-American" in the NeuralNetworkClassifier (ML model). The False Positive Rate Disparity value is greater than 2.0 for this group, while it is close to 1.0 for the others. This means that a person belonging to the group "African-American" is twice as likely as members of other groups to be falsely predicted as a criminal who would re-offend!
But in the case of the wrapped model, the False Positive Rate disparity for "African-American" has been reduced to about 1.3, roughly the same as most other groups.
The above plot shows that there is a drop in accuracy when using the wrapped model. This is a direct consequence of the fairness-accuracy tradeoff: we obtain a fairer model at the cost of some accuracy.
Fairness vs Accuracy Comparison across Algorithms
This toolkit has been designed to solve the numerous problems faced by policy makers, researchers, and practitioners while using fairness toolkits. The innovative features of this package are listed explicitly at https://github.com/ashryaagr/Fairness.jl#what-fairnessjl-offers-over-its-alternatives
We are open to contributions. Feel free to open an issue on GitHub if you want to contribute or have any questions about the package. We would love to help you get started with this package.
Finally, this work would have been impossible without the immense support, novel ideas, and efforts made by Jiahao Chen, Sebastian Vollmer, and Anthony Blaom.
Ashrya Agrawal (firstname.lastname@example.org)
Link to Github Repository for Fairness.jl: https://github.com/ashryaagr/Fairness.jl