Fairness.jl - Fairness Toolkit in Julia

Fairness.jl is a new bias audit and mitigation toolkit in Julia, designed to address the practical problems practitioners face with existing fairness toolkits.

This notebook introduces Fairness.jl and demonstrates its power and uniqueness through a real-life example: the COMPAS dataset.

But before we begin, let us be clear about why we need this package.

Why should I care about fairness, ethics, etc.?

Machine learning is involved in many crucial decision-support tools. Their uses range from granting parole and shortlisting job applications to accepting credit applications. Recent political and policy developments have pointed out the transparency issues and data bias in these machine learning tools, so it has become crucial for the machine learning community to think about fairness and bias. Eliminating bias is not as easy as it might seem at first glance. This toolkit helps you easily audit and minimize bias through a collection of fairness metrics and algorithms.


In this example, we use the COMPAS dataset to predict whether a criminal defendant will recidivate (re-offend). A neural network classifier is used for classification. It is wrapped with the Reweighing algorithm to preprocess the data, and this wrapped model is then wrapped with the LinProg algorithm (an equalized-odds-style step) to postprocess the predictions.

Downloading Required Packages

To get started, install the required packages:

using Pkg
Pkg.activate("my_environment", shared=true)
Pkg.add("Fairness") # The fairness toolkit itself
Pkg.add("MLJ") # Toolkit for machine learning
Pkg.add("PrettyPrinting") # For readability of outputs

Now we import the required packages. Note that the first run can take around 5 minutes; this is the case with all Julia packages, since Julia precompiles code to make it more efficient. It is a one-time cost: from the second run onward, everything will be fast!

using Fairness
using MLJ
using PrettyPrinting

There is an issue with curl in the Nextjournal environment, so curl needs to be uninstalled:

;sudo apt-get remove curl

Load the COMPAS dataset using the macro provided by Fairness.jl

This dataset has 8 features and 6907 rows. The protected attribute here is race. Using these 8 features, we predict whether a criminal defendant will recidivate (re-offend).

data = @load_compas
X, y = data
data |> pprint

Multi-valued Protected Attribute

Notice in the output of the previous cell that the column race has 6 possible values: "Native American", "African-American", "Caucasian", "Hispanic", "Asian", and "Other".

We support multi-valued protected attributes in both fairness algorithms and metrics. The fairness algorithms from the research literature have been generalized to handle multiple values of a protected attribute.
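To see what "multi-valued" means in practice, here is a plain-Julia sketch (toy data, not the actual COMPAS column): with more than two levels, metrics and algorithms must be computed per group rather than for a single privileged/unprivileged pair.

```julia
# Toy protected-attribute column with more than two levels
race = ["Caucasian", "African-American", "Asian",
        "African-American", "Hispanic", "Caucasian"]

# Tally the size of each group; fairness metrics are then
# evaluated once per level of the attribute, not just for a binary split
group_sizes = Dict(g => count(==(g), race) for g in unique(race))

group_sizes["Caucasian"]  # 2
```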

Load Neural Network Classifier

We will use MLJFlux to load the neural network classifier. MLJFlux is an interface between Flux and MLJ. You don't need to import MLJFlux explicitly; MLJ does that for you!

We use the @load macro to load NeuralNetworkClassifier into Main. Then we use @pipeline to prepend a ContinuousEncoder to the neural network. ContinuousEncoder converts categorical features to continuous values, making the data compatible with a much wider range (~50) of models!

@load NeuralNetworkClassifier
model = @pipeline ContinuousEncoder NeuralNetworkClassifier
Pipeline373( continuous_encoder = ContinuousEncoder( drop_last = false, one_hot_ordered_factors = false), neural_network_classifier = NeuralNetworkClassifier( builder = Short @703, finaliser = softmax, optimiser = ADAM(0.001, (0.9, 0.999), IdDict{Any,Any}()), loss = crossentropy, epochs = 10, batch_size = 1, lambda = 0.0, alpha = 0.0, optimiser_changes_trigger_retraining = false)) @730

Fairness Algorithm Wrappers

We first wrap the neural network classifier with the Reweighing algorithm. This wrapped model is again wrapped with the LinProg postprocessing algorithm.

Notice how the use of wrappers provides composability, enabling you to apply any number of algorithms to a single classifier.

wrappedModel = ReweighingSamplingWrapper(classifier=model, grp=:race)
wrappedModel2 = LinProgWrapper(classifier=wrappedModel, grp=:race, measure=false_positive_rate)
LinProgWrapper( grp = :race, classifier = ReweighingSamplingWrapper( grp = :race, classifier = Pipeline373 @730, factor = 1.0, rng = _GLOBAL_RNG()), measure = false_positive_rate( rev = nothing)) @828

Automatic evaluation using MLJ.evaluate

Using the evaluate function from MLJ, you only need to pass your model, data, and the metrics of interest; MLJ handles the rest internally. Note that you need to wrap the metrics to specify the protected attribute.

evaluate(wrappedModel2,
  X, y,
  measures=[
    Disparity(false_positive_rate, refGrp="Caucasian", grp=:race),
    MetricWrapper(accuracy, grp=:race)]) |> pprint

Finer Control (Advanced)

You can get greater control than what the evaluate function provides.

  • First, we need the train and test indices, which the partition function provides.

  • machine packages the dataset with the wrapped model (reused from before).

  • The machine is then fitted on the training rows.

train, test = partition(eachindex(y), 0.7, shuffle=true)
mach = machine(wrappedModel2, X, y)
fit!(mach, rows=train)
Machine{LinProgWrapper{ReweighingSamplingWrapper{Pipeline373}}} @128 trained 1 time. args: 1: Source @226 ⏎ `Table{Union{AbstractArray{Count,1}, AbstractArray{Multiclass{3},1}, AbstractArray{Multiclass{6},1}, AbstractArray{Multiclass{2},1}}}` 2: Source @079 ⏎ `AbstractArray{Multiclass{2},1}`

Now we call the predict function on the machine, restricted to the rows specified by test.

ŷ = predict(mach, rows=test)
ŷ |> pprint

Auditing Bias

We use the concept of fairness tensors to avoid redundant calculations. Refer to https://www.ashrya.in/Fairness.jl/dev/fairtensor/ to learn more about fairness tensors.

We pass the predictions, the ground truth, and the protected-attribute values to the fair_tensor function.

ft = fair_tensor(ŷ, y[test], X[test, :race])
FairTensor{6}([471 113; 4 0; … ; 0 1; 33 8] [231 212; 2 1; … ; 1 2; 41 36], ["African-American", "Asian", "Caucasian", "Hispanic", "Native American", "Other"])
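To build intuition for what the tensor stores, here is a toy sketch of the underlying idea in plain Julia: one 2×2 matrix of prediction/truth counts per protected group. This is only an illustration, not Fairness.jl's internal representation; grouping the counts once is what lets many metrics be computed without revisiting the data.

```julia
# Toy predictions, ground truth, and protected groups (illustrative only)
ŷ   = [1, 0, 1, 1, 0, 1]           # predictions
y   = [1, 0, 0, 1, 1, 1]           # ground truth
grp = ["A", "A", "A", "B", "B", "B"]

function group_confusion(ŷ, y, grp)
    out = Dict{String,Matrix{Int}}()
    for g in unique(grp)
        m = zeros(Int, 2, 2)       # rows: prediction (1, 0); cols: truth (1, 0)
        for i in findall(==(g), grp)
            m[2 - ŷ[i], 2 - y[i]] += 1
        end
        out[g] = m
    end
    out
end

cm = group_confusion(ŷ, y, grp)
cm["A"]   # [1 1; 0 1]: one true positive, one false positive, one true negative
```

Metrics such as the false positive rate for any group can then be read off its small count matrix instead of being recomputed from the raw predictions each time.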

Disparity Calculation

Disparity can be calculated by passing the following to the disparity function:

  • An array of fairness metrics from the ones listed in the README

  • Fairness tensor that we calculated in the previous step

  • Reference Group

  • func: the disparity value for a metric M, group A, and reference group B is func(M(A), M(B)). The default func is division, so this argument is optional.
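For intuition, here is the arithmetic func performs, sketched in plain Julia with made-up false positive rates (the numbers are illustrative, not computed from the dataset):

```julia
# Hypothetical per-group metric values (illustrative only)
fpr_grp = 0.65        # metric value M(A) for some group A
fpr_ref = 0.30        # metric value M(B) for the reference group B

# Default disparity: the ratio M(A) / M(B)
ratio = fpr_grp / fpr_ref

# The custom func used below reports a relative difference instead
reldiff(x, y) = (x - y) / y
d = reldiff(fpr_grp, fpr_ref)

# Note that ratio == d + 1: the two conventions carry the same information,
# but a relative difference of 0 (rather than a ratio of 1) means "no disparity"
```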

df_disparity = disparity(
  [accuracy, false_positive_rate], 
  ft, refGrp="Caucasian",
  func=(x, y)->(x-y)/y)
(Output: a 6-row DataFrame, one row per race group, with disparity columns for accuracy and false positive rate; e.g. the Native American row has values ≈ -0.200 and ≈ -0.182.)

The values above show that Asians and African-Americans have a higher false positive rate relative to the reference group Caucasian, while Native Americans have a lower one. Still, these disparity values are better than those obtained when the neural network classifier is used directly.

Parity Calculation

To calculate parity, we need to pass the following to the parity function:

  • DataFrame output from the disparity function in the previous step

  • A custom function to compute parity from the disparity values.

Scroll the output to the right to see the column for parity values.

parity(df_disparity, func = (x) -> abs(x) < 0.4)
(Output: the disparity DataFrame of 6 rows extended with boolean parity columns for each metric; e.g. the Native American row shows true for both.)

The above parity outputs show that parity constraints for False Positive Rate are satisfied only by the groups: Other and Caucasian.

Visualizing improvement by Fairness Algorithm

Now we will use VegaLite to visualize the improvement in fairness metrics due to the fairness algorithms added in the form of wrappers. We shall also visualize the drop in accuracy due to the trade-off between accuracy and fairness.

Note that wrappedModel2 is the ML model we previously wrapped with Reweighing algorithm and LinProg Algorithm.

Summary of what the following code does:

  • Evaluate metric values using MLJ.evaluate for both the wrapped model and the original model

  • Collect the metric values from the result of the evaluate function

  • Create a DataFrame from the collected values, to be used with VegaLite to plot the graphs

using VegaLite
using DataFrames
result = evaluate(wrappedModel2,
    X, y,
    measures=[
      Disparity(false_positive_rate, refGrp="Caucasian", grp=:race),
      MetricWrapper(accuracy, grp=:race)])
result_1 = evaluate(model,
    X, y,
    measures=[
      Disparity(false_positive_rate, refGrp="Caucasian", grp=:race),
      MetricWrapper(accuracy, grp=:race)])
n_grps = length(levels(X[!, :race]))
dispVals = collect(values(result.measurement[1]))
dispVals_1 = collect(values(result_1.measurement[1]))
accVals = collect(values(result.measurement[2]))
accVals_1 = collect(values(result_1.measurement[2]))
df = DataFrame(
  disparity=vcat(dispVals, dispVals_1),
  accuracy=vcat(accVals, accVals_1),
  algo=vcat(repeat(["Wrapped Model"],n_grps+1), repeat(["ML Model"],n_grps+1)),
  grp=repeat(collect(keys(result.measurement[1])), 2));

Improvement in False Positive Rate Disparity Values

df |> @vlplot(
  :bar,
  y={"disparity:q", axis={title="False Positive Rate Disparity"}},
  x={"algo:o", axis={title=""}},
  column="grp:o")

The above plot shows that there was a strong bias against the group "African-American" in the NeuralNetworkClassifier (ML Model). The False Positive Rate Disparity value is greater than 2.0 for this group, while it is close to 1.0 for the others. This means a person belonging to the group "African-American" is twice as likely as other groups to be falsely predicted to re-offend!

But with the wrapped model, the False Positive Rate Disparity for "African-American" has been reduced to about 1.3, comparable to most other groups.

Accuracy Comparison

df |> @vlplot(
  :bar,
  y={"accuracy:q", axis={title="Accuracy"}},
  x={"algo:o", axis={title=""}},
  column="grp:o")

The above plot shows that there is a drop in accuracy on using the wrapped model. This is a direct consequence of the fairness-accuracy tradeoff. So, we obtain a model that is fairer at the cost of accuracy.

Fairness vs Accuracy Comparison across Algorithms

using Plots
function algorithm_comparison(algorithms, algo_names, X, y;
  refGrp, grp::Symbol=:class)
	grps = X[!, grp]
	categories = levels(grps)
	train, test = partition(eachindex(y), 0.7, shuffle=true)
	plot(title="Fairness vs Accuracy Comparison", seriestype=:scatter, 
        ylabel="False Positive Rate Disparity refGrp="*refGrp,
        legend=:topleft, framestyle=:zerolines)
	for i in 1:length(algorithms)
		mach = machine(algorithms[i], X, y)
		fit!(mach, rows=train)
		ŷ = predict(mach, rows=test)
		# Probabilistic predictions are converted to point predictions
		if typeof(ŷ) <: MLJ.UnivariateFiniteArray
			ŷ = mode.(ŷ)
		end
		ft = fair_tensor(ŷ, y[test], X[test, grp])
		plot!([accuracy(ft)], [fpr(ft)/fpr(ft, grp=refGrp)], 
      seriestype=:scatter, label=algo_names[i])
	end
	current()
end
algorithm_comparison([model, wrappedModel, wrappedModel2], 
  ["NeuralNetworkClassifier", "Reweighing(Model)",
    "LinProg+Reweighing(Model)"], X, y, 
    refGrp="Caucasian", grp=:race)

Concluding Remarks

This toolkit has been designed to solve the numerous problems faced by policy makers, researchers, and practitioners when using fairness toolkits. The innovative features of this package are listed explicitly at https://github.com/ashryaagr/Fairness.jl#what-fairnessjl-offers-over-its-alternatives

We are open to contributions. Feel free to open an issue on GitHub if you want to contribute or have questions about the package. We would love to help you get started.

Finally, this work would have been impossible without the immense support, novel ideas, and efforts made by Jiahao Chen, Sebastian Vollmer, and Anthony Blaom.

Ashrya Agrawal (ashryaagr@gmail.com)

Link to Github Repository for Fairness.jl: https://github.com/ashryaagr/Fairness.jl

Documentation: https://www.ashrya.in/Fairness.jl/dev/
