Why Use Volume % for Spectroscopic Calibration Models of Liquids?
Today I'll address an important case where weight % is commonly misused in chemometric modelling efforts involving absorption spectroscopy of liquids. This is something that stumps some newcomers to the field - especially people without a firm background in chemistry. It often leads to people asking "Why are concentration and absorbance nonlinear for this data!?" or even declaring "This is bad data; go recollect it!" Or, worse yet, the practitioner sees nothing wrong, proceeds unaware, and delivers substandard results/models. This mistake has likely cost a lot of time, money, and accuracy in delivered models globally. I was taught to avoid this early in my career, but maybe others weren't so lucky? Either way, I'd like to tack it down.
On Absorption Spectroscopy
Many, if not most, optical spectrometers used in industry have been designed to operate on the basis of absorbance. So, what is absorption spectroscopy? In a typical absorption spectrometer configuration (UV-VIS, NIR, etc.) we shine a broadband light source such that the beam transmits through a material or sample. Photons may be absorbed by the sample if the analytes inside of it have modes that match the available radiant energy. The light that exits, or transmits, all the way through the sample is typically focused, then dispersed by its wavelength onto a detector. Each bin on a detector such as a CCD then corresponds to a group of wavelengths of light, and its intensity can be measured digitally (1). Please consult your instrumental analysis textbooks for a considerably more thorough treatment of this topic. Below is a simple figure detailing the most naive setup.
using Plots
# Helper: a w-by-h rectangle with its lower-left corner at (x, y)
rectangle(w, h, x, y) = Shape(x .+ [0,w,w,0], y .+ [0,0,h,h])
# Incoming beam, drawn as an arrow on a blank canvas
plot([0,1], [0,0] ,arrow = arrow(0.5), legend = false, color = :black, bordercolor="white", xaxis = false, yaxis = false);
# The sample cell
plot!(rectangle(1.,0.5,1.,-0.25), opacity=.5, color = :blue);
# Transmitted beam leaving the sample
plot!([2,4], [0,0] ,arrow = arrow(0.5), legend = false, color = :black);
# Wavy, tilted line standing in for the dispersing optic
a = plot!( 0.1sin.(0.0:(pi/2):20*pi) .+ collect(3.0:0.05:5.0),
           reverse(-0.25:0.0125:0.25), color = :grey);
annotate!(a, 1.5,0, "Sample")
annotate!(a, 4.1,0.05, Plots.text("Dispersing Optic", 12, :dark, rotation = 315 ))
# One colored ray per wavelength band, fanning out from the optic
colors = [:red, :yellow, :green, :blue, :violet]
for (n, col) in enumerate(colors)
    plot!( a, [4, 4 .+ cos( n/pi + pi)], [0, 0.75sin(n/pi + pi) ], color = col)
end
display(a)
So let's break this down. What happens to an infinitely thin stream of photons as it travels through the sample? Well, we know the intensity (I) of the light in an absorbing medium falls off the farther it travels through the medium (L), and that the loss over each small step is proportional to the intensity arriving at that step. By making a good guess we can state that there is likely to be some material-dependent proportionality constant (k) by which the intensity is diminished given an absorption phenomenon. The k term handles the likelihood that a given mode will be excited by, that is absorb, a photon. If we translate this to a simple differential equation, we can make quick work of this problem statement.
$$\frac{dI}{dL} = -kI$$

$$\frac{dI}{I} = -k\,dL$$

$$\int_{I_0}^{I} \frac{dI'}{I'} = -k \int_{0}^{L} dL'$$

$$\ln\left(\frac{I}{I_0}\right) = -kL$$

Three things worth noting here...
1. An absolute intensity measure for each experiment is not needed. The equation shows that we just need to make an intensity measure relative to some standard. Typically this is performed via a "blank" sample (I0).
2. The definition of transmittance, the ratio I/I0, has naturally appeared.
3. We may also assume that intensity diminishes linearly with the abundance of an absorbing species (C), and that there exists some constant for the loss of energy due to excitation of a physical mode (ɛ). In other words, k can be written as the product of ɛ and C.
$$\ln\left(\frac{I}{I_0}\right) = -\varepsilon C L$$

Someone smart looked at this equation and said, "Hey that's great but natural logarithms are really annoying to deal with" (this was way before computers!) so they made this new thing called "Absorbance" defined on the base 10 logarithm:
$$A = -\log_{10}\left(\frac{I}{I_0}\right) = \log_{10}\left(\frac{I_0}{I}\right)$$

Conversion of a base 10 logarithm to a natural log is just multiplication by a constant. Most people just factor that into ɛ. There's no deep-seated reasoning for base 10; it was just easier to deal with experimentally. Again, a lot of the formalism here was being made before computers and calculators were widely available. Anyhow, we've arrived at the Beer-Lambert-Bouguer law.
$$A = \varepsilon L C$$

We now have a way to linearly relate absorbance (what we tend to measure from an instrument) to some quantum effects, path length (typically fixed), and concentration. Our derivation treated an infinitely thin stream of light, but by convention this equation is usually used to treat a volume of light passing through a sample, because that's how the analyses are actually performed.
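As a quick numerical illustration of the definitions above, here's a minimal sketch in Julia. The detector counts are made-up values, and the helper names `transmittance` and `absorbance` are my own for illustration, not from any library.

```julia
# Hypothetical detector counts for one wavelength bin (illustrative values only).
I_blank  = 1000.0   # intensity measured through the blank (I0)
I_sample = 250.0    # intensity measured through the sample (I)

# Transmittance is the intensity relative to the blank.
transmittance(I, I0) = I / I0

# Absorbance is defined on the base-10 logarithm of the transmittance.
absorbance(I, I0) = -log10(transmittance(I, I0))

T = transmittance(I_sample, I_blank)
A = absorbance(I_sample, I_blank)

# The natural-log form differs only by the constant factor log(10), about 2.303.
A_from_ln = -log(T) / log(10)

println("T = ", T, ", A = ", round(A, digits = 4))
```

Note that only the ratio of the two intensities matters, which is why the blank measurement is enough and no absolute intensity calibration is required.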
Dimensional Analysis
A powerful yet simple tool for understanding physics is examining the units of the variables in a given equation. In this case there's insight to be had from such a tool:
A: No units
ɛ: dm^2/mol (conventionally)
L: dm
C: mol/dm^3
Notice that ɛ is kind of a combination of two terms, something like an area, and something like a "loss" per unit abundance (mol). There's really no reason why the equation couldn't be written like the following,
$$A = \varepsilon V C$$

where,
ɛ: 1/mol
V: dm^3
and the other units remain the same. The takeaway is that we are measuring the relative loss of photons which have traversed a volume of sample. Secondarily, ɛ must contain information about how likely a photon is to be absorbed by a given species. The volumetric term is conceptually important when we address which concentration measures are most apt for liquid analysis.
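For anyone who wants to see the units cancel explicitly, here is the bookkeeping written out. This is just my own check using the units listed above, reading the rewritten form as absorbance equal to the product of ɛ, V, and C.

$$[A] = \frac{\mathrm{dm^2}}{\mathrm{mol}} \cdot \mathrm{dm} \cdot \frac{\mathrm{mol}}{\mathrm{dm^3}} = 1 \qquad \text{(path length form)}$$

$$[A] = \frac{1}{\mathrm{mol}} \cdot \mathrm{dm^3} \cdot \frac{\mathrm{mol}}{\mathrm{dm^3}} = 1 \qquad \text{(volume form)}$$

Either way absorbance comes out unitless; the only thing that changed is whether the area-like piece lives inside ɛ or is grouped with the path length into a volume.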
On Concentration Measures
Concentrations, like most fundamental measures of real "stuff", are a funny thing. If we wanted to know how densely populated fish are in a pond, we might stick a net in and pull it out to say "this area had 11 fish per net sampled". That's valid, but not really usable to someone with a different sized net.
In chemistry, we like things that are more universal. As chemists we often talk about two forms of concentration (surely there are dozens if not hundreds of others),
$$\text{weight \%} = \frac{\text{mass of analyte}}{\text{total mass of mixture}} \times 100$$

and,
$$\text{volume \%} = \frac{\text{volume of analyte}}{\text{total volume of mixture}} \times 100$$

What's the big deal? Don't they say the same thing? No - no they don't! Consider mixing fixed amounts of small marbles with ping-pong balls. Does the following equation hold?
$$V_{\text{marbles}} + V_{\text{ping-pong balls}} = V_{\text{mixture}}$$

This may be counterintuitive to some readers, but it does not. Volume is not an intrinsic property; it's a function of a great many things, actually. In the ping-pong ball example, volumes aren't additive because things can pack or settle differently. The image below shows an example of how two collections of items don't mix to yield a linear increase in volume.
using Plots
# A 5x5 grid of large items and a 4x4 grid of small items offset by half a step
big_grid = collect.(Iterators.product(1:5, 1:5))
smaller_grid = map( xy -> xy .+ [0.5,0.5], Iterators.product(1:4, 1:4) )
# Three panels side by side (the layout needs the Plots @layout macro)
l = @layout [a b c]
# Panel 1: large items alone. Panel 2: small items alone.
p1 = scatter( first.(big_grid), last.(big_grid), legend = false, color = :green, markersize = 37);
p2 = scatter( first.(smaller_grid), last.(smaller_grid), legend = false, color = :pink, markersize = 25);
# Panel 3: the small items settle into the gaps between the large ones
p3 = scatter( first.(big_grid), last.(big_grid), legend = false, color = :green, markersize = 37);
p3 = scatter!(p3, first.(smaller_grid), last.(smaller_grid), legend = false, color = :pink, markersize = 25);
plot(p1, p2, p3, layout = l, xaxis = false, yaxis = false, size = (1200,400))
The molecules of a pure liquid can be attracted to, or even repelled by, an added species through electrostatic (or other) interactions. These effects can shrink or expand the liquid away from what might be the expected volume. If this topic interests you, please pursue search terms like "volume of mixing" or "mixture effects" - or wait until you take a course in thermodynamics/physical chemistry - no rush.
Now it's pertinent to ask, why wouldn't we prefer weight percent for calibration models, since mass is conserved in nonreacting mixtures (2)? Well - weight percent does not account for these volume fluctuations. Recall that weight percent describes nothing about the volumes of mixture components. More importantly, absorption spectrometers are not measuring mass. The measurement is based on an interaction between light and matter in a known and often fixed volume.
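To make the mismatch concrete, here's a minimal sketch in Julia of how weight fraction maps onto volume fraction. The two densities are made-up values, the function name `ideal_volume_fraction` is my own, and the calculation assumes ideal (additive) volumes. That assumption is deliberately generous: even before any volume of mixing effects, the mapping is already nonlinear whenever the densities differ.

```julia
# Hypothetical pure-component densities in g/mL (illustrative, not measured data).
rho_analyte = 0.80
rho_solvent = 1.00

# Volume fraction of analyte for a given weight fraction w, ASSUMING ideal
# (additive) volumes. Real mixtures deviate further due to volume of mixing.
function ideal_volume_fraction(w, rho_a, rho_s)
    v_a = w / rho_a          # volume of analyte per gram of mixture
    v_s = (1 - w) / rho_s    # volume of solvent per gram of mixture
    return v_a / (v_a + v_s)
end

w   = 0.0:0.25:1.0
phi = [ideal_volume_fraction(wi, rho_analyte, rho_solvent) for wi in w]

for (wi, phii) in zip(w, phi)
    println("weight fraction ", wi, " -> volume fraction ", round(phii, digits = 3))
end
```

With these assumed densities a 50/50 mixture by weight is not 50/50 by volume, so absorbance plotted against weight % bends even when the Beer-Lambert-Bouguer law is obeyed perfectly.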
If the liquids are very well studied and documented, and we do extra calculations with thermodynamic considerations, we can infer these volumes. But, even then - things get hairy!
It's much easier to just measure the final volume after mixing and normalize our concentration than to do those calculations and experiments. Fewer degrees of freedom for lab error to propagate, and less work. That being said, most ways of doing this experimentally involve serial dilution, so handle error with care.
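The normalization described above can be sketched in a few lines of Julia. All three volumes are made-up numbers (the shrinkage on mixing is assumed purely for illustration), and `volume_percent` is a hypothetical helper of my own.

```julia
# Hypothetical bench measurements in mL (illustrative values only).
v_analyte  = 50.0   # volume of analyte dispensed
v_solvent  = 50.0   # volume of solvent dispensed
v_measured = 96.0   # final volume read off after mixing (assumed shrinkage)

# Volume % of a component relative to a chosen total volume.
volume_percent(v_part, v_total) = 100 * v_part / v_total

# Normalized by the volume that was actually measured after mixing.
vp_measured = volume_percent(v_analyte, v_measured)

# What you would report by wrongly assuming the two volumes simply add.
vp_additive = volume_percent(v_analyte, v_analyte + v_solvent)

println("measured-volume basis: ", round(vp_measured, digits = 2), " %")
println("additive assumption:   ", round(vp_additive, digits = 2), " %")
```

The gap between the two numbers is exactly the mixture effect that a reference value built on the additive assumption would silently fold into the calibration error.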
Yeah, but I'm doing Multivariate Calibration!
Awesome - me too! MVC still depends on this linear relationship (3). Yes, you may abstract the meaning of ɛ into reduced rank regression coefficients, sums, whatever boats your float. It's best to remember that this has everything to do with the fact that the attenuation of light is proportional to the exposed volume, and not weight or weight percentage of analyte.
The equation most often used when handling volume percents is the following:
$$A = \varepsilon V C_{\text{vol}\%}$$

where ɛ is now in terms of inverse volume. Interestingly, if the volume of the measurement is constant, that term can be dropped and ɛ no longer has units.
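To show how a multivariate calibration leans on that linearity, here's a minimal sketch using classical least squares (CLS). CLS is my choice for illustration because it is about the simplest multivariate approach; it isn't necessarily what you'd deploy. The pure-component "spectra" and volume fractions are made up, the data is noise-free, and the path (or volume) term is assumed constant and folded into the spectra.

```julia
using LinearAlgebra

# Hypothetical pure-component "spectra" (rows) over 4 wavelength bins.
K = [0.9 0.5 0.1 0.0;
     0.0 0.2 0.6 0.8]

# Hypothetical volume fractions of the two components in 3 calibration samples.
C = [0.2 0.8;
     0.5 0.5;
     0.7 0.3]

# Linear mixing: each sample spectrum is a concentration-weighted blend.
A = C * K

# Calibration: least-squares estimate of the pure spectra from known concentrations.
K_hat = C \ A

# Prediction: recover the concentrations of a new sample from its spectrum.
c_true = [0.4 0.6]
a_new  = c_true * K
c_pred = a_new / K_hat   # least-squares solution of c_pred * K_hat = a_new

println("predicted volume fractions: ", round.(c_pred, digits = 3))
```

The recovery is exact here only because the simulated spectra are perfectly linear in the concentration measure. Feed the same model reference values that are nonlinear in what the light actually sees, and that error has nowhere to go but into the predictions.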
At the end of the day, chemical measures are based on irrefutable science; otherwise they would have no verifiable utility. The beauty is that science is more flexible, playful, and abstract than many have been taught.
Takeaways
The Beer-Lambert-Bouguer law is an easy way to linearly relate the light attenuated by a chemical species to that species' abundance or concentration.
In absorption/transmission spectroscopy, we are measuring a sample with a volume of light.
Sample volumes are not additive measures when we are dealing with chemicals due to things like mixture effects.
Weight percent doesn't account for nonlinear volume effects.
Volume percent normalizes mixture effects and phrases the data in a way natural to the attenuation equation.
"k"/ɛ are just an idea; a highly formalized, experimentally backed, and indoctrinated one. As long as the term is properly constructed the physics still holds no matter the units.
That all being said, if data has already been collected with weight percent reference values there are easy recoveries! There's a host of nonlinear modelling approaches that make quick work of these volume effects when using weight % as well. The trick is in knowing what you may sacrifice when using them, and best practices for their use.
Suggestion for Machine Learners
Many people from the various disciplines of Machine Learning see chemical data as neat but don't invest in the domain knowledge. It's really important to know what is going into and out of a model in order to select proper tools, design correct experiments, make sound suggestions, and of course, draw sound conclusions. Chemical data follow physical laws. When analytics practitioners don't abide by these laws they get bitten sooner or later and in most cases underperform.
Chemical data is expensive; it's not like an image you download from the internet with a web-scraping script. Mistakes made with chemical data can cost millions of dollars over time, harm the environment, lead to ecological disasters, national security issues, and even harm or kill. Often it's best to either do it right or not bother doing it at all.
Caveats
1. Not all spectrometers use CCD/CMOS detectors.
2. Mass is not always preserved as mass. It can turn into energy - thanks Einstein. If you find yourself measuring chemicals traveling near the speed of light please give me a call. I probably can't help you, but I'd love to hear about it.
3. The Beer-Lambert-Bouguer law doesn't always hold. No, the relationship is not always linear. Identifying this can be the difference between success and failure.
Related Work
There's a great paper about this by Howard Marks somewhere too! The paper comes from an experimentalist's perspective, someone who fell into this trap and had to reason their way out of it.
Do you see a mistake?
Hey, it's free information. I didn't scrutinize this a whole lot, and wrote it in my spare time. Let me know and I'll issue a correction ASAP! I'll even give you a shout out if you wish.
Disclaimer
The thoughts and views expressed in this blog post are my own and are not those of my current or former employers.