Dr. Schmood's Notebook of Python Calisthenics and Orthodontia
Don't get bit by misaligned state and output
Sharing Information
The massive information processing power [...] will truly become available to the general public. And, I see that as having a tremendous democratizing potential, for most assuredly, information — data and the ability to organize and process it — is power.
~ Jim Warren Dr. Dobbs Journal Vol 1 (January 21, 1977)inline_formula not implemented
Sharing Information in 1977
Hidden State
The Hidden State problem is a problem with Python and not necessarily notebook interfaces.
What Does = Equal?
Reflexive:inline_formula not implemented
Transitive: inline_formula not implemented
Symmetric:inline_formula not implemented
Hidden State by Mutation
Assignment (=
) and Equality (==
)
Strings
a="10"
b=a
a=a+"0"
str("a: " + a + " b: " + b)
An unbalanced operation.
id(a) == id(b)
Immutable sequences: Strings, Tuples, Bytes
Lists
a=["10"]
b=a
a[len(a):] = ["20"]
str("a: " + str(a) + " b: " + str(b))
A balanced operation.
id(a) == id(b)
Mutable sequences: Lists, Byte Arrays
State and Syntax
The Case For ≔
A notorious example for a bad idea was the choice of the equal sign to denote assignment. It goes back to Fortran in 1957 and has blindly been copied by armies of language designers. Why is it a bad idea? Because it overthrows a century old tradition to let “=” denote a comparison for equality, a predicate which is either true or false. But Fortran made it to mean assignment, the enforcing of equality. In this case, the operands are on unequal footing: The left operand (a variable) is to be made equal to the right operand (an expression). x = y does not mean the same thing as y = x.
~ Niklaus Wirth, Good Ideas, Through the Looking Glass
Ada :=
--- Ada 95
procedure Main is
A, B : Integer := 0;
C : Integer := 100;
D : Integer;
begin
if A = 0 and C = 100 then
A := A + 1;
D := A + B + C;
end if;
end Main;
Reasoning About State
a=42
→ Set a
equal to 42
or a
equals 42
.
If either side of the equation changes, the other side must accordingly change.
a:=42
→ a
is assigned the value of 42
.inline_formula not implemented
a
becomes 42
, but 42
does not become a
. 42
is immutable, it will always be 42
.
The thing on the right side will exist until nothing points to it, at which point it is garbage collected.
The Case For Define
(def c)
(print c)
Assignment: to specify a correspondence or relationship
Define: to identify the essential qualities
(def a "10")
(def b a)
(def a (str a "0"))
(str "a: " a " b: " b)
Hidden State by Scope
Referential Transparency
x=1 # Introducing state
def inc():
global x
x+=1 # Modifying state
return x
print("x: %s, inc(): %s" % (x, inc()))
print("x: %s, inc(): %s" % (x, inc()))
inc() == inc()
y=6 # Introducing state
def inc(y):
y+=1 # Modifying state
return y
print("y: %s, inc(): %s" % (y, inc(y)))
print("y: %s, inc(): %s" % (y, inc(y)))
inc(y) == inc(y)
def inc(z):
z+=1
return z
print("1: 1, inc(): %s" % (inc(1)))
print("1: 1, inc(): %s" % (inc(1)))
inc(1) == inc(1)
Can Seem Innocuous
(lambda a: a + 1)(1)
(lambda a: a + y)(1)
Local Bindings
let [a 2]
→ make the symbol a
the integer 2
. a
is not a variable, it will never change.
(let [a 2]
(println "Lexical scope: " a))
(println " Global scope: " a)
(var a) ;; return the var itself, not its value
The scope is totally defined by the let
form.
(let [i 2]
(try
(eval (var i))
(catch Exception e
(print "Exception:" (.getMessage e))))
i)
Expressions of State
In general we like to favor immutability where sensible.
~ Pandas Documentation (emphasis theirs)
Immutability
import pandas as pd
artwork_data = pd.read_csv(artwork_data.csv) # Introducing in-memory state
print(list(artwork_data))
artwork_data.drop(columns=["accession_number"])
print(list(artwork_data))
Explicit Mutability
print(list(artwork_data))
artwork_data.drop(columns=["accession_number"], inplace=True)
print(list(artwork_data))
Data Mutation
print(artwork_data.at[1, 'acquisitionYear'])
artwork_data.head()
Pure Function
inc(artwork_data.at[1, 'acquisitionYear'])
Implicit Mutability
Run in one order, this cell returns True
. Run in another order, this cell returns False
.
print(inc(artwork_data.at[1, 'acquisitionYear']))
artwork_data.at[1, 'acquisitionYear'] <= 1922
artwork_data.at[1, 'acquisitionYear'] = inc(artwork_data.at[1, 'acquisitionYear'])
print(inc(artwork_data.at[1, 'acquisitionYear']))
artwork_data.at[1, 'acquisitionYear'] <= 1922
Guaranteed Immutability
Static Frame
import static_frame as sf
import pandas as pd
artwork_data = pd.read_csv(artwork_data.csv) # Introducing in-memory state
df = sf.Frame.from_pandas(artwork_data)
df['artist': 'year'].head(2)
A Change in Data is New Data
This will always be True
.
print("Original: " + str(df.loc[1, 'acquisitionYear']))
df.assign.loc[1, 'acquisitionYear'](inc(df.loc[1, 'acquisitionYear']))
df.loc[1, 'acquisitionYear'] <= 1922
df
and df_updated
are not equivalent.
print("Original: " + str(df.loc[1, 'acquisitionYear']))
df_updated = \
df.assign.loc[1, 'acquisitionYear'](inc(df.loc[1, 'acquisitionYear']))
df_updated.loc[1, 'acquisitionYear'] <= 1922
Immutability and Declarative Syntax
Writing for
loops and publishing code for copy and paste is a 1970s approach.
Instead:
Mutation: mutable assignments → default to immutability globally
Syntax: imperative statements (
for
) → declarative expressions (Pandas,maps
)Publish computational environments and data.inline_formula not implemented
Thank You
https://nextjournal.com/schmudde/dr-schmoods-notebook
-
Appendix
inline_formula not implementedSpecial thanks to the Dr. Dobb's Archive at 6502.org
inline_formula not implemented≔ in Python 3 allows assignment while also evaluating an expression:
Current:
env_base = os.environ.get("PYTHONUSERBASE", None)
if env_base:
return env_base
Improved:
if env_base := os.environ.get("PYTHONUSERBASE", None):
return env_base
inline_formula not implementedNextjournal makes code runnable by applying these approaches to all levels of the stack. The notebook interface is a departure from IDEs of the past. Furthermore, it's not just the software you write that is immutable, it's the entire computation stack, down to the operating system. The data that drives the code is also content addressed and immutable. This makes running code in the cloud reliable, reproducing results effortless, and cloud collaboration possible.
See Not everything is an expression by Michael Robert Arntzenius for a discussion on syntax classes.