# Using lambda term statistics to show how hard we try to declutter programs

There are a couple of [a]symmetries in programming that really grip me, lifting my hands off the keyboard and pushing me deep into a mind vortex for some time. I'm especially entranced by the flows of control, data and abstraction it seems. When arguments and returns are treated differently (in a symmetry sense I mean... outside of the fact that they're polar opposites), my evening plans start clearing out. Some examples uphold this ying-yang but in an initially unintuitive way, like co- and contravariance of subtyping rules. Sometimes a dichotomy should be symmtric but is skewed by the forces of developer experience, like the gaping divide in popularity between ReactiveX and InteractiveX. Recently, I've come across another asymmetry of this sort that I thought was interesting enough to study a bit.

## Some background

I've been developing some call graph analysis tools that, notably, takes into consideration the dependencies of function bodies on their arguments. The analysis as a whole is based on a very weak notion of dependency, which takes any two values in the same cluster of function applications to possibly depend on each other. Then value bindings glue all the clusters together.

What's interesting is that, despite the weakness of the dependency rule, there is still interesting structure in the results. In particular, an asymmetry emerges in the dependencies of app clusters at the callsite vs. the implementation site of a name. It is described by an obvious fact: while callers often depend on every part of the callee body (for at least some inputs) the callee often depends on just a fraction of the body of the callee. The latter fact is modularity at work composing subunits into a larger piece of functionality, and giving those subunits only what they need to know.

However, the former point definitely shouldn't be overlooked. It's saying that most functions we write don't have many or any parts that are trivially useless — parts that produce a result then discard it under all inputs. Usually, most things we write in a binding, under some inputs, can affect the output in some way. The common exception is including unused arguments to meet an interface requirement.

Most of the programs we write have some purpose and are driven by a pressure for simplicity and minimalism — in this light, that fact is hardly surprising. What interested me is how true this is of random programs. How strong is this pressure of humans to write clutter-free programs, compared to the chaos of random lambda terms?

## Analysis

We can represent instances of dead code in an untyped lambda calculus very straightforwardly, by applying dead code with a const operator (λ₁λ₂1). More generally, we can seek out unused bindings in all the term bodies of a certain size as a measure of the prevalence of dead code in random terms.

The untyped lambda calculus is an interesting and compact if somewhat tricky thing to study with combinatorics (as every paper I found on the subject likes to point out in their intro). Without even mentioning reductions, the few validity constraints that lambda terms have statically already make them tricky. Lambda terms are binary trees, where applications must have exactly two children, abstractions exactly one child, and variables exactly zero.

Meanwhile, depending on your labelling scheme, variable and abstraction indices may be redundant up to permutation, so if we're not careful, some counts may become dependent on the distributions of numbers of vars and λs.

Bendkowski et al. go through analyses of exactly those distributions in "Statistical properties of lambda terms" (a paper that pulls as few punches as its title). This was the most comprehensive set of results I could find, so I was initially motivated to use their results directly to get at our answer, for example by adding a single λ to terms of size $n$ and comparing against the total number of terms at $n+1$ via Eqn 3.4 (reproduced here):

$$\left|\Lambda_n\right|=C\rho^{-n}n^{-3/2}\\ \rho \approx 0.2956, C \approx 0.6067$$

where $\Lambda_n$ is the set of lambda terms of size $n$. We could then inductively compensate for the expected number of existing dummy λ terms. This was really appealing because their results were asymptotic for $n \rightarrow\infty$ and purely combinatorial.

Unfortunately, as their construction based on de Bruijn indices, adding λs by themselves don't necessarily lead to dummies as we might expect, because the variables in the subterm that were bound to λs above the new one will change bindings. Some of those may bind to the newly-minted λ without creating new dummy λs. For example, the transformation λ(λ1 2)1 → λ(λλ1 2)1 does not generate a dummy term.

This is the right direction though. All $n$-sized terms with dummy λs can be created by adding a λ to an $(n-1)$-sized term that does not change existing bindings or bind an existing open variable. We would just need to avoid results from de Bruijn indices for this construction.

Instead, the path that yielded the first results was to just write a recurrence relation for the number of dummy λs directly. In contrast to the construction of that first paper, which counted the sizes of de Bruijn indices into the total term size, several other teams counted terms with variables as size-1 things, and describe this recurrence relation:

$$T_{0,m} = 0\\ T_{1,m} = m\\ T_{n+1,m} = T_{n,m+1} + T_{1:n,m}*T_{1:n,m}\\$$

where $n$ is the term size, $m$ is the number of free variables, and the recursive step shows the contributions from abstraction (left term) and application (right term), where $*$ is convolution. The relation comes from §2 from "Counting and generating lambda terms" (Grygiel & Lescanne), adjusted so that variables have size 1.

It seems that the number of dummy λs in $\Lambda_{n+1,m}$ would be simply $D_{n+1,m} = nT_{n,m} + D_{n,m}$, where we sequentially add a λ in front of every subterm. Unfortunately this is not the answer: this sequence quickly outstrips the total number of lambdas altogether.

The redundancy comes from collisions between terms that originally differed only by the new dummy terms that were added to each. For example, the transformations λ₁λ₂1 → λ₃λ₁λ₂1 and λ₁λ₂2 → λ₁λ₂λ₃2 produce equivalent terms.

Note that for two terms to be related in this way, they must have dummy λs in the positions the other will place their new one. In fact, all terms of size $n$ with $d$ dummy λs are related to $d$ other terms (excluding itself) in this way, and each of these sets of size $d+1$ produces one unique term. Therefore, redundancies in the $nT_{n,m}$ terms we get from tacking a λ everywhere can be corrected by considering the number of terms with $d$ dummies ($M_{n,m,d}$) with the relation:

$$D_{n+1,m} = n\left[\sum_{d=1}^n{\frac{M_{n,m,d}}{d+1}}\right] + D_{n,m}$$

The construction of $M_{n,m,d}$ is straightforward enough with the same logic as we've used so far, where all $d+1$-dummy terms of size $n+1$ come from $d$-dummy size-$n$ terms:

$$M_{1,m,1} = 0\\ M_{n+1,m,1} = T_{n,m} - \sum_{d=1}^n{M_{n,m,d}}\\ M_{n+1,m,d+1} = \frac{nM_{n,m,d}}{d+1}$$

The one term we don't get as easily is the number of terms without any dummy terms of size $n$. We calculate in the second line by just subtracting out the terms that do have dummy terms from the total count.

This is all we need to get a feeling for trends. First, I wanted to know the ratio of terms with no dummy λs ($1-\sum_{d=1}^nM_{n,m,d}/T_{n,m}$), to get a sense for how "hard" it is to write programs without clutter. We should suspect it to vanish fast with such a strong condition. The first paper (Benkowski et al.) showed that, with their construction, there were only 18% more variables than abstractions on average.

Indeed, it doesn't take long for it to get very small (for closed terms $m=0$):

Interested in the ratio of dummy λs instead? So am I! Here's the recurrence for the total number of λs:

$$L_{0,m} = 0\\ L_{1,m} = 0\\ L_{n+1,m} = 1 + L_{n,m+1} + L_{1:n,m}*L_{1:n,m}\\$$

The ratio $\sum_{d=1}^n {d\cdot M_{n,m,d}}/L_{n,m}$ for closed terms $m=0$:

You and I aren't alone: Jue Wang in their paper Generating Random Lambda Calculus Terms generates random terms and evaluates this exact statistic empirically. This is their plot equivalent to the above: