Tracing values with state

The development of ra.hs has called for an interesting task in the domain of symbolic programming. ra.hs is a Haskell linter for reactive programs, the first implementation I'm undertaking to identify possible race conditions in reactive code. The violation to look for is a subtle one: if two threads consume the same stream and read and write these values to the same shared memory, the ordering rules of the stream might be violated ^{[citation to come]}. To decide this, it's necessary to trace the flow of values through the program to see which stream they come from, if any.

The degree of sophistication of analysis against the program could be incredibly varied, bounded above only by the undecideability of the halting problem (fun fact: with probability 1, halting can actually be decided). I believe that it's meaningfully bounded below by a type analysis, where the conditional nature of branches is ignored and all values that are the correct type that could flow to a call site are considered. This allows for a perfectly sensitive result that sacrifices specificity (i.e. it may report many false positives).

Even in this much weaker case, there is plenty of room for things to get gnarly. The subject of this post is the massive rift between the stateful and the pure. The difference is best explained by thinking about tracing values.

Pure

At first, the knee-jerk way to approach this problem seems to be a sensible one: run a graph search on the call graph starting from the entry point. That way, we traverse the pure code along all path that any execution of the program would take. Notably then, our analyzer's stack is homomorphic to any program's stack in the pure code. In pure code, we know that at a given stack frame, all references are either in the stack below it, or as immutable bound variables in the function scope. This allows us to make the simple but powerful assertion that:

All expressions that could be available to the stack frame can be known the first time we arrive at that stack frame by call graph traversal.

This means that we can traverse the pure call graph exactly once and deduce all the possible values at every call site and binding in WHNF (except for results from magic and unknown functions). That's just plain nice.

Stateful

The problem with state is precisely what differentiates it from pure code: variability in function behavior is no longer only due to variability in the stack. Function calls at any depth can modify behavior of functions in faraway stack frames, and so collecting all the expressions that a variable could represent needs to be delayed for unknown amounts of time while the analyzer searches the call graph, identifies all writes to a mutable reference, then is finally able to reduce their sources to normal form.

If the program consists only of the most basic of mutable variables — the ones that hold immutable zeroth-order values (i.e. not functions and not mutable variables) — we can reduce the values it can take to WHNF once we know all the write calls to them because the value written traces back either to a value from pure land (immutable values), or to a read from a known mutable variable.

The fun really starts with mutable variables of mutable variables (e.g. IORef (IORef a)) which are really common with Reactive. Just look at mergeMap :: (a -> Supplier b) -> Consumer b -> Consumer a. When the mutable variable itself is unknown, the order to discover values is not straightforward.

This is where the project stands as of today (July 1 🇨): trying to figure out meta-mutable variable.