Skip to main content

deepseq for Data

Here's a very short but useful snippet I recently started using to force Data instances deeply:

{-# LANGUAGE Rank2Types #-}
import Data.Functor.Identity
import Data.Generics ( Data(..), GenericT )

gmapT' :: GenericT -> GenericT
gmapT' f x0 = runIdentity (gfoldl k Identity x0)
  where
    k :: Data d => Identity (d->b) -> d -> Identity b
    k (Identity c) x = Identity $! c $! f x -- let the user control strictness in `f`

strictify :: GenericT
strictify = gmapT' strictify

It is almost identical to nonstrict = gmapT nonstrict, or equivalently everywhere id: gmapT' has identical implementation to the default gmapT, except it is strict in f x and c (f x).

The former is especially crucial: f is typically just a recursion of gmapT (in everywhere for example, this is true at all nodes except those singled out by mkT and extT), so strictness here typically forces the rest of the subtree. The latter is important only if the constructor c is somehow lazy, although for most instances of Data it is just the plain constructor of the datatype. f could also construct a separate lazy head (like it does in the branches of everywhere), but then the user has the choice to make f strict as well or not, which is nice.


I use this to achieve what deepseq does for Generic: it's a library that applies seq to nested data structures recursively. It has a default instance for Generic and has been implemented for lots of datatypes in base. While normally most of these representable types in base are both Generic and Data, unfortunately, the GHC AST is still entirely locked in Data. This is especially important for AST analysis, since processing an AST can sometimes leave big thunks on some nodes.

This sees action in hastic, a random program generator, to force blocks of work on the GHC AST. Of particular interest is one block that identifies class-instance pairs and functions from the AST. The threat comes when processing a large project, where the thunk written to the global store is an iteration over the entire set of bindings in the package! Not fun. It was stack overflowing trying to analyze base.