Property-based testing of higher-order functions

Property-based testing and functional programming are friends, because they’re both motivated by Reasoning About Code. Functional programming is about keeping your functions data-in, data-out so that you can reason about them. Property-based testing is about expressing the conclusions of that reasoning as properties, then showing that they’re (probably) true by testing them with hundreds of different input values.

example:
Person: “By my powers of reasoning, I can tell that this code will always produce an output collection longer than the input collection.”

Test: “For any generated collection listypoo, this function will return a collection of size > listypoo.size.”

Testing Framework: “I will generate 100 different collections, throw them at that test, and try to make it fail.”

Property-based testing gets tricky when I try to combine it with another aspect of functional programming: higher-order functions. If the function under test accepts a function as a parameter, how do I generate that? If it returns a function, how do I validate it’s the right one?

In the trivial case, we pass in the identity function or a constant function, and base our expectations on that. But that violates the spirit of generative testing.

Test operate on values. Values go in, values go out and we can check those. Functions are tested by operating on values. So to test functions that operate on functions, we need to generate functions that operate on values, and generate values to test the functions that came out of the function under test. Then we’ll need to state properties in terms of values going in and out of functions going in and out… ack! It’s Higher-Order-Property-Based Testing.

My sources on twitter tell me that QuickCheck for Erlang and Haskell have generators for functions. Looking at this sample of the Haskell one, I think this is what it does:

Each generated function from A => B includes a random factor that’s constant to the function, and a generator for B that can create Bs based on a seed. When an A is passed in, it’s combined with the function’s random factor to prod the generator into creating a deterministically-random B.

So, we can create random functions A=>B if we have a generator of B, a random-factor-generator, and some mapping from (A, random-factor) to the seed for the generator of B. That doesn’t sound so bad.

There are two other functionalities in good generators. When a test fails, we need enough information to figure out why and fix the bug. One tool for this is shrinking: make the generated input simpler and simpler until the test doesn’t fail anymore, and then report the simplest failing case. I haven’t even started thinking about that in function-generation. (Does Haskell or Erlang do it?)
The second tool for fixing failed tests is simpler: print out the input that revealed the bug. That’s also hard with generated functions. Since the generated functions described above are random mappings from input to output, printing the actual input-output mapping used in the failing property evaluation should be sufficient. This is implemented in Haskell, with no state, so it must be possible in Scala and Erlang. And it’s in F#.

Like everything in property-based testing, generic input generation is the easy part (even when it isn’t easy). Describing properties expected of output functions based on input functions – now that sounds hard. When I write a test that does it, that’ll be worth another post.

Thanks to @reiddraper, @ericnormand, @silentbicycle, @natpryce, @deech, @kot_2010 and everyone who chimed in. Please continue.