How I Teach Algebraic Data Types
In my opinion, if students already know enums in languages such as C or Java, the best way to start is to explain how to write an enum in Haskell1.
I find that the day-of-week datatype is a good2 start3:
data Dow = Mon | Tue | Wed | Thu | Fri | Sat | Sun
Haskell has the unfortunate habit of complaining that values of such types cannot be transformed into strings, in order for them to be printed out by the system. So we have to add the magic incantation deriving Show
to help it out4.
data Dow = Mon | Tue | Wed | Thu | Fri | Sat | Sun deriving Show
I now explain that the individual values Mon
, Tue
, … are called constructors. They are the only possible values of the particular datatype that we have defined.
At this point, I use an analogy, which is IMHO very important: imagine that the constructors are empty envelopes5. Each envelope has the name of the constructor written on it, as in Figure 1.
You can use constructors in pattern matching to write interesting functions, such as the one below6.
nextWorkDay :: Dow → Dow
nextWorkDay Mon = Tue
nextWorkDay Tue = Wed
nextWorkDay Wed = Thu
nextWorkDay Thu = Fri
nextWorkDay Fri = Mon
nextWorkDay Sat = Mon
nextWorkDay Sun = Mon
A function processings Dow
s can be thought of as taking an envelope, inspecting what is written on the envelope, and then applying the corresponding equation (Figure 2).
Because a Dow
is the disjoint union of all its possible values, Dow
is an example of a (disjoint) sum type.
There also exist product types, the simplest example of which is the pair. I define a datatype called Pair
data Pair = P Int Int deriving Show
that can be used to store two integers as a single entity. In product types, the constructor (P
for our type Pair
) is also an envelope labelled by the name of the constructor. But the envelope now also contains space for the arguments to the constructor: in our case, space for two integers (Figure 3). The order of the two integers is important. The two integers do not swap places when the envelope is moved around.
Of course, you can also have more than two items as arguments to some constructor. This is the case, for example, when defining points in space as their 3D coordinates:
data Point = Coord3D Int Int Int deriving Show
A value of type Point
is simply an envelope labelled Coord3D
, where if you open up the envelope, you can find the three values, in the right order (as in Figure 4).
Such datatypes are called product types because the number of values of the product type is basically the product of the number of values of the types of the arguments.
Applying a constructor consists of wrapping up the corresponding values into the envelope:
examplePoint :: Point
examplePoint = Coord3D 7 1 1
Pattern matching in Haskell is then simply opening up the envelope and inspecting its contents. It allows to write interesting functions, such as the one below for retrieving the x
coordinate of a point:
xCoord :: Point → Int
xCoord (Coord3D x _ _) = x
The analogy of constructors-as-envelopes helps clarify two important issues:
constructor application is not like function application: even if the syntax is the same, a function actually makes some sort of computation, whereas the constructor simply wraps its arguments in a single entity;
pattern matching works on constructors, but not functions7, because constructors are so simple: it is easy to retrieve the arguments they were applied on by opening up the envelope.
Being armed with sum types and product types, it is simple to explain algebraic datatypes: they are simply a combination of both, where you make a disjoint union of several product types.
Let us see where we could use such a type, which combines sums and products. Suppose we wanted to write a function for integer division. We could start as follows:
divide :: Int → Int → Int
divide x y = x `div` y
The function above mostly works, but it raises an exception during evaluation if y
is 0
. To say the least, this design of our division function is not great, because the caller cannot become aware of the exceptional behaviour by inspecting the result.
A better way would be to use a special return value to signal the fact that the computation did not work out. But which value to use? Typically people would use a value such as -1
, or maybe some other integer outside of the expected range of the function.
divide :: Int → Int → Int
divide x 0 = -1
divide x y = x `div` y
Unfortunately, such a value could still very well be the valid result of a division. The caller would have no way to distinguish between the following situations:
the computation cannot be performed;
the computation can be performed, but it happens to return
-1
.
Ideally, we need a way to augment the type Int
by a new value, which is used as a marker of failure. To do this, we use the algebraic datatype defined below and depicted graphically in Figure 6.
data Result = Failure | Success Int deriving Show
The constructor Failure
is nullary and it should be used to mark the fact that the computation did not work out (because of a division by 0
in our case). We can imagine Failure
as an empty envelope labelled Failure
. The constructor Success
is unary and it is used to wrap the integer resulting from a successful division in order to transform it into a Result
. We can imagine Success
as an envelope containing an integer.
divide :: Int → Int → Result
divide x 0 = Failure
divide x y = Success (x `div` y)
Now our function is much better designed. The caller gives the function two integers and gets back an envelope of type Result
. It can inspect the envelope it gets back. If it is labelled Failure
, as in Figure 7, it knows that the computation failed.
If the result is an envelope labelled Success
, as in Figure 8, it knows that the computation took place successfully and that it can open the envelope to retrieve the numerical result of the division.
This design works well because the caller must check if the computation was successful in order to retrieve the numerical result — there is no way around it.
I then proceed to define recursive datatypes (using lists). A list works out to be a sequence of nested envelopes. I also explain polymorphic types, using pairs and then lists as examples. I explain that we have reinvented the wheel, and that Bool
, Maybe a
, Either a b
, [a]
are all defined in the standard library. We can simply use those. I do not explain at all how values are represented in memory. IMO, the best way to understand ADTs (and, more generally, functional programming) is at an abstract level, not as the low-level implementation details. Depending on time, I might explain the unit type and also work out an example with arithmetic expressions. Algebraic datatypes are especially useful when doing symbolic computation, such as when writing a compiler or an interpreter. Depending on time available, I might also talk about the expression problem and how designing algebraic datatypes is different from class hierarchies in OOP languages.
I use Haskell, but OCaml or another functional language would work just as well.
In my home country, the week conventionally starts on Monday, not on Sunday, as you might be used to.
Any other reasonable enumeration also works well here, but keep things simple — students should concentrate on ADTs, not the domain of the problem we are solving using ADTs. For example, it helps to stay away from booleans at this point, because students might then wonder if only two values are allowed, how the constructors are encoded, etc.
This is really bad from a pedagogical point of view, but there are not many things that can be done.
I use envelopes, but other analogies that could work equally well for constructors include rooms or boxes.
Of course, there are better ways to write this function, without so much redundancy, but for now this will do.
Pattern matching cannot work on functions in general for several reasons. Imagine a function to multiply three numbers:
mult3 :: Int → Int → Int → Int
mult3 x y z = x * y * z
What would it mean to pattern match on the application of a function? If you could do that (in general), it would allow writing functions such as:
doesNotWork :: Int → Int
doesNotWork (mult3 x _ _) = x
The “function” doesNotWork
would not work out for two reasons:
it would not be deterministic, because there could be several values
x
such thatx = doesNotWork n
; andmore importantly, it is not possible in general to invert a function, or at least to invert it somewhat efficiently. Whereas it is trivial to “invert” a constructor, by opening the envelope.