By making sure it avoids success at all costs, Haskell is actually one of the more fun programming languages to use in a real-world project. But the additional expressivity1 you get in Haskell in comparison with more mainstream languages comes at a cost.
While running a course on functional programming and using Haskell for the practical sessions, I have experienced several issues that act as impediments to understanding.
I still think Haskell is one of the best ways to teach functional programming. In any case, the issues below are difficulties in teaching and learning Haskell. Most of them actually make the life of a working Haskell programmer better.
Here they go, in no particular order.
1. The Foldable type-class
When teaching higher-order functions, I like to also explain the usual suspects: map, reduce, and filter, especially as they are relevant from a historical point of view.
At this point in the lecture, the students are not yet familiar with type-classes. This is fine for map and filter, because there are no type-classes to explain:
ghci> :t map
map :: (a -> b) -> [a] -> [b]
ghci> :t filter
filter :: (a -> Bool) -> [a] -> [a]
Haskell does not have a function called reduce. Instead, there are two functions, foldl and foldr, which play the role of reduce (there is also a strict version, which I do not worry about just yet). But unfortunately, since a few years ago, these functions work not only on lists, but on any Foldable type.
ghci> :t foldl
foldl :: Foldable t => (b -> a -> b) -> b -> t a -> b
This is not great at all, because now I have to explain type-classes. To make things worse, Foldable is a higher-kinded type class.
ghci> :k Foldable
Foldable :: (* -> *) -> Constraint
What I do is force a specialisation of foldl for lists and work with that. I then show the signature of the original function and basically say: do not worry about this yet. I come back to the issue when discussing type-classes.
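For concreteness, a sketch of such a specialisation (the name foldlList is just for illustration, not something from the lectures):

foldlList :: (b -> a -> b) -> b -> [a] -> b
foldlList = foldl   -- the general foldl, restricted to lists

-- Or, written out by hand so students can trace the equations:
foldlList' :: (b -> a -> b) -> b -> [a] -> b
foldlList' _ acc []     = acc
foldlList' f acc (x:xs) = foldlList' f (f acc x) xs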
2. The Num type-class
At some point during the first lecture, I like to explain that Haskell is statically typed. Static typing is a defining trait of Haskell. It is one of the most important features that makes it great for learning functional programming.
But the generality that comes with the Num class in the standard library is not great. I like to explain the :type command early on, immediately after having some simple Haskell expressions evaluated.
Unfortunately, the response the system gives when asked for the type of a number is overwhelming:
ghci> :t 42
42 :: Num a => a
Understanding the output requires type variables, type constraints and type classes.
I work around the issue by first doing Chars and Booleans.
ghci> :t 'c'
'c' :: Char
ghci> :t True
True :: Bool
I explain static typing by using wrongly typed expressions involving booleans and characters.
ghci> True && 'c'
<interactive>:1:9: error:
• Couldn't match expected type ‘Bool’ with actual type ‘Char’
• In the second argument of ‘(&&)’, namely ‘'c'’
In the expression: True && 'c'
In an equation for ‘it’: it = True && 'c'
This makes it easy to understand the error message.
I then go into more depth on numbers, because they are so ubiquitous, and I explain the three ingredients to some degree (type variables, type constraints, type classes).
ghci> :t 42
42 :: Num a => a
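Pinning the literal to a single type with an annotation makes the constraint disappear, which helps once the three ingredients have at least been sketched:

ghci> :t (42 :: Int)
(42 :: Int) :: Int
ghci> :t (42 :: Double)
(42 :: Double) :: Double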
I used to explain typing later in the lectures to avoid this issue. But static typing is such a core part of the language, and students will very likely run into type errors early on, so it makes more sense to tackle it from the start.
3. Lazy evaluation
In my opinion, lazy evaluation is not a critical aspect of teaching functional programming. But still, Haskell is lazy and there are important consequences of being lazily evaluated, so this aspect must be taught.
I usually simply ignore the evaluation strategy until later on in the lectures. In the early lectures, when I teach equational reasoning, I will tell students a small white lie and pretend that the evaluation strategy is strict. As the class is an introductory class, I intentionally ignore the low-level details of data representation or compilation, such as talking about thunks and whatnot. Instead, I only teach students the abstract model of computation of applying equations left-to-right.
When it comes time to explain lazy evaluation, I start with an experiment. I define a function that executes slowly and show that it is indeed slow:
ghci> let fib x = if x <= 1 then x else fib (x - 1) + fib (x - 2)
ghci> :set +s
ghci> fib 25
75025
(0.13 secs, 57,593,696 bytes)
ghci> fib 30
832040
(1.29 secs, 638,049,152 bytes)
I then define a simple function that only touches its second argument in certain cases.
g :: Int -> Int -> Int
g x y = if x > 2 then y else 10
I experiment with various arguments based on the earlier defined fib, and we come up with a theory of why certain calls are so fast, even though the arguments to g should take a while to compute.
ghci> g 10 (fib 30)
832040
(1.30 secs, 638,049,280 bytes)
ghci> g 1 (fib 30)
10
(0.00 secs, 65,120 bytes)
This works well because most of my students have studied programming during high-school, where they were taught that a call to a function f(x) is executed by first evaluating the argument x, and then the body of the function f, with the value of x plugged in.
I then proceed to explain upsides (e.g., can define if-then-else in the standard library, can be more efficient) and downsides (e.g., cannot easily predict running time). I also give several more examples, including examples of infinite data structures.
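To make the upsides concrete, here is a small sketch (the names myIf and naturals are just for illustration):

-- Laziness lets control flow be defined as an ordinary function:
-- only the branch that is actually needed gets evaluated.
myIf :: Bool -> a -> a -> a
myIf True  t _ = t
myIf False _ e = e

-- It also allows infinite data structures, of which we only force what we need:
naturals :: [Integer]
naturals = [0 ..]

firstTen :: [Integer]
firstTen = take 10 naturals   -- [0,1,2,3,4,5,6,7,8,9]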
Students sometimes mistake lazy evaluation for the short-circuiting performed by logical operators such as (&&) and (||). This also requires a bit of extra explaining.
I also come back to laziness when discussing the lambda calculus and evaluation strategies in the last few lectures.
But I have found over time that lazy evaluation is mostly a distraction in the way of students learning functional programming. It is also really difficult to understand without first understanding lambda calculus. How evaluation actually works remains a gray box for most students at this point.
4. The IO monad
From a software engineering point of view, I find the IO monad to be absolutely great. The way it enforces the separation of imperative code and pure code is second to none. But it has two downsides: it makes teaching difficult, and it makes it more difficult to get started with “real” programs.
When discussing the IO monad, I essentially refuse to explain what a monad is2, except in the shallowest possible sense. I simply explain how to get the job done. But the contorted discussion about essentially imperative functions like putStrLn actually being pure and returning an IO action, an action which gets executed at some point, gets in the way.
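A sketch of the kind of example I mean: an IO action is an ordinary value that can be named and passed around, and nothing happens until it is run from main (the name greeting is just for illustration):

greeting :: IO ()
greeting = putStrLn "hello"   -- a value describing an action; nothing is printed yet

main :: IO ()
main = do
  greeting                    -- only here does the action actually run
  greeting                    -- running it twice prints "hello" twice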
There is also a school of thought that you should start teaching Haskell with the IO monad, but I am not convinced: in my experience, if someone is exposed to IO early on, they will contaminate all their functions with IO. They will essentially end up writing Java in Haskell.
There is also the weird interaction between lazy evaluation and IO, where you can lazily read the contents of a file, only to find that the file has been closed before the contents are actually processed, which also impedes simplicity.
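A sketch of this classic pitfall (the function name is just for illustration):

import System.IO

-- hGetContents reads lazily, and withFile closes the handle as soon as its
-- body returns. By the time length forces the string, the handle is already
-- closed, so the contents come back truncated (typically empty) instead of
-- the full file.
countLinesLazily :: FilePath -> IO Int
countLinesLazily path = do
  contents <- withFile path ReadMode hGetContents
  return (length (lines contents))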
5. The package/build/compiler management tools
I have written about this before, so I will only mention it here. Package/build/compiler management tools, which are great for large projects, are an absolute nuisance when you are getting started. In an ideal world, you would pop open the laptop only to be greeted with an editor and a Haskell REPL. Unfortunately, setting up a development environment is much more involved…
6. The ghci top-level is not first-class
It is unfortunate that the standard top-level does not accept arbitrary top-level declarations or multi-line input, at least not by default. This has gotten better over the last few versions, but you still cannot define a multi-equation function, cannot give a function a type signature on one line and its definition on the next, and so on.
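For what it is worth, ghci does have a multi-line bracket syntax, which is the workaround I end up showing:

ghci> :{
ghci| f :: Int -> Int
ghci| f 0 = 0
ghci| f n = n + f (n - 1)
ghci| :}
ghci> f 5
15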
Having a first-class top-level would actually improve the ecosystem for both learners and engineers. Ideally, the top-level should simply accept any line you could write in a Haskell script, and additionally evaluate any expression you type.
OCaml does much better here, especially with alternative REPLs such as utop. The Python top-level is also good in this respect.
7. Indentation sensitivity
I actually like indentation as a convention for defining blocks, and I think it is a good idea. Unfortunately, it can get in the way of learning.
Typical mistakes include indenting the equations defining a function, like this:
f :: Int -> Int
  f 0 = 0
  f n = n + f (n - 1)
The error message is not helpful: error: parse error on input ‘=’.
Another common mistake is to not indent where you actually should do it:
data MyBool = MyFalse | MyTrue
instance Eq MyBool where
(==) MyTrue MyTrue = True
(==) MyFalse MyFalse = True
(==) _ _ = False
The error message here is even worse (there is no error, just a warning saying you have not defined (==)).
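For reference, the indentation the instance actually needs:

data MyBool = MyFalse | MyTrue

instance Eq MyBool where
  (==) MyTrue  MyTrue  = True
  (==) MyFalse MyFalse = True
  (==) _       _       = False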
I am not sure what could be done here, except to improve the error messages, which is technically difficult.
8. Purism
I actually have mixed feelings about this, but I’m leaving it in for completeness.
Being pure is good in general, even if it does not allow for some imperative patterns which can be useful and sometimes simpler than the equivalent functional code.
Forcing functions to be pure teaches students how to think differently.
But it also makes it somewhat difficult to code up rather simple computations, especially for students coming from imperative backgrounds.
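A typical sticking point: what a student would write with a mutable accumulator and a loop has to become an explicit recursion or a fold. A small sketch (the function names are just for illustration):

-- Imperative habit: "set acc to 0, loop over the list, add each element".
-- The pure version threads the accumulator through a recursive helper instead.
sumFrom :: Int -> [Int] -> Int
sumFrom acc []     = acc
sumFrom acc (x:xs) = sumFrom (acc + x) xs

sumList :: [Int] -> Int
sumList = sumFrom 0
-- or, more idiomatically: sumList = foldl (+) 0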
Overall, I’m leaning towards the view that purism is the right kind of pedagogical difficulty, not really a downside.
9. Currying
Currying is a nice, interesting, and useful concept, but it does get in the way of learning Haskell.
This is because, to understand currying, you first need to understand higher-order functions.
But before you understand higher-order functions, you have to understand first-order functions.
But to understand first-order functions, you have to get your hands dirty and code up some first-order functions.
Which means that you end up using curried builtins (such as (+), div, (&&), etc.).
You may also want to code up a two-argument first-order function (e.g., gcd) as an exercise before understanding higher-order functions. This you cannot do in Haskell.
When I teach Haskell, I work around currying by telling students a white lie: that functions can have more than one argument. For example, I explain that (&&) is a function taking two booleans to a boolean.
ghci> :t (&&)
(&&) :: Bool -> Bool -> Bool
Some students are bothered by the lack of symmetry between the first and the second argument in the type of the function (the second argument has an -> in front of it, the first one does not), but they quickly get over it.
Of course, I later admit to the lie and explain currying in depth when I finally get to higher-order functions.
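When I do come back to it, a small demonstration along these lines usually drives the point home (the name andTrue is just for illustration):

-- (&&) :: Bool -> Bool -> Bool really means Bool -> (Bool -> Bool):
-- applying it to a single argument yields another function.
andTrue :: Bool -> Bool
andTrue = (&&) True

-- andTrue False == False
-- andTrue True  == True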
10. Lack of first-class modules
This has been explained very well in Modules Matter Most, so I will not go in depth here.
Most Haskell programmers will actually not notice the lack of first-class modules, unless they come from a background like OCaml/Standard ML/the Coq proof assistant.
But it can get annoying when you try to do even mildly advanced stuff.
For example, try to use the built-in sort function for lists with two different orderings.
ghci> :t sort
sort :: Ord a => [a] -> [a]
You cannot, because a type can only implement the type class Ord
once. What you have to do is to write your own sort function, which takes a comparison function as an argument. This is exactly what a compiler does for functions with type constraints, so you have to redo the job of the compiler — not exactly ideal.
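For sorting specifically, the standard library already ships this explicit-comparison variant as Data.List.sortBy; a sketch of using it with two different orderings:

import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

ascending, descending :: [Int] -> [Int]
ascending  = sortBy compare
descending = sortBy (comparing Down)

-- ascending  [3, 1, 2] == [1, 2, 3]
-- descending [3, 1, 2] == [3, 2, 1]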
11. The go pattern
A common pattern in Haskell code is to write helper functions called go in the where clause of a definition. Even the standard library uses this pattern. Here is how foldr is defined in the standard library:
foldr k z = go
  where
    go []     = z
    go (y:ys) = y `k` go ys
There is an issue with this, which actually goes beyond teaching Haskell. It makes the code more difficult to read and understand for two reasons:
1. The order in which you read the definition is not the logical order. You are relying on a concept which has not been defined yet. This is a well-known downside of where in general; the let construct does this better.
2. But the more pressing problem is that the name go is so generic that it does not mean anything. It is simply a poor alternative to a loop.
And functional programming has better alternatives to a loop. Namely, one of the most important things FP does better than imperative programming is to force programmers to properly name their entities, including looping structures. What would be a nameless loop in an imperative language gets a meaning as a recursive function in a functional language. My favourite example is the naive primality test. You would write it in C like this:
int isprime(int x)
{
    for (int i = 2; i < x; ++i)
        if (x % i == 0)
            return 0;
    return 1;
}
Forgive the fact that 1 is (wrongly) considered prime by the function above; I am trying to make another point.
In Haskell, because there are no loops, you are forced to write a helper function. You have to name the function. What better name than hasDivisors?
isPrime :: Int -> Bool
isPrime x = not $ hasDivisors x 2 (x - 1)

hasDivisors :: Int -> Int -> Int -> Bool
hasDivisors x l r = if l > r then
                      False
                    else if x `mod` l == 0 then
                      True
                    else
                      hasDivisors x (l+1) r
Even if slightly longer than the C code, the intent is now crystal clear. What would be an implicit invariant in the C loop (no divisors between 2 and i-1) is written out explicitly in the Haskell code.
If you use the go pattern, you end up simply writing C(++) code in Haskell syntax:
isPrime :: Int -> Bool
isPrime x = go 2
  where go i = if i > x - 1 then
                 True
               else if x `mod` i == 0 then
                 False
               else
                 go (i + 1)
To sum up
Overall, Haskell is still a great language for both beginners and advanced users. But no language is perfect (unfortunately); there are always tradeoffs. Be aware of the pedagogical downsides listed above and do your best to avoid the common traps.
1. Formally, it is not expressivity, but conciseness. Expressivity is about what programs you can write, and conciseness is about how many tokens the programs take.
2. I do go a bit into monads later on, with Maybe and lists.