What is a monad?

This post is my attempt to come to an understanding of monads as they are relevant to the Haskell programming language.

Haskell’s use of monads may be the first unreasonably-difficult-seeming aspect of the language which users will encounter. There is no shortage of monad tutorials online; I suspect that the only way to learn about monads to write such a tutorial oneself.

But Haskell is purely functional at heart, and some kind of tricks must be used to perform basic imperative-style tasks, including IO, error handling, and mutable state. The monad is Haskell’s mathematically elegant solution to imperative-stye programming.

Leading Haskell researcher Simon Peyton Jones has cheekily claimed that Haskell’s organization around monads makes it the “world’s finest imperative programming language”. I won’t comment on this claim, but I will suggest that monads make Haskell the world’s most fascinating imperative programming language.

The focus of this post is the mathematics of monads, though there are plenty of code snippets in this post. I avoid the somewhat complex and opaque IO monad in this post, in favor of exploring simpler monads (especially List and Maybe) more completely.

1. Functors

Recall that a category consists of a class of objects which come with sets of morphisms between them, abstracting the idea of functions between sets. A morphism {f} from {X} to {Y} (written {f:X\rightarrow Y}) can be composed with a morphism {g:Y\rightarrow Z} to obtain a morphism {g\circ f: X\rightarrow Z}. Furthermore, each object {X} has a (necessarily unique) identity morphism which, when composed (on either side) with another morphism (to or from {X}) yields that same morphism.

Examples of categories commonly studied in mathematics are:

  • {\mathbf{Set}}, in which the objects are sets, and the morphisms are functions
  • {\mathbf{Grp}}, of groups with group homomorphisms
  • {\mathbf{Top}}, of topological spaces with continuous functions
  • The above examples are all based on the paradigm “sets with additional structure that is preserved by certain types of functions”; for a rather different example, any preorder (reflexive, transitive relation) on a set can be made a category. One simply declares that there is a unique morphism from {X} to {Y} if and only if {X\leq Y}. Transitivity guarantees morphism composability; reflexivity guarantees the existence of identity morphisms.

In applying the categorical viewpoint to Haskell, we are typically interested in (subcategories of) {\mathbf{Hask}}, whose objects are Haskell types and whose morphisms are the (definable in Haskell) functions from one type to another. {\mathbf{Hask}} is of course seeded with some base types such as Integer and Char, and from the following (non-exhaustive list of) closure properties, we see that {\mathbf{Hask}} is quite rich:

  • If a and b are types, then so is (a,b), the product type consisting of ordered pairs. The functions (morphisms) fst :: (a,b) -> a and snd :: (a,b) -> b select the first and second elements of the pair, respectively, and these morphisms make (a,b) a categorical product of a and b.
  • If a and b are types, then so is a -> b, the type of functions from a to b. Then a -> b is an exponential object in {\mathbf{Hask}}; the obvious operation taking an a -> b and applying it to an a to get a b is Haskell’s infix application operator $ (whose main use in Haskell code is actually just to reduce the number of parentheses the programmer must type).
  • {\mathbf{Hask}} is also closed under “disjoint unions” aka coproduts, via the type Either a b, which admits the morphisms Left :: a -> Either a b and Right :: b -> Either a b.
  • If a is a type, then so is [a], the type of lists of a.

The categorical viewpoint is helpful for identifying patterns in Haskell programs and abstracting their usage into common, reuseable frameworks.

Consider the higher-order function map, which takes a function f of type a -> b and a list of type [a], and generates a list of type [b] by applying f to every a in the original list. That is, map has type (a -> b) -> [a] -> [b].

Haskell automatically curries functions and supports partial application, meaning that map can either be thought of as taking an a -> b and an [a] to a [b], or, more correctly, map takes an a -> b to an [a] -> [b].

This means that the type constructor [] (pronounced “List”) is a functor from {\mathbf{Hask}} to itself. We’ll explain what this means, then give a few more examples from {\mathbf{Hask}}, then move to monads in the next section.

Recall that a (covariant) functor {F} from a category {C} to a category {D} associates to every object {X} in {C} an object {FX} in {D}, and associates to every morphism {f:X\rightarrow Y} in {C} a morphism {Ff:FX\rightarrow FY} in {D} in a “respectful way”. Explicitly, {F}:

  • Takes the identity morphism of {X} in {C} to the identity morphism of {FX} in {D}, and
  • Preserves composition, {F(f\circ g) = (Ff)\circ(Fg)}.

So to check that map makes List a functor on {\mathbf{Hask}}, one simply observes that

  • map applied to id :: a -> a equals id :: [a] -> [a]
  • map (f . g) equals map f . map g where . is Haskell’s infix notation for function composition; both functions apply g and then f to every element of a list.

But now by taking the categorical viewpoint, we see that there are other type constructors than List for which analogues of map can be naturally defined. The simplest example is probably Maybe, defined as

data Maybe a = Nothing | Just a

which is often used to represent either failure (Nothing), or a success with a particular value (such as Just 10). The obvious analogue of map is

maybeMap :: (a -> b) -> Maybe a -> Maybe b
maybeMap f (Just x) = Just (f x)
maybeMap f Nothing = Nothing

which takes the value in Just x “out of its box”, applies f, and “puts it back in the box”. It’s easy to check that maybeMap takes id :: a -> a to id :: Maybe a -> Maybe a and preserves function composition, so Maybe is (somewhat trivially) a functor.

For a less trivial example, one can consider the Hom functors from mathematics. For a fixed object {X} in a category {C}, the mapping {A\mapsto\mathrm{Hom}_C(X,A)} of an object {A} to the set of morphisms {X\rightarrow A} is a functor. A morphism {g:A\rightarrow B} in {C} induces a morphism on Hom sets by composition, {g\circ\cdot:\mathrm{Hom}_C(A,B_1)\rightarrow\mathrm{Hom}_C(A,B_2)}.

In Haskell terminology, ((->) x) (note: the parens around -> indicate that (->) is to be read as a prefix operator) is a functor via

homMap :: (a -> b) -> (x -> a) -> (x -> b)
homMap g f = g . f

Or more breifly, homMap = (.)

In any case, Haskell allows users to define their own functors, provided they specify the appropriate analogue of map. The Functor typeclass (Haskell’s typeclasses are somewhat analogous to Java’s interfaces) declares a single function

class Functor f where
    fmap :: (a -> b) -> f a -> f b

And Maybe is declared an instance of Functor like this:

instance Functor Maybe where
    fmap _ Nothing = Nothing
    fmap f (Just a) = Just (f a)

Note that one can’t algorithmically enforce the “functor laws”, so it’s up to the user to define their functors sensibly.

Basically any container class (e.g. data structures such as trees) can be made into a functor; Either a (with one type parameter free) is also a functor.

2. Monads

In Haskell, monads are used to sequence steps of a computation, while carrying around some extra “context”. Before jumping into the real definitions, we give some fuzzy examples:

  • Maybe is a monad, where the “context” of a computation done in Maybe is that it could fail at any step and give Nothing; thus Maybe is suitable for primitive error handling. Actually Either String (with one parameter free) is also a monad, and more suitable for error handling, as Nothing is less informative than Left "invalid argument to function foo" or the like. We stick to Maybe in these notes for simplicity, but the use of Either String is almost identical.
  • List is a monad, representing non-deterministic computations in whose every branch must be searched out. In fact Haskell’s list comprehensions are just syntactic sugar for monadic-style code, as we’ll see in subsection 2.4.
  • The State monad facilitates the passing of a “mutable state” from one step in a computation to the next. I won’t further discuss State in this post (it’s long enough as is!), but keep in mind that there are several ways to do this without using monads:
    1. Passing state around explicitly as a function parameter
    2. In the special case where data is “accumulated” while traversing a list (analogous to imperative for loops), the state passing can be automatically handled via folds (Haskell’s foldl, foldr, and the countless variations).
  • And of course there’s IO, in which one leaves “pure” computation behind to interact with an “impure” world. While IO code can be written in the same style as other monads, IO “values” are made opaque to the programmer (specifically: no data constructors are exported to pattern match against), so that one cannot easily get back out of the IO context: once a computation has been “tainted” with IO purity can never be recovered! (The same is not true of the other monads mentioned, e.g. one can escape the non-deterministic context of List with a deterministic function like length or null or head.)

2.1. Monads as functors

In category theory, a monad is a functor {M} from a category {C} to itself (i.e. an endofunctor) which comes with two natural transformations {\mathrm{join}:M^2\rightarrow M} and {\mathrm{return}:\mathrm{Id}_C\rightarrow M} satisfying two “coherence conditions” describing their interaction. That {\mathrm{return}} and {\mathrm{join}} are natural transformations means in part that for each object {X} in {C}, there are morphisms {\mathrm{join}_X:MMX\rightarrow MX} and {\mathrm{return}_X:X\rightarrow MX}. Of course, these collections of morphisms are required to obey certain “naturality conditions” guaranteeing that they play nicely with the other morphisms in {C}.

Haskell’s monads are defined slightly differently (equivalently, but reorganized for a different focus), which is why I don’t want to dwell on the above, at least not until subsection 2.5 on the monad laws. But I do want to show why a functor equipped with {\mathrm{join}} and {\mathrm{return}} is an appropriate way to view computations with “context”.

If we view Haskell functors as “containers” that can be mapped over (like List and Maybe), then return takes a piece of data and wraps it in a container, and join takes a doubly wrapped piece of data, and “flattens” the wrapper into one layer. Here’s how this flattening and wrapping would play out for the Maybe functor:

maybeJoin :: Maybe (Maybe a) -> Maybe a
maybeJoin Nothing = Nothing
maybeJoin (Just Nothing) = Nothing
maybeJoin (Just (Just x)) = Just x

maybeReturn :: a -> Maybe a
maybeReturn = Just

A computation can use Maybe in the context of primitive error handling, in which several subcomputations must be performed in sequence, any one of which could fail, causing the overall computation to fail.

An example adapted from O'Sullivan, Stewart, and Goerzen's Real World Haskell involves composing lookup operations, using the result of one as the key for the next. Suppose you've got three Data.Map.Map objects (Haskell's analogues of hash tables, but implemented as purely functional self-balancing binary search trees), respectively associating people's names to phone numbers, phone numbers to service providers, and service providers to billing addresses, and you'd like a function which looks up the billing address of a person's service provider, failing gracefully if any single lookup did.

The type signature of lookup is

Data.Map.lookup :: Ord k => k -> Data.Map.Map k a -> Maybe a

and as you'd expect, it returns Nothing when given a key not in the map; otherwise it returns Just the value associated with the given key. (Note: keys are required to be orderable only because it doesn't otherwise make sense to put them in binary search trees.)

So we have a sequence of functions whose input values are "pure", e.g. a String such as "John Doe" (okay, and a Map), but whose output values are "impure", e.g. Just 1234567890 if that is John Doe's number, and we need to chain them together.

The first lookup goes off without a hitch, because the input "John Doe" is "pure". For the second, we have to use fmap to "get inside" the container and apply lookup. But then we have a type problem. If the phone number 1234567890 associates to the string "TMobil", then the result of the second lookup is the awkward Just (Just "TMobil"); or if 1234567890 has no associated carrier, then the result is Just Nothing.

Either way, we use join (or rather maybeJoin) to flatten the result to prepare it for the next fmap. Roughly speaking, the sequence looks something like this:

          lookup           fmap lookup                  join          fmap lookup 
"John Doe" --> Just 1234567890 --> Just (Just "TMobil") --> Just "TMobil" --> ...

For maximal symmetry, one could even start the sequence with maybeReturn "John Doe", so that every lookup occurs as the argument to fmap, followed by a join.

2.2. From join to bind

Continually writing fmap and join is unwieldy. This is solved in Haskell by the following:

  1. If the main purpose of monads in Haskell is to use fmap and join one after the other, they might as well be combined into the single operation (>>=), pronounced "bind". Haskell's monads actually take (>>=) to be the primitive operation (along with return), understanding that both join and fmap can be derived. The (>>=) formulation has the benefit of brevity.

  2. Haskell further provides syntactic sugar via the do keyword to make monadic code more succinct and more similar to the imperative style code that it often emulates.

The first point will be developed in this subsection; the second is the topic of the next subsection.

The Monad typeclass is defined as follows, but ignore the last two lines for now ((>>) has a default implementation in terms of (>>=), and fail by default calls error which is something we'll want to avoid):

class Monad m where
    (>>=)  :: m a -> (a -> m b) -> m b
    return :: a -> m a
    (>>)   :: m a -> m b -> m b
    fail   :: String -> m a

Of course, just like the "functor laws", there are also laws that proper monads ought to satisfy, analogous to the requirements in the category theoretic definition. We'll postpone this discussion until we've introduced do.

The above definition does not require that monads be functors, but as noted fmap can be derived from return and (>>=). Just arguing from type signatures, one can see that the correct definition of fmap :: (a -> b) -> m a -> m b must be

fmap f = (>>= return . f)

Rather than introduce a name conflict, Haskell calls the above function liftM instead of fmap, though it does follow from the monad laws that the functor laws hold for liftM. It might have been better for Haskell to require monads to be functors in the first place.

Similarly arguing from type signatures, one can see that the correct definition of join :: m (m a) -> m a must be

join = (>>= id)

Here's the Monad instance declaration for Maybe (eliding the definitions of (>>) and fail); it's equivalent to the maybeJoin and maybeReturn above.

instance Monad Maybe where
    (Just x) >>= k = k x
    Nothing >>= _ = Nothing
    return = Just

Before ending this subsection, I want to briefly note that the additional structure provided by (>>=) allows one to "lift" functions of arbitrary arities into the monad e.g.

liftM2 :: (Monad m) => (a1 -> a2 -> b) -> m a1 -> m a2 -> mb

and higher, whereas functors only have fmap to lift functions of one variable. Constants, viewed as zero-variable functions, are even lifted via return! You can read more about lifting here.

2.3. Imperative "do" syntax

We introduce do syntax (an alternative to explicitly writing out (>>=)) by returning to the repeated lookup example from RWH.

I'll leave out the particulars of the type signature, just noting that its final return type must correspond to Maybe Address (leaving the representations of addresses and other data as implementation details). The function we want is:

findBillingAddress name phoneMap providerMap addressMap = do
    phone <- Data.Map.lookup name phoneMap
    provider <- Data.Map.lookup phone providerMap
    Data.Map.lookup provider addressMap

This looks a lot like imperative code, with two intermediate lookups bound to the names phone and provider before the third. In the monadic context, this binding happens, unsurprisingly, via (>>=) (and note that (>>=) is based on function application, which is how names are bound in the lambda calculus). That is, the above definition is syntactic sugar for the longer, messier

findBillingAddress name phoneMap providerMap addressMap =
    Data.Map.lookup name phoneMap
    (\phone -> Data.Map.lookup phone providerMap
               (\provider -> Data.Map.lookup provider addressMap))

The explicit transformation rules used here are:

  • do { v <- m; ...} becomes m >>= (\v -> do { ... }) where m is an expression with monadic type.
  • In the base case, do { m } becomes m.

It may be worth noting that there are two other transformation rules for do:

  • do { let { v = x }; ... } becomes let { v = x } in do { ... } where { v = x } is meant to stand for one or more bindings of "pure" values to variable names.
  • do { m; ... } becomes m >>= (\_ -> do { ... }).

Note the similarity between the first and final rules. The only difference is that the actual value of m in the last rule is ignored in the subsequent (>>=). This rule corresponds to the common practice in imperative programming of evaluating expressions for their side effects only, for instance calling ++i without binding its value to another variable in the C programming language.

In Haskell's Maybe monad, this last form can be used as a guard: if mx evaluates to Nothing, then the whole do block evaluates to Nothing; otherwise evaluation of the do block proceeds as normal.

Actually, the last rule, as I've stated it, is not quite correct. The real rule is that do { m; ... } becomes m >> do { ... }, where (>>) has the default implementation

m >> k = m >>= \_ -> k

The implementation can be overridden for user-defined monads, but one should only do this for efficiency reasons: one should not change the functional behavior of (>>).

2.4. Lists and MonadPlus

List was our first example of a functor; it is also quite a nice example of a monad. It's fairly easy to figure out what join and return have to be, just from their type signatures:

listJoin :: [[a]] -> [a]
listJoin = concat

listReturn :: a -> [a]
listReturn x = [x]

The idea of course is that concat appropriately "flattens" a list of lists, and the creation of singleton lists is the most obvious way to "wrap" data in a list container.

Then we also know what (>>=) must be, in terms of join and fmap (or in the case of List, in terms of map)

listBind :: [a] -> (a -> [b]) -> [b]
listBind lst f = concatMap f lst

where the builtin concatMap does exactly what you would think in terms of concat and map.

Indeed, the Monad instance declaration (eliding (>>) and fail as before) is

instance Monad [] where
    m >>= k = foldr ((++) . k) [] m
    return x = [x]

which amounts to the same thing.

As a first example of a "non-deterministic computation" done in the List monad, consider the following very simple function which takes the Cartesian product of two lists

cartesianProduct :: [a] -> [b] -> [(a,b)]
cartesianProduct xs ys = do
    x <- xs
    y <- ys
    return (x,y)

The monadic version above works by considering the function which for each x maps the function (\y -> [(x,y)]) over ys and flattens the result, then maps that function over xs and flattens the result.

There are, of course, other ways of writing cartesianProduct, one of which is via list comprehension

cartesianProduct xs ys = [(x,y) | x <- xs, y <- ys]

This looks very similar to the do version above! In fact list comprehension is only syntactic sugar for the monadic code, and apparently the original version of Haskell supported comprehension syntax for arbitrary monads.

I'll also note that a very terse point-free definition of cartesianProduct is possible via lifting:

cartesianProduct = liftM2 (,)

But there's one ingredient in list comprehensions which I haven't mentioned, and that's the use of boolean guards to filter the results. As a contrived example, suppose I wanted a "checkered product", that would only accept lists of integers and only output those tuples with even sum,

checkeredProduct :: Integral a => [a] -> [a] -> [(a,a)]
checkeredProduct xs ys = [(x,y) | x <- xs, y <- ys, even (x+y)]

For instance checkeredProduct [1..8] [1..8] could represent the list of black squares on a chessboard. When I claim that list comprehension is syntactic sugar for monadic code, I have to say how such guards are translated to do blocks. Well, one can rewrite checkeredProduct like so:

checkeredProduct xs ys = do
    x <- xs
    y <- ys
    if even (x+y) then [()] else []
    return (x,y)

The point is that if the test even (x+y) evaluates True, then the computation continues, mapping over a list of length 1 whose element () we otherwise ignore (Cf. the fourth transformation rule for do, treating expressions not bound to names via <-). If the test evaluates False, we map over a list of length 0, killing (that branch of) the computation.

This solution feels ad hoc, but it's not the end of the story. I've mentioned that something analogous can be done in the Maybe monad, using an expression without bothering to bind it to a name to determine whether the computation should continue normally or yield Nothing right away.

If I wish to check a boolean condition test in the middle of a Maybe block in this way, I could insert a line like the following:

if test then Just () else Nothing

The similarity of the above line to the list version

if test then [()] else []

was of course noticed by the authors of Haskell. In fact, with mzero standing for [] in the List context and Nothing in the Maybe context, both lines follow the pattern

if test then return () else mzero

The relevant abstraction here is the MonadPlus typeclass which must come with a "zero object" satisfying the "computation killing" condition that mzero >>= f is mzero for any f. And then the function guard :: MonadPlus m => Bool -> m () does exactly what we want in the if test ... line above.

(There's actually some dispute about what exactly should go into a MonadPlus, i.e. whether mzero is enough, or whether it should additionally have monoid structure, that is, an associative binary operation mplus treating mzero as the identity element. List and Maybe are the only instances of MonadPlus declared by default, and they do satisfy the additional monoid axioms.)

I want to conclude the subsection by sharing one slightly non-trivial piece of code in the List monad, as well as its (almost direct!) translation to Python, an exemplary imperative language. What follows is a solution to the n queens puzzle, in which one must find all possible placements of n queens onto an n by n chessboard such that no queen can attack another.

In the example below, a placement of queens will be given as a permutation of the range [1..n], with the understanding that if j is the i-th element of the permuation, then there is a queen at row i column j (clearly no two queens can occupy the same row).

The solution below maps over all "partial solutions" (defined recursively), and all possible placements of the next queen, guarding against any conflicts (queens in the same column or same diagonal). Thus a do block felt perfectly suited to this problem, whereas I don't like using long lines (the complex guard condition--which could incidentally be rewritten with its own do block) inside list comprehensions.

(Extremely terse solutions are also possible at a higher level of abstraction; this code does the same thing via monadic folding in essentially three lines.)

For the clear connection to imperative programming: Python admits a very close translation of the above code thanks to the yield keyword's ability to build generators inside a for loop.

The only real hiccup in the translation is the unfortunate nesting in lines 5-8; thus list (or generator) comprehension style might have been more appropriate in Python. (Or maybe not: for reasons I don't understand, the Google style guide forbids the use of comprehensions to flatten deeply nested code.) Alternately, two levels of indentation could have been eliminated by

  1. defining solve_first_rows outside of n_queens, and
  2. explicitly raising the StopIteration exception in the first if branch so that the else: on line 5 and its indentation would be redundant.

2.5. Monad laws

Finally, I want to go over the monad laws; specifically what they mean and why they are reasonable laws.

In previous sections, I've avoided using a symbol for function equality/equivalence, because the relation isn't computable, and Haskell raises a type error if you try to compare two functions for equality with (==). In this subsection I'll use the triple equals symbol (===) as if it were valid Haskell.

As we've noted before, there are corresponding laws for monads as they are usually defined in category theory (with join and fmap instead of (>>=)). That such laws are equivalent to Haskell's monad laws is an amusing exercise in symbol pushing, but it takes us farther afield than I want to go here; I've uploaded a proof to Google drive.

Here are Haskell's monad laws:

Left identity: return a >>= f  ===  f a
Right identity: m >>= return  ===  m
Associativity: (m >>= f) >>= g  ===  m >>= (\x -> f x >>= g)

We'll translate these laws into do blocks. First, recall the relevant rules for desugaring do into (>>=):

do { v <- m; ... }  ===  m >>= (\v -> do { ... })
do { m }  ===  m

Then left identity says that

     v <- return a
     f v)                 ===  f a

because the do block desugars into (something equivalent to) return a >>= f. Meanwhile right identity says that

     v <- m
     return v)              ===  m

In do notation, both identities look pretty natural. In some sense, the two identity laws just say that return neither adds nor removes any information, just faithfully wraps it in some kind of container (I believe the mathematically rigorous notion in the background is that of adjoint functors, but I'm getting out of my depth here).

Consider the List monad, where we (correctly) guessed that return x should equal [x], based on the necessary type signature a -> [a]. Of course there are other functions with this signature, for instance

return1 x = []
return2 x = [x,x]
return3 x = repeat x

(Actually repeat is a particularly interesting function in that it returns an infinite list with one element repeated indefinitely. Haskell's lazy semantics only evaluates the elements of a list "by need", so has no problem with infinite lists; Python's generators are somewhat analogous.)

But none of these alternatives to return feel like the "right" way to wrap a piece of data. In fact, all three fail both identity laws.

The third law is associativity. It's only slightly harder to come up with do blocks which desugar to the two sides of the associativity identity:

    y <- do
        x <- m
        f x
    g y

desugars to (m >>= f) >>= g, and

    x <- m
        y <- f x
        g y

desugars to m >>= (\x -> f x >>= g).

Note that the following flat do block also desugars to m >>= (\x -> f x >>= g)

    x <- m
    y <- f x
    g y

Thus the content of the associativity law is that the first nested do block with the slightly different grouping is also equivalent to the flat block.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s