This post is my attempt to come to an understanding of monads as they are relevant to the Haskell programming language.
Haskell’s use of monads may be the first unreasonably-difficult-seeming aspect of the language which users will encounter. There is no shortage of monad tutorials online; I suspect that the only way to learn about monads is to write such a tutorial oneself.
But Haskell is purely functional at heart, and some kind of trick must be used to perform basic imperative-style tasks, including IO, error handling, and mutable state. The monad is Haskell’s mathematically elegant solution to imperative-style programming.
Leading Haskell researcher Simon Peyton Jones has cheekily claimed that Haskell’s organization around monads makes it the “world’s finest imperative programming language”. I won’t comment on this claim, but I will suggest that monads make Haskell the world’s most fascinating imperative programming language.
The focus of this post is the mathematics of monads, though there are plenty of code snippets along the way. I avoid the somewhat complex and opaque IO monad in this post, in favor of exploring simpler monads (especially List and Maybe) more completely.
1. Functors
Recall that a category consists of a class of objects which come with sets of morphisms between them, abstracting the idea of functions between sets. A morphism f from X to Y (written f : X -> Y) can be composed with a morphism g : Y -> Z to obtain a morphism g . f : X -> Z. Furthermore, each object X has a (necessarily unique) identity morphism which, when composed (on either side) with another morphism (to or from X) yields that same morphism.
Examples of categories commonly studied in mathematics are:

- Set, in which the objects are sets, and the morphisms are functions
- Grp, of groups with group homomorphisms
- Top, of topological spaces with continuous functions

The above examples are all based on the paradigm “sets with additional structure that is preserved by certain types of functions”; for a rather different example, any preorder (reflexive, transitive relation) on a set can be made a category. One simply declares that there is a unique morphism from x to y if and only if x ≤ y. Transitivity guarantees morphism composability; reflexivity guarantees the existence of identity morphisms.
In applying the categorical viewpoint to Haskell, we are typically interested in (subcategories of) Hask, whose objects are Haskell types and whose morphisms are the (definable in Haskell) functions from one type to another. Hask is of course seeded with some base types such as Integer and Char, and from the following (nonexhaustive) list of closure properties, we see that Hask is quite rich:
- If a and b are types, then so is (a,b), the product type consisting of ordered pairs. The functions (morphisms) fst :: (a,b) -> a and snd :: (a,b) -> b select the first and second elements of the pair, respectively, and these morphisms make (a,b) a categorical product of a and b.
- If a and b are types, then so is a -> b, the type of functions from a to b. Then a -> b is an exponential object in Hask; the obvious operation taking an a -> b and applying it to an a to get a b is Haskell’s infix application operator $ (whose main use in Haskell code is actually just to reduce the number of parentheses the programmer must type).
- Hask is also closed under “disjoint unions”, a.k.a. coproducts, via the type Either a b, which admits the morphisms Left :: a -> Either a b and Right :: b -> Either a b.
- If a is a type, then so is [a], the type of lists of a.
The categorical viewpoint is helpful for identifying patterns in Haskell programs and abstracting their usage into common, reusable frameworks.
Consider the higher-order function map, which takes a function f of type a -> b and a list of type [a], and generates a list of type [b] by applying f to every a in the original list. That is, map has type (a -> b) -> [a] -> [b].
Haskell automatically curries functions and supports partial application, meaning that map can either be thought of as taking an a -> b and an [a] to a [b], or, more precisely, map takes an a -> b to an [a] -> [b].
This means that the type constructor [] (pronounced “List”) is a functor from Hask to itself. We’ll explain what this means, then give a few more examples from Hask, then move to monads in the next section.
Recall that a (covariant) functor F from a category C to a category D associates to every object X of C an object F(X) of D, and associates to every morphism f : X -> Y in C a morphism F(f) : F(X) -> F(Y) in D in a “respectful way”. Explicitly, F:

- takes the identity morphism of X in C to the identity morphism of F(X) in D, and
- preserves composition, F(g . f) = F(g) . F(f).
So to check that map makes List a functor on Hask, one simply observes that

- map applied to id :: a -> a equals id :: [a] -> [a], and
- map (f . g) equals map f . map g, where . is Haskell’s infix notation for function composition; both functions apply g and then f to every element of a list.
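These two observations can be spot-checked on sample data (the names checkIdentity and checkComposition below are mine, invented for illustration; a few examples of course don't constitute a proof):

```haskell
-- Spot-checking the two functor laws for map on a sample list.
checkIdentity :: Bool
checkIdentity = map id [1, 2, 3] == id [1, 2, 3 :: Int]

checkComposition :: Bool
checkComposition =
  map ((+1) . (*2)) [1, 2, 3] == (map (+1) . map (*2)) [1, 2, 3 :: Int]
```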
But now, by taking the categorical viewpoint, we see that there are type constructors other than List for which analogues of map can be naturally defined. The simplest example is probably Maybe, defined as
data Maybe a = Nothing | Just a
which is often used to represent either failure (Nothing), or a success with a particular value (such as Just 10). The obvious analogue of map is
maybeMap :: (a -> b) -> Maybe a -> Maybe b
maybeMap f (Just x) = Just (f x)
maybeMap f Nothing = Nothing
which takes the value in Just x “out of its box”, applies f, and “puts it back in the box”. It’s easy to check that maybeMap takes id :: a -> a to id :: Maybe a -> Maybe a and preserves function composition, so Maybe is (somewhat trivially) a functor.
For a less trivial example, one can consider the Hom functors from mathematics. For a fixed object x in a category C, the mapping of an object a to the set of morphisms Hom(x, a) is a functor. A morphism g : a -> b in C induces a morphism on Hom sets by composition, Hom(x, a) -> Hom(x, b), sending f to g . f.
In Haskell terminology, ((->) x) (note: the parens around -> indicate that (->) is to be read as a prefix operator) is a functor via
homMap :: (a -> b) -> (x -> a) -> (x -> b)
homMap g f = g . f
Or more briefly, homMap = (.)
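As a quick illustration (the name toString is my own, and the block repeats the definition of homMap so it stands alone), post-composing with show lifts an Int-valued function to a String-valued one:

```haskell
-- The hom functor ((->) x) in action: homMap post-composes g onto f.
homMap :: (a -> b) -> (x -> a) -> (x -> b)
homMap g f = g . f

-- Lift show :: Int -> String over the function (+1) :: Int -> Int:
-- first add one, then render as a string.
toString :: Int -> String
toString = homMap show ((+1) :: Int -> Int)
```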
In any case, Haskell allows users to define their own functors, provided they specify the appropriate analogue of map. The Functor typeclass (Haskell’s typeclasses are somewhat analogous to Java’s interfaces) declares a single function
class Functor f where
  fmap :: (a -> b) -> f a -> f b
And Maybe is declared an instance of Functor like this:
instance Functor Maybe where
  fmap _ Nothing = Nothing
  fmap f (Just a) = Just (f a)
Note that one can’t algorithmically enforce the “functor laws”, so it’s up to the user to define their functors sensibly.
Basically any container class (e.g. data structures such as trees) can be made into a functor; Either a (with one type parameter free) is also a functor.
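For instance, here is a sketch of a homemade binary tree made into a Functor (the Tree type and its constructors are my own, not from any library):

```haskell
-- A container type: binary trees with data at the internal nodes.
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Show, Eq)

-- fmap applies the function at every node, preserving the tree's shape.
instance Functor Tree where
  fmap _ Leaf = Leaf
  fmap f (Node l x r) = Node (fmap f l) (f x) (fmap f r)
```

For example, fmap (*2) (Node Leaf 1 (Node Leaf 2 Leaf)) yields Node Leaf 2 (Node Leaf 4 Leaf), and the functor laws can be verified by induction on the tree.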
2. Monads
In Haskell, monads are used to sequence steps of a computation while carrying around some extra “context”. Before jumping into the real definitions, we give some fuzzy examples:

- Maybe is a monad, where the “context” of a computation done in Maybe is that it could fail at any step and give Nothing; thus Maybe is suitable for primitive error handling. Actually, Either String (with one parameter free) is also a monad, and more suitable for error handling, as Nothing is less informative than Left "invalid argument to function foo" or the like. We stick to Maybe in these notes for simplicity, but the use of Either String is almost identical.
- List is a monad, representing nondeterministic computations in which every branch must be searched out. In fact, Haskell’s list comprehensions are just syntactic sugar for monadic-style code, as we’ll see in subsection 2.4.
- The State monad facilitates the passing of a “mutable state” from one step in a computation to the next. I won’t further discuss State in this post (it’s long enough as is!), but keep in mind that there are several ways to do this without using monads.
- And of course there’s IO, in which one leaves “pure” computation behind to interact with an “impure” world. While IO code can be written in the same style as other monads, IO “values” are made opaque to the programmer (specifically: no data constructors are exported to pattern match against), so that one cannot easily get back out of the IO context: once a computation has been “tainted” with IO, purity can never be recovered! (The same is not true of the other monads mentioned; e.g. one can escape the nondeterministic context of List with a deterministic function like length or null or head.)
2.1. Monads as functors
In category theory, a monad is a functor T from a category C to itself (i.e. an endofunctor) which comes with two natural transformations η : 1 -> T and μ : T∘T -> T satisfying two “coherence conditions” describing their interaction. That η and μ are natural transformations means in part that for each object X in C, there are morphisms η_X : X -> T(X) and μ_X : T(T(X)) -> T(X). Of course, these collections of morphisms are required to obey certain “naturality conditions” guaranteeing that they play nicely with the other morphisms in C.
Haskell’s monads are defined slightly differently (equivalently, but reorganized for a different focus), which is why I don’t want to dwell on the above, at least not until subsection 2.5 on the monad laws. But I do want to show why a functor equipped with η (Haskell’s return) and μ (Haskell’s join) is an appropriate way to view computations with “context”.
If we view Haskell functors as “containers” that can be mapped over (like List and Maybe), then return takes a piece of data and wraps it in a container, and join takes a doubly wrapped piece of data and “flattens” the wrapper into one layer. Here’s how this flattening and wrapping would play out for the Maybe functor:
maybeJoin :: Maybe (Maybe a) -> Maybe a
maybeJoin Nothing = Nothing
maybeJoin (Just Nothing) = Nothing
maybeJoin (Just (Just x)) = Just x

maybeReturn :: a -> Maybe a
maybeReturn = Just
A computation can use Maybe in the context of primitive error handling, in which several subcomputations must be performed in sequence, any one of which could fail, causing the overall computation to fail.
An example adapted from O'Sullivan, Stewart, and Goerzen's Real World Haskell involves composing lookup operations, using the result of one as the key for the next. Suppose you've got three Data.Map.Map objects (Haskell's analogues of hash tables, but implemented as purely functional self-balancing binary search trees), respectively associating people's names to phone numbers, phone numbers to service providers, and service providers to billing addresses, and you'd like a function which looks up the billing address of a person's service provider, failing gracefully if any single lookup fails.
The type signature of lookup is

Data.Map.lookup :: Ord k => k -> Data.Map.Map k a -> Maybe a
and as you'd expect, it returns Nothing when given a key not in the map; otherwise it returns Just the value associated with the given key. (Note: keys are required to be orderable only because it doesn't otherwise make sense to put them in binary search trees.)
So we have a sequence of functions whose input values are "pure", e.g. a String such as "John Doe" (okay, and a Map), but whose output values are "impure", e.g. Just 1234567890 if that is John Doe's number, and we need to chain them together.
The first lookup goes off without a hitch, because the input "John Doe" is "pure". For the second, we have to use fmap to "get inside" the container and apply lookup. But then we have a type problem. If the phone number 1234567890 associates to the string "TMobil", then the result of the second lookup is the awkward Just (Just "TMobil"); or if 1234567890 has no associated carrier, then the result is Just Nothing.
Either way, we use join (or rather maybeJoin) to flatten the result to prepare it for the next fmap. Roughly speaking, the sequence looks something like this:

"John Doe"  --lookup-->  Just 1234567890  --fmap lookup-->  Just (Just "TMobil")  --join-->  Just "TMobil"  --fmap lookup-->  ...
For maximal symmetry, one could even start the sequence with maybeReturn "John Doe", so that every lookup occurs as the argument to fmap, followed by a join.
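The chain above can be written out as real code; below is a minimal sketch in which the maps and their contents are invented for illustration (only the first two lookups are shown):

```haskell
import qualified Data.Map as M

maybeJoin :: Maybe (Maybe a) -> Maybe a
maybeJoin Nothing         = Nothing
maybeJoin (Just Nothing)  = Nothing
maybeJoin (Just (Just x)) = Just x

-- Toy data, made up for this example.
phoneMap :: M.Map String Integer
phoneMap = M.fromList [("John Doe", 1234567890)]

providerMap :: M.Map Integer String
providerMap = M.fromList [(1234567890, "TMobil")]

-- lookup, then fmap lookup, then join -- exactly the diagram above.
provider :: Maybe String
provider = maybeJoin (fmap (\p -> M.lookup p providerMap)
                           (M.lookup "John Doe" phoneMap))
```

Looking up a name that isn't in phoneMap, or a phone number with no carrier, makes the whole chain produce Nothing.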
2.2. From join to bind
Continually writing fmap and join is unwieldy. This is solved in Haskell by the following:

- If the main purpose of monads in Haskell is to use fmap and join one after the other, they might as well be combined into the single operation (>>=), pronounced "bind". Haskell's monads actually take (>>=) to be the primitive operation (along with return), understanding that both join and fmap can be derived. The (>>=) formulation has the benefit of brevity.
- Haskell further provides syntactic sugar via the do keyword to make monadic code more succinct and more similar to the imperative-style code that it often emulates.
The first point will be developed in this subsection; the second is the topic of the next subsection.
The Monad typeclass is defined as follows, but ignore the last two lines for now ((>>) has a default implementation in terms of (>>=), and fail by default calls error, which is something we'll want to avoid):
class Monad m where
  (>>=) :: m a -> (a -> m b) -> m b
  return :: a -> m a
  (>>) :: m a -> m b -> m b
  fail :: String -> m a
Of course, just like the "functor laws", there are also laws that proper monads ought to satisfy, analogous to the requirements in the category-theoretic definition. We'll postpone this discussion until we've introduced do.
The above definition does not require that monads be functors, but as noted, fmap can be derived from return and (>>=). Just arguing from type signatures, one can see that the correct definition of fmap :: (a -> b) -> m a -> m b must be
fmap f = (>>= return . f)
Rather than introduce a name conflict, Haskell calls the above function liftM instead of fmap, though it does follow from the monad laws that the functor laws hold for liftM. It might have been better for Haskell to require monads to be functors in the first place.
Similarly arguing from type signatures, one can see that the correct definition of join :: m (m a) -> m a must be
join = (>>= id)
Here's the Monad instance declaration for Maybe (eliding the definitions of (>>) and fail); it's equivalent to the maybeJoin and maybeReturn above.
instance Monad Maybe where
  (Just x) >>= k = k x
  Nothing >>= _ = Nothing
  return = Just
Before ending this subsection, I want to briefly note that the additional structure provided by (>>=) allows one to "lift" functions of arbitrary arities into the monad, e.g.

liftM2 :: (Monad m) => (a1 -> a2 -> b) -> m a1 -> m a2 -> m b

and higher, whereas functors only have fmap to lift functions of one variable. Constants, viewed as zero-variable functions, are even lifted via return! You can read more about lifting here.
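For instance, liftM2 can be derived from (>>=) and return alone; here is a sketch under the name myLiftM2 (my own name, to avoid clashing with Control.Monad's version):

```haskell
-- Lifting a two-argument function into any monad using only bind and return.
myLiftM2 :: Monad m => (a1 -> a2 -> b) -> m a1 -> m a2 -> m b
myLiftM2 f m1 m2 =
  m1 >>= \x1 ->
  m2 >>= \x2 ->
  return (f x1 x2)
```

In the Maybe monad, myLiftM2 (+) (Just 2) (Just 3) is Just 5, while myLiftM2 (+) Nothing (Just 3) is Nothing, since the first computation fails.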
2.3. Imperative "do" syntax
We introduce do syntax (an alternative to explicitly writing out (>>=)) by returning to the repeated lookup example from RWH.

I'll leave out the particulars of the type signature, just noting that its final return type must correspond to Maybe Address (leaving the representations of addresses and other data as implementation details). The function we want is:
findBillingAddress name phoneMap providerMap addressMap = do
  phone <- Data.Map.lookup name phoneMap
  provider <- Data.Map.lookup phone providerMap
  Data.Map.lookup provider addressMap
This looks a lot like imperative code, with two intermediate lookups bound to the names phone and provider before the third. In the monadic context, this binding happens, unsurprisingly, via (>>=) (and note that (>>=) is based on function application, which is how names are bound in the lambda calculus). That is, the above definition is syntactic sugar for the longer, messier
findBillingAddress name phoneMap providerMap addressMap =
  Data.Map.lookup name phoneMap
    >>=
  (\phone -> Data.Map.lookup phone providerMap
    >>=
  (\provider -> Data.Map.lookup provider addressMap))
The explicit transformation rules used here are:

- do { v <- m; ... } becomes m >>= (\v -> do { ... }), where m is an expression with monadic type.
- In the base case, do { m } becomes m.
It may be worth noting that there are two other transformation rules for do:

- do { let { v = x }; ... } becomes let { v = x } in do { ... }, where { v = x } is meant to stand for one or more bindings of "pure" values to variable names.
- do { m; ... } becomes m >>= (\_ -> do { ... }).
Note the similarity between the first and final rules. The only difference is that the actual value of m in the last rule is ignored in the subsequent (>>=). This rule corresponds to the common practice in imperative programming of evaluating expressions for their side effects only, for instance calling ++i without binding its value to another variable in the C programming language.
In Haskell's Maybe monad, this last form can be used as a guard: if m evaluates to Nothing, then the whole do block evaluates to Nothing; otherwise evaluation of the do block proceeds as normal.
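A tiny sketch of this pattern (halveEven is an invented example): the unbound if-line either succeeds trivially or aborts the whole block.

```haskell
-- Using an unbound expression as a guard in the Maybe monad:
-- succeed with n divided by 2 only when n is even.
halveEven :: Int -> Maybe Int
halveEven n = do
  if even n then Just () else Nothing  -- not bound to a name
  return (n `div` 2)
```

So halveEven 4 is Just 2, while halveEven 3 is Nothing.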
Actually, the last rule, as I've stated it, is not quite correct. The real rule is that do { m; ... } becomes m >> do { ... }, where (>>) has the default implementation

m >> k = m >>= \_ -> k

The implementation can be overridden for user-defined monads, but one should only do this for efficiency reasons: one should not change the functional behavior of (>>).
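The default (>>) can be seen directly in the Maybe monad, where the first computation is run only for its "effect" of succeeding or failing (the names kept and killed are mine):

```haskell
-- (>>) discards the first result but keeps its context.
kept :: Maybe Int
kept = Just 1 >> Just 2      -- Just 1 succeeds, so we get Just 2

killed :: Maybe Int
killed = Nothing >> Just 2   -- the Nothing propagates
```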
2.4. Lists and MonadPlus
List was our first example of a functor; it is also quite a nice example of a monad. It's fairly easy to figure out what join and return have to be, just from their type signatures:
listJoin :: [[a]] -> [a]
listJoin = concat

listReturn :: a -> [a]
listReturn x = [x]
The idea of course is that concat appropriately "flattens" a list of lists, and the creation of singleton lists is the most obvious way to "wrap" data in a list container.
Then we also know what (>>=) must be, in terms of join and fmap (or in the case of List, in terms of map):
listBind :: [a] -> (a -> [b]) -> [b]
listBind lst f = concatMap f lst
where the built-in concatMap does exactly what you would think in terms of concat and map.
Indeed, the Monad instance declaration (eliding (>>) and fail as before) is
instance Monad [] where
  m >>= k = foldr ((++) . k) [] m
  return x = [x]
which amounts to the same thing.
As a first example of a "nondeterministic computation" done in the List monad, consider the following very simple function, which takes the Cartesian product of two lists:
cartesianProduct :: [a] -> [b] -> [(a,b)]
cartesianProduct xs ys = do
  x <- xs
  y <- ys
  return (x,y)
The monadic version above works by considering the function which, for each x, maps the function (\y -> [(x,y)]) over ys and flattens the result, then maps that function over xs and flattens the result.
There are, of course, other ways of writing cartesianProduct, one of which is via list comprehension:

cartesianProduct xs ys = [(x,y) | x <- xs, y <- ys]
This looks very similar to the do version above! In fact list comprehension is only syntactic sugar for the monadic code, and apparently the original version of Haskell supported comprehension syntax for arbitrary monads.
I'll also note that a very terse point-free definition of cartesianProduct is possible via lifting:

cartesianProduct = liftM2 (,)
But there's one ingredient in list comprehensions which I haven't mentioned, and that's the use of boolean guards to filter the results. As a contrived example, suppose I wanted a "checkered product", which would only accept lists of integers and only output those tuples with even sum:

checkeredProduct :: Integral a => [a] -> [a] -> [(a,a)]
checkeredProduct xs ys = [(x,y) | x <- xs, y <- ys, even (x+y)]
For instance, checkeredProduct [1..8] [1..8] could represent the list of black squares on a chessboard. When I claim that list comprehension is syntactic sugar for monadic code, I have to say how such guards are translated to do blocks. Well, one can rewrite checkeredProduct like so:
checkeredProduct xs ys = do
  x <- xs
  y <- ys
  if even (x+y) then [()] else []
  return (x,y)
The point is that if the test even (x+y) evaluates to True, then the computation continues, mapping over a list of length 1 whose element () we otherwise ignore (cf. the fourth transformation rule for do, treating expressions not bound to names via <-). If the test evaluates to False, we map over a list of length 0, killing (that branch of) the computation.
This solution feels ad hoc, but it's not the end of the story. I've mentioned that something analogous can be done in the Maybe monad, using an expression, without bothering to bind it to a name, to determine whether the computation should continue normally or yield Nothing right away.
If I wish to check a boolean condition test in the middle of a Maybe block in this way, I could insert a line like the following:

if test then Just () else Nothing
The similarity of the above line to the list version

if test then [()] else []

was of course noticed by the authors of Haskell. In fact, with mzero standing for [] in the List context and Nothing in the Maybe context, both lines follow the pattern

if test then return () else mzero
The relevant abstraction here is the MonadPlus typeclass, which must come with a "zero object" satisfying the "computation-killing" condition that mzero >>= f is mzero for any f. And then the function guard :: MonadPlus m => Bool -> m () does exactly what we want in the if test ... line above.
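With guard from Control.Monad, checkeredProduct can be rewritten so the condition reads uniformly in any MonadPlus (this block repeats the function so it stands alone):

```haskell
import Control.Monad (guard)

-- checkeredProduct via guard instead of a bare if-then-else line.
checkeredProduct :: Integral a => [a] -> [a] -> [(a, a)]
checkeredProduct xs ys = do
  x <- xs
  y <- ys
  guard (even (x + y))   -- kills the branches with odd sum
  return (x, y)
```

For example, checkeredProduct [1,2] [1,2] yields [(1,1),(2,2)].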
(There's actually some dispute about what exactly should go into a MonadPlus, i.e. whether mzero is enough, or whether it should additionally have monoid structure, that is, an associative binary operation mplus treating mzero as the identity element. List and Maybe are the only instances of MonadPlus declared by default, and they do satisfy the additional monoid axioms.)
I want to conclude the subsection by sharing one slightly nontrivial piece of code in the List monad, as well as its (almost direct!) translation to Python, an exemplary imperative language. What follows is a solution to the n queens puzzle, in which one must find all possible placements of n queens onto an n-by-n chessboard such that no queen can attack another.

In the example below, a placement of queens will be given as a permutation of the range [1..n], with the understanding that if j is the i-th element of the permutation, then there is a queen at row i, column j (clearly no two queens can occupy the same row).
The solution below maps over all "partial solutions" (defined recursively) and all possible placements of the next queen, guarding against any conflicts (queens in the same column or on the same diagonal). Thus a do block felt perfectly suited to this problem, whereas I don't like using long lines (such as the complex guard condition, which could incidentally be rewritten with its own do block) inside list comprehensions.

(Extremely terse solutions are also possible at a higher level of abstraction; this code does the same thing via monadic folding in essentially three lines.)
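A solution along the lines described can be sketched as follows (the names queens, solve, and safe are my own, not necessarily those of the original code):

```haskell
import Control.Monad (guard)

-- n queens in the List monad.  A (partial) solution is a list of column
-- indices; the i-th entry is the column of the queen in row i.
queens :: Int -> [[Int]]
queens n = solve n
  where
    solve 0 = [[]]                 -- one empty placement of zero queens
    solve k = do
      partial <- solve (k - 1)     -- all placements in the first k-1 rows
      col     <- [1 .. n]          -- all columns for the next queen
      guard (safe col partial)     -- no shared column or diagonal
      return (partial ++ [col])
    safe col partial = all ok (zip [1 ..] partial)
      where
        row = length partial + 1   -- the new queen's row
        ok (r, c) = c /= col && abs (col - c) /= row - r
```

For instance, queens 4 finds the two solutions to the 4-queens puzzle.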
For the clear connection to imperative programming: Python admits a very close translation of the above code, thanks to the yield keyword's ability to build generators inside a for loop.
The only real hiccup in the translation is the unfortunate nesting in lines 5-8; thus list (or generator) comprehension style might have been more appropriate in Python. (Or maybe not: for reasons I don't understand, the Google style guide forbids the use of comprehensions to flatten deeply nested code.) Alternately, two levels of indentation could have been eliminated by

- defining solve_first_rows outside of n_queens, and
- explicitly raising the StopIteration exception in the first if branch, so that the else: on line 5 and its indentation would be redundant.
2.5. Monad laws
Finally, I want to go over the monad laws; specifically what they mean and why they are reasonable laws.
In previous sections, I've avoided using a symbol for function equality/equivalence, because the relation isn't computable, and Haskell raises a type error if you try to compare two functions for equality with (==). In this subsection I'll use the triple-equals symbol (===) as if it were valid Haskell.
As we've noted before, there are corresponding laws for monads as they are usually defined in category theory (with join and fmap instead of (>>=)). That such laws are equivalent to Haskell's monad laws is an amusing exercise in symbol pushing, but it takes us farther afield than I want to go here; I've uploaded a proof to Google Drive.
Here are Haskell's monad laws:

Left identity: return a >>= f === f a
Right identity: m >>= return === m
Associativity: (m >>= f) >>= g === m >>= (\x -> f x >>= g)
We'll translate these laws into do blocks. First, recall the relevant rules for desugaring do into (>>=):

do { v <- m; ... } === m >>= (\v -> do { ... })
do { m } === m
Then left identity says that

(do
  v <- return a
  f v) === f a

because the do block desugars into (something equivalent to) return a >>= f. Meanwhile right identity says that

(do
  v <- m
  return v) === m
In do notation, both identities look pretty natural. In some sense, the two identity laws just say that return neither adds nor removes any information; it just faithfully wraps it in some kind of container (I believe the mathematically rigorous notion in the background is that of adjoint functors, but I'm getting out of my depth here).
Consider the List monad, where we (correctly) guessed that return x should equal [x], based on the necessary type signature a -> [a]. Of course there are other functions with this signature, for instance

return1 x = []
return2 x = [x,x]
return3 x = repeat x
(Actually, repeat is a particularly interesting function in that it returns an infinite list with one element repeated indefinitely. Haskell's lazy semantics evaluates the elements of a list only "by need", so it has no problem with infinite lists; Python's generators are somewhat analogous.)
But none of these alternatives to return feels like the "right" way to wrap a piece of data. In fact, all three fail both identity laws.
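As a concrete sketch, the alternative return1 already breaks both identity laws in the List monad (the names leftIdentity and rightIdentity are mine):

```haskell
-- return1 wraps nothing at all, so binding through it loses information.
return1 :: a -> [a]
return1 _ = []

f :: Int -> [Int]
f x = [x, x + 1]

leftIdentity, rightIdentity :: Bool
leftIdentity  = (return1 3 >>= f) == f 3        -- fails: [] vs [3,4]
rightIdentity = ([1, 2] >>= return1) == [1, 2]  -- fails: [] vs [1,2]
```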
The third law is associativity. It's only slightly harder to come up with do blocks which desugar to the two sides of the associativity identity:
do
  y <- do
    x <- m
    f x
  g y

desugars to (m >>= f) >>= g, and

do
  x <- m
  do
    y <- f x
    g y

desugars to m >>= (\x -> f x >>= g).
Note that the following flat do block also desugars to m >>= (\x -> f x >>= g):

do
  x <- m
  y <- f x
  g y

Thus the content of the associativity law is that the first nested do block, with its slightly different grouping, is also equivalent to the flat block.