A simpler approach to functional programming

I’ve spent a lot of time thinking about purely functional programming lately.  There’s a lot of exciting stuff going on in the domain lately, mostly thanks to Haskell, which is the only purely functional programming language anyone cares about nowadays.  There’s not necessarily anything wrong with this in and of itself but there’s a disturbing trend that naturally comes about as a result: tending to view Haskell as a “baseline” for type systems and programming language features.  /u/tactics describes this well:

The issue is that any type system will simultaneously be too weak to express some constraints and strong enough to keep you from getting your job done.
There’s a fallacy in presuming that because Haskell (or more generally, any language) has some feature, it should be considered the baseline.
Haskell is littered with good examples of its own limitations (as is any language). Need a version of liftM which will work with two parameters? No problem, we have liftM2. Need one that will work with 6? No can do. Pretty much any library that works with arbitrary tuple types suffers from the same problem that you can’t write code generically over n-tuples.
Similarly, Haskell has no straightforward, culturally-standardized way to prevent users from entering negative numbers into a function, and many basic functions (length, foldl) are partial, with no indication at the type level (and frankly, at the documentation level) that their inputs should not be infinite.

That’s one person’s words, with some specific examples.  I agree with the spirit of that comment which I would summarize simply as “Haskell hasn’t necessarily done everything right”.  Sometimes, people default to thinking that Haskell’s approach is “the right one” for purely functional programming languages, first of all because it’s the only purely functional programming language that is remotely mainstream, and secondly because there is so much that Haskell did right, but make no mistake: the conversation isn’t over because Haskell had a say.  (See the conversation about typeclasses vs. ML-style modules, lazy vs. eager evaluation, first-class records vs. whatever it is that Haskell has, etc.)  This fallacy isn’t good for functional programming in general, and it doesn’t only affect people who are already functional programmers.  I’ve heard people insist they dislike functional programming when, in fact, what they dislike is some particular choice that Haskell made which perplexed or frustrated them.  In this post I’m going to describe one of those choices and I’m going to describe an alternative which I, personally, think is undisputably better.  Here it is:


Somewhat predictably, I’m writing about IO here.  IO and functional programming have somewhat of a contentious relationship because, in a way, they’re sort of diametrically opposed: functional programming isn’t functional programming if it doesn’t make an effort to control and greatly limit side effects.  So figuring out how to make IO “work” with Haskell back in the 90’s was awfully confusing until someone proposed the IO type.  This has been explained to death in other posts so just google “monad burrito” if you don’t already know this, but here’s a quick summary: Haskell’s IO type describes IO actions in an opaque, abstract fashion.  You can string IO actions together by sequencing them.  The result of a Haskell program is a big IO action which is sort of like an abstract syntax tree which the Haskell runtime then interprets for you, thereby preserving referential transparency.  There’s an important advantage of this model in that it makes an essentially valuable distinction between things that have side effects and things that don’t.  As any Haskell programmer will tell you, C’s type system isn’t so amazing because (among many other reasons) every type signature tells you nothing:

int foo(int a, int b);

Who knows what this function does?  I don’t know.  It could do a “pure” computation on the two integers and return a new integer, but it could also do literally anything else, including looping forever, halting the program, dropping all the tables in your database, emailing embarrassing pictures to your mom, launching the missiles, nasal demons, etc.  The equivalent Haskell type signature

foo :: Int -> Int -> Int

severely limits the scope of what foo is allowed to do.  Otherwise, it’s

foo :: Int -> Int -> IO Int

… which reflects all the infinite power of computers.  However, there are also some important drawbacks to this design choice.  Here are a couple:

  • IO is Haskell’s so-called “sin bin”.  There’s one type in Haskell which is allowed to “do stuff”: IO.  Crucially, the language gives you no built-in way to distinguish between different kinds of side effects.  You’re either in pure, side effect-less code or you’re in IO land where anything is possible.  So the type IO () gives you no increased ability to reason about what the IO action does than the corresponding C function signature, which is unfortunate because that’s what purely functional programming is supposed to do for you, right?  Of course you can define your own types which have a more limited scope but this is always ad-hoc, and those eventually have to be hoisted up into IO which leads to nasty stuff like monad transformers and the like.
  • The interface to Haskell’s IO type is monadic.  Let’s not say much about this except that monads aren’t awfully intuitive, and for some reason you see a lot of head-bashing online with people trying to wrap their heads around IO in Haskell.  I refuse to believe that functional programming is itself unintuitive, so there must be some better approach from this respect.

None of this is to say that IO in Haskell is broken, or a disaster, but Haskell doesn’t have the final say.

Starting from scratch

Let’s start from scratch.  If we redesigned IO in a purely functional language from the ground up, how would we do it? Let’s take a basic principle of functional programming: you don’t change data, you perform transformations to make new data.  In Haskell, we have our putStrLn function as follows:

putStrLn :: String -> IO ()

Translating this along the core principle of functional programming, that functions should have no hidden inputs and outputs, we come up with something like this:

putStrLn :: (World, String) -> World

In this case, World is an abstract parameter which abstractly represents the “state of the world”: this function takes a world and a string and returns a new world where the string is printed to the screen. This is an abstraction which doesn’t match up to how things work in the “real world” but it matches up with how other data structures work in purely functional programming languages: nothing changes, but you get new, changed versions of old things. The corresponding getLine:

getLine :: IO String


getLine :: World -> (World, String)

The program that reads a line from stdin in Haskell and writes it out? Well, we’re in monad mode, so we need `>>=`:

getLine >>= putStrLn :: IO ()

However, this is just a function composition, isn’t it? In my opinion this is much more clearly expressed in that way:

putStrLn . getLine :: World -> World

This matches up to a more intuitive understanding of how these functions should work and resembles other languages more closely. Here’s another little code sample: let’s open a file and write a string to it.

writeMessage :: File -> File
writeMessage f = fPutStrLn (f, "Hi buddy!")

main :: World -> World
main world = let
(world', f) = open (world, "~/out.txt", ReadMode)
f' = writeMessage f
world'' = close (world', f')in world''

The type signatures here are

open :: (World, String, FileMode) -> File
close :: (World, File) -> World
fPutStrLn :: (File, String) -> File

In Haskell, the type signature of writeMessage would be

writeMessage :: Handle -> IO ()

The superiority of the File -> File type signature in the other model should be obvious for the simple reason that it’s limited; the type Handle -> IO () is unlimited in scope, while the type File -> File can only perform file operations on the input file, because changing any other state requires access to the World (which isn’t an input to the function!). Reasoning about programs suddenly becomes much easier because the types inform our reasoning about what a function is allowed to do, in a way that is enforced by the type system, and which is not necessarily any more complicated than Haskell’s type system already is.

This isn’t the whole story

This isn’t the whole story.  This proposed system isn’t sound as-is; in order to make this a sensible model, you need to supplement it with something called “uniqueness” or “linear types” (see the Wikipedia article), which Haskell doesn’t have.  Haskell doesn’t have those and it doesn’t plan to have those in the future as far as I know; I don’t have time to discuss that in-depth but linear types are really nifty in and of themselves, even outside the context of IO.  I can write about that later.

This isn’t a new, original idea.  There are other languages that have this type system feature already, most notably Clean, a research language which is pretty much dead in the water nowadays.  (Rust deserves a shout-out here for also having affine/uniqueness types, though it isn’t pure or functional.)  I don’t mean to insinuate that I came up with this myself, but it is another way to think about IO in pure functional languages.  I think it’s better.  Maybe you do too.

Reverse inheritance

Let’s talk about inheritance.  Now, let me be clear here: when I say “inheritance”, I’m not specifically talking about OO inheritance.  I’m talking about a facility, built into the language, for saying that one thing satisfies the interface of another type of thing, and has additional functionality on top of that.  Most modern languages have something that’s sort of like this.  Object-oriented languages have inheritance (either through the class hierarchy, interfaces, or mixins, or whatever), Haskell has superclasses, and so on.  You already know about inheritance and agree that it, in one of its many forms, is useful. But this post isn’t about that.

This post is about reverse inheritance.  As far as I know, no popular language supports reverse inheritance conveniently. (I’ve been wrong before — if it turns out I’m wrong, and some popular language does have this feature, please do point it out to me.)

What do I say when I mean “reverse inheritance”?  Reverse inheritance is the opposite of inheritance.  If one datatype A inherits from another datatype B, it means that A supports all the functionality of some B, and then possibly some more functionality.  If one datatype A reverse-inherits from another datatype B, it means that A supports some of the functionality of B, but not necessarily all of it.  Here’s an example in a made-up language:

fn frobulate(s: Socket)

In this example, s is a Socket and so it satisfies the Socket interface; i.e., it can do everything a Socket can do. There are a few things you can do with a socket: you can connect it to an address, or you can read from it or write to it, or you can close it. Since s can do everything a socket can do, the function frobulate can accordingly can do any of those things to the socket.

Now let’s try the same thing, but with reverse inheritance:

fn frobulate(s: Socket(read))

Now, s isn’t a Socket, or at least not necessarily. s reverse inherits from Socket, and the method read is reverse-inherited. s supports the read method of the Socket, but no other method.

Now, in the strongly- and statically-typed hypothetical language we’re talking about here, reverse-inheritance is checked at compile time, so it’s a compile error if any method other than read is called on s.

I don’t get it.

So, unlike traditional inheritance, it doesn’t add functionality to datatypes. It actively takes away functionality from datatypes. Since they’re opposites, the goals they accomplish are completely orthogonal. Inheritance tends to aid code reuse and abstraction, while reverse inheritance aids code readability and facilitates making static, verifiable guarantees about program behavior — which, maybe coincidentally, are not things that traditional inheritance tends to be good at.

Here’s what I mean. You see this type signature:

fn frobulate(s: Socket(read))

… And you know that no method besides read is called on s in that function. The socket doesn’t get written to, it doesn’t get closed, it doesn’t get bound to an address: it just gets read, and that’s it. This is very good for code readability: if a function makes good use of reverse-inheritance, then you don’t have to pore through a function’s source code, and all of the functions that function calls, in order to understand what concrete side effects a function has. Of course, this doesn’t just work for sockets. You can reverse inherit from a database driver to ensure at the type level that the connection is never closed, or that only SELECT queries are sent (in other words, you can guarantee that the method never changes anything in the database). Imagine taking your MVC app and using reverse-inheritance to statically ensure that your program is structured correctly, because you can ensure the model, view, and controller only call methods that they each should be calling. All of this also works to make sure you’re not changing the semantics of your program when you refactor, too.

So does this mean I would have to annotate the parameters of every function I write with all the methods that function calls?

Not necessarily. If your function makes use of the entire interface of a datatype, or almost its entire interface, then it’s probably not worth it to specify exactly which behaviors are used. If you’re unsure whether the implementation of a function will change, then you might want not want to reverse-inherit just in case you end up calling other methods on those parameters in the future. The question you should be asking is: “does it make sense to guarantee that this function will only ever use a small subset of the behaviors provided by an interface?” Does it make sense to limit the scope of what a function can do at the type level? If so, then reverse inheritance could be an elegant way to capture that requirement.

Of course, it doesn’t matter that much, because it’s a hypothetical language feature. OO languages only support inheritance going one way, and there’s no real clean way to say “this object supports methods X, Y, and Z of this other object, but none of the others”. Haskell, in spite of its supreme type system, also does not make this easy. In both Java and Haskell, if you have 10 operations on a datatype A, but you want the type system to guarantee you only made use of one particular operation, you have to define a new wrapper datatype around A that only supports that operation — pretty messy, and not very convenient or obvious. Reverse inheritance is more direct and immediate.

Here’s the funny part: of all the statically typed languages I know, none support true first-class reverse inheritance, but one makes it markedly easier than all others. What is this language and its type system of kings?

Believe it or not, it’s Go. In Go, structs don’t have their interfaces declared, as they’re dynamically inferred according to whatever methods end up being defined on the struct. So if you define an interface I which happens to “reverse-inherit” from another interface J, then any type which implements J automatically implements I. This is not generally the case for other languages, which often require you to specify all of the interfaces a datatype satisfies exhaustively. So Go happens to provide a better reverse-inheritance story than all of the other statically typed languages I know, which is pretty neat. That makes it easier for them to reap the rewards I detailed above — increased readability and ability to statically reason about code behavior and correctness — but given the kind of people that tend to be drawn to a language like Go, I’m sure they’re not using this accidental feature for that purpose very much.

There has to be another way: my entry in the FP vs. OO debate

The other day I read an article.  This is my response to that article.

As articles go it was mostly fine.  As most articles do it made a few nice points and a few points that I wasn’t awfully convinced by.  (If you ask me the increasing acceptance of lambdas, or anonymous functions, in languages that are otherwise “purely” object-oriented is by no means indicative of a failure on the part of the OO philosophy, especially when you consider that closures and objects are basically two sides of the same coin, and that in many languages [see Java] lambdas are really just syntactic sugar over the instantiation of an object which might otherwise be more cumbersome.  I agree though that composition is better in almost all practical respects than inheritance.)

Having said all of that, what I’m writing here won’t really be about that.  It’s going to be about a section in that article that I find somewhat more troublesome, entitled “the multicore revolution”.  The crux of it is that object-oriented programming has a certain focus on mutable state which leads to big problems in a multi-threaded context.  An uncontroversial opinion, this one.  Threading is hard, threading with shared memory is really hard, and threading with shared memory with mutability is both harder and less fulfilling than the Japanese version of Super Mario Bros. 2.  I don’t aim to disagree with this and I don’t think anyone else does either.

What I disagree with is the conclusion this observation led the author to.  Quoth the article:

On the other hand, forbidding mutable state and other such side effects without changing anything else is crippling. You need a whole new bag of tricks. Persistent data structures, so you don’t have to copy the state of your entire application for every little update.

Mkay, right with you.  We need some help outside of the traditional OO toolset.

Standard higher order functions such as map, fold, and filter, so you have an alternative to most of your for and while loops without resorting to the recursive nuclear option. Algebraic data types, so you can have functions that fail without resorting to the harder to tame exceptions. Monads, so you can chain such computations without having to write annoying chains of if-else expressions…

Oh, my god.  What just happened?  I was just trying to read this article about parallel programming and then Simon Peyton-Jones threw up all over my monitor.  The prose just turned into a list of Haskell’s features.  The m-word was spoken.

See, the article falls into the same sort of trap that I’ve seen hundreds of articles about functional programming fall into.  It’s the Haskell Trap.

The Haskell Trap: the tendency of some to take it for granted that FP is the opposite of, and the only alternative to, OO, and that Haskell in particular is exactly equivalent to FP.

You can find more examples.  One that comes to mind immediately is this one.  Now, Erik Meijer is a super-smart guy.  He is far more experienced and knowledgeable on this topic than I am and will ever be, at least for a very long time.  But I’m not convinced by this article that Erik didn’t fall into the Haskell Trap either.  He brings up a slew of legitimate qualms about mutable state and OO (they make compiler optimizations more difficult to do, complicated constructor logic can have surprising effects, and so on and so forth) before arriving at a somewhat startling conclusion:

Instead, developers should seriously consider a completely fundamentalist option as well: embrace pure lazy functional programming with all effects explicitly surfaced in the type system using monads.

Ack.  Is there really no other choice?  Do I have to program functionally?  Why on earth must everything be lazy; didn’t we agree that wasn’t the most sensible default?  And why are you forcing me to think about monads without my consent?  I should make it known here that there are other solutions to implementing side effects in a pure functional language.  Uniqueness types are one of them, and they’re actually really intuitive and neat.  Meijer dismisses them without any real consideration, saying you need to have a “Ph.D in theoretical computer science” to understand them.  No one tell Meijer that Haskell and its monads have the very same reputation or he might be really upset.

Anyway, the Haskell Trap itself is sort of beside the point.  Here’s the point: while these criticisms of OO and of unchecked mutable state in general are valid and Haskell provides an adequate solution to many of those problems, Haskell shouldn’t be taken as the be all, end all of sensible computation.  Haskell isn’t the only solution.  It can’t be.  There has to be another way.

Coming out

Okay, so, um, this brings me to a somewhat uncomfortable part in the essay.  I knew this was coming but it’s just… kind of impossible to prepare for, you know?  I want to thank all of you for being so supportive of me.  So, um, it’s time for me to, uh, come out.

I don’t like Haskell.

… there, I said it.  That’s it.  It’s out in the open now and there’s nothing I can do about it.  Thanks again for bearing with me.

So, yeah, I don’t like Haskell.  There it is.  It’s just not my cup of tea.  There are a lot of reasons for it; I don’t want to get too much into it right now.  Basically the IO type seems like a really clumsy solution to the problem of state for me, and monads in general rub me the wrong way. do blocks really bum me out, and maybe most importantly the things that the Haskell community seems to value (abstraction, burritos) are not the things that I personally find interesting in programming and the features that “unconventional” programming languages can provide.

But it’s not just that.  Honestly I don’t like functional programming that much, either.  There are definitely some advantages it confers and it makes the expression of certain ideas really natural and easy, but that’s by no means a universal thing for me.  If I’m trying to express an operation on a recursive data structure that isn’t just a simple map, filter, or fold, things can get kind of messy.  Often I have to contort my ideas to fit the model and express things in a recursive fashion that doesn’t correspond well to my intuitive understanding of the problem.  To define a function I often have to locally define a tail recursive helper function with confusing parameters.  Another quote from someone smarter than me:

But when it comes down to it, I am an object-oriented programmer, and Haskell is not an object-oriented language. Yes, you can do object-oriented style if you want to, just like you can do objects in C. But it’s just too painful. I want to write object.methodName, not ModuleName.objectTypeMethodName object. I want to be able to write lots of small classes that encapsulate complex functionality in simple interfaces – without having to place each one in a whole separate module and ending up with thousands of source files.

I would agree with all that.  (By the way.  Don’t try to tell me I just need more experience with functional programming.  I’ve got plenty; my first language was a functional one.)

But here’s the interesting thing: I kind of just realized this.  I used to think I loved functional programming, languages like Standard ML.  But the more I used these languages, the more I realized how far off I was.  I don’t love Standard ML itself; in fact, at its worst, I find it cumbersome and unintuitive.  But there are things about Standard ML I do love.  Algebraic data types, the focus on immutability and persistence.  Pattern matching.  The strong type system.  The simple, sensible semantics.  But here’s the thing:  none of that stuff that I love has anything to do with functional programming.  They could happily exist in a language that isn’t functional, or purely functional, or almost purely functional, or whatever.  You could lift up all of those features and plop them in a language with more traditionally imperative semantics and everything would be peachy.

Same with Haskell.  A language can be “object-oriented”, or procedural or C-like or whatever, while still having a strong type system that captures effects, informs the programmer about intent and behavior, and allows the compiler to make certain optimizations.  This feature doesn’t have to be specific to functional languages, and in fact it explicitly is not tied to functional languages.

You know what?  I get it.  My criticisms of Haskell are shallow.  These are lame reasons to dislike a language, on the whole.  Don’t get me wrong, I respect Haskell.  I respect Haskell like I respect PepsiCo (a company which has achieved admirable success despite being wrong-headed about some things in my opinion, particularly because I view them as inferior to a certain other soda manufacturer).  Its advantages are obvious and great, it is staunchly principled, and undoubtedly mind-bending.  But it’s not the kind of language I want.  Now, I could learn to get around the fact that I don’t much care for Haskell and learn to write idiomatic Haskell code in a rather efficient manner (in the same way I did with Python) but that’s just the thing: I shouldn’t have to.  Haskell isn’t the only solution to this problem.  There has to be another way, because if there isn’t I’m going to be seriously bummed.

What does the alternative look like?  I’m not really sure yet.  I’ve been thinking about it.  There’s an effect system, a capability system.  You should be able to glean the behavior of a function from its type.  It should be a compile time error if the code tries to have an effect that it doesn’t have the capability to; you should be able to pass capabilities through method calls and, in a multithreaded context, between threads.  Compiler optimizations should understand the effects and how they interact and be able to make judgments about what transformations are legal in a sensible way given the effects associated with each function.  Effects should interact and interplay in a sane way (e.g. if there’s a mutable reference type, then any number of “read” capabilities can exist, since any number of threads can read a reference at the same time, but no other “read” capabilities can exist if a function has a “write” capability, since you don’t want anyone else reading a reference while you’re writing to it).  Effects and capabilities should be extensible so that users can declare their own effects and have the compiler treat them correctly.

All of this is really abstract; I’m limited by a very small number of existing implementations and a comparatively small amount of existing research (all of which looks pretty dense).  But it’s just an idea; people will have others.  That’s the point: Haskell’s only one way of many to do this.

A preprocessor hack to implement “with” blocks in pure C

Alright, everyone, get out your pencils: it’s time for a quick test on the C language. Don’t worry if you didn’t study; the test will be rather painless, and you should be able to follow along even if you don’t really know C that well. Here it is:

Johnny is trying to design a function to be integrated into the codebase of an evil genius. The function must meet the following specifications:

  1. The function takes two parameters, an unsigned integer password and a string fname.
  2. The function tests whether password is equal to the Number of Evil (NOE), 69381U.
  3. If the test succeeds (if password is equal to NOE), then the function must write the secret evil plans to a file named fname.
  4. If the password doesn’t match, then we have an intruder, and the function should write a log message warning of the event to the standard output.

Here is his function.

void write_evil_plans(unsigned password, char *fname)
    if (password == 69381U) {
        FILE *f = fopen(fname, "w");
        fprintf(f, "You have successfully been authenticated. The evil plans are as follows: ...");
    } else { 
        printf("There has been an intruder. Please make a note of it.\n");

What are all the problems with the function Johnny wrote?


Alright, pencils down. There are two big problems with this function. You probably caught the first one: Johnny never closed his file! Now, whenever that function is called and the password is verified, a new file handle is created, and it’s never going to be closed. This is a classic example of a memory leak: we allocate a resource and lose the pointer to it so that freeing it becomes impossible. God help you if you call Johnny’s write_evil_plans more than a few times over the course of the runtime of the program. This problem is fixed easily enough, though; all we have to do is throw a fclose(f); right after the call to fprintf.

The second problem is more subtle: this program is unsafe in that it’s likely to cause a runtime crash. If fopen fails, then it will return a null pointer. Once it does, the call to fprintf will dereference the null pointer. BOOM! Instant undefined behavior. Johnny will have to restructure the function a bit in order to fix this problem: he’ll have to test whether f == NULL, execute the fprintf and the fclose only if that test fails, and take an appropriate action in case there was an I/O error.

These two issues — memory leaks and unchecked null pointers — are among the most pervasive problems in pure C code, and they happen for basically the same reason: writing robust, memory-safe C code is hard, and programmers are likely to fail to get it right. Anytime you call malloc, or you call a function that calls malloc you have to make sure to check for NULL. You then have to make sure you free that pointer somewhere. While these checks aren’t technically hard to implement, they’re easy to forget about if you fail to structure your code correctly. Moreover, C programs with memory errors are generally hard to debug, so these errors can be hard to spot until something catastrophic happens.

Other languages have different approaches to dealing with memory allocation robustly. Scripting languages eliminate memory leaks by using garbage collection. In Lua, for example, you can open and close files just as in C, but unclosed files are closed automatically once their file handles are garbage collected. C++ isn’t garbage-collected, but C++ code generally makes heavy use of the RAII model, which puts cleanup code in the destructors of objects, and therefore ensures that cleanup code will be executed deterministically when objects go out of scope.

In the general case, we can’t easily adapt either of those solutions to the C language. C has no facilities for garbage collection built-in, and there is no way to trigger automatic cleanup for objects. However, there is an idiom found in high-level programming languages that we can port over to C with a little preprocessor magic.

“With” to the rescue

In Python, you have the with statement, which allows the programmer to tie resource management to the scope of a variable:

with open(fname, 'w') as f:
    f.write('You have successfully been authenticated. The evil plans are as follows: ...')

After the write call, the file f is automatically closed. While we can’t capture the more powerful RAII idiom in C, we can simulate with statements like this one in C. Thanks to C’s powerful preprocessor and scoping support, we can capture all logic relating to resource-management and error-checking in a single macro. Here’s a with macro that works on files:

#define WITH_FILE(varname, fname, options, block, error) do {         \
    FILE *const varname = fopen(fname, options);                      \
    if (varname) { block; fclose(varname); } else { error; }          \
  } while (0)

Simple, eh? The 5 arguments are:

  1. The name of the variable we want to create.
  2. The name of the file (as a string) that we want to open.
  3. The options string we want to pass to fopen.
  4. The block that we want to execute with the file.
  5. The block that we want to execute if the file open operation fails.

Here’s how we would use that macro in Johnny’s program above:

void write_evil_plans(unsigned password, char *fname)
    if (password == 69381U) 
        WITH_FILE(f, fname, "w", {
            fprintf(f, "You have successfully been authenticated...");
          }, { 
            fprintf(stderr, "There was an I/O error.\n");
        printf("There has been an intruder. Please make a note of it.\n");

In this case, if the fopen operation succeeds, the first block gets executed (that is, the evil plan is written to the user’s file), and if it doesn’t, the second block gets executed (that is, the error message is written to stderr). The macro gives us a few interesting advantages:

  • The macro eliminates a lot of boilerplate (the FILE* declaration, the NULL check, and the calls to fopen and fclose).
  • Because the macro takes 5 arguments, there will be a compile error if you try to omit the error-checking block. Of course, if you don’t want to take any special action in case of an I/O error, you can just throw in an empty block as the argument.
  • The macro behaves sensibly; the declared variable will only exist within the blocks you pass to the function-like macro, and will not be accessible elsewhere. Calls to WITH_FILE can be nested without a problem, even if the declared variables have clashing names.
  • Thanks to the structure of the macro being a do-while loop, you can write WITH_FILE(...); anywhere you would be able to put a normal statement. Of course, the conditional check of the loop is easy for compilers to optimize out.
  • Because the declared variable is const, the programmer can’t edit the contents of the variable. This means the call to fclose is completely safe.

This approach can be applied in other contexts as well, of course. We can, for example, write a with wrapper to malloc that allocates and frees an array of objects automatically:

#define WITH_ALLOC(varname, type, count, block, error) do {           \
    type *const varname = malloc(count * sizeof(type));               \
    if (varname) { block; free(varname); } else { error; }            \
  } while (0)

Here, the type parameter is the type of the object we want to allocate, and count is the number of consecutive objects we want to allocate. For example, if type is int, count is 300, and varname is a, then a will be declared as an int*, and it will be allocated to point to an array of 300 ints before running the block. This macro has the added advantage that it makes it impossible for us to give the wrong argument to malloc, which will prevent a particularly infuriating kind of memory error. Here’s an example usage:

  WITH_ALLOC(buffer, char, 256, { // buffer is a char*                                    
      WITH_FILE(file, "hello.txt", "r", { // file is a FILE*                            
          fgets(buffer, 256, file);
          printf("%s", buffer);
        }, {/* do nothing if I/O error */});
    }, {/* do nothing if malloc error */});

So, under the right circumstances (namely, you need to allocate a resource at the same time a variable goes into scope, and you want to free that resource as soon as the variable goes out of scope), using a macro like this might help you write robust and safe code. Of course, it’s questionable whether you should actually use this in a production environment — people disagree about whether it’s alright to use macros at all, and if you accidentally mix up the order of the arguments to these macros the result will probably be a confounding syntax error — but wrapping your resource allocations in this way can help make your pure C code a bit safer.

Why sorting an array makes a Python loop faster

This morning I ran across a blog post that presented two Python scripts, one of which was much more efficient than the other. The blog post has since been deleted so I can’t link to it, but the scripts basically boil down to this:

import time
a = [i for i in range(1000000)]
sum = 0
t1 = time.time()
for i in a:
    sum = sum + i
t2 = time.time()
print t2-t1
import time
from random import shuffle
a = [i for i in range(1000000)]
sum = 0
t1 = time.time()
for i in a:
    sum = sum + i
t2 = time.time()
print t2-t1

As you can see, the two scripts have entirely identical behavior. They both generate a list of the first one million integers and print the time it takes to compute their sum. The only difference is that slow.py first randomizes the order of the integers. Although it may seem surprising, it seems that this randomization is enough to slow the program significantly. On my machine running Python 2.7.3, fast.py is consistently a tenth of a second quicker than slow.py (this is a nontrivial speedup given that fast.py takes about a quarter of a second total to run). Go ahead and try it out yourself. (I haven’t tested it on Python 3, but the results shouldn’t be too different.)

So why does randomizing the list’s elements result in such a significant slowdown? The author of the original blog post chalked it up to “branch prediction”. If you’re not familiar with the term, check out this StackOverflow question, which explains the concept very well. (My suspicion is that the author of this blog post encountered this page or a similar one and applied the idea to a Python snippet where it didn’t quite fit.)

Of course, I was doubtful that branch prediction could actually be the culprit here. There are no top-level conditional branches in the Python code, and it stands to reason that both scripts take precisely the same branches within the body of the loop. No part of the program is conditional on the integers themselves, and there is no data dependency between each of the list’s elements. Of course, I’m still uncertain that Python is anywhere near “close to the metal” enough that CPU-level branch prediction should even be a factor in the performance analysis of a Python script. Python is way too high-level for that.

So if it’s not branch prediction, just why is slow.py so slow? After a little bit of research, I think I’ve figured it out, but not before making a few “false starts”. The answer requires a little familiarity with the internals of the Python virtual machine.

False start: lists and generators

My first thought was that Python was able to deal with the sorted list [i for i in range(1000000)] much more efficiently than the randomized list. That is, maybe it was substituting the list for a generator like this one:

def numbers():
    i = 0
    while i < 1000000:
        yield i
        i += 1

That might be a bit more time-efficient, I thought. After all, if Python internally uses a generator in place of a bonafide list, then it doesn’t have to go to the trouble of keeping all of the integers in memory at once, which might save us a lot of overhead. The randomized list in slow.py can’t be easily captured by a simple generator, so the VM wouldn’t be able to make any such optimization.

Unfortunately, this wasn’t a useful observation. If you squeeze an a.sort() in between the shuffle and the loop in slow.py, it becomes as fast as fast.py again. Clearly there is something about having these numbers sorted that makes the program faster.

False start: lists vs. arrays

My next thought that it might be the data structure giving us caching issues. a is a “list”, which (understandably) led me to believe that a was implemented under-the-hood as a linked list. If the shuffle operation statefully randomizes the nodes of the linked list, then fast.py may be able to take advantage of cache locality by allocating all of its linked-list nodes adjacently, while slow.py will get a lot of cache misses because each linked list node will reference another node that is not likely to be on the same cache line.

… Unfortunately, that didn’t get me anywhere, either. Python’s list objects aren’t linked lists but are arrays in the truest sense of the word. In particular, this is the C structure definition of a Python list object:

typedef struct {
  PyObject **ob_item;
  Py_ssize_t allocated;
} PyListObject;

… in other words, ob_item is a pointer to an array of pointers to PyObjects, and allocated is the size of the array we’ve allocated. So that’s not helpful in solving this problem, either (though it did soothe a few of my uncertainties about the algorithmic complexity of some list operations in Python: YES, list appends are O(1), YES, accessing an arbitrary element of a list is O(1), etc.). Now, I’m just trying to figure out why Guido chose to call them “lists” rather than “arrays”, which is what they are.

The solution: integral objects

An array’s elements are all adjacent in memory, so the data structure isn’t giving us cache locality problems. It turns out that cache locality is the problem that makes slow.py so slow, but it comes from an unexpected place. In Python, integers are heap-allocated objects rather than simple values. In particular, this is what an integer object looks like in the virtual machine:

typedef struct {
  long ob_ival;
} PyIntObject;

The only “interesting” element of this structure is ob_ival (which is the value of the integer as a C long). If it seems wasteful to you to make an entire heap-allocated object just for an integer, you’re probably right. A lot of languages make optimizations to prevent doing exactly this. For example, Matz’s Ruby Interpreter generally stores objects as pointers, but makes an exception for integers since they’re used so frequently. In short, the MRI crams Fixnums into the same space as an object reference and tags the least significant bit to let you know it’s an integer rather than a pointer (on all modern systems, malloc always returns memory addresses that are aligned to some power of 2). At that point, you just have to do a right shift to get the value of the integer — no heap allocation or redirection needed. slow.py and fast.py would have been equally speedy (and it’s likely they both would have been faster) if CPython had made a similar optimization.

So how does CPython deal with integers, then? What is the behavior of the interpreter that’s giving us so much trouble? The Python interpreter allocates integers in “blocks” of about 40 at a time. When Python needs to create a new integer object, it carves out the next available space in the current integer “block” and stores the integer there. Since our code allocates one million integers in a row, adjacent integers will (for the most part) be placed adjacently in memory. Therefore, iterating through the first million integers in order exhibits good cache locality, and iterating through the first million integers in a random order will result in frequent cache misses.

So, the answer to the question “why does sorting this array make this code faster” is that it doesn’t. The non-shuffled array is faster to iterate through since we access the integer objects in the same order that they were allocated (and they do have to be allocated).

Edit, September 4: In the original article, I included a bit about how many CPU instructions it takes to print out an integer in Python. The figure was a bit misleading so I removed it. Don’t go looking for it, it’s gone now.