Monthly Archives: March 2009

The fallacy of high-level languages

There’s been a meme going around the open source community for a while now: that programming in C is somehow dirty, distasteful and, worst of all, inefficient compared to programming in a high-level language such as C# or Python.

Its detractors will tell you how much longer it takes to program anything in C.  They’ll point at how much C code it takes to do something as simple as creating a GObject sub-class compared to the equivalent in Python or C#.  They’ll also probably complain that everything in C has to be compiled first, which eats up even more time.

No argument would be complete without them pointing out that C has no standard types for strings, let alone linked lists, binary trees, associative arrays, etc., and that you have to spend all that time implementing your own.  They’ll probably make a point about how C’s static type system means that even if you have an array type, you need to know in advance what types you’re going to put into it and can’t just mix and match.

Don’t feel tempted at this point to counter with a discussion about how great and flexible pointers are.  You’ll receive a lashing about how they’re even more evil than people who talk in the theatre.  The rant about C’s problems with uninitialised memory, out-of-bounds pointer errors and segmentation faults is a timeless classic, especially when they get to the bit about how much time was lost debugging them.

And do you know what?

I simply do not agree with them.

I cannot think of a single project where the majority of my time was wasted writing GObject header files instead of the single line of Python I’d have needed.  I can think of lots where I’ve sat for hours trying to figure out which class I needed to derive from, or reworking the code after I realised I had derived from the wrong class to begin with.  The high-level language doesn’t make this any easier.

As to the number of projects where I’ve needed to write a linked list or hash table implementation because C lacked a convenient dynamic array or associative array type like Python’s?  If it takes you any time to write that kind of code, you’re doing it wrong.  I’ve spent far more time realising that the structure is a performance bottleneck, and planning a faster alternative on the whiteboard.  Neither language helps with this whiteboard time.

And all those pointer issues?  They come down to the tools you’re using.  If you’re writing in a language and not using its development environment properly, it’s little surprise that you’re not as efficient in it.  gdb, valgrind, gprof and gcov are your friends.  Use them well.  I’ve spent enough time dealing with other languages’ own pitfalls to believe that pointers are no more evil than, for example, monkey patching.

The vast majority of my time on any new project is spent thinking first of all, and on a more mature project it’s spent figuring out what I did wrong and how I need to rework it.

Yes, the next biggest use of my time is working out the best way to express that.  If I’m writing in C, that means deciding whether it needs a linked list, a hash table, or some other fancy structure.  But if I’m writing in Python, believe me, I spend just as much time normalising my class structure and coming up with all sorts of insane Pythonic ways of doing things.

If you’ve ever written in Perl and not lost a day or two to optimising your regular expressions, or eliminating code to arrive at the shortest possible expression, you’ve never written in Perl.

Refactoring takes you just as long in Python as it does in C.  Just because refactoring C code leaves you chasing segfaults doesn’t mean that refactoring Python code won’t leave you with class structures that no longer match.

Proponents of test-driven development, AGILE, LEAN and ANEMIC programming methodologies will probably argue that it’s easier to practise their religion with a high-level language.  I’m not buying that either; I’ve managed to write several large software projects in C that have a comprehensive test suite, including tests for allocation failures.

Ah, Rapid Application Development, I hear you say?  Well, the only people I’ve ever heard say how great RAD is are people who’ve never had to support the software that was written rapidly, or debug issues with it years later.  RAD is great when v2 is going to be a complete rewrite, and v3 a complete rewrite again.  Very few websites ship an upgrade without announcing a completely new codebase.

It’s certainly true that it’s faster to mash up some code in a high-level language.  I use shell scripts and bits of Perl for this kind of thing all the time, and I frequently even do basic mock-ups or essays in Python.  But ultimately it all tends to be throw-away code that I never really intend to take seriously or attempt to support later on.

For larger projects, I just don’t see any difference in the time it takes to write.

I’d like to cite an example.  The Git and Bzr revision control systems are about the same age; Git is written in C and Bzr in Python.  Writing the Python one hasn’t taken its developers any less time than writing the C one has taken theirs, and the Python one doesn’t have extraordinary features that the C one lacks.

C# fans would point out how much faster it is to write UI code.  Really?  Then why isn’t Banshee (written in C#) dazzlingly better than Rhythmbox (written in C)?  Sure, they’re different, but nothing there suggests one language is better than the other.

And do you know what?  I trust code written in C far more than I do code in any higher-level language.  No, that’s probably not fair.  I trust C programmers far more than I do programmers of other languages.  If you give me the choice between a program or library written in C and one written in Python or C#, I’ll take the C one every time.