Your Programming Language Benchmark is Wrong

Date: 2023-10-18 | create | benchmarks | software |

Programming language benchmarks like TechEmpower, Web Benchmarks, and the Benchmarks Game use standardized test scenarios to try to determine which programming language is fastest at each task. Many software engineers (myself included) reference these benchmarks when comparing and choosing languages for projects.

The problem with these benchmarks is that they're all missing a key ingredient for understanding the true speed of a programming language. To find out what it is, we need to answer a different, broader question:

Q: What is "speed" with respect to a programming language?

Performance

The obvious answer is performance: how fast the language can do operations (like math) or respond to web requests (the kind that power your favorite website). This is the approach most benchmarks take.

This is a valid answer and makes a lot of sense, as it's certainly a large factor you want to be aware of when choosing a language (at least to avoid the deal-breakingly slow ones). Plus, it's relatively easy (you'd think - it's actually quite hard) to implement "fair" tests and compare the results.

So it's unfair to say that these benchmarks are wrong or doing the wrong thing (though all benchmarks have asterisks) - they are doing a reasonable thing reasonably well. But it doesn't really mean that much in reality.

The reason these benchmarks don't hold up when theory meets reality is that the things they're measuring are rarely the bottleneck in today's software.

  • Horizontal and vertical scaling is easy in the cloud
  • Hot, slow paths can largely be resolved with caches and by moving heavy operations off the blocking path (see the sketch after this list)
  • Most software just needs to work and be "fast enough" (i.e. respond to users in <200ms). This is not that hard (up to a few million daily active users).
  • Not to mention that most of these test scenarios won't be comparable enough to what you're actually building to give statistically significant answers
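To make the second bullet concrete, here's a minimal sketch of that pattern: cache the hot read and push the heavy operation onto a background worker, so the blocking path stays fast regardless of which language you picked. The function names and the fake delays below are hypothetical, purely for illustration.

```python
# Minimal sketch (hypothetical names and delays): a cache plus a background
# worker keep the hot path fast without touching the language itself.
import functools
import queue
import threading
import time

@functools.lru_cache(maxsize=1024)
def slow_lookup(key: str) -> str:
    time.sleep(0.05)  # pretend this is a 50ms database / API call
    return f"value-for-{key}"

work_queue: "queue.Queue[str]" = queue.Queue()

def background_worker() -> None:
    # Heavy operations (emails, reports, image resizing, ...) run here,
    # off the request's blocking path.
    while True:
        job = work_queue.get()
        time.sleep(0.2)  # pretend this is an expensive operation
        print(f"finished heavy job: {job}")
        work_queue.task_done()

threading.Thread(target=background_worker, daemon=True).start()

def handle_request(key: str) -> str:
    # Cached read: the first call pays ~50ms, repeats are near-instant.
    value = slow_lookup(key)
    # Enqueue the heavy part instead of doing it inline.
    work_queue.put(f"post-process {key}")
    return value

print(handle_request("user-42"))  # ~50ms (cache miss)
print(handle_request("user-42"))  # ~0ms (cache hit)
```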

This means that these benchmarks are largely not that important outside of a few areas:

  • Avoiding lemons - some languages are so slow that you should avoid them entirely
  • Nanosecond-sensitive areas - some domains are heavily affected by tiny performance changes: big video games (Starfield, Cyberpunk), high-frequency trading (to capture those extra thousandths of a cent), and hard science / numeric computation (simulating black holes or decades of supply chains - the latter being what Fast F# does)

Why your benchmark is wrong

Okay so I've already laid out why I think your benchmark isn't that useful in reality. But I haven't explained why your benchmark is wrong yet.

The main reason is that it's missing the largest factor in software speed. Just as a Big O bound is dominated by the largest term, a benchmark that leaves out the largest factor in speed will give you the wrong answer.

When I think of the speed of a given programming language, I typically think of it in two buckets:

  • User perspective: How fast does your thing work?
    • Measured in milliseconds, seconds (worst case: hours for big data jobs)
  • Business / Builder perspective: How long does it take to build?
    • Measured in days, weeks, months (worst case: years for huge projects)
      • Build time: Amount of time to build the thing
      • Maintenance time: Amount of time to maintain the thing (adding / changing features, fixing bugs, keeping the lights on, etc) <- most time is here

We kind of have the User perspective down by measuring the speed of operations. Though even this is debatable: your bound will largely be how you architect the pieces together (hello, Big O), and most users think in terms of thresholds (it's either slow enough to notice or fast enough not to).
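As a rough illustration of the "architecture is your bound" point, here's a toy example (mine, not from any benchmark): the quadratic approach loses to the linear one by orders of magnitude once the input grows, a gap no language's constant-factor advantage will claw back.

```python
# Toy illustration (hypothetical example): algorithmic choice dwarfs
# language-level constant factors once n gets large.
import time

def has_duplicates_quadratic(items: list[int]) -> bool:
    # O(n^2): compare every pair.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicates_linear(items: list[int]) -> bool:
    # O(n): one pass with a set.
    seen: set[int] = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False

data = list(range(10_000))  # no duplicates: worst case for both

for fn in (has_duplicates_linear, has_duplicates_quadratic):
    start = time.perf_counter()
    fn(data)
    print(f"{fn.__name__}: {time.perf_counter() - start:.3f}s")
```

Roughly speaking, the linear version finishes in milliseconds while the quadratic one takes seconds - the kind of gap that swamps a 2-5x constant-factor difference between languages.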

But we're missing the Business / Builder perspective, which I'd posit is the larger factor and thus the bottleneck in terms of programming language speed. The User perspective is certainly important for the usability of your software, but the Business perspective is arguably the bigger one:

  • If you never get the thing built in the first place -> people can't use it
  • If your software is not reliable / keeps breaking -> people can't use it
  • The longer your software takes to build and keep the lights on -> more money spent by the business, less time for improving other things

A better benchmark

So what would the ideal benchmark take into account?

Ideally, it would cover each of these parts:

  • User Perspective: Performance on common e2e workflows
  • Build Perspective:
    • Build Time: Time to build the workflow
    • Maintenance Time: Time to maintain, evolve the workflow

This would give us a much better perspective of how programming languages stack up in reality, not just theory.
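If you wanted to put numbers on it, one (admittedly hand-wavy) shape for such a benchmark might look like the sketch below. The weights, field names, and sample figures are all hypothetical - the point is simply that build and maintenance time get folded into the same score as latency.

```python
# Hypothetical sketch of a combined "language speed" score.
# All weights and sample numbers are made up for illustration.
from dataclasses import dataclass

@dataclass
class LanguageResult:
    name: str
    p99_latency_ms: float             # User Perspective: common e2e workflow
    build_time_days: float            # Build Perspective: time to build it
    maintenance_hrs_per_month: float  # Build Perspective: time to keep it running

def combined_score(r: LanguageResult) -> float:
    # Every input is "lower is better", so a lower score is better overall.
    # The weights are arbitrary; a real benchmark would have to justify them.
    return (
        0.2 * r.p99_latency_ms
        + 0.3 * r.build_time_days
        + 0.5 * r.maintenance_hrs_per_month  # most time lives here
    )

results = [
    LanguageResult("Lang A", p99_latency_ms=40, build_time_days=30, maintenance_hrs_per_month=60),
    LanguageResult("Lang B", p99_latency_ms=120, build_time_days=12, maintenance_hrs_per_month=25),
]

for r in sorted(results, key=combined_score):
    print(f"{r.name}: {combined_score(r):.1f}")
```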

But this is quite hard to do. Each business / tool has its own workflows it cares about (a combinatorial explosion of User scenarios to cover), and no two engineering teams are the same, so the Build Time would likely vary a lot even if all other factors were held constant.

The best recommendation I've got for the Build Perspective is to survey companies (ones doing something similar to what you want to do) that have used different technologies, and try to pull patterns out of what they used, what they liked / disliked, and what they ultimately learned. The best pattern I've pulled out of this is that Static > Dynamic languages long-term.

Next

If you're interested in traditional User Perspective benchmarks, you might be interested in:

Want more like this?

The best / easiest way to support my work is by subscribing for future updates and sharing with your network.