Instagram's tech stack will surprise you
Date: 2023-03-22 | featured | tech-stack | technology |
I love building software and consume a lot of content talking about the best tech stacks, programming languages, and strategies to scale systems and businesses. I also spent 3 years working as a software engineer at Meta - building and scaling backends for Facebook and Instagram, some of the largest services in the world serving billions of users a day.
The problem is that Meta's tech stack breaks many of the "best practices" espoused in the technosphere. Yet it's a system that's scaled far beyond almost every other system on the planet. This discrepancy calls into question the veracity of these "best practice" strategies.
In this post I'm going to detail 3 surprising things about Meta's tech stack to serve as a counterexample to balance these "best practices".
Architecture
Claim: Monoliths don't scale
A lot of people think monoliths don't scale. For years microservices have been pushed as the solution to this problem and many orgs have undergone the often multi-year endeavor of migrating their architecture.
Data: Monoliths do scale
Yet Meta, one of the largest software systems on the planet, runs on monoliths.
Meta's backbone is estimated at hundreds of thousands to millions of servers, most of them running monoliths. These monoliths handle the main app logic of each platform, including Facebook and Instagram.
Meta / Facebook / Instagram run on Monoliths
Balance: Monoliths can scale
To be fair, Meta and its services don't run only on monoliths. Monoliths handle most of the core app / business logic, while heavy workloads get isolated into their own services.
I call this general architecture approach "service-based monoliths".
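To make the shape concrete, here's a minimal sketch of the idea in Python. It is not Meta's code: `Monolith`, `VideoEncoder`, and `FakeEncoderService` are hypothetical names I made up for illustration, and the "service" is an in-memory stand-in for what would really be a network call to a separately deployed system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol


class VideoEncoder(Protocol):
    """Boundary for a heavy workload that lives outside the monolith (hypothetical)."""

    def encode(self, video_id: str) -> str: ...


@dataclass
class FakeEncoderService:
    """Stand-in for a separate service, e.g. one written in a "faster" language.

    A real client would make a network call; this one only records what it was asked.
    """

    calls: List[str] = field(default_factory=list)

    def encode(self, video_id: str) -> str:
        self.calls.append(video_id)
        return f"encoded:{video_id}"


@dataclass
class Monolith:
    """Core app / business logic stays in one deployable; heavy work is delegated."""

    encoder: VideoEncoder
    posts: Dict[str, dict] = field(default_factory=dict)

    def create_post(self, post_id: str, author: str, video_id: Optional[str] = None) -> dict:
        # Ordinary business logic runs in-process inside the monolith.
        post = {"author": author, "video": None}
        if video_id is not None:
            # The heavy workload crosses the service boundary.
            post["video"] = self.encoder.encode(video_id)
        self.posts[post_id] = post
        return post


if __name__ == "__main__":
    encoder = FakeEncoderService()
    app = Monolith(encoder=encoder)
    app.create_post("p1", "hamy")                  # handled entirely by the monolith
    app.create_post("p2", "hamy", video_id="v42")  # encoding is handed to the service
    print(app.posts)
```

The point of the sketch is the boundary: most requests never leave the monolith, and only the expensive, specialized step sits behind an interface that a separate service can implement.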
For more, read: Software Monoliths for Scale
Programming Languages
Claim: "slow" languages don't scale
Software engineers love to optimize - it's a common path to waste via the build trap and premature optimization. One area of optimization that leads to constant bikeshedding is choosing the best, fastest, newest language. We literally have multiple surveys / polls every year on this very subject (see Stack Overflow Dev Survey 2022).
A common argument is that slower languages don't scale - commonly leveled against languages like JavaScript, Python, PHP, and Ruby.
Data: "slow" languages do scale
Well let's look at what languages run Meta / Facebook / Instagram - one of the largest software systems in the world.
- Meta / Facebook - PHP
- Instagram - Python / Django
Facebook and Instagram run on PHP and Python
Just like that we have a great counterexample showing these "slow" languages handling some of the heaviest workloads on the planet.
Balance: "slow" languages can scale
Now to be fair, Meta isn't just running PHP and Python. The PHP / Python that Meta does run also isn't necessarily something you can get off the shelf.
- Meta / Facebook - PHP -> Hack, a language Meta built as a fork of PHP (now an entirely different language). At one time it returned 2-10x perf improvements over mainstream PHP, though PHP has generally caught up to Hack as of PHP 7.
- Instagram - Python / Django -> forked versions of each. The forks were reported to yield ~10x improvements in some cases, though available benchmarks point to more like 1-3x.
Facebook and Instagram run on custom PHP and Python with faster languages mixed in
Heavy workloads that are specialized / intensive (think video encoding, ML workloads, big data) also don't make too much sense to run on these tech stacks. These workloads are often split off into specialized services built with "faster" languages like C++ and Rust.
Version Control
Claim: git is the tool for version control
Everyone uses git. It interoperates everywhere - GitHub, GitLab, Bitbucket, VS Code, etc.
Data: Meta doesn't use git for version control
Instead Meta uses Mercurial - a tool they moved to after facing scaling challenges with git.
I only use git via UI (exceptions for init, cloning a repo).
Version Control is ripe for disruption by simply moving it into the 2000s -> Make the 5 most common workflows easy to do / understand.
Meta got this right with SmartLog. Watching OSS version - https://t.co/YAM6fA6ffK https://t.co/Pj51OSPoc7
- Hamilton Greene (@SIRHAMY), November 16, 2022
Personally, after using Meta's Mercurial, I can say git sucks. It's painful to use, and basically everyone uses 5% of the thing while the other 95% just gets in the way.
Balance: git is the most popular VC tool
But just because I dislike git and think Meta's Mercurial is better doesn't mean I'd recommend it right now.
Popularity comes with a lot of benefits like interoperability, tooling, and support. Mercurial doesn't have that and it has a long road before the developer experience reaches that of git.
Also - it's unlikely your codebase will reach the kinds of scalability challenges Meta did so git will probably be fine for the entirety of your system's life.
I am keeping an eye on Meta's Sapling to see if Mercurial can gain some traction.
Conclusion
That's it - hopefully some ammunition to contradict some "best practices" when they simply don't make sense in the given situation.
If you're interested in the tech stack I use to launch Simple Scalable Systems, read: Up and Running with CloudSeed (F# / SvelteKit boilerplate).