Postgres Over Everything - Why You Should Probably Just Use Postgres for Your Next Web App
Date: 2024-09-18 | create | tech | postgres | webapps | mongo | redis | sqlite | oracledb | sqlserver |
I recently read Ethan McCue's Just Use Postgres and a lot of it resonated with me from my own experiences building apps and managing data at work and in side projects.
In this post I'm going to focus on a few reasons why Postgres should be your default DB for web apps and some strategies for how I like to use it that balance simplicity and scale.
Just Use Postgres
Overall my recommendation is to just use Postgres for most web applications. It's free, it's reliable, and it scales - more so than basically any other competitor.
- Why use Relational? (and not NoSQL like Mongo or DynamoDB)
- Why not alternatives like MySQL, MSSQL, SQLite?
- How I use Postgres
Why use Relational Databases?
The alternative to relational databases is basically NoSQL - things like document DBs (Mongo, Dynamo) or k/v stores (Redis, Valkey).
A lot of startups and companies created in the past decade have built on DocDBs (like Mongo) because they promised a more straightforward data experience - just dump your data as JSON and operate on it there. This works - until it doesn't.
Simplicity
- Simple - to save / read
- Not simple - to keep consistent, evolve over time
Scalability
- Scalable - To read / write via primary key (object id) and horizontally scale via sharding
- Not Scalable - To read / write against any other key (i.e. a scan), look at only parts of the document
Systemization
- Systemizable - Can just have JSON versioning, build your own migrations, do in-mem scans
- Not Systemizable - Hard to keep everything consistent, JSON versions must be kept migratable through the lifecycle of the data, migrations are often roll-your-own
I'm not saying that Mongo / DocDBs can't work but I am saying that they often bring with them edge case issues that there's not always a great way to resolve. They work quite well when you know what you're doing upfront but the hard part of software is that the requirements keep changing - which means often we don't know exactly what we want at the outset.
Relational databases give us tools to enforce the most important things about our core data:
- Data Integrity - If you can't trust your data, what's the point in having it? Things like data schemas, constraints, and relational keys help make sure that the data we're storing makes sense - preventing issues long-term. Moreover, relational DBs have tools to handle migrations out of the box - allowing you to plug into these battle-hardened tools vs rolling your own.
- Existing entity changed? Cool, update the constraints / add a new column / handle the migration.
- Data flexibility - Whether you like it or not, your data is related. If it wasn't related, it wouldn't be in the same DB. Taking this into account, and having the ability to enforce relations with constraints / foreign keys, improves data integrity but also allows you to remain flexible and performant as your data continues to change and grow.
- Have a new entity you want to track? Cool, create a table and relate it to existing stuff.
- Performance - Because our data is inherently related, it makes sense that performance on these relations is critical. DocDBs do this well when all the data is in a single doc and you can do lookups on the primary key. When it's not and you need to scan on other fields... it starts to break down. In almost all business cases you will eventually need to scan on non-primary-key fields for analytics, new features, or new rules, so making a reasonable decision now that works for the long term can protect you from inevitable rework later.
- Have a slow read path? Add an index / view / query hint.
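To make that concrete, here's a minimal sketch in Postgres SQL of the tools from the list above - schema constraints, a foreign key, and an index on a non-primary-key read path. The users / orders tables here are purely hypothetical examples, not a prescription:

```sql
-- Schema + constraints: the DB rejects data that doesn't make sense.
CREATE TABLE users (
    id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email      TEXT NOT NULL UNIQUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Foreign keys: relations are enforced, not just implied by convention.
CREATE TABLE orders (
    id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id     BIGINT NOT NULL REFERENCES users (id),
    total_cents INTEGER NOT NULL CHECK (total_cents >= 0),
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Slow read path on a non-primary-key field? Add an index.
CREATE INDEX orders_user_id_created_at_idx
    ON orders (user_id, created_at);
```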
I've seen these DocDB scaling issues first-hand at my current company, Rippling. We use MongoDB but we don't really use it as a DocDB. There is no one document per customer - there are simply too many models with too many teams involved to make that scalable. Instead we have new docs per entity we're tracking - which looks a lot like a normalized relational table structure. What this means is that we get basically none of the benefits of a DocDB (fast lookups on id) while taking on all of the downsides (scans / joins are slow, weak consistency constraints, roll-your-own migrations). There are certainly ways around this and my anecdata has a small N, but my guess is that many use cases follow this pattern.
The downside with these relational approaches is that this extra consistency may require a bit more footwork to set up - you'll often want to update the schema directly yourself, which is an extra step compared to just changing your model in code and saving it.
But really this step is required either way. If you're going DocDB, you can't "just change it in code" because you still have existing data written under the old schema sitting in the DB. So you either have to migrate it all now to make the data consistent or build infra to migrate it as you go - an extra step either way, it's just explicit in the relational model.
Personally, I find the gains in data integrity, data flexibility, and long-term performance are almost always worth it for the small extra cost in data schema maintenance. You only write sometimes but you read quite a lot. So take the extra time to do it right now so you don't have to redo it later.
Plus there are approaches to minimize the extra work required with relational schemas while still getting most of the benefits - we'll touch on these later.
The way I like to build my data models is:
- Core data model - Postgres. This gives me a solid core that is reliable, flexible, and performant long-term.
- Auxiliary data model - Mongo / Redis / etc. This holds extra data often as an external cache. This provides the improved lookup speed for things that don't need that extra consistency / flexibility since they are not the core.
Why Postgres vs other Relational options like MySQL, MSSQL, Oracle, SQLite?
The reason to use Postgres is that it is reliable, performant, and free. It's the most loved DB in developer surveys, has all the features you might ask for via extensions, regularly ranks in the top 5 DBs performance-wise for any given task, and you don't have to worry about a license that may come back to bite you later on.
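As a quick illustration of that extension story - these are real, widely used extensions, but which ones (if any) you actually need depends entirely on your app:

```sql
-- Extensions are enabled per-database with a single statement.
CREATE EXTENSION IF NOT EXISTS pg_trgm;     -- trigram indexes for fuzzy text search (ships with contrib)
CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; -- UUID generation helpers (ships with contrib)
CREATE EXTENSION IF NOT EXISTS postgis;     -- geospatial types and queries (installed separately)
```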
Comparing this to other options:
- SQLite - Great little DB, but it's constrained at scale. It's hard to scale horizontally, which means the single-server DB works well at small scale but struggles at large scale. (I expect SQLite to continue to improve over the next few years, so this is one to keep an eye on.)
- MySQL - Also a good "free" option. However, it falls behind Postgres in reliability, consistency, and performance, and it's owned by Oracle with a potentially scary license that could lead to audits and your data being taken hostage. Postgres is just better.
- MSSQL / SQL Server - By all accounts a good technology. But it's expensive and proprietary, and in all the benchmarks I've seen it falls behind Postgres. Why not just use Postgres for free, slash your costs, and avoid data hostages?
- Oracle - Oracle has a terrible track record on reliability and support, and its DB is not particularly performant either. The only reason I see people using Oracle is that the higher-ups got swindled into a deal and no technical person was involved in the decision. Don't use Oracle. There are far better options out there.
How I use Postgres DBs as a Simple Scalable System
I'm a big fan of Postgres - it's reliable, performant, and free. No other DB comes close when you compare the whole package. Other DBs will often be good competitors in a given use case, but Postgres wins out overall (and often beats others at their own game).
That said, I think the general sentiment against relational DBs is warranted - they can be a pain to work with if you don't SQL good, and there is a bit of boilerplate involved in translating your code model into SQL, which can slow down dev time.
So here I want to share an approach that I use with Postgres that tries to leverage it for the good parts while avoiding the known downsides of working with SQL.
The system is:
- Use columns for the important metadata I will constrain / index / scan on - IDs, createdAt, important identifying info
- Use a JSONB column for other properties
This basically gets the data flexibility / integrity / consistency benefits of Relational while still getting the dev benefits of DocDBs (not having to change the schema every time I add a field).
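As a concrete sketch of what such a hybrid table looks like (the events table and its columns are purely illustrative, and I'm assuming a users table already exists):

```sql
-- Hybrid model: hard columns for the fields I constrain / index / scan on,
-- a JSONB column for everything else.
CREATE TABLE events (
    id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id    BIGINT NOT NULL REFERENCES users (id),
    kind       TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    props      JSONB NOT NULL DEFAULT '{}'::jsonb
);

-- Index the columns I actually scan on...
CREATE INDEX events_user_id_kind_idx ON events (user_id, kind);

-- ...and optionally a GIN index if I query into the JSONB itself.
CREATE INDEX events_props_idx ON events USING GIN (props);

-- Ad-hoc properties are still queryable straight out of the blob:
SELECT id, props ->> 'campaign' AS campaign
FROM events
WHERE kind = 'signup';
```

The `->>` operator pulls a JSONB value out as text, so properties that haven't earned a real column yet are still reachable from plain SQL.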
As I build my systems, if a certain property turns out to be more important than I originally thought, I will migrate it out from the JSONB into its own column so we can enforce its integrity even better.
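Continuing the illustrative events example, that promotion is a mechanical migration - roughly something like this (a sketch, assuming every row actually has the property before the NOT NULL goes on):

```sql
-- Promote props->>'campaign' into a real column (illustrative migration).
ALTER TABLE events ADD COLUMN campaign TEXT;

-- Backfill from the JSONB blob.
UPDATE events SET campaign = props ->> 'campaign';

-- Optionally drop the now-duplicated key, then constrain / index the new column.
UPDATE events SET props = props - 'campaign';
ALTER TABLE events ALTER COLUMN campaign SET NOT NULL;  -- only if every row got a value
CREATE INDEX events_campaign_idx ON events (campaign);
```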
This is the core system. Then at hyper scale we can use auxiliary DBs like DocDBs or k/v stores as caching / read layers to remove load from our core DB. These are rarely necessary until high scale and can often be pretty small, servicing just a few specific read paths.
This gives us most of the benefits of relational while limiting its downsides - typically giving us a better overall experience than if we reached for just relational or non-relational.
Next
This is how I currently build my apps. JSON integrity is still a bit of a pain, requiring versioning and hand-rolled migrations, but at least I have a bit more flexibility / integrity on the properties that matter most.
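One small mitigation I'll note (a sketch, not a cure) is a CHECK constraint that at least forces every JSONB blob to carry a version key, so the hand-rolled migrations know what shape they're looking at. Again using the illustrative events table:

```sql
-- Require every JSONB blob to declare its schema version (illustrative).
ALTER TABLE events
    ADD CONSTRAINT events_props_versioned
    CHECK (props ? 'version');
```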
If you liked this post you might also like:
Want more like this?
The best / easiest way to support my work is by subscribing for future updates and sharing with your network.