CloudSeed: Improving Docker build times by ~14%

Date: 2023-10-04 | cloudseed | docker | fsharp | sveltekit | postgres |

CloudSeed is a project boilerplate for F# / SvelteKit. It aims to be a Simple Scalable System (3S) for developing web apps, making development across these technologies easy via lightweight local Docker and Docker Compose orchestration.

CloudSeed Architecture


This orchestration handles the run, build, destroy loop for local development:

  • Frontend - TypeScript / SvelteKit
  • Backend - F# / Giraffe
  • DB - Postgres

For more on CloudSeed including an overview on how it works, see: Up and Running with CloudSeed (F# / SvelteKit boilerplate)

The problem with setups like this is that every abstraction layer adds an additional layer of complexity. As developers we often feel this most acutely via the amount of time our builds take to complete as each layer often means an additional layer of dependencies to build and bundle.

I start most of my projects with CloudSeed these days so I feel this pain a lot - like every time I build and run. So I got to wondering:

Q: Can we make these Docker Compose builds faster?

Answer

By combining several micro-optimizations, I was able to improve cached builds from 8.9s -> 7.6s = 1.3s (~14%). Theoretically non-cached builds should also be significantly improved, but those times are bottlenecked by network and Docker image download speeds, which proved too variable to measure accurately.

  • Backend:
    • Dotnet 6 -> 7
      • Cached - 7.8s → 7.2s = 0.6s, ~8% improvement
  • Frontend:
    • Removing unnecessary build lines (rm -rf)
      • NoCache: 21.1s → 20.7s = 0.4s, ~2% improvement
  • DB:
    • Moving from 13.1 -> 15-slim
      • NoCache - 12.7 → 6.5s = 6.2s (48%)
      • Cache - 0.6s → 0.5s = 0.1s (17%)
  • All
    • Cache - 8.9s → 7.6s = 1.3s (14%)
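
Each percentage above is the time saved divided by the baseline time. Here is a minimal Python sketch of that arithmetic using the measurements from the list. The function name `improvement` is my own and not part of CloudSeed.

```python
def improvement(before_s: float, after_s: float) -> tuple[float, float]:
    """Return (seconds saved, percent saved relative to the baseline)."""
    saved = before_s - after_s
    return saved, saved / before_s * 100


# (before, after) in seconds, copied from the summary list above.
measurements = {
    "Backend, Cached": (7.8, 7.2),
    "Frontend, NoCache": (21.1, 20.7),
    "DB, NoCache": (12.7, 6.5),
    "DB, Cached": (0.6, 0.5),
    "All, Cached": (8.9, 7.6),
}

for name, (before, after) in measurements.items():
    saved, pct = improvement(before, after)
    print(f"{name}: {before}s -> {after}s = {saved:.1f}s ({pct:.1f}%)")
```

Running this gives values within about one percentage point of the rounded figures in the list.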

Benchmark Setup

Benchmarks were run on the base version of CloudSeed. For each change, I took baseline measurements, made the change, then measured again, logging the delta as the change's effect.

I took two types of measurements:

  • No Cache - Purge Docker data then build the target
  • Cache - Build the target image, make a sentinel change in source code to force a rebuild, then build the target again
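
The two measurement types could be scripted roughly as follows. This is a sketch, not the harness used for the numbers in this post. The purge command (`docker system prune --all --force`), the build command (`docker compose build`), and all function names are assumptions to adapt to your own setup. Careful: the purge command deletes unused Docker data on your machine.

```python
import subprocess
import time
from typing import Callable, Sequence

# Assumed commands, not taken from the post; adjust to your own project.
PURGE_CMD = ["docker", "system", "prune", "--all", "--force"]
BUILD_CMD = ["docker", "compose", "build"]


def run(cmd: Sequence[str]) -> None:
    """Run a command and raise if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def timed(step: Callable[[], None]) -> float:
    """Return the wall-clock seconds a step takes."""
    start = time.perf_counter()
    step()
    return time.perf_counter() - start


def measure_no_cache(runner: Callable[[Sequence[str]], None] = run) -> float:
    """No Cache: purge Docker data (untimed), then time the build."""
    runner(PURGE_CMD)
    return timed(lambda: runner(BUILD_CMD))


def measure_cache(
    touch_source: Callable[[], None],
    runner: Callable[[Sequence[str]], None] = run,
) -> float:
    """Cache: build once (untimed), make a sentinel change, then time the rebuild."""
    runner(BUILD_CMD)
    touch_source()  # e.g. append a comment to a source file
    return timed(lambda: runner(BUILD_CMD))
```

Passing the runner in as a parameter keeps the timing logic separate from Docker itself, so the harness can be dry-run with a stub before pointing it at a real build.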

Theoretically most dev time will be spent building from Cache (i.e. a previous build exists on your machine), so this is the more important measurement.

Note: This is small n anecdata. I took ~4 measurements for each change. This means that this data may not be stat sig though I hope it's still directionally interesting.
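
With only ~4 runs per change, one crude sanity check is to compare the median delta against the run-to-run spread. The sketch below is a heuristic, not a statistical test. The function names are my own and the sample timings are invented for illustration, not measurements from this post.

```python
from statistics import median


def summarize(samples: list[float]) -> tuple[float, float]:
    """Return (median, spread), where spread is max - min of the runs."""
    return median(samples), max(samples) - min(samples)


def looks_meaningful(before: list[float], after: list[float]) -> bool:
    """Crude check: is the median delta larger than the noisier run-to-run spread?"""
    before_med, before_spread = summarize(before)
    after_med, after_spread = summarize(after)
    return (before_med - after_med) > max(before_spread, after_spread)


# Hypothetical timings in seconds, invented for illustration only.
before_runs = [7.9, 7.7, 7.8, 7.8]
after_runs = [7.2, 7.3, 7.1, 7.2]
print(looks_meaningful(before_runs, after_runs))
```

If the delta is smaller than the spread, the change may still be real, but a handful of runs can't distinguish it from noise.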

Backend

The backend is built with F# running dotnet 6. Running this a few times resulted in:

  • NoCache: ~23.5s
  • Cache: ~7.8s

NoCache was heavily dependent on fetching metadata from the Docker repo and downloading the assets.

The Cached version was largely bounded by dotnet publish runtimes (making up ~95% of the build time) - which builds and packages the source for production use. My original dotnet Docker files were actually relatively efficient - placing the dependency check / download in its own layer which is easily cached.

The only thing I could think of to improve the performance of the dotnet publish command was to upgrade to the latest full release of dotnet - dotnet 7. That resulted in a median build change from ~7.8s -> 7.2s = 0.6s (~8%).

This upgrade was very smooth - it was a very simple change to make and didn't break any of the existing libraries / code while still presenting noticeable build time improvements. Kudos to the dotnet / lib maintainers for that.

For an idea of how the Backend works, check out: Build a simple F# web API with Giraffe

Frontend

The frontend is SvelteKit running on Node because SvelteKit is the best frontend framework I've found.

  • NoCache: ~21.1s
  • Cache: 4.9s

NoCache was again largely bounded by pulling metadata from Docker and downloading the base images.

Cache was largely bounded by npm run build, which is hard to get around when source changes are present, similar to Backend.

The only improvement I've found so far was removing some unnecessary build lines (rm -rf) that cleaned unused directories. These were largely cached anyway since they only ran when package.json changed (rarely), but removing them dropped a ~0.4s step. Because these NoCache builds are still largely bounded by network and Docker responsiveness, this at best leads to a ~2% improvement in NoCache builds.

For more on how I'm running SvelteKit on Node in Docker containers, see: Run SvelteKit with Node in Docker

DB

IME the best dev environments are those that authentically simulate their prod environment. This is because it allows for a tighter feedback / build loop, allowing you to see what works (and what doesn't) closer to dev time.

One major cause of prod / dev discrepancies is utilizing different data infrastructure. I've seen this cause real pain, where queries that work in dev don't work in prod and vice versa.

So CloudSeed aims to remove this entire class of problem by providing a built-in, fully-fledged DB for use in dev and testing. It ships with Postgres configured out of the box, but is easy to swap out with other DBs with Docker images (most mainstream ones).

  • NoCache: ~12.7s
  • Cache: ~0.6s

Unfortunately DBs can be rather large. Fortunately their images can largely be cached because you'll rarely change their configuration.

Here I moved from a full version of Postgres 13 to the slim version of Postgres 15, leading to a smaller image payload. This paid off largely in the NoCache scenario (bounded by network downloads), improving builds from 12.7s -> 6.5s = 6.2s (48%). In Cached land the change led to improvements from 0.6s -> 0.5s = 0.1s (17%), though this change is unlikely to be noticeable.

All

After making all of these individual changes, I decided to rerun the full builds to ensure everything worked together. Interestingly, the Cached scenario showed an improvement larger than the sum of the individual improvements: 8.9s -> 7.6s = 1.3s (14%).

I'm not entirely sure why this is the case, but my guess is it has something to do with lowering contention on memory and compute, if it's not just a non stat sig fluke /shrug.
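
To make the "larger than the sum" comparison concrete, here is the arithmetic as a small Python sketch using the Cached deltas reported above. The Frontend change is left out because only a NoCache delta is reported for it, so treating its Cached contribution as zero is an assumption.

```python
# Cached savings in seconds, taken from the sections above.
# Frontend is omitted: the post reports only a NoCache delta for it.
individual_cached_savings = {"Backend": 0.6, "DB": 0.1}
combined_cached_saving = 1.3  # 8.9s -> 7.6s for the full build

sum_of_parts = sum(individual_cached_savings.values())
unexplained = combined_cached_saving - sum_of_parts

print(f"sum of parts: {sum_of_parts:.1f}s")
print(f"combined:     {combined_cached_saving:.1f}s")
print(f"unexplained:  {unexplained:.1f}s")
```

The gap between the two is what the contention guess (or plain measurement noise) would need to account for.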

Next Steps

That's all I've got for now. Lmk if you have suggestions for additional improvements to these Docker builds!

If you're interested in building web apps with F# and SvelteKit, check out CloudSeed.
