How fast can we sort 40k elements in TypeScript?

Date: 2023-07-05 | create | tech | typescript | performance |

I've been doing a lot of software engineering interviews recently as I enter the tail end of my 7 month entrepreneurial / 3 month sabbatical journey.

In one hypothetical scenario I was tasked with creating a system to return the top 10 results for a given corpus of 40k elements whose values would change as data streamed in. On further investigation of the problem space, it was revealed that these values would only be read to power some near real time dashboards by at most 5 people at a time.

Constraints:

  • Reads: Max of 5 concurrent readers
  • Latency: Updates ~1 min
  • Load: Data streams constantly, likely < 10 /s
  • Memory: No memory bounds (can hold all data in memory easily)
  • Data Storage: No access to data store -> want you to solve in memory

I proposed that given the relatively small load on the service, low rates of reads to writes, and the speed of most modern sw / hw we can probably just sort on read for now. My guesstimate was that we could sort 40k elements in < 2s which fulfills these constraints.

Pros:

  • Simple - easy to see it's correct
  • Scalable - For current load (even up to 100s of users) this is plenty and we could always throw a cache w TTL of 1 minute in front to scale further
  • Cheap - Finish this task fast, if more constraints pop up we can optimize for those

Of course we likely would think twice before pushing this to production but given the hypothetical constraints this seemed plenty.

The problem is while I intuitively believed that we could easily handle sorting a 40k list in < 2s in memory I didn't really have hard data proving this. So I came away thinking this was probably a workable solution but potentially this assumption was false.

Q: How fast can we sort a list of 40,000 elements in TypeScript?

For this question I used TypeScript so for the rest of this post we're going to dive into how fast we can actually sort 40k elements (and other large list sizes / data types) to get some data we can reference in the future.

Results: Sorting 40k elements in TypeScript

In general, we can sort lists of size 40k in less than 0.5s. The speeds seem to depend a bit on the data types but median sorts trend below 0.2s.

Median time to sort 40k elements by data type:

  • Numbers: 150-270ms
  • Strings: 22ms
  • Object.Number: 31ms
  • Object.String: 104ms

Benchmark Setup

Every benchmark has its own shortcomings so here I'm going to describe the general approach for later reproducibility / criticism.

Various Scenarios

This is a best-effort experiment but I realize that only checking the sorts of one type isn't that useful in the real world. Instead, looking at several different "simple" types is more applicable for basing decisions on.

Here I thought of some common scenarios you might try to sort on:

  • number list
  • string list
  • object.number list (an object with a number property)
  • object.string list (an objects with a string property)

There are obvs way more types you could sort on but I think this is a good 80/20 to start with and hopefully these benchmarks are simple enough that you could add additional ones to test your own custom scenarios.

Uniform, Reproducible Hardware

A lot of simple benchmarks like these are run on the user's computer. I don't love this because there's a lot of cardinality in personal computer setups which could have large effects on performance.

I opted for Replit as it provided a simple interface to run adhoc code in a small, default environment that theoretically anyone else could use to reproduce this for free.

Replit Basic (free) instance details:

  • Template: Official TS from Replit (running Node I believe)
  • 0.5 vCPU
  • 512MB RAM
  • 1GB Storage - Unused

Benchmarking code

I opted to write my own simple benchmarking code mostly cause I wanted to and also cause I figured it'd be simpler to see how things work. I'm not sure if that's actually true cause my code got a little longer than I'd hoped but at least it's all in one script so you can audit it as you please.

This benchmarking code aims to be as simple as possible and is self-contained so you can copy paste the whole thing in replit to reproduce this experiment.

Full benchmark code is available in the Code section.

Benchmark Results

I ran each benchmark scenario 200 times to try and get a large enough population to have representative data. Here I'll list the median times as it's 80/20 representative of runtimes and easy to format / compare here. If you want other stats, you can run the benchmark code which outputs further infos.

Results of sorting n elements in a list

Each set of results will be reported as:

  • COUNT : RUNTIME_MS

Sort Numbers list (median runtimes):

  • 10 : 0
  • 100 : 0
  • 1000 : 1
  • 10000 : 32
  • 40000 : 159
  • 100000 : 722

Sort String list (median runtimes):

  • 10 : 0
  • 100 : 0
  • 1000 : 1
  • 10000 : 6
  • 40000 : 22
  • 100000 : 170

Sort Object.Number list (median runtimes):

  • 10 : 0
  • 100 : 0
  • 1000 : 0
  • 10000 : 7
  • 40000 : 31
  • 100000 : 270

Sort Object.String list (median runtimes):

  • 10 : 0
  • 100 : 0
  • 1000 : 0
  • 10000 : 15
  • 40000 : 104
  • 100000 : 215

re: slow sorts of number list

I was really surprised with the inital results of sorting a list of primitive numbers performing so poorly - worse than sorting strings (which I assumed would be harder) AND worse than sorting objects by a number property which I assumed would at least incur some overhead and run slower but actually runs 3-5x faster. My initial hypothesis was that these results were biased for some reason - maybe due to the runtime not warming up yet.

To test this, I added another number list scenario at the end of the benchmarks as runtime should def be warmed up by then so theoretically wouldn't be affected by this.

The results were similar so I don't think it was a problem of runtime warming but it's still very weird to me so potentially something wron w benchmarks.

Sort number list w warmed runtime (median runtimes):

  • 10 : 0
  • 100 : 0
  • 1000 : 1
  • 10000 : 25
  • 40000 : 265
  • 100000 : 718

Benchmark Code

Here I'll copy / paste the current version of the benchmark code. It is mostly self-contained so you should be able to copy / paste this block of code anywhere you can run TypeScript and run the full benchmark.

There's a lot of code here because I'm a bit of a verbose programmer (explicit > implicit) but really it boils down to three primary entities:

  • Benchmark Timer - Passed into a BenchmarkScenario from a BenchmarkRunner - allows the Scenario to time a given piece of code it cares about.
  • BenchmarkScenario - This is the code to be benchmarked, but the whole thing is not benchmarked by default. Instead, it uses the passed-in BenchmarkTimer to wrap the code it wants to have bnechmarked. This is useful as each Scenario can now do whatever setup it needs to (creating lists etc) outside of the benchmark before calling the code it wants timed.
  • BenchmarkRunner - This takes in a scenario, a number of times to run it, and handles the scenario execution / results summary.

Replit link: Run this code on Replit

index.ts

type BenchmarkCode = () => void

type BenchmarkResult = {
  TimeElapsedMs: number
}

type BenchmarkTimer = (code: BenchmarkCode) => BenchmarkResult

type BenchmarkScenario = (
  benchmarkTimer: BenchmarkTimer
) => void

type BenchmarkResultsSummary = {
  ScenarioIterations : number,
  MedianTimeElapsedMs: number,
  AverageTimeElapsedMs: number,
  MaxTimeElapsedMs: number,
  MinTimeElapsedMs: number,
}

const benchmarkTimer = (
  codeToBenchmark: BenchmarkCode): BenchmarkResult => {
  const startTimeMs = Date.now()
  codeToBenchmark()
  const endTimeMs = Date.now()
  return {
    TimeElapsedMs: (endTimeMs - startTimeMs)
  }
}

const createBenchmarkTimerWithResultCallback = (
  logResult: (result: BenchmarkResult) => void
): BenchmarkTimer => {
  return (benchmarkCode: BenchmarkCode) => {
    const result = benchmarkTimer(benchmarkCode)
    logResult(result)
    return result
  }
}

const getMedianElapsedTimeOfResultsArray = (results: BenchmarkResult[]): number => {
  const sortedResults = results
    .slice()
    .sort((a, b) => {
      if (a.TimeElapsedMs < b.TimeElapsedMs) {
        return -1
      }
      return 1
    })

  const middleIsh = Math.floor(results.length / 2)
  return results[middleIsh].TimeElapsedMs
}

const getAverageElapsedTimeOfResultsArray = (results: BenchmarkResult[]): number => {
  const sum = results
    .reduce(
      (accumulator, currentValue) => accumulator + currentValue.TimeElapsedMs,
      0
    )

  return sum / results.length
}

const runBenchmarks = (
  scenario: BenchmarkScenario,
  times: number): BenchmarkResultsSummary => {
  let results: BenchmarkResult[] = []

  const benchmarkTimer = createBenchmarkTimerWithResultCallback(
    (result: BenchmarkResult) => { results.push(result) }
  )

  Array(times)
    .fill(0)
    .map(_ => scenario(benchmarkTimer))

  return {
    ScenarioIterations: results.length,
    MedianTimeElapsedMs: getMedianElapsedTimeOfResultsArray(results),
    AverageTimeElapsedMs: getAverageElapsedTimeOfResultsArray(results),
    MaxTimeElapsedMs: Math.max(...(
      results
        .map(r => r.TimeElapsedMs))),
    MinTimeElapsedMs: Math.min(...(
      results
        .map(r => r.TimeElapsedMs))),
  }
}

// const scenario1 = (benchmarkTimer : BenchmarkTimer) => {
//   console.log("In Setup")

//   benchmarkTimer(() => {
//     console.log("Running benchmark!")
//   })
// }
// const results = runBenchmarks(
//   scenario1,
//   5
// )
// console.log(results)

const runAndLogBenchmarks = (
  scenarioName: string,
  times: number,
  scenario: BenchmarkScenario,
): void => {
  console.log(`Running scenario: ${scenarioName} - ${times} times`)
  const resultSummary = runBenchmarks(
    scenario,
    times
  )
  console.log(resultSummary)
}

runAndLogBenchmarks(
  "SanityCheck",
  5,
  (benchmarkTimer: BenchmarkTimer) => {
    console.log("In Setup")

    benchmarkTimer(() => {
      console.log("Running benchmark!")
    })
  }
)

const scenarioIterations = 200

const createNItemsArray = (
  createItem: () => any,
  n: number
): any[] => {
  return Array(n)
    .fill(0)
    .map(_ => createItem())
}

// Referencing: https://stackoverflow.com/questions/1527803/generating-random-whole-numbers-in-javascript-in-a-specific-range
const getRandomPositiveInteger = () => {
  return Math.floor(Math.random() * (Number.MAX_VALUE - 1));
}

const basicNumberSortRuns = [
  10,
  100,
  1000,
  10000,
  40000, // This is 40k - cpu struggles
  // HAMY: replit runs out of available cpus at 100k!
  100000,
  // 1000000,
  // 10000000
]
basicNumberSortRuns.forEach(n => {
  runAndLogBenchmarks(
    `sortNumberList-${n}`,
    scenarioIterations, // Iterations
    (benchmarkTimer: BenchmarkTimer) => {
      const numberList = createNItemsArray(
        () => getRandomPositiveInteger(),
        n
      )

      benchmarkTimer(() => {
        numberList.sort()
      })
    }
  )
})

const basicStringSortRuns = [
  10,
  100,
  1000,
  10000,
  40000, // 40k
  100000,
]
basicStringSortRuns.forEach(n => {
  runAndLogBenchmarks(
    `sortStringList-${n}`,
    scenarioIterations, // Iterations
    (benchmarkTimer: BenchmarkTimer) => {

      // Not a true GUID but crypto.randomUUID() not working in replit
      const stringList = createNItemsArray(
        () => getRandomPositiveInteger() + "",
        n
      )

      benchmarkTimer(() => {
        stringList.sort()
      })
    }
  )
})

type SortableObject = {
  aString : string,
  aNumber : number
}

const createSortableObject = () => {
  return {
    aString : getRandomPositiveInteger() + "",
    aNumber : getRandomPositiveInteger()
  }
}

const objNumberSortRuns = [
  10,
  100,
  1000,
  10000,
  40000, // 40k
  100000,
]
objNumberSortRuns.forEach(n => {
  runAndLogBenchmarks(
    `sortObjNumberList-${n}`,
    scenarioIterations, // Iterations
    (benchmarkTimer: BenchmarkTimer) => {

      const objList = createNItemsArray(
        () => createSortableObject(),
        n
      )

      benchmarkTimer(() => {
        objList.sort((a, b) => {
          if(a.aNumber < b.aNumber) {
            return -1
          }
          return 1
        })
      })
    }
  )
})

const objStringSortRuns = [
  10,
  100,
  1000,
  10000,
  40000, // 40k
  100000,
]
objStringSortRuns.forEach(n => {
  runAndLogBenchmarks(
    `sortObjStringList-${n}`,
    scenarioIterations, // Iterations
    (benchmarkTimer: BenchmarkTimer) => {

      const objList = createNItemsArray(
        () => createSortableObject(),
        n
      )

      benchmarkTimer(() => {
        objList.sort((a, b) => {
          if(a.aString < b.aString) {
            return -1
          }
          return 1
        })
      })
    }
  )
})

basicNumberSortRuns.forEach(n => {
  runAndLogBenchmarks(
    `sortNumberListWarm-${n}`,
    scenarioIterations, // Iterations
    (benchmarkTimer: BenchmarkTimer) => {
      const numberList = createNItemsArray(
        () => getRandomPositiveInteger(),
        n
      )

      benchmarkTimer(() => {
        numberList.sort()
      })
    }
  )
})

Next Steps

All benchmarks have asterisks but I still think it's useful to run experiments and share the results as it helps us to build a general mental model and ask better questions. Let me know if you have any questions / suggestions / criticisms of the benchmark - I'm sure there are some incorrect things here and I'd love to improve it if I can.

This is the first time I've done this kind of thing in TypeScript but if people are interested, I've got a few more experiments I've been keen to try.

If you're interested in Performance / Benchmark deep dives, you might be interested in:

Want more like this?

The best / easiest way to support my work is by subscribing for future updates and sharing with your network.