Scaling? Focus On Raw Speed
When web developers talk about scale, we're typically talking about supporting a greater number of users and requests. To achieve this, we want more threads, processes or hardware. Ideally, this type of horizontal scaling should allow you to grow linearly. If a single server can handle 5K requests per second, then two servers should be able to handle 10K requests per second. In other words, scale has become synonymous with the idea that we need more.
While more is good, less is even better. Consider a request that takes 100ms to process. In a single-threaded application, we can handle 10 requests per second. One way to support 40 requests per second is to use more threads. Specifically, we'd need 3 more threads, for a total of 4.
As an alternative to adding more threads, we could also make our request take less time. To achieve the same 40 requests per second using our original thread, we'd need to cut the response time to 25ms.
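The arithmetic behind both options fits in a few lines. This is only a sketch: the helper name `requests_per_second` is made up for illustration, and it assumes each thread handles exactly one request at a time with no overhead or contention between threads.

```python
def requests_per_second(service_time_ms: float, workers: int = 1) -> float:
    """Peak throughput when every worker serves one request at a time."""
    return workers * 1000.0 / service_time_ms


baseline = requests_per_second(100)          # one thread, 100ms per request
more_threads = requests_per_second(100, 4)   # four threads, 100ms per request
faster = requests_per_second(25)             # one thread, 25ms per request

print(baseline, more_threads, faster)  # 10.0 40.0 40.0
```

Both routes reach the same 40 requests per second; the difference only shows up once you look at how long each individual request takes.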
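Now suppose four requests arrive at the same instant. With the multithreaded approach, all four are processed in parallel, so the minimum, average and maximum time to serve a request is 100ms. With the single-threaded approach, the requests are processed one after another and complete at 25ms, 50ms, 75ms and 100ms: the minimum time is 25ms, the average time is 62.5ms and the maximum time is 100ms. No request is worse off, and three of the four are served sooner.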
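These latency figures can be reproduced with a tiny queueing sketch. It rests on assumptions that are not spelled out in the numbers themselves: four requests arrive at the same instant, the multithreaded server really does run its four threads in parallel, and the single-threaded server works through the requests in order. The function names are hypothetical.

```python
from statistics import mean


def completion_times(n_requests: int, service_time_ms: float, workers: int) -> list[float]:
    """Completion time (ms) of each request when all of them arrive at t=0."""
    free_at = [0.0] * workers  # when each worker next becomes idle
    finished = []
    for _ in range(n_requests):
        i = free_at.index(min(free_at))  # hand the request to the earliest free worker
        free_at[i] += service_time_ms
        finished.append(free_at[i])
    return finished


def summarize(times: list[float]) -> tuple[float, float, float]:
    """Return (minimum, average, maximum) latency."""
    return min(times), mean(times), max(times)


print(summarize(completion_times(4, 100, workers=4)))  # (100.0, 100.0, 100.0)
print(summarize(completion_times(4, 25, workers=1)))   # (25.0, 62.5, 100.0)
```

With a steadier arrival pattern the single-threaded numbers look even better, since a request that finds the server idle is done in 25ms.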
There's obviously a limit to how far you can take this. Eventually, you'll have no choice but to throw hardware at the problem. But raw speed should be the first thing you consider, and it should be something you strive to maintain, if not improve, as you move forward. Run tasks in the background, denormalize your data, master your cache, fix your broken serialization, fill every bit of RAM and leverage modern processors.
Once live, system performance is the hardest feature you'll ever be asked to add.