In the last post, we discussed the reliability of data systems. The next topic discussed in the book is scalability.
Scalability is the ability of a system to maintain its performance as the incoming load changes.
What is load? Load describes how much demand is placed on a system. Generally we use certain numbers to describe load, called load parameters, and which parameters matter depends on the architecture of the system. For example, for a service it is the number of incoming requests, for a database it is the number of reads and writes, etc.
Performance parameters/metrics – These metrics measure the performance of a system and let us determine what happens in both of these cases –
- When we increase a load parameter but keep the system resources the same, how does the performance of the system change?
- When we increase a load parameter, how much do we need to increase the resources to keep the performance of the system the same?
Some common performance metrics –
- Throughput – the number of records the system can process per second, or the total time it takes to run a job on a dataset of a certain size.
- Service Response Time – the time between a client sending a request and receiving a response. This is the time the client actually sees, so it includes network delays and other delays.
Generally, service response time is measured in percentiles. The response times observed over a period are sorted from fastest to slowest. The median, or p50, is the halfway point. p50 is a good metric if you want to know how long users typically have to wait for a response.
For finding outliers, p90, p95, p99, and p999 are good metrics. p95 = x means 95% of requests take less than x seconds and the remaining 5% take more than x seconds.
- Latency – Latency is the duration that a request is waiting to be handled – during which it is latent, awaiting service.
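To make percentiles concrete, here is a minimal sketch in Python of computing p50/p95/p99 from a window of measured response times. The sample values and the simple nearest-rank approach are just for illustration, not how any particular monitoring tool does it.

```python
# Sketch: nearest-rank percentiles over a window of response-time measurements.
# The sample values below are made up purely for illustration.
def percentile(sorted_times, p):
    """Return the value below which roughly p% of the sorted measurements fall."""
    index = int(round((p / 100) * (len(sorted_times) - 1)))
    return sorted_times[index]

response_times_ms = sorted([12, 15, 18, 22, 25, 30, 35, 40, 120, 900])

print("p50:", percentile(response_times_ms, 50), "ms")  # the median: typical wait
print("p95:", percentile(response_times_ms, 95), "ms")  # the tail: slow outliers
print("p99:", percentile(response_times_ms, 99), "ms")  # the extreme tail
```

Notice how a single very slow request (900 ms here) barely moves the median but dominates the high percentiles, which is exactly why the tail percentiles are used to find outliers.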
Some other interesting terms related to performance metrics –
- head-of-line blocking – A server can only process a small number of requests in parallel, so even a few slow requests can hold up the processing of the requests behind them. This is called head-of-line blocking.
- tail latency amplification – Say our backend service calls multiple services in parallel to serve an end-user request. The end-user request is slowed down if even one of those parallel calls is slow. This is called tail latency amplification; a small simulation of the effect is sketched below.
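Here is a hedged simulation sketch of tail latency amplification in Python. The latency distribution (fast 99% of the time, slow 1% of the time) and the fan-out values are invented for illustration; the point is that the user-facing request must wait for the slowest of its parallel backend calls, so the chance of hitting at least one slow call grows quickly with fan-out.

```python
import random

# Assume each backend call is fast (10 ms) 99% of the time and slow (1000 ms)
# 1% of the time. These numbers are made up for illustration.
def backend_call_latency():
    return 1000 if random.random() < 0.01 else 10

def user_request_latency(fan_out):
    # The user-facing request finishes only when the slowest parallel call does.
    return max(backend_call_latency() for _ in range(fan_out))

for fan_out in (1, 10, 100):
    trials = 10_000
    slow = sum(1 for _ in range(trials) if user_request_latency(fan_out) >= 1000)
    print(f"fan-out {fan_out:>3}: {slow / trials:.1%} of user requests are slow")
```

With a fan-out of 1, roughly 1% of user requests are slow; with a fan-out of 100, well over half of them are, even though each individual backend call is still slow only 1% of the time.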
Approaches for handling load – scale up (switching to a more powerful machine) and scale out (distributing the load across more machines in parallel).
Some systems are elastic: they add more machines automatically as the load increases and remove them once the load decreases. Other systems are scaled manually, meaning a human analyses the system and decides when to add or remove resources.
The decision on which approach to use to handle the load and stay scalable depends on the architecture, and it might even change as the load grows in different ways.
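As a rough sketch of what an elastic system's scaling decision could look like, here is a tiny Python function that picks a number of instances so that the load per instance stays at or below a target. The function name, the target of 500 requests per second per instance, and the idea of driving scaling purely from request rate are all hypothetical simplifications, not a real autoscaler's policy.

```python
# Hypothetical elastic-scaling sketch: choose enough instances so that each one
# handles at most `target_per_instance` requests per second. All numbers are
# invented for illustration.
def desired_instances(requests_per_sec, target_per_instance=500):
    needed = -(-requests_per_sec // target_per_instance)  # ceiling division
    return max(1, needed)

print(desired_instances(1200))  # -> 3 instances
print(desired_instances(2600))  # -> 6 instances
```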
Thanks for stopping by! Hope this gives you a brief overview of the scalability aspect of data systems. Eager to hear your thoughts and chat, so please leave comments below and we can discuss.