The Millau Viaduct. The Guggenheim Museum Bilbao. The Dubai Palm Islands.
Architectural masterpieces – beautiful art with solid engineering foundations.
Bash. Hadoop. Git.
Great software – beautiful code with solid architectural themes.
Is software development art? Pretty much. Is it engineering? Pretty much as well.
Dams, tunnel-boring machines and turbines are similar to operating systems, high-performance load-balancers and big data systems. Regardless of the end product’s tangibility – good engineering is necessary for delivering great results.
Engineers need to estimate system performance and simulate real-life scenarios. For most engineering fields, there are rich banks of proven theories and mathematical relations to rely upon. Unfortunately, software engineering – the new kid on the block – has a few rigorous rules, most times we rely on heuristics and handed-down wisdom.
The good news is that most of these rules can be successfully tailored to software engineering.
This post examines a few of these rules and shows how to use them.
1. Rule of 72
This is great for estimating exponential growth rate scenarios. It is well known in the financial sector but can be easily applied to algorithmic growth. The rule:
An exponential process with a growth rate r will roughly double its value in time t if r X t = 72.
The rule has its roots in logarithms (log 2 ~ 0.693). 69 is more accurate however it doesn’t have as many factors. 68 and 70, which are just as close, have the same flaw too. The closest easy-to-factor number is 72 (factors are 1,2,3,4,6,8,9,12,18,24,36 and 72).
An exponential algorithm with a growth rate of 4% (i.e. it takes 1.04 more time to run a problem size of n + 1 compared to n) will have its run time doubled when the problem size increases by a factor of 18. Why? Because 4 * 18 = 72.
What if the problem size increases by 90?
4 * 90 = 360
360 / 72 = 5
Thus, we can expect the run time to double 5 times, a 32-fold (2 ^ 5) increase in run time.
Test: What problem size would cause a 1000-fold increase in runtime? Hint 1000 ~ 1024, (2^ 10).
2. π seconds make a nanocentury
Well, not exactly but close enough for back-of-the-envelope calculations and easy to remember.
π, the ubiquitous constant, is approximately 3.1427 thus π seconds is ~3.1427 seconds. A nano-century is 10-9 * 100 years which is 10-7 years. There are approximately 3.154 x 107 seconds in a year and thus about 3.154 seconds in a nano-century.
The 0.3% difference between 3.154 and 3.142 is safe enough for quick estimates. Thus, π can be used for such quick calculations (with some minor accuracy losses).
Let’s build on rules 1 and 2 with a possible real-world scenario.
An exponential program with a growth rate of 72% takes 300 seconds to run on a sample size n; would it be wise to run the program on a problem with n = 1000000?
Using the rule of 72, a 1000-fold increase in n leads to a 10x increase in run time.
0.72 * 1000 = 720.
720 / 72 = 10
Doubling the run time 10 times gives a factor of 1024. The 1000-size problem should take about 1024 * 300 seconds ~ 300 000 seconds.
Let’s invoke the π seconds rule next, 300 seconds ~ 100 π seconds which in turn gives about 3.6 days. If a 1000-fold increase will cause a 4-day wait period; you can well imagine what a million-fold increase would lead to. Spending 3 days on a better algorithm might be worth it…
3. Little’s law
Little’s law states that the average number of things in a system L, is equal to the average rate at which things leave (or enter) the system λ, multiplied by the average amount of time things spend in the system, W. i.e. L = λ * W.
Imagine a processing system with a peak capacity of 2000 requests per second. Let’s further assume that is a 5-second processing time for each request. Using Little’s law, the system needs to robustly accommodate 10,000 requests (you better start thinking about memory too).
One fix might be to drop all messages that have spent more than 5 seconds in the system, thereby allowing new requests to come in. Otherwise, the system would need extra processing capability or storage under heavy loads.
A better approach would be to over-engineer and design a system with a higher capacity e.g. one with a 2500 requests per second limit. Thus 2000 requests per second spikes would push the system to a comfortable 80% of max capacity.
Is over-engineering bad? Doesn’t it go against YAGNI? Well, think of this: over-engineering is one of the reasons why the 132-year old Brooklyn Bridge is still standing today.
Little’s law is very useful when building web servers, real-time systems and batch processors. Stable complex systems can be built by linking several small systems that obey Little’s law. It has also been applied to Kanban work-in-progress (WIP) limits.
The beauty of Little’s law is in its simplicity and versatility – you can even use it to estimate queue wait times!
Note: The Guava RateLimiter mentions the rule, not sure if it implements it though.
4. Casting out 9s
A quick rule of thumb for verifying results of mathematical operations. Let’s take summation as an example.
Given x + y = z; the sum of all the digits in x and y modulo 9 must be equal to the sum of all the digits of z modulo 9. Difficult? Nope, just cast out 9s as you go along.
Let’s take an addition example:
The sum of digits in the sum is (9 + 4 + 6 +2) = 21. 21 modulo 9 is 3. The sum of digits in the addends is (1 + 2 +4 + 2) + (3 + 4 + 8 + 9) + (4 + 7 + 3 + 1) = 9 + 24 + 15 = 48. And 48 modulo 9 is 3. Since both remainders are 3, we can assume the addition is correct.
For faster results, cast out 9s as soon as they are generated. For example, 9462 gives
9 + 4 + 6 + 2 which gives 9 + 1 + 2 = 3.
Subtractions can be verified by casting out 9s in the reverse operation. For example a – b = c turns into a = b + c. The rule also works for multiplication and division too.
P.S: Teach this to your kids…
More and more scenarios
How many books can a 1.44MB floppy disk hold?
Assuming a book has 250 pages, with each page containing 250 words and an average word length of 5 characters. This gives an average of 312500 characters per book.
A character can be stored in a byte; thus a book could be stored in about 0.325MB (~0.31 MebiBytes MiB). 4 such books will easily fit in a 1.44MB floppy disk. What about a 2GB memory stick? That can hold 6100+ books (several lifetimes of reads).
Can I physically move data faster than 100MB/s Ethernet?
How long does it take to transfer 1TeraBytes of data via a 100MB/s Ethernet link channel?
1 Terabyte = 1000000 MegaBytes; thus the Ethernet link channel would take 1000000 / 100 = 10000 seconds for a successful corruption-free transfer; which is about 2.778 hours. If physically moving a 1 TeraByte drive to the new location takes 30 mins; then it makes more sense to do that.
Need to move 10 Terabytes? Maybe a courier would be faster…
This post is to show the power of back-of-the-envelope and how they enable us to make quick accurate estimates.
Casting out nines