Why computer science matters for software developers


I used to think computer science never mattered because I rarely used algorithms and never saw the value of algorithm-based interviews (I still don’t ;) ). The few folks I asked also concurred so I felt I was right.

September 2016

My team got reorged and our goal was to build and deliver a brand new SaaS offering. As part of that move, I made a conscious decision to switch to full stack engineering + dev ops. My experiences have made me re-evaluate my initial stance and realize I was wrong.

Computer Science does matter, a lot! Having that foundation empowers you to

  • Make better tradeoff decisions
  • Innovate new ways of solving problems
  • Spot design pitfalls early; e.g. a design that violates CAP theorem is a disaster waiting to happen.
  • Avoid solving impossible problems e.g. trying to parse HTML with regex.

The following paragraphs show how computer science principles underpin common concepts.

1. File systems and the HTML DOM

What do hierarchical file systems have in common with the HTML DOM? Simple, both are based on trees. Some common operations on trees include reparenting, search and walks. The table below shows how file system and DOM operations can be tied back to basic tree operations.

Tree Operation File system DOM
Search File search Search
Traversal Directory listing Layout rendering
Node reparenting File move Hiding and showing sections of the DOM

Having this foundation allows you to ask deeper questions. For example, let’s analyze the reason behind the discouragement of excessive DOM manipulations.

It’s not the DOM – tree operations are fast. The challenge lies in repainting the entire HTML layout when deep-lying nodes change. If you keep moving nodes around, the browser has to play catch up and this adds up over time.

The ‘best practice’ is to detach the node, manipulate it and then re-attach it. This approach means the browser repaints twice – on attach and detach. The detached node can change several times without triggering DOM reflow since it’s no longer attached.

2. Solving scheduling problems

I wrote a timetable generator in PHP as an intern 8 years ago. Yes, it was a brute force solver and took about an hour to generate a timetable for an average-sized school.

A quick inefficient solution can get you to the market fast and might even work well at small scale. However, any attempt to extend or improve such solutions would be prohibitive. My brute force solution of 2009 would have broken for larger problem sets; a trivial ask such as introducing a new constraint e.g. multiple teachers for multiple classes, would have necessitated a rewrite.

The timetable problem is a constraint satisfaction problem (CSP). Other popular CSP problems include appointment scheduling, map colouring and the 8 queens puzzle. Backtracking search is a standard way to solve CSPs. Solvers can also leverage greedy search, minimum-conflicts heuristics and simulated annealing.

This approach separates the problem from the solver; thus it becomes easy to change the constraints and extend the solver to new scenarios.

3. Avoiding stack overflows

How would you make sure your recursive function never runs out of stack space? This might not be a problem if the language optimizes tail calls but alas most languages don’t.

  1. You could catch the stack overflow exception but that doesn’t mean your computer can’t calculate the value. It just means the recursive implementation exceeded the computer’s stack memory limit.
  2. You could convert the recursive function to a loop. This would require passing in values around and might not be so elegant.

A trampoline solves the problem beautifully and can be reused for all recursive functions and across languages. Read this post to learn how a trampolined factorial function allows you to compute the factorial of 30000.

4. Consistently parsing HTML with Regex?

A common joke is that experienced programmers send newbies on a wild goose chase by asking for a regex parser for HTML. It’s been long since I did any automata course but the short answer goes thus:

  • Regular expressions are a regular grammar (Type 3)
  • HTML is a context-free grammar (Type 2)

Type-2 grammars encompass Type-3 and are thus more complex. See the Chomsky hierarchy.

It might be safe to do this for a small set or to extract data out of pages. But saying this is something that’ll work all the time is not true.

Tip: Know the class of impossible computing problems and save yourself from wasting time on futile challenges.

Conclusion

Concepts like design patterns, computer networks, architecture, etc. all matter in the software engineering profession. However, having a solid CS background is key.

There are loads of opportunities to apply computer science principles to daily tasks – you just have to make the push.

Thoughts?

Related

Advertisements

Exploring my Facebook Network


So I finally completed the Social Network Analysis course by Lada Adamic today and I learnt quite a few things Alhamdulilah. Some of the MOOCs are really good but there are so many options that I sometimes get overwhelmed.

One of the cool things about the course is that students can get exports of their entire Facebook network (export tool available at this link). Of course, I got mine – I have always wanted to and I am fascinated by social computing. Analyzing my network in Gephi led to quite a few unexpected discoveries. Now, don’t get any ideas, I ain’t talking about those here ;). I’ll only talk about the ‘safe’ stuff which is already obvious public anyway.

So, I have about 1051 friends who share 23915 edges who can be segregated into four communities – these communities kind of map the friends I have made in different places and times. The average path length is about 3 while the network diameter is 8 (quite close to the much-touted 6 from the classical work of Milgram). The average degree is about 45, so my friends have on average about 45 friends? I dunno…

I generated a visualization of the network using the Fruchterman-Reingold algorithm (it provided the ‘nicest’ picture); the colours indicate the degree (i.e. number of connections to other friends) while the sizes of the nodes shows the betweenness centrality. This betweenness centrality can be explained as a measure of how important a friend is in bridging/connecting separate parties or groups of people.

My Facebook Network
My Facebook Network

The cluster on the top left shows the friends from my high school (Federal Government College, Kano). The person with the largest betweenness value (the big yellow circle) is the only person I spent about 10 years of my life with in the same institutions! (High School and University) It is quite unsurprising the person connects these two communities together.

On the top right is the community of friends I met at the MASDAR institute while the lower right (that appears to be two really close communities) are the friends I made in university – Obafemi Awolowo University. The left half is something of a sub community – it consists majorly of my colleagues in the Computer Science and Engineering Department while nearly every other person falls in the right half. Lastly, my family appears on the network too but I ain’t mentioning :).

Twitter next? Maybe.. Maybe not.