Data Structures for Programmers


A good understanding of data structures separates the master programmer from the noob. Some make it possible to get faster solutions to problems; and who doesn’t want faster code?

1. Heaps

In heaps, child nodes are always larger than their parents so the smallest element is at the root of the tree. (Max-heaps are the converse, child nodes are smaller than their parents and the largest element is at the root of the tree).

Heaps are good for operations that involve repeated removal/retrieval of a maximum/minimum value from a collection of elements. If you encounter these scenarios; you’ll do better to use a heap which gives you the desired max/min element (using max-heap/min-heap respectively) in O(logn) time. This is definitely better than the O(n) time you’ll get with other data structures like lists, sets and hash tables that require you to search through all elements.

Heaps are majorly used in graph algorithms (a famous example is Dijkstra’s), heapsort and priority queues. Want to implement your own heap? You can use arrays but watch out for issues when you want to dynamically increase the size of the heap.

2. Stacks and Queues

These are fundamental data structures and have been around for long. The stack’s distinctive characteristic is that the last element put into the stack is also the first to be popped off the stack. A queue is a queue; visualize a queue of people; yes that is it! Data is appended at the end and popped off the front.

The stack is referred to as Last In, First Out (LIFO) while the Queue is First In, First Out (FIFO) [Sounds similar to GIGO right?].  Applications include reversing collections, parsing, evaluation of prefix and postfix expressions, compilers and program counters, depth first search, breadth first search, etc.

Got the coding skills? Implement a stack using a queue or vice versa!

3. Hash Tables

Hash tables map keys to values and are excellent for retrieving values in constant time O(1); absolutely better than the O(n) for arrays/lists. The main advantage of hash tables is speed and they can be used to significantly improve the performance of many programs.

Well, hash tables are no silver bullets and have the downside too (increased computation, storage and handling of collisions in keys). To use them efficiently, you have to know how many items you need to store, how often you’ll need to update and search for items and the distribution of keys.

When you need to make 300 thousand lookups in a 100-item dataset every second on a 5MB processor; please use a hash table; don’t use an array.

4. Graphs and Trees

I used to think graphs were 2-D/3-D plots showing the relationships between some variables…Graphs/Trees are sets of objects with relationships between objects represented as links; these objects are called nodes/vertices and the edges are called links. Trees differ from graphs in two ways: nodes in trees have at most one parent and trees do not contain cycles. As such, every tree can be said to be a graph although the converse does not hold. There are lots of things that can be modeled as graphs in the real world.

Social networks (I think Facebook is an undirected graph while Twitter is directed), road networks, Chess solvers, Sudoku solvers, finding the shortest paths between two points/cities, solving mazes, representing files and directories structure (trees) and lots of others. I don’t like graphs because they tend to be tricky; I have to do all sorts of mental acrobatics trying to imagine all sorts of weird scenarios and expansions of the graph. Imagine trying to find cycles of a certain length in a graph, “mind bending it is…” (” Yoda’s voice”).

Did I miss anything? Arrays?! Well, you weren’t expecting me to write about that, were you? ;) Well, arrays have constant-time access, are space-efficient and occur contiguous blocks of memory. On the flip side, they do not grow dynamically. Other interesting structures are binary trees, binary search trees, red-black trees, b-trees, treaps, tries etc.

Lastly, please don’t use an array in situation where a hash table, graph or heap would be better; it doesn’t just fit.

Still interested? Search for the following: string data structures, set data structures and geometric data structures.

Got some comments? Share below..

4 thoughts on “Data Structures for Programmers

  1. There are numerous excellent books on the subject as well as template/generic implementations (C++ STL + http://www.boost.org) and Generics in Java. Further, there are several web pages on the subject. So, what is the purpose of this posting

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.