Safely handling destructive loops


Collections abound everywhere in the real world and loops are very common in programming. A simple loop is shown below:

var arr = [1,2,3,4];

for(var i = 0, len = arr.length; i < len; i++) {
    console.log(arr[i]);
}

//Output
1,2,3,4

Destructive loops

Simple loops can be boring, after all, why not have the collection change during a looping operation for more fun? Won’t happen? Believe me it does, have run into such at least twice.

An example scenario involves collections of listeners (say for example in a pub-sub implementation). Ideally, you want to go through all the listeners and invoke them. However it is possible a listener can either directly or indirectly remove itself while the loop is processing. For example, it might call a destroy function on some object which in turn leads to a unsubscribe call.

When this happens, it’s all chaos. Funny things happen or even worse – bugs.

Suddenly, the loop invariants are all wrong and you get funny results and/or errors (most likely errors). Let’s take an example:

function createListenerArray(arr) {
   for(var i = 0; i < 4; i++) {
       arr.push(function(i) {
           console.log(i);
           if(i === 2) {
               arr.splice(i, 1);
           }
       });
   }

   return arr;
}

The createListener function pushes 4 functions onto the array reference it gets. One of these functions (the third one) will remove itself from the array once invoked. To make tracking things easier, each function will log its index to the console.

var arr = [];
createListenerArray(arr);
for(var i = 0, len = arr.length; i < len; i++) {
    arr[i](i);
}

0
1
2
//TypeError: arr[i] is not a function

Oops! Blows up…

The loop was proceeding smoothly until it got to the function at index 2. Invoking that function changed the array by removing the corresponding entry from the array. Thus the valid length of the array decreased from 4 to 3.

The next step of the loop tried to access the element at index 3; however that index is now an invalid position since the element at index 3 got moved to index 2 when the array shrunk. arr[3] is undefined and invoking undefined as a function leads to the TypeError.

Imagine the ripple effect this can cause in larger arrays

Let’s try forEach.

var arr = [];
createListenerArray(arr);

arr.forEach(function(val, i){
    val(i);
});

0
1
2
undefined

This is not the desired result but on the bright side, at least there were no Exceptions. So what other approaches exist?

1. Clone

One way is to create a clone of the array to provide a validation checker before invocations as shown below.

var arr = [];
createListenerArray(arr);
var clone = arr.slice(0);

for(var i = 0, len = clone.length; i < len; i++) {
    if(arr.indexOf(clone[i]) > -1) {
        clone[i](i);
    }
}
//
0
1
2
3

This guarantees that the functions would always exist; to verify, we can re-run the snippet again to prove that the value at index 2 is now gone.

for(var i = 0, len = clone.length; i < len; i++) {
if(arr.indexOf(clone[i]) > -1) {
        clone[i](i);
    }
}

//Output
0
1
3

2. Count down

Cloning can lead to new issues with memory allocation and garbage collection. There is another approach that is cleaner, doesn’t require cloning and should not leave danglers around.

The fix relies on understanding the true cause of the looping issue. The problem occurs because the removal of a loop element affects unprocessed elements in the loop (by changing the array length).

We can circumvent this by counting down from right to left – if an element removes itself, the unprocessed elements to the left are not affected since their indices are still valid (compare to the left-to-right count-up approach).

Now, even if a function should remove itself, things are still alright, since its index would be removed and not be valid anymore.

let arr = [];
createListenerArray(arr);

for(var i = arr.length - 1; i >= 0; i--) {
    arr[i](i);
}

//Output
3
2
1
0

console.log(arr.length);
//3

Neat! Now, this is another reason to use utils like _.forEachRight.

Liked this trick? Let me know your thoughts.

6 Comments

·

Leave a Reply

  1. I didn’t read through the code in detail. I don’t know JavaScript. But isn’t the real issue that an array isn’t the best data structure for a problem like this?

    It seems like a doubly linked list (as one alternative) would be better. Traversing a linked list is easy and there are no indexes or counters to update if an element gets inserted or deleted. The double links (forward and back) make it easy to insert and delete elements.

    The actual data structure will depend on how the data must be accessed and on how fast it must be accessed.

    If you don’t have pointer types in the language, lists can be simulated in arrays, but it gets a bit messy.

    No matter how you do it, you have to deal with memory management if you don’t know how many elements will be active at a given time.

    Liked by 1 person

    • Yes, the linked list would avoid the issue since there are no invariant changes.

      However there are no pointers in JavaScript :) and even representing arrays as linked lists would still lead to this issue when you want to delete an element from the list in a loop. So far I have run into this issue in C# + JavaScript.

      Liked by 2 people

      • I thought that might be the case. Only a few languages have pointer types. A lot of designers feel they are too error prone (and they’re probably right.)

        Just to clarify: It would be representing a linked list using an array, not the other way around. Doing this (as messy as it might be) will get around the problem without an explicit pointer type. The messy part is keeping track of the free nodes because the language won’t do it for you.

        If the array is small, you can just scan it from the start looking for the first node with both pointers null. If it’s larger, then you could use the pointers to have a free node linked list in the same array. This only works sanely if you can define arrays with more than one dimension.

        I don’t know C# either, but I’m surprised that it doesn’t have pointers given that native C has them for everything. They’re the quickest way to get a segmentation fault!

        Like

  2. The error-prone idea is to delete elements (array-parts or objects) instead of handling the values.
    Surly you should delete things you never need anymore if you can save some memory, but as you pointed out: in a loop that’s not the case.
    So in a loop better set the things to null, undefined, 0, “0” or perhaps even a negative value, i.e. for special user-defined error-handling.
    Nevertheless, nice to see you sawing the branches you’re sitting on ;-)

    Like

    • Haha at the sawing branch joke. Thanks for sharing your thoughts.

      The introduction of a value to replace the function would lead to extra code for validating entries and this might lead to even more bugs downstream. The countdown from the right approach is the best I have seen so far – even the angularJS watch listeners use this approach.

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.