The Continue That Doesn’t

One of the foundations of computer code is the humble loop. Going back at least as far as the beginning of C in the early 1970s, the two mainstays of loops in procedural languages have been the while and the for loop. Unfortunately, I think it’s fair to say that they also contribute to the kind of complexity in code that can be a source of technical debt. I’m going to look at one of the problems with these loop constructs.

While Loops

The while loop is probably the most primitive of loops possible. If we come across a while loop while reading code, all we know is how the loop will end. However, we don’t generally know what is going to cause that condition to occur. Of course, there are some things for which while loops are naturally suited. Probably the best example is the main processing loop such as for message and event based applications. For example, in the following code it is pretty clear how the loop will end.

while (app.IsRunning()) {
    event e;
    app.GetEvent(e);
    app.ProcessEvent(e);
}

For Loops

In contrast, the for loop in C and most other languages is an excellent construct in terms of readability. It tells you right from the outset exactly what you’re getting, both in terms of the termination condition and ensuring that the necessary changes are going to happen to get there. Take the following example. Even without comments, it’s pretty clear what you’re going to get. Although we could perform the same thing using a while loop, the for loop is infinitely more readable:

for (int retry = 0; retry < 5; retry++) {
    ...
}

One task that for loops are particularly suited to is looping over the elements in a collection. Let's take a look at an example of looping over some kind of vector using regular old for syntax:

vector vec;
for (int index = 0; index < vec.size(); index++) {
    ProcessElement(vec[index]);
}

As before, it's pretty clear what's going on even without comments. Now, let's consider a fairly normal sequence of changes. First, we get some additional condition where we don't want to process certain elements. This means we have to use the continue statement. The syntax here is pretty clear. Continue means continue on with the loop.

vector vec;
for (int index = 0; index < vec.size(); index++) {
    // Don't process if paused
    if (vec[index].state == PAUSED) continue;
    // Don't process if underlying connection closing
    if (vec[index].con_state == CLOSING) continue;
    ProcessElement(vec[index]);
}

This is still fairly easy to follow. But what happens now if we discover we also need to delete elements from within the loop? Well, now we have a problem because we'll end up skipping an element when our index is incremented by the for loop. In this particular case, we could probably just pre-decrement the index before looping, but that's not going to work in the general case. Instead, we have to convert our for loop to a while loop:

vector vec;
int index = 0;
while (index < vec.size()) {
    // Don't process if paused
    if (vec[index].state == PAUSED) continue;
    // Don't process if underlying connection closing
    if (vec[index].con_state == CLOSING) continue;
    ProcessElement(vec[index]);
    // Increment loop variable:
    index++;
}

Here, if you are an experienced programmer (or even an inexperienced programmer), you should be able to see the problem that we have. Our two conditions at the start of the loop are broken. The problem is that the continue statement in the for loop means something slightly different from the while loop. Yes, they both mean to continue, but one means to continue after advancing to the next element, the other means repeat the loop without changing anything. Our continue no longer continues anything. Before I go into this further, let's see the correctly converted while loop:

vector vec;
int index = 0;
while (index < vec.size()) {
    // Don't process if paused
    if (vec[index].state == PAUSED) {
        index++;
        continue;
    }
    // Don't process if underlying connection closing
    if (vec[index].con_state == CLOSING) {
        index++;
        continue;
    }
    ProcessElement(vec[index]);
    // Increment loop variable:
    index++;
}

And then once we add our delete condition:

vector vec;
int index = 0;
while (index < vec.size()) {
    // Don't process if paused
    if (vec[index].state == PAUSED) {
        index++;
        continue;
    }
    // Don't process if underlying connection closing
    if (vec[index].con_state == CLOSING) {
        index++;
        continue;
    }
    // Delete zombies:
    if (vec[index].state == ZOMBIE) {
        vec.erase(index);
        continue;
    };
    ProcessElement(vec[index]);
    // Increment loop variable:
    index++;
}

Is this a big issue? Well, it obviously is an issue if we're forced to convert a for loop to a while loop for any reason. From personal experience, infinitely repeating loops under only certain conditions often arise from this kind of error. Our code only hangs if it happens to meet the conditions for skipping the processing, which can sometimes mean only under certain very specific and hard-to-reproduce conditions. However, I think the bigger problem comes in terms of readability and maintenance. The first issue is that the first line of the while loop doesn't really tell us the full story about how our loop is going to end. Without the variable increment statement, it could potentially run forever.

For the second issue, let's say you are a maintenance programmer and you have to add some conditions to this kind of construct. You've gone down and added your special processing case, and now you need to bail out of the loop early and move onto the next element. It's no longer simply a matter of writing continue at the end of your case. First, you have to find out if you are in a for or a while loop. And if you are in a while loop, where do you find the code that moves onto the next element? In an ideal world, it would be clearly detailed at the bottom of the loop. But most world's aren't ideal, and your probably going to scroll down to the bottom of the loop and be left wondering exactly which parts of that code are needed for advancing the loop, and which parts are just part of the main loop body processing.

Can We Do Better

I think the heart of the problem is the use of the same keyword to mean two different things. What we really need is two different keywords. My proposal would be next and repeat. This makes things a whole lot easier for maintenance. next means advance to the next element in the collection. repeat means loop without advancing the loop variable. Using this would give us:

vector vec;
for (int index = 0; index < vec.size(); index++) {
    // Don't process if paused
    if (vec[index].state == PAUSED) next;
    // Don't process if underlying connection closing
    if (vec[index].con_state == CLOSING) next;
    // Delete zombies:
    if (vec[index].state == ZOMBIE) {
        vec.erase(index);
        repeat;
    };
    ProcessElement(vec[index]);
}

Related Posts

RSS

1 Comment

  1. PL said,

    June 23, 2015 @ 4:01 am

    I would also add the possibility and probably a good practice in most contexts avoiding modifying a collection and instead creating a new one or relying on immutable types altogether.

    This is definitely easier to implement in high level languages. On the other hand, almost everything is easier in high level languages. 🙂

RSS feed for comments on this post