Mark Gritter (markgritter) wrote,

Almost As Common As Off-By-One

Today I tracked down a bug that turned out to be a trivial mistake. (It hadn't shown up earlier because, until now, nothing depended on this field being correct.)

The code was something like this:
int SomeClass::someFunction( int &operation,
                             int &seqNum ) {
   seqNum_ = seqNum;

The correct implementation would have loaded the member variable's value into the "output parameter", rather than trashing the sequence number (and leaving the output undefined):

   seqNum = seqNum_;

So, naturally I wondered how to avoid such problems in the future. Ideas I've used (or observed) in the past:

(I) Naming: Give output parameters a distinctive name, such as an "out" prefix. Or mark member variables with something easier to track visually, like an "m_" prefix instead of just a "_" suffix.

(II) Require explicit scoping to access member variables. The buggy line would have to be written this->seqNum_ = seqNum, which makes it obvious which side is the member variable, and a mix-up like this->seqNum = seqNum_ wouldn't compile.

(III) Forbid output parameters. Languages really should have built-in tuple support to get this right, though.

(IV) If the method is logically constant, mark it as such and have the compiler catch the assignment-to-member.

(V) Annotate output parameters and check them with the compiler. (However, in some cases not all output parameters get used. We could still demand that they had well-defined values.) Eiffel-style contracts are a more sophisticated version of this.

(VI) Use a program analysis tool to determine that the seqNum output parameter is being used uninitialized at the call site.

(VII) Write the damn unit test next time!
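
Idea (IV) is probably the cheapest of these to try in C++. The following is a minimal sketch, not the original code: the accessor name getSeqNum, the parameter name outSeqNum (borrowing the "out" prefix from (I)), and the constructor are all assumptions made for illustration. Only seqNum_ comes from the snippet above. The small check at the end is in the spirit of (VII).

```cpp
#include <cassert>

// Hypothetical stand-in for the class in the post; only seqNum_ is from the original.
class SomeClass {
public:
    explicit SomeClass(int seqNum) : seqNum_(seqNum) {}

    // Declared const because reading the sequence number should not change the object.
    void getSeqNum(int &outSeqNum) const {
        // seqNum_ = outSeqNum;  // the swapped version: a compile error inside a const method
        outSeqNum = seqNum_;     // correct direction: member -> output parameter
    }

private:
    int seqNum_;
};

// A tiny unit test: the output parameter must end up holding the member's value.
void testGetSeqNum() {
    SomeClass obj(42);
    int seqNum = 0;
    obj.getSeqNum(seqNum);
    assert(seqNum == 42);
}
```

Uncommenting the swapped line should make the compiler reject the method, since a const member function cannot assign to a non-mutable member.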

What's interesting to me is that all these mechanisms (from coding conventions up to data-flow analysis) seem to be points on the same continuum. They add more semantic meaning to the bare assignment statement. In some cases the correct behavior is implied by saying "one of these variables is different" (whether it's the output parameter or the member variable). In other cases it is defined more explicitly, by describing in some formal way the behavior that is expected.

Idea (III) is the closest to escaping by "defining the problem away". You could just as easily say "program in a functional language to avoid imperative statements altogether." But I would still characterize it as an implicit separation of variables semantically into "inputs" and "outputs", just one that is particularly strictly enforced.
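
For what (III) might look like in C++ today, here is a rough sketch using std::pair as the stand-in for tuple support. The class name TupleReturningClass, the operation_ member, and the constructor are hypothetical; the post only shows seqNum_ and the two reference parameters.

```cpp
#include <cassert>
#include <utility>

// Hypothetical rewrite: both values come back as the return value, so there is
// no output parameter left that could be assigned in the wrong direction.
class TupleReturningClass {
public:
    TupleReturningClass(int operation, int seqNum)
        : operation_(operation), seqNum_(seqNum) {}

    // Returns (operation, sequence number); const, since it only reads state.
    std::pair<int, int> someFunction() const {
        return std::make_pair(operation_, seqNum_);
    }

private:
    int operation_;  // assumed member, named by analogy with seqNum_
    int seqNum_;
};

// Small check that the pair carries the members' values in the stated order.
void checkSomeFunction() {
    TupleReturningClass obj(7, 42);
    std::pair<int, int> result = obj.someFunction();
    assert(result.first == 7);
    assert(result.second == 42);
}
```

With C++17 the call site can unpack the result as auto [operation, seqNum] = obj.someFunction();, which gets fairly close to built-in tuples.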

But, since I have a mathematical background perhaps I'm too limited by thinking of things in terms of program semantics. Is there another axis or branch of correctness-improving techniques? One that has gotten a lot of attention is:

(VIII) Have another human check your work in real-time. (Or offline, in code review.)

Obviously the human could be using some mental model of how the program should operate (and how it should be written), but it's not obvious that this is what drives somebody to say "wait, you swapped the two variable names." In fact, what might be going on could be as "superficial" (or as tremendously complex) as:

(IX) Reasoning by analogy.

If you saw:

   operationType = operationType_;
   firstParameter = firstParameter_;
   secondParameter = secondParameter_; 
   seqNum_ = seqNum;

then even without building a sophisticated model you should be at least suspicious that the last line is incorrect. Could we build a tool that applies ML and Bayesian analysis to correctness "hinting", rather than explicit constraint or model-checking?
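
As a very crude illustration of the "analogy" idea (and nothing like real ML or Bayesian analysis), the sketch below does a majority vote over a block of assignments. It assumes the trailing-underscore convention for member variables and only handles lines of the form "lhs = rhs;". The function names suspiciousLines, isMember, and trim are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumed convention: a name ending in '_' is a member variable.
static bool isMember(const std::string &name) {
    return !name.empty() && name.back() == '_';
}

// Strip spaces, tabs, and semicolons from both ends of one side of an assignment.
static std::string trim(const std::string &s) {
    const char *junk = " \t;";
    std::size_t begin = s.find_first_not_of(junk);
    if (begin == std::string::npos) return "";
    std::size_t end = s.find_last_not_of(junk);
    return s.substr(begin, end - begin + 1);
}

// Returns the indices of assignments whose direction disagrees with the majority.
std::vector<std::size_t> suspiciousLines(const std::vector<std::string> &lines) {
    // +1: member on the right, -1: member on the left, 0: no clear direction.
    std::vector<int> direction(lines.size(), 0);
    int balance = 0;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        std::size_t eq = lines[i].find('=');
        if (eq == std::string::npos) continue;
        bool lhsMember = isMember(trim(lines[i].substr(0, eq)));
        bool rhsMember = isMember(trim(lines[i].substr(eq + 1)));
        if (lhsMember == rhsMember) continue;
        direction[i] = rhsMember ? 1 : -1;
        balance += direction[i];
    }
    std::vector<std::size_t> flagged;
    if (balance == 0) return flagged;  // a tie: no majority to compare against
    int majority = balance > 0 ? 1 : -1;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (direction[i] != 0 && direction[i] != majority) flagged.push_back(i);
    }
    return flagged;
}

// Runs the heuristic on the four lines from the post.
void checkPostExample() {
    std::vector<std::string> block;
    block.push_back("   operationType = operationType_;");
    block.push_back("   firstParameter = firstParameter_;");
    block.push_back("   secondParameter = secondParameter_; ");
    block.push_back("   seqNum_ = seqNum;");
    std::vector<std::size_t> flagged = suspiciousLines(block);
    assert(flagged.size() == 1);
    assert(flagged[0] == 3);  // zero-based index of the seqNum_ line
}
```

A real hinting tool would need to weigh evidence much more carefully than a vote over one block, but even this flags the last line of the example as the odd one out.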
Tags: programming