Mark Gritter (markgritter) wrote,

Now That The Murderous Rage Has Ebbed...

I tracked down a long-standing bug in our code today.


Matt complains that a top-of-branch build does not come up on the system. He tries several times with the same result. But the latest "release candidate" (a tagged build from the same branch) works fine.

There's no obvious change since the RC was built that would cause the failure he's seeing. The changes that were made in the relevant modules seem harmless. So I make my own fresh top-of-branch build this morning and I can't reproduce the failure. Great.

This suggests there is something different between Matt's build and mine. Unfortunately I can't just compare my binaries with his directly, but I do make sure the right versions of all the code are used, etc. Then it catches my eye that some of the binary files he's using in one particular directory have not been rebuilt since June, which seems like a stretch. We've run into problems before that were "solved" by a make clean/make all cycle; it's time to finally put this one to rest. (This is consistent with the evidence so far: Matt's binary is built using his existing tree, while mine and the RC came from fresh build directories.)

I can't compare the two sets of binaries (Matt's and mine) directly, but I attempt the next best thing. I make two copies of his build directory, do a 'make clean' in one, rebuild both, and see which files come out different. Unfortunately the changed directory name seems to be sufficient to cause 20-40 byte differences in file length, but those are easy enough to skip past by eye while looking for major changes.
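The eyeballing could be scripted. Here is a minimal sketch, assuming the two build trees share the same layout and that objects end in `.o`. The function name `size_diffs` and the 64-byte default threshold are made up for this example, chosen only to sit above the 20-40 byte noise mentioned above.

```shell
#!/bin/sh
# size_diffs TREE_A TREE_B [THRESHOLD]
# Print object files whose sizes differ by more than THRESHOLD bytes
# (default 64) between two build trees that share the same layout.
size_diffs() {
    tree_a=$1
    tree_b=$2
    threshold=${3:-64}
    (cd "$tree_a" && find . -type f -name '*.o') | sort |
    while IFS= read -r f; do
        if [ ! -f "$tree_b/$f" ]; then
            echo "$f: missing in $tree_b"
            continue
        fi
        # $(( ... )) strips the padding some wc implementations emit.
        size_a=$(( $(wc -c < "$tree_a/$f") ))
        size_b=$(( $(wc -c < "$tree_b/$f") ))
        delta=$((size_a - size_b))
        if [ "$delta" -lt 0 ]; then
            delta=$((0 - delta))
        fi
        if [ "$delta" -gt "$threshold" ]; then
            echo "$f: $size_a vs $size_b bytes"
        fi
    done
}
```

Calling it as `size_diffs stale-copy clean-copy` (hypothetical directory names) would list only the files whose sizes moved by more than the threshold.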

Bingo. Two of the files in the library I suspect of causing problems are off by several kilobytes. I look at the directory and notice that they don't have dependency (.d) files generated. So a change to a header file won't cause a recompilation.
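A check for this symptom is easy to sketch. It assumes each `foo.o` is supposed to have a `foo.d` written next to it in the same directory, which may not match every build layout. `missing_deps` is a name invented for the example.

```shell
#!/bin/sh
# missing_deps BUILD_DIR
# Print every object file under BUILD_DIR that has no sibling
# dependency (.d) file, i.e. foo.o exists but foo.d does not.
missing_deps() {
    find "$1" -type f -name '*.o' | sort |
    while IFS= read -r obj; do
        dep="${obj%.o}.d"
        [ -f "$dep" ] || echo "$obj"
    done
}
```

Empty output after a full build would mean every object has a dependency file. It says nothing about whether the contents of those files are right.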

I verify this by modifying a header file and checking that my build doesn't pick up the change. Then I track down the error, and it turns out to be a one-character typo: the Makefile uses SRCIPC when it should be SRCSIPC, but just in the line setting up the dependency targets.
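For what it's worth, GNU Make has a `--warn-undefined-variables` flag that reports references to variables that were never set, which is the class of mistake a misspelled variable name falls into. A minimal sketch of wrapping it, assuming GNU Make and a Makefile in the current directory; `undefined_vars` is a name made up for this example, and I haven't confirmed it would have flagged this particular line in our Makefile.

```shell
#!/bin/sh
# undefined_vars [MAKE_ARGS...]
# Dry-run the build (-n prints recipes instead of running them) with
# GNU Make's --warn-undefined-variables flag, and list each distinct
# warning once. Requires GNU Make.
undefined_vars() {
    make --warn-undefined-variables -n "$@" 2>&1 |
        grep 'warning: undefined variable' | sort -u
}
```

Many makefiles reference unset variables on purpose (an empty CFLAGS, say), so the output probably needs to be compared against a known baseline before it is useful as a gate.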


Now, being a responsible manager and all, I'd like to avoid having a similar problem crop up ever again. But I'm stumped as to how to put procedures or automation in place to verify that the build process is actually working correctly.

  • Rewrite GNU Make to disallow use of undefined variables? Yeah, right.
  • Demand manual tests of dependency generation? Unlikely to actually get done.
  • Start running nightly builds from an existing build tree instead of a fresh one? Yuck.
  • Run a dedicated test that 'touch'es all the header files and verifies that all the .o's get rebuilt?

Any ideas?

Probably the best solution would be to get with the 21st century and use a better make system...
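The 'touch' test from the list above might look something like this. It is only a sketch, under the assumptions that it runs from the top of an already-built tree, that headers end in `.h` and objects in `.o`, and that a plain `make` rebuilds everything. `stale_objects` is a hypothetical name.

```shell
#!/bin/sh
# stale_objects [MAKE_ARGS...]
# In an already-built tree: touch every header, rebuild, and print each
# object file that was NOT rebuilt. Returns non-zero if the build fails.
stale_objects() {
    marker=$(mktemp) || return 1
    # Make sure the touched headers end up strictly newer than the
    # marker, even on filesystems with one-second timestamps.
    sleep 1
    find . -type f -name '*.h' -exec touch {} +
    if ! make "$@" >/dev/null 2>&1; then
        rm -f "$marker"
        return 1
    fi
    # Anything not newer than the marker was left alone by make.
    find . -type f -name '*.o' ! -newer "$marker" | sort
    rm -f "$marker"
}
```

An object built from a source file that includes no headers at all would show up here as a false positive, so the output needs a short allow-list or a human look before it fails a nightly run.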
  • Tags: geek, work