I got a couple of good stories in response.
A local person who works in computer vision talked about finding a new algorithm for shadow detection. It worked great on the ten examples from the graduate student who wrote it. It didn't work at all on his company's hundreds of examples, which covered all kinds of lighting and weather conditions.
A coworker shared:
A few years ago my brother was working at an aircraft instrument company. He was
looking for an algorithm that described max loading for a plane still able to take off
based on many variables.
He found what was supposed to be the definitive solution written by a grad student at
USC. He implemented the algorithm and started testing it against a large DB of data
from actual airframe tests. He quickly found that the algorithm worked just fine for
sea-level examples, but not as airport altitude rose. He looked through the algorithm
and finally found where the math was wrong. He fixed his code to match his new
algorithm and found that it now matched the data he had for testing.
He sent the updated algorithm to the grad student so he could update it on public
servers. He never heard back nor did he ever see the public code updated.
The example I put on a bonus slide, which I hadn't previously mentioned in my series of blog posts, is a good one too. In "Clustering of Time Series Subsequences is Meaningless", the authors showed that an entire big data technique, one that had been used dozens of times in published papers, produced essentially random results. (The technique was to perform cluster analysis over sliding-window slices of a time series.)
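If you haven't seen that technique before, here is a minimal sketch of what "cluster analysis over sliding window slices" looks like in practice. It is not the paper's code. The function names (`sliding_windows`, `kmeans`), the window width, the choice of plain k-means, and the random-walk input are all my own assumptions for illustration.

```python
import numpy as np


def sliding_windows(series, width):
    """Return every contiguous slice of length `width` as one row.

    Hypothetical helper: consecutive rows overlap in all but one sample.
    """
    series = np.asarray(series, dtype=float)
    # sliding_window_view returns a read-only view; copy so rows are independent.
    return np.lib.stride_tricks.sliding_window_view(series, width).copy()


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice of clustering algorithm)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows picked at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # Squared Euclidean distance from every point to every center.
        distances = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = distances.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # leave a center alone if its cluster is empty
                centers[j] = members.mean(axis=0)
    # Final assignment against the last set of centers.
    distances = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, distances.argmin(axis=1)


if __name__ == "__main__":
    # Made-up input: a random walk standing in for a real time series.
    walk = np.cumsum(np.random.default_rng(1).standard_normal(500))
    windows = sliding_windows(walk, width=32)
    centers, labels = kmeans(windows, k=3)
    print(windows.shape, centers.shape)
```

The cluster centers that come out of a pipeline like this are what the published papers treated as meaningful patterns. The paper's claim, as summarized above, is that they are not.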