### Fun with Bad Methodology

Here's a Bill Gross talk at TED about why startups fail. Mr. Gross identifies five factors that might influence a startup's success, analyzes 250 startups according to those factors, and analyzes the results.

This is possibly the worst methodology I have seen in a TED talk. Let's look at just the data presented in his slides. I could not find a fuller source for Mr. Gross's claims. (In fact, in his DLD conference talk he admits that he picked just 20 examples to work with.)

I did a multivariate linear regression--- this does not appear to be the analysis he performed. This produces a coefficient for each of the factors:

While this analysis agrees that "Timing" is the most important, it differs from Mr. Gross on what is second. It actually says that the business plan is a better predictor of success. That's strike one--- the same data admits multiple interpretations about importance. Note also that linear regression says more funding is actively harmful.

The second methodological fault is taking these numbers at face value in the first place. One might ask: how much of the variation in the dependent variable do these factors explain? For linear regression, the adjusted R^2 value is about 0.74. That means about 25% of the success is explained by some factor not listed here. What checks did Mr. Gross do to validate that his identified factors were the only ones?

While we're still on problems with statistical analysis, consider that the coefficients also come with error bars.

The error bars are so wide that few useful statements can be made about relative ordering of importance. (Obviously this might change with 230 more data points. But it seems like Mr. Gross was operating on only 20 samples.)

Next let's transition to the data-gathering faults. Mr. Gross's sample of companies is obviously nonrandom. But it is nonrandom in a way that is particularly prone to bias. He lists startup companies that actually got big enough to attract sufficient attention that he'd heard of them! The even mix of success and failure when he's looking to explain the high rate of failure should be a huge red flag.

Suppose we add 15 more failed companies to the list that have an average timing of 7 (slightly better than the sample) but average on the other factors of 5.

Oops, now timing has slipped to third!

Not surprising, because I deliberately set things up that way. But Mr. Gross doesn't know what the "next" set of failures look like either. He has a complete list of IdeaLabs companies, but not the outside world, which is its own bias--- maybe it's Mr. Gross who is prone to timing failures, not the rest of the world!

Picking only large, well-known failures for your analysis is nearly the very definition of survivorship bias.

Finally, the inputs themselves are suspect even where they are complete. Mr. Gross already *knows* whether these companies are successes or failures when he's filling out his numbers. Does Pets.com really deserve a "3" in the idea category, or is that just retrospective thinking? Why are the successful IdeaLabs companies given 6's and 7's for Team, but the unsuccessful ones get two fours and a five? Did Mr. Gross really think he was giving those two ventures the "B" team at the time?

Even the Y axis is suspect! Pets.com had a successful IPO. Uber and AirBnB can't claim to have reached that stage--- they might yet implode. (Unlikely, admittedly, but possible. And their revenue numbers are better than Pets.com ever was.) As an investor, "Did I receive any return on my investment" is the measure of success.

To summarize,

* The data were generated from the opinions of the principal investigator and not subject to any cross-checks from other parties.

* The data exhibit survivorship bias and other selection bias.

* The analysis includes no confidence intervals or other measurements of the meaningfulness of the results.

* The results presented appear highly dependent upon the exact analysis performed, which was not disclosed.

Am I being unfair? Perhaps. This was not an academic conference talk. But if we are to take his advice seriously, then Mr. Gross should show that his analysis was serious too.

This is possibly the worst methodology I have seen in a TED talk. Let's look at just the data presented in his slides. I could not find a fuller source for Mr. Gross's claims. (In fact, in his DLD conference talk he admits that he picked just 20 examples to work with.)

I did a multivariate linear regression--- this does not appear to be the analysis he performed. This produces a coefficient for each of the factors:

Idea 0.021841942 Team 0.033134009 Plan 0.047012997 Funding -0.03324002 Timing 0.133669929

While this analysis agrees that "Timing" is the most important, it differs from Mr. Gross on what is second. It actually says that the business plan is a better predictor of success. That's strike one--- the same data admits multiple interpretations about importance. Note also that linear regression says more funding is actively harmful.

The second methodological fault is taking these numbers at face value in the first place. One might ask: how much of the variation in the dependent variable do these factors explain? For linear regression, the adjusted R^2 value is about 0.74. That means about 25% of the success is explained by some factor not listed here. What checks did Mr. Gross do to validate that his identified factors were the only ones?

While we're still on problems with statistical analysis, consider that the coefficients also come with error bars.

Lower 95.0% Upper 95.0% Idea -0.082270273 0.125954157 Team -0.082753104 0.149021121 Plan -0.049741296 0.143767289 Funding -0.126226643 0.059746603 Timing 0.034672747 0.232667111

The error bars are so wide that few useful statements can be made about relative ordering of importance. (Obviously this might change with 230 more data points. But it seems like Mr. Gross was operating on only 20 samples.)

Next let's transition to the data-gathering faults. Mr. Gross's sample of companies is obviously nonrandom. But it is nonrandom in a way that is particularly prone to bias. He lists startup companies that actually got big enough to attract sufficient attention that he'd heard of them! The even mix of success and failure when he's looking to explain the high rate of failure should be a huge red flag.

Suppose we add 15 more failed companies to the list that have an average timing of 7 (slightly better than the sample) but average on the other factors of 5.

Oops, now timing has slipped to third!

Idea 0.079811441 Team 0.067029361 Plan -0.007925792 Funding 0.033260711 Timing 0.064723176

Not surprising, because I deliberately set things up that way. But Mr. Gross doesn't know what the "next" set of failures look like either. He has a complete list of IdeaLabs companies, but not the outside world, which is its own bias--- maybe it's Mr. Gross who is prone to timing failures, not the rest of the world!

Picking only large, well-known failures for your analysis is nearly the very definition of survivorship bias.

Finally, the inputs themselves are suspect even where they are complete. Mr. Gross already *knows* whether these companies are successes or failures when he's filling out his numbers. Does Pets.com really deserve a "3" in the idea category, or is that just retrospective thinking? Why are the successful IdeaLabs companies given 6's and 7's for Team, but the unsuccessful ones get two fours and a five? Did Mr. Gross really think he was giving those two ventures the "B" team at the time?

Even the Y axis is suspect! Pets.com had a successful IPO. Uber and AirBnB can't claim to have reached that stage--- they might yet implode. (Unlikely, admittedly, but possible. And their revenue numbers are better than Pets.com ever was.) As an investor, "Did I receive any return on my investment" is the measure of success.

To summarize,

* The data were generated from the opinions of the principal investigator and not subject to any cross-checks from other parties.

* The data exhibit survivorship bias and other selection bias.

* The analysis includes no confidence intervals or other measurements of the meaningfulness of the results.

* The results presented appear highly dependent upon the exact analysis performed, which was not disclosed.

Am I being unfair? Perhaps. This was not an academic conference talk. But if we are to take his advice seriously, then Mr. Gross should show that his analysis was serious too.