Mark Gritter's Journal|
Below are 20 journal entries, after skipping by the 20 most recent ones recorded in
Mark Gritter's LiveJournal:
|Saturday, January 9th, 2016|
|"... you got to learn to feel good about it. Look at the way the whole economy is structured."
What can rich people possibly do with their money?
This is a serious question. I'm rereading "The First Modern Economy: Success, Failure, and Perseverance in the Dutch Economy, 1500-1815." The authors take some time to discuss the rentiers of the 18th century, who were criticized both by contemporaries and historians for "a tendency on the part of capital... to prefer speculative trading... and investment in money lending and insurance, to the toil and hazards of foreign trade." (quote from Violet Barbour). The authors make a case that in part this was a natural reaction to a less-friendly trade environment --- mercantilist barriers were going up all over Europe --- but also a consequence of having more money than they really knew what to do with:
The vast accumulations of capital in the hands of the eighteenth-century capitalists placed them (ironically) in a position of dependence vis-a-vis national states. Only the states, with their taxing power, could hope to absorb such sums and make regular interest payments. The fact that governments sought to borrow almost exclusively to wage war guaranteed that Dutch capital would do little to stimulate economic growth directly; but neither the immediate economic possibilities nor the grandiose development projects that we, in retrospect, may be capable of imagining could have absorbed a large portion of the late eighteenth-century stock of financial capital. In the circumstances of the times, governments held a monopsonistic position in the capital markets.
(I think this is not quite on point because the Dutch did have some pretty grandiose empire-building designs of their own, which absorbed quite a bit of capital; they just didn't work out.)
The same criticism persists today, about too much money sloshing around in financial instruments instead of the "real economy." But, absent some grand redistributionary scheme, what exactly should the ultra-rich be doing with their money? (We have seen how even voluntary redistribution provokes acerbic backlash on the means and goals.)
Loaning it to the government is still a popular choice, and as in the 17th and 18th centuries, it goes mainly to fund wars rather than development projects.
Starting a business takes lots of money --- and that's how at least a couple billionaires I know choose to use their money --- but, if the business is successful, that only compounds the problem. And presumably we don't want capital going to unsuccessful businesses.
There's arguably a limit to even how much the startup ecosystem can fruitfully absorb. Small companies just don't need all that much capital. Plus an excess of cash leads to self-fulfilling prophecies in which success is measured by the ability to raise more cash.
Plunging all your cash into a 2-billion-dollar iron mine increasingly seems like a bad idea. Buying somebody else's company just shifts the problem around.
So where *should* all that capital be going? Is there even an answer? Clean energy? Our best answers don't seem all that different from the 18th century Dutch:
1. War and interest service on war
2. Naked speculation
4. World domination (aka Elon Musk, although in his case "space" might be more apropos)
|Saturday, January 2nd, 2016|
I managed to tune my CUDA (GPU programming) demo project to get about 4x better throughput: https://github.com/mgritter/advent-of-cuda/tree/master/day4
The first major piece of surgery was moving constant tables into "shared memory" on the GPU, instead of device memory. The GPU's L1 cache does not serve as a read cache for device memory! At least, not in the version of the architecture I'm using, 3.2. Later revisions have a "read only" cache and a special intrinsic to use it. The "shared memory" is data that is available to all threads in a "block", and lives in the L1 memory, along with per-thread allocations and register spills.
After this, profiling (with the NVIDIA visual profiler) still showed that L2 cache throughput was maxed out, with a mix of both reads and writes. So the next step was to find a way to move the actual data being hashed into registers instead of memory. Shared memory would not be appropriate as each thread is working on a different input.
After both these steps, profiling shows that the performance is now bounded by compute, and that the kernel occupancy is 97.3%, so the GPU is almost fully utilized. The device is doing slightly more than 100 million MD5 hashes per second. (For this particular problem, only the first word of the hash is relevant.) Further improvements would probably require finding ways to hash with less integer arithmetic--- which is about 80% of the instructions. But, I have not looked at other GPU-based hash calculations and figured out what techniques they are using.
|Friday, January 1st, 2016|
|How I've been entertaining myself
I participated in Advent of Code, an Advent calendar of programming puzzles. Of the 25 days of puzzles, I solved one without doing any coding, one other by minor scripting of an existing application, and the rest in Python. It was a fun exercise. (Sadly, I did not submit a solution quickly enough on the last day to make the leaderboard.)
For Christmas I received an NVIDIA Jetson TK1, a development board for the Tegra K1 "system-on-a-chip". Most notably, it comes with a 192-core GPU built in, making it a 5"x5" supercomputer.
I thought a good project to get more familiar with programming it would be to reimplement some of the Advent of Code problems using CUDA (Nvidia's programming API for GPU computing.) I posted my first effort on github: https://github.com/mgritter/advent-of-cuda/tree/master/day4
(But, I haven't rerun my old Python solution to demonstrate the speedup!) The new code is capable of finding the solutions to the original problems in less than a second each, and of finding solutions to the "next-hardest" version of the problem in about 9 seconds.
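For reference, here is a minimal CPU-side sketch of the search the GPU code parallelizes. It assumes the puzzle is the Advent of Code 2015 day 4 task: find the smallest positive integer that, appended to a secret key, gives an MD5 hash starting with a run of zero hex digits. The function name `lowest_suffix` is made up for illustration and is not taken from the repository.

```python
import hashlib
from itertools import count


def lowest_suffix(secret, zeros=5):
    """Return the smallest n >= 1 such that md5(secret + str(n)) starts
    with `zeros` zero hex digits (assumed puzzle statement)."""
    prefix = "0" * zeros
    for n in count(1):
        digest = hashlib.md5(f"{secret}{n}".encode("ascii")).hexdigest()
        # Five or six hex digits fit inside the first 32-bit word of the
        # hash, which is why only that word matters for this problem.
        if digest.startswith(prefix):
            return n


if __name__ == "__main__":
    print(lowest_suffix("abcdef", 5))
```

Each candidate is independent of the others, which is what makes the problem a natural fit for one-input-per-thread GPU code.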
I also have a bunch of new games to play: Mini Metro (a fun logistics game), Concrete Jungle (deck-building + city-building), Infinifactory (factory layout for evil aliens), and Offworld Trading Company (a logistic RTS.) Yes, even my games are really all about programming. :)
|Sunday, December 13th, 2015|
|Algorithms that only need to be executed once
Charles Babbage knew from personal experience that mathematical tables are hard to create. He and John Herschel discovered error after error in published books. Even if you get all the calculations right, the odds are pretty good that there will be typesetting errors.
Because his whole motivation for the Difference Engine was to perfect this process, the design reflects an appreciation for unexpected errors. The engine prints its own output, rather than relying upon manual transcription. A second copy of the output is printed in a log, which allows checking the final copy for agreement. The mechanisms of the calculation operate in such a way as to jam rather than produce an incorrect sum.
One can think of the difference engine as a mechanism designed for running its algorithm just once. After you build a table of logarithms, you don't need to do it again. Of course, the machine is configurable to produce a variety of tables. And it's restartable in case a page is damaged. But the key difference is that correctness takes precedence over throughput or flexibility.
What's striking to me is how few examples of such run-once algorithms are of practical use today. Mathematical models generally don't consult pre-computed tables, we just run an algorithm that calculates the answer on demand. A lot of the problems that are interesting to us are online problems where new data shows up all the time.
Verification tools for circuit design are usually run *successfully* only once. However, they are often run multiple times on nearly the same input, because the whole point is to find and fix bugs.
An example from number theory is verification of the largest Mersenne prime. In that case, rather than writing one really solid algorithm, the number was verified on multiple implementations of primality testers! I don't know what Babbage would think of that.
So, what sort of algorithms today are run only once on a given problem? Admittedly this is a fuzzy definition because, say, Google never indexes the same Internet twice. And the circuit-verification example is technically different each time.
|Tuesday, December 8th, 2015|
The current rumor is that Cisco may purchase SpringPath, a recently-launched startup that makes a "hyperconverged" software solution. http://m.crn.com/news/data-center/300079048/cisco-partners-acquiring-hyper-converged-startup-springpath-would-make-ton-of-sense.htm
Cisco has been partnering with Simplivity, a company with a similar offering.
What these companies offer is a software solution that combines local storage (flash drives attached to the hosts in a virtualized environment) into a distributed storage appliance. Cisco's previous attempt to sell a storage solution, by acquiring Whiptail, crashed and burned.
The only reason I can think of to pick SpringPath rather than one of the leaders in this area is because they're cheap. So it's not an expensive bet for Cisco to make. But they've been burned before by trying to build a storage solution without sufficient commitment or maturity.
|Saturday, November 21st, 2015|
|NMBL, you had one job
I've written before about how Tintri's best public comparable is Nimble, and NMBL's price has been pretty flat despite revenue growth.
Well, Nimble missed their revenue and profitability targets this quarter, and the market reacted viciously, just as Tintri's CFO has warned us on multiple occasions. The stock lost about half its value. They also pulled down Pure Storage by about 15%. Incumbent storage companies (EMC, NTAP, etc.) went up. The dominant story today seems to be "gosh, selling storage is tough and there's a lot of competition."
BMO Capital analyst Keith Bachman wrote:
Our original thesis on Nimble is broken. We had previously assumed (incorrectly) that Nimble’s technology advantage would enable the company to take share in both the midmarket and enterprise, while also gradually growing profits and cash flows. However, we think the challenges of Nimble’s competitors are now engulfing Nimble, in terms of both revenue growth and profits. Moreover, while Nimble envisions improving profitability in 2H FY2017, we are less convinced.
Hence, we are moving to our back up thesis. We believe that Nimble could make an interesting take out target for the very same reasons mentioned above. The incumbent storage vendors are seeking growth and could meaningfully improve Nimble’s sales leverage. In addition, Nimble is a lot less expensive than it used to be. Nimble’s gross margins remain 67%, which would be accretive to Nimble’s competitors. Moreover, R&D is only about 22% of revenues, whereas S&M and G&A combined are about 58% of revenues. Further, we believe a large portion of non-R&D spending could be meaningfully reduced or eliminated.
Frankly, I *don't* think Nimble has a technology advantage. They lack an all-flash array (though one is coming) so can't compete with Pure, EMC's XtremIO, NetApp, Tintri, and Tegile in that market. They have good analytics capability in InfoSight but no real-time analytics on the box (which particularly hurts federal and financial customers) and no per-VM features like Tintri. The area where Nimble has been strongest is the commercial sector, where they have affordable products and a very strong channel presence.
There is a tendency of market analysts to treat public companies as the only companies around. If somebody is going to get snapped up in an acquisition, I don't think it's going to be Nimble. (The fact that it's cheap hasn't helped Violin.)
|Wednesday, November 18th, 2015|
|Competing narratives and yes, your wiseassery really will get brought up in court
It's interesting skimming through the competing trial briefs in the Optumsoft vs. Arista case. They're documents 216 and 218 in the electronic case filings here: https://www.scefiling.org/cases/docket/newdocket.jsp?caseId=956
Both lawyers are, unsurprisingly, very good at explaining why the other side's position is completely unreasonable and contradictory.
It bears repeating that you should treat company email very, very seriously. Ken's comment unearthed in discovery:
Moving forward, any project involving moving code out of //src/tacc will probably need Henk’s approval or something like that. All the more reason to not add any code there ever again.
is dissected at length and held up as an example of Arista's bad faith. (The Optumsoft narrative is that Arista is trying to pretend that only changes made in that directory are "improvements" to TACC, the software under dispute. Arista's actual claim is more subtle than that.)
Now, I know Ken from Stanford, and used to work for Henk. (In fact, I think I know everybody who testified in this case except the expert witnesses.) Ken almost certainly didn't mean "let's violate our agreement by not providing Optumsoft any of the required improvements." The email is in keeping with Ken's approach to technical issues --- "how can we fix the process to remove cause for future disputes?" --- but uses a hyperbolic tone that is easily misconstrued.
It is also a bit amusing, as the recipient of some jeremiads from David, to see them being brought up in the trial record.
I don't think it's a good idea to stress over how every email you write might look in court. But when you *know* there's a dispute, you should probably assume that any email discussions about that dispute can and will be dragged into the light.
|Sunday, November 8th, 2015|
|Pi is Political
What is Pi? Although the typical definition in terms of circles is intuitive, it's not particularly well suited to analysis.
In the early 20th century, Edmund Landau, a German Jew, championed the position that Pi/2 should be defined as the smallest positive zero of the cosine function. (The cosine itself can be defined as an infinite series.)
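Landau's definition can be tried out numerically. The sketch below defines the cosine purely by its power series and then locates its smallest positive zero by bisection. The bracket [1, 2] is an assumption of the sketch: the series is positive on [0, 1] and negative at 2, so the first zero lies in between. The function names are illustrative only.

```python
import math


def cos_series(x, terms=30):
    """Cosine defined by its power series: sum of (-1)^n x^(2n) / (2n)!."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        # Next term: multiply by -x^2 / ((2n+1)(2n+2)).
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
    return total


def smallest_positive_zero(f, lo=1.0, hi=2.0, iterations=60):
    """Bisection, assuming f(lo) > 0 > f(hi) with a single zero between."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Pi/2 is the smallest positive zero of the cosine, so double it.
pi_estimate = 2 * smallest_positive_zero(cos_series)
print(pi_estimate, math.pi)
```

No circles appear anywhere in the computation, which is the appeal of the definition for analysis.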
For his efforts, Landau was singled out by Ludwig Bieberbach in his inflammatory talk on "J-type" and "S-type" mathematics in 1934. "Thus... the valiant rejection by the Gottingen student body which a great mathematician, Edmund Landau, has experienced is due in the final analysis to the fact that the un-German style of this man in his research and teaching is unbearable to German feelings. A people who have perceived... how members of another race are working to impose ideas foreign to its own must refuse teachers of an alien culture."
Bieberbach started his own journal, "Deutsche Mathematik" to publish "Aryan mathematics".
British mathematician G.H. Hardy published a note in Nature, stating: "There are many of us, both Englishmen and many Germans, who said things during the War which we scarcely meant and are sorry to remember now. Anxiety for one's own position, dread of falling behind the rising torrent of folly, determination at all costs not to be outdone, may be natural if not particularly heroic excuses. Prof. Bieberbach's reputation excludes such explanations of his utterances; and I find myself driven to the more uncharitable conclusion that he really believes them true."
It's striking how apologetic the tone is for failing to find a charitable interpretation of Bieberbach's anti-Semitism. The split between intuitionistic and mainstream mathematics persists today, in less racially charged form. But there are plenty of other divides where it seems like the search for an exculpatory frame of mind obscures the real harm being wrought.
(Quotes from "Numbers", H.D. Ebbinghaus et al, translated by H.L.S. Orde.)
|Monday, November 2nd, 2015|
|Bones was right about the transporter
Star Trek canon records at least 11 transporter accidents:
* Splitting Kirk into good and evil halves (TOS: The Enemy Within)
* Transport to the mirror universe (TOS: Mirror, Mirror) --- not counting later deliberate action
* Transport to another universe (TOS: The Tholian Web)
* Two deaths in Star Trek: The Motion Picture
* Weyoun 5 dies in DS9: Treachery, Faith, and the Great River
* Clone of Riker (TNG: Second Chances)
* Reverting four individuals to younger versions of themselves (TNG: Rascals)
* Travel through time (DS9: Past Tense)
* Merging multiple individuals (VOY: Tuvix)
* Damaging the holographic doctor's mobile emitter (VOY: Drone)
* Rendering two crew members incorporeal (TNG: The Next Phase)
To calculate a malfunction rate, we need some estimate of how many times the transporter was used in total during this period. There are 79 TOS episodes, 178 TNG episodes, 176 DS9 episodes, and 172 VOY episodes. That gives 605 episodes. If we generously estimate 4 transports per episode, that gives 2,420 transports and a failure rate of about 0.45%, unacceptably high.
Transporters appear to be able to cross planetary distances easily (no waiting for proper orbital alignment.) An estimate of 25,000 miles per transport --- that is, assuming starships are usually in high orbit --- and 6 people at a time gives an accident rate of around 30 per billion passenger-miles. Modern cars and light trucks have a fatality risk of 7.3 per billion passenger-miles. (However, it should be noted that many of the transporter accidents were reversible or otherwise non-fatal.)
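The arithmetic behind these estimates can be reproduced in a few lines of Python. All inputs come from the figures above; the variable names are just for illustration.

```python
# Episode counts and guesses taken from the text above.
episodes = {"TOS": 79, "TNG": 178, "DS9": 176, "VOY": 172}
accidents = 11
transports_per_episode = 4      # the generous estimate
miles_per_transport = 25_000    # assumes ships are usually in high orbit
people_per_transport = 6

total_episodes = sum(episodes.values())
total_transports = total_episodes * transports_per_episode

# Accidents per transport.
failure_rate = accidents / total_transports

# Accidents per billion passenger-miles, for comparison with road travel.
passenger_miles = total_transports * miles_per_transport * people_per_transport
accidents_per_billion_passenger_miles = accidents / passenger_miles * 1e9

print(f"{total_episodes} episodes, {total_transports} transports")
print(f"failure rate: {failure_rate:.2%}")
print(f"{accidents_per_billion_passenger_miles:.1f} accidents per billion passenger-miles")
```

Changing `transports_per_episode` or `miles_per_transport` shows how sensitive the comparison with cars is to those two guesses.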
|Saturday, October 10th, 2015|
So, the news is going around that VW is attempting to claim just a few software engineers knew about the hack, and not any executives. The theory is that the executives made a decision to pursue a particular diesel technology strategy which couldn't be made to work, and so low-level engineers put a hack in to save their jobs.
But there are a lot of levels in between that are glossed over in this story. It's certainly possible, even likely, that the board did not know. It's also possible that the CEO and head of business unit did not know. But "rogue software engineers" doesn't cut it.
Suppose you've magically cracked a tough software problem that has been causing lots of hair-pulling and executive pressure. What happens next?
1. Your boss asks "how did you do it?" What's your answer? Is your boss now complicit? (Or merely suspicious?)
2. Your QA department asks how to test your fix. Their code coverage tool makes sure that the branch you put in actually gets executed. Do they ask what use case this is? You can't just check in and push to production in an embedded automotive environment.
3. Your patent lawyer stops by and asks you to file a disclosure about your invention. How do you convince the legal team to leave the flagship "clean diesel" technology a secret?
4. A new hire in the group is excited about how you guys have done great things for the planet. (Maybe he or she has been watching too many GE advertisements.) Your coworkers want to understand and maybe improve on your design. What's your answer when they ask for a description of how the magic works?
It seems unlikely to me that this deception could be executed without being widely known within the team. Management may have been deliberately obtuse, I suppose, but this is not about some insider siphoning off half-pennies. It's a key feature that VW actively promoted, and it's inconceivable that nobody else asked how it was accomplished.
|Thursday, September 24th, 2015|
|Ambiguity might or might not work in our favor
Tintri is on TechCrunch's "Unicorn Leaderboard" (http://techcrunch.com/unicorn-leaderboard/), but at the very bottom. We have not disclosed a valuation, ever. (Also, as of this writing, our sector is listed as "healthcare." Don't believe everything you read on the Internet.)
This table is very amusing to me given the sheer number of companies clustered between $1.0b and $1.03b. There are 25 in that range, compared to 11 in the "emerging unicorn" list for the range $800-999m. Some of these funding round values were set to just cross over into "unicorn" range.
On a tangential note, Pure Storage hopes to be valued at up to $3.33b at IPO.
Here I am with my Titan of Technology
The crowd gasped when the MCs stated that Tintri hoped to go IPO next year with a valuation over a billion dollars. It was definitely a Dr. Evil moment, but I was never any good with the pinky finger.
The collection of attendees was a little bit odd. Quite a lot of real estate people. Nobody I saw from any of the banks, or from Target, or the major healthcare players, although Thomson Reuters was a sponsor. Tim had to sit next to some headhunters, but also somebody from Public Radio International. Phil Soran (Compellent, Vidku, etc., a winner last year) came up and introduced himself, and I said hi to Sona Mehring (CaringBridge and the Minne* board), who was awarded this year.
|Friday, September 4th, 2015|
|DFA's aren't easy
A blog entry on "Open Problems That Might Be Easy" mentions this one:
Given a DFA, does it accept a string that is the binary representation of a prime number--- is this decidable?
I think this is sort of a tricky question in that it sounds simple but actually asks for very deep insights. To solve it you might have to know something about prime numbers which is also an open problem!
How hard are DFAs? After all, we have the pumping lemma which helps us prove that languages aren't regular. But, your computer is actually equivalent in power to a DFA (if we disconnect it from the Internet.) It has only finitely many states, and so the languages it can recognize are regular. So imagine this problem as "how can I tell whether a computer program, running on Windows Server 2012 on an eight-core machine with 256GB of memory, ever accepts a prime number?" We're talking an unimaginably large number of states--- does it still seem reasonable that there's a computable way of analyzing its behavior?
But, we do have the pumping lemma. So if we could show that there is always a prime number of the form a ## bb...bb ## c (where ## denotes concatenation), that might answer the question in the affirmative for any DFA for which we could identify its "sufficiently long" strings. But this is obviously false without further qualification--- take a = 1, b = 0, c = 0, which gives only powers of two. If we use a = 1111...1111, b = 1, c = 1 then the answer may depend on the existence of very large Mersenne primes. So it may be possible to write a DFA for which the answer "does it accept any primes" provides a deep number-theoretical result.
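To make the two examples concrete, here is a small Python sketch of a bounded search. It enumerates the strings a DFA accepts, shortest first, and reports the first one that is the binary representation of a prime. It is only a semi-test: it can confirm a "yes" but can never prove a "no", and it takes exponential time for DFAs with many live paths. The dictionary encoding of the DFA and all names are choices made for this sketch.

```python
from collections import deque


def is_prime(n):
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def first_accepted_prime(delta, start, accepting, max_len):
    """Breadth-first search over input strings up to max_len bits.
    delta maps (state, bit) -> state; missing entries mean 'reject'.
    Returns the first accepted string (no leading zero) that encodes
    a prime in binary, or None if none is found within the bound."""
    queue = deque([("", start)])
    while queue:
        s, state = queue.popleft()
        if s and s[0] == "1" and state in accepting and is_prime(int(s, 2)):
            return s
        if len(s) < max_len:
            for bit in "01":
                nxt = delta.get((state, bit))
                if nxt is not None:
                    queue.append((s + bit, nxt))
    return None


# a = 1, b = 0, c = 0: the strings 100, 1000, ... are powers of two.
powers_of_two = {("s", "1"): "a", ("a", "0"): "b", ("b", "0"): "c", ("c", "0"): "c"}
# All-ones strings: the Mersenne numbers 2^n - 1.
all_ones = {("q0", "1"): "q1", ("q1", "1"): "q1"}

print(first_accepted_prime(powers_of_two, "s", {"c"}, max_len=12))
print(first_accepted_prime(all_ones, "q0", {"q1"}, max_len=12))
```

The first DFA never yields a prime no matter how large the bound, but the search alone cannot tell us that; the second finds 11 (three) immediately.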
|Thursday, August 20th, 2015|
|Friday, August 14th, 2015|
|Speaking of comparables
Pure Storage filed their form S-1 in preparation for an IPO. It contains a few surprises.
The one most remarked upon is that Pure's market share is not quite what it was believed to be. Gartner estimated calendar year 2014 revenue of $276 million, while Pure reported fiscal year 2014 (offset by one month) revenue of $155m. The difference for 2013 was also large, on the order of $75m.
The Gartner analyst has apologized for his company's error. Pure, having entered their quiet period, cannot explain why they decided to let Gartner's mistake pass (they, like other companies Gartner covers, are given an opportunity to correct factual errors.) I haven't worked in analyst relations--- perhaps letting Gartner make errors is standard practice. Certainly I can understand if Pure decided keeping revenue numbers private outweighed getting an accurate representation in Gartner's market share breakdown.
Pure is proposing a dual class stock structure in which existing investors keep control of the company (their class B shares will have 10x the voting rights of the new class A common stock.) Opinions vary on the wisdom of this. I think it's something the market doesn't care about if your company is successful, and cares a lot about if things don't look bright. It's a little arrogant to assume that Pure is in the former category, but that has been Pure's marketing image from day one. :)
Something I *haven't* seen discussed elsewhere is the large payouts to the executive team (and early investors) in the form of stock repurchases.
In November 2013, we repurchased an aggregate of 3,045,634 shares of our outstanding Class B common stock at a purchase price of $6.9315 per share for an aggregate purchase price of $21.1 million, of which 557,842 shares of common stock were repurchased from David Hatfield, our President, for an aggregate price of $3.9 million. Mr. Hatfield was the only executive officer to participate in this tender offer.
In July 2014, we repurchased an aggregate of 3,803,336 shares of our outstanding Class B common stock at a purchase price of $15.7259 per share for an aggregate purchase price of $59.8 million. The following table summarizes our repurchases of common stock from our directors and executive officers in this tender offer.
Name                 Shares of Common Stock    Purchase Price
Scott Dietzen(1)     192,051                   $3,020,174
John Colgrove(2)     200,000                   $3,145,180
David Hatfield(3)    1,000,000                 $15,725,900
(1) Dr. Dietzen is our Chief Executive Officer and a member of our board of directors.
(2) Mr. Colgrove is our Chief Technology Officer and a member of our board of directors.
(3) Mr. Hatfield is our President.
In April 2014, Pure raised $225m. But $60m of that went right back out the door to existing stockholders. (In August 2013, Pure raised $150m, with $21m flowing back out.) Some of this outflow is mitigated by the exercise of stock options.
I can't speak to these individuals' financial situation and whether it made sense from a personal position for them to cash out. But from a company position, this seems excessive compensation for a company that hasn't yet proved itself. Together the three executives took $25.8m in cash out of the company (not counting any salary or bonuses). In all three cases, they were also loaned money by the company in order to purchase stock or exercise options, and these loans have been repaid, presumably with the proceeds from the stock buyback.
Finally, Pure is growing its revenue rapidly (although, as noted, not as rapidly as had been previously believed--- and EMC is all over it.) But it's losing a lot of money too. Net cash flow from operations in the most recent quarter was -$14m, with an extra -$6.7m in investment cash flow (including capital investment). That's not too bad, although the reported loss was more than $49m. Somebody more versed in accounting than me can probably explain how they managed to pay that much operational cost without a corresponding drop in cash. (It's in there: depreciation and stock-based compensation and such.) For the fiscal year ending January 2015 they burned through $196m.
In fiscal 2015, Pure spent about $1 in sales and marketing for every $1 in product revenue: $154,836,000 in product sales (not counting support) and $152,320,000 sales and marketing (not counting any G&A or R&D.) EMC will hammer them for this too. It's a "get big fast" strategy which spends a lot of money every quarter trying to make the next quarter's sales even bigger. You can see this when you slice the data quarter by quarter:
4Q2015 sales and marketing: $42,533K
1Q2016 revenue: $74,077K
3Q2015 sales and marketing: $38,224K
4Q2015 revenue: $65,850K
2Q2015 sales and marketing: $46,448K
3Q2015 revenue: $49,189K
1Q2015 sales and marketing: $25,115K
2Q2015 revenue: $34,764K
This may be correct, but it's an expensive strategy and one that doesn't leave a lot of room for error. The return they're getting on a sales and marketing dollar is not consistently high.
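The quarter-by-quarter slice can be turned into a ratio with a short Python sketch: next-quarter revenue per dollar of sales and marketing spend. The figures are the ones listed above, in thousands of dollars; pairing each quarter's spend with the following quarter's revenue is the same simplification the list makes.

```python
# (spend quarter, S&M spend in $K, revenue quarter, revenue in $K)
quarters = [
    ("1Q2015", 25_115, "2Q2015", 34_764),
    ("2Q2015", 46_448, "3Q2015", 49_189),
    ("3Q2015", 38_224, "4Q2015", 65_850),
    ("4Q2015", 42_533, "1Q2016", 74_077),
]

# Revenue in the following quarter per dollar of sales and marketing.
ratios = [revenue / spend for _, spend, _, revenue in quarters]

for (spend_q, spend, rev_q, revenue), ratio in zip(quarters, ratios):
    print(f"{spend_q} S&M ${spend:,}K -> {rev_q} revenue ${revenue:,}K: {ratio:.2f}x")

# The four quarters of spend add up to the fiscal-year total quoted above.
total_sales_and_marketing = sum(spend for _, spend, _, _ in quarters)
print(f"FY2015 S&M total: ${total_sales_and_marketing:,}K")
```

The ratio swings between roughly 1.1x and 1.7x across the four quarters, which is the inconsistency noted above.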
|Tuesday, August 11th, 2015|
Our CFO gave the company an informal chat about our recent financing round today.
One of the interesting things he discussed (and which I feel fine talking about publicly) is comparables. Just like houses are priced based on similar houses in the neighborhood, and CEOs are paid based on what other CEOs are paid, private investments tend to get valued based on what other companies in the same business are worth.
Unfortunately, this causes a little bit of "headwind" for Tintri because our most direct comparable is Nimble. (There's also Violin, about which the less said the better.) None of the other storage startups have gone public.
Nimble Storage's IPO (as NMBL) was in December 2013, and closed that day at $33.93/share. Their peak was $52.74 in February 2014, and ever since they've just been wandering around $25-$30, roughly 50% off the peak. In contrast, the S&P has gone up 17% over NMBL's time on the market.
It's not that they're doing particularly poorly (although they are losing money.) Revenue has continued to increase at a healthy rate, and they meet analyst expectations. But, this means their price/revenue ratio has been on a steep decline too. And that's what gets used for valuation. Combined with NTAP and EMC's woes, this makes investors question whether storage is where they want to put their money.
This is not a serious impediment to Tintri going public, and obviously didn't stop us from landing a sizable investment.
|Saturday, August 8th, 2015|
I watched the documentary "Stripped" last night, and enjoyed it. But it was trying to do too many things to really be a good documentary. The part I enjoyed most was hearing the artists talk about their process for "being funny" every day.
Many of the interviewees have created web-based comics, and so there was a longer-than-necessary section on "how do webcomic authors make money." (Like I said, it tried to do too many things.) There was also a fair amount of incomprehension from the older artists: "that's the part I want somebody else to take care of", "how do these kids make money", "I just like it when a bag of money shows up regularly", etc.
Although the comics syndicates do compete with each other, they form an oligopoly. And the gatekeeper function of that oligopoly is, I think, a large part of what kept artists paid. We know people will make comics--- even surprisingly good comics--- for a pittance, and that there are thousands of hopeful cartoonists out there. (One stat was something like 1 in 3500 applications gets accepted by the syndicates.) A gatekeeper can cut off this supply curve, thereby keeping prices (wages) high. You either are the top 1% and earn a decent living, or eventually go do something else with your life.
The web changes this, although in a complicated way because the revenue stream isn't the same. But it means nobody is saying "no" to a comic artist--- they might be a failure in the market, but they aren't getting rejected by an editor. That suggests that the distribution of money will look different, too.
|Friday, August 7th, 2015|
The Minneapolis-St. Paul Business Journal selected me as one of its 2015 "Titans of Technology": http://www.bizjournals.com/twincities/morning_roundup/2015/08/2015-titans-of-technology-honorees-announced.html
There will be a more in-depth profile (or at least a photo!) later, and a fancy awards lunch in September. It's an honor to appear on the awards list with local leaders such as Clay Collins (of LeadPages) and Sona Mehring (of CaringBridge.) My award is in the category of "people responsible for the creation of breakthrough ideas, processes or products" along with a professor at UMN, Lucy Dunne, who studies wearable computing.
|Wednesday, August 5th, 2015|
|Monday, July 20th, 2015|
|Fun with Bad Methodology
Here's a Bill Gross talk at TED about why startups fail. Mr. Gross identifies five factors that might influence a startup's success, analyzes 250 startups according to those factors, and analyzes the results.
This is possibly the worst methodology I have seen in a TED talk. Let's look at just the data presented in his slides. I could not find a fuller source for Mr. Gross's claims. (In fact, in his DLD conference talk he admits that he picked just 20 examples to work with.)
I did a multivariate linear regression--- this does not appear to be the analysis he performed. This produces a coefficient for each of the factors:
While this analysis agrees that "Timing" is the most important, it differs from Mr. Gross on what is second. It actually says that the business plan is a better predictor of success. That's strike one--- the same data admits multiple interpretations about importance. Note also that linear regression says more funding is actively harmful.
The second methodological fault is taking these numbers at face value in the first place. One might ask: how much of the variation in the dependent variable do these factors explain? For linear regression, the adjusted R^2 value is about 0.74. That means about 25% of the success is explained by some factor not listed here. What checks did Mr. Gross do to validate that his identified factors were the only ones?
While we're still on problems with statistical analysis, consider that the coefficients also come with error bars.
Factor     Lower 95.0%     Upper 95.0%
Idea       -0.082270273    0.125954157
Team       -0.082753104    0.149021121
Plan       -0.049741296    0.143767289
Funding    -0.126226643    0.059746603
Timing      0.034672747    0.232667111
The error bars are so wide that few useful statements can be made about relative ordering of importance. (Obviously this might change with 230 more data points. But it seems like Mr. Gross was operating on only 20 samples.)
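For readers who want to repeat this kind of check, here is a minimal pure-Python sketch of ordinary least squares that reports coefficients, adjusted R^2, and 95% intervals. The data in it are made-up placeholders, not the numbers from Mr. Gross's slides, and the critical value 2.571 is the standard t-table entry for 5 degrees of freedom (8 rows minus 3 fitted parameters). The helper names are invented for the sketch.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(rows, y, t_crit):
    """Least squares with an intercept. Returns (coefficients,
    adjusted R^2, list of (lower, upper) intervals)."""
    X = [[1.0] + list(r) for r in rows]
    n, k = len(X), len(X[0])
    Xt = transpose(X)
    XtX_inv = invert(matmul(Xt, X))
    Xty = [sum(a * b for a, b in zip(col, y)) for col in Xt]
    beta = [sum(a * b for a, b in zip(row, Xty)) for row in XtX_inv]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mean_y = sum(y) / n
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    adj_r2 = 1 - (ss_res / ss_tot) * (n - 1) / (n - k)
    sigma2 = ss_res / (n - k)  # residual variance estimate
    se = [math.sqrt(sigma2 * XtX_inv[i][i]) for i in range(k)]
    ci = [(b - t_crit * s, b + t_crit * s) for b, s in zip(beta, se)]
    return beta, adj_r2, ci


# Placeholder data: two hypothetical factor scores and an outcome per row.
scores = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5), (7, 8), (8, 7)]
outcome = [2.1, 3.9, 4.2, 6.1, 5.8, 8.2, 7.9, 10.1]
beta, adj_r2, ci = ols(scores, outcome, t_crit=2.571)
print("coefficients:", beta)
print("adjusted R^2:", adj_r2)
print("95% intervals:", ci)
```

With only a handful of rows the intervals come out wide, which is the same effect seen in the table above.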
Next let's transition to the data-gathering faults. Mr. Gross's sample of companies is obviously nonrandom. But it is nonrandom in a way that is particularly prone to bias. He lists startup companies that actually got big enough to attract sufficient attention that he'd heard of them! The even mix of success and failure when he's looking to explain the high rate of failure should be a huge red flag.
Suppose we add 15 more failed companies to the list that have an average timing of 7 (slightly better than the sample) but average on the other factors of 5.
Oops, now timing has slipped to third!
Not surprising, because I deliberately set things up that way. But Mr. Gross doesn't know what the "next" set of failures looks like either. He has a complete list of IdeaLabs companies, but not the outside world, which is its own bias--- maybe it's Mr. Gross who is prone to timing failures, not the rest of the world!
Picking only large, well-known failures for your analysis is nearly the very definition of survivorship bias.
Finally, the inputs themselves are suspect even where they are complete. Mr. Gross already *knows* whether these companies are successes or failures when he's filling out his numbers. Does Pets.com really deserve a "3" in the idea category, or is that just retrospective thinking? Why are the successful IdeaLabs companies given 6's and 7's for Team, but the unsuccessful ones get two fours and a five? Did Mr. Gross really think he was giving those two ventures the "B" team at the time?
Even the Y axis is suspect! Pets.com had a successful IPO. Uber and AirBnB can't claim to have reached that stage--- they might yet implode. (Unlikely, admittedly, but possible. And their revenue numbers are better than Pets.com ever was.) As an investor, "Did I receive any return on my investment" is the measure of success.
* The data were generated from the opinions of the principal investigator and not subject to any cross-checks from other parties.
* The data exhibit survivorship bias and other selection bias.
* The analysis includes no confidence intervals or other measurements of the meaningfulness of the results.
* The results presented appear highly dependent upon the exact analysis performed, which was not disclosed.
Am I being unfair? Perhaps. This was not an academic conference talk. But if we are to take his advice seriously, then Mr. Gross should show that his analysis was serious too.