Log in

Wednesday, March 18th, 2015

Sums of Cubes

Somebody on Quora asked for an example of w^3 + x^3 + y^3 = z^3 in positive integers, all relatively prime. Turns out they were looking for a counterexample of a theorem they thought they proved, which is sort of a passive-aggressive way to approach it.

Anyway, brute force solves the day. Noam Elkies came up with this monster set of equations to characterize solutions to the "Fermat Cubic Surface": http://www.math.harvard.edu/~elkies/4cubes.html, reducing the search space dramatically since we only need to look at triples (of both positive and negative integers) and filter the resulting cubic equations to those which have only 1 negative value and the relative-primeness condition. Here's Elkies' characterization in python:

def elkies( s, r, t ):
    w = -(s+r)*t*t + (s*s+2*r*r)*t - s*s*s + r*s*s - 2*r*r*s - r*r*r
    x = t*t*t - (s+r)*t*t + (s*s+2*r*r)*t + r*s*s - 2*r*r*s + r*r*r
    y = -t*t*t + (s+r)*t*t - (s*s+2*r*r)*t + 2*r*s*s - r*r*s + 2*r*r*r
    z = (s-2*r)*t*t + (r*r-s*s)*t + s*s*s - r*s*s + 2*r*r*s - 2*r*r*r
    return (w, x, y, z)

The smallest one I found was:

365^3 = 107^3 + 209^3 + 337^3

and the first with two different solutions:

67477^3 = 36919^3 + 46049^3 + 54205^3
          26627^3 + 46747^3 + 57103^3

and just for posterity, here are the first 1000:
Read more...Collapse )
(Leave a comment)

Saturday, February 28th, 2015

More on not trusting published algorithms

If you haven't read this paper on how not to implement the Sieve of Eratosthenes in a functional language, it's well worth your time: "The Genuine Sieve of Eratosthenes" by Melissa O'Neill of Harvey Mudd. It's about how a common simple functional programming example is completely wrong. (Her talk at Stanford on a new family of random number generators is worth the time too.. It's a cool hack to incorporate permutations into linear-congruential RNGs.)

I ran into a question on Quora about summing the first K prime numbers, and the person was asking whether there was a better way than trial division. When I pointed out that the Sieve was much better (even at relatively small scales!) his response was "then I need to fill all integers upto INTEGER.MAX_VALUE"--- i.e., how do we bridge the gap between "All the prime numbers up to N" and "The first K prime numbers".

There are three ways to tackle this and they're all interesting in their own way.

1) The Sieve is so much better than trial division that it almost doesn't matter. The prime number theorem tells us that there are about N/log(N) prime numbers less than N, and that the Kth prime is about K log K. So, trial division to find the Kth prime require work that's around O( (K log K)*K ) = O( K^2 log K ). We might be generous and note that most composite numbers don't require testing all previous primes in order to find a divisor, and bound below by Omega(K^2).

The sieve takes O(N log log N). If you're looking for, say, the 1000th prime, you can easily sieve up to 1000^2 = 1,000,000 and still come out ahead. (Particularly since, in practical terms, the Sieve doesn't require any division.)

2) The prime number theorem lets us come up with a pretty-good bound on how large an N we need to find the K'th prime. Run the sieve on, say, N = 2 K log K and that's a solid win.

3) Even if we could not predict where the Kth prime was likely to occur (or, say some subset of the primes of special interest), the Sieve can still be run to an indeterminate stopping point at only a constant-factor slowdown.

The paper linked above gives a sophisticated way of doing this, but brute force is fine too.

Say we sieve up to N=1000 and don't yet find the primes we're looking for. Then double N and start over from scratch. Do this until we find K primes; iteration J of the sieve has size N =(2^J)*1000. If we had an oracle to tell us J, we could have done one sieve pass. Instead we did sieves of size 1000, 2000, 4000, 8000, ... , 2^J*1000, but the sum of the sizes is just 2^(J+1)*1000 since it's a geometric series. So even if our answer was at 2^(J-1)*1000 + 1, the total amount of numbers sieved was at most 4x the minimum we needed to do.

And that's not the best we can do, if we re-use the first 1000 numbers rather than throwing them away and starting from scratch it gets somewhat better--- but that's just quibbling about constant factors. Even treating the Sieve as a black box gives us a better result than trial division.
(Leave a comment)

Wednesday, February 4th, 2015

Rule of thumb measurements on acquisitions

When a large company buys a small company, it typically doesn't have to disclose details of the acquisition price, even in its quarterly reports. This is because the small company's value is not "material" to the large company's operations. (I don't know where the dividing line is, but for a billion-dollar company it can be quite large.)

So a friend asked if I thought VMware had paid more or less than $100m for Shavlik (in May 2011.)

VMware's quarterly report in July 2011 disclosed $205m spent on acquisitions of four different companies over the past six months. One of these had a specific amount in their previously quarterly filing: $15m for NeoAccel.

The other two companies, SocialCast and SlideRocket (remember when VMware thought they were getting into "end user computing?") were startups with a couple rounds of funding each. It is unlikely that they were "fire sales" at the time--- cash was not hard to get in 2011. Nor were they small enough to be acquihires; VMware really thought they would be part of the product offering. (SocialCast is still active, SlideRocket was later sold off.)

The one number we do have is the size of their series B funding rounds. A series B investor typically wants to control about 30% of the company, so based on that rule of thumb we can give valuations as follows:

SocialCast: $8m series B in March 2010: about $28m
SlideRocket: $5m series B in July 2009: about $17m

These numbers are relevant to the purchase price in that they're a lower bound the investors will accept, but (given the no-fire-sale assumption above) they probably want more than those values. Picking nice round numbers we can estimate acquisition prices:

SocialCast: $40-60m
SlideRocket: $30-40m

SlideRocket might be a little more since they had longer to grow, but a 2x "single" for the VC investors would be acceptable.

That leaves between $100m and $120m as the purchase price for Shavlik.

Now, this estimate might be off: SocialCast or SlideRocket may have been seeing great revenue numbers and growing faster than I give them credit for. But Shavlik had revenue and a long history (founded 1993) so it was probably the largest of the acquisitions, or a lower bound of about $65m.
(Leave a comment)

Wednesday, November 26th, 2014

Future Calendars

I have started playing the "Elite: Dangerous" prerelease version (the gamma release). One choice the developers made is to continue to use real-time calendar dates, just advanced to the year 2300. That is, if you're playing on November 25th then the game reports the 25th of November 2300.

This works fine, except for leap years (the day of the week isn't shown.) 2016 is a leap year, but 2302 is not. Presumably David Braben is planning to continue to support the game for at least two years, so why not start with 2314? Then the problem doesn't occur until 2100/2400.

What about leap seconds? Well, the median solar day is currently increasing by about 1.7ms per century. The net error in 300 years should be around 280 seconds, big enough to notice! But you can just pretend that the ITU (or successor organization) made up its mind to abolish leap seconds, while strangely keeping the Earth calendar.

I suppose future-ITU could also be deemed to have change the leap years for the sake of game consistency as well. :)
(Leave a comment)

Monday, June 23rd, 2014

Why Mark is a Computer Scientist at Heart

So, I came across a question on Quora about showing that no reversible two-bit gates are universal. (NAND is universal for boolean logic but not reversible!)

My first thought was the CS one: well, how many 3-bit reversible functions are there? We could probably just do exhaustive search on combinations of two-bit gates and exhaust the space, there are far less than 8^8 reversible functions and that's only 16 million. (Hmm... how many are there? Must be something like 8! since each output must appear once and only once, so a reversible function is a permutation of the identity function.)

But the mathematical answer is the one I had to look up. All the 2-bit reversible functions are binary. That is, they are of the form f(x,y)=M(x,y)+(a,b) for some 2x2 matrix M. And the composition of linear transformations is always linear. But you can come up with a nonlinear 3-bit reversible binary function. (The Toffoli gate, which *is* universal, is one of them.)

If given enough time, I could probably come up with the second answer. But the first one is a lot more fun. :)
(Leave a comment)

Saturday, May 31st, 2014

A brick-and-mortar distributed systems problem

On Friday we climbed to the top of the Witch's Hat Water Tower. It's an old water tower (preserved as a area landmark and historic place) whose observation deck is only open once a year.

The top of the tower is reached via an internal spiral staircase which is rather narrow. The tower guides had arranged a system for tracking the number of people in the tower--- they handed blue notecards to those going up and retrieved them from those exiting the tower, thereby limiting the occupancy. But, this still led to congestion on the stairs as groups traveling in opposite directions had to squeeze by each other. Is it possible to come up with a system which also limits the number of people using the staircase in opposite directions? Without cheating by giving the guides mobile phones to communicate. :)

Time-based multiplexing is a common technique. Allow upward traffic for 5 minutes, then downward traffic for the next five minutes.

The token-based approach could also be expanded in a couple ways. For example, use two colors of tokens. Allow a group of people to start upwards but give the last person in the group a special red token. Block all other ascending traffic until the red token returns. When the red token gets to the top, collect it from the carrier and permit a group to begin going downwards, again giving the red token to the last person waiting to go down.

Now, there may be some psychological issues here where people might not appreciate being told to wait to leave the tower! But the guide was already doing a rough-and-ready contention control on the top portion of the stairs.
(1 comment | Leave a comment)

Thursday, May 22nd, 2014

Geeky Linkdump

20 question for Donald Knuth, of which I found Robert Tarjan's the most interesting. Knuth's response (in part):

In general I'm looking for more focus on algorithms that work fast with respect to problems whose size, n, is feasible. Most of today's literature is devoted to algorithms that are asymptotically great, but they are helpful only when n exceeds the size of the universe...

For instance, I've been reading about algorithms that decide whether or not a given graph G belongs to a certain class. Is G, say, chordal? You and others discovered some great algorithms for the chordality and minimum fillin problems, early on, and an enormous number of extremely ingenious procedures have subsequently been developed for characterizing the graphs of other classes. But I've been surprised to discover that very few of these newer algorithms have actually been implemented. They exist only on paper, and often with details only sketched.

I was annoyed at the recent New Scientist article making the rounds about a "proof" of how computers couldn't be conscious under Integrated Information Theory, so I'm glad Scott Aaronson took the time to show why IIT is bunk and saved me some ranting.

My company announced support for storing Hyper-V VMs on the Tintri VMstore bringing our total number of supported hypervisors to three, perilously close to the maximum possible. :) Only Openstack and Xen to go (well, and perhaps a few other KVM variants.) Chris Wahl posted a great review of Tintri at "Tom's IT Pro".

Bruce Schneier points out that the NSA is not magic. We've seen a lot of standard hacks carried out at industrial scale, or from privileged positions within the network, but not an Enigma-scale (or, well, Magic) breakthrough. I'm not sure this is cause for optimism; rather, it speaks poorly about the state of communications security that the NSA doesn't need such breakthroughs to do its job.

And, finally, one of my own pieces of writing on Quora: What mathematical functions can convert multiplication to summation? The previous answers were mainly literal-minded "only logarithms work among continuous functions" references rather than coming up with something that might be more interesting to somebody that's curious, like non-logarithmic slide rules.
(Leave a comment)

Wednesday, May 14th, 2014

Unix geekery

I fixed a bug this week which involved a subtle distinction in the behavior of

int f = open( filename, O_RDONLY );
fchmod( f, permissions );


chmod( filename, permission );

Can you guess what situations the latter handles that the open()/fchmod() combination does not?

SpoilersCollapse )
(4 comments | Leave a comment)

Sunday, April 13th, 2014

Minnebar 9

Here's how I spent my time on Saturday at the local 900-person "unconference":

Paul Cantrell, "Scheduling Minnebar with Simulated Annealing" Great talk as is usual from Paul. He intro'd by saying that simulated annealing sounds very fancy but really the algorithm is very dumb. For Minnebar, the key metric is whether people can attend all of the sessions they have expressed interest in through the session web tool. So the function being optimized is the mean, over all attendees, of the percent of their sessions they cannot attend due to scheduling. This may mean it's possible to game the system by registering for fewer sessions. Somebody who registers for 6 will have 0.17/N weight for having two sessions scheduled opposite each other, while somebody who expresses interest in just 2 will have 0.5/N weight on their preference. Certainly you should not register for any you don't much care about.

I asked about whether it was possible to do better on assignment to rooms. Currently Paul just assigns time slots first, then greedily puts the session with the largest number of interested people in the largest room. But I think it may be possible to do better than this--- some recurring sessions are known to be very popular and yet do not get placed in the largest venues.

There were quite a few questions of the form "couldn't you find a better/faster solution with X" to which is response was (correctly) "no, we only run this once a year so we don't care very much."

"Pack Your Own Chute - the Personal Decision to Join a Startup", a panel discussion led by Neal Tovsen, with Paul DeBettignies, Liz Tupper, Matt Hardy, and Todd H Gardner. This one was fairly depressing but a good conversation. They talked about the need to set boundaries and have clear communication with your co-founder when forming a startup. How much money are you willing to lose on this project? Where's your bail-out point? What is the worst-case scenario? (They also talked about some of the fun and freedom too.)

What struck me is that everyone on the panel was talking about creating a startup rather than joining one. Nobody talked about taking a smaller plunge by going to work for somebody else's startup first. (I know I found my time at Kealia very valuable when we started Tintri--- and I wasn't even doing the heavy lifting on the corporate/financial/HR side!) But just like you might save up money to make sure you could survive the 18 months it took to bootstrap your startup, it may make total sense to work for 2-3 years in another startup to build connections and experience for your own.

Another issue that was not confronted directly was moving. Relocating is a big cost and one that I totally understand not wanting to pay--- I didn't. But it should be asked: if the only way to make this startup succeed is to move to San Francisco, would you do so?

I learned about Track:js from one of its founders on the panel (and the other seated next to me in the next session!) A good-looking tool for debugging Javascript problems that happen in production systems, and one Tintri might try out.

Reed Robinson, "Lessons from a Failed Startup". The failed startup was Heroic, a local attempt at a recommendation service for home services. (His dad repairs garage openers.) He had a good list which I'll try to reproduce here, but I wish his slides were available:

  1. Validate your assumptions. (Like, "people care about my product", "the team can build the product", and "people will pay for my product.")
  2. Sell it, then build it. Heroic spent lots of time on UX but decided not to ask people to pay right away. For his next venture he wants to sell the idea and find people willing to fund the development. (I am not a huge fan of this model but VC looks like that if you squint at it correctly.)
  3. Acknowledge holes in your team ASAP. He talked about how Heroic lacked marketing, finance, and sales expertise. It's possible to go overboard as well here--- but you also don't want to dig a hole.
  4. See things to fruition. Reed gave an example of promotions that were tried and quickly discarded after only a week or so. Some strategic moves need time.
  5. Do things that other people don't want to do. His father gets ten emails a week from people wanting to partner with him. Very crowded space, maybe focus on something less "sexy".
  6. Intimately know your cash situation. Ties in with holes in the team.
  7. Have confidence, but don't take yourself too seriously. Celebrate successes.

Bridget Kromhout, Monitoring at a SaaS Startup. Talked about her work in ops at 8thBridge, and a wide variety of tools. One thing the Tintri support lead emphasizes is that alerts need to be actionable--- don't wake somebody up in the night for something they can't fix. Bridget reinforced this message strongly. Nagios is old but still a good tool for ensuring alerts are what you want. Some of the newer tools like Sensu are so complicated that ops people fear they need monitoring on their monitoring tool.

She shared a couple stories in which 8thBridged goofed some by not using some of the information they had available. MongoDB's MMS was telling them about a global write lock problem but they weren't monitoring or alerting on it. Etsy has a very good monitoring team and tries to monitor even those things that don't seem to be moving, "just in case they make a run for it."

Tools: Graphite and StatsD for collecting and showing stats (they feed info into nagios from them for actual alerting), Whisper for storing time-seris data, Carbon for buffering and storing stats. Slides with more references here: http://www.slideshare.net/bridgetkromhout/monitoring-at-a-saas-startup

Maybe Tintri should look at providing stats directly into some of these tools. She talked about Logstache (?), Kibana, and ElasticSearch as some next-generation tools it might be worth looking into.)

Jeff Lin of BustOut Solutions, "Chasing Ninja Rockstars: Searching for Top Talent and Why We're Doing it All Wrong". I should have given one this a miss. The key problem, as Jeff admitted, is that he doesn't do much hiring. His 13-person team is 5x smaller than the number of hires Tintri made last quarter. So when we asked questions like how to source more diverse candidates, or how to get team buy-in to change hiring practices, he didn't have any suggestions.

He had some good points about team culture mattering, and creativity requiring diversity, but it was not very practical advice on how to get there.

Jeff told an anecdote that pissed me off. At a previous company (not BustOut) his boss took the resumes and filtered out all those with Master's or PhD degrees and said he basically didn't want to hire anybody who had spent too much time in the "ivory tower" because they were out of touch with technology trends.

I'm not sure what Jeff thought the point of this story was (I think it was in support of skills over credentialism) but it directly contradicted much of the rest of what he said about looking for good problem-solvers and people who were enthusiastic learners. Nobody goes for their PhD to get a better salary. (The same may not necessarily be true of MSCS.) I suspect a not-so-subtle side effect is that Jeff's old boss didn't want to pay more for developers.

James Renkin, "Building A Global, Privacy-Conscious CDN On $20 A Day". This talk was pure geeky joy. James located 11 sites throughout the globe that were willing to let his virtual server peer with their BGP (the Internet's routing protocol.) He was thus able to create a content distribution network on his own using BGP anycast to direct users to the "closest" (sort of) server. This is what the big guys do (some root DNS servers use it, Akamai might but usually just uses DNS redirection.) But he did it for an outlay of about $7000/year. About $500 of that is the IP address range (which you have to jump through multiple hoops to get--- he's got an ISP side business which he was able to leverage.) The remainder is running the servers and additional charges for injecting his route into BGP. I learned a lot of this stuff during my PhD work, so this talk made me really happy. I have no intention of duplicating his work, though.

The conference had a couple innovations that didn't work so well. Best Buy let us use their employee parking garage this year instead of visitor parking, but frankly it's a maze (not very well marked, poor traffic flow.) They only had one exit gate opening to leave and more than one person got into the wrong lane.

The organizers brought in food trucks to serve lunch but the lines were long and it was not all that warm a day to be standing outside. I think three trucks was insufficient; it didn't help that it was not clear there were three separate lines, and that one of the trucks opened late.

There was also a keynote from an agile coach that I didn't get much out of. I was amused that he emphasized subtracting things to get more productivity but he's usually called in as an "add" to the team.
(2 comments | Leave a comment)

Tuesday, January 28th, 2014

100 answers on Quora

I reached this dubious milestone this week.

Here are the five answers I've written with the most upvotes:

Given that the digits of pi never end and never repeat, does a running average of those digits tend toward any limit?

Was Facebook or Twitter involved at all in the January 2014 [sic] wage fixing scandal?

What have you learned as an early employee at a startup?

Is suicide a problem in Sillicon [sic] Valley?

How good is ten jack offsuit against two random cards?

Answer I feel is most under-rated:

What is the #1 staffing problem that startups face?
(1 comment | Leave a comment)
Previous 10 | Next 10