Log in

Saturday, July 4th, 2015

Stupid NFS Tricks

One of the Tintri SE's asked whether the advice here about tuning NFS was relevant to us: http://docstore.mik.ua/orelly/networking_2ndEd/nfs/ch18_01.htm (It is not.)

Back in the days where CPU cycles and network bits were considered precious, NFS (the Network File System) was often run over UDP. UDP is a connectionless, best-effort "datagram" service. So occasionally packets get lost along the way, or corrupted and dropped (if you didn't disable checksumming due to aforesaid CPU costs.) But UDP doesn't provide any way to tell that this happened. Thus, the next layer up (SUN-RPC) put a transaction ID in each outgoing call, and verified that the corresponding response had arrived. If you don't see a response within a given time window, you assume that the packet got lost and retry.

This causes problems of two sorts. First, NFS itself must be robust against retransmitted packets, because sometimes it was the response that got lost, or you just didn't wait long enough for the file server to do its thing. This is not so easy for operations like "create a file" or "delete a file" where a second attempt would normally result in an error (respectively "a file with that name already exists" and "there's no such file.") So NFS servers started using what's called a "duplicate request cache" which tracked recently-seen operations and if an incoming request matched, the same response was echoed.

The second problem is figuring out what the appropriate timeout is. You want to keep the average cost of operations low, but not spend a lot of resources retransmitting packets. The latter could be expensive even if you don't have a crappy network--- say the server is slow because it's overloaded. You don't want to bombard it with extra traffic.

Say you're operating at 0.1% packet loss. A read (to disk) normally takes about 10ms when the system is not under load. If you set your timeout to 100ms, then the average read takes about 0.999 * 10ms + 0.000999 * 110ms + 0.000000999 * 210ms + so on, about 10.1ms. But if the timeout is a second, that becomes 11ms, and if the timeout's 10 seconds then we're talking 20ms.

So, at least in theory, this is worth tuning because a 2x difference between a bad setting and a good setting is worthwhile. Except that the whole setup is completely nuts.

In order to make a reasonable decision, the system administrator needs statistics on how long various NFS calls tend to make, and the client captures this information. But once you've done that, why does the system administrator need to get involved? Why shouldn't the NFS client automatically adapt to the observed latency, and dynamically calculate a timeout value in keeping with a higher-level policy? (For a concrete example, the timeout could be set to the 99th percentile of observed response times, for an expected 1% over-retransmit rate.) Why on earth is it better to provide a tuning guide rather than develop a protocol implementation which doesn't need tuning? This fix wouldn't require any agreement from the server, you could just do it right on your own!

Fortunately, in the modern world NFS mainly runs over TCP, which has mechanisms which can usually tell more quickly that a request or response has gone missing. (Not always, and some of its specified timeouts suffer the same problem of being hard-coded rather than adaptive.) This doesn't remove the first problem (you still need the duplicate request cache) but makes an entire class of tuning pretty much obsolete.

Nothing upsets me more in systems research and development, than a parameter that the user must set in order for the system to work correctly. If the correct value can be obtained observationally or experimentally, the system should do so. If the parameter must be set based on intuition, "workload type", expensive consultants, or the researcher's best guess then the real problem hasn't been solved yet.
(Leave a comment)

Friday, May 29th, 2015

Internet arguments about math I have participated in, because I'm on vacation

1. Is there such thing as an uncomputable integer? Or, even if a function F(x) is uncomputable, does, say F(10^100) still count as computable because, hey, some Turing machine out there somewhere could output it! Or does a number count as computable only if we (humans) can distinguish that machine from some other machine that also outputs a big number?

Consider that if F() is uncomputable, it seems extremely dubious that humans have some mechanism that lets them create Turing machines that produce arbitrarily-large values of F.

I'm probably wrong on the strict technical merits but those technical grounds are unduly Platonist for my taste.

2. Is trial division the Sieve of Eratosthenes? Maybe it's the same algorithm, just a particularly bad implementation!

I'm with R. A. Fisher on this one. If the idea was just some way to laboriously generate primes, it wouldn't be famous, even among Eratosthenes' contemporaries. The whole thing that makes the sieve notable is that you can apply arithmetic sequences (which translate into straight lines on a grid!) to cut down your work.

The counterargument seems to be that Eratosthenes might well have performed "skip, skip, skip, skip, mark" instead of jumping ahead by five.
(Leave a comment)

Saturday, May 2nd, 2015

Never Trust Any Published Algorithm

The slides from my Minnebar talk, "Never Trust Any Published Algorithm", can be found here.

I got a couple good stories in response.

A local person who works in computer vision talked about finding a new algorithm for shadow detection. It worked great on the graduate student's ten examples. It didn't work at all on his company's hundreds of examples in all different lighting and weather conditions.

A coworker shared:

A few years ago my brother was working at an aircraft instrument company. He was
looking for an algorithm that described max loading for a plane still able to take off
based on many variables.

He found what was supposed to be the definitive solution written by a grad student at
USC. He implemented the algorithm and started testing it against a large DB of data
from actual airframe tests. He quickly found that the algorithm worked just fine for
sea-level examples, but not as airport altitude rose. He looked through the algorithm
and finally found where the math was wrong. He fixed his code to match his new
algorithm and found that it now matched the data he had for testing.

He sent the updated algorithm to the grad student so he could update it on public
servers. He never heard back nor did he ever see the public code updated.

The example I put as a bonus slide wasn't previously mentioned in my series of blog posts is a good one too. In Clustering of Time Series Subsequences is Meaningless the authors showed that an entire big data technique that had been used dozens of times in published papers produced essentially random results. (The technique was to perform cluster analysis over sliding window slices of time series.)
(Leave a comment)

Friday, April 24th, 2015

Infinitesimals in the family of asymptotics

I answered this question about the Master Theorem (which is used to calculate the asymptotic behavior of divide-and-conquer algorithms): Why is the recurrence: T (n) = sqrt(2) *T (n/2) + log(n) a case 1 of master method?

The source of confusion (see the comments to my answer) seemed to be that the original poster did not really understand the asymptotic behavior of log(n) and n/log(n). In fact, he went so far as to post a graph "proving" that n^0.99 is larger than n/log(n). However, this is false for large enough numbers (large enough being somewhere around 10^100 in this case.) The logarithm grows more slowly than any positive power of n. As a consequence, n/log(n) grows asymptotically faster than any positive power of n less than 1.

What I realized is that this might be some student's first (and maybe only!) experience with infinitesimals! Normally we think of infinitesimals as numbers "smaller than any positive real number". For example, before epsilon-delta proofs and modern analysis, the differential and integral calculus informally treated dx as an infinitesimal value. But while students are told the "idea" is that dx is a very small value, they are repeatedly cautioned not to treat it as an actual number.

The surreal numbers (and thus the class of combinatoric Games too) have infinitesimals, but they are far outside the normal curriculum.

So what model is a student supposed to apply when confronted with O(log n) or Θ(log n)? It behaves exactly like an infinitesimal in the family of asymptotic limits. big-O = bounded above, Ω = bounded below, and Θ = bounded both above and below, thus:

log n = Ω( 1 )

log n = O( n^c ) for any c > 0, i.e., log n = O(n), log n = O(sqrt(n)), log n = O(n^0.0001)

n^k log n = O( n^c ) for any c > k.

n / log n = Ω( n^c ) for any c < 1

Nor does taking a power of the logarithm make any difference.

log^100 n = (log n)^100 = O( n^c ) for any c > 0

Crazy, right? But we can get Wolfram Alpha to calculate the crossover point, say, when does (log n)^100 = n^0.01? At about 4.3*10^50669.

That's what makes logarithm an infinitesimal. No matter what power we raise it to (think "multiply") it is still smaller in its asymptotic behavior than the smallest power of n (think "any positive real number.") And there's a whole family of infinitesimals hiding here. log log n is infinitesimally smaller than log n. Don't even ask about the inverse Ackermann function or the iterated-log function.

So it's not surprising that students might have difficulty if this is their first journey outside the real numbers. Everybody can handle the fact that O(n^2) and O(n^0.99) and O(n^0.5) are different, but the fact that none of these examples will be O(log n) is kind of perplexing, because O(log n) is obviously bigger than O(n^0). (Jumping to exponentials like O(2^n) seems like an easier stretch.)

What was your first encounter with an infinitesimal?
(Leave a comment)

Wednesday, March 18th, 2015

Sums of Cubes

Somebody on Quora asked for an example of w^3 + x^3 + y^3 = z^3 in positive integers, all relatively prime. Turns out they were looking for a counterexample of a theorem they thought they proved, which is sort of a passive-aggressive way to approach it.

Anyway, brute force solves the day. Noam Elkies came up with this monster set of equations to characterize solutions to the "Fermat Cubic Surface": http://www.math.harvard.edu/~elkies/4cubes.html, reducing the search space dramatically since we only need to look at triples (of both positive and negative integers) and filter the resulting cubic equations to those which have only 1 negative value and the relative-primeness condition. Here's Elkies' characterization in python:

def elkies( s, r, t ):
    w = -(s+r)*t*t + (s*s+2*r*r)*t - s*s*s + r*s*s - 2*r*r*s - r*r*r
    x = t*t*t - (s+r)*t*t + (s*s+2*r*r)*t + r*s*s - 2*r*r*s + r*r*r
    y = -t*t*t + (s+r)*t*t - (s*s+2*r*r)*t + 2*r*s*s - r*r*s + 2*r*r*r
    z = (s-2*r)*t*t + (r*r-s*s)*t + s*s*s - r*s*s + 2*r*r*s - 2*r*r*r
    return (w, x, y, z)

The smallest one I found was:

365^3 = 107^3 + 209^3 + 337^3

and the first with two different solutions:

67477^3 = 36919^3 + 46049^3 + 54205^3
          26627^3 + 46747^3 + 57103^3

and just for posterity, here are the first 1000:
Read more...Collapse )
(Leave a comment)

Saturday, February 28th, 2015

More on not trusting published algorithms

If you haven't read this paper on how not to implement the Sieve of Eratosthenes in a functional language, it's well worth your time: "The Genuine Sieve of Eratosthenes" by Melissa O'Neill of Harvey Mudd. It's about how a common simple functional programming example is completely wrong. (Her talk at Stanford on a new family of random number generators is worth the time too.. It's a cool hack to incorporate permutations into linear-congruential RNGs.)

I ran into a question on Quora about summing the first K prime numbers, and the person was asking whether there was a better way than trial division. When I pointed out that the Sieve was much better (even at relatively small scales!) his response was "then I need to fill all integers upto INTEGER.MAX_VALUE"--- i.e., how do we bridge the gap between "All the prime numbers up to N" and "The first K prime numbers".

There are three ways to tackle this and they're all interesting in their own way.

1) The Sieve is so much better than trial division that it almost doesn't matter. The prime number theorem tells us that there are about N/log(N) prime numbers less than N, and that the Kth prime is about K log K. So, trial division to find the Kth prime require work that's around O( (K log K)*K ) = O( K^2 log K ). We might be generous and note that most composite numbers don't require testing all previous primes in order to find a divisor, and bound below by Omega(K^2).

The sieve takes O(N log log N). If you're looking for, say, the 1000th prime, you can easily sieve up to 1000^2 = 1,000,000 and still come out ahead. (Particularly since, in practical terms, the Sieve doesn't require any division.)

2) The prime number theorem lets us come up with a pretty-good bound on how large an N we need to find the K'th prime. Run the sieve on, say, N = 2 K log K and that's a solid win.

3) Even if we could not predict where the Kth prime was likely to occur (or, say some subset of the primes of special interest), the Sieve can still be run to an indeterminate stopping point at only a constant-factor slowdown.

The paper linked above gives a sophisticated way of doing this, but brute force is fine too.

Say we sieve up to N=1000 and don't yet find the primes we're looking for. Then double N and start over from scratch. Do this until we find K primes; iteration J of the sieve has size N =(2^J)*1000. If we had an oracle to tell us J, we could have done one sieve pass. Instead we did sieves of size 1000, 2000, 4000, 8000, ... , 2^J*1000, but the sum of the sizes is just 2^(J+1)*1000 since it's a geometric series. So even if our answer was at 2^(J-1)*1000 + 1, the total amount of numbers sieved was at most 4x the minimum we needed to do.

And that's not the best we can do, if we re-use the first 1000 numbers rather than throwing them away and starting from scratch it gets somewhat better--- but that's just quibbling about constant factors. Even treating the Sieve as a black box gives us a better result than trial division.
(Leave a comment)

Wednesday, February 4th, 2015

Rule of thumb measurements on acquisitions

When a large company buys a small company, it typically doesn't have to disclose details of the acquisition price, even in its quarterly reports. This is because the small company's value is not "material" to the large company's operations. (I don't know where the dividing line is, but for a billion-dollar company it can be quite large.)

So a friend asked if I thought VMware had paid more or less than $100m for Shavlik (in May 2011.)

VMware's quarterly report in July 2011 disclosed $205m spent on acquisitions of four different companies over the past six months. One of these had a specific amount in their previously quarterly filing: $15m for NeoAccel.

The other two companies, SocialCast and SlideRocket (remember when VMware thought they were getting into "end user computing?") were startups with a couple rounds of funding each. It is unlikely that they were "fire sales" at the time--- cash was not hard to get in 2011. Nor were they small enough to be acquihires; VMware really thought they would be part of the product offering. (SocialCast is still active, SlideRocket was later sold off.)

The one number we do have is the size of their series B funding rounds. A series B investor typically wants to control about 30% of the company, so based on that rule of thumb we can give valuations as follows:

SocialCast: $8m series B in March 2010: about $28m
SlideRocket: $5m series B in July 2009: about $17m

These numbers are relevant to the purchase price in that they're a lower bound the investors will accept, but (given the no-fire-sale assumption above) they probably want more than those values. Picking nice round numbers we can estimate acquisition prices:

SocialCast: $40-60m
SlideRocket: $30-40m

SlideRocket might be a little more since they had longer to grow, but a 2x "single" for the VC investors would be acceptable.

That leaves between $100m and $120m as the purchase price for Shavlik.

Now, this estimate might be off: SocialCast or SlideRocket may have been seeing great revenue numbers and growing faster than I give them credit for. But Shavlik had revenue and a long history (founded 1993) so it was probably the largest of the acquisitions, or a lower bound of about $65m.
(Leave a comment)

Wednesday, November 26th, 2014

Future Calendars

I have started playing the "Elite: Dangerous" prerelease version (the gamma release). One choice the developers made is to continue to use real-time calendar dates, just advanced to the year 2300. That is, if you're playing on November 25th then the game reports the 25th of November 2300.

This works fine, except for leap years (the day of the week isn't shown.) 2016 is a leap year, but 2302 is not. Presumably David Braben is planning to continue to support the game for at least two years, so why not start with 2314? Then the problem doesn't occur until 2100/2400.

What about leap seconds? Well, the median solar day is currently increasing by about 1.7ms per century. The net error in 300 years should be around 280 seconds, big enough to notice! But you can just pretend that the ITU (or successor organization) made up its mind to abolish leap seconds, while strangely keeping the Earth calendar.

I suppose future-ITU could also be deemed to have change the leap years for the sake of game consistency as well. :)
(Leave a comment)

Monday, June 23rd, 2014

Why Mark is a Computer Scientist at Heart

So, I came across a question on Quora about showing that no reversible two-bit gates are universal. (NAND is universal for boolean logic but not reversible!)

My first thought was the CS one: well, how many 3-bit reversible functions are there? We could probably just do exhaustive search on combinations of two-bit gates and exhaust the space, there are far less than 8^8 reversible functions and that's only 16 million. (Hmm... how many are there? Must be something like 8! since each output must appear once and only once, so a reversible function is a permutation of the identity function.)

But the mathematical answer is the one I had to look up. All the 2-bit reversible functions are binary. That is, they are of the form f(x,y)=M(x,y)+(a,b) for some 2x2 matrix M. And the composition of linear transformations is always linear. But you can come up with a nonlinear 3-bit reversible binary function. (The Toffoli gate, which *is* universal, is one of them.)

If given enough time, I could probably come up with the second answer. But the first one is a lot more fun. :)
(Leave a comment)

Saturday, May 31st, 2014

A brick-and-mortar distributed systems problem

On Friday we climbed to the top of the Witch's Hat Water Tower. It's an old water tower (preserved as a area landmark and historic place) whose observation deck is only open once a year.

The top of the tower is reached via an internal spiral staircase which is rather narrow. The tower guides had arranged a system for tracking the number of people in the tower--- they handed blue notecards to those going up and retrieved them from those exiting the tower, thereby limiting the occupancy. But, this still led to congestion on the stairs as groups traveling in opposite directions had to squeeze by each other. Is it possible to come up with a system which also limits the number of people using the staircase in opposite directions? Without cheating by giving the guides mobile phones to communicate. :)

Time-based multiplexing is a common technique. Allow upward traffic for 5 minutes, then downward traffic for the next five minutes.

The token-based approach could also be expanded in a couple ways. For example, use two colors of tokens. Allow a group of people to start upwards but give the last person in the group a special red token. Block all other ascending traffic until the red token returns. When the red token gets to the top, collect it from the carrier and permit a group to begin going downwards, again giving the red token to the last person waiting to go down.

Now, there may be some psychological issues here where people might not appreciate being told to wait to leave the tower! But the guide was already doing a rough-and-ready contention control on the top portion of the stairs.
(1 comment | Leave a comment)
Previous 10 | Next 10