Sunday, August 18th, 2013


The size of the search space in the toy problem I described in the last post is basically (n^2) choose (n^2)/2 for an n x n board. My initial implementation is in-memory only, which works for 3x3 and 4x4. It ought to handle 5x5 too, but for some reason Python (on Windows) simply stopped computing rather than raising an out-of-memory error or finishing.

9C5 = 126
16C8 = 12,870
25C13 = 5,200,300
36C18 = 9,075,135,300
49C25 =~ 6.3 * 10^13
64C32 =~ 1.8 * 10^18
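These counts are just binomial coefficients, so Python's math.comb reproduces the table directly:

```python
from math import comb

# Number of half-filled n x n boards: C(n^2, n^2 // 2).
for n in range(3, 9):
    cells = n * n
    print(n, comb(cells, cells // 2))
```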

What you need for the brute-force algorithm is basically a bit per tile arrangement (permutation) to tell whether you've visited it already, plus a list of the "frontier" of newly-generated arrangements:

next_frontier = []
for X in frontier:
    for P in permutations:
        P_X = apply(P, X)            # the arrangement P.X reached from X by move P
        if P_X not in visited:
            visited.add(P_X)
            next_frontier.append(P_X)
frontier = next_frontier

So you only need sequential access to the queue of positions at distance D, in order to calculate D+1; that queue can safely be placed in a file.
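In runnable form the skeleton looks like this — with a stand-in move set (adjacent swaps on a 4-element arrangement, purely illustrative; the real moves would be the puzzle's tile operations):

```python
# BFS by generations: `visited` is conceptually one bit per arrangement,
# `frontier` holds the arrangements at distance D while we build D+1.
def neighbors(state):
    # Stand-in moves: swap any two adjacent entries.
    for i in range(len(state) - 1):
        s = list(state)
        s[i], s[i + 1] = s[i + 1], s[i]
        yield tuple(s)

start = (0, 1, 2, 3)
visited = {start}
frontier = [start]
distance = 0
while frontier:
    nxt = []
    for X in frontier:
        for PX in neighbors(X):
            if PX not in visited:
                visited.add(PX)
                nxt.append(PX)
    frontier = nxt
    if frontier:
        distance += 1

print(len(visited), distance)   # 24 arrangements, farthest at distance 6
```

With adjacent swaps the distance of a permutation is its inversion count, so the farthest arrangement (3, 2, 1, 0) sits at distance 6 — a handy sanity check on the loop.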

9 billion bits is just over a gigabyte, thus a "streaming" algorithm where frontier/next are files will scale at least that far. And could even be written in pure Python! The files for each distance can be stored as a list of numbers representing the positions (no more than 64 GB for this size problem) or we can include some additional metadata if we want to remember the actual word.

How much further could we go using the same basic algorithm? Well, 49C25 bits is about 7.2 terabytes, so in-memory on a single computer is out. But we could certainly use an array of flash drives to store the visited set. (A Tintri T540 has 2.4 TB of raw flash, but you could get 1 TB SSDs if you really want them.) Assuming sufficient parallelism to keep the drives busy, >=100,000 IOPS is feasible, but if each position required even a single I/O we're talking 20 years. (We might be somewhat smarter about I/Os, but on the other hand the brute-force approach will visit the same position many times. Even a million IOPS is still 2 years.)

On the other hand, 7.2TB of memory is only about 31 Amazon EC2 cr1.8xlarge instances, so storing a distributed hash table would be feasible and would solve the problem in a much shorter period of time. You would do this in a sort of map/reduce style, distributing the multi-TB list of positions in generation D to each node and having each independently produce its slice of generation D+1. Each node would categorize each P.X as "visited", "not visited", or "some other node's problem", and then the "not visited" sets would be merged to form the list for the next iteration. Of course, you'd need at least half a petabyte to store the results, which doesn't come cheaply either. But you could throw money at the problem and it would still be doable within a reasonable amount of time. We're talking about 4 days just to stream all the data at 10Gbps, though, so "reasonable" is probably at least a couple of weeks.

So what about 8x8? Well, now the hash table itself is 200 petabytes. Even SETI-at-home style resources aren't adequate for this, as you'd need almost 900,000 computers with 244GB of memory to implement the straightforward approach. 200 petabytes is about as large as a storage cluster gets today, or larger, I think--- most examples I can find are no more than half that. (Internet Archive is at about 10 petabytes, Megaupload held about 30 petabytes, Facebook's Hadoop cluster is 100PB physical.) Instead--- assuming you can build out a multimillion-dollar exabyte storage solution for the intermediate steps of the solution--- you'd need to find ways to make CPU-vs-space tradeoffs. The algorithm would have to change at this point, given today's technology, and would probably consume a nontrivial portion of the Internet's communication capacity. (Partitioning a second time, by giving each computer only part of D and part of the visited set, doesn't scale, because you'd require N^2 passes to make sure you found all the previously visited positions in D+1.)

I asked earlier this year whether it's still true that computers can get just one "size" larger than humans--- in this example, I think computers can get 3 sizes larger, assuming the 4x4 grid is human-tractable and the 5x5 is not.

Saturday, August 17th, 2013

Some preliminary goofing around on tile games

My vacation project has been putting together some Python code to systematically study various tile-permuting operations found in match-3 or sliding-puzzle games. What I'm interested in is: given a particular goal on a square grid (like making a group of N tiles, or arriving at one of a set of target arrangements), how many moves does it take to reach the goal? Are certain types of moves better suited to particular types of goals?


Tuesday, July 9th, 2013

Random Matrix Algebra

I've been getting familiar with the Sage environment. Here's a nontrivial result (but I dunno how interesting it is) about eigenvalues.

Take the finite field GF(3), and look at all its 3x3 matrices. How many of these 3^9 = 19683 matrices' characteristic polynomials have all three roots? Or no roots? Brute-force search reveals:

roots 	count
() 	3456
(0) 	2106
(1) 	2106
(2) 	2106
(0,0,0) 	729
(0,0,1) 	1054
(0,0,2) 	1053
(0,1,1) 	1052
(0,1,2) 	1404
(0,2,2) 	1053
(1,1,1) 	729
(1,1,2) 	1053
(1,2,2) 	1053
(2,2,2) 	729
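Outside of Sage, the census can be reproduced with a short pure-Python sketch (the charpoly coefficients come from the standard trace / principal-minors / determinant expansion, and roots are peeled off by synthetic division):

```python
from itertools import product
from collections import Counter

q = 3

def charpoly(A):
    # det(xI - A) = x^3 - tr(A) x^2 + (sum of principal 2x2 minors) x - det(A),
    # reduced mod q; coefficients returned for x^3, x^2, x^1, x^0.
    (a, b, c), (d, e, f), (g, h, i) = A
    tr = a + e + i
    minors = (a*e - b*d) + (a*i - c*g) + (e*i - f*h)
    det = a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)
    return [1, -tr % q, minors % q, -det % q]

def roots_with_multiplicity(p):
    roots = []
    while len(p) > 1:
        for r in range(q):
            if sum(co * pow(r, len(p) - 1 - i, q) for i, co in enumerate(p)) % q == 0:
                quot = [p[0]]                    # divide p by (x - r)
                for co in p[1:-1]:
                    quot.append((co + r * quot[-1]) % q)
                p = quot
                roots.append(r)
                break
        else:
            break                               # no root in GF(q) remains
    return tuple(sorted(roots))

counts = Counter()
for entries in product(range(q), repeat=9):
    A = (entries[0:3], entries[3:6], entries[6:9])
    counts[roots_with_multiplicity(charpoly(A))] += 1

print(counts[()], counts[(0, 0, 0)], counts[(0, 1, 2)])
```

A couple of the cells can be cross-checked by hand: the (0,0,0) count is the number of nilpotent matrices, q^(n^2-n) = 3^6 = 729, and the no-root count 3456 is 8 irreducible cubics times |GL_3(3)|/26 = 432 matrices each.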

No cubic over GF(3) has exactly two roots, for the same reason that real cubics don't have exactly two real roots, I think: the elements of the field extension in which the two extra roots live must cancel out. But over GF(3) you've also got degree-three extensions, so zero-root polynomials exist as well.

What's potentially interesting here is the symmetry breaking for (0, 0, 1) and (0, 1, 1). Those cases come in at +1 and -1, respectively, from the 1053 you would "expect" by comparison with (0, 0, 2), (0, 2, 2), (1, 1, 2), and (1, 2, 2). The single-root cases are perfectly symmetric, by contrast. (Since A -> -A negates every root, the count for roots (0, 0, 1) ought to exactly equal the count for (0, 0, 2), so the off-by-one is suspicious and worth re-checking.)

Anyway, since these roots are the eigenvalues it turns out that just slightly over 50% of the 3x3 matrices in GF(3) have the full set of eigenvalues (and thus might be diagonalizable, although probably a lot are defective.)

Expanding to 4x4 matrices (3^16, about 43 million of them) or to GF(5) (5^9, about 2 million) would require a lot more processing time, so I'd like to make the computation smarter in some way.

Sunday, June 16th, 2013

Mathy thing I learned today

V. K. Leont’ev, Roots of random polynomials over a finite field, 2006 (payment required but you can see the first couple pages.)

As the size q of a finite field F_q grows larger, the number of roots of a random polynomial from F_q[x] asymptotically becomes a Poisson distribution with mean value 1. That is, for large finite fields, a random polynomial can be expected to have just one root within the finite field.

So if random matrices give rise to uniformly distributed characteristic polynomials (which I'm not at all sure is the case) then we can expect random matrices over finite fields to have just one eigenvalue, on average.
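The mean-1 part is easy to check empirically (a sketch, not the paper's method; q = 101 and degree 5 are arbitrary choices here — for any fixed a, a uniformly random polynomial vanishes at a with probability 1/q, so the expected root count is exactly 1):

```python
import random

random.seed(1)
q, degree, trials = 101, 5, 4000

def count_roots(coeffs, q):
    # coeffs[i] is the coefficient of x^i; count roots a in GF(q)
    roots = 0
    for a in range(q):
        val = 0
        for c in reversed(coeffs):   # Horner evaluation mod q
            val = (val * a + c) % q
        if val == 0:
            roots += 1
    return roots

mean = sum(
    count_roots([random.randrange(q) for _ in range(degree + 1)], q)
    for _ in range(trials)
) / trials
print(mean)   # close to 1
```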

Sunday, May 19th, 2013

Classic Math Geek Sunday

...will not become a regular feature, unlike timprov's classic photos.

But I did get a question from a high school student about a calculation I'd done on the game Tonk, five years ago on 2+2:

I didn't mention the technique in the forum post, but I used a generating function to count the possibilities. If you multiply (1 + y*x^k) together for each value "k" in the deck (as a multiset, not a set), you get a polynomial whose coefficient on y^i x^j is the number of ways to make a "j"-point hand with "i" cards.

Back in 2008 there wasn't a readily available symbolic manipulation tool on the web, so I hacked some Python together to do the math. Today there's Wolfram Alpha, whose limited-duration (free) computation is powerful enough to give:

simplify | Coefficient[(1+y x)^4 (1+y x^2)^4 (1+y x^3)^4 (1+y x^4)^4 (1+y x^5)^4 (1+y x^6)^4 (1+y x^7)^4 (1+y x^8)^4 (1+y x^9)^4 (1+y x^10)^16, y, 5] as

4368 x^50+7280 x^49+10640 x^48+16720 x^47+22496 x^46+31216 x^45+40436 x^44+52556 x^43+65532 x^42+82176 x^41+92548 x^40+105176 x^39+116832 x^38+127484 x^37+136344 x^36+143676 x^35+147784 x^34+149268 x^33+146936 x^32+140224 x^31+134052 x^30+125188 x^29+115520 x^28+103808 x^27+92416 x^26+79416 x^25+67600 x^24+55712 x^23+45584 x^22+36708 x^21+28948 x^20+22144 x^19+16520 x^18+11988 x^17+8344 x^16+5724 x^15+3784 x^14+2492 x^13+1552 x^12+920 x^11+484 x^10+240 x^9+92 x^8+28 x^7+4 x^6

which (fortunately) seem to be the numbers I came up with in October 2008.
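The same generating-function trick also fits in a few lines of present-day Python — a sketch using a dict from (cards, points) to counts rather than a symbolic package:

```python
from collections import defaultdict
from math import comb

# Tonk deck by point value: four each of 1..9, sixteen 10-point cards (10/J/Q/K).
deck = [k for k in range(1, 10) for _ in range(4)] + [10] * 16

# poly[(i, j)] = coefficient of y^i x^j = ways to make a j-point hand of i cards.
poly = defaultdict(int)
poly[(0, 0)] = 1
for k in deck:                        # multiply by (1 + y x^k)
    nxt = defaultdict(int, poly)
    for (i, j), ways in poly.items():
        nxt[(i + 1, j + k)] += ways
    poly = nxt

five_card = {j: ways for (i, j), ways in poly.items() if i == 5}
print(five_card[50])   # 4368 = C(16,5), matching the leading term above
```

The coefficients also have to sum to C(52,5) = 2,598,960 total five-card hands, which is a cheap sanity check.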

I wonder how well Wolfram Alpha's freemium model is working for them. :)

Monday, May 13th, 2013

Late-night group theory

Nearly every undergrad who takes abstract algebra learns the proof that the 15-puzzle cannot be solved if two of the tiles are swapped. That's because the permutations generated by the sliding motions form a proper subgroup of the full symmetric group (S_n), called the alternating group (A_n). The alternating group contains the permutations that can be expressed as an even number of transpositions (called, unsurprisingly, "even permutations"), and a single-tile swap is an odd permutation.

So what if the motions are instead wraparound "slides"? That is, take an entire row or column and move it as many squares as you want, with tiles wrapping around from left to right or top to bottom. Does that set of moves produce the entire permutation group? Or is it the alternating group, or some other subgroup?

For odd-sized grids, the slide moves produce just even permutations. For example, sliding the top row on a 3x3 puzzle like this:
 3  1  2
 4  5  6
 7  8  9

is the cycle (1 2 3) = (1 3)(1 2), which is even.
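The parity claims are quick to check mechanically: build the row-slide as a permutation and compute its sign from the cycle decomposition (slide_row_perm and sign are ad-hoc helpers, not from the post's code):

```python
def slide_row_perm(n):
    # Permutation (as a list p with i -> p[i]) for sliding the top row of an
    # n x n grid right by one, wrapping; all other positions fixed.
    p = list(range(n * n))
    p[:n] = [p[n - 1]] + p[:n - 1]
    return p

def sign(p):
    # Parity via cycle decomposition: sign = (-1)^(elements - cycles).
    seen, cycles = [False] * len(p), 0
    for i in range(len(p)):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = p[j]
    return (-1) ** (len(p) - cycles)

print(sign(slide_row_perm(3)), sign(slide_row_perm(4)))   # 1 -1
```

A row slide on an n-wide grid is an n-cycle, so its sign is (-1)^(n-1): even for odd n, odd for even n.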

For even-sized grids it seems possible to generate a single transposition--- say, (1 2)--- since the slides are odd permutations, but how to prove it?

Well, one way is brute force. For a 4x4 grid it might be just feasible to find the shortest "word" (combination of generating moves) which generates a particular transposition, although the search space is of size 16! =~ 2*10^13. I thought for a bit, and bounded depth-first search seemed better than BFS or heuristic search because the queue size could become very large. But the number of words grows very quickly, even with two pruning heuristics at work.

The 8 basic moves on a 4x4 grid are H1, H2, H3, H4, V1, V2, V3, V4, where Hy slides the y'th row right by one, and Vx slides the x'th column down by one. The reverse moves can be generated by cycling around, i.e., H1^-1 = H1.H1.H1. So there are two easy ways to prune the DFS tree. The first is that we should never use the same move more than three times in a row. The second is that all the H's commute with each other (and so do all the V's), so we shouldn't explore both H3.H2.H1 and H1.H2.H3 since they are the same.

Depth   # of words    (w/o pruning commutative pairs)
4            2443          4673
6           95883        298057
8         3761907
10      147594811
12     5790738743

So much for brute force--- we didn't find (1 2) in this search, and going to 14 would take more than overnight.

But, that undergraduate algebra class probably talked about commutators as well, moves of the form y^-1.x^-1.y.x. In solving a Rubik's cube these moves tend to swap just a subset of the cubes, making it easier to find a move which fixes a portion of the cube while leaving the rest alone. The same is true here. (Note that composition of group operators is read from right to left.)

A useful commutator, X = V2^-1 . H1^-1 . V2 . H1 (first row right, second column down, first row left, second column up) introduces a 3-cycle among 1, 2, and 14:
*14 *1  3  4
  5  6  7  8
  9 10 11 12
 13 *2 15 16

So V2.X.X produces the desired swap of 1 and 2, but has messed up the column:
[2   1] 3  4
 5 *14  7  8
 9  *6 11 12
13 *10 15 16

But, another three-cycle can be used to correct this. Let Z = V2^-1 . H3^-1 . V2 . H3, which looks like this:
 1   2  3  4
 5 *10  7  8
*6  *9 11 12
13  14 15 16

We just need to move the '10' into position, perform the cycle, and then back out again. That means something of the form Y^-1.Z.Y, where Y = V1^-1 . H4^-1. Here's Y alone:
 5  2  3  4
 9  6  7  8
14 10 11 12
 1 15 16 13

which screws up a bunch of stuff, but here's Y^-1.Z.Y:
 1   2  3  4
 5 *10  7  8
 9 *14 11 12
13  *6 15 16

which is a nice well-behaved three-cycle, exactly the one we want to patch up V2.X.X. Putting it all together, Y^-1.Z.Y.V2.X.X =
 2  1  3  4
 5  6  7  8
 9 10 11 12
13 14 15 16

or in word form: H4.V1.V2.V2.V2.H3.H3.H3.V2.H3.V1.V1.V1.H4.H4.H4.V2.V2.V2.V2.H1.H1.H1.V2.H1.V2.V2.V2.H1.H1.H1.V2.H1. There might be a shorter word for the same permutation, but at 33 moves we were not likely to find this one by brute force any time soon.
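The word is easy to check by machine. A small sketch (grid stored row-major as a list of 16 tiles; note that the word is applied right to left, per the composition convention above):

```python
def H(grid, y):
    # Slide row y (1-based) right by one, with wraparound.
    g = list(grid)
    r = 4 * (y - 1)
    g[r:r + 4] = [g[r + 3]] + g[r:r + 3]
    return g

def V(grid, x):
    # Slide column x (1-based) down by one, with wraparound.
    g = list(grid)
    col = [g[(x - 1) + 4 * i] for i in range(4)]
    col = [col[3]] + col[:3]
    for i in range(4):
        g[(x - 1) + 4 * i] = col[i]
    return g

word = ("H4.V1.V2.V2.V2.H3.H3.H3.V2.H3.V1.V1.V1.H4.H4.H4"
        ".V2.V2.V2.V2.H1.H1.H1.V2.H1.V2.V2.V2.H1.H1.H1.V2.H1")
grid = list(range(1, 17))
for move in reversed(word.split(".")):   # rightmost move is applied first
    grid = H(grid, int(move[1])) if move[0] == "H" else V(grid, int(move[1]))
print(grid[:4])   # [2, 1, 3, 4] -- tiles 1 and 2 swapped, everything else fixed
```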

But, this is sufficient to show that the slide moves generate the whole symmetric group S_16, since we can generate any other desired transposition in the same way, and the transpositions are generators for S_16.

How could we encode the directed search I performed by hand, into a computer algorithm for finding efficient (but not necessarily minimal) sets of moves for a given permutation?

Tuesday, May 7th, 2013

Towards a Taxonomy of Match-3 Games

Field: square lattice, hexagonal lattice, freeform, 1-d

Moves: slide (15-puzzle style), rotate around lattice "great circle", rotate around interior point, swap, drop (singles or groups, with restricted placement at edges), place (arbitrary free location)

Matching result: promote/build (objects are combined to a next-tier object), clear

Minimum match size: 3, 4, 5+

Clear:
  Slide: (none found)
  Lattice rotation: Puzzle Pirates "Navigation" (cylindrical topology), PopCap Chuzzle
  Point rotation: Puzzle Pirates "Foraging", KadoKado "Binary" (match-4)
  Swap: PopCap Bejeweled, Puzzle Quest, Puzzle Pirates "Bilging"
  Drop: PopCap Zuma (1-d, single unit), KadoKado "Paradice" (multiple units in parallel, match-4), KadoKado "Drakhan" (single unit, freeform), Puzzle Pirates "Sailing" (two-unit), Dr. Mario (two-unit, match-4)
  Place: PopCap Alchemy (match-6)

Promote:
  Slide: Alien Hive
  Drop: Naturalchimie (2-unit drops)
  Place: Spryfox Triple Town

I am sure I have seen a "clear"ing game with the slide-to-empty-space mechanic, but I cannot come up with it. "Alchemy" is something of a cheat.

I am less sure that the "promote" row can be filled in. So many games use the swap mechanic that I feel like there must be a "build+swap" game but I can't think of one.

One obvious variation is to perform rotations around the center on a square grid. I.e., the center rotation would move 4 squares, the ring outside that would move 12 squares, and the next ring 20 squares (each ring adds 8).


Saturday, April 6th, 2013

MinneBar 8

MinneBar 8 has come and gone. I had a good time, as usual, and touched base with a lot of friends who were attending.

"All the Monitoring" by Bryan Brandau from Best Buy: good overview of monitoring tools (other than Nagios). Showed Sensu, Graphite, Logstash. What the heck is up with all the different open-source dashboard projects, though? He mentioned at least five by name.

"Agile Financial Modelling" by the Investyr guys. A little weak, their main point was not to spend a bunch of time on useless stuff--- put a quick spreadsheet together spending 80% of your time on year 1, 15% on year 2, don't bother beyond year 3. I wish they had done more with showing what sort of scenarios you could play with once you had a spreadsheet in place, instead they actually built it on the fly.

"Do 'Data Science'" by a bunch of folks from NativeX. I walked out of this one--- should have gone to learn about Julia or attend the lightning talks instead, but both rooms were packed. The presentation was dull and process-oriented to start, and included an explanation of Bayes' Theorem that was tedious to those who already knew it and probably confusing to those who didn't. Really would have been a ton better if they'd started with "here's what we did for the company, want to learn how we did it" rather than some bogus "we follow the scientific method unlike you uncultured barbariansengineering types" . (I mean, they quoted this PhD comic but failed to notice that real science is more like the joke version--- that's the funny.)

Technical Support for Developers panel by Chris Warren, Luke Francl, and Kevin Whinnery. This was a lot of fun. The support they provide is mainly API support rather than purely end-user but they had good stories comparing different approaches (dedicated team vs. development team) and the value of direct customer feedback.

Shopping for Nerds by Neal Tovsen. This was a brief talk on how to weigh the risks of different technical "talent": outsourcing, freelancers, co-founder, etc. I think Neal accurately described the trade-offs, and his point that you need to make sure somebody is "looking out for you" is well taken. But the session could have been stronger with more examples to help illustrate what has worked and what hasn't--- we mainly got some (usually anonymized) horror stories.

I asked whether anybody in the room had actually been successful at throwing out their outsourced/cobbled-together MVP and building the "real" product afterwards. Nate Yourchuck said he's done a couple of consulting projects where he built a real product to replace emailed-around Excel spreadsheets. I remain dubious that you can really "build one to throw away" (pace Fred Brooks.)

I got to hear Neal give his pitch for "Apruve" several times today and yesterday, and I think there is quite a bit of potential there.

Journey to the Bottom of the Storage Stack by me. I thought it went pretty well, but I suspect I spoke too quickly. Also I kept wandering too close to another live microphone and causing feedback. My friend Scott brought a friend from Isilon along to keep me honest. That gentleman suggested an alternative explanation for an anomaly I'd found, of hot data blocks pointed to by cold metadata: that frequently-written metadata blocks might never be flushed from the journal. I'm dubious; I don't know whether ext4 actually implements that optimization.

Slides will go up on the Minnestar wiki within the next few days, hopefully.

Scaling with Cassandra by Jeff Bollinger + another NativeX (nee W3i) guy. Talked about why they chose to move from SQL Server to Cassandra, and about Cassandra's data and consistency model.

The numbers they gave for their production system: 12 nodes (each running 12 cores @ 2GHz) with 2x480GB commodity Intel flash drives. Per node they see about 240 writes/sec peak, and 888 reads/sec. (These numbers seemed low; I think they are observations rather than benchmarks.) They see about 3ms writes, 4ms reads (end-to-end), which is not too bad considering the extra network hop, but not blazing fast--- their SQL Server achieves 1.5ms read latency, though most of its data now fits in memory after the bulk data moved to Cassandra. Their solution is hybrid now: they kept SQL for the data that really was relational, and talked about the importance of understanding data-movement characteristics while doing the design. Jeff also mentioned that it took them a few tries to get the hardware right, so in retrospect they wished they'd done more prototyping and experimentation using cloud resources instead of in-house boxes.

I always feel like I should stick around longer during the happy hour, but as usual I had had about all the monkeys I could take, and even on the edges of the large cafeteria area the conversational noise is quite high.

Friday, January 18th, 2013

Neighborly Sequences

In an arithmetic sequence --- for example, 1, 4, 7, 10, 13, 16, 19 ... --- each number is the arithmetic mean of its neighbors. (In fact, it is the arithmetic mean of all of its N predecessors and N successors.)

In a geometric sequence (1, 4, 16, 64, 256, ...) each number is the geometric mean of its neighbors. And again this can be expanded to the N closest neighbors.

What other "aggregate" functions can produce sequences of this sort? For example, "max()" and "min()" produce only the constant sequences.

The "mex" function (minimum excluded number) produces a cyclic sequence, but cannot be expanded from one neighbor to multiple neighbors. For example,

mex( 0 ) = 1
mex( 0, 1 ) = 2
mex( 1, 2 ) = 0

produces the sequence 0, 1, 2, 0, 1, 2, .... Taking two neighbors produces the same sequence, a promising start!

mex( 0, 1, 0, 1 ) = 2
mex( 1, 2, 1, 2 ) = 0
mex( 2, 0, 2, 0 ) = 1

But mex( 0, 1, 2, 1, 2, 0 ) = 3. In fact I think we could show that 0, 1, 2 repeating and 0, 2, 1 repeating are the only possible mex-based sequences.
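These claims are easy to machine-check (a sketch; Python sets stand in for the multisets above, which is fine because mex ignores multiplicity):

```python
def mex(values):
    # Minimum non-negative integer not in `values`.
    m = 0
    while m in values:
        m += 1
    return m

seq = [0, 1, 2] * 4      # the repeating sequence 0, 1, 2, 0, 1, 2, ...

# One neighbor on each side:
ok1 = all(seq[n] == mex({seq[n - 1], seq[n + 1]}) for n in range(1, 11))
# Two neighbors on each side still works:
ok2 = all(seq[n] == mex({seq[n - 2], seq[n - 1], seq[n + 1], seq[n + 2]})
          for n in range(2, 10))
print(ok1, ok2, mex({0, 1, 2}))   # three neighbors per side see {0,1,2}, so mex is 3
```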

How about 'median'? Or '90th percentile'? Median works fine for any increasing sequence, if you include the number itself.

In general, what conditions on F: 2^N -> N are necessary and sufficient so that a sequence s_0, s_1, s_2, s_3, ... exists with F( s_{n-k}, s_{n-k+1}, ..., s_n, s_{n+1}, ..., s_{n+k} ) = s_n? What if we require the sequence to be nonrepeating?

Saturday, October 20th, 2012

Rapture Denied

I finished Rapture of the Nerds today, about which Marissa has more to say in her review. Their future is one in which the inner planets (except Earth, although the fate of Luna was not mentioned) have been disassembled to build orbiting computational substrates, which form a cloud around the sun in order to harvest its energy. Uploaded humans live at speeds greater than real-time.

I felt like the issue of lightspeed delay was mentioned but not really handled. Let's crunch some numbers. Or at least as many numbers as don't require me to perform calculus.

Let's assume the asteroid belt is included but no outer planets (which are mentioned as future work.) It doesn't make sense to disassemble Venus until you've scooped up all the small planetesimals first! So we're talking a radius of about R = 3.27 AU = 4.9*10^11 meters. Although the past-rapture nerds could limit themselves to in-plane orbits, this is wasteful of sunlight, so we should assume a sphere.

The speed of light is 3*10^8 m/s, so the maximum volume within which you can hold a conversation at 10ms round-trip latency has radius (5 ms * c) = a mere 1500 km. Now, 10ms latency is way less than humans need. But if you're living at 100x speedup then that's a perceptual second.

Suppose you want "common knowledge" (say, a fully-functional simulation of Earth) to be accessible at no more than a 1-second perceptual delay. How many times would a particular record need to be replicated within the cloud? R^3 / (1500 km)^3 gives us about 3.5*10^16 copies. This is the first hint that the computational infrastructure doesn't scale well.

(We can improve matters by sticking to the surface of a sphere rather than distributing nodes throughout its volume, but that makes the average case delay for uncommon information worse!)

Of course, if there are only say 10^12 (a trillion) posthuman entities floating around, they can just take the data with them wherever they happen to be. But what about *uncommon* knowledge, like the consciousness of a particular individual?

Let's imagine the best possible case in which you're tucked neatly within the orbit of Mercury. By sending a signal a distance of D you can reach D^3/R^3 of the total storage capacity (at latency 2D/c). If data is uniformly distributed, then the probability of finding a unique datum within a given time limit is as follows:

probability 	subjective delay at 100x speedup
1% 	70,000 seconds (~19 hours)
2% 	24.6 hours
5% 	33 hours
10% 	42 hours
20% 	53 hours
50% 	72 hours
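The table falls out of simple geometry; a sketch using the constants above (R = 3.27 AU, 100x subjective speedup):

```python
# Latency to reach a fraction p of the cloud's storage, in perceptual hours.
AU = 1.496e11            # meters
R = 3.27 * AU            # radius of the computational cloud
c = 3.0e8                # speed of light, m/s
speedup = 100

def subjective_hours(p):
    # A signal reaching fraction p of the cloud's volume travels
    # D = R * p**(1/3); round trip is 2D/c, scaled by the subjective speedup.
    D = R * p ** (1 / 3)
    return speedup * 2 * D / c / 3600

for p in (0.01, 0.02, 0.05, 0.10, 0.20, 0.50):
    print(f"{p:.0%}\t{subjective_hours(p):.1f} hours")
```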

In other words, the Singularity is not so much a universal jukebox as a universal interlibrary loan: you can find anything you want, but it'll probably take two days to get it. It seems a better bet for a Solar System-spanning electronic culture to run at slower than realtime.
