
Saturday, September 24th, 2016

Roguelike Celebration 2/3

Angband by Erik Osheim and Robert Au, two of the current dev team. This was an interesting insight into being a maintainer of a codebase not originally your own. In fact, the very first Angband authors passed it on almost immediately--- because they were graduating from college. This system broke down around 2005, when the transition from one benevolent dictator to another failed. That led to the creation of an official dev team (and relicensing under the GPLv2) with a more collaborative approach and better coverage for succession issues.

But the question of "what is Vanilla Angband" hasn't been answered! The sense I got is that this is something the dev team still struggles with--- what is their job? Some of their changes were obvious: prune the ports (old platforms that are no longer testable), cut down on RSI-inducing interactions like targeting missile attacks, and support UTF-8. But Angband circa 2005 was undergoing a revolution in play style: rapidly diving to lower levels. This increased the danger to the player--- but it also provided better rewards, and trained the player in the avoidance techniques that are crucial to the endgame.

They decided to embrace this strategy and made some gameplay changes to support and encourage it. One is "forced descent," to prevent the player from farming easy levels. Another is selling more staples in town while removing the player's ability to sell items back. (They also removed Charisma and haggling, but that is more in the "obvious cleanup" category than a "rebalance the game" change.)

They briefly mentioned some simulations they did, but didn't go into detail--- this would be interesting to look into. They also said that it's hard to tell whether players are actually happier with the game, given that lots of players never post on the forums to complain. :) I wonder how many roguelike games would benefit from "phone-home" functionality that tells the dev team what players are actually spending their time on.

What didn't get fixed: documentation and testing, like any other open-source project. A Google search still turns up a guide from 1996 as the top result.

"The Infinite Dungeon" (originally titled "Difficulty in Roguelike Games" in the program) by John Harris. This was one of the most thought-provoking talks to me. John asks why, in a golden age of roguelikes, does he have difficulty building enthusiasm for some of the new offerings?

He goes back to some of the first procedurally-generated content. TSR's Strategic Review, Volume 1, Issue 1, has an article on randomly generating dungeons in D&D. But this sort of generation is not really interesting. It's ultimately just a linear sequence of challenges. It's literally gambling--- stop now or go on? Because the structure of the environment you've seen so far has no bearing whatsoever on what comes next. (If you back up and try a branch you passed over before, the next room will be the same as if you had stayed where you were.) Rogue does better at giving an actual sense of place.

Contrast those, however, with an early D&D adventure like "Village of Hommlet" that has layered depth and backstory (and surprises!). I sometimes think the temptation is always to add too much--- I always wanted everything to tie together in my Shadowrun campaigns, but the real world does have things that are more or less as they seem. But the point John made is that this backstory and design is something players can exploit. They could figure out that the monsters are likely to have a midden, and go looking for it. How could we do the same in roguelike games?

He proposed a trichotomy: knowledge, logic, and wit. (Though he doesn't much like "wit" for the third category.)

1. There are things about a game that you could look up in a FAQ or guide--- things that can be written down. Like, here are all the possible potions in Nethack.

2. There are inferences you can make from that knowledge: if I know X, then Y must be true too. If I've fought a tiger, the other room must have a lady. If there is only one unknown potion, it must be Z.

3. Then there are things that constitute "game playing ability" or the ability to discern intent behind a game. It's the application of common sense or intuition or pattern-finding abilities to things not yet encoded in guides.

How can we make games where we get to exercise our "wit"?

(Brian Walker, whose talk I'll cover in the next entry, had some thoughts about how story emerges from place/space/environment in Brogue. But each level in Brogue still feels random, in that there's no geological reason why there's a pool of water one place and not another--- or maybe I just haven't figured it out yet.)

Dwarf Fortress Design Inspirations by Zach and Tarn Adams. This was basically a love letter to their wasted childhoods. :) But more than that, it was a fascinating look at how they put the things they loved in games into their own game. Even their earliest games (text-only scrolling narratives) incorporated the elements that they knew how to do at the time, like remembering what killed you.

Permadeath in roguelikes is often coupled to a high score list. They view Dwarf Fortress's legends mode as an evolution of that same high score list--- recording what you did for later use or comparison. The Hack "bones file" was a revelation for them and is translated almost literally into their initial vision for Dwarf Fortress: build a fort, then explore its ruins.

They also talked about games they have not yet been able to incorporate. The game Ragnarok features climactic final battles among the gods. They'd like Dwarf Fortress to have the same sort of arc where great powers arise and ultimately shape the world in large-scale ways. This seems like something that might also edge into John Harris's "wit" --- if you know the shape of the story, you can exploit it.

Friday, September 23rd, 2016

Roguelike Celebration notes 1/N

Last weekend I attended the Roguelike Celebration (roguelike.club) at the Eventbrite headquarters. All the recordings are available online. Each session was short, only a half hour, but the conference did have to split into two tracks. Here are my notes from the ones I attended in person.

A Love Letter to Hack by George Moromisato, author of Transcendence, a space exploration and shoot-'em-up game. He talked about the ways he was inspired by Nethack and tried to apply the same lessons to his game. So what's so awesome about Nethack? Despite its crude ASCII graphics, it came out in 1987, around the same time as Ultima V.

He contrasts "graphical immersion" with "interactive immersion", "quality" vs "variety" and "experience" vs "mastery." The last axis is sometimes talked about as "replayability", but the distinction he's drawing is more about whether you are continuously learning the game, or just going back for another helping.

One of the key features he identified was the "illusion of winnability." Even though you almost always lose, there's a sense that "if only I had done X". Nethackers label this "Yet Another Stupid Death." But the combinatorial explosion of possibilities makes it hard to win even an "open book test" like Nethack where you have unlimited time to make up your mind. This combinatorial explosion and interaction was a common theme at the conference.

Because George is writing a space game, he found it hard to take advantage of another of Nethack's strengths, which is relying upon the player's knowledge of the world. Yes, Nethack is a fantasy game and has its own set of conventions. But it also has monsters that are familiar to fantasy fans, and interactions like "getting things wet" behave like they do in the real world. In a space game, this is harder, because the interactions between different technologies don't have the same sort of standard vocabulary.

A third feature he characterized as "quantity is quality." Interacting with hundreds of monsters is its own good experience, even if each of them is just an ASCII letter and a small set of behaviors. It's the combinatorial explosion of interacting attributes that makes this variety possible. Roguelike developers provide the most benefit when they add "one more experience" to the game rather than higher-quality content. He characterized this as "building a city, not a resort."

"Accessibility in Roguelike Games" by Alexei Pepers talked about a project to make Nethack more accessible to visually impaired players. (Believe it or not, some do play it by using screen readers to pronounce the punctuation characters on the screen!) Her main idea was new commands that describe your surroundings in text, including exact offsets in terms of the relative number of tiles. Backtracking and mapping are also hard, so they added some shortcuts for navigation ("head back to the staircase.")

Alexei characterized three main lessons learned: (1) There is no substitute for feedback from visually impaired users. They did some of their experiments with sighted players with the map turned off, but that population isn't expert in screen-reading software! (2) Give users options on how exactly to get info, for example NSEW vs up/down/left/right. (3) The complexity of games leads to a lot of special cases. For example, a common tactic in Nethack is stealing from shops by training your pet to pick up items and put them down in the store entrance where you can grab them. A large store may have tons of items, and having screen-reading software describe them all every clock tick, just so the player can figure out what their dog is doing, is a complete pain.

Corridors in Nethack, and some of the lower levels, are very difficult. Conveying precise spatial information in text is hard while maintaining the sense of immersion that makes the game fun. (One idea that I'd like to see is building a tactile interface for showing the shapes of corridors--- but I have neither the time nor the skills to put one together.)

Concrete Poems by Nick Montfort (a poet.) There was a lengthy opening digression on why a tweet is limited to 140 characters, as an example of the "history of material texts" that Nick is interested in. But this left frustratingly little time to talk about roguelikes! (It's an entertaining enough exposition, so you should watch it.) Why were computer terminals a grid? Because of typewriters, which need to advance by the same amount on each letter. Previous typography was, sensibly, proportional.

But what really got my attention was his discussion of "concrete poetry"; some of his examples were stunning. This is poetry that uses the typewriter, but in a "dirty" way: moving the paper around, masking letters, changing the ribbon, etc. He presented this as an early exploration of how artists made "space" out of ASCII.

His challenge to roguelike developers: what can we do to make something as visually stunning as the examples he showed, like Steve McCaffery's "Carnival"?

(I have notes on several more sessions which I'll post later, but each session had a lot of depth so my writeups tend to be long.)

A micro-study of NFS operation sizes

Over at the Tintri blog, I wrote a study of NFS operation sizes in Tintri's customer base. I looked at averages weighted by operation and by byte, and tried to apply K-means clustering to the data to discover patterns of usage.

Here's a visualization I didn't include in that blog entry. It shows a timeline for several production VMstores, with each time point colored based on the characteristics of its read sizes. Many VMstores have a stable workload, others less so. (White areas are missing data.)


Tuesday, January 26th, 2016

Legal Go Positions and Nonsingular Matrices in F_3

A Go position is legal if there is no completely surrounded group of stones in it. It can be represented as an NxN array with each entry filled in as 0 (blank), 1 (white), or 2 (black.) I don't want to use the term "dead" because a dead group is a legal position, but an already-captured one still on the board is not. Note also that while players must alternate turns, 'pass' is a legal move so for example Black could fill the board all by herself without a single White stone. (But not fill in the last square as that would kill her group.)

A matrix over the finite field F_3 is also an NxN array with each entry either 0, 1, or 2. It is singular if its determinant is zero. The nonsingular matrices form a group under multiplication (because they are invertible.)
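(For the matrix side there's a closed form: the number of nonsingular NxN matrices over [math]F_q[/math] is [math]\prod_{k=0}^{N-1} (q^N - q^k)[/math], since each row just has to avoid the span of the previous rows. With q = 3 this gives the 11,232 and 24,261,120 figures below. I don't know of a comparable closed form for legal Go positions or for the intersection, so those seem to require enumeration--- see the sketch after the counts.)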

Owen Maresh asks how these two sets compare: https://twitter.com/graveolens/status/691732255333548032

For the easily-tractable cases, here is the answer:

11,232 nonsingular 3x3 matrices over F_3
12,675 legal 3x3 Go positions
6,996 that are both

24,261,120 nonsingular 4x4 matrices over F_3
24,318,165 legal 4x4 Go positions
13,518,392 that are both
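
A minimal brute-force sketch (my own check, not Owen's code) reproduces the 3x3 numbers; the 4x4 case needs either patience or something cleverer than enumerating all 3^16 boards:

from itertools import product

N = 3

def det3_mod3(m):
    # Determinant of a 3x3 matrix, reduced mod 3; nonzero means nonsingular over F_3.
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    return (a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)) % 3

def legal(board):
    # Legal position: every maximal group of same-colored stones has a liberty.
    seen = set()
    for r, c in product(range(N), repeat=2):
        color = board[r][c]
        if color == 0 or (r, c) in seen:
            continue
        group, stack, liberty = {(r, c)}, [(r, c)], False
        while stack:
            x, y = stack.pop()
            for nx, ny in ((x+1, y), (x-1, y), (x, y+1), (x, y-1)):
                if 0 <= nx < N and 0 <= ny < N:
                    if board[nx][ny] == 0:
                        liberty = True
                    elif board[nx][ny] == color and (nx, ny) not in group:
                        group.add((nx, ny))
                        stack.append((nx, ny))
        if not liberty:
            return False
        seen |= group
    return True

nonsingular = go_legal = both = 0
for cells in product(range(3), repeat=N*N):
    board = [cells[i*N:(i+1)*N] for i in range(N)]
    ns = det3_mod3(board) != 0
    lg = legal(board)
    nonsingular += ns
    go_legal += lg
    both += ns and lg

print(nonsingular, go_legal, both)   # expect 11232, 12675, 6996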

Sunday, January 24th, 2016

Fermat's near misses

Using the technique described in my previous post, I attempted to answer this question on Quora: "What is the smallest possible difference between the sum of two perfect nth powers and a different nth power?" http://qr.ae/RO86UX

That is, what is the smallest nontrivial 'd' (in absolute terms) in the equation a^n + b^n = c^n + d?

After excluding the all-1 cases, and the cases where a = c, here are the best solutions for a,b,c < 10000:

8^3 + 6^3 = 9^3 + -1
5^4 + 5^4 = 6^4 + -46
16^5 + 13^5 = 17^5 + 12
2^6 + 2^6 = 3^6 + -601
2^7 + 2^7 = 3^7 + -1931
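
These check out with a quick loop:

# Sanity check of the near misses listed above.
for a, b, n, c in [(8, 6, 3, 9), (5, 5, 4, 6), (16, 13, 5, 17), (2, 2, 6, 3), (2, 2, 7, 3)]:
    d = a**n + b**n - c**n
    print(f"{a}^{n} + {b}^{n} = {c}^{n} + {d}")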

Here are some more computational number theory results on x^3 + y^2 = z^5: http://qr.ae/RO864s

Friday, January 22nd, 2016

Parallelizing

In 1991 Daniel Bernstein published a short paper on a computational technique for solving diophantine equations of the form p(a) + q(b) = r(c) + s(d): https://cr.yp.to/papers/sortedsums.pdf He describes how a heap can be used to visit all sums p(a) + q(b), with a,b < H, in increasing order, using only near-linear space (in H) and time nearly proportional to the number of sums. You can then set up two heaps and "increment" whichever one has the lowest minimum value, outputting any cases of equality.
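
Here's a rough Python sketch of the idea (mine, not Bernstein's code), assuming p, q, r, s are nondecreasing on the range of interest:

import heapq

def sorted_sums(p, q, H):
    # Yield (p(a) + q(b), a, b) for 0 <= a, b < H in nondecreasing order.
    # One heap entry per value of a, advancing b lazily, so the heap never
    # holds more than H entries.
    heap = [(p(a) + q(0), a, 0) for a in range(H)]
    heapq.heapify(heap)
    while heap:
        s, a, b = heapq.heappop(heap)
        yield s, a, b
        if b + 1 < H:
            heapq.heappush(heap, (p(a) + q(b + 1), a, b + 1))

def equal_sums(p, q, r, s, H):
    # Merge the two sorted streams, reporting all p(a) + q(b) == r(c) + s(d)
    # (including the trivial matches).
    left, right = sorted_sums(p, q, H), sorted_sums(r, s, H)
    lv, rv = next(left, None), next(right, None)
    while lv is not None and rv is not None:
        if lv[0] < rv[0]:
            lv = next(left, None)
        elif rv[0] < lv[0]:
            rv = next(right, None)
        else:
            # Collect the run of equal values on each side and pair them up.
            target, lrun, rrun = lv[0], [], []
            while lv is not None and lv[0] == target:
                lrun.append(lv)
                lv = next(left, None)
            while rv is not None and rv[0] == target:
                rrun.append(rv)
                rv = next(right, None)
            for _, a, b in lrun:
                for _, c, d in rrun:
                    print(f"p({a}) + q({b}) = r({c}) + s({d}) = {target}")

# Example: a^3 + b^3 = c^3 + d^3 for arguments below 20 (1729 shows up).
cube = lambda x: x ** 3
equal_sums(cube, cube, cube, cube, 20)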

Can this technique be efficiently parallelized on a SIMD machine (such as a GPU)?

A simple approach is to divide up the elements in one of the heaps. That is, instead of, say, [math]r(d) = d^4[/math], you could have a family of functions [math]r_i(d) = (192d+i)^4[/math] which partition the search space. Each thread gets a different function to work with. This reduces one of the heaps by a factor of 192, but the other is just as large. Where previously we did H^2 + H^2 heap operations, now we have to do H^2 + H(H/192) heap operations, and the operations on the smaller heap are more efficient in the worst case as well. Also, every thread needs its own copy of *both* heaps, which is somewhat memory-intensive, limiting either the size of the problem that can be tackled or the number of available threads.
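
Concretely, the partitioned family of right-hand-side functions might look something like this (just a sketch; the names are mine):

STRIDE = 192   # one slice of the search space per thread

def make_r(i, n=4):
    # r_i(d) = (STRIDE*d + i)^n: thread i only sees arguments congruent to i mod STRIDE.
    return lambda d: (STRIDE * d + i) ** n

# Thread i would run the heap merge with its own, much smaller, right-hand heap
# built from make_r(i), against its own copy of the full left-hand heap.
r_funcs = [make_r(i) for i in range(STRIDE)]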

We could also partition *all* of the inputs as long as every partition is tested against every other combination of partitions. (This is a natural optimization when some restriction on the equations can be found for some moduli, thus eliminating a broad range of possible solutions.) For example, if we divided all the inputs in half by parity, then we need to check 8 different equations. This again requires a pair of heaps per thread.

We could perhaps continue to use a shared heap for the left-hand side of the equation, at the cost of threads becoming idle when their minimum value grows too large. I don't know how to evaluate what this does to the speedup. It also introduces some serialization as the heap operation itself can't be parallelized.

A variation on the shared heap would "unroll" the left-hand heap and have a shared array of the next K values. I can't decide whether this is helpful or not.

On a side note, the availability of cheap large-capacity flash devices may make it feasible to apply techniques previously dismissed because of their space requirements.

Friday, January 1st, 2016

How I've been entertaining myself

I participated in Advent of Code, an Advent calendar of programming puzzles. Of the 25 days of puzzles, one I solved without doing any coding, another by minor scripting of an existing application, and the rest in Python. It was a fun exercise. (Sadly, I did not submit a solution quickly enough on the last day to make the leaderboard.)

For Christmas I received a NVIDIA Jetson TK1, a development board for the Tegra K1 "system-on-a-chip". Most notably, it comes with a 192-core GPU built in, making it a 5"x5" supercomputer.

I thought a good project to get more familiar with programming it would be to reimplement some of the Advent of Code problems using CUDA (Nvidia's programming API for GPU computing.) I posted my first effort on GitHub: https://github.com/mgritter/advent-of-cuda/tree/master/day4 (I haven't yet rerun my old Python solution to demonstrate the speedup!) The new code can find the solutions to the original problems in less than a second each, and solutions to the "next-hardest" version of the problem in about 9 seconds.

I also have a bunch of new games to play: Mini Metro (a fun logistics game), Concrete Jungle (deck-building + city-building), Infinifactory (factory layout for evil aliens), and Offworld Trading Company (a logistics RTS.) Yes, even my games are really all about programming. :)

Monday, November 2nd, 2015

Bones was right about the transporter

Star Trek canon records at least 11 transporter accidents:
* Splitting Kirk into good and evil halves (TOS: The Enemy Within)
* Transport to the mirror universe (TOS: Mirror, Mirror) --- not counting later deliberate action
* Transport to another universe (TOS: The Tholian Web)
* Two deaths in Star Trek: The Motion Picture
* Weyoun 5 dies in DS9: Treachery, Faith, and the Great River
* Clone of Riker (TNG: Second Chances)
* Reverting four individuals to younger versions of themselves (TNG: Rascals)
* Travel through time (DS9, Past Tense)
* Merging multiple individuals (VOY, Tuvix)
* Damaging the holographic doctor's mobile emitter (VOY, Drone)
* Rendering two crew members incorporeal (TNG, The Next Phase)

To calculate a malfunction rate, we need some estimate of how many times the transporter was used in total during this period. There are 79 TOS episodes, 178 TNG episodes, 176 DS9 episodes, and 172 VOY episodes, for 605 episodes total. If we generously estimate 4 transports per episode, that's 2,420 transports, giving a failure rate of 11/2,420, or roughly 0.45%--- unacceptably high.

Transporters appear to be able to cross planetary distances easily (no waiting for proper orbital alignment.) An estimate of 25,000 miles per transport --- that is, assuming starships are usually in high orbit --- and 6 people at a time gives about 0.36 billion passenger-miles, and thus an accident rate of around 30 per billion passenger-miles. Modern cars and light trucks have a fatality risk of 7.3 per billion passenger-miles. (However, it should be noted that many of the transporter accidents were reversible or otherwise non-fatal.)

Monday, July 20th, 2015

Fun with Bad Methodology

Here's a Bill Gross talk at TED about why startups fail. Mr. Gross identifies five factors that might influence a startup's success, scores 250 startups on those factors, and analyzes the results.

This is possibly the worst methodology I have seen in a TED talk. Let's look at just the data presented in his slides. I could not find a fuller source for Mr. Gross's claims. (In fact, in his DLD conference talk he admits that he picked just 20 examples to work with.)

I did a multivariate linear regression--- this does not appear to be the analysis he performed. This produces a coefficient for each of the factors:

Idea	 0.021841942
Team	 0.033134009
Plan	 0.047012997
Funding	-0.03324002
Timing	 0.133669929


While this analysis agrees that "Timing" is the most important, it differs from Mr. Gross on what is second. It actually says that the business plan is a better predictor of success. That's strike one--- the same data admits multiple interpretations about importance. Note also that linear regression says more funding is actively harmful.

The second methodological fault is taking these numbers at face value in the first place. One might ask: how much of the variation in the dependent variable do these factors explain? For linear regression, the adjusted R^2 value is about 0.74. That means roughly a quarter of the success is explained by factors not listed here. What checks did Mr. Gross do to validate that his identified factors were the only ones?
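
(For the curious, here's roughly how to redo this regression in Python. This is a sketch of my analysis, not Mr. Gross's, and "gross_scores.csv" is a hypothetical file you'd have to transcribe from the slides yourself.)

import numpy as np

# Columns: Idea, Team, Plan, Funding, Timing, Success (transcribed from the slides).
data = np.loadtxt("gross_scores.csv", delimiter=",", skiprows=1)
X, y = data[:, :5], data[:, 5]

# Ordinary least squares with an intercept term.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
names = ["intercept", "Idea", "Team", "Plan", "Funding", "Timing"]
print(dict(zip(names, coef)))

# Adjusted R^2: how much of the outcome the five factors actually explain.
resid = y - A @ coef
r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
n, k = X.shape
print("adjusted R^2 =", 1 - (1 - r2) * (n - 1) / (n - k - 1))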

While we're still on problems with statistical analysis, consider that the coefficients also come with error bars.

	Lower 95.0%	Upper 95.0%
Idea	-0.082270273	0.125954157
Team	-0.082753104	0.149021121
Plan	-0.049741296	0.143767289
Funding	-0.126226643	0.059746603
Timing	 0.034672747	0.232667111


The error bars are so wide that few useful statements can be made about relative ordering of importance. (Obviously this might change with 230 more data points. But it seems like Mr. Gross was operating on only 20 samples.)

Next let's transition to the data-gathering faults. Mr. Gross's sample of companies is obviously nonrandom. But it is nonrandom in a way that is particularly prone to bias. He lists startup companies that actually got big enough to attract sufficient attention that he'd heard of them! The even mix of success and failure when he's looking to explain the high rate of failure should be a huge red flag.

Suppose we add 15 more failed companies to the list, with an average timing score of 7 (slightly better than the sample) but an average of 5 on the other factors.

Oops, now timing has slipped to third!

Idea	 0.079811441
Team	 0.067029361
Plan	-0.007925792
Funding	 0.033260711
Timing	 0.064723176


Not surprising, because I deliberately set things up that way. But Mr. Gross doesn't know what the "next" set of failures looks like either. He has a complete list of IdeaLabs companies, but not of the outside world's, which is its own bias--- maybe it's Mr. Gross who is prone to timing failures, not the rest of the world!

Picking only large, well-known failures for your analysis is nearly the very definition of survivorship bias.

Finally, the inputs themselves are suspect even where they are complete. Mr. Gross already *knows* whether these companies are successes or failures when he's filling out his numbers. Does Pets.com really deserve a "3" in the idea category, or is that just retrospective thinking? Why are the successful IdeaLabs companies given 6's and 7's for Team, but the unsuccessful ones get two fours and a five? Did Mr. Gross really think he was giving those two ventures the "B" team at the time?

Even the Y axis is suspect! Pets.com had a successful IPO. Uber and AirBnB can't claim to have reached that stage--- they might yet implode. (Unlikely, admittedly, but possible. And their revenue numbers are better than Pets.com's ever were.) For an investor, "did I receive any return on my investment?" is the measure of success.

To summarize,
* The data were generated from the opinions of the principal investigator and not subject to any cross-checks from other parties.
* The data exhibit survivorship bias and other selection bias.
* The analysis includes no confidence intervals or other measurements of the meaningfulness of the results.
* The results presented appear highly dependent upon the exact analysis performed, which was not disclosed.

Am I being unfair? Perhaps. This was not an academic conference talk. But if we are to take his advice seriously, then Mr. Gross should show that his analysis was serious too.

Saturday, July 4th, 2015

Stupid NFS Tricks

One of the Tintri SE's asked whether the advice here about tuning NFS was relevant to us: http://docstore.mik.ua/orelly/networking_2ndEd/nfs/ch18_01.htm (It is not.)

Back in the days when CPU cycles and network bits were considered precious, NFS (the Network File System) was often run over UDP. UDP is a connectionless, best-effort "datagram" service, so occasionally packets get lost along the way, or corrupted and dropped (if you didn't disable checksumming due to the aforesaid CPU costs.) But UDP doesn't provide any way to tell that this happened. Thus, the next layer up (SUN-RPC) puts a transaction ID in each outgoing call and matches it against the corresponding response. If you don't see a response within a given time window, you assume the packet got lost and retry.

This causes problems of two sorts. First, NFS itself must be robust against retransmitted packets, because sometimes it was the response that got lost, or the file server just took longer than expected to do its thing. This is not so easy for operations like "create a file" or "delete a file," where a second attempt would normally result in an error (respectively, "a file with that name already exists" and "there's no such file.") So NFS servers started using what's called a "duplicate request cache," which tracks recently seen operations; if an incoming request matches one, the cached response is sent again.

The second problem is figuring out what the appropriate timeout is. You want to keep the average cost of operations low, but not spend a lot of resources retransmitting packets. The latter could be expensive even if you don't have a crappy network--- say the server is slow because it's overloaded. You don't want to bombard it with extra traffic.

Say you're operating at 0.1% packet loss. A read (to disk) normally takes about 10ms when the system is not under load. If you set your timeout to 100ms, then the average read takes about 0.999 * 10ms + 0.000999 * 110ms + 0.000000999 * 210ms + ..., which works out to about 10.1ms. But if the timeout is a second, that becomes 11ms, and if the timeout's 10 seconds, we're talking 20ms.
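
In other words, the expected latency is roughly the service time plus the loss rate times the timeout. A quick back-of-the-envelope check (same made-up numbers as above):

def expected_latency(timeout_ms, loss=0.001, service_ms=10.0):
    # Each lost request costs one full timeout before the retry; summing the
    # geometric series gives service_ms + timeout_ms * loss / (1 - loss).
    return service_ms + timeout_ms * loss / (1 - loss)

for t in (100, 1000, 10000):
    print(t, round(expected_latency(t), 2))   # about 10.1ms, 11.0ms, and 20.0ms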

So, at least in theory, this is worth tuning because a 2x difference between a bad setting and a good setting is worthwhile. Except that the whole setup is completely nuts.

In order to make a reasonable decision, the system administrator needs statistics on how long various NFS calls tend to take, and the client captures this information. But once you've done that, why does the system administrator need to get involved? Why shouldn't the NFS client automatically adapt to the observed latency, and dynamically calculate a timeout value in keeping with a higher-level policy? (For a concrete example, the timeout could be set to the 99th percentile of observed response times, for an expected 1% over-retransmit rate.) Why on earth is it better to provide a tuning guide rather than develop a protocol implementation which doesn't need tuning? This fix wouldn't require any agreement from the server; you could just do it right on your own!
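
A minimal sketch of what that adaptive policy might look like (the class and method names are mine, not any real NFS client's):

from collections import deque

class AdaptiveTimeout:
    def __init__(self, window=1000, percentile=0.99, initial_ms=1000.0):
        self.samples = deque(maxlen=window)   # sliding window of observed latencies
        self.percentile = percentile
        self.initial_ms = initial_ms

    def observe(self, latency_ms):
        # Record the latency of a completed call.
        self.samples.append(latency_ms)

    def timeout_ms(self):
        # Retransmit at the chosen percentile of recent response times,
        # falling back to a fixed default until we have data.
        if not self.samples:
            return self.initial_ms
        ordered = sorted(self.samples)
        idx = min(len(ordered) - 1, int(self.percentile * len(ordered)))
        return ordered[idx]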

Fortunately, in the modern world NFS mainly runs over TCP, which has mechanisms that can usually tell more quickly that a request or response has gone missing. (Not always, and some of its specified timeouts suffer the same problem of being hard-coded rather than adaptive.) This doesn't remove the first problem (you still need the duplicate request cache) but it makes an entire class of tuning pretty much obsolete.

Nothing upsets me more in systems research and development than a parameter that the user must set in order for the system to work correctly. If the correct value can be obtained observationally or experimentally, the system should do so. If the parameter must be set based on intuition, "workload type", expensive consultants, or the researcher's best guess, then the real problem hasn't been solved yet.