# Finding all Nonintersecting Hands, Continued

Tintri suffered a server room power outage Monday evening so instead of useful work I prototyped the idea laid out in the previous post!

Let's call this structure a "Discard Tree":
* Each bucket at a particular level of the tree is labeled (indexed) with a fixed-sized set of cards.
* Every hand within that bucket contains all of the cards in the label
* To find the nonintersecting set of hands for a target hand X, we need search only those buckets whose labels are nonintersecting with X.
* The list of labels is chosen to be minimal, provided that every hand belongs to at least one bucket.
* Each bucket may contain a list of hands to be searched, or a subtree.

The good news is that this structure appears to achieve near-optimal reduction in search space for the size of the index. Suppose we had a huge table which, for each K-card combination, lists exactly the hands which do not intersect with those particular K cards. The fraction of the sample set that would need to be examined would be (52-K)C17 / 52C17. The Discard Tree achieves this ratio at a much lower cost.

(Is this really the "best" possible? Well, if we could efficiently calculate the intersection of two different K-card buckets we might be able to play inclusion/exclusion tricks. But, any such operation that has a per-hand cost loses to just searching through one of the buckets, so I feel fairly confident that, for a given "index size", this is the best possible reduction in search space.)

Here's some measurements from the prototype, showing the fraction of buckets visited (on average) and--- assuming equal distribution of hands to buckets--- thus the percentage of the search space to examine. I used both "flat" (single-level) tables as well as a tree lookup.

```# 8 card theoretical value = (52-8)C17 / 52C17 = 0.0313
#  8            ??????? of 40619150 buckets
# [ 4, 4 ]      0.03092 of 1137780 buckets
# [ 5, 3 ]      0.03142 of 772200 buckets
#  7            ??????? of 1315600 buckets
# 7 card theoretical value = (52-7)C17 / 52C17 = 0.0503
# [ 4, 3 ]      0.04995 of 206400 buckets
# [ 3, 2, 2 ]   0.05058 of 516120 buckets
# [ 5, 2 ]      0.05054 of 236808 buckets
# [ 2, 5 ]      0.05062 of 249480 buckets
#  6            0.07891 of 43317 buckets
# [ 2, 2, 2 ]   0.08006 of 155520 buckets
# [ 3, 3 ]      0.08061 of 38500 buckets
# [ 2, 4 ]      0.08068 of 63000 buckets
#  5            0.12556 of 5148 buckets
```

The good news here is that we can pick whichever structure of indexes is most efficient--- any tree that is "7 cards deep" is just as good at reducing the search space, whether it is a single 7-card table or a three-level lookup.

This is useful, because once we get to 7 or 8 cards, the number of buckets is prohibitively large. The minimal label sets start becoming a significant fraction of the combinatorial space.

```K	Minimal label set size for 17-card hand
2	60
3	220
4	1290
5	5148
6	43317
7	1315600
8	40619150
```

What happens is that at K=6 we can still divide the cards into 3 groups, while with K=7 we have to drop down to two groups. Even at K=6 it is more efficient--- in terms of number of buckets--- to built the index as 3,3 rather than a flat table. This is not true for K<=5.

The other reason that being able to break the index up is useful is to reduce the number of index comparisons (which, after all, cost just as much as hand comparisons.) This lets us calculate the optimal index size for any given number of sample hands, to reduce the total number of intersection operations. (I assume that number of buckets is not a limiting factor.)

```K      Space reduction    Best indices    Fixed cost     Breakeven point
2      0.4487             [2]                     60                 134 hands
3      0.2962             [3]                    220                 743
4      0.1934             [4]                   1290                6670
5      0.1249             [3,2]                ~4130               33064
6      0.0797             [3,3]               ~14556              182569
7      0.0503             [5,2]               ~43727              869950
8      0.0313             [5,3]              ~146605             4687572
9      0.0192             [6,3]              ~802837            41832660
10     0.0116             [5,3,2]           ~2565704           229718055
```

"Fixed cost" == number of index comparisons needed. "Breakeven point" == point where reduction in total space equals fixed cost of index comparisons. But, we are more interested in what is the lowest-cost option at each number of hands.

```Sample size     Best index
10^3            [2]
10^4            [4]
10^5            [3,2]
10^6            [5,2]
10^7            [5,3]
10^8            [6,3]
10^9            [5,3,2]
```

Thus, for a 10^7-hand sample, we perform hand-by-hand intersections on just 3.1% of the sample space (but an additional 1.5% overhead for labels), using 236,808 leaf buckets. The number of nonintersecting hands found should be about 23,000, and we performed 460,000 intersections to find them, a 20:1 ratio.

For a 10^8-hand sample: 1.9% of the sample space, and 0.8% overhead. Thus we find 230,000 nonintersecting hands using 2,700,000 intersections, a 12:1 ratio.
Tags:
• Post a new comment

#### Error

default userpic