The solution I came up with at the time partitioned the sample into buckets based on 14 bits of the bitvector (i.e., the presence or absence of 14 specific cards.) If the 'tag' for the bucket intersected with the 'tag' of the hand being analyzed, all hands in that bucket could be skipped. Further, I duplicated the sample into two sets of buckets based on different regions of the bitvector, so that whichever tag had the higher weight (number of 1 bits) could be used. 14 bits seemed to work best, empirically.
I was wondering if there was a better solution, particularly as I am interested in running some 17-card experiments (eventually), and the problem becomes even more acute in that case. A spreadsheet model says that the tagging solution reduces the sample space to 18% of its original size, on average, for a single tag. (Two tags is not likely to produce an enormously better number, maybe 17% at best.)
Obviously if we used a much larger number of tags, we could find one with a high bit-weight for any possible hand, which would dramatically reduce the sample size. But then we'd be storing many, many duplicates of the sample set (or at least many indices.) It might be possible to amortize the cost of rearranging the sample set if we were interested in many target hands with similar tags, but it seems like the idea just isn't feasible--- there is no way to identify a small set of tags which collectively cover the space of 17-card hands.
The approach I was playing with tonight was to create buckets using 2-card combinations. This is not a partitioning--- a sample hand might match many different 2-card labels, in which case we'd just assign it to one of the matching buckets at random. Using just 60 labels we can ensure that every 17-card hand belongs to at least one bucket. Then, we can skip a bucket if either of its cards appears in our target hand. With this data structure, the sample size is reduced to 45% of its original size--- not as good as the 14-bit buckets but at significantly lower overhead. It looks like 3-card combinations yield about 30-35% of the original sample size with 220 buckets. So there might be some other set of 2^14 buckets which gives a better result than the tag-based partitioning I used before, but it seems unlikely to be orders of magnitude better.