Mark Gritter (markgritter) wrote,

Red Herring

I tried rewriting my CP2-7 hand evaluator to 'batch' requests and run the entire batch against the opponent data in one pass. My hypothesis was that the bottleneck was pulling opponent hands into the cache, but this does not appear to be the case. The batched version performed worse than the one-hand-at-a-time version for every batch size I tried between 1 and 20. (A batch size of 10 performed almost as well as one hand at a time.)

I did not put any extra effort into making the batched hands use memory efficiently, because I figured if there was a significant effect, it would be visible despite the batch being scattered throughout memory.
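To make the two loop orders concrete, here is a minimal sketch of what "batching" means here. It is not the actual evaluator: the bitmask encoding of a hand, the function names, and the `score` callback are all assumptions made for illustration.

```python
from typing import Callable, List

# Assumption: a hand is encoded as an integer bitmask, one bit per card,
# so two hands overlap (share a card) exactly when `a & b` is nonzero.
Hand = int
ScoreFn = Callable[[Hand, Hand], int]  # hypothetical scoring callback


def evaluate_one_at_a_time(hands: List[Hand], opponents: List[Hand],
                           score: ScoreFn) -> List[int]:
    """For each of our hands, make a full pass over the opponent data."""
    totals = [0] * len(hands)
    for i, hand in enumerate(hands):
        for opp in opponents:
            if hand & opp:  # shared card: this matchup cannot occur
                continue
            totals[i] += score(hand, opp)
    return totals


def evaluate_batched(hands: List[Hand], opponents: List[Hand],
                     score: ScoreFn) -> List[int]:
    """Make one pass over the opponent data, testing the whole batch
    against each opponent hand while that hand is (hopefully) in cache."""
    totals = [0] * len(hands)
    for opp in opponents:
        for i, hand in enumerate(hands):
            if hand & opp:
                continue
            totals[i] += score(hand, opp)
    return totals
```

Both functions do the same work and return the same totals; only the order in which memory is touched differs. The batched order only wins if fetching opponent hands is what dominates the run time.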

Let's do the math.

It takes about 140-160 milliseconds to evaluate one hand vs 1M opponent hands. That means each opponent hand is considered in (an average of) 0.14-0.16 microseconds. This is significantly higher than memory latency--- we'd expect that to be about 0.04 microseconds using 100MHz DDR SDRAM.

But most hands are rejected quickly because they overlap, and for these the memory latency might be an issue. The others are scored against each possible arrangement. 1 cycle = 5*10^-10 second = 5*10^-4 microseconds, so 0.14-0.16 us = 280-320 cycles per opponent hand. But if I count just the 13000 or so relevant hands, then the cost is actually 11-12 us, or 22000-24000 cycles, per relevant hand.

The truth is probably that the overlapping hands take much less time (just a few CPU cycles for a compare and branch, memory latency adds perhaps 80 cycles) while non-overlapping hands take more.
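The arithmetic can be pushed one step further with a rough split of the total between the two kinds of hands. The sketch below reuses the figures from the post (cycle time, 1M opponents, about 13000 relevant hands, 0.04 us latency). The 5-cycle cost for the compare and branch is my own stand-in for "just a few", and the function name is made up.

```python
CYCLE_US = 5e-4             # 1 cycle = 5*10^-4 microseconds (from the post)
OPPONENTS = 1_000_000       # opponent hands per evaluation
RELEVANT = 13_000           # approximate count of non-overlapping hands
LATENCY_US = 0.04           # expected memory latency per access
ASSUMED_COMPARE_CYCLES = 5  # assumption: "a few" cycles for compare + branch


def split_estimate(total_ms: float) -> tuple:
    """Return (cycles per opponent hand,
               cycles per relevant hand if overlapping hands were free,
               cycles per relevant hand after charging each overlapping
               hand one compare plus one memory latency)."""
    total_cycles = total_ms * 1000 / CYCLE_US
    per_opponent = total_cycles / OPPONENTS
    per_relevant_upper = total_cycles / RELEVANT
    reject_cycles = ASSUMED_COMPARE_CYCLES + LATENCY_US / CYCLE_US
    overlapping = OPPONENTS - RELEVANT
    scoring = (total_cycles - overlapping * reject_cycles) / RELEVANT
    return per_opponent, per_relevant_upper, scoring


for ms in (140, 160):
    per_opp, upper, scoring = split_estimate(ms)
    print(f"{ms} ms: {per_opp:.0f} cycles/opponent, "
          f"{upper:.0f} upper bound, {scoring:.0f} after rejects")
```

Under these assumptions the rejects account for well under a third of the total, and each relevant hand still costs very roughly 15000-18000 cycles, so the conclusion does not depend much on the exact reject cost.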

So, it doesn't make sense to concentrate on getting through overlapping hands faster--- I need to focus on getting the score calculation down to a minimum.

But, I still can't explain why the 2-CPU box is slower.
Tags: chinese poker, geek, performance