I think I'm on the right track by searching in the space of matchings rather than the space of moves. My current-best algorithm looks like this:
Iterate over every pairwise-feasible combination "A" of 1000-point matches, in order of decreasing size:
    If A is too small to beat the current best score, quit.
    Check that A can be solved by a valid set of moves; go on to the next combination if not.
    For each combination B of 300-point matches that is pairwise-feasible with A and with itself, in order of decreasing size:
        If A+B is too small to beat the current best score, go back to the next 1000-point combination.
        Check that A+B can be solved by a valid set of moves; go on to the next combination if not.
        For each combination C of 50-point matches, etc. ...
            Compare each valid (solvable) game A+B+C+D with the current high score.
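The nested iteration above can be sketched as a tiered branch-and-bound. This is only a hypothetical skeleton: `pairwise_feasible` and `solvable` are stand-ins for the real checks, and the tier values and match pools are illustrative.

```python
from itertools import combinations

def search(tiers, pairwise_feasible, solvable):
    """Branch-and-bound over tiers of matches.

    tiers: list of (point_value, pool_of_matches), highest value first.
    pairwise_feasible(chosen, combo): stand-in for the pairwise check.
    solvable(matches): stand-in for 'a valid move order exists'.
    """
    best = 0
    # suffix_max[i] = max score obtainable from tier i onward (for pruning)
    suffix_max = [0] * (len(tiers) + 1)
    for i in range(len(tiers) - 1, -1, -1):
        value, pool = tiers[i]
        suffix_max[i] = suffix_max[i + 1] + value * len(pool)

    def recurse(i, chosen, score):
        nonlocal best
        if i == len(tiers):
            best = max(best, score)
            return
        value, pool = tiers[i]
        for k in range(len(pool), -1, -1):  # decreasing combination size
            # prune: even a full sweep of the remaining tiers can't beat best
            if score + value * k + suffix_max[i + 1] <= best:
                break
            for combo in combinations(pool, k):
                combo = list(combo)
                if not pairwise_feasible(chosen, combo):
                    continue  # conflicts with matches already chosen
                if not solvable(chosen + combo):
                    continue  # no valid move order realizes this matching
                recurse(i + 1, chosen + combo, score + value * k)

    recurse(0, [], 0)
    return best
```

The `suffix_max` bound is the "too small to beat the current best score" test from the outline, applied before enumerating each tier's combinations.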
This prunes the search tree substantially, but it still examines far too many 300-point matches that don't work out. (A recent example searched 544332 300-point combinations and found only 13 that were good enough to go down a level.) The problem is that pairwise feasibility is not a strong enough condition. "Solvable" is not strong enough either--- the chosen matches must both reach the end state and be reachable from the initial state.
What I think needs to happen is to check solvability match by match as combinations are being built. Right now we might find a 1000-point combination ABCDEF that's pairwise-feasible but not actually solvable together with 300-point match G. But we'll still try GHIJ, GHIK, GHJK, GLMN, etc., none of which can work.
Ideally we can build a library of sub-combinations known not to work--- though I'm not sure this is cheaper than just doing a feasibility check by playing it out. It would help when we try a different 1000-point combination if we'd already discovered that CD+G isn't feasible, without having to play it out again. (This dictionary could even be generated ahead of time, but it's likely to be too large that way.)
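One way to sketch that library: keep the known-bad sub-combinations as frozensets, and reject any candidate that contains a recorded bad subset. The class and method names here are illustrative, and the linear scan over recorded sets is the naive version--- whether it beats replaying depends on how many bad sets accumulate.

```python
class InfeasibleCache:
    """Remembers sets of matches proven unsolvable together."""

    def __init__(self):
        self.bad = set()  # frozensets of mutually-infeasible matches

    def record(self, matches):
        """Record that this exact set of matches can't be played out."""
        self.bad.add(frozenset(matches))

    def known_bad(self, matches):
        """True if any recorded bad subset is contained in `matches`.

        A superset of an unsolvable set is itself unsolvable, so one
        discovery prunes every branch that includes those matches.
        """
        s = frozenset(matches)
        return any(b <= s for b in self.bad)
```

Each `known_bad` query scans all recorded sets, so the check costs O(number of bad sets); that's the trade-off against the cost of a play-out.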
I'm also interested in how quickly a feasibility check can be done. It doesn't seem like we can leverage small forward validity checks into large ones, given that there are multiple possible move orders. But for backwards validity ("can we get to this point?"), I think it may be sufficient that every open region has an even number of squares. We know that parity alone isn't sufficient for domino tiling, though...
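That parity test is cheap to implement. Here's a sketch, assuming the board is a grid and `open_cells` is the set of (row, col) squares still empty; it flood-fills each connected open region and checks that every region has even size. (As noted above, this is a necessary-looking condition, not a proven sufficient one.)

```python
def regions_all_even(open_cells):
    """Flood-fill each connected region of open cells; all must be even-sized."""
    unseen = set(open_cells)
    while unseen:
        stack = [unseen.pop()]  # start a new region from any unseen cell
        size = 1
        while stack:
            r, c = stack.pop()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (nr, nc) in unseen:
                    unseen.remove((nr, nc))
                    stack.append((nr, nc))
                    size += 1
        if size % 2:
            return False  # an odd region can never be cleared in pairs
    return True
```

This runs in time linear in the number of open cells, so it should be much cheaper than playing a candidate matching out.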