Mark Gritter (markgritter) wrote,

Selecting a random sample when the population can only be accessed nonuniformly

I need some stats help.

Suppose you've got a large set of boxes, each of which contains a varying number of marbles, each of which has a color.

If you're interested in the color distribution of marbles, but can only pick a box at random, can you still do unbiased sampling? (There might be skew such that boxes that have more marbles are predominantly blue, while boxes with fewer marbles are predominantly red and green.)

In the experiment I threw together, random sampling of boxes seems to return the correct distribution of colors, even when the number of marbles in a box is highly correlated with the color of the marbles in it. The sampling error is high, naturally, but the mean looks correct. Is this always true, or is there some adjustment that needs to be made? For example, should you pick just one marble from each selected box? (This seems like it would skew things even further.)
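For anyone who wants to poke at this, here is a minimal sketch of that kind of experiment in Python. It is not the code I originally ran. The skew model (blue fraction proportional to box size), the two colors, and all the names and parameters (`make_boxes`, `pooled_estimate`, `one_per_box_estimate`, `k`, `trials`) are assumptions made up for illustration. It compares two estimates of the blue fraction: pooling every marble from the selected boxes, and taking a single marble from each selected box.

```python
import random


def make_boxes(rng, num_boxes=2000, max_size=100):
    """Build boxes whose blue fraction grows with box size (assumed skew)."""
    boxes = []
    for _ in range(num_boxes):
        size = rng.randint(1, max_size)  # every box holds at least one marble
        p_blue = size / max_size         # bigger boxes are bluer
        boxes.append(
            ["blue" if rng.random() < p_blue else "red" for _ in range(size)]
        )
    return boxes


def blue_fraction(boxes):
    """Fraction of blue marbles among all marbles in the given boxes."""
    total = sum(len(box) for box in boxes)
    blue = sum(box.count("blue") for box in boxes)
    return blue / total


def pooled_estimate(rng, boxes, k):
    """Pick k boxes uniformly at random and pool all of their marbles."""
    return blue_fraction(rng.sample(boxes, k))


def one_per_box_estimate(rng, boxes, k):
    """Pick k boxes uniformly at random and draw one marble from each."""
    picks = [rng.choice(box) for box in rng.sample(boxes, k)]
    return picks.count("blue") / k


def run_experiment(seed=0, k=50, trials=200):
    """Return (true fraction, mean pooled estimate, mean one-per-box estimate)."""
    rng = random.Random(seed)
    boxes = make_boxes(rng)
    truth = blue_fraction(boxes)
    pooled = sum(pooled_estimate(rng, boxes, k) for _ in range(trials)) / trials
    single = sum(one_per_box_estimate(rng, boxes, k) for _ in range(trials)) / trials
    return truth, pooled, single


truth, pooled, single = run_experiment()
print(f"true blue fraction:        {truth:.3f}")
print(f"mean pooled estimate:      {pooled:.3f}")
print(f"mean one-per-box estimate: {single:.3f}")
```

Under these assumptions, pooling counts each selected box in proportion to its marble count, so the pooled average should land close to the true fraction. The one-marble-per-box average instead tracks the average per-box blue fraction, which gives small boxes as much say as large ones.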

I don't even know what keywords to use to figure this out...
Tags: mathematics, statistics