Their estimate is that only about 1% of the sample was non-infringing. About 46% of the sample was movies and TV shows, 14% was games and other software, and 14% was pornography. Of that last category:
...53% of pornography in our sample was in English, 16% was in Chinese, 15% was in Japanese, 6% was in Russian, 3% was in German, 2% was in French, 2% was unclassifiable, and Italian, Hindi, and Spanish appeared infrequently (1% each).
So many questions for further research here!