I often use the average ratings of books and movies on Goodreads and IMDB to decide what to read or watch next. Unfortunately, when an item has few ratings, its average rating is often inflated, and the average seems to decrease as the number of ratings grows. Plots of two datasets of books and movies show that the story is slightly more complicated.

Recently I used the Goodreads API to obtain the list of books I have marked as read on the site. For each of those books I also scraped the books that Goodreads lists as similar, resulting in a list of some 30K books. My plan was to devise a recommendation algorithm. Unfortunately, including the average rating turns out to give poor results, mainly because there are many books with a very high average rating that are not very interesting. These books also have a low number of ratings. We therefore need a method to quantify how much the average rating can be trusted.
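To make the problem concrete, here is a minimal sketch of one common way to discount averages that rest on few ratings: a shrunk (or "Bayesian") average that pulls each book's rating toward a global mean, with the pull weakening as the number of ratings grows. This is an illustration rather than a method derived from the data. The function name `shrunk_rating`, the prior mean of 3.9, the prior weight of 50, and the two example books are all assumptions made up for the example.

```python
def shrunk_rating(avg, n, prior_mean, prior_weight):
    """Weighted mix of a book's own average and a global prior mean.

    prior_weight acts like a number of "phantom" ratings at prior_mean,
    so a book with few real ratings stays close to the prior.
    """
    return (prior_weight * prior_mean + n * avg) / (prior_weight + n)


# Hypothetical books: (title, average rating, number of ratings)
books = [
    ("Obscure title", 4.9, 5),
    ("Popular title", 4.3, 20000),
]

PRIOR_MEAN = 3.9    # assumed global average rating
PRIOR_WEIGHT = 50   # assumed strength of the prior, in ratings

for title, avg, n in books:
    adjusted = shrunk_rating(avg, n, PRIOR_MEAN, PRIOR_WEIGHT)
    print(f"{title}: raw {avg:.2f} from {n} ratings, adjusted {adjusted:.2f}")
```

With these made-up numbers, the book with five ratings ends up ranked below the widely rated one, even though its raw average is higher. The choice of prior weight is a judgment call: larger values distrust small samples more.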

A plot of the number of ratings against the average rating shows a surprising pattern.