The uncertainty in the ratings is a complicated problem. It's not like the uncertainty in the outcome of a fixed number of coin flips or Bernoulli trials, though we do expect some relation to that. A single game against a similarly rated opponent carries more information than a single game against a much stronger or much weaker opponent. A game against an opponent who has 600 games in the system carries more information than a game against an opponent who has 300 games or 30 games. A game against an unrated opponent tells you nothing about your rating.
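To make the first point concrete, here is a minimal sketch. It assumes an Elo-style logistic curve on a 400-point scale, since the post doesn't say which model the system uses. It computes the Fisher information a single game carries about a player's rating when the opponent's rating is treated as exactly known. `win_probability` and `game_information` are hypothetical names. Information peaks for evenly matched opponents and falls off as the rating gap widens. Because the opponent's rating is held fixed, the sketch does not capture the second point: games against opponents with few games in the system carry less information.

```python
import math

# Assumption: Elo-style logistic model on a 400-point scale (not confirmed by the post).
SCALE = 400.0
C = math.log(10) / SCALE  # converts a rating difference into natural-log odds


def win_probability(rating_diff: float) -> float:
    """Expected score for a player rated `rating_diff` points above the opponent."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / SCALE))


def game_information(rating_diff: float) -> float:
    """Fisher information one game carries about the player's rating,
    treating the opponent's rating as known exactly: C^2 * p * (1 - p)."""
    p = win_probability(rating_diff)
    return C * C * p * (1.0 - p)


for diff in (0, 200, 400, 800):
    print(f"rating gap {diff:4d}: information {game_information(diff):.3e}")
```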
(Brief gobbledygook in this paragraph, and then it gets understandable again in the next paragraph.) What we maximize is the log-likelihood function, and that gives us the MLEs (maximum likelihood estimates) of the ratings. The Hessian matrix (the matrix of second derivatives of the log-likelihood) evaluated at the optimum ratings, with its sign flipped, is what is called the observed Fisher information matrix. There is a theorem (the Cramér-Rao bound) that relates the inverse of the information matrix to the variances (its diagonal elements) and covariances (its off-diagonal elements) of the rating estimates. That's where we are going, analyzing that. But we're not there yet.
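A one-player version of this idea can be sketched under the same assumed logistic model. Hold every opponent's rating fixed, maximize the log-likelihood with Newton's method, and take the standard error as one over the square root of the observed information (the negative second derivative) at the optimum. In the full system the Hessian covers all players' ratings jointly and has to be inverted as a matrix. This scalar case is a simplification, and `fit_rating` and `observed_information` are hypothetical helpers, not the site's code. If a player wins or loses every game, the MLE is infinite and this sketch will not converge.

```python
import math

SCALE = 400.0  # assumed Elo-style scale
C = math.log(10) / SCALE


def expected_score(r: float, opp: float) -> float:
    """Logistic expected score of a player rated r against an opponent rated opp."""
    return 1.0 / (1.0 + 10 ** ((opp - r) / SCALE))


def observed_information(r: float, games) -> float:
    """Negative second derivative of the log-likelihood at rating r."""
    return sum(C * C * expected_score(r, o) * (1.0 - expected_score(r, o)) for o, _ in games)


def fit_rating(games, iters: int = 50, tol: float = 1e-9):
    """Maximum-likelihood rating for one player, opponents' ratings held fixed.

    `games` is a list of (opponent_rating, score) pairs, score 1 for a win and
    0 for a loss. Returns (rating, standard_error)."""
    r = sum(o for o, _ in games) / len(games)  # start at the mean opponent rating
    for _ in range(iters):
        grad = sum(C * (s - expected_score(r, o)) for o, s in games)
        step = grad / observed_information(r, games)  # Newton step
        step = max(-200.0, min(200.0, step))  # damp very large early jumps
        r += step
        if abs(step) < tol:
            break
    return r, 1.0 / math.sqrt(observed_information(r, games))


games = [(600, 1), (650, 0), (580, 1), (620, 0), (700, 0), (550, 1)]
rating, se = fit_rating(games)
print(f"rating {rating:.0f} +/- {se:.0f}")
```

With only six games the standard error comes out very large, which is the point: the information adds up game by game.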
In the meantime we can get some brute-force estimates of the standard errors under actual conditions. Here is an example. We took a player who has played a very large number of games (5500) in the last two years and has a rating of 622. We then take a random sample of, say, 200 of those games and compute a rating based on just those 200 games, and we repeat this many times. What we find is that the standard error (the standard deviation of the 200-game ratings) is about 20 points.
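The subsampling experiment can be sketched the same way. The real player's game history isn't available here, so the sketch simulates a hypothetical 5500-game history for a player of true strength 622 against opponents drawn around that rating, using the same assumed logistic model. It then refits on many random 200-game subsets drawn without replacement and reports the standard deviation of those fits. The number it prints reflects these made-up assumptions and is not meant to reproduce the 20-point figure above. `subsample_spread` and `history` are illustrative names.

```python
import math
import random
import statistics

SCALE = 400.0  # assumed Elo-style scale
C = math.log(10) / SCALE


def expected_score(r, opp):
    """Logistic expected score of a player rated r against an opponent rated opp."""
    return 1.0 / (1.0 + 10 ** ((opp - r) / SCALE))


def fit_rating(games, iters=50, tol=1e-9):
    """One-player maximum-likelihood rating via Newton's method,
    with opponents' ratings held fixed. `games` holds (opponent_rating, score)."""
    r = sum(o for o, _ in games) / len(games)
    for _ in range(iters):
        grad = sum(C * (s - expected_score(r, o)) for o, s in games)
        info = sum(C * C * expected_score(r, o) * (1.0 - expected_score(r, o)) for o, _ in games)
        step = max(-200.0, min(200.0, grad / info))
        r += step
        if abs(step) < tol:
            break
    return r


def subsample_spread(games, k=200, trials=300, seed=1):
    """Refit on many random k-game subsets (drawn without replacement) and
    return the standard deviation of the fitted ratings."""
    rng = random.Random(seed)
    fits = [fit_rating(rng.sample(games, k)) for _ in range(trials)]
    return statistics.stdev(fits)


# Hypothetical simulated history standing in for the real 5500 games:
# true strength 622, opponents drawn from a normal spread around that rating.
sim = random.Random(0)
TRUE_RATING = 622.0
history = []
for _ in range(5500):
    opp = sim.gauss(TRUE_RATING, 100.0)
    score = 1 if sim.random() < expected_score(TRUE_RATING, opp) else 0
    history.append((opp, score))

print(f"spread of 200-game ratings: {subsample_spread(history):.1f} points")
```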
That suggests that when a rating is based on 200 games, just at the threshold of what we refer to as "established," it has about a two-thirds (68%) chance of being within 20 points of the "true" rating and about a 95% chance of being within 40 points.
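Assuming the rating error is roughly normal, the chance of landing within a given margin follows from the error function. `prob_within` is a hypothetical helper name.

```python
import math


def prob_within(margin: float, standard_error: float) -> float:
    """Chance a normally distributed estimate lands within `margin` points
    of the true value, given its standard error."""
    return math.erf(margin / (standard_error * math.sqrt(2)))


print(f"{prob_within(20, 20):.3f}")  # one standard error
print(f"{prob_within(40, 20):.3f}")  # two standard errors
```

With a 20-point standard error this gives about 0.683 for 20 points and about 0.954 for 40 points. The conventional 95% interval is ±1.96 standard errors (about 39 points here), so "95% within 40 points" is the rounded version.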
These intervals shrink as you log more games, in proportion to one over the square root of the number of games. So if you want to cut the uncertainty in half (a 68% chance of being within 10 points and a 95% chance of being within 20 points), you have to go to 800 games, four times as many.
The average number of games for players on the top 100 list is 3000. So, in general, when you look at those ratings it is fair to think of them as probably (68%) right to within about 5 points. And someone fairly new, like Aranas, is probably right to within 10 points.
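The square-root scaling can be checked in a few lines. The sketch takes the brute-force figure of about 20 points at 200 games as the reference and assumes the 1/sqrt(n) rule holds exactly, although in practice it also depends on who the games are against. The helper names are illustrative.

```python
import math

# Reference point from the brute-force experiment: about 20 points at 200 games.
BASE_GAMES = 200
BASE_SE = 20.0


def standard_error(games: int) -> float:
    """Standard error, assuming it shrinks like 1/sqrt(number of games)."""
    return BASE_SE * math.sqrt(BASE_GAMES / games)


def games_needed(target_se: float) -> float:
    """Games required to bring the standard error down to `target_se`."""
    return BASE_GAMES * (BASE_SE / target_se) ** 2


for n in (200, 800, 3000):
    print(f"{n:5d} games -> +/- {standard_error(n):.1f} points (68%)")
print(f"games for +/- 5 points: {games_needed(5):.0f}")
```

At 800 games this gives 10 points. At 3000 games it gives about 5.2 points, consistent with the "within 5 points" figure for the top 100. About 3200 games would be needed for exactly 5.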
Sorry, my stats aren't super strong. So in the top 100 there are probably 5 players off by 3 or more standard deviations, which at 3000 games would be at least 20 points off the real rating?
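For reference, the expected count of players whose ratings are off by a given number of standard errors can be computed directly. This assumes each player's rating error is roughly normal and independent of the others', and neither assumption is established in the thread. `tail_probability` is an illustrative name.

```python
import math


def tail_probability(k: float) -> float:
    """Two-sided probability that a normal error exceeds k standard errors."""
    return math.erfc(k / math.sqrt(2))


PLAYERS = 100
for k in (1, 2, 3):
    expected = PLAYERS * tail_probability(k)
    print(f"off by {k}+ standard errors: about {expected:.2f} of {PLAYERS} players")
```

Under those assumptions, about 4.6 of 100 players would be off by 2 or more standard errors and about 0.27 by 3 or more. At roughly 5 points per standard error for 3000 games, 3 standard errors is about 15 points.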