Mig Greengard's ChessNinja.com

Linares 2010 r9: Grischuk Saves Linares


Defending Linares champ Alexander Grischuk stepped up big time in round nine by beating Topalov to tie him for the lead heading into tomorrow's final round. Instead of Topalov cruising to clear first with an easy draw with white against Gelfand tomorrow, suddenly there's everything to play for. Grischuk has black against Vallejo. The other games were drawn with medium to high degrees of tedium.

The matchup between the leaders did not disappoint. Out of a QID in which both players seemed to be trying to avoid surprises (I'll play a sideline. Oh yeah? I'll play even more of a sideline...), Grischuk got the sort of imbalances he needed to play for a win. Topalov failed to handle the exchanges very well, deciding to bail out by giving up two pieces for a rook and pawn. That's the opposite of where he likes to be in such swaps, always preferring activity over material. But Topalov being Topalov, he managed to get very active anyway, but at the cost of a pawn, leaving him dead lost if he couldn't keep the pressure up. Grischuk had some good karma built up and used it all on needing only two repetitions of the position to reach the time control on move 40. It would have been a real test of testicular fortitude had he been forced to choose to go for it or repeat the third time with (once again) seconds on his clock.

With time to think it was clear, well, clear-ish, that there was little Black could do to prevent White from eventually untangling his pieces. Grischuk then made the interesting decision to give up his queen for an unusual R+B+N vs Q game. It likely would have been over faster had he played more directly, but with so much riding on the result he decided not to take any chances. Topalov didn't have many ways to complicate with just the queen and White carefully consolidated despite threatening to get into time trouble for the second time in the game. A great win for Grischuk and for Linares, which has already been diminished by the shrunken field and the lackluster performances from Aronian and Gelfand. Had Topalov waltzed off with his first Linares title after playing so spottily, if wonderfully ambitiously, it wouldn't have much seemed like Linares at all. This is supposed to be our premier event, a real crucible. And even if it's Linares Lite this year, at least now it feels like everyone has been in a fight.

Despite the official commentary saying yesterday that Grischuk would have the advantage on tiebreaks if he beat Topalov today, I haven't seen an explanation for that yet. Last year (and I got this wrong on the air today because I was going from memory, or lack thereof, instead of checking my own report from 2009), Grischuk got the title ahead of Ivanchuk on most wins, but only after they were tied on head-to-head, which is the first tiebreak. Second tiebreak is most wins, third is most wins with black. At least that's the way it was last year. (This confused me because I'm 99% sure it used to be most wins first.) But Grischuk and Topalov are tied on all three. If they go to TPR it will be Grischuk since he's lower rated than Topalov, but that's pretty silly even for tiebreaks. I wish they'd just split the title and have done with it. They do need a clear winner for the purposes of deciding the Grand Slam qualifier, however. I still don't see anything posted about how it will be decided if they tie tomorrow.

Another incentive for Topalov in particular is that +2 won't give him back the #1 ranking in time for his match with Anand, so Gelfand can't expect an easy day tomorrow. [Apparently +3 won't do it either, so never mind. Sorry about the misinfo. I'm reliably informed that +3 would only be 2812.2, 0.7 behind Carlsen, so I was wrong on this from the start.] Some game notes late tonight. Round 10: Topalov-Gelfand, Grischuk-Vallejo, Aronian-Gashimov. Topalov gave a simul on yesterday's free day, scoring 18 wins and three draws.

Update: Just a few game notes before crashing. Still no final word on the tiebreak situation if Topalov and Grischuk both draw tomorrow. If they both win, Grischuk takes the title with more wins with black. Not a huge deal to me because I basically consider it a shared first anyway, and I believe they split the cash, but it might matter more to the players...

Topalov had some tough choices to make early on in the loss to Grischuk. He seemed uncharacteristically unsure of himself, perhaps feeling the effects of missing a knockout against Aronian yesterday. One choice I discussed on ICC Chess.FM with GM Ben Finegold was on move 16, when Black is figuring out how to deal with all his hanging minor pieces on the queenside. Black has the bishop pair, and the uncontested dark-squared bishop might be quite valuable at some point. So it surprised Finegold when, after a substantial think, Topalov gave it up right off for the knight. At first it seemed to at least have the point of playing for the initiative instead of retreating with 16..Bb7 and defending. But Topalov had a more intriguing idea, lover of material imbalances that he is. The resulting position is better for White, and Grischuk steadily outplayed Topalov from there for a good stretch. His usual patient pressure didn't help much in a position where time was on Grischuk's side for once. The Russian champ still got into time pressure, but here it was a few minutes instead of a few seconds. As he nursed his material plus closer to the time control he missed the elegant 34.Bf1!, which activates the bishop (c4 next, with Rf1) and breaks the coordination of the black rooks.

Topalov kept going for activity with the dubious 30..d3 instead of the obvious pawn capture. His instincts betrayed him again a few moves later when he doubled his rooks on the 7th instead of grabbing the e-pawn with 35..Rxe3. GM Yermolinsky, who is coming on Chess.FM for the final round, kibitzed that he thought Grischuk sort of bailed out with the queen sac when a few well-calculated variations would have finished things off. 45.Bd5! proves his point. If 45..Ra3 46.Ne5 and the white pieces are suddenly swarming, with threats of Nxg6 and e4. The game continuation looked rather inevitable, however, though Topalov could have dragged things out by holding on to his h-pawn. Big chess.

170 Comments

Mig is smokin' and cookin' now, two great titles/articles in a row! Spot on!

Topalov's loss against Grischuk cost him 6 Elo points. He is now down to 2807.8. Even if he beats Gelfand tomorrow in the final round, he will not catch up with Magnus (2812.9) on the live list.
-The WC match in April will be between the #2 and #4...

Finally someone points out the lackluster performance of Aronian; I was afraid that his unambitious play would go unpunished.

Actually, that might not be so bad for Topa, since in his last WCh match he was ranked number one and it didn't go well for him.

If Sonneborn-Berger also enters the equation, it seems that Grischuk would be first on tiebreak if he remains tied with Topalov after the final round: If both win their games, Grischuk will have an extra win with black. If both draw or lose, Grischuk will have the better Sonneborn-Berger (he beat Gelfand, Topalov beat Vallejo). If both lose, actually Aronian(!) could catch up by beating Gashimov, but he (+1=9) would be far behind on number of wins...

But as I already mentioned in the other thread, if Topalov and Grischuk remain tied, both may be invited for Shanghai/Bilbao: Carlsen has double-qualified (Nanjing + Corus), and the Grand Slam organizers said that one of his spots would go to the best runner-up (best non-winning performance).

The Linares organizers had a lower budget this year. They managed to save expenses in several ways, including paying the players 30% less.
-Which all the players accepted, BTW, although you might say the lackluster play doesn't justify even that much pay. Except for Topalov, who really has provided entertaining games. Soon, kibitzers will be surprised if he does NOT sacrifice a pawn or a piece in each game!

Apart from this, I think Topalov's performance in Linares doesn't say much about how he will play against Anand. Probably Topalov will be more precise and careful against Anand, with a much stronger opening repertoire of course.

Well, the thing is that most comments about Topalov's play are made without considering that he might be playing third-choice openings in all his games.
I would say that his uncompromising performance makes him the "saver" of the tournament, at least in terms of excitement.

"It would have been a real test of testicular fortitude had he been forced to choose to go for it or repeat the third time with (once again) seconds on his clock."

I thought that at first, but the only reason white would have any doubts about playing Qc4 is ...Qxe3, but then it's obvious that Qf7+ is at least a perpetual (actually, adding Bd5 later means it wins). So Qc4 could be played at no risk.

I watched Shipov's video comments at Crestbook and one interesting point was that after Rybka's recommended 35. Nc5 he thinks Grischuk's probably lost - black gets in some second rank mating combinations - so he praised Grischuk's sense of danger in playing 35. Nf2. Shipov agreed that 34. Qc6 was a blunder and Topalov just taking the pawn on move 35 would have made a draw likely. The same went for 30...d3 instead of the simpler and better ...dxe3. He thought Topalov went astray by trying to play on Grischuk's time trouble.

One thing I think is pretty predictable for tomorrow's game: if Topa's going for a win vs. Gelfand, he won't play 1. e4 because of the draw-likely Petroff; so I'd think he'd opt for 1. d4, unless he has some real major surprise for the Petroff (but then again, if he does he might save that for Anand...though I don't seem to remember if Anand plays that defense. Anyone know?).

@Eyal from Chessgames - feel free to repost any translations from here but please include some sort of attribution (even just generally to this blog). It's a bit odd to read my words with a link to Google's translation - though I realise you're just making the whole text available rather than claiming that's where they come from.

"Another incentive for Topalov in particular is that +2 won't give him back the #1 ranking in time for his match with Anand, so Gelfand can't expect an easy day tomorrow."

Mig, I don't see why this is relevant at all. If FIDE is going to stick with their rules - and they should - then Linares can't possibly be rated for the March 1st official FIDE list. The deadline is 1 week -- 7 days -- before the date it is to be published, and we're already past that date.

I.e. - Linares is not supposed to be rated for the next official FIDE list in March - it will be rated for the May 1st list. Both Anand's and Topalov's official ratings will be what they were in the January 2010 list at the time of the match.

Also in terms of live ratings, Topalov would've had to go +4 in Linares to remain #1 after the event was over. So he won't be.

Sorry, scratch that re Anand's rating: it'll be 2787 of course, due to the points he lost in Corus. Their official rankings will be #2 and #4 like Bobby Fiske said:

March 1st ratings:

1 Carlsen 2813
2 Topalov 2805
3 Kramnik 2790
4 Anand 2787

Re: Tie-breaks. Once you do the three things Mig mentioned, then don't you break ties based on a tie-break score that is computed by adding the sum of the scores of the players one has defeated to half the sum of the scores of the players one has drawn against?

I think the tie-break scores shown on www.chessbase.com were computed using this formula. (This tie-break score is not TPR because it does not depend on the pre-tournament ratings.)

What you describe is the Sonneborn-Berger that Thomas already mentioned. And yes, it's what the ChessBase program uses as the tie-break for round robins.
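
For anyone who wants to check such a score by hand, it takes only a few lines of code. A minimal Python sketch, using an invented single round robin crosstable rather than the real Linares results:

    # Sonneborn-Berger: the sum of the final scores of every opponent you
    # beat, plus half the final score of every opponent you drew with.
    # Invented 6-player single round robin - NOT the actual Linares data.
    final_scores = {"A": 4.0, "B": 3.5, "C": 2.5, "D": 2.0, "E": 2.0, "F": 1.0}

    # Player A's results against each opponent: 1 = win, 0.5 = draw, 0 = loss.
    results_of_A = {"B": 0.5, "C": 1.0, "D": 1.0, "E": 0.5, "F": 1.0}

    def sonneborn_berger(results):
        sb = 0.0
        for opponent, score in results.items():
            if score == 1.0:
                sb += final_scores[opponent]      # full credit for a win
            elif score == 0.5:
                sb += final_scores[opponent] / 2  # half credit for a draw
        return sb

    print(sonneborn_berger(results_of_A))  # 2.5 + 2.0 + 1.0 + (3.5 + 2.0)/2 = 8.25

The point of the system is that beating players who finish high counts for more than beating the tail-enders.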

Anand used to play the Petroff, and with very good results. Assuming that Fischl's statistics are at least close to being fully accurate, you can compare his performance to Kramnik's for instance:

http://members.aon.at/sfischl/opblack9505.txt

http://members.aon.at/sfischl/opblack.txt

Spot on Mig! Grischuk's loss and now his gutsy, if not perfect, effort to win have truly been the only bright spots in an otherwise forgettable tournament. Bravo Grischuk, but I must say that Anand looked to be playing to draw and now Topalov is playing to avoid that at all costs. What has happened to the days when the WCs played every game like it was life and death? Are those days gone? I miss them.

Frogbert, you are supposed to be the rating "expert" and even I know that usually Linares is rated for the next list even if it's past the deadline.

Does anyone remember when Anand was supposed to cross 2800 for the first time, and Linares didn't go into the next list because it was too late, but later it was fixed?

They have traditionally made an exception and rated Linares. Although it used to be for the April list instead of July when the lag was even greater than now. They were going to do just that in 2007, correctly rate it for July, but there was an uproar because the April list was Anand's first #1 ranking if they included Linares. So FIDE did, saying:

"Linares is clearly a tournament that belongs to July list. The others I do not know.
After discussion with Casto Abundo we decided that they will be rated on the April list, partially because Linares has been incorrectly rated in earlier years."

Rating it now for March does seem odd, and I don't have any official word on that one way or another. This is the first Linares after the change to bi-monthly lists. Seems like a good chance to start following their own rules and not rate it if it's over the deadline.

But is it really the case Topalov won't pass Carlsen on the live list by beating Gelfand? How could that be the case? I thought +3 was +3 and that +3 was good enough to do it.

Ah, John beat me to it. Had the page sitting here for so long and didn't refresh!

Really annoying there's still no guidance on who gets the trophy if all games are drawn tomorrow. (If both Topalov and Grischuk win, Grischuk takes it on more wins with black.) Must have popped up somewhere by now, surely. Doggers is in Linares but seems to arrive at the same odd conclusion as the La Marca writer, that Grischuk "should now be considered favourite for victory, since this year the first tiebreak rule is the individual encounters." But Topalov beat Grischuk in round five! So first tiebreak, head-to-head, is even, most wins is even, most wins with black is even. Next?

I wonder how players feel having to do a simul before/during a tournament? Zatonskih did a small blindfold simul before the US Women's Championship, and here Topalov did a sizeable simul before two critical rounds. I would have thought they'd despise being distracted/losing energy from important events. Am I wrong?

I wonder if Topa's simul was a "package deal" to justify a higher honorarium?

Not sure, but the guy seems fairly happy and relaxed during the simul; look at these pics:

http://www.chessbase.com/espanola/newsdetail2.asp?id=8066

And BTW, Smeets and L'Ami are Topa's new seconds?!
What happened to Dominguez? And Cheparinov and Paco?



L'Ami has been there for a while, Cheparinov was playing I think, Paco too. :P He might have all 5 for the Anand match.

I knew about L'Ami and also about Cheparinov's situation, but Dominguez was announced as his second for this event, so Smeets came as a surprise (to me).
I wonder who gets to pick the ties.

"But is it really the case Topalov won't pass Carlsen on the live list by beating Gelfand? How could that be the case? I thought +3 was +3 and that +3 was good enough to do it."

Mig, doesn't the rating of an opponent impact the gain (or loss) of Elo points?

"Really annoying there's still no guidance on who gets the trophy if all games are drawn tomorrow. ... first tiebreak, head-to-head, is even, most wins is even, most wins with black is even. Next?"
Sorry for insisting ... wouldn't Sonneborn-Berger be next, favoring Grischuk? BTW this might be considered a bit ironic: last year Grischuk finished ahead of Ivanchuk and got the Bilbao spot based on number of wins (the tiebreaker at Linares), while Chucky would have been on top based on Sonneborn-Berger (applied at most other events).
Or is there some solid evidence, or credible rumors that the Linares organizers never heard of, or completely forgot about this common tiebreaker?

Topalov has made the interesting decision to play against Gelfand's Petroff. So, unless he really screws up, I guess he's content with a draw.

I just want to see an official announcement. I'm sure they know SB tiebreaks exist, I just don't think they like that, traditionally. They may end up playing rapid/blitz for the trophy if they both draw. Trying to confirm.

If both Topalov and Grischuk win their games they'll play two 3+2 blitz games, according to Leontxo Garcia. If both draw, an Armageddon game (5 vs. 4 minutes) will be played.

Sorry, he must have said they'll play blitz if both draw. Naturally Grischuk should win the tournament with a black win.

Oops. According to chessvibes there's no black win tiebreak. So I think I heard it correctly and they'll play blitz if both win. If both draw, Grischuk wins.

Maybe there's life in the ol' e4 mutt yet, Jim.

Peter Doggers on Chessvibes corrected his error (Grischuk didn't win, but drew his mini-match with Topalov) and posted the tiebreak rules:
"1. Individual result. [even]
2. Highest number of victories. [even]
3. Highest sum of points against players who scored 50% or more. [even - only Aronian has also 50% with 8 draws]
4. Remove the points scored against the player/group of players at the bottom of the standings. If still equal, do the same for the player/group of players above. [seems like Sonneborn-Berger but isn't called Sonneborn-Berger -> advantage Grischuk if both he and Topalov draw or lose today]
5. If still equal, blitz games will be played (but only to decide the 1st place). [apparently only needed if both win - nothing mentioned about "more wins with black"]"

http://www.chessvibes.com/reports/linares-r9-grischuk-beats-topalov/
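
As an aside, rules 3 and 4 are mechanical enough to sketch in a few lines. A rough Python illustration of how they might be computed from a crosstable of each player's total score against each opponent - the numbers below are invented, not the actual Linares crosstable:

    # totals = final scores; vs[p][q] = p's combined score against q over
    # both games. Invented data for a 10-round double round robin.
    totals = {"Topalov": 6.0, "Grischuk": 6.0, "Aronian": 5.0,
              "Gelfand": 4.5, "Gashimov": 4.5, "Vallejo": 4.0}
    vs = {"Topalov":  {"Grischuk": 1.0, "Aronian": 1.0, "Gelfand": 1.5,
                       "Gashimov": 1.0, "Vallejo": 1.5},
          "Grischuk": {"Topalov": 1.0, "Aronian": 1.0, "Gelfand": 1.5,
                       "Gashimov": 1.5, "Vallejo": 1.0}}

    def rule3(player, rounds=10):
        # Rule 3: points scored against players who finished on 50% or more.
        return sum(s for opp, s in vs[player].items()
                   if totals[opp] >= rounds / 2)

    def rule4(player):
        # Rule 4: successively strip the points scored against the lowest
        # score group, then the next one up, recording the adjusted score
        # after each cut.
        adjusted, score = [], totals[player]
        for group in sorted(set(totals.values())):
            bottom = [p for p, t in totals.items() if t == group]
            score -= sum(vs[player].get(p, 0.0) for p in bottom)
            adjusted.append(score)
        return adjusted

    for p in ("Topalov", "Grischuk"):
        print(p, rule3(p), rule4(p))

Tied players would compare the rule3 totals first, then the rule4 lists element by element until they differ.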

"even I know that usually Linares is rated for the next list even if it's past the deadline."

I'm perfectly aware of what's happened in the past. As the ratings expert, I'm also perfectly aware that FIDE indeed has started to strictly enforce their rules for deadlines, much due to the mess in 2007.

These days, only the few FIDE organized events EXPLICITLY listed in the handbook are rated if they finish after the rating deadline, which now is so near to the date of publication that there really is no need to allow any exceptions.

The historical (and now obsolete) inclusion of Linares for the April list hence means nothing. Do you want a link to the FIDE statement about the change of practice?

Grischuk didn't save Linares. The #1 seed would have won and it still would have been marred by only 6 players and only 2 of the top 5 bothering to show up.

"Anyone remembers when Anand was supposed to cross 2800 for the first time, then Linares didn't went into the next list because it was to late, then later it was fixed?"

The issue wasn't crossing 2800 for the first time, but becoming World Number one for the first time.

"As the ratings expert"

As the frogbert expert, I say you know nothing (like Sgt Schultz) and you like it! ;)

"But is it really the case Topalov won't pass Carlsen on the live list by beating Gelfand? How could that be the case? I thought +3 was +3 and that +3 was good enough to do it."

+3 is indeed +3, but +3 was never good enough to "do it".

Topalov's expected score prior to the event was 5.78 points. +3 means 6.5/10, and 6.5 - 5.78 = 0.72

2805 + 10 * 0.72 = 2812.2

Carlsen is rated 2812.9 ~ 2813 for March 1st.
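
For anyone who wants to reproduce such numbers, the live rating arithmetic is just the standard Elo update: new = old + K * (score - expected), with K = 10 for established players who have reached 2400. A quick Python check of the figures above - the per-game expectation formula is included for completeness, but here we simply plug in the 5.78:

    def expected_score(own_rating, opponent_ratings):
        # Standard Elo expectation, summed over all games played.
        return sum(1.0 / (1 + 10 ** ((opp - own_rating) / 400.0))
                   for opp in opponent_ratings)

    def updated_rating(old, score, expected, k=10):
        # FIDE uses K = 10 for established players who have reached 2400.
        return old + k * (score - expected)

    # Expected 5.78 over ten games; +3 means 6.5/10.
    print(updated_rating(2805, 6.5, 5.78))  # 2812.2 - still behind 2812.9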

I'm not sure if you saw "John's" ironic comment to which I replied. Here it is:

"you are supposed to be the rating "expert" and even I know that usually Linares is rated for the next list even if it's past the deadline."

And you clearly need to brush up on your frogbert knowledge - it seems to be lacking in several respects.

"Test your engines – the Silver Openings Suite
23.02.2010 – One way to measure the relative strengths of chess engines is to play them against each other – see who wins and by how much." Chessbase

This is also the only objective way to measure the relative strength of openings for comparative purposes. People who refer to chessgames.com for this purpose just don't get that people can make mistakes in the opening, middlegame, or endgame that have nothing to do with the objective strengths of openings relative to each other.

"Seems like a good chance to start following their own rules and not rate it if it's over the deadline."

Actually they followed "their own rules" also in 2009 regarding Linares. The deadline was then 14 days before the date of publication, and Linares finished on March 7th and was submitted for rating on March 13th.

In other words: no exception was made for Linares in 2009 either.

Grischuk is going for it with the Sicilian. I'm rooting for him and Anand in the WCh.

So this leaves Carlsen 1 (or 0.7) points ahead of Topalov if the latter beats Gelfand today - a highly significant difference, :)
As far as I understand, even if it were 2812.6 vs. 2813.4 ~ 2813, Carlsen would be on top based on number of games played (13 at Corus vs. 10 at Linares), but then people might argue that the reduction of the Linares field from 8 to 6 players prevented Topalov from becoming #1 again ... [neglecting the fact that Linares shouldn't/won't be rated for the next list].

BTW, to the best of my knowledge frogbert is right that FIDE (in this context!) enforces its own rules. This was an issue at the end of 2008 when Nanjing wasn't rated for the Jan2009 list - Topalov fans were disappointed and complained about it, particularly because (in accordance with FIDE rules) a parallel GP tournament was rated.

What happened to the Chess.FM broadcast with Mig and Yermolinsky?

"a highly significant difference"

Yup. The 0.7 points are as "highly significant" as the 8-point difference between 2805 and 2813.

Moreover, in a match setting, I think the differences between 2805/2813 and 2790 (Kramnik), 2787 (Anand) or 2781 (Aronian) are also "highly significant".

Irony aside, I merely answer these questions about tenths of a point or a handful of points - I don't raise them or consider such differences significant or meaningful. :o)

As a description of tournament results, Carlsen's and Topalov's ratings are basically indistinguishable (but indicate slightly better results than anybody else), and as match predictors (for a match between any two of the top 5) the rating differences between these 5 players are quite irrelevant IMHO (with all of them being within 30-35 points atm). Other factors would weigh far more heavily I think.

Oops, looks like Topalov just trapped Gelfand's Rook on e2 (move 23).

This is a very interesting opinion, coming as it does from the keeper of live ratings. I agree, but ratings, especially of the top guns, are still an interesting sideline.

Yep, Gelfand in all sorts of trouble - Grischuk only has to deal with approaching time trouble (and maybe a slightly worse position).

But oops, what's this? Topalov returned the exchange ... does he have any winning chances in the endgame with an extra (doubled) pawn?

I'm not so sure the exchange-up endgame was all that great. Was there any clear plan for making headway?

But this is even more clearly drawn of course!

Grischuk has 7 minutes for 16 moves. C'mon dude, don't do this to yourself again.

The only rationale I can come up with for Topalov's return of the exchange is that he wanted to play on Gelfand's time trouble. I thought Topalov would have played 32.Rd1 Bd6 33.c5 d5 34.Rg1. Yes, it would have been a long endgame, but he's clearly better if not winning. The return of the exchange makes it a lot easier for Gelfand.
Anand must be smiling right about now...

Yep, it seems just to be drawn. Really strange from Topalov - here's some of the Chesspro commentary:

After 31... Re4 the most accurate is
32.Rd1 Be5 [passive, and therefore totally hopeless is the position after 32... Re6 33.Kc2 g5 34.Kd3]
33.c5! d5 34.Rg1! and the white rooks are ready to break into the black camp.

"32.Rxf6 A very unexpected decision, why give black chances? In addition Veselin's playing without thinking, as if it's blitz or a simul."

Then there's some assessment of the pawn endgame after 35. Rd7, but it just seems drawn. It's not really clear what Topalov missed!?

"This is a very interesting opinion, coming as it does from the keeper of live ratings. I agree, but ratings, especially of the top guns, are still an interesting sideline."

I'd like to see the rankings (1, 2, 3) published, but the ratings unpublished. More than half the internet posters (so-called "chess fans") are more interested in the ratings than they are in the moves OTB.

Looks like Topalov wants to inflate his statistics as most sacrificial player.

Yes, he is particularly well known for his exchange sacrifices - this was his last chance @Linares ...

Looking at Chessok, Rybka seemed to choose the line Sakaev called passive and hopeless. It doesn't look that bad for Gelfand and maybe there's even a fortress. Sakaev earlier thought 26. c4 made it harder for Topalov's rooks to invade black's position by using the 4th rank - but that objectively it should still win.

Vallejo-Grischuk seems to be simplifying at just about the right time for Grischuk to make the time control intact!?

Another shared first for Topa could be a very good omen for him; the last time that happened, the guy became world champion and had a great year.

And drawn! Now just for Gelfand to hold the draw. Sakaev says he was a little inaccurate before the time control (it would be an ironic twist if it counted in the end!) - 39... Re4 40.b3 a6, instead of 39...Rc2 would have completely stopped white. Now the white king can get to b5 but it should still be drawn.

... and (this time) to accept a repetition draw. I wonder if he could follow Topalov-Gelfand? Irrelevant for the top final standings, Aronian-Gashimov is going into time trouble, both have 4 1/2 minutes for 10 moves.

Last I heard, they don't give points for playing sacrifices.

Is it just me, or is Chessdom's Topalov fanboyism going overboard with comments like:

"The position is easily won for White for such player like Topalov."

"Topalov is fixing d6 pawn,I think that Gelfand has to resign soon.In such level,players don't play the positions like this"

"If Topalov will win this endgame,it will be a great victory"

"Gelfand thinks that he can defend pawn endgame! Probably,it's a draw,but it's TOPALOV,so we cannot judge him." (upper cases in the orignal)

"More than half the internet posters (so called "chess fans") are more interested in the ratings than they they are in the moves otb."

I find that very unlikely, actually. However, while a handful of rating points are technically insignificant in themselves, the question of who's number one is an issue that people care about, including the top players. If you're number one by one insignificant point (in terms of a measure of _anything_, whether results or "strength"), you're still number one. Nobody can take that away from you.

Similarly, regarding the 2700 "border": Contrary to popular belief, this isn't a fixed point at all - and NOT due to rating inflation; inflation or not has little or nothing to do with it. The meaning of ANY given number is only "defined" within a specific list, as a relative measure against the ratings of the other players in the current pool. No two lists have ever been based on the exact same pool, and hence the "meaning" of any number - such as 2700 or 2041 (my FIDE rating) - in fact changes from list to list.

Still, for practical reasons it's convenient to "pretend" that 2500, 2600 and 2700 are "fixed" figures, and for the purpose of e.g. the live ratings, you simply need to have some kind of (fixed) cut-off. It could've been defined differently, of course, but that's not the point. Not here, anyway. :o)

At least the second and fourth comments do seem a bit over the top! To be fair Sakaev at the Russian Chesspro called Gelfand's chances of drawing after the exchange sac "illusory" (or "negligible" might be another translation), so it was reasonable to say it should be easily won for Topalov.

I have some suspicions that the Chessdom commentator just took Sakaev's line and repeated it: "21.Bxg5 Re6 22.h6 g6 23.Rh4 Qe2 24.h7 Kh8 25.Qxe2 Rxe2 26.Be3 - with the king coming to d1..." Though it was fairly straightforward, so perhaps they both saw it at the same time. I can't access their site any more.

Hey Frogbert, on a slightly different subject. FIDE should drop their rating lists and all that noise about the "dates when they appear and events included crap" and instead use your "real time" rating. I for one find it much more useful. Why not use the real time stuff? They should pay you for that service too.

D.

Does this surprise you? Chessdom has a reputation to defend or maintain ... .

But actually the first two comments may even be correct - I find it more odd that they take the opportunity to mention Kramnik's loss against Naiditsch from Dortmund 2008 (also a Petroff, but a different line), yet such things aren't unprecedented either on planet Chessdom.

At the ratings meeting in Athens last year, we discussed the question of how frequent the rating list could really be. There are federations that go through detailed verification of results before submitting to FIDE, and thus there is a need to leave time for this. It might work to go "live" for the results of the top 20/50/100/whatever, but it would be a real headache to have ratings not get updated all at the same time, and so I think the frequency of the complete list needs to follow the frequency of the lowest acceptable federation review. Seems like it might go down to monthly but that's as far as it would get without radical change in the federation review process.

I like the attempt to put a positive spin on the exchange sac - "Of course,the position before 32.Rxf6 was won for Topalov,but he should play that final a lot of moves,probably more than 70 and 80 so he decided to play more interesting!" :)

Oddly enough, though, Ipatov seems to be a 16-year-old Ukrainian: http://players.chessdom.com/alexander-ipatov

Meanwhile Gelfand seems to have taken pity on Topalov and is trying to find him some winning chances...

While after move 26 Ipatov thought that Gelfand had to resign soon ...
Nothing odd with Ipatov being Ukrainian, though - he is actually living in Spain (like Topalov), maybe that's the connection. And he is not the only Topalov fan (and Kramnik hater) from a Spanish-speaking country ...

Looking like quite an appropriate end to a very weird tournament :)

Does this have anything to do with Gashimov's presence in the room?
http://www.chessgames.com/perl/chessgame?gid=156280
which brought Azerbaijan gold at the European Team Championship

Super-GM???? Gelfand played the endgame like an under-2000 (rather, 1800) player. Topy is just super lucky; he undeservedly won the tourney. He should split the money with Boris (well, it might be true!!!)


I think this is a draw ... Black can just keep his king on the e-file and the rook on the b-file, thus keeping the white king stuck on the c-file; therefore the two white pawns cannot move. As soon as the white rook moves over to the e-file to check the black king (making room for the white king), it will have to move back right away.

I like the timing of the previous post

YIPEEE nice nice nice Toppy

What utterly stupid comments here.


Ooops, I forgot that White could sac the rook for Black's pawn. Good rook endgame play by Topalov. Turns out 32.Rxf6 was a brilliant move :-) he sees far more than the engines. Anand should be really scared now.

IM Ipatov had already left, otherwise we may now get a comment such as "I told you he is TOPALOV" - calculating 30 moves ahead, everything was forced after the BRILLIANT exchange sacrifice.

At least on this occasion the Spanish-speaking fans seem to know better than others. Oh my, my smile is starting to hurt my face.
:)

Looks like 48...Ke8 (instead of ...Kf6) was a serious inaccuracy and then 49...a2 was the losing blunder - plus maybe just before the time control (the story of Topalov's tournament!) Gelfand may have been a little inaccurate to give any chances at all.

Brownie points to anyone who can explain why any white move other than 57.Ra5! only draws. (Topalov took a long time on that to add some tension.)

Leontxo announced a post-game interview with Topa; if that's the case I'll keep you informed.
:D

Roamingwind, please....! After I posted that Topy was winning, you said it was a draw???? Then you come and say good endgame by Topy, when Gelfand could have found a draw on at least 3 separate occasions! He tried, but he got a lot of help from Boris.

Sakaev's summary at Chesspro:
BLACK RESIGNED It looks like Boris Gelfand was very tired, as that's the only possible explanation for how poorly he played today. Veselin Topalov, on the other hand, played well, except for the 32nd move. But, as they say, you don't judge winners!

It looked like TopalOFF, but then GelFOUND a way to lose the rook ending ... .


Yes, I said it was a draw based on what I, a patzer, can calculate. Turns out I was wrong.
I don't see anything bad/shameful about that. Otherwise, we could just wait for the end and not post anything during the game, lest our ideas turn out to be incorrect. What is the fun of that??

I know Gelfand must have erred somewhere along the line (for example, moving the king to the back rank on the 48th move), but still it was well played by Topalov to take advantage of that.
Maybe because I like endgame play.

Google translates the idiom "you don't judge winners" as "everybody loves a winner", but I think that might be going a bit far in this case :) Though I love Aronian who seems to have had a perfect tournament - he beat his consecutive draws record and then ends with a win!

Thomas, will you ever skip a chance to do your annoying nitpicking on Chessdom, Topalov, etc...? It's OK that you're not very fond of Topalov, but if you have 1% gentleman hidden somewhere in you, perhaps today you should appreciate something about Topalov.

Chessdom is a site that comments on Chess besides Topalov and will continue to do so. It’s stupid to suggest favoritism in a Chess commentary. In this game you either lose or win, there is no space for the commentator to show favoritism while the game is on the board. If your player is losing there is no way to hide that.

D.

"Nothing odd with Ipatov being Ukrainian, though - he is actually living in Spain (like Topalov), maybe that's the connection. And he is not the only Topalov fan (and Kramnik hater) from a Spanish-speaking country ..."

Nerd.

D.

Ok, some fragments of Topalov's Spanish interview with Leontxo:

Topalov said he was happy and very exhausted, that he was lucky to play against the oldest participant in the last round, and that in his opinion exhaustion decided the tournament.
He pointed out that 48...Kf6 could possibly have been better (drawing) than what Gelfand played (...Ke8).
He commented that his idea for this tournament was to experiment a little, and that he had to play many openings for the first time in his career because of his forthcoming WCh match.
Veselin also pointed out that he made many mistakes in the event and that those mistakes are a lesson for him to learn from.
Leontxo asked him about Carlsen being a genius, and whether he considered Magnus to be like Anand in terms of genius.
Topa said that they are two very different players and that he is not particularly comfortable with the word "genius", since it requires a lot of work to be at the top of the game. He called Carlsen the most important chess discovery of the last few years.
When asked which word defines him better (genius or hard worker), he said that he is without a doubt a hard worker, and that it would be disrespectful to claim the word genius for himself since he knows so many talented people.
Leontxo also asked him what could be the decisive factor in his match with Anand; Veselin said that in his opinion having "good nerves" could be one of the most important things.
He said he wants to arrive relaxed and fresh at his WCh match, and that he is not obsessed with preparation.
Leontxo congratulated him on his victory in Linares and said that errors are an important part of the game between humans, then asked: what would happen if everybody played flawless chess?
Topalov smiled and answered that in that case there would be no chess.

Nice tournament guys, it's been a pleasure.
:)

"Another shared first for Topa could be a very good omen for him , the last time that the same thing happened the guy became world champion and had a great year."

I assume this should be considered a bad omen for Topalov then, missing out on the shared win. :o)

They ought to divide the rating numbers by a factor of ten.

Then folks would look even sillier arguing, for example, that a 281.3-rated player is oh so much better than a 280.5-rated player or a 279.0, or a 278.7.

I left the drawish endgame between Topa and Boris, to watch something more important: The Winter Olympics, Men's 4x10km Final, in Vancouver.
Seems Topa was luckier than Norway though, who "only" won Silver.

Congratulations to Topalov. The best fighter won Linares 2010. I really, really look forward to the WC match in April!


Thanks Manu for the translation, I couldn't listen to the interview today.

This shows something that has always characterized Topalov. He is very modest and honest in his answers during interviews. Of course, bashers from the WCC 2006 might argue based on what happened at the time, but that is plain ignorance of the Topalov personality we all knew before 2006 and after it (until 2008).

What 2006 showed is that Topalov was more fragile as a person than Kramnik, and his (or his manager's) strategy of showing extra confidence in that period backfired and left him completely disoriented; a reaction like that was something that could happen, especially when you put your whole life into nothing but chess and the ambition to win.

Now it seems that Topalov might have a girlfriend or something that gives him more stability as a person without hampering his performances. Given that Anand is already a very stable player and both are in great shape, I think this upcoming match will have plenty of exciting chess.


Oddly enough the discussion on the final round at Shipov's forum ended up with them wondering (more than a little tongue in cheek) if the play in Linares has been so haphazard because of sleepless nights spent watching the Winter Olympics! Apparently Gelfand's a big sports fan :)

Well maybe they will stop inviting Gelfand to these tournaments at least, he makes Ulf Andersson seem exciting. Sorry Boris fan(s) but he can take his Petroff and Slav and stick'em......

Well, I still say Topalov won despite playing some less than sterling chess. But he did explain it well, and in the end, played better than his opponents. Does that sound like a begrudging admission of the obvious? It is!

Go Vishy! Best wishes and good luck in Sofia. It will certainly be an interesting World Championship match.

"folks would look even sillier arguing, for example, that a 281.3-rated player is oh so much better than a 280.5-rated player"

Greg, a more constructive approach, I think, would be for FIDE to publish their ratings with a measure of spread/variation, based on the player's previous 6 ratings or something (the last 12 months).

That would give a better sense of the fact that any "official rating" is simply a somewhat "random" snapshot of something that's constantly changing (here: the history of achieved results, measured along a single axis).

Of course it might create some "problems" in terms of the rankings when you have this scenario:

Player A: 2790 +/- 15 points (p% confidence)
Player B: 2780 +/- 30 points
Player C: 2770 +/- 45 points

Who's the better player?

Such a presentation would still give more information than the current one-dimensional, "exact" ratings.
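
Producing such a figure would be trivial from the published lists. A toy sketch along those lines, with invented list values and a plain one-standard-deviation spread rather than a proper p% confidence interval:

    import statistics

    def rating_with_spread(recent_ratings):
        # Mean and sample standard deviation over a player's recent lists,
        # e.g. the previous 6 bi-monthly ratings (roughly 12 months).
        return statistics.mean(recent_ratings), statistics.stdev(recent_ratings)

    # Invented values - nobody's actual rating history.
    mean, spread = rating_with_spread([2788, 2801, 2796, 2812, 2805, 2790])
    print("%.0f +/- %.0f" % (mean, spread))  # prints "2799 +/- 9"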

Regarding the "exactness" with which ratings are calculated between lists, with tenths of a point and so on: If you go to the supermarket and one day are presented with a sum of $27.90 for what's in your basket, and another day with $28.13, I hardly find it a great system flaw that you get information down to the single cent - after all, the items in the store are priced at that granularity.

However, if you as a customer think or feel that there's a big and significant difference between the two sums (maybe you had the exact same items in your basket!), then my point of view is that it is YOU that has a problem, more than anything else. Either you are very poor, or there's something wrong with your perception of money and its value.

Most people learn about money's worth when they are kids - or at least when they get their first summer job. Unfortunately there are few moms and dads around to teach chess players about "ratings' worth". That doesn't mean that it's forbidden to spread some information - or to suggest possible "pedagogical" improvements for FIDE.

"There are federations that go through detailed verification of results before submitting to FIDE"

My educated guess is that this first and foremost spells "the German federation" and not very many others. I never understood how they could convince QC/RC to revert the deadline for result reporting from 30 days after an event's completion back to 60 days. I mean, how much time does it take to verify the results from a single event? (In particular because the big majority of events are paired with software that produces rating reports automatically...)

Anyway, even going down to monthly lists doesn't change much as long as the official lists are based on tournament reports from organizers that are allowed to sit for 60 days after an event's completion before they pass the results on to the FIDE rating server for inclusion/calculation.

In fact, the more frequent the official rating lists are, the more room there is for manipulation by organizers, in terms of "timing" which list a certain event is counted towards - as long as the absurdly generous 60 days of today remains.

My recommendation would in any case be that FIDE starts to pro-actively collect all relevant results for at least the top 100 players - and alternatively even requires that top 100 players register up front any rated events they plan to play, to facilitate the collection of their results. I fail to see why longish QA processes performed in only a few member federations should be allowed to be a hindrance to more frequent and up-to-date result reporting - and the latter is a prerequisite for more frequent rating lists.

"instead use your "real time" rating. I for one find it much more useful. Why not use the real time stuff? They should pay you for that service too."

I'm glad you find the live ratings useful, but they are obviously doomed to remain unofficial for the limited time I'll continue to keep them updated. Official live (or more frequent) ratings would need to depend on some official reporting system that is completely trustworthy and kept up to date with all recent results. (See my previous comment.)

Thanks to several people around the world, I'm usually (nearly) complete on results, but there's no guarantee. And I occasionally make punching errors - which a couple of other guys are generous enough to point out, too. But it's not a sustainable or working model for anything remotely official, I'm afraid. It's not really sustainable for anything else, either ;o)

Not to hijack the thread, but there is a story out that Tobey Maguire will be playing Bobby F in a movie about the 72 world championship.

http://www.firstshowing.net/2010/02/24/tobey-maguire-might-play-a-chess-master-in-pawn-sacrifice/

In case anyone is interested...

Don't worry Dondo. You cannot hijack a thread that has already been hijacked by multiple hijackers. Thanks for the link. By far the most interesting post.

Yesterday I was not the only one, and not the first one to pick on Chessdom's live commentary - I merely added two things:
1) Ipatov is no exception (is the whole team selected, instructed and/or paid for Topalov fanboyism?)
2) It goes along with Kramnik-bashing whenever a remote opportunity (such as Kramnik losing a game in a quite different Petroff line) arises. Similarly, I don't know how many times Manu has mentioned that Kramnik, on one occasion, missed a mate in one ...

In the Chessdom live coverage, at least the comment on the pawn endgame is odd: If a pawn endgame is drawn, TOPALOV or even R Y B K A cannot win it by a miracle - but gelfand (or someone like thomas if he got that far) could still lose it ... .

Finally it seems you misunderstood my comment about Ipatov being Ukraino-Spaniard. All I wanted to imply is that Topalov has fans in other countries besides Bulgaria, and that Ipatov's Spanish connection _may_ have helped him to get his Chessdom job. Please send your insults to mishanp - while I often agree with him, the following quote is odd IMO:
[mishanp] "Oddly enough, though, Ipatov seems to be a 16-year-old Ukrainian"

Maybe I could have elaborated, but no insult was intended. I'd just assumed the commentator was Bulgarian and it was a surprise to see both that he was Ukrainian and that he was so young. Seems to be a trend of late with Giri at Chessbase.

It was odd to mention that particular game of Kramnik's, seeing as the key point was that Topalov's 11. h5 was an improvement on Caruana-Kramnik from Wijk last month. It's a shame Gelfand didn't get in Kramnik's ...Qa5, ...Ne5 & ...Be6 - Shipov liked black's chances - though he admitted some nostalgia for his childhood, when he'd play a move like 14. Rdg1 and always end up mating his (inferior) opponent. He gave 16...Nxh2 as a blunder. Not Gelfand's finest hour to manage to lose that game so badly twice!

The conspiracy theory: Vallejo played an honest game against Grischuk. I don't know about Gelfand; it was a basic endgame, and Ke8 was about the only losing move. We all know that Topalov is a wealthy guy compared with the rest of the field, so he couldn't care less about the prize; he wanted a win before the WC match and to be in the Grand Slam thing. Some folks say he deserved the overall win more than Grish. Why???? If you look at their encounters, Grish beat Topy in a must-win situation, and it was a clean win by White - from small advantage, to advantage, to decisive advantage - a nice game by Grish. On the other hand, in their first game White had some advantage from the opening and then played riskily (on Grish's time trouble), and Grish got a decisive advantage, -+2.2 or so, but he could not handle the zeitnot. It wasn't a game for Topy to be proud of. Then Grish won nicely against both Gashimov and Gelfand; Topy won with luck against Vallejo (he was lost again), and now this weird endgame against Boris. So where was his superiority over Grish? I'm not talking to Topy hardcore fans, but to chess fans in general.

Has it occurred to anyone that Topa's erratic play is simply due to lack of tournament practice?

Topalov lucky to win Linares 2010? Of course he was lucky in some games. Or in some phases of the games, to put it more correctly. This is not controversial at all. Topalov himself pointed out his luck in an interview (quoted a couple of times in this and the previous thread). The super-GMs are quite even in strength. Often some luck is the deciding factor when a player wins a tournament. It's seldom we see "undisputed/dominant" wins like Carlsen's victory in Nanjing last year.

Ipatov's excitement about Topalov might be explained by Topalov's exciting play…
Remember also that his writings are actually "live chat", not meant for scrutiny after the fact. He is from Ukraine, living in Spain, but writes in English. His grammar is barely OK, which might introduce some improper sentences. Also, he is only 16 years old. -Unless he is a frequent reader of The Daily Dirt, he hasn't learned the consequences of "loose cannon" talk.

Or, if you go for the conspiracy theory, he is employed by Chessdom, who has close business relations with Chessbomb, who is a Bulgarian company, which (wild guess) has Danailov pulling its strings.

Make your pick.

I actually thought chessdom was an official Topalov fan site. Does it pretend to be independent? That's funny.

It may be sort of a combination of both: Ipatov chose to work for (or with) Chessdom because he is a Topalov fan - just like other "mainstream" journalists may choose between a tabloid and a quality newspaper, between a left- or right-wing paper etc. ... . Chessdom being Bulgarian of course doesn't mean that Danailov exerts control or is directly involved, not even that they are pro-Topalov/Danailov, this I infer from their track record.
Nothing wrong with being enthusiastic or excited either - personally I prefer objective coverage but I am not part of their (presumed) target audience!? This leaves their recurrent Kramnik bashing, which is more amusing than annoying ... .

@Rob Fish: Is Ipatov only 16 years old? I’m most impressed!!

Carlsen is #1. I wish Anand and Topalov were #s 5 and 6 in the world just to bug all the ratings freaks.

I think Chessdom is what the late lamented veselintopalov.net morphed into. They certainly both used/use the unusual "N1" for no. 1, as in "Veselin Topalov - a page dedicated to the N1 chess player of the 21st Century".

I actually find Chessdom to be more Carlsen fans than Topalov fans. In any tournament where both Topalov and Carlsen participate, they praise Carlsen and pay no attention to Topalov.
Not to mention they ran the official live games and site for the BNbank blitz, often write about Hammer, etc.

Ipatov himself was equally enthusiastic about Carlsen, so what? He is biased?

And in general he was most enthusiastic about Anna Muzychuk :)

rdh, when you write stuff like this you look like Nigel's twin. I can detect something is off.

So Thomas never read objective commentary on Chessdom like “Topalov is lost”, “Topalov lost track...”, etc…??

You guys…

D.

In fact, the more frequent the official rating lists are, the more room there is for manipulation by organizers, in terms of "timing" which list a certain event is counted towards - as long as the absurdly generous 60 days of today remains.

My recommendation would in any case be that FIDE starts to pro-actively collect all relevant results for at least the top 100 players - and alternatively even requires that top 100 players register up front any rated events they plan to play, to facilitate the collection of their results. I fail to see why longish QA processes performed in only a few member federations should be allowed to be a hindrance to more frequent and up-to-date result reporting - and the latter is a prerequisite for more frequent rating lists.

***

I just do not see the value in the live rating lists -- other than marketing (which is a separate function).

From a ratings usefulness perspective, it is absurd to have ratings change daily or weekly...game by game.

What is your rating? I don't know -- let me check the instantaneous internet to see.

Not very useful for pairings, not very useful for invitations.

Official ratings ARE useful for those things -- the concept of a chess rating includes the idea that chess performance doesn't change dramatically over short intervals of time (unless one is very young and improving...or older with dementia).

Thus, your rating shouldn't change very much...and a tourney by tourney change should -- for most purposes -- be ignored as NOISE.

Does it help prediction to have ratings game by game, minute by minute? Perhaps at some abstract level...but not really.

Does it deflect attention from what ratings are FOR? For predicting results...for generating pairing lists (irrelevant for round robins like Linares) and for generating invitations.

Ratings themselves shouldn't be the only criterion used for invites...as they are almost assuredly lacking the fine detail necessary to select among dozens of deserving candidates.

So...no live ratings please.

And no casting aspersions on federations that might need to double-check submissions.

FIDE does require that its rated events be listed in advance, which at least identifies the body of rated games that should be received for a ratings period.

USCF (amateur non-FIDE events) don't have to be registered, so there is an unknown body of games/events to be submitted -- you never know exactly how many will be submitted as some are planned in advance and some are spontaneously played.

But I'm sure larger federations -- like USCF -- struggle to ensure that submissions from FIDE events in their areas are submitted TO THEM in a timely fashion...so that the results can be inputted into the special FIDE formats and then submitted.

This may seem easy when you are doing so for the top 10 players in your federation.

But not when you might have 1,000+ rated players...down to very low ratings...and you need some oversight to catch obvious fraudulent submissions.


Live ratings make great marketing noise.

Live ratings (or more frequent ratings -- not just Frogbert's invention) are lousy at doing what ratings should do.

###

I've certainly been critical of Topalov on this blog before, and I'm no Topalov fan, but I also don't see how one can say Topalov was lucky to win Linares.

Needing a clutch win in the final round to win the tournament, Topalov simply outplayed his opponent in a tense ending. Great players have the ability to pull out a win when they need it, and Topalov showed he could do that. My hat is off to him.

> Even Grischuk agrees with you.

Actually, Grischuk's point has nothing to do with mine. Grischuk was talking about all the games in the tournament prior to the last round. My comments were _solely_ directed at the final game, in which Topalov was able to score a critical win by cleanly outplaying his opponent, like a great quarterback scoring the winning touchdown on the final drive with 2 minutes left on the clock. Doing that offsets any amount of luck he may have had prior to that final game in the tournament.

Mmm, but when you said:
"but I also don't see how one can say Topalov was lucky to win Linares."
maybe you should have said to win "the last game" instead ...

Anyway, it's nice to hear Grischuk calling the claims of luck an exaggeration.

> Mmm, but when you said:
> "but I also don't see how one can say Topalov was lucky to win Linares."
> maybe you should have said to win "the last game" instead ...

No, I meant exactly what I said. In every tournament, a player experiences a certain amount of luck as well as a certain amount of skill in outplaying his opponents. A person "deserves to win a tournament" (as opposed to "is lucky to win a tournament") if the amount of luck, over the tournament as a whole, is negligible. I didn't think it was negligible prior to Topalov's last round, but Topalov's great last-round performance tipped the balance in my opinion.

"No, I meant exactly what I said."
You talked about his alleged luck in winning Linares:
"but I also don't see how one can say Topalov was lucky to win Linares."
and then claimed to be talking only about the last game:
"My comments were _solely_ directed at the final game"
Which is fine, I guess ...
But I'm sure you can understand that when a person hears about "luck in a tournament" they have no reason to guess that you were talking about a particular game.
But never mind, nice talking with you.
:/

Carlsen is #1. I wish Anand and Topalov were #s 5 and 6 in the world just to bug all the ratings freaks.

This type of nonsense is why FIDE should go back to listing ratings in blocks of 5 points -- i.e.

2815
2810
2805

And it would be clear there is no difference among various players.
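
That would be a one-liner, for what it's worth - just round each published rating to the nearest multiple of five (a sketch, not anything FIDE actually does):

    def block_of_5(rating):
        # Round to the nearest multiple of 5, e.g. 2813 -> 2815, 2787 -> 2785.
        return 5 * round(rating / 5)

    print([block_of_5(r) for r in (2813, 2805, 2790, 2787)])  # [2815, 2805, 2790, 2785]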

chesspride, you listed a whole bunch of things that live ratings are not good for and then you make a non-sequitur "no live ratings please."

It's like saying, "dogs are not good at helping the kids do their homework or for getting me to the store to buy groceries or for fixing the car when it breaks down. So no dogs please."

Dogs, like live ratings, are just for fun.

Elo himself emphasized that the system bearing his name did not have single-digit accuracy. Trust FIDE to get it wrong...

Hm, strange. But the scale is arbitrary. Why didn't he divide everything by 5, then?

If I recall correctly (it's in his book "The Rating of Chessplayers, Past and Present"), the USCF wanted it that way.

> chesspride, you listed a whole bunch of things that live ratings are not good for and then you make a non-sequitur "no live ratings please."
>
> It's like saying, "dogs are not good at helping the kids do their homework or for getting me to the store to buy groceries or for fixing the car when it breaks down. So no dogs please."
>
> Dogs, like live ratings, are just for fun.

****

Please read what I said (again).

Ratings are used to predict results in future games -- that is their primary purpose.

Ratings are also used for invitations, pairings in open events and various things.

For these (primary) purposes....ratings must be based on a rather large body of work (games). The assumption is that they are (fairly) stable over the short-term.

Thus, the official ratings must come out once every 6 months or every 3 months or even (gasp) once a month.

But they cannot and should not change weekly or daily or game-by-game -- that is not real change, it is noise (i.e. mostly error if the goal of measurement is to find the "real" or "true" score).

Thus, live ratings are not -- cannot be -- used for these sorts of primary purposes.

They are a distraction -- a type of "fake" rating or "deceptive" rating. They turn chess ratings into a type of token economy or gambling.

So their primary (only) purpose is one of marketing -- i.e. lists of "top 10 fast-rising stars" or "hot players" or some such drivel that belongs in Vogue or Cosmopolitan magazine rather than a rating list.

CAN you generate ratings game by game? Yes.

Should you? Probably not...or at least don't release the results daily...release them once per month.

Otherwise you confuse real change with spurious change.

Would you want your college grades to change day by day -- minute by minute (they could -- but should they)? How about your review at work?

No...that's just as absurd as saying Topalov trails Carlsen by seven-tenths of a rating point.

I should also add that it is precisely because there are quite a few "equivalent" players who vary by only a few rating points...that ratings should not be the sole source for any invitational formula or prize award.

That puts too much stress on ratings....especially as players can choose where, when and whether they want to play in particular events/vs. particular players.

Instead, better to award prizes or invites via qualification or a mix of ratings and qualification or some other mixed approach. That is why ratings are not ideal for invitations or awards...and why live ratings are even less useful for such things.

>From a ratings usefulness perspective, it is absurd to have
>ratings change daily or weekly...game by game.

Absurd?!? Nothing absurd about it. It is a simple fact of life that ratings change after every game. The only absurdly senseless thing is to report them weeks/months later. Do you like to read yesterday's papers too?

Now, the argument as to what criteria tournaments invite players is a totally different and much deeper subject. Obviously it is not based on ratings alone.

> Does it help prediction to have ratings game by game

Prediction??! What, are you a gambler now? Would you gamble on Shirov or Ivanchuk's ELO to predict how they'll do in the next event?

Anyway, I am glad Runde does what he does, and if some don't like what they see they can close their eyes.

D.


As related earlier in this thread, it would be impossible to pre-register every tournament played under the USCF rating system, as some go from inception to planning to execution in about two minutes!

But would it be possible to pre-register every tournament rated under the FIDE rating system? I know that the scope of this ratings system has increased as of late -- but if it were possible, then could not ratings be revised after each tournament (rather than game-by-game, or month to month)?

Perhaps answering my own question: the problem would be posed by league results, which often take months to complete while the participants play in other tourneys.

chesspride, you think it's silly for ratings to change on a per-game basis (or after every event), yet the fact that they aren't updated so quickly actually makes it easier for "tourists" to make huge rating gains into the top 10 (or top 20, etc.). By the same token, a player in bad form has his rating loss amplified. This might actually have a greater influence on invitations than the potential failure of organizers to correctly interpret more frequent changes to the rating lists.

"Does it deflect attention from what ratings are FOR? For predicting results...for generating pairing lists (irrelevant for round robins like Linares) and for generating invitations."

Where did you get the answer to what ratings are FOR, chesspride?

The purpose of ratings is to measure results.

* NOT to predict them

A side effect of people having ratings as a measure of previous results, is that it can be considered regarding invitations. In Swiss pairings (or other similar systems) they can be used as a tool for making the pairings in order to get some desired (or maybe undesired?) effect.

At the end of the day, though - the PURPOSE of ratings is to MEASURE results/performances.

Ratings don't directly measure strength - obviously somebody's strength doesn't take a huge hit because of one very bad event. But the player's recent history of RESULTS does. And this is what's visible in the rating (and its change after a good/bad event).

From what you write, it seems to me that the problem you have with ratings (live or not) is a wrong (or not very useful) perception of what they are and what they are for, chesspride.

If you start thinking about ratings as a measure of results, you'll see that it makes just as much sense updating the CURRENT NUMBER continuously as it does choosing some "random" sample of the (in fact) continuously changing number to be THE official rating for a period of time.

If what you really want is some more constant measure of strength, then what I suggest YOU start advocating, is that FIDE changes its official ranking number for each player to be the average of the last 50 or 100 data points - where each data point is the resulting rating number for each individual game rated. This average could be additionally decorated with a measure of spread. The average over the last 50-150 data points would say much more about someone's LEVEL than the current habit of using the absolute last data point - where does the graph continue? Nobody knows. [Of course, more fancy algorithms than averaging could theoretically be used if one really wanted to try to say something about the future - e.g. extrapolating some tendency - but I would recommend sticking to the past.]
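
For concreteness, a minimal sketch in Python of this suggestion, assuming we keep a list of per-game "data points" (the rating value after each rated game); the window size and the toy series are illustrative assumptions, not anyone's actual data:

from statistics import mean, stdev

def rating_summary(points: list[float], window: int = 50) -> tuple[float, float]:
    """Average and spread over the last `window` per-game rating values."""
    recent = points[-window:]
    return mean(recent), stdev(recent)

# A stable ~2780 player whose very last data point happens to be a dip:
series = [2780.0 + (i % 7) - 3 for i in range(120)]  # oscillates 2777-2783
series.append(2768.0)
avg, spread = rating_summary(series)
print(f"last point: {series[-1]}, average: {avg:.1f}, spread: {spread:.1f}")
# The last data point says 2768; the 50-point average still says ~2780,
# which is the point of reporting an average plus a spread instead.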

Going back to more coarse-grained measurements: 5-point or even 10-point intervals in reality do NOTHING to solve what you apparently consider to be the problem. Even if I'm a bit unsure whether there's really any problem here, except with the perception.

The most important thing would be if more people would actually understand what ratings are and what they mean. Ratings are DESCRIPTORS of results. We don't even WANT ratings to be anything near perfect predictors of results - such ratings would've been utterly useless, for anything. Ponder that.

"Ratings are used to predict results in future games -- that is their primary purpose."

That's a positive wrong. :o)

If you start out with a wrong assumption, what follows will be accordingly.

The very elegant thing about ratings is that they make it possible to compare the results of players that have played in completely different events against mostly different players - and still be able to say something meaningful about who have done well and who have done less well.

I still see many chess fans put too much emphasis on tournament wins, disregarding both the strength of opposition and whether or not the win was multiply shared and finally rewarded only due to some tie-break criterion (or blitz games).

Four 3rd places can be much more impressive than 5 1st places, if the opposition was much tougher in the former case. That's exactly what the rating system makes possible to understand. It's got nothing to do with PREDICTING anything, but rather about being able to "normalize results" across different events and opposition.

So again, the purpose of the chess ratings is to measure and describe results in a uniform and comparable way. The element of expectation is simply a technical bit used in how ratings are adjusted.

The misconception that ratings can be used in any meaningful way (or exist to be able) to predict outcomes of single games with any notable degree of certainty in an otherwise even field of players is exactly the kind of thing that spurs so many unfair claims about ratings and their usefulness - or the lack of such.

My three cents on the rating discussion - disclaimer: I criticize neither the system nor frogbert's work, only the way people might perceive it and draw far-fetched or wrong conclusions:

1) Going to blocks of 5 points may actually be confusing: Players with "true" ratings of 2783 and 2787 would appear to have exactly the same strength. But a player rated 2787.6 would appear to be "much" stronger than one rated 2787.4 (25-fold exaggeration of the real difference between them) ... .

2) One "feature" of the live rating list: daily ratings and rankings can depend on random effects of drawing of lots:
- Topalov was temporarily #1 because he beat Grischuk with white before losing with black. This wouldn't have been the case if the two games were played in reverse order (assuming the same results).
- At Corus, Shirov moved up to #12 (2741) by scoring 5/5 against the lower part of the field, then down to #18 (2734) by virtue of a minus score against the upper half [of course this also indicates how closely packed the subtop is ...].

3) frogbert's suggestion (in response to chesspride) to use "the average of the last 50 or 100 data points": This more or less corresponds to giving recent TPRs (last 5-10 events) for each player?
Maybe this could be given on the live rating pages? I assume it wouldn't take much extra time or storage space ... . Expected results (not new for those following the scene):
- Carlsen (and recently also Kramnik) are stable world-top players (>2750 or even >2800), yet Carlsen's Nanjing result was a positive outlier.
- Players like Ivanchuk and Shirov are much more unpredictable and variable.

Thomas,

I've actually been discussing various ways of adding/implementing average measures (or similar) plus an indication of spread to the live ratings with several kibitzers over at cg.com - there are a few technical problems since I only record/store data for players currently above 2700 - but a simple solution would be to only give this kind of info for stable (or "real") 2700+ players.

I'm also getting closer to "releasing" my performance profiles which show a player's performance over the last 18 months (or possibly some user-selectable period if I bother) mapped to the strength of his/her opposition - in addition to average performance and average rating in that period. These graphs illuminate a player's level over time, plus his strengths and weaknesses in terms of opposition, much better than simply stacking all the different results into a single, one-dimensional number does.

Personally I'm much more fond of doing that kind of thing than the tedious labour of updating the live ratings. The performance profiles are made directly from FIDE data - and hence will only be updated as many times per year as there are official lists released. Had FIDE given me "direct access" to all the data in their database, I could easily have made several other illuminating presentations without having to change anything about how the ratings are calculated.

I consider the argument (not yours!) about live (or very frequently updated) ratings being damaging to the ability to produce norm-tournaments and such just an example of not being able to think "outside the box". Even if it hypothetically had been possible to update all ratings continuously - and we'd chosen to do so - it would be very simple to define a "norm-applicable" rating for each player as many times during a year as one found useful - for instance 4 times each year.

At those 4 pre-defined times one could simply choose

a) average of last X data-points

b) the most recent data-point (today's "official" ratings)

c) some other similar measure of one's choice

to be each player's number that would count towards other players' norm chances, whether simply rating norms or title norms. There is absolutely no scientific or natural law that prohibits us from choosing such an approach - if one would prefer to have completely floating ratings.

"daily ratings and rankings can depend on random effects of drawing of lots:"

Sure. But why do you consider that a problem? As long as you don't consider changes of +/- 10 points (the difference of a win and a loss for the 2700 players that have an effective K of 10) to be very significant - or differences of that amount between any two players to be of great importance, such random effects are negligible.

Again, I think the problem lies in the perception - not with the rating changes or the rating system itself. Remember my example about the $28.13 and $27.90 grocery baskets.

"This more or less corresponds to giving recent TPR's (last 5-10 events) for each player?"

Not quite, but maybe "more or less" :o). In Jeff Sonas' Chessmetrics system, the rating numbers are simply "padded" performance ratings over a defined period, with stronger weight given to more recent results. Except for some technicalities about how he's "padding" these performance numbers, I think that approach in many ways results in more sensible/representative ratings than FIDE's "random snapshot" official ratings. [They are of course not completely random, as the snapshots are taken at given time intervals, but the amount of activity between any two sampling points varies greatly, both between players and even for each player, since very few (essentially none) play a consistent number of games in each rating period. Hence, what goes into someone's next rating is completely unpredictable from the system's point of view - at least until the rating reports have been submitted. But both in Sonas' and FIDE's system the "bucketing" is done on a calendar basis and hence with little or no regard to the amount of ACTIVITY. Thus, rating changes over one year might be based on 0 games, 10 games or 150 games. It's quite evident which provides the more significant evidence of a player's general level of play.]
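
To make the weighting idea concrete, here is a toy sketch in Python - emphatically NOT Sonas' actual padding/weighting formula (his details differ), just the general principle of giving recent performances more weight; all numbers are invented:

def weighted_rating(tprs_newest_first: list[float], decay: float = 0.8) -> float:
    """Geometrically down-weight older event performance ratings."""
    weights = [decay ** age for age in range(len(tprs_newest_first))]
    total = sum(w * p for w, p in zip(weights, tprs_newest_first))
    return total / sum(weights)

# One strong recent event pulls the number up more than an old one would:
print(round(weighted_rating([2790.0, 2720.0, 2750.0, 2700.0])))  # -> 2747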

Frogbert and others think it is silly to use ratings to predict future results.

> Where did you get the answer to what ratings are FOR, chesspride?
>
> The purpose of ratings is to measure results.
>
> * NOT to predict them

Yet that is the purpose of the rating system -- to help with Swiss-style tournaments so the players can be ranked accordingly....so that the "top" players are recognized and so that you maximize the chances that the top players win the event.

USCF's own rating committee views "prediction" as a primary rating purpose...which is why they maintain strict control over the formulas used so that their results mirror past results.

This is not exactly a controversial claim and I am surprised to see posters here objecting.

>

---

---

But don't you agree that the predictions which can be obtained from ratings are useful for assessing how well a given system measures skill, at least as a basis for comparison among competing rating systems? Not just competing systems, actually, but also as an objective way to optimize the rating system parameters (K-factor)? Otherwise, I could create any arbitrary rating system and claim that it measures skill even if it does a lousy job of it.

It will be interesting to see your "player profile" results. I suppose it will quantify the degree of truth in widely held beliefs such as Moro's relative dominance of weaker GMs compared to the elite, or the adherence of certain GMs to the "draw with black, win with white" strategy?

Oops, the post above was supposed to be preceded by:

"Ratings are used to predict results in future games -- that is their primary purpose."

---

"That's a positive wrong. :o)

If you start out with a wrong assumption, what follows will be accordingly."

---

To those involved in the rating discussion:
I refer to
http://www.ratingtheory.com/conclusions.htm

and particularly comments like:

[start quote]
Nathan Divinsky in The Chess Encyclopedia [D] calls the Elo System "a mathematically sound and universally accepted (1970) rating system for chess players." The year refers to the adoption of the Elo System by FIDE. Aside from a 1965 contribution by Elo to The Journal of Gerontology, there has been virtually no peer review of the system beyond the world of organized chess. One of the few external references is to be found in The Mathematics of Games, by J. D. Beasley [B]. Beasley offers this scathing footnote on the work of the late Professor Elo:

"His statistical testing is unsatisfactory to the point of being meaningless; he calculates standard deviations without allowing for draws, he does not always appear to allow for the extent to which his test results have contributed to the ratings which they purport to be testing, and he fails to make the important distinction between proving a proposition true and merely failing to prove it false."

Beasley nevertheless accepts the premise of the Percentage Expectancy Curve. His basic concern is the difficulty of demonstrating any suitable probability function for that role. Again the essential incoherence of Elo's position seems to have escaped notice.

[end quote]

So are the claims re Elo of:

1)"essential incoherence",
2)"unsatisfactory to the point of being meaningless",
3)"aside from a 1965 contribution by Elo to The Journal of Gerontology, there has been virtually no peer review of the system beyond the world of organized chess",
4)"the essential incoherence of Elo's position seems to have escaped notice"

generally accepted as valid?

Is it true that we need to go to the J. Gerontology for a peer review of Elo's "science" ....and that the review was by Elo himself!??
Mmm......Thomas?

Is Elo's statistical testing meaningless?

Help!

peers!? ya

Elo ratings are also used in Scrabble, bowling, tennis, table tennis and golf (just to name a few). ahahahaaaaeee!
o.k.

Understand that Hans Arild Runde is undertaking nothing new; he is merely showing current ratings as if FIDE were to publish them now. When FIDE publishes their list, the process begins again!

Alas, ratings are not science.
You want scrutiny?
You got Rybka.
But what if Rybka loses on time?!
Aah. It's 1970. GM's have opinions again
without worrying about the little fish!

Cheers!

@dysgraphia:
Your post confuses me. Where is your target? Are you claiming that Elo rating doesn't work properly?

"Frogbert and others think it is silly to use ratings to predict future results."

That's not exactly what I said. I won't bother to repeat what I wrote, though - but I suggest that you read it again.

"USCF's own rating committee views "prediction" as a primary rating purpose...which is why they maintain strict control over the formulas used so that their results mirror past results."

I'm not very familiar with the formulas used by USCF and in which ways they differ from those of FIDE. I do know that USCF maintains a really hopeless notion of rating floors (to avoid "sandbagging") that both hurts the system's predictive strength and causes inflation. Hence, anti-cheating measures obviously dominate the wish to maximize predictive strength.

But no matter what "view" the USCF rating committee has, it doesn't change what rating systems primarily DO.

They do NOT primarily predict things - they primarily DESCRIBE things, namely results. They do so in a way that allows results from different events against different opponents to be measured and hence compared.

For any Swiss-like system, you only need a mostly accurate ranking of the players. The significance of rating differences of 10-30 points or so is zilch - and even more so over short 5-6 round events, but also for 9-round events. Hence, the fine-grained rating system is useful for giving a strict ranking - but the strict ranking doesn't at all predict the expected outcome anywhere close to what's needed to JUSTIFY the clear favouritism of the top-ranked player(s) in a point group in a Swiss-like system. Go through any tournament's results and compare TPRs with ratings and you should probably understand what I mean.

"they maintain strict control over the formulas used so that their results mirror past results."

Chesspride, what do you think the above means? Which running modifications are made to the USCF system to tune predictive strength, if that's what you mean to imply here? "Maintain strict control over the formulas"? So that nobody "breaks in" and makes strange modifications to them? ;o) Seriously - how strong a correlation do you think there is between someone's rating and their result in a future, specific event, on average? And what do you think the USCF rating committee can do - on a month-to-month basis - to influence that?

Btw, I responded to several other of your claims about the "problems" with live or more frequent ratings - I can't see that you answered any of my points. Do you accept that all your criticisms have been refuted? :o)

"But don't you agree that the predictions which can be obtained from ratings are useful for assessing how well a given system measures skill, at least as a basis for comparison among competing rating systems? Not just competing system actually, also as an objective way to optimize the rating system parameters (K-factor)?"

Of course the correlation between rating differences and actual results is important. The philosophy behind the system is that of a "best estimate by now", though - we never assume that we've achieved the "correct" rating. In the FIDE system, if a player does less well than the system expected (or "foresaw"/predicted), one simply takes the consequence of that and adjusts the rating down, according to the amount of "underperformance". And similarly for an "overperformance".
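
For concreteness, a minimal sketch in Python of that adjustment, with toy numbers; note that FIDE's handbook uses lookup tables derived from the normal curve, while the closed-form logistic below is the common approximation of the same idea:

def expected_score(own: float, opp: float) -> float:
    """Expected score for one game, from the rating difference."""
    return 1.0 / (1.0 + 10.0 ** ((opp - own) / 400.0))

def adjusted_rating(own: float, opponents: list[float], score: float,
                    k: float = 10.0) -> float:
    """New rating after an event: R + K * (actual - expected)."""
    expected = sum(expected_score(own, opp) for opp in opponents)
    return own + k * (score - expected)

# A 2750 player scores 4/9 against a field where 4.5 was expected:
field = [2740.0, 2760.0, 2730.0, 2770.0, 2750.0,
         2745.0, 2755.0, 2735.0, 2765.0]
print(round(adjusted_rating(2750.0, field, score=4.0), 1))  # -> 2745.0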

However, I'm not completely convinced that optimizing for "predictive strength" necessarily makes the mapping from "achieved results" to "estimated skills" notably better. I know Jeff Sonas has argued that Chessmetrics is better than the FIDE system due to having higher predictive strength - SHORT TERM - at least in the rating segment and over the data on which Sonas has performed his comparison & validation.

The two main tools to achieve this in Chessmetrics (compared to FIDE) are higher K and more frequent rating lists (12 per year), resulting in more responsive/dynamic ratings. But does that really capture skills better? Or does it simply pick up on variations in form & performances quicker? You can be darn sure that Ivanchuk's skills don't change by the amount that his rating typically varies; even with FIDE's K of 10, he dropped from 2790 to under 2700, and went back up to 2755 in much less than a year. With a doubled K it could've been down to 2610 and up to 2730 instead. Do you think ratings between 2790 and 2610 would give a better or worse picture of Ivanchuk's SKILLS than values between 2700 and 2790?

Something that's 100% certain is that we do not want perfect "immediate predictive strength" - simply because chess players' performances vary much, much more than their skills do. Remember that ratings really measure RESULTS, not skills. While the correlation between "practical skills" and results obviously is big and notable in the long run, it's far from 100% in the short run - not in single games or single events, heck - not even over 2-3 months (with variations from player to player). Alas, I disagree with Sonas here; it's not obvious and indisputable that immediate predictive strength can tell us which of two competing rating systems is better. And even less so if what you're interested in is a more "permanent" estimate of "skill" - I consider "skills" to be notably less volatile than "results" and "performances".

For some more thoughts about this, see the following article on chessvibes, and my comment(s) in the comment section.

http://www.chessvibes.com/reports/on-the-increase-of-the-k-factor-part-ii/

In particular, the comment from "frogbert on May 15th, 2009 12:49 pm"

Your view is a view of what ratings should be. It is not what they were designed for, and it is not an objective assessment of their mathematical function.

I agree that this should be a goal, but then the system must be changed quite a lot.

Hi Bartleby!
I drew attention to the link:

http://www.ratingtheory.com/conclusions.htm

which canvasses the conclusions of the author, who examined Elo's methodology, IIRC as part of a research thesis. The author provides many pages of discussion of his research prior to his stated conclusions.

Given those conclusions, so far unchallenged on this blog, for example point

4)"the essential incoherence of Elo's position seems to have escaped notice"

I was curious why so many have faith in Elo-type ratings.....are they soundly based on correct mathematical and statistical principles or are we dealing with something more akin to astrology?


@dysgraphia: Ok, I've read the linked article now, and it seems you are serious about it.

The article makes the same mistake that has been pointed out by frogbert: The Elo rating measures past results, and it does this in a satisfactory way. The article states that the system's predictive power does not stand on a sound scientific base and "falls short of the scientific rigor that Elo envisioned for it". I can agree with that but don't see a big problem here. It doesn't affect the main service the Elo rating provides for the chess community.

It's been some time since I read Elo's book, but from what I remember the function for computing the expectations is based on assumptions, which may or may not approach reality. In Germany we formerly had a different, less ambitious, linear function that worked well, too. As far as I understand the mathematics involved, you will get meaningful result measurements with a broad range of functions. The basic requirement is only that the winner gets some points added, and the loser gets some points subtracted.

Hi Bartleby,
Did you read the whole article or just the page of conclusions?
IIRC the article condenses work from:
R. C. Jones, Evaluating Competitive Performance with Rankings and Ratings,
Master of Science Thesis, University of Rhode Island, 1994.

He provides supporting VB programs et al. via FTP at the link:
http://www.ratingtheory.com/Downloads.htm

Link to first page is:
http://www.ratingtheory.com/index.htm

which then allows the many other pages to be accessed.
I ask this question because, at the beginning of the treatise, there is a discussion of an early German rating method and the maths it used, and it may be the one you were thinking of.

[quote]
The influential Ingo System of West Germany followed in 1948, named by its originator Anton Hoesslinger (1875-1959) for his home town, Ingolstadt in Bavaria [HW, "rating"]. It establishes the basics of ratings in a remarkably simple formula,

[1.1] R = ERc - (Pct - 50) ,

where ERc is the arithmetic average of the opposition ratings and Pct is the player's score in percentage points. A peculiarity here, from the standpoint of subsequent systems, is that lower ratings represent greater playing strength. Hoesslinger appears to have relied largely on intuition in developing his system, which manages nevertheless to be theoretically provocative.
[end quote]
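
For concreteness, the quoted formula [1.1] as a short Python sketch with toy numbers (remember that in Ingo, lower numbers mean greater playing strength):

def ingo_rating(opposition_ratings: list[float], pct_score: float) -> float:
    """R = ERc - (Pct - 50), with ERc the average opposition rating."""
    erc = sum(opposition_ratings) / len(opposition_ratings)
    return erc - (pct_score - 50)

# Scoring 75% against opposition averaging Ingo 120 yields Ingo 95:
print(ingo_rating([110.0, 120.0, 130.0], 75.0))  # -> 95.0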

You state:
"The article makes the same mistake that has been pointed out by frogbert: The Elo rating measures past results, and it does this in a satisfactory way."

I think you are wrong here. If you read the full discussion you will see that the author carefully and systematically builds his maths....and in the process discovers some mistakes by Elo.
I would be interested in you or frogbert pointing out where the author erred in his maths.
He gives an email contact so maybe he would be interested too.

I didn't read the whole article, only the page of conclusions.
And yes, it was the Ingo system I was referring to.

I am fully prepared to accept that the author got his maths right. But for the practical usability of the rating system, this point is a technicality. The underlying function does not have to be perfect. If the author has a function that does a better job of modeling reality, that's fine; if not, Elo's formula is good enough for all practical purposes.

You don't need a perfect function because you don't try to calculate the whole universe in one go, as mathematicians like to do; instead you apply the function repeatedly with every rating period. As long as the change points in the right direction you will get meaningful results. That, and a bunch of corrective factors to handle obvious nonsense, and you get away with a lot of sloppy assumptions.


...going to read the article after all...

Interesting article!

[quote]
"For Elo the Percentage Expectancy Curve was patently a probability function, and he could make no sense of the objection that it was not. In hindsight, the objection might better have been raised as a distinction of terms. Since a percentage score may be thought of as an estimate of probability, defined as a LONG-TERM PERCENTAGE [my emphasis], there is reason enough to regard the Percentage Expectancy Curve as a function that relates probability to rating difference. In this broad sense, it is a probability function. But there is another sense of the term that is restricted to those functions that arise in probability theory from a mathematical analysis of variability, such as the normal curve or the logistic, and these we may call true probability functions. A function that merely maps probability to another variable without some justification based on variability analysis would thus be called an arbitrary probability function. Such functions include those based arbitrarily on a true probability function, which is the case of the Percentage Expectancy Curve. If the Percentage Expectancy Curve itself were a true probability function, it would be derived independently from distributions of rating difference, however these might arise, but these can hardly be known without a pre-established definition of ratings.

It need hardly be said that an arbitrary probability function may take virtually any form, including the linear form deprecated by Elo. Since the function is by definition arbitrary, it cannot be improved by mimicking true probability functions."
[/quote]

In short, the author says that the percentage expectancy curve is not a true probability function, but an arbitrary one - and it remains arbitrary (and not based on or modelling anything observed in the "real world") also when arbitrarily based on true probability functions, like for instance the one describing the normal curve. The percentage expectancy function (or table) mapping rating differences to score expectancies is a LONG-TERM expectancy. [Please take notice, Chesspride.]

[quote]
"Tests of a rating system, such as those offered by Elo in his main work, tend not to be worth the paper they are written on, mainly because a rating is a measure of playing strength only in a metaphorical sense. Ratings are statistics, and the predictions they offer in this respect are self-fulfilling. A test of arithmetic averaging, by analogy, to determine whether it yields central values would be pointless. Ratings do not predict changes in playing strength: whether, for instance, a particular twelve-year-old will remain a novice all his/her life or become the next Bobby Fischer. Ratings assume, rather, that the playing strength exhibited by past results will be the playing strength exhibited by future results. Their predictions are nothing more than an extrapolation of demonstrated playing strength."
[/quote]

In other words - the only thing the rating system can "predict" is that in the LONG RUN a player will continue to perform as before; not in the next few games necessarily, or even the next few events - but theoretically over a "reasonably large" number of future games. However, there's no way we can know that anyone will continue to produce the same kind of results as in the past. And short term, over a few games, it's more likely than not that there'll be a difference!

[quote]
"Choosing the best statistic for a rating system will depend on criteria other than variability analysis, including but not limited to simplicity and practicality. Unfortunately, the complexities of probability theory have captured the imagination of chess, and the bell curve has become an icon of chess ratings.

What follows is a belated attempt to demonstrate a more reasonable theoretical basis."
[/quote]

Hence, the author doesn't claim that the formulas developed aren't useful for their intended purpose, but rather that Elo "misused" probability theory in his attempt to create a theoretically sound and rigorous system. Alas - it seems to work, but not for the reasons claimed by Elo.

[quote]
"The proof of the pudding, it has been said, is the actual operation of a rating system, and the Elo System has been grinding out chess ratings for over four decades now with hardly a grumble from the rating pool. One is tempted to say that the system works despite its theory rather than because of it. The reputation of the Elo System, on the other hand, rests largely on its supposed ability to predict chess outcomes. There is even the occasional inquiry as to whether the system can predict outcomes in sports such as basketball, football, golf and soccer.

As this treatise has attempted to show, the predictive powers of the Elo System are not due to its application of probability theory, which in the final analysis must be characterized as a misapplication, but rather to principles of averaging which have hardly been articulated elsewhere."
[/quote]

The article also has sections on the pros and cons of sequential ratings (like in the FIDE system) and simultaneous ratings (like in Chessmetrics) plus generally some of the best discussion/analysis I've read so far on the topic of chess ratings, confirming several of my own "feelings" and/or "experiences" that I've been unable to formalize with my relatively limited mathematical skills. Absolutely a must-read for anyone with a desire to understand more of the basis of rating systems.

Again, thanks for pointing out the article, dysgraphia.

Hi frogbert!
I'm pleased you enjoyed the article!

For me Elo-type ratings are a little like the following analogy: I have a strip of metal and I want to measure how long it is so I get a ruler and measure it. How do I know my ruler is accurate? .... drum roll....I have this strip of metal whose length I now know ... end drum roll.

There have of course been rating systems used outside of chess, and long before it, in fields as diverse as financial markets and thoroughbred horse racing.
The better ratings always try to start from an acceptable "gold standard". In chess a possible "gold standard" was referred to in a previous thread to which we both contributed: the measurement of move accuracy in WCh matches. How do we assess the "strength" of a chessplayer without using results against another chessplayer, i.e. self-referentially? ... by either playing a match against a standardized chess engine or feeding game scores into such an engine to determine the player's "accuracy" coefficient, then expressing this coefficient in a conventional format.
My $0.02 worth.

Hi Bartleby!
Thx for your comments.
I became interested in chess ratings from a non-chess stance. That is, I wanted to apply the touted "Elo science" to ratings in other fields, so I had a closer look at how Elo-type ratings were developed and applied. My background in the "hard sciences" immediately started alarm bells ringing, especially Elo's use of probability and statistics ... meh! Project abandoned in favour of much better methods.
BTW, Microsoft has developed a rating method for their online game community. The algorithm is secret but I understand it is based on some work of Prof. Glickman: I'm told it is definitely not Elo-based.

"How do we assess the "strength" of a chessplayer without using results against another chessplayer ie self-referentially?"

Well, the simple answer is that we do NOT assess strength in any absolute sense (in our rating systems). It certainly IS a relative measure - an imperfect ordering of the players, based on limited data about how they "dominate" each other (or not). It's bound to be imperfect even in a pool where a directed "domination graph" can be constructed such that there exists a node from which every other node can be reached. Because there is no strict transitivity in these "domination relationships", cyclic domination paths are bound to exist.

Does that render these relative ratings useless? Certainly not. Beyond providing meaningful rankings of the top 50-100 players in a sport, the rating systems manage to create mostly meaningful rankings of even 6-digit numbers of players - where we can be pretty sure that someone ranked 50000 is notably stronger than someone ranked 70000. We can even make a decent guess about the outcome of a 10-game match between two such players, based on the rating difference. It certainly isn't perfect - and like I've warned about here: the ratings are mostly a measure of the past, a descriptor of what has been, much more than a real predictor of what's going to happen.

But as long as there are no really good and/or practical alternatives to (various kinds of) rating systems, I'm quite happy about what they do, despite their relative and hence partly cyclical (self-referring) nature.

"Microsoft has developed a rating method for their online game community. The algorithm is secret "

Hm... Maybe I should try and see if it would be possible to get a glimpse of how that algorithm works - for my own enjoyment...

Sorry, but I think your comment (this particular one) isn't worth much more than $0.02 ,:) . Why should there be a need for a "gold standard"? The only purpose of the ELO or any other rating and ranking system is to compare players with each other, within a given population - global for ELO, American for USCF ratings, German for the former Ingo system (which I also remember from my own experience). The number ELO 2000 doesn't have any particular meaning; indeed it's absolutely meaningless for outsiders - but within the given framework it means "roughly halfway between beginner and world champion".

There are other problems:
How golden is the gold standard? Engines are perfect (or as perfect as can be) when it comes to brute calculation - but their positional chess understanding is no better than the programmer's, and even this assumes he is a perfect teacher. For example, would players lose ELO by accepting a draw in an opposite-colored bishop ending, just because an engine insists on +2 or +3?
What would Topalov's Linares TPR be? Fans, neutral observers and 'detractors' will agree that his play was far from perfect with regard to an engine gold standard, yet he still won the event. [They disagree regarding "just how lucky he was" to win the event ...]. How much ELO would this tournament winner lose if his usual play is less speculative?

Such problems are avoided if he, or anyone else, is simply compared with his peers - either directly or indirectly: I never played against Anand, but I may have played against someone who played someone who played someone who played Anand ... (probably there would be 10, if not 100 persons in between!?). IMO this, and nothing else, is the purpose of the ELO system.

"I may have played against someone who played someone who played someone who played Anand"

Having played Magnus Carlsen in (at least) one classical, rated game makes the path quite short to most elite players... But the Carlsen many of us played wasn't the Carlsen that plays Anand. :o)

On second thought, my closest link is a couple of blitz games against teenage Jan Gustafsson in the 1990s, who later became a GM and occasionally (Dortmund, Bundesliga, World Cup) played against the world top.
These were of course unrated, but I don't think my results (close to 0%) would have been better at classical time controls, even back then.

Hi Thomas!
" I never played against Anand, but I may have played against someone who played someone who played someone who played Anand ... "

I have played Michael Adams and Susan Polgar, so now you can add to your resume that you've discussed chess with someone who's analyzed with players who've analyzed with Anand.

"Why should there be a need for a "gold standard"? "

Stock standard practice, that is why! .... if you want to measure something and have those measurements taken seriously then you must have a standard of measurement .... the alternative is astrology.

"The only purpose of the ELO or any other rating and ranking system is to compare players between each other, within a given population ..."

Sorry, no! Elo-type ratings have many purposes; they are used by FIDE and others in the awarding of titles, invitations to competitions, team selection etc. etc., and hence directly impinge on players' careers and incomes. Therefore it is essential that any numerical ratings are based on sound mathematical/statistical principles. The links I have already referred to in this thread - now examined by frogbert, I notice - clearly expose Elo-type ratings as dubious.
In the link I gave, the claim was made that Elo's "science" had never been peer reviewed outside the chess community, that the only review that could be found was in The Journal of Gerontology (sic!!), and that this "peer review" was done by Elo himself! If this claim alone is true and alarm bells are not ringing for you, then I will reassess my offered $0.02!

As for the reviews carried out within the chess community, these appear to have been quite deficient, according to the discussion at the link I provided, missing some simple errors made by Elo.

"How golden is the gold standard? Engines are perfect (or as perfect as can be) when it comes to brute calculation - but their positional chess understanding is no better than the programmer's, and even this assumes he is a perfect teacher."

Sorry, quite wrong! ....where do you get this information from? .... a weak chessplayer but capable programmer can program a very strong engine simply by using one or more GMs in consultation. It would not surprise me if some current engines have had WCh-level positional input.
I'm a passable/kludge programmer in several languages, but if I were intending to market a strong engine my first action would be to employ the best GMs I could afford, with a wide range of styles. The engine I produced would far exceed both my tactical and positional understanding.

Hi frogbert!
Somewhere amongst my few remaining neurons I seem to recollect that I downloaded an executable of the algorithm a while ago. I think three French guys working for Microsoft created it. Maybe Googling something like "rating multi-player games" will find it. Sorry, tied up at the moment and can't find the scissors!

I don't think the gold standard had been based on peer-reviewed science either :)

"if you want to measure something ... then you must have a standard of measurement"
Competitive chess is ultimately about results, isn't it? If you win a game you were relatively better than the opponent; anything else is irrelevant - though it can still be discussed on chess blogs ,:)
Regarding the "many purposes" of ELO ratings you mention, most also refer to the relative ranking of players - and aspects other than bare numbers may enter the equation for team selection, maybe also tournament invitations: Who is more motivated? Who is the better team player? Who has potential for improvement?
For a GM title, you need at least two, usually three, norms AND a rating >2500, so it's hard to be lucky and benefit from deficiencies of the system several times!? And if one turns out to be a "weak GM" (rating barely above 2500 or dropping below it again) the title isn't really worth that much ... .

While the article about the ELO system was written by Elo (why not? who else?) it was presumably reviewed by others. And there also doesn't seem to be a peer-reviewed article criticizing or refuting the ELO system!?

As to strengths of engines, OK I should have written "the team" rather than "the programmer". Fact is that engines aren't perfect:
- they misevaluate certain endings
- they haven't "killed" correspondence chess yet

Microsoft's rating system isn't all that secret. It's called TrueSkill - some documentation here, with a link to a publication list.

http://research.microsoft.com/en-us/projects/trueskill/

As far as I can remember, TrueSkill uses an extra parameter to track consistency. Whereas Elo has a single parameter (mean strength) and a blanket assumption on performance variation that's applied equally to all players, TrueSkill tracks both mean and variance individually for every player. In theory this is reasonable because there isn't much doubt that some players are prone to longer bouts of good/bad form than others. I can't recall whether it assumes the same skill distribution function (e.g. Gaussian, geometric) for all players or if it also models that separately.

In practice, TrueSkill can be frustrating for the players. You'll hear the term "level-locked" to describe when your rating changes sluggishly over a long series of wins. The question is whether or not the system really knows you well enough to judge that a streak is just "par for the course" as opposed to legitimate improvement/regression. There may be other undesirable consequences of having different K-factors for each player - for example, rating changes are no longer zero-sum.
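
To illustrate that last point concretely, here is a toy Python sketch using per-player K-factors as a crude stand-in for TrueSkill's per-player uncertainty - NOT TrueSkill's actual Bayesian update, and all numbers are invented:

def expected(a: float, b: float) -> float:
    """Expected score for the player rated a against the player rated b."""
    return 1.0 / (1.0 + 10.0 ** ((b - a) / 400.0))

def update(ra: float, rb: float, score_a: float, ka: float, kb: float):
    """Return both new ratings; score_a is 1.0/0.5/0.0 from A's side."""
    ea = expected(ra, rb)
    return ra + ka * (score_a - ea), rb + kb * ((1.0 - score_a) - (1.0 - ea))

# Equal ratings; A has a "provisional" K=40, B an established K=10:
new_a, new_b = update(2400.0, 2400.0, score_a=1.0, ka=40.0, kb=10.0)
print(new_a - 2400.0, new_b - 2400.0)  # -> 20.0 -5.0: not zero-sum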

Hi Bartleby!

"I don't think the gold standard had been based on peer-reviewed science either :)"

Gold standards are established in various ways - typically after extensive peer review of alternatives, sometimes horse-trading, sometimes political intervention, etc.

To digress, for example ....
The SI unit system's gold standard for length uses the meter as the base unit, defined by the length of the path traveled by light in a vacuum during a time interval of 1/299,792,458 of a second.

This of course requires us to already have defined a time "gold standard".

Time's gold standard has the second as the base unit: the duration of 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the cesium-133 atom.

Newer optical clocks that measure the oscillations of a trapped aluminium-27 ion have even higher precision than the cesium atomic clock.
One day this may become the "gold standard" for time measurement.
This will likely occur if/when peer pressure builds sufficiently to effect the change.
The gold standard for length used to be a platinum bar kept at a fixed temperature in Paris.
So clearly gold standards can and do change, and are not always universally accepted.

The chess study of move accuracy in WCh matches used as its "gold standard" a particular chess engine and PC configuration, and gave justifications for doing so. Since there is no universally accepted "gold standard" for chess move accuracy, I have no problems with the authors proposing this. One of the stated reasons was the open-source nature of their engine.
Of course it is widely accepted that the same PC configuration running, say, Rybka could well produce more accurate moves, but while Rybka's evaluation code remains secret, peer review may be problematic.

Hi Thomas!
"While the article about the ELO system was written by Elo (why not? who else?) it was presumably reviewed by others."

Sorry! .... you seem to have missed the significant point? .... maybe my bad in not using capitals for emphasis. The discussion in the link I gave said that the only "peer review" of the Elo system the author could find was in the J. Gerontology - and that the review was by Elo HIMSELF!!!!!

That's why I ended my initial post with "Help!"

"And there also doesn't seem to be a peer-reviewed article criticizing or refuting the ELO system!?"

This was one of the telling points of the discussion, i.e. Elo had escaped critical review by maths/stats peers and had been adopted even though it was dubious "science".
The author at the link I gave was puzzled why this had happened ..... perhaps you are too, maybe?

"As to strengths of engines, OK I should have written "the team" rather than "the programmer"."

No worries! I enjoy your contributions and read your posts with interest; time pressure unfortunately limits my own.

" Fact is that engines aren't perfect:
- they misevaluate certain endings
- they haven't "killed" correspondence chess yet"

Agreed!

As for correspondence chess, well, I prefer to call it centaur chess, as it is most likely chessplayer+engine competing.
At the higher levels I liken CC more to a research project than a mere chess game.

A friend got to the quarter-finals of the CC WCh.
I asked him whether he used a chess engine. Answer: No, not really, only to run my moves through to check for blunders!

"A friend got to the quarter finals of the CC WCh."

Well, I've listened to a CC World Champion talk about how he was analysing some games in his WC tournament, and my impression was that engines were indeed used a lot - but interestingly with comments like "I had to leave 'it' running for xyz hours before it realized that ..., while engine rpq was able to come to this conclusion much faster ..."

I'm sure some of the best CC players could've given very valuable input to the engine teams - even though it's probably hard to create heuristics that work well for all kinds of positions - even if using different sets for different "types" of positions (which again requires one to classify the position correctly). It will be a trade-off between simplicity and coverage - but I expect there to be lots of room left for tweaking still.

I heard roughly the same from my former clubmate Joachim Neumann, currently #6 on the ICCF rating list: "I do my own analyses, and double-check with an engine to rule out tactical blunders". [Though this was about 10-15 years ago, and engines have become even stronger in the meantime.]

As to CC players providing input to engine teams: if heuristics become too detailed, it might backfire and introduce programming bugs!? Wasn't there once a game between two engines where one of them refused to recapture a piece - because some other aspects of the position were considered more important?
