Following plenty of gauntlets, here's another update with some strong newcomers that we've been awaiting for a long time.
Despite the high hopes, Fritz and Gull were not able to break into the top 5. Stockfish 7 still crushes everything that gets in its way. Fritz and Gull can bite Komodo from time to time, but they can't steal enough draws from Stockfish. That makes the gap between the fish and the reptile more visible now. Android needs a new Komodo version...
The highlights of this release:
* ADDED Fritz 14
* UPDATED Gull 1.2 to 3
* UPDATED Ivanhoe 9.46h to 9.47c beta
* DELETED Cyclone due to similarity with Grapefruit & Toga
* DELETED The Mad Prune due to similarity with Grapefruit & Toga
After 4 extra days of delay, I was able to finish another complete round and increase the number of games by 18, so this release not only introduces updates but also adds more games for all engines. Narrower error margins matter; still, the margin remains above 30 ELO.
The nasty rule is: Less error requires more games >> More games require more full rounds >> More full rounds require fewer gauntlets, fewer updates. Unfortunately, the latter happens very rarely.
One thing that becomes clear is the relationship between the average ELO difference between players and the accuracy of the ranking.
In short, the main targets of a perfect rating list are:
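To put rough numbers on that rule, here is a minimal sketch. It assumes the plain logistic ELO curve and evenly matched opponents, which is not exactly the model bayeselo uses, and the function name `elo_margin` is made up for illustration.

```python
import math

def elo_margin(games, draw_ratio, z=1.96):
    """Approximate 95% ELO error margin against equally strong opponents."""
    # Outcomes 1, 0.5, 0 with probabilities (1-d)/2, d, (1-d)/2
    # give a per-game score variance of (1-d)/4.
    variance = (1.0 - draw_ratio) / 4.0
    score_error = z * math.sqrt(variance / games)
    # Slope of the logistic ELO curve at a 50% score (ELO per unit of score).
    slope = 400.0 / (math.log(10) * 0.25)
    return slope * score_error

# Doubling the games only shrinks the margin by a factor of sqrt(2).
for games in (336, 672, 1344):
    print(games, round(elo_margin(games, 0.257), 1))
```

With 336 games and a 25.7% draw ratio this lands at roughly 32 ELO, in line with the + and - columns in the list below. Halving that margin takes about four times as many games, and in this simplified model a higher draw ratio shrinks it as well.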
* Unlimited & equal number of games played by each engine
* 100% draw rate
* Average ELO difference between opponents of each game = 0
* For each engine, a perfect bell curve distribution of the number of games vs all opponents within the neighborhood of +/- 100 ELO
Sure, no list can reach all of the above targets at once. One can get close to the last one with extreme care, but the first three are utopian and impossible.
No matter what, in order to improve the quality of Rapidroid, I've recently decided to continue with fewer promotions between divisions. It used to move 3 engines up and 3 engines down out of 10 engines after each round.
As it often leads to pairings beyond 200 ELO gap, 2 up and 2 down should be better.
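The promotion step itself can be sketched as below. This is only an illustration: the function name `promote_relegate` and the engine labels are hypothetical, and both divisions are assumed to be lists ordered best-first by the round's results.

```python
def promote_relegate(upper, lower, k):
    """Swap the bottom k engines of `upper` with the top k engines of `lower`."""
    keep = len(upper) - k
    # Promoted engines join the bottom of the upper division,
    # relegated engines go to the top of the lower division.
    new_upper = upper[:keep] + lower[:k]
    new_lower = upper[keep:] + lower[k:]
    return new_upper, new_lower

# Hypothetical 10-engine divisions, best first.
div_a = ["A%02d" % i for i in range(1, 11)]
div_b = ["B%02d" % i for i in range(1, 11)]

print(promote_relegate(div_a, div_b, 3))  # old scheme: 3 up, 3 down
print(promote_relegate(div_a, div_b, 2))  # new scheme: 2 up, 2 down
```

With k=2, eight of the ten engines stay in their division, so fewer engines end up facing opponents from a clearly different strength band in the next round.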
Now, I expect that the draw ratio will increase and the average ELO gap will decrease in parallel. At present, Rapidroid has a low 25.7% draw ratio and a ~100 ELO average gap, which both mean the overall pairing scenario has been too aggressive.
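To see why a large gap hurts, the standard logistic ELO formula gives the expected score of the stronger side. This is a sketch under that assumption (bayeselo's own model adds draw and first-move parameters), and `expected_score` is just an illustrative name.

```python
def expected_score(elo_gap):
    """Expected score of the stronger engine for a given ELO advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))

for gap in (0, 50, 100, 200):
    print(gap, round(expected_score(gap), 3))
```

At a 100 ELO gap the stronger engine is expected to score about 64%, and at 200 ELO about 76%. The more lopsided the pairing, the fewer draws and the less each game says about the exact distance between the two engines.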
Targets for next release:
* UPDATE Arasan 18.2 to 18.3
* UPDATE Rodent 1.7 to II
* UPDATE Tucano 5.0 to 6.00
Have a nice checkmate!
BAYES RATINGS AFTER 17173 GAMES PLAYED BY 124 PROGRAMS
Rnk Name O/S T Elo + - gam sco oppo dra
001 Stockfish 7 A32 4 3352 38 36 336 85% 3077 29%
002 Komodo 9.3 A32 4 3281 35 34 336 76% 3084 33%
003 Critter 1.6a A32 4 3140 30 30 344 57% 3097 47%
004 Firenzina 2.4.1 xTreme A32 4 3109 30 30 340 53% 3092 49%
005 Sting SF 4.8.4 JA A32 4 3109 32 31 338 57% 3057 39%
006 BlackMamba 2.0 A32 4 3072 30 30 338 47% 3096 49%
007 Rybka 2.3.2a mp W64 4 3070 109 107 26 54% 3044 38%
008 Fritz 14 A32 4 3067 30 30 336 49% 3078 48%
009 Gull 3 x64 (syzygy) A32 4 3052 31 31 332 50% 3054 41%
010 Texel 1.05 A32 4 3037 31 31 342 48% 3055 35%
011 Senpai 1.0 A32 4 3016 32 32 338 51% 3011 36%
012 DeepSaros ver.2.3f A32 4 3010 32 32 334 51% 3003 36%
013 Hiarcs 13.71 IOS 2 2982 121 113 26 65% 2873 23%
014 RobboLito 0.085e4l A32 1 2976 31 31 342 51% 2968 42%
015 IvanHoe 9.47c beta A32 1 2970 32 32 326 54% 2948 39%
016 Cheng 4.39 A32 4 2938 32 32 334 44% 2987 38%
017 Shredder 1.7.0 IOS 2 2918 114 113 24 54% 2901 42%
018 Hakkapeliitta 3.0 A32 1 2905 32 32 328 49% 2913 34%
019 Scorpio_2.7.7.JA_xb.arm7 A32 4 2899 36 36 284 56% 2853 28%
020 ExChess_7.88b.JA_xb.arm7 A32 4 2896 32 32 322 44% 2945 38%
021 Gaviota v1.0-d A32 4 2894 32 32 340 44% 2936 31%
022 Arasan 18.2 A32 4 2894 32 32 330 50% 2895 31%
023 Grapefruit 1.0 A32 4 2855 30 30 340 46% 2877 41%
024 Toga II 3.0 A32 1 2838 32 32 314 55% 2805 37%
025 Deep Saros 0.9 A32 4 2822 31 31 330 47% 2836 39%
026 DiscoCheck 5.2.1 A32 1 2821 32 32 340 43% 2875 32%
027 Deuterium v14.3.34.130 A32 1 2776 31 31 332 50% 2774 40%
028 Bobcat 6.4b A32 1 2774 32 32 328 49% 2775 30%
029 Doch32 1.3.4 JA A32 1 2771 32 32 324 48% 2787 37%
030 Crafty_25.0.JA_xb.arm7 A32 1 2769 32 32 314 54% 2742 36%
031 Fruit Reloaded 2.1 A32 1 2765 32 31 308 48% 2780 43%
032 Murka 3 A32 1 2763 31 31 338 49% 2768 37%
033 Chess Pro 2016.02 IOS 2 2760 111 113 22 45% 2788 55%
034 GNU Chess 5.60 A32 1 2741 33 33 316 51% 2732 28%
035 The King 3.50 x64 W64 1 2733 51 52 122 44% 2772 34%
036 Strelka 5 A32 1 2722 33 33 314 52% 2708 30%
037 RedQueen 1.1.98 A32 4 2678 33 33 310 53% 2656 29%
038 CNVCS 1.2.0 IOS 2 2665 108 109 26 48% 2671 42%
039 Tucano_5.00.JA_xb A32 1 2654 35 36 260 48% 2666 35%
040 Rodent 1.7 build 1 A32 1 2645 33 33 314 48% 2659 29%
041 Rhetoric 1.4.1 A32 1 2628 33 33 312 54% 2598 32%
042 Mini Rodent 1.0 A32 1 2624 35 35 268 46% 2649 33%
043 Bison 15.1 A32 1 2615 33 33 314 51% 2610 31%
044 Chess Genius 4.0.00 IOS 2 2580 194 254 8 13% 2831 25%
045 Alfil 12.10 A32 1 2577 33 33 314 49% 2583 29%
046 Rotor 0.8 A32 1 2547 33 33 320 47% 2570 31%
047 Daydreamer 1.75 JA A32 1 2546 32 32 322 50% 2550 34%
048 Cheese 1.7 A32 1 2535 33 34 312 43% 2585 30%
049 Fridolin 2.00 A32 4 2515 32 32 314 53% 2492 33%
050 Chess Genius 2.6.4 A32 1 2514 224 239 4 38% 2562 75%
051 GarboChess 3 A32 1 2501 33 33 312 48% 2520 26%
052 Glaurung Mainz A32 1 2497 41 41 208 43% 2545 26%
053 Danasah_5.07.JA_xb A32 1 2487 35 35 296 55% 2446 27%
054 Sloppy_0.23.JA_xb A32 1 2482 33 33 300 50% 2477 33%
055 BBChess 1.3b JA A32 4 2475 32 32 324 51% 2465 29%
056 Maverick 1.5 arm A32 1 2470 33 32 326 56% 2421 31%
057 Dirty_030411.JA_xb A32 1 2467 35 35 294 52% 2454 28%
058 Phalanx_XXIV.JA_xb.arm7 A32 1 2455 35 35 302 49% 2464 20%
059 Pawny_1.0.JA_uci2xb A32 1 2428 33 33 312 51% 2416 32%
060 GreKo_12.5.JA_xb A32 1 2418 34 34 302 52% 2399 25%
061 Pepito v1.59 A32 1 2418 32 32 320 50% 2419 33%
062 BetsabeII_1.47.JA_xb A32 1 2399 36 36 300 50% 2397 18%
063 Ifrit_m1.8.JA_uci2xb A32 1 2381 34 34 306 55% 2344 28%
064 Diablo 0.5.1b JA A32 1 2345 33 33 326 52% 2329 25%
065 zurichess geneva A32 1 2342 51 51 140 49% 2347 26%
066 Typhoon_1.0.r358.JA_xb A32 1 2341 34 34 318 54% 2306 24%
067 Olithink_5.3.2.JA_xb A32 1 2324 34 34 316 51% 2312 21%
068 Amy_0.8.JA_xb A32 1 2294 34 34 330 50% 2292 21%
069 Myrddin_0.86.JA_xb A32 1 2278 35 35 314 48% 2283 21%
070 TJchess 1.1U A32 1 2273 33 33 346 48% 2286 23%
071 Natwarlal_0.14.JA_xb A32 1 2272 34 33 328 52% 2248 23%
072 Bitfoot 150922.JA A32 1 2269 34 34 348 57% 2207 15%
073 MangoPaola_1.1.JA_xb A32 1 2261 34 34 326 50% 2253 21%
074 Sungorus 1.4 JA A32 1 2241 34 34 322 48% 2251 23%
075 KmtChess_1.21.JA_xb A32 1 2197 34 34 332 52% 2183 22%
076 Rattate_Nosferatu.JA_xb A32 1 2182 34 34 336 49% 2193 17%
077 NGplay_9.86.JA_xb A32 1 2176 33 33 332 52% 2164 23%
078 Scidlet_2.61b2.JA_xb A32 1 2168 35 34 336 52% 2149 13%
079 Resp_0.19.JA_xb A32 1 2140 33 33 348 50% 2141 19%
080 Clubfoot 150907.JA A32 1 2116 36 35 338 62% 2014 14%
081 DanChess_1.04.JA_xb A32 1 2084 35 35 326 51% 2077 17%
082 Floyd 0.7 JA A32 1 2082 35 35 344 56% 2029 12%
083 Kurt 0.9.2.2 JA A32 1 2049 34 34 348 47% 2077 18%
084 Robocide 28.12.14.JA A32 1 2033 32 32 372 51% 2027 19%
085 Witz_Alpha21.JA_xb A32 1 2017 34 34 330 48% 2037 20%
086 Woodpecker_2.11.JA_xb A32 1 1994 35 35 324 50% 1991 16%
087 Knightcap_3.7F.JA_xb A32 1 1980 35 35 308 52% 1959 18%
088 AdroitChess0.4 JA A32 1 1970 35 35 336 48% 1984 16%
089 BikJump v1.8 A32 1 1961 32 33 358 46% 1996 22%
090 Sjeng_1.12.JA_xb A32 1 1944 35 35 314 50% 1943 16%
091 Gunborg_1.39.JA_uci2xb A32 1 1942 38 37 294 60% 1861 20%
092 Leonidas_r83.JA_xb A32 1 1931 35 35 316 47% 1954 19%
093 ZCT-0.3.2500 A32 1 1917 35 35 328 42% 1980 12%
094 Faile_1.44.JA_xb A32 1 1909 34 34 304 47% 1929 28%
095 Samchess_JA_xb A32 1 1897 36 36 314 42% 1966 17%
096 Mephisto Roma Turbo W64 1 1896 79 83 56 37% 1997 16%
097 Cilian_4.14.JA_xb A32 1 1894 34 34 328 53% 1870 26%
098 Ecce rev. 508 A32 1 1856 35 35 324 44% 1908 16%
099 Sayuri 2015.10.01 A32 4 1840 35 35 330 53% 1800 14%
100 Colchess_8.0.JA_xb A32 1 1818 37 37 274 52% 1793 24%
101 Smash 1.03 JA A32 1 1814 35 35 336 50% 1807 14%
102 Claudia v. 0.5 A32 1 1811 52 52 158 51% 1804 13%
103 Surprise_4.3.b13.JA_xb A32 1 1805 49 49 166 48% 1816 16%
104 Zzzzzz_3.5.1.JA_xb A32 1 1716 37 37 264 49% 1702 31%
105 Hoichess_0.12.1.JA_xb A32 1 1713 35 36 318 48% 1716 19%
106 Chenard_2015.08.15.JA_xb A32 1 1702 42 42 262 47% 1694 10%
107 Kitteneitor_060513.JA_xb A32 1 1695 37 38 254 47% 1690 35%
108 Tscp_1.8.1.AB_xb A32 1 1686 40 40 260 47% 1686 16%
109 Jester_0.84.JA_xb A32 1 1681 42 42 248 49% 1644 13%
110 Colossus 4.0 100X C64 1 1678 237 208 10 80% 1300 20%
111 Rocinante 2.0 JA A32 1 1675 37 37 316 50% 1634 18%
112 Pulse 1.5-cpp A32 1 1621 37 37 320 55% 1521 28%
113 Mephisto Roma 68020 UCI W64 1 1616 124 134 20 35% 1718 30%
114 VIRUTOR CHESS 1.1.1 A32 1 1486 41 41 320 52% 1425 10%
115 Superpawn b108 JA A32 1 1455 47 47 212 54% 1415 19%
116 K2 v.075 A32 1 1439 52 51 206 57% 1372 6%
117 Chess for Android A32 1 1434 41 42 313 52% 1371 14%
118 Chess Titans W64 1 1336 212 203 7 57% 1306 29%
119 Trappy_Beowulf_2.0.JA_xb A32 1 1196 44 46 308 37% 1317 9%
120 Colossus 4.0 C64 1 1171 170 185 14 36% 1253 14%
121 Byak 8.10.14.JA A32 1 1128 47 49 226 25% 1372 18%
122 Xadreco_5.7.JA_xb A32 1 1018 54 59 220 14% 1393 10%
123 Novag Secondo TTC 1 945 279 20 6 42% 964 17%
124 OliveChess 0.2.7 A32 1 568 390 352 180 0% 1503 0%
Rapidroid test platform:
* GT-N7100 1.6 * 4 + 256MB hash: All Android progs
* GT-N5105 1.6 * 4 + 256MB hash: All Android progs
* Codegen Novatab 1.4 * 4 + 256MB: Single thread Android progs
* Polypad 1010IPS tablet 1.61 * 2 + 128MB: Single thread Android progs
* HTC Diam 528 MHz, 16MB hash: Windows Mobile
* i7 M620 2.67 GHz dual + Arena 3.5 + 2GB hash: Windows 64
* iPhone5S A7 1.3 GHz * 2: iOS progs
* DosBox 1.74: DOS progs
* WinVICE 2.24: Commodore-64 progs
* Messtiny UCI adapters or CB-Emu2014: Mephisto progs
* Openings: 20 ply from Adam Hair or 16 ply from TCEC, no Q exchange, +0.15 to +0.40 eval by Stockfish and Komodo, depth 20 minimum, played twice both sides
* Repeating openings and twin games not allowed between two programs
* Tablebases and pondering off
* Time control: 10 to 30 sec/move or 600+0 to 1800+5 or closest known by both programs.
Calibration:
* Based on 42 engines rated in CCRL 40/4
* 32 bit engines: Exynos 4412 = (Athlon X2 4600+) - 65 ELO
* 64 bit engines: Exynos 4412 = (Athlon X2 4600+) - 110 ELO
* Bayeselo offset = 2309 (Mean of ELO error vs target: 37.74)
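The calibration arithmetic can be sketched as follows. Only the 65 and 110 ELO handicaps come from the list above. The sample ratings are placeholders rather than real data, the function names are made up, and it is an assumption that the offset is the mean difference between target and raw rating and that the quoted error is a mean absolute error.

```python
# ELO deducted from the CCRL 40/4 rating, keyed by engine bitness (from the list above).
HANDICAP = {32: 65, 64: 110}

def target_rating(ccrl_elo, bits):
    """Rating an engine is expected to reach on the Exynos 4412."""
    return ccrl_elo - HANDICAP[bits]

def fit_offset(raw, targets):
    """Assumed method: offset that makes the mean shifted rating match the mean target."""
    return sum(t - r for r, t in zip(raw, targets)) / len(raw)

def mean_abs_error(raw, targets, offset):
    """Assumed metric: mean absolute ELO error after applying the offset."""
    return sum(abs(r + offset - t) for r, t in zip(raw, targets)) / len(raw)

# Placeholder values for three imaginary engines, not taken from the list.
raw = [10, -20, 30]                      # unshifted bayeselo output
ccrl = [(2400, 32), (2350, 32), (2500, 64)]
targets = [target_rating(elo, bits) for elo, bits in ccrl]

offset = fit_offset(raw, targets)
print(offset, mean_abs_error(raw, targets, offset))
```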
6 comments:
Hi Gurcan!
Do you know how to add opening book to SF 7 in C4a?
Hello Gurcan,
Indeed, we need next release Komodo or "new" reptile/fish in rating list ;-)
Good job! Thank you :-)
What program do you use to calculate/record your Bayes Ratings? Is it a program or spreadsheet that you can publish a link for on here? Thanks.
I maintain all the rounds in a huge Excel file where all openings, promotions and updates are monitored. Is this what you wanna see? Other than xls, there's nothing specific: some Arena help to verify, one big pgn per round and bayeselo to calculate.
Alex, no. It doesn't work. Only Komodo and a few others which come with indirect methods like Critter.
Thanks for the information. I've been manually entering and keeping tournaments and gauntlets on a spreadsheet too and manually calculating the elo, but wanted something more automatic. I found both the Bayes program and ELOstat that pulls all the data straight off the tour.pgn file that CfA creates. Really liking ELOstat as it is VERY easy to use. Thanks again.