HAL9000

"It just isn't conceivable that you can design a program strong enough to beat players like me."

January 17, 2016

Stockfish 7 DF 1.60 qualifies vs 7.JA and 121215.JA

Since stockfishchess.org still lacks an official SF7 compile, I had to decide which alternative to use in Rapidroid, from among 3 candidates:
1) Stockfish 7 bundled with Droidfish 1.60, compiled by Peter Österlund
2) Stockfish 7 compiled by Jim Ablett
3) Stockfish 121215 compiled by Jim Ablett, leader of the last Rapidroid release.

I don't like simulating rapid games with blitz time controls, but we have to choose between more samples of quicker games and fewer samples at the usual time controls. And blitz is barely tolerable.

As my Exynos devices were continuously running other Rapidroid tourneys, I moved the duels to a quad-core Rockchip 3188 tablet, which requires downclocking to avoid freezing. The first duel ran at 1.0GHz instead of 1.4 without any issues, while the second ran at 1.2GHz with a lot of freezing and restarting. The RK3188 really needs 1.0GHz to keep running with all four cores.

Time control was 180+2 for all duels and the openings were taken from TCEC-7, a total of 306 positions.

The first duel was played between the two SF7 compiles, and SF7.DF160 won:
Program               Elo   +  - gam win dra los score oppo draws
1 Stockfish 7 DF160   3307 13 13 612  79 480  53 52.1% 3293 78.4%
2 Stockfish 7 JA      3293 13 13 612  53 480  79 47.9% 3307 78.4%


The second duel was played between SF7.DF160 and SF 121215.JA. Surprisingly, it was almost drawn:
Program               Elo   +  - gam win dra los score oppo draws
1 Stockfish 7 DF160   3301 12 12 612  64 488  60 50.3% 3299 79.7%
2 Stockfish 121215.JA 3299 12 12 612  60 488  64 49.7% 3301 79.7%


The error margins are still comparable to or larger than the Elo gaps: 14 Elo against ±13 in the first duel and 2 Elo against ±12 in the second. This is a typical case of "no verdict". Statistically, more games are needed to judge.
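
As a rough cross-check of the tables above, the Elo gap and an approximate 95% margin can be estimated from the win/draw/loss counts alone. The sketch below uses the logistic Elo formula with a normal approximation of the score. The function names are made up for illustration, and this is not necessarily how the rating tool behind the tables computes its margins, so the figures may differ somewhat from the ones shown.

```python
import math


def elo_from_score(score):
    """Convert a score fraction (strictly between 0 and 1) to an Elo difference."""
    return 400.0 * math.log10(score / (1.0 - score))


def match_summary(wins, draws, losses, z=1.96):
    """Estimate the Elo gap and an approximate interval from W/D/L counts.

    Assumption: game results are independent, and the mean score is
    approximately normal (z=1.96 gives a roughly 95% interval).
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    stderr = math.sqrt(variance / games)
    return {
        "score": score,
        "elo": elo_from_score(score),
        "elo_low": elo_from_score(score - z * stderr),
        "elo_high": elo_from_score(score + z * stderr),
    }


# W/D/L counts taken from the two tables, from SF7.DF160's point of view.
duel_1 = match_summary(79, 480, 53)   # vs Stockfish 7 JA
duel_2 = match_summary(64, 488, 60)   # vs Stockfish 121215.JA

for name, d in (("duel 1", duel_1), ("duel 2", duel_2)):
    print(f"{name}: score {d['score']:.1%}, Elo gap {d['elo']:+.1f} "
          f"(approx. interval {d['elo_low']:+.1f} to {d['elo_high']:+.1f})")
```

With only 612 games, the interval for the second duel straddles zero under this approximation, which is the "no verdict" situation.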

Then which one is the winner? My sixth sense definitely...

I decided to use DF.160 in Rapidroid even though nothing is proven statistically, bearing in mind that:
* This build is referenced on the official Stockfish site via a third-party app
* It never fell behind other builds during both duels and led until the end.

The games, a summary and CfA snapshots can be downloaded: HERE

6 comments:

flither said...

That was my "gut feeling" about SF DF160 (though I haven't run any longer tournaments between SFs), because DF160 is much lighter in code and thus a little bit faster (at least on my phone).

flither said...

I remember having problems with thermal throttling on my Galaxy S3 when running Jelly Bean a long time ago.
It throttled itself to 1GHz at a still-safe battery temp of 42, while the CPU limit was set to 82 if I remember correctly, so even touching the phone triggered throttling (joke ;))
At that time I used DVFS disabler, but as far as I know it's only for Samsung TouchWiz ROMs, and you have to be rooted.
Depending on your Rockchip device and Android version, there could be an easy way to fix throttling, but root is necessary.

Unknown said...

P.O.'s Stockfish compiles are always faster than J.A.'s, so Stockfish 7 P.O. takes priority.

Unknown said...

Gurcan, I have every reason to believe that Stockfish 7 P.O. is Stockfish 7 beta1 P.O. The engines have the same file size and the results are similar. That is why Stockfish 7 P.O. and Stockfish 121215 showed roughly the same result. Remember the test you ran with different Stockfish compiles at the 180+2 time control: Stockfish 7 P.O. and Stockfish 121215 produced the same Elo. And the best result came from Stockfish 7 beta2 J.A.

Unknown said...

Hello Raff, all my devices are rooted. The problem with the RK3188 is the absence of a thermal sensor. Besides, its kernel doesn't allow playing with voltages like Exynos does. It freezes instead of throttling, but although that's annoying, it's better than throwing away suspect games. I just reset, reboot and continue at 1.2GHz. At 1.0GHz, no freezes btw.

Unknown said...

Hmm. Let me analyze again. Maybe I can do another closed tournament with a longer time control, fewer builds and more games. However, it's difficult to obtain something fully trustworthy unless we play thousands of games. Besides, self-play may be misleading. The Stockfish testing network has the same problem with self-play. It's good for measuring the improvement, but you don't know exactly whether the improvement carries over against rival engines.