HAL9000

"It just isn't conceivable that you can design a program strong enough to beat players like me."

March 21, 2014

A method to rate Android chess programs

An accurate rating list is not easy to establish. It would be, if chess programs were eggs: we could grab two of them, bump them together, and the winner would be the one still in one piece. Then again, the loser could never be rated again.

A study of the various methods reveals nothing but constraints. Each time we must find a workable balance, and that balance must remain valid long enough:

1) More samples give more accuracy but need more time. Spend money on more host devices, or wait years for accuracy?

2) Quick samples are possible, but they lose precision. Sacrifice the number of samples, or the quality of each sample, to save time?

3) Programs are updated all the time, often before we finish rating a version. Rate each version separately and measure again, or pool all versions under one rating, as is done for human players?

4) Hardware capacities increase all the time. Spend money to diversify host devices and remeasure, or stay on the same, increasingly outdated, basis?
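To put rough numbers on the first two trade-offs, here is a minimal Python sketch. It assumes the usual logistic Elo model and a normal approximation for the error margin. The function names, the average game length of 60 moves per side and the draw ratio are my own illustrative assumptions, not measured values.

```python
import math

def elo_from_score(p):
    """Elo difference implied by a score fraction p (0 < p < 1), logistic model."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_margin_95(games, score=0.5, draw_ratio=0.0):
    """Approximate 95% error margin in Elo after `games` games.

    Uses a normal approximation; score and draw_ratio are assumed inputs.
    """
    win = score - draw_ratio / 2.0
    # Variance of one game result (1, 0.5 or 0 points): E[X^2] - E[X]^2
    variance = win + draw_ratio / 4.0 - score ** 2
    half_width = 1.96 * math.sqrt(variance / games)
    low, high = score - half_width, score + half_width
    if low <= 0.0 or high >= 1.0:
        raise ValueError("too few games for this approximation")
    return (elo_from_score(high) - elo_from_score(low)) / 2.0

def hours_needed(games, seconds_per_move=5, moves_per_side=60):
    """Wall-clock hours on one device; moves_per_side is an assumed average."""
    return games * 2 * moves_per_side * seconds_per_move / 3600.0

if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(n, "games:", round(elo_margin_95(n)), "Elo margin,",
              round(hours_needed(n), 1), "hours")
```

Under these assumptions, halving the error margin takes about four times as many games, and so four times as many hours on a single device. That is the whole dilemma in one line.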

Really hard to choose... There is no single correct answer, but some options suit the available resources better than others.

That's why different people have defined and will define different rules for their studies.

Mine are the following, and I hope they will remain unchanged for a long time:

* Platform: Any platform with little or outdated data (mainly Android, retro machines, old legends and everything out of the ordinary) is welcome. WinBoard and UCI engines are clearly out of scope, as many people handle them very well already. I may only need some UCI engines as a reference for calibrating the list.

* Time control: I'm all alone with one device and limited finances, so tournament time controls are beyond my capacity. Bullet modes such as 1/1 or 1'+1" are quick and attractive, but fast time controls are always tricky, as any deviation in time usage may degrade the result. The loss of accuracy would be intolerable, so I go for something in between: 5 seconds per move. This setting has been present in almost every program or machine since computer chess began.

* Opening book: The program's own book is not used, unless it's impossible to disable. I prefer a car race on standard tyres only, and there are plenty of trusted opening suites available.

* Pondering: No. I do not and cannot use automated play between two connected devices. That is "a tester's dream", by the way. SSDF must be mentioned here, as they connect two PCs for two programs.

* Playing mode: Double round robin tournaments between programs of comparable strength, with each pair playing both sides of the same opening. Each tournament uses a single opening. One round is completed once all divisions have finished. At most the top 1/3 of the players in a division go one division up, and the bottom 1/3 go down. This exchange between divisions ensures good linking between them and minimizes segmentation effects on the list. Ratings are therefore updated before the next round starts with a different opening from the suite. Any device or program without tournament automation is subject to manual play against selected opponents. These will hurt a lot indeed.

* Tablebases: Disabled, because many engines that use tablebases in their PC versions cannot use them under Android. The "same for all" rule looks fair here. Another constraint is the limited system resources of the Android environment. Tablebases are potential resource consumers, as they require a huge storage area.

* Hash tables: Yes. 64MB for Android. The same for other platforms, whenever possible and effectively used.

* Program updates: Not requested, and not welcomed at all by a tester. But unfortunately there will be some :-) Each new version is a new player in the arena and repeats all past rounds to catch up with the others. In each round, the new version plays all participants of the best matching division. This method remains accurate but causes significant interruptions.

* Hardware updates: Future devices will allow stronger play indeed. If I were to rate only Android programs, I could assume that the added playing power is the same for all programs and keep all previous input. But once any chess computer or non-Android app is included in the list, I must diversify the hardware and repeat the whole story, like SSDF does. That option costs a great deal of time. My decision is simply to wait and see, given that my current device, a Samsung Galaxy Note 2, is not yet outdated.

* Graphical user interface: Aart Bik's Chess for Android. Although it still lacks a lot of features, it is the only one that allows automated engine tournaments. For me, Chess for Android is to Android what Arena is to Windows.
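The playing mode described above (double round robin inside each division, then an exchange of up to one third of the players between neighbouring divisions) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are hypothetical, one third is rounded down, the top division promotes nobody and the bottom division relegates nobody.

```python
from itertools import combinations

def double_round_robin(players):
    """Every pair meets twice from the same opening, once with each colour."""
    games = []
    for a, b in combinations(players, 2):
        games.append((a, b))  # a has White
        games.append((b, a))  # colours reversed
    return games

def exchange(divisions):
    """Promote the top third and relegate the bottom third of each division.

    `divisions` is a list of lists, strongest division first; each inner
    list is assumed to be sorted by tournament result, best player first.
    """
    new_divisions = [[] for _ in divisions]
    last = len(divisions) - 1
    for i, division in enumerate(divisions):
        k = len(division) // 3  # "maximum 1/3": rounded down (assumption)
        up = division[:k] if i > 0 else []
        down = division[len(division) - k:] if i < last else []
        stay = [p for p in division if p not in up and p not in down]
        new_divisions[i].extend(stay)
        if up:
            new_divisions[i - 1].extend(up)
        if down:
            new_divisions[i + 1].extend(down)
    return new_divisions

if __name__ == "__main__":
    first = ["A", "B", "C", "D", "E", "F"]
    second = ["G", "H", "I", "J", "K", "L"]
    print(len(double_round_robin(first)), "games per division")
    print(exchange([first, second]))
```

With six players per division, each division plays 30 games per round, and two players cross each division boundary in each direction afterwards. Those crossings are what link the divisions together on the list.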
