Predicting the Outcome of Chess Games based on Historical Data

by Diogo R. Ferreira

This page describes the approach that finished in 4th place in the "Chess ratings - Elo versus the Rest of the World" competition.

The approach is based on the Bradley-Terry model, but with a slightly different interpretation of what the strength of a player is, and with a custom procedure for estimating that strength.
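For reference, the standard Bradley-Terry model can be sketched as follows. This is only the textbook form, not the modified interpretation of strength or the estimation procedure used in the report. The function names are illustrative and do not come from estimate.py. In chess a draw counts as half a point, so the result is better read as an expected score than as a win probability.

```python
def bradley_terry_expected_score(strength_i, strength_j):
    """Textbook Bradley-Terry: expected score of player i against player j.

    Both strengths must be positive. This is the standard form only,
    not the variant described in the report.
    """
    if strength_i <= 0 or strength_j <= 0:
        raise ValueError("strengths must be positive")
    return strength_i / (strength_i + strength_j)


def elo_to_strength(rating):
    """Map an Elo rating to a Bradley-Terry strength.

    With strength = 10 ** (rating / 400), the Bradley-Terry formula
    reproduces the usual Elo expected-score curve.
    """
    return 10.0 ** (rating / 400.0)


if __name__ == "__main__":
    white = elo_to_strength(1600)
    black = elo_to_strength(1200)
    print(bradley_terry_expected_score(white, black))
```

Only the ratio of the two strengths matters: multiplying every strength by the same constant leaves all expected scores unchanged.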

The following report provides a detailed explanation of the approach, together with information on how the parameters were tuned:

Diogo R. Ferreira, Predicting the Outcome of Chess Games based on Historical Data, IST - Technical University of Lisbon, November 2010 [PDF] [BibTeX]

The following source code contains an implementation of the approach in Python 2.6:

estimate.py

The program reads the training data for months 1-100, estimates the strength of all players, reads the test data for months 101-105, and writes the predictions in the CSV format required by the competition.
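The read, estimate, predict and write steps can be sketched as below. This is a minimal illustration under several assumptions, not the code in estimate.py. It assumes a header row followed by month, white player, black player and (for training data) the white player's score. The output header and all function names are hypothetical. The strength estimate is a deliberately simple placeholder (a smoothed mean score per player), not the procedure from the report. It is written for current Python 3 rather than Python 2.6.

```python
import csv
from collections import defaultdict


def read_games(handle):
    """Yield (month, white, black, score); score is None if the column is absent."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row (assumed to be present)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        month, white, black = int(row[0]), int(row[1]), int(row[2])
        score = float(row[3]) if len(row) > 3 and row[3] != "" else None
        yield month, white, black, score


def estimate_strengths(games):
    """Placeholder estimator: smoothed mean score per player, always in (0, 1)."""
    points = defaultdict(float)
    played = defaultdict(int)
    for _, white, black, score in games:
        points[white] += score
        points[black] += 1.0 - score
        played[white] += 1
        played[black] += 1
    return {p: (points[p] + 0.5) / (played[p] + 1.0) for p in played}


def predict(strengths, white, black, default=0.5):
    """Bradley-Terry style expected score for white; unseen players get `default`."""
    s_white = strengths.get(white, default)
    s_black = strengths.get(black, default)
    return s_white / (s_white + s_black)


def run(train_handle, test_handle, out_handle):
    """Estimate strengths from training games and write one prediction per test game."""
    strengths = estimate_strengths(read_games(train_handle))
    writer = csv.writer(out_handle)
    writer.writerow(["Month", "White", "Black", "Score"])  # hypothetical header
    for month, white, black, _ in read_games(test_handle):
        writer.writerow([month, white, black, "%.4f" % predict(strengths, white, black)])
```

To use it with real files, open each one with `open(path, newline="")` (and mode `"w"` for the output) and pass the three handles to `run`.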

To run the program, you will need to download the following files from this location:

training_data.csv
test_data.csv

The program can be run with the command:

python estimate.py

It will create (or overwrite) the output file "predictions.csv" with the predictions for months 101-105.

Note that the program has a set of parameters (beta) whose values can be adjusted to produce different results, or tuned against an error measure other than the month-aggregated RMSE used in the competition. For details, please refer to the report above.
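When tuning parameters, an error measure of this kind could be computed as in the sketch below. It assumes that "month-aggregated RMSE" means the following: for each player in each month, sum that player's predicted and actual scores over all of their games, then take the root mean squared difference over all player-month pairs. The competition's own definition should be checked before relying on this. The function name and the tuple layout are illustrative.

```python
import math
from collections import defaultdict


def month_aggregated_rmse(games):
    """RMSE over per-player, per-month score totals (assumed definition).

    `games` is an iterable of tuples
    (month, white, black, actual_white_score, predicted_white_score),
    with scores in [0, 1]. Black's score is taken as one minus white's.
    """
    actual = defaultdict(float)
    predicted = defaultdict(float)
    for month, white, black, actual_score, predicted_score in games:
        actual[(month, white)] += actual_score
        predicted[(month, white)] += predicted_score
        actual[(month, black)] += 1.0 - actual_score
        predicted[(month, black)] += 1.0 - predicted_score
    if not actual:
        raise ValueError("no games given")
    squared_error = sum((predicted[key] - actual[key]) ** 2 for key in actual)
    return math.sqrt(squared_error / len(actual))


if __name__ == "__main__":
    sample = [
        (101, 1, 2, 1.0, 0.7),  # player 1 beats player 2, predicted 0.7
        (101, 1, 3, 0.0, 0.6),  # player 1 loses to player 3, predicted 0.6
    ]
    print(month_aggregated_rmse(sample))
```

Because totals are compared rather than individual games, errors in opposite directions within the same player-month partly cancel, which is worth keeping in mind when comparing against a per-game RMSE.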

The values in the source code are the original ones that earned 4th place in the competition, but it is possible to do better.

Last updated: 2010-11-25