The aim of this repository is to stimulate better practice in benchmarking (performance comparison of methods) for cluster analysis by providing a variety of well-documented, high-quality datasets and simulation routines for use in practical benchmarking.
The repository collects datasets with and without given "true" clusterings. A particular feature of the repository is that every dataset comes with comprehensive documentation, including information on the specific nature of the clustering problem in the dataset and the characteristics that useful clusters should have, with scientific justification.
Note: Until January 15, 2017, datasets may be contributed to this repository within the framework of a challenge! More information on this challenge is available here.