Sorting remains a key primitive for many uses of database systems. The task this year is to sort three datasets of varying size, skew, and data type. The key constraint is memory: only 30 GB of main memory is allowed, while the datasets vary in size from 10 GB to 60 GB.
Your program is given the path to an input file to partition or sort, and it must write its output to a specified output file.
Our test harness passes the path of the input file to sort as your program's first argument and the path of the output file as its second argument. The files are in the format generated by gensort: each record is 100 bytes, where the first 10 bytes are the key and the last 90 bytes are the payload. We will give your program the three files one at a time, moving on to the next file only once we have verified that the current file is sorted correctly. Sorting is validated with valsort. The data sizes are 10 GB, 20 GB, and 60 GB. The 20 GB dataset is in ASCII, and the 60 GB dataset has skewed keys. We will start by supporting the small and medium evaluations and expand to the large one over the coming weeks.
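To make the memory constraint concrete, the sketch below estimates how many sorted runs an external sort would need when a dataset does not fit in memory. It is only an illustration: the function and struct names are hypothetical, 1 GB is assumed to mean 10^9 bytes, and the 20 GB sort buffer used in the usage note is an arbitrary choice that leaves some of the 30 GB for other needs.

```cpp
#include <cassert>
#include <cstdint>

// Each gensort record is 100 bytes (10-byte key + 90-byte payload).
constexpr std::uint64_t kRecordSize = 100;

// Hypothetical helper type: how an external sort could split a file into runs.
struct RunPlan {
    std::uint64_t records_per_run;  // records that fit in the sort buffer
    std::uint64_t run_count;        // sorted runs needed to cover the file
};

// Computes how many buffer-sized runs are needed for a file of file_bytes.
// Returns {0, 0} if the buffer cannot hold even a single record.
RunPlan PlanRuns(std::uint64_t file_bytes, std::uint64_t buffer_bytes) {
    const std::uint64_t total_records = file_bytes / kRecordSize;
    const std::uint64_t records_per_run = buffer_bytes / kRecordSize;
    if (records_per_run == 0) {
        return {0, 0};
    }
    // Ceiling division: a partially filled last run still counts as a run.
    const std::uint64_t run_count =
        (total_records + records_per_run - 1) / records_per_run;
    return {records_per_run, run_count};
}
```

Under these assumptions, `PlanRuns(60000000000ULL, 20000000000ULL)` yields 3 runs of 200 million records each, while the 10 GB dataset fits in a single run.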
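The record layout and the input/output convention can be sketched in C++ as follows. This is a minimal in-memory version, assuming the whole file fits in RAM (which will not hold for the larger datasets) and assuming that an unsigned byte-wise comparison of the 10-byte key matches the order valsort checks. All type and function names here are hypothetical and are not taken from the quick start package.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

constexpr std::size_t kKeySize = 10;      // first 10 bytes of a record
constexpr std::size_t kPayloadSize = 90;  // last 90 bytes of a record

struct Record {
    unsigned char key[kKeySize];
    unsigned char payload[kPayloadSize];
};
static_assert(sizeof(Record) == 100, "a record must be exactly 100 bytes");

// Orders records by byte-wise key comparison (memcmp compares as unsigned char).
inline bool KeyLess(const Record& a, const Record& b) {
    return std::memcmp(a.key, b.key, kKeySize) < 0;
}

// Reads every complete record of the file into memory.
// Returns an empty vector if the file cannot be opened.
std::vector<Record> ReadRecords(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};
    const std::streamoff bytes = in.tellg();
    in.seekg(0);
    std::vector<Record> records(static_cast<std::size_t>(bytes) / sizeof(Record));
    in.read(reinterpret_cast<char*>(records.data()),
            static_cast<std::streamsize>(records.size() * sizeof(Record)));
    return records;
}

// Writes the records back to disk in their current order.
bool WriteRecords(const std::string& path, const std::vector<Record>& records) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(records.data()),
              static_cast<std::streamsize>(records.size() * sizeof(Record)));
    return static_cast<bool>(out);
}

// Sorts in_path into out_path; the two paths correspond to the first and
// second program arguments supplied by the test harness.
bool SortFileInMemory(const std::string& in_path, const std::string& out_path) {
    std::vector<Record> records = ReadRecords(in_path);
    std::sort(records.begin(), records.end(), KeyLess);
    return WriteRecords(out_path, records);
}
```

A `main` function would typically forward `argv[1]` and `argv[2]` to `SortFileInMemory`. For inputs larger than the memory limit, the same comparator could be reused inside an external sort that works on one chunk at a time.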
Your solution will be evaluated for correctness and execution time.
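Before submitting, it can help to run a quick local sanity check on the output. The sketch below is not valsort and does not replace it: it only checks that consecutive 100-byte records in a buffer have non-decreasing keys, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t kRecordSize = 100;  // 10-byte key + 90-byte payload
constexpr std::size_t kKeySize = 10;

// Returns true if the buffer holds whole records whose keys never decrease.
// Keys are compared byte-wise as unsigned values (an assumption about the
// ordering that the official validation expects).
bool IsSortedByKey(const std::vector<unsigned char>& data) {
    if (data.size() % kRecordSize != 0) {
        return false;  // a trailing partial record means the output is malformed
    }
    for (std::size_t off = kRecordSize; off < data.size(); off += kRecordSize) {
        const unsigned char* prev = &data[off - kRecordSize];
        const unsigned char* curr = &data[off];
        if (std::memcmp(prev, curr, kKeySize) > 0) {
            return false;
        }
    }
    return true;
}
```

This check says nothing about whether records were lost or altered, so a complete validation still requires valsort on the real output file.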
Quick Start Package
We provide a quick start package in C++. It is in the format required for submission. The package is only meant to help you get started quickly and dig right into the fun (coding) part. You are not required to use the provided code.
Our testing infrastructure will evaluate each submission by unpacking the submission file, compiling the submitted code (using your submitted compile.sh script), and then running a series of tests (using your submitted run_sort.sh script). Extraction time is limited to 10 seconds, and compilation time is limited to 60 seconds. Submissions that exceed these limits will not be tested at all.
Each test, for every data size, has its own time limit. If a submission exceeds that limit, we stop the test and count it as a failure. These limits might be relaxed if they prove too restrictive.
Submissions will be unpacked, compiled and tested in a network-isolated container with characteristics shown in the following table:
|Processor|2x Intel Xeon E5-2640 v4 CPU (2.40 GHz)|
|Configuration|20 Cores / 40 Hyperthreads|
|Main Memory|Contestants are limited to 30 GB of main memory.|
|Operating System|Ubuntu 17.10 (GNU/Linux 4.13.0-32-generic x86_64)|
|Software|Autoconf 2.69, Automake 1.15, Boost 1.62, CMake 3.9.1, Go 1.8.3, Maven 3.5.0, Node.js 6.11.4, OpenJDK 8, Python 2.7.14, Python 3.6.3, Ruby 2.3.3, Rust 1.23.0, TBB 2017~U7, ant 1.9.9, clang 5.0, gcc/g++ 7.2.0, gccgo 7.2.0, jemalloc 3.6.0|
|Dockerfile|Dockerfile is available for download here|
A submission is considered to be rankable if it correctly processes all of the test workloads within their time limits. The leaderboard will show the best rankable submission for each team that has at least one such submission.
The run_sort.sh script is responsible for running your sort executable, which should interact with our test harness according to the testing protocol: it reads from the given input file path and writes to the given output file path.
This file must contain information about the submission, including:
- team name
- for each team member: full name, e-mail, institution, department, and degree program
- advisor/supervisor name (if any)
- a brief description of the solution
- a list of the third party code used and their licenses
The package must contain all of the source code.
The compile.sh shell script must be able to compile the source contained in the src directory. The produced executable file(s) must be located within the src directory. The benchmark environment is isolated and has no internet access, so you must provide all files required for successful compilation.
You can use the quick start package, which has the required format, as a starting point.
- The 2019 SIGMOD Programming Contest is open to undergraduate and graduate students from degree-granting institutions all over the world. However, students associated with the organizers' institution (University of Wisconsin - Madison) are not eligible to participate.
- Teams must consist of individuals currently registered as graduate or undergraduate students at an accredited academic institution. A team may be formed by one or more students, who need not be enrolled at the same institution. Several teams from the same institution can compete independently, but one person can be a member of only one team. There is no limit on team size. Teams can register on the contest site after 19 March, 2019.
- All submissions must consist only of code written by the team or open source licensed software (i.e., using an OSI-approved license). For source code from books or public articles, clear reference and attribution must be made. Final submissions must be made by 2 May, 2019, 23:59 CEST (UTC+02:00).
- All teams must agree to license their code under an OSI-approved open source license. By participating in this contest, each team agrees to publish its source code. The finalists' implementations will be made public on the contest website.
- A team is eligible for the prize if at least one of its members will come to present the work at the SIGMOD 2019 conference. The travel grants will only be granted to eligible teams.
- Contest submissions must match the criteria established by the Sort Benchmark contest. Values cannot be generated from just the first 20 bytes; the entire 100-byte value must be processed and sorted in its entirety. Submissions that fail to do this will be disqualified once the contest period ends and the source code is examined.