The gensort program can be used to generate input records for various sort benchmarks. The program can currently be compiled on Linux systems or within a Cygwin environment. Sort benchmark contests may elect to extract parts of the gensort code and run them as part of a distributed application.
The sort generator and sort validator can be downloaded here.
If the gensort program does not recognize its command line parameters, it prints the following to standard error:
usage: gensort [-a] [-c] [-bSTARTING_REC_NUM] NUM_RECS FILE_NAME
-a         Generate ascii records required for PennySort or JouleSort.
           These records are also an alternative input for the other
           sort benchmarks. Without this flag, binary records will be
           generated that contain the highest density of randomness in
           the 10-byte key.
-c         Calculate the sum of the crc32 checksums of each of the
           generated records and send it to standard error.
-bN        Set the beginning record generated to N. By default the
           first record generated is record 0.
NUM_RECS   The number of sequential records to generate.
FILE_NAME  The name of the file to write the records to.
Example 1 - to generate 1000000 ascii records starting at record 0 to
the file named "pennyinput":
    gensort -a 1000000 pennyinput
Example 2 - to generate 1000 binary records beginning with record 2000
to the file named "partition2":
    gensort -b2000 1000 partition2
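As a quick sanity check after generation, the size of the output file can be compared with the requested record count. The sketch below assumes 100-byte records, which is the usual sort benchmark record size but is not stated in the usage text above; the function name check_record_count and the RECORD_BYTES variable are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# ASSUMPTION: each record is 100 bytes (the usual sort benchmark record
# size). Adjust RECORD_BYTES if your records differ.
RECORD_BYTES=100

# check_record_count FILE NUM_RECS
# Succeeds when FILE is exactly NUM_RECS * RECORD_BYTES bytes long.
check_record_count() {
    file=$1
    num_recs=$2
    expected=$((num_recs * RECORD_BYTES))
    # Arithmetic expansion strips any padding that wc may print.
    actual=$(($(wc -c < "$file")))
    if [ "$actual" -eq "$expected" ]; then
        echo "ok: $file holds $num_recs records"
    else
        echo "mismatch: $file is $actual bytes, expected $expected" >&2
        return 1
    fi
}

# Possible usage after running Example 1:
#   check_record_count pennyinput 1000000
```

A size check of this kind only catches truncated or oversized files; it says nothing about the record contents.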
Gensort can be used to generate separate input partitions. This allows multiple instances of gensort to be run in parallel to generate the sort benchmark input.
The following output demonstrates that data can be generated either in partitions (part0 and part1) or in one file (whole), and that the concatenation of the two partitions (caten) is identical to the whole:
$ gensort -b0 1000 part0
$ gensort -b1000 1000 part1
$ cat part0 part1 > caten
$ gensort -b0 2000 whole
$ cmp whole caten
$
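The starting record for each partition is simply the running total of the records assigned to earlier partitions. A minimal sketch of that arithmetic follows; the function name gensort_partition_cmds and the "part" file prefix are hypothetical choices, and the function only prints the gensort command lines (a dry run) instead of executing them.

```shell
#!/bin/sh
# gensort_partition_cmds TOTAL_RECS NUM_PARTS [PREFIX]
# Prints one gensort command per partition; nothing is executed.
# The function name and the default "part" prefix are illustrative.
gensort_partition_cmds() {
    total_recs=$1
    num_parts=$2
    prefix=${3:-part}
    base=$((total_recs / num_parts))    # records in every partition
    extra=$((total_recs % num_parts))   # first $extra partitions get one more
    start=0
    i=0
    while [ "$i" -lt "$num_parts" ]; do
        count=$base
        if [ "$i" -lt "$extra" ]; then
            count=$((base + 1))
        fi
        # -b takes the index of the first record in this partition.
        echo "gensort -b$start $count $prefix$i"
        start=$((start + count))
        i=$((i + 1))
    done
}

# Prints the two partition commands from the transcript above.
gensort_partition_cmds 2000 2
```

To actually run the partitions in parallel, one option is to replace the echo with the command itself followed by `&` and to call `wait` after the loop.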
The valsort program can be used to validate that the sort output file is correct. It performs the following tasks:
If the valsort program does not recognize its command line parameters, it prints the following to standard error:
usage: valsort [-i] [-q] [-o SUMMARY_FILE] [-s] FILE_NAME
-i               Use case insensitive ascii comparisons (optional for PennySort).
                 Case sensitive ascii or binary keys are assumed by default.
-q               Quiet mode, don't output diagnostic text.
-o SUMMARY_FILE  Output a summary of the validated records. This method
                 can be used to validate partitioned sort outputs separately.
                 The contents of the separate summary files can then be
                 concatenated into a single file that can be checked using
                 the valsort program with the -s flag.
-s               The file to validate contains partition summaries instead of
                 sorted records.
FILE_NAME        The name of the sort output file or the partition summaries
                 file to validate.
Example 1 - to validate the sorted order of a single sort output file:
    valsort sortoutputfile
Example 2 - to validate the sorted order of output that has been
partitioned into 4 output files: out0.dat, out1.dat, out2.dat and out3.dat:
    valsort -o out0.sum out0.dat
    valsort -o out1.sum out1.dat
    valsort -o out2.sum out2.dat
    valsort -o out3.sum out3.dat
    cat out0.sum out1.sum out2.sum out3.sum > all.sum
    valsort -s all.sum
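For a larger number of partitions, the command sequence in Example 2 can be produced by a small loop. The sketch below is one way to do that; the function name valsort_partition_cmds is hypothetical, the ".dat" to ".sum" renaming simply mirrors the file names used in the example, and the function prints the commands (a dry run) instead of running valsort.

```shell
#!/bin/sh
# valsort_partition_cmds ALL_SUM_FILE OUTPUT_FILE...
# Prints the per-partition valsort commands, the cat that joins the
# summaries in the order given, and the final "valsort -s" check.
# Nothing is executed; the function name is illustrative.
valsort_partition_cmds() {
    all_sum=$1
    shift
    sums=""
    for f in "$@"; do
        # out0.dat -> out0.sum; names without .dat just gain ".sum".
        sum="${f%.dat}.sum"
        echo "valsort -o $sum $f"
        sums="$sums $sum"
    done
    echo "cat$sums > $all_sum"
    echo "valsort -s $all_sum"
}

# Prints the same six commands as Example 2.
valsort_partition_cmds all.sum out0.dat out1.dat out2.dat out3.dat
```

The output files are passed in partition order so that the summaries are concatenated in the same order as in the example.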