gensort Data Generator

The gensort program generates input records for various sort benchmarks. It can currently be compiled on Linux systems or within a Cygwin environment. Sort benchmark contestants may elect to extract parts of the gensort code and run them as part of a distributed application.

Download

The sort generator and sort validator can be downloaded here.

Standard Documentation

If the gensort program does not recognize its command line parameters, it prints the following to standard error:

usage: gensort [-a] [-c] [-bSTARTING_REC_NUM] NUM_RECS FILE_NAME
-a        Generate ascii records required for PennySort or JouleSort.
          These records are also an alternative input for the other
          sort benchmarks.  Without this flag, binary records will be
          generated that contain the highest density of randomness in
          the 10-byte key.
-c        Calculate the sum of the crc32 checksums of each of the
          generated records and send it to standard error.
-bN       Set the beginning record generated to N. By default the
          first record generated is record 0.
NUM_RECS  The number of sequential records to generate.
FILE_NAME The name of the file to write the records to.

Example 1 - to generate 1000000 ascii records starting at record 0 to
the file named "pennyinput":
    gensort -a 1000000 pennyinput

Example 2 - to generate 1000 binary records beginning with record 2000
to the file named "partition2":
    gensort -b2000 1000 partition2

Partitioned Input

Gensort can be used to generate separate input partitions. This allows multiple instances of gensort to be run in parallel to generate the sort benchmark input.

The following output demonstrates that data can either be generated in partitions (part0 and part1), or in one file (whole), and that the concatenation of the two partitions (caten) is the same as the whole:

$ gensort -b0 1000 part0
$ gensort -b1000 1000 part1
$ cat part0 part1 > caten
$ gensort -b0 2000 whole
$ cmp whole caten
$
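
The partitioning shown above can be scripted. The following is a minimal sketch, not part of the gensort distribution: it only prints the gensort command for each partition (a dry run) so the -b offsets can be inspected before anything is generated. The function name partition_cmds, its two arguments, and the partN file names are illustrative assumptions; only the -b flag and the NUM_RECS and FILE_NAME arguments come from the usage text.

```shell
#!/bin/sh
# Print one gensort command per partition. Nothing is executed.
# partition_cmds, total_recs and num_parts are hypothetical names for
# this sketch; they are not gensort options.
partition_cmds() {
    total_recs=$1
    num_parts=$2
    base=$((total_recs / num_parts))    # records every partition gets
    extra=$((total_recs % num_parts))   # leftover records to spread out
    start=0
    i=0
    while [ "$i" -lt "$num_parts" ]; do
        count=$base
        # The first `extra` partitions take one additional record each,
        # so the partitions stay contiguous and cover every record once.
        if [ "$i" -lt "$extra" ]; then
            count=$((count + 1))
        fi
        echo "gensort -b$start $count part$i"
        start=$((start + count))
        i=$((i + 1))
    done
}

# Reproduces the two-partition split from the transcript above.
partition_cmds 2000 2
```

Because each partition starts where the previous one ends, the printed commands can be run independently, for example on different machines or as background jobs followed by wait.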

valsort Output Validator

The valsort program can be used to validate that the sort output file is correct. It performs the following tasks:

Standard Documentation

If the valsort program does not recognize its command line parameters, it prints the following to standard error:

usage: valsort [-i] [-q] [-o SUMMARY_FILE] [-s] FILE_NAME
-i        Use case insensitive ascii comparisons (optional for PennySort).
          Case sensitive ascii or binary keys are assumed by default.
-q        Quiet mode, don't output diagnostic text.
-o SUMMARY_FILE  Output a summary of the validated records. This method
          can be used to validate partitioned sort outputs separately.
          The contents of the separate summary files can then be
          concatenated into a single file that can be checked using
          the valsort program with the -s flag.
-s        The file to validate contains partition summaries instead of
          sorted records.
FILE_NAME The name of the sort output file or the partition summaries
          file to validate.

Example 1 - to validate the sorted order of a single sort output file:
    valsort sortoutputfile

Example 2 - to validate the sorted order of output that has been
partitioned into 4 output files: out0.dat, out1.dat, out2.dat and out3.dat:
    valsort -o out0.sum out0.dat
    valsort -o out1.sum out1.dat
    valsort -o out2.sum out2.dat
    valsort -o out3.sum out3.dat
    cat out0.sum out1.sum out2.sum out3.sum > all.sum
    valsort -s all.sum
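
For a larger number of partitions, the sequence in Example 2 can be produced by a small helper. The sketch below is an assumption-laden illustration rather than part of valsort: the function name validate_cmds is hypothetical, it expects partition files ending in .dat passed in partition order, and it only prints the commands instead of running them. File names containing spaces are not handled.

```shell
#!/bin/sh
# Print the valsort commands for a partitioned sort output (dry run).
# validate_cmds is a hypothetical helper; the arguments must be given
# in partition order so the summaries are concatenated in that order.
validate_cmds() {
    sums=""
    for f in "$@"; do
        sum="${f%.dat}.sum"          # out0.dat -> out0.sum
        echo "valsort -o $sum $f"    # per-partition summary
        sums="$sums $sum"
    done
    echo "cat$sums > all.sum"        # concatenate summaries in order
    echo "valsort -s all.sum"        # validate the combined summaries
}

# Prints the same six commands as Example 2.
validate_cmds out0.dat out1.dat out2.dat out3.dat
```

The per-partition valsort -o commands do not depend on each other, so they could be run in parallel; only the final cat and valsort -s steps need all of the summary files to exist.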