This is the artifact accompanying the submission "Efficient Generation of Error-Inducing Floating-Point Inputs via Symbolic Execution".
The original results are produced on a workstation with Intel(R) Xeon(R) Gold 6238 CPU (8 cores, 2.10GHz), 32GB RAM, and the operating system is Ubuntu 14.04.5 LTS. To reproduce the results, a machine with similar CPUs(~2.10GHz and at least 8 cores), 32GB or larger RAM is required. Running the artifact on a different machine could possibly diverge the execution and lead to different results. Moreover, all the experiments have to be run sequentially.
$ docker pull ucdavisplse/fpgen-artifact:icse20
$ docker run -ti --name=FPGen --cpus=8 --memory=32g ucdavisplse/fpgen-artifact:icse20
Notes:
when inside the FPGen container,
(1) to exit the FPGen docker container and terminate all running jobs inside the container, use:
$ exit
(2) to detach from the running FPGen container without stopping it,
press CTRL+P followed by CTRL+Q
when outside the FPGen container and looking to start/attach to it, run:
$ docker start -ai FPGen
Inside FPGen container, navigate to home directory, clone this repo. and rename it,
$ cd
$ git clone https://github.com/ucd-plse/FPGen.git
$ mv FPGen FPTesting
Note: you can ignore the example and directly go to the next section if you are only looking to replicating/reproducing the results of FPGen.
navigate to example
cd /home/fptesting/example; ls
We use the recursive summation program that takes an input array of size 4 as an example. There are two c files needed to run FPGen:
- add-SYMBOLIC.c : the original program with error injections
- test/add-INOUT.c : the original program that is modified to read the input array from a file and write the result to the output file, and can be compiled to various precision (float, double, __float128) for error measurement.
(1) compile and symbolically run the test
$ ../scripts/compile.sh add-SYMBOLIC
This step enables KLEE to explore the error paths and it will take 2~3 minutes. When KLEE finishes, you should see the number of paths KLEE has reached,
KLEE: done: total instructions = 4133
KLEE: done: completed paths = 84
KLEE: done: generated tests = 84
(2) read the inputs generated by KLEE
$ ../scripts/show_tests.sh
(3) prepare the executables with multiple precisions
$ cd test
$ make
$ ls
There are 3 executables: add-32 with float precision, add-64 with double precision, _add-128_with __float128 precision. (4) Measure errors for the generated inputs
$ cd ../../example
$ ../scripts/compute_error.sh "klee-last/test0000*-input" test/add-64 test/add128
It measures the errors between double and __float128, and prints out the top 10 error-inducing inputs and the relative error the input triggers like following,
Maximum relative error: 5.1607477707e-05 [klee-last/test000084-input].
---top 10 inputs
klee-last/test000084-input -- 5.1607477707e-05
klee-last/test000080-input -- 2.1288082770e-09
klee-last/test000064-input -- 5.0207933244e-16
klee-last/test000077-input -- 1.6766168035e-16
klee-last/test000072-input -- 1.1026481444e-16
klee-last/test000067-input -- 1.0747657680e-16
klee-last/test000079-input -- 1.0714443092e-16
klee-last/test000066-input -- 8.8272423124e-17
klee-last/test000082-input -- 4.1739994958e-17
klee-last/test000063-input -- 6.9408001417e-18
The maximum error triggered is 5.1607477707e-05
and the error-inducing input is klee-last/test000084-input
.
navigate to benchmarks
$ cd /home/fptesting/FPTesting/benchmarks; ls
The benchmarks are divided into three parts:
(1) summations (3 summation tests):
recursive-summation pairwise-summation compensated-summation
(2) matrix (9 meschach tests):
sum 1norm 2norm dot conv mv mm lu qr
(3) gsl (15 gsl tests):
wmean wvariance wsd wvariance-w wsd-w wtss wtss-m wvariance-m wsd-m wabsdev wskew wkurtosis wabsdev-m wskew-m wkurtosis-m
In each part, there is a README file (click the link below) that describes the detailed steps to replicate the results of FPGen or the 3 baseline tools RANDOM, S3FP and KLEE-FLOAT for one, some or all tests in that part.
- summations https://github.com/ucd-plse/FPGen/tree/master/benchmarks/summations
- matrix https://github.com/ucd-plse/FPGen/tree/master/benchmarks/matrix
- gsl https://github.com/ucd-plse/FPGen/tree/master/benchmarks/gsl
navigate to benchmarks
$ cd /home/fptesting/FPTesting/benchmarks
To replicate the results of FPGen for all 27 tests at one time, use fpgen_all.sh, and the run will take approximately 54-60 hours:
$ nohup ./fpgen_all.sh &
The FPGen results will be collected in file result-fpgen.txt. You can either inspect the results manually by comparing them to reference/result-fpgen-all.txt or use our utility script to compare the results automatically by running:
$ ../scripts/cmp_to_ref.sh -a result-fpgen.txt reference/result-fpgen-all.txt
The comparison script will report "check pass" or "check failed" for each test.
To replicate the results of RANDOM/S3FP/KLEE-FLOAT for all tests at one time, use baselines_all.sh, and the run will take approximately 108-120 hours:
$ nohup ./baselines_all.sh &
The RANDOM/S3FP/KLEE-FLOAT results will be collected in file result-baselines.txt. Inspect the results by comparing them to reference/result-baselines-all.txt.
FPGen starts with a random search that provides the base values of the input data for the algorithm to further improve. The random search, however, is performed with a time threshold which we found is unstable across machines even with similar hardware specifications. To increase the chance of replication/reproduction, we provide the base values in the artifact and skip this step, which is obsereved to be helpful in replicating/reproducing the FPGen results in other machines. To ignore the base values we provide and start FPGen from scratch, use fpgen-from-scratch.sh for gsl and matrix tests, and the same fpgen.sh for summations tests.