Common Lisp benchmarking suite
==============================
This package contains Lisp code intended for performance benchmarking
of different Common Lisp implementations. The tests it runs include
- the well-known Gabriel benchmarks, with a number of bugfixes from
Henry Baker and limited rewriting to bring them a little closer to
modern Common Lisp coding style
- hashtable exercising
- READ-LINE exercising
- mathematical functions: factorial, Fibonacci, Ackermann's function
- some bignum-intensive code from Bruno Haible
- various aspects of CLOS: time taken to compile and execute
DEFCLASS forms which create a class hierarchy, instance creation
time, time taken to compile and execute DEFMETHOD forms, execution
of method combinations, both with and without :after methods.
- various operations on arrays, bitvectors and strings
- performance-intensive kernels such as CRC generation and an
implementation of the DEFLATE algorithm
Except for the CLOS COMPILER tests, timings do not include compilation
time. The garbage collector is run before each test to try to make the
timings more repeatable. For certain targets, we assume that the times
reported by GET-INTERNAL-RUN-TIME and GET-INTERNAL-REAL-TIME are
accurate. Timings for a given Common Lisp environment may be quite
sensitive to the optimization settings; these are set at the beginning
of the Makefile.
Common Lisp is a very large language, so it is difficult to evaluate
the performance of all aspects of an implementation. Remember that the
only real benchmark is your application: this code is only
representative of real-life programs to a limited extent.
Running the suite
=================
Put the cl-bench directory in asdf:*central-registry*, or link it into
the ~/quicklisp/local-projects/ directory, then load the CL-BENCH
system. CL-BENCH depends on the ALEXANDRIA and TRIVIAL-GARBAGE
libraries, while CL-BENCH/REPORT additionally depends on CL-WHO.
(ql:quickload 'cl-bench)
(cl-bench:bench-run)
Benchmark results will be written to the output/ directory in the
CL-BENCH source tree.
Generating reports
==================
To generate reports after running the tests on various
implementations, issue the following:
(ql:quickload 'cl-bench/report)
(cl-bench::bench-analysis-page)
The benchmark report will be written to the file output/report.html in
the CL-BENCH source tree.
Running the suite (obsolete)
============================
Each implementation has a run-<impl>.sh file and a setup-<impl>.lisp
file in the sysdep directory. To execute the tests for CMUCL, for
example, type
bash run-cmucl.sh
This will create files with the optimization setting requested in the
Makefile, then invoke CMUCL to compile the files (lots of warnings and
compilation notes are to be expected), then run the various tests.
Execution takes quite a while: around 30 minutes on a 1GHz PIII. A
large part of this time is due to the bignum benchmarks, which you can
comment out if they don't interest you. You can also reduce the number
of executions of each particular test by modifying the :runs parameter
of the DEFBENCH forms in the file "tests.lisp". For the results to be
meaningful, the time for each test
should be more than a second. However, when comparing implementations
it is not the absolute time which counts, but the time relative to the
other implementations. Also note that you may wish to reduce the size
of the problems, particularly in the array sections, because with some
setups (CMUCL on Linux for example), your system may become unusable
if you exceed available RAM. You can do this by editing the file
"tests.lisp" and modifying the arguments to the :code parameters.
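For illustration, a benchmark definition in "tests.lisp" has roughly
the following shape. The keyword arguments shown here (:group,
:function, :runs) are an assumption for illustration; consult your
copy of the file for the exact syntax.

```lisp
;; Hypothetical sketch of a DEFBENCH form; the keywords shown are
;; illustrative -- check "tests.lisp" for the real ones.
(defbench fib
  :group :math          ; which section of the suite it belongs to
  :function 'run-fib    ; the function that is actually timed
  :runs 30)             ; reduce this value to shorten total run time
```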
Repeat this operation for other implementations you have on your
machine. I have tried it with CMUCL, SBCL, CLISP, OpenMCL, Poplog
CL and LispWorks on various platforms. GCL and ECL are able to run
some of the tests.
If you're not running on a Unix platform, you may not be able to
script the different steps of a run. In this case, take the following
steps:
1. Say "make clean optimize-lisp" to create source files that contain
optimization declarations. The optimize settings can be changed
in the Makefile.
2. Load the file "generate.lisp" into your implementation. This
should create two files "do-compilation-script.lisp" and
"do-execute-script.lisp", that contain a sequence of compilation
and load steps.
3. Load the file "sysdep/setup-<yourimpl>.lisp", which you may need
to write yourself. This requires a definition of a BENCH-GC
function that triggers a full garbage collection.
4. Load the file "do-compilation-script.lisp" into your
implementation, which should result in all the source files
being compiled.
5. Load the file "do-execute-script.lisp", which should cause all
the tests to be executed.
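Step 3 can be as small as a single function. A minimal sketch of a
setup file, assuming SBCL (the GC entry point is
implementation-specific):

```lisp
;; Sketch of a sysdep/setup-<impl>.lisp file, shown here for SBCL.
;; BENCH-GC must trigger a full garbage collection; substitute your
;; implementation's own GC call (e.g. (ext:gc) on CMUCL).
(defun bench-gc ()
  (sb-ext:gc :full t))
```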
For each tested implementation, you should have a file in output/
named "CL-benchmark-<date>". These files will have the following
format:
,---- /var/tmp/CL-benchmark-20010821T2208 ---
| ;; -*- lisp -*- CMU Common Lisp CVS sources, level-1 built 2001-08-22 on maftia1
| ;;
| ;; Implementation *features*:
| ;; (:PCL-STRUCTURES :PORTABLE-COMMONLOOPS :PYTHON :PCL :COMPLEX-FP-VOPS :PPRO
| ;; :PENTIUM :MP :HASH-NEW :RANDOM-MT19937 :PROPAGATE-FUN-TYPE
| ;; :PROPAGATE-FLOAT-TYPE :CONSTRAIN-FLOAT-TYPE :CMU18C :CMU18 :GENCGC :X86
| ;; :LINUX :GLIBC2 :UNIX :COMMON :COMMON-LISP :ANSI-CL :CMU
| ;; :IEEE-FLOATING-POINT)
| ;;
| ;; Function real user sys consed
| ;; ----------------------------------------------------------------
| ;; Boyer 1.50 1.22 0.28 54349520
| ;; Browse 0.97 0.79 0.18 36219256
| ;; DDerviv 0.88 0.50 0.39 67197656
| ;; Deriv 1.64 0.87 0.77 127195824
| ;; Destructive 0.30 0.24 0.05 12819928
| ;; div2-test-1 0.52 0.32 0.20 38398176
| ;; div2-test2 0.66 0.42 0.24 47999936
| ;; FFT 0.40 0.40 0.00 0
| ;; frpoly/fixnum 0.65 0.54 0.10 19172440
| ;; frpoly/bignum 1.54 1.25 0.28 55628704
| ;; frpoly/float 7.59 6.50 1.09 213052408
| ;; Puzzle 0.82 0.82 0.00 0
| ;; CTak 0.81 0.81 0.00 0
| ;; Tak 0.53 0.54 0.00 0
| ;; RTak 0.35 0.36 0.00 0
| ;; takl 1.60 1.60 0.00 0
| ;; stak 1.15 1.14 0.00 0
| ;; fprint 0.25 0.25 0.01 1948416
| ;; fread 0.82 0.68 0.13 28487280
| ;; traverse 5.28 5.24 0.03 4493288
| ;; triangle 2.29 2.28 0.00 499712
| ;; factorial 0.37 0.18 0.20 26120296
| ;; fib 2.39 2.39 0.00 0
| ;; hashtable 0.72 0.69 0.04 9888912
| ;; CLOS/defclass 2.82 2.31 0.12 32757328
| ;; CLOS/defmethod 10.94 10.09 0.55 120612624
| ;; CLOS/instantiate 7.13 6.27 0.86 229048352
| ;; CLOS/methodcalls 6.56 5.22 1.34 301057608
| ;; CLOS/method+after 12.02 11.09 0.93 197058816
| ;; CLOS/complex-methods 0.38 0.38 0.00 286600
| ;; 1D-arrays 2.46 2.17 0.30 60002400
| ;; 2D-arrays 20.57 19.50 1.07 240000240
| ;; bitvectors 18.75 18.51 0.23 50003200
| ;; fill-strings 21.12 15.23 5.88 1000016000
| ;; fill-strings/adjustable 57.10 56.25 0.85 259729520
|
| ("CMU Common Lisp CVS sources, level-1 built 2001-08-22 on maftia1"
| ("fill-strings/adjustable" 57.1 56.25 0.85 259729520)
| ("fill-strings" 21.12 15.23 5.88 1000016000)
| ("bitvectors" 18.75 18.51 0.23 50003200)
| ("2D-arrays" 20.57 19.5 1.07 240000240) ("1D-arrays" 2.46 2.17 0.3 60002400)
| ("CLOS/complex-methods" 0.38 0.38 0.0 286600)
| ("CLOS/method+after" 12.02 11.09 0.93 197058816)
| ("CLOS/methodcalls" 6.56 5.22 1.34 301057608)
| ("CLOS/instantiate" 7.13 6.27 0.86 229048352)
| ("CLOS/defmethod" 10.94 10.09 0.55 120612624)
| ("CLOS/defclass" 2.82 2.31 0.12 32757328) ("hashtable" 0.72 0.69 0.04 9888912)
| ("fib" 2.39 2.39 0.0 0) ("factorial" 0.37 0.18 0.2 26120296)
| ("triangle" 2.29 2.28 0.0 499712) ("traverse" 5.28 5.24 0.03 4493288)
| ("fread" 0.82 0.68 0.13 28487280) ("fprint" 0.25 0.25 0.01 1948416)
| ("stak" 1.15 1.14 0.0 0) ("takl" 1.6 1.6 0.0 0) ("RTak" 0.35 0.36 0.0 0)
| ("Tak" 0.53 0.54 0.0 0) ("CTak" 0.81 0.81 0.0 0) ("Puzzle" 0.82 0.82 0.0 0)
| ("frpoly/float" 7.59 6.5 1.09 213052408)
| ("frpoly/bignum" 1.54 1.25 0.28 55628704)
| ("frpoly/fixnum" 0.65 0.54 0.1 19172440) ("FFT" 0.4 0.4 0.0 0)
| ("div2-test2" 0.66 0.42 0.24 47999936) ("div2-test-1" 0.52 0.32 0.2 38398176)
| ("Destructive" 0.3 0.24 0.05 12819928) ("Deriv" 1.64 0.87 0.77 127195824)
| ("DDerviv" 0.88 0.5 0.39 67197656) ("Browse" 0.97 0.79 0.18 36219256)
| ("Boyer" 1.5 1.22 0.28 54349520))
`----
The first section of the file is intended to be human readable, and
the second section to be READ by a Common Lisp implementation. For
each test, you should see the elapsed user time, and possibly (if this
has been coded for your implementation) elapsed system time and the
number of bytes consed during the test execution.
The data in the different output/CL-benchmark-* files is analysed by
the file "report.lisp", to generate a report comparing the performance
of the different implementations. This file needs to be run in a
Common Lisp implementation; the one you use will be considered the
"reference" implementation. In the report which is generated, for each
test the timing for the reference implementation will be shown, as
well as the _relative times_ for each of the other tested
implementations. With relative times, a score under 1 means the
implementation is faster than the reference, and a score of 2 means it
is two times slower -- for that test -- than the reference
implementation. If a given test doesn't work in a
particular implementation (for example CLISP doesn't do non-standard
method combination), its entry will be -1.0.
Here is an example of the type of results you can obtain, for x86 and
SPARC:
,---- PentiumIII at 1GHz, 256MB RAM, Linux 2.4.2 ---
|
| Benchmark Reference CLISP CMU C SBCL
| ----------------------------------------------------------------
| BOYER 2.36 4.54 0.67 0.94
| BROWSE 1.04 2.15 0.65 1.04
| DDerviv 1.19 1.96 0.48 1.06
| Deriv 2.27 1.93 0.42 1.04
| DESTRUCTIVE 1.52 2.79 0.89 1.06
| DIV2-TEST-1 1.73 1.84 0.51 1.09
| DIV2-TEST-2 0.85 1.87 0.46 1.12
| FFT 0.22 36.09 1.14 1.14
| FRPOLY/FIXNUM 0.79 5.81 0.81 0.96
| FRPOLY/BIGNUM 1.99 2.03 0.68 0.96
| FRPOLY/FLOAT 0.78 3.79 0.72 0.99
| PUZZLE 0.79 23.09 1.15 9.95
| CTAK 0.86 6.28 1.10 1.06
| TAK 0.91 8.86 1.18 1.34
| RTAK 0.91 8.86 1.13 1.34
| TAKL 1.67 7.33 1.16 1.22
| STAK 1.15 6.66 1.15 1.10
| FPRINT 1.42 1.12 1.05 2.37
| TRAVERSE 4.35 6.75 1.19 1.64
| TRIANGLE 2.01 17.22 1.14 1.27
| CASCOR 4.06 80.47 1.23 0.92
| RICHARDS 0.58 24.78 1.22 1.07
| FACTORIAL 0.50 3.56 0.68 1.38
| FIB 0.39 5.67 1.13 1.08
| BIGNUM/ELEM-100-1000 1.33 0.11 1.02 0.98
| BIGNUM/ELEM-1000-100 6.03 0.07 1.11 0.93
| BIGNUM/ELEM-10000-1 6.14 0.05 1.00 0.93
| BIGNUM/PARI-100-10 1.51 0.06 0.96 0.91
| BIGNUM/PARI-200-5 17.32 0.02 0.99 0.88
| HASH-STRINGS 1.65 3.00 0.82 0.99
| HASH-INTEGERS 0.73 2.30 0.68 1.05
| BOEHM-GC 10.18 3.94 0.39 1.19
| CLOS/defclass 2.64 0.35 0.83 2.33
| CLOS/defmethod 13.93 0.02 0.70 1.94
| CLOS/instantiate 7.23 1.02 0.63 1.39
| CLOS/methodcalls 8.43 1.33 0.50 1.37
| CLOS/method+after 13.90 0.45 0.79 2.64
| CLOS/complex-methods 0.35 -1.00 1.06 5.40
| 1D-ARRAYS 3.74 4.57 0.69 1.26
| 2D-ARRAYS 15.19 3.25 0.92 4.00
| BITVECTORS 2.92 0.61 0.70 0.95
| FILL-STRINGS 10.62 1.15 0.76 3.03
| fill-strings/adjustable 18.13 1.08 0.92 1.65
| BENCH-STRING-CONCAT 9.99 -1.00 0.81 2.34
|
| Reference implementation: CMU Common Lisp 18c
| Impl CLISP: CLISP 2.27 (released 2001-07-17) (built 3204291355) (memory 3205854943)
| Impl CMU C: CMU Common Lisp 18d-pre, level-1 built 2001-12-14 on maftia1
| Impl SBCL : SBCL 0.7.0
| Linux maftia1 2.4.2-2 #1 Sun Apr 8 20:41:30 EDT 2001 i686 unknown
`----
,---- UltraSPARCIIe at 500MHz, 640MB RAM, SunOS 5.8 ---
|
| Benchmark Reference CMU C CLISP
| -----------------------------------------------------
| BOYER 3.98 0.91 8.03
| BROWSE 1.72 0.91 2.85
| DDerviv 2.02 0.75 3.21
| Deriv 3.63 0.81 3.13
| DESTRUCTIVE 3.11 1.01 4.18
| DIV2-TEST-1 2.19 0.83 3.92
| DIV2-TEST-2 1.12 0.82 3.85
| FFT 0.74 1.03 28.86
| FRPOLY/FIXNUM 1.87 1.01 7.89
| FRPOLY/BIGNUM 4.59 1.29 3.07
| FRPOLY/FLOAT 1.65 0.96 5.68
| PUZZLE 2.07 0.95 30.62
| CTAK 2.74 1.01 9.04
| TAK 1.84 1.00 14.08
| RTAK 1.84 1.01 13.95
| TAKL 3.37 1.01 11.63
| STAK 2.32 1.01 8.87
| FPRINT 4.17 1.02 1.12
| TRAVERSE 5.84 0.99 13.74
| TRIANGLE 5.53 0.86 15.57
| CASCOR 10.53 0.73 52.81
| RICHARDS 2.35 0.94 22.46
| FACTORIAL 1.46 1.48 2.88
| FIB 0.94 0.99 6.71
| BIGNUM/ELEM-100-1000 2.80 1.24 0.28
| BIGNUM/ELEM-1000-100 10.14 1.19 0.44
| BIGNUM/ELEM-10000-1 11.38 1.35 0.41
| BIGNUM/PARI-100-10 2.76 1.15 0.09
| BIGNUM/PARI-200-5 27.19 1.06 0.05
| READ-LINE 3.39 1.06 1.19
| HASH-STRINGS 5.42 1.20 2.19
| HASH-INTEGERS 1.61 0.76 2.00
| BOEHM-GC 19.97 0.76 4.14
| CLOS/defclass 4.78 1.01 0.81
| CLOS/defmethod 27.61 0.89 0.03
| CLOS/instantiate 20.93 0.85 1.28
| CLOS/methodcalls 23.62 1.08 1.94
| CLOS/method+after 33.70 1.07 0.78
| CLOS/complex-methods 1.41 0.92 -1.00
| 1D-ARRAYS 10.77 0.92 3.51
| 2D-ARRAYS 56.66 1.40 2.61
| BITVECTORS 5.35 0.86 0.42
| FILL-STRINGS 18.88 1.07 0.97
| fill-strings/adjustable 45.09 1.46 1.41
| BENCH-STRING-CONCAT 48.10 0.90 -1.00
|
| Reference implementation: CMU Common Lisp 18c, Built 2000-11-27
| Impl CMU C: CMU Common Lisp 18d-pre, level-1 built 2001-12-12 on liszt
| Impl CLISP: CLISP 2.27.2 (released 2001-10-05) (built on moustacho)
| SunOS eagles 5.8 Generic_108528-10 sun4u sparc SUNW,Sun-Blade-100
`----
Note that the test suite doesn't take compilation time into account
(except for the CLOS-related tests, where the compiler may be used at
runtime to create effective methods). You can use the time taken to
load do-compilation-script.lisp as a rough proxy for compilation time.
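For instance, wrapping the load in the standard TIME macro gives a
rough figure:

```lisp
;; Rough measure of total compilation time: TIME prints real and
;; run time for compiling and loading all the benchmark files.
(time (load "do-compilation-script.lisp"))
```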
"Life is short and it was not meant to be spent making people feel guilty
about instruction pipelines being only partly full or caches being missed."
-- Kent Pitman in <[email protected]>
Thanks
======
Raymond Toy, Christophe Rhodes, Peter Van Eynde, Sven Van
Caekenberghe, Kevin Layers and possibly others whom I have forgotten
to note.
Related work
============
- R. P. Gabriel. Performance and Evaluation of Lisp Systems. MIT
  Press, Cambridge, Massachusetts, 1986.
- Scheme benchmarks by Will Clinger (Larceny and TwoBit compilers)
<URL:http://www.ccs.neu.edu/home/will/GC/sourcecode.html>
- Bagley's Programming Language Shootout,
<URL:http://www.bagley.org/~doug/shootout/>
Eric Marsden <[email protected]>