aarch64: add vector routines (among other goodies)
This PR doesn't just add `aarch64`-specific code; it also refactors pretty
much everything about how the code is organized. There are big perf wins for
`aarch64` (see benchmark results below), and also latency improvements across
the board. A brief summary of the changes in this PR:

* I've added `aarch64` NEON vector implementations for `memchr`,
`memrchr`, `memchr2`, `memrchr2`, `memchr3`, `memrchr3` and `memmem`.
This should lead to massive speed improvements on an increasingly popular
target, due in large part to Apple silicon.
* I've added `wasm32` simd128 vector implementations for `memchr`,
`memrchr`, `memchr2`, `memrchr2`, `memchr3` and `memrchr3`.
(alexcrichton previously contributed a vector implementation for
`memmem` and that remains.)
* `x86_64` has no real additions other than the `memchr_iter(needle,
haystack).count()` specialization. It already has SSE2 and AVX2
implementations of `memchr` (and friends) and `memmem`. It uses AVX2
automatically via runtime inspection of what the current CPU supports.
There is no need to compile with the `avx2` feature enabled.
* I've replaced the benchmark suite using Criterion with a
benchmark suite using [rebar](https://github.com/BurntSushi/rebar).
While I designed rebar to be used for regex engines, it
can be used for [any substring or multi-substring search
task](https://github.com/BurntSushi/rebar/blob/45afe89f437173d2dd970fee
7d7f1db5d0e05588/BYOB.md).
* I've added a new `arch` sub-module that exposes a lot of the internal
routines (including target specific routines) used to implement
`memchr` and `memmem`. This module is part of a major refactoring
of how this crate is organized and it seemed prudent to expose the
internals as their APIs are pretty straightforward. That is, there
isn't a huge API design space IMO. This module includes scalar
substring search implementations of Shift-Or, Rabin-Karp and Two-Way.
* As a result of the refactoring mentioned above, most of the
conditional compilation stuff has been pushed down and mostly
abstracted away. Moreover, since each implementation now has its own
proper API surface that is uniform across other implementations, each
thing can be easily independently tested. Because of this, I was able
to remove a reliance on the variety of custom `cfg` knobs that the
previous version of `memchr` set up in its build script. This in turn
**allowed me to remove the build script entirely.** Given the ubiquity
of this crate, this may lead to compile time improvements downstream.
(Likely small in each individual case but perhaps large in aggregate.)
I can't promise that a build script will never re-appear, but I'll try
to resist adding one in the future if possible.
* Despite the above, compile times for this crate sadly seem
to have increased slightly. Namely, a fresh `time rebar build -e
'^rust/memchrold/memmem/prebuilt$'` reports 0.944 seconds on my system
while a fresh `time rebar build -e '^rust/memchr/memmem/prebuilt$'`
reports 1.164 seconds. This is on `x86_64` where no real additional
code was added. This could be because of the "nicer" abstractions
now present in the `arch` sub-module or perhaps how the internals
are structured. (Previously there were multiple monomorphic
implementations of `memchr` for example and now there is a single
generic implementation that is monomorphized automatically by the
compiler via generics. Perhaps that is more expensive?)
* I've specialized `memchr_iter(needle, haystack).count()` to use
a different vector implementation that specifically only counts
matches instead of reporting the offsets of each match. This can make
*huge* (potentially over an order of magnitude) differences when
counting the number of matches of a frequently (even semi-frequently)
occurring byte in a large haystack. This is effectively what the
[`bytecount`](https://crates.io/crates/bytecount) crate does (which
is what ripgrep currently uses to compute line numbers for matches),
but the marginal cost of adding it to the `memchr` crate was very low.
So I did. And I plan to move ripgrep to using `memchr_iter(needle,
haystack).count()`. (Also, the benchmarks below suggest that the
counting implementation I wrote is faster than the one in `bytecount`
in some cases which look like they'll be relevant for ripgrep. This was
surprising to me.)
* I've added an `alloc` feature which permits compiling this
crate without the standard library but with the `alloc` crate.
This crate is designed through-and-through to work in a core-only
context, so this doesn't unlock much compared to just disabling
the `std` feature. It adds a couple of APIs requiring allocation
(like `memmem::Finder::into_owned`) and other things like
`arch::all::shiftor`, which really wants an allocation to store its
bit-parallel state machine.
* The `libc` feature is **DEPRECATED** and is now a no-op. I don't
think there is any real benefit to it any more.
* A new disabled-by-default `logging` feature has been added. When
enabled, this crate will emit a smattering of log messages. Usually
these messages are used to indicate what kind of strategy is selected.
For example, whether a vector or scalar algorithm is used for substring
search.
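
To make the "bit-parallel state machine" behind Shift-Or a bit more concrete, here is a minimal sketch of the textbook algorithm. It is not the code in `arch::all::shiftor`. The `ShiftOr` type, its methods, and the 64-byte needle limit (one `u64` of state) are assumptions made purely for illustration. The mask table is heap-allocated here, which hints at why such a searcher wants `alloc`.

```rust
/// Illustrative Shift-Or searcher. Hypothetical names; not the crate's API.
struct ShiftOr {
    /// One mask per byte value. Bit `i` of `masks[b]` is 0 iff `needle[i] == b`.
    masks: Vec<u64>,
    /// Needle length, between 1 and 64 inclusive.
    len: usize,
}

impl ShiftOr {
    /// Builds the mask table. Returns `None` for needles this sketch does
    /// not handle (empty, or longer than the 64 bits of state).
    fn new(needle: &[u8]) -> Option<ShiftOr> {
        if needle.is_empty() || needle.len() > 64 {
            return None;
        }
        let mut masks = vec![!0u64; 256];
        for (i, &b) in needle.iter().enumerate() {
            masks[b as usize] &= !(1u64 << i);
        }
        Some(ShiftOr { masks, len: needle.len() })
    }

    /// Returns the start offset of the first match, if any.
    fn find(&self, haystack: &[u8]) -> Option<usize> {
        // Bit `len - 1` becoming 0 means the whole needle just matched.
        let accept = 1u64 << (self.len - 1);
        // Bit `j` of `state` is 0 iff `needle[..=j]` matches the haystack
        // bytes ending at the current position.
        let mut state = !0u64;
        for (i, &b) in haystack.iter().enumerate() {
            state = (state << 1) | self.masks[b as usize];
            if state & accept == 0 {
                return Some(i + 1 - self.len);
            }
        }
        None
    }
}
```

With this sketch, `ShiftOr::new(b"Watson")` builds the table once and `find` then does one shift, one OR and one test per haystack byte, regardless of how many partial matches are in flight.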

Differences across the board from the status quo on `x86_64`. Showing only
measurements with a 1.2x (or greater) difference.

```
$ rebar diff tmp/old.csv tmp/new.csv -t 1.2 -e memmem -E oneshot
benchmark                                         engine                       tmp/old.csv          tmp/new.csv
---------                                         ------                       -----------          -----------
memmem/code/rust-library-never-fn-strength        rust/memchr/memmem/prebuilt  42.8 GB/s (1.25x)    53.6 GB/s (1.00x)
memmem/code/rust-library-never-fn-strength-paren  rust/memchr/memmem/prebuilt  40.8 GB/s (1.32x)    53.8 GB/s (1.00x)
memmem/code/rust-library-never-fn-quux            rust/memchr/memmem/prebuilt  40.5 GB/s (1.37x)    55.6 GB/s (1.00x)
memmem/code/rust-library-rare-fn-from-str         rust/memchr/memmem/prebuilt  39.3 GB/s (1.37x)    53.8 GB/s (1.00x)
memmem/code/rust-library-common-fn-is-empty       rust/memchr/memmem/prebuilt  40.5 GB/s (1.30x)    52.6 GB/s (1.00x)
memmem/code/rust-library-common-fn                rust/memchr/memmem/prebuilt  21.6 GB/s (1.27x)    27.5 GB/s (1.00x)
memmem/pathological/rare-repeated-huge-tricky     rust/memchr/memmem/prebuilt  40.9 GB/s (1.55x)    63.4 GB/s (1.00x)
memmem/pathological/rare-repeated-small-match     rust/memchr/memmem/prebuilt  1468.7 MB/s (1.23x)  1811.4 MB/s (1.00x)
memmem/sliceslice/short                           rust/memchr/memmem/prebuilt  14.74ms (2.08x)      7.08ms (1.00x)
memmem/sliceslice/seemingly-random                rust/memchr/memmem/prebuilt  9.1 MB/s (1.23x)     11.2 MB/s (1.00x)
memmem/sliceslice/i386                            rust/memchr/memmem/prebuilt  41.4 MB/s (1.35x)    55.8 MB/s (1.00x)
memmem/subtitles/common/huge-en-you               rust/memchr/memmem/prebuilt  10.7 GB/s (1.26x)    13.5 GB/s (1.00x)
memmem/subtitles/common/huge-zh-that              rust/memchr/memmem/prebuilt  25.2 GB/s (1.49x)    37.5 GB/s (1.00x)
memmem/subtitles/never/huge-en-john-watson        rust/memchr/memmem/prebuilt  42.9 GB/s (1.48x)    63.6 GB/s (1.00x)
memmem/subtitles/never/huge-en-all-common-bytes   rust/memchr/memmem/prebuilt  41.9 GB/s (1.26x)    52.7 GB/s (1.00x)
memmem/subtitles/never/teeny-en-all-common-bytes  rust/memchr/memmem/prebuilt  1161.0 MB/s (1.53x)  1780.2 MB/s (1.00x)
memmem/subtitles/never/teeny-en-some-rare-bytes   rust/memchr/memmem/prebuilt  1161.0 MB/s (1.53x)  1780.2 MB/s (1.00x)
memmem/subtitles/never/teeny-en-two-space         rust/memchr/memmem/prebuilt  1161.0 MB/s (1.53x)  1780.2 MB/s (1.00x)
memmem/subtitles/never/huge-ru-john-watson        rust/memchr/memmem/prebuilt  40.6 GB/s (1.56x)    63.5 GB/s (1.00x)
memmem/subtitles/never/teeny-ru-john-watson       rust/memchr/memmem/prebuilt  1741.5 MB/s (1.44x)  2.4 GB/s (1.00x)
memmem/subtitles/never/huge-zh-john-watson        rust/memchr/memmem/prebuilt  41.1 GB/s (1.46x)    59.9 GB/s (1.00x)
memmem/subtitles/never/teeny-zh-john-watson       rust/memchr/memmem/prebuilt  1285.4 MB/s (1.53x)  1970.9 MB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock-holmes     rust/memchr/memmem/prebuilt  41.9 GB/s (1.52x)    63.5 GB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock            rust/memchr/memmem/prebuilt  41.9 GB/s (1.46x)    61.3 GB/s (1.00x)
memmem/subtitles/rare/huge-en-medium-needle       rust/memchr/memmem/prebuilt  38.3 GB/s (1.46x)    55.9 GB/s (1.00x)
memmem/subtitles/rare/huge-en-long-needle         rust/memchr/memmem/prebuilt  2.5 GB/s (17.34x)    44.0 GB/s (1.00x)
memmem/subtitles/rare/huge-en-huge-needle         rust/memchr/memmem/prebuilt  2.3 GB/s (20.24x)    45.7 GB/s (1.00x)
memmem/subtitles/rare/teeny-en-sherlock-holmes    rust/memchr/memmem/prebuilt  1068.1 MB/s (1.47x)  1570.8 MB/s (1.00x)
memmem/subtitles/rare/teeny-en-sherlock           rust/memchr/memmem/prebuilt  953.7 MB/s (1.27x)   1213.8 MB/s (1.00x)
memmem/subtitles/rare/teeny-ru-sherlock-holmes    rust/memchr/memmem/prebuilt  1430.5 MB/s (1.47x)  2.1 GB/s (1.00x)
memmem/subtitles/rare/teeny-ru-sherlock           rust/memchr/memmem/prebuilt  1213.8 MB/s (1.32x)  1602.2 MB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock-holmes     rust/memchr/memmem/prebuilt  41.8 GB/s (1.33x)    55.5 GB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock            rust/memchr/memmem/prebuilt  43.0 GB/s (1.38x)    59.4 GB/s (1.00x)
memmem/subtitles/rare/teeny-zh-sherlock           rust/memchr/memmem/prebuilt  895.9 MB/s (1.27x)   1137.1 MB/s (1.00x)
```

A comparison with the
[`sliceslice`](https://crates.io/crates/sliceslice) crate for just
substring search. We only include measurements with a 1.2x difference
or greater.

```
$ rebar cmp benchmarks/record/x86_64/2023-08-26.csv -e sliceslice/memmem/prebuilt -e rust/memchr/memmem/prebuilt -t 1.2
benchmark                                                   rust/memchr/memmem/prebuilt  rust/sliceslice/memmem/prebuilt
---------                                                   ---------------------------  -------------------------------
memmem/byterank/binary                                      4.4 GB/s (1.32x)             5.8 GB/s (1.00x)
memmem/code/rust-library-never-fn-strength                  53.6 GB/s (1.00x)            39.8 GB/s (1.35x)
memmem/code/rust-library-never-fn-strength-paren            53.8 GB/s (1.00x)            39.7 GB/s (1.35x)
memmem/code/rust-library-never-fn-quux                      55.6 GB/s (1.00x)            38.7 GB/s (1.44x)
memmem/code/rust-library-rare-fn-from-str                   53.8 GB/s (2.65x)            142.7 GB/s (1.00x)
memmem/pathological/md5-huge-no-hash                        50.1 GB/s (1.00x)            25.7 GB/s (1.95x)
memmem/pathological/md5-huge-last-hash                      47.6 GB/s (1.00x)            27.7 GB/s (1.72x)
memmem/pathological/rare-repeated-huge-tricky               63.4 GB/s (1.00x)            41.9 GB/s (1.51x)
memmem/pathological/rare-repeated-small-tricky              25.2 GB/s (1.32x)            33.3 GB/s (1.00x)
memmem/pathological/defeat-simple-vector-alphabet           4.1 GB/s (1.65x)             6.7 GB/s (1.00x)
memmem/pathological/defeat-simple-vector-freq-alphabet      19.2 GB/s (1.00x)            2.6 GB/s (7.33x)
memmem/pathological/defeat-simple-vector-repeated-alphabet  1234.5 MB/s (1.00x)          508.7 MB/s (2.43x)
memmem/sliceslice/short                                     7.08ms (1.00x)               14.10ms (1.99x)
memmem/sliceslice/i386                                      55.8 MB/s (1.00x)            39.6 MB/s (1.41x)
memmem/subtitles/never/huge-en-john-watson                  63.6 GB/s (1.00x)            41.7 GB/s (1.53x)
memmem/subtitles/never/huge-en-all-common-bytes             52.7 GB/s (1.00x)            42.6 GB/s (1.24x)
memmem/subtitles/never/teeny-en-john-watson                 1027.0 MB/s (2.17x)          2.2 GB/s (1.00x)
memmem/subtitles/never/teeny-en-all-common-bytes            1780.2 MB/s (1.25x)          2.2 GB/s (1.00x)
memmem/subtitles/never/teeny-en-some-rare-bytes             1780.2 MB/s (1.25x)          2.2 GB/s (1.00x)
memmem/subtitles/never/teeny-en-two-space                   1780.2 MB/s (1.25x)          2.2 GB/s (1.00x)
memmem/subtitles/never/huge-ru-john-watson                  63.5 GB/s (1.00x)            12.7 GB/s (4.99x)
memmem/subtitles/never/teeny-ru-john-watson                 2.4 GB/s (1.23x)             3.0 GB/s (1.00x)
memmem/subtitles/never/huge-zh-john-watson                  59.9 GB/s (1.00x)            41.1 GB/s (1.46x)
memmem/subtitles/never/teeny-zh-john-watson                 1970.9 MB/s (1.25x)          2.4 GB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock-holmes               63.5 GB/s (1.00x)            41.6 GB/s (1.53x)
memmem/subtitles/rare/huge-en-sherlock                      61.3 GB/s (1.00x)            43.0 GB/s (1.42x)
memmem/subtitles/rare/huge-en-medium-needle                 55.9 GB/s (1.00x)            25.7 GB/s (2.17x)
memmem/subtitles/rare/huge-en-long-needle                   44.0 GB/s (1.00x)            25.9 GB/s (1.70x)
memmem/subtitles/rare/huge-en-huge-needle                   45.7 GB/s (1.00x)            29.3 GB/s (1.56x)
memmem/subtitles/rare/teeny-en-sherlock                     1213.8 MB/s (1.37x)          1668.9 MB/s (1.00x)
memmem/subtitles/rare/huge-ru-sherlock-holmes               40.7 GB/s (1.00x)            15.2 GB/s (2.67x)
memmem/subtitles/rare/teeny-ru-sherlock                     1602.2 MB/s (1.56x)          2.4 GB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock-holmes               55.5 GB/s (1.00x)            26.6 GB/s (2.09x)
memmem/subtitles/rare/huge-zh-sherlock                      59.4 GB/s (1.00x)            42.4 GB/s (1.40x)
memmem/subtitles/rare/teeny-zh-sherlock-holmes              1055.9 MB/s (1.87x)          1970.9 MB/s (1.00x)
memmem/subtitles/rare/teeny-zh-sherlock                     1137.1 MB/s (1.86x)          2.1 GB/s (1.00x)
```

Differences between this crate's substring search implementation and `memmem`
as provided by GNU libc. Showing only measurements with 2x difference or
greater.

```
$ rebar cmp benchmarks/record/x86_64/2023-08-26.csv -e libc/memmem/oneshot -e rust/memchr/memmem/oneshot -t 2
benchmark                                         libc/memmem/oneshot  rust/memchr/memmem/oneshot
---------                                         -------------------  --------------------------
memmem/code/rust-library-never-fn-strength        11.4 GB/s (4.75x)    54.1 GB/s (1.00x)
memmem/code/rust-library-never-fn-strength-paren  12.4 GB/s (4.36x)    54.0 GB/s (1.00x)
memmem/code/rust-library-never-fn-quux            8.1 GB/s (6.91x)     55.8 GB/s (1.00x)
memmem/code/rust-library-rare-fn-from-str         15.0 GB/s (3.59x)    53.8 GB/s (1.00x)
memmem/code/rust-library-common-fn-is-empty       12.5 GB/s (4.16x)    51.9 GB/s (1.00x)
memmem/code/rust-library-common-fn                2.2 GB/s (5.89x)     13.0 GB/s (1.00x)
memmem/code/rust-library-common-let               3.2 GB/s (2.65x)     8.5 GB/s (1.00x)
memmem/pathological/rare-repeated-huge-tricky     17.8 GB/s (3.56x)    63.3 GB/s (1.00x)
memmem/pathological/rare-repeated-huge-match      718.0 MB/s (1.00x)   289.1 MB/s (2.48x)
memmem/pathological/rare-repeated-small-match     707.1 MB/s (1.00x)   303.1 MB/s (2.33x)
memmem/subtitles/common/huge-en-that              3.7 GB/s (4.22x)     15.7 GB/s (1.00x)
memmem/subtitles/common/huge-en-one-space         1543.9 MB/s (1.00x)  541.6 MB/s (2.85x)
memmem/subtitles/common/huge-ru-that              2.7 GB/s (4.22x)     11.6 GB/s (1.00x)
memmem/subtitles/common/huge-ru-not               2.0 GB/s (2.47x)     5.0 GB/s (1.00x)
memmem/subtitles/common/huge-ru-one-space         2.9 GB/s (1.00x)     1081.0 MB/s (2.71x)
memmem/subtitles/common/huge-zh-that              4.2 GB/s (3.20x)     13.4 GB/s (1.00x)
memmem/subtitles/common/huge-zh-do-not            2.6 GB/s (2.40x)     6.3 GB/s (1.00x)
memmem/subtitles/common/huge-zh-one-space         5.7 GB/s (1.00x)     2.4 GB/s (2.38x)
memmem/subtitles/never/huge-en-john-watson        15.4 GB/s (4.12x)    63.3 GB/s (1.00x)
memmem/subtitles/never/huge-en-all-common-bytes   11.9 GB/s (4.41x)    52.2 GB/s (1.00x)
memmem/subtitles/never/huge-en-some-rare-bytes    11.0 GB/s (5.77x)    63.6 GB/s (1.00x)
memmem/subtitles/never/huge-en-two-space          2.3 GB/s (27.77x)    63.5 GB/s (1.00x)
memmem/subtitles/never/huge-ru-john-watson        5.2 GB/s (11.56x)    59.9 GB/s (1.00x)
memmem/subtitles/never/huge-zh-john-watson        20.7 GB/s (2.86x)    59.2 GB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock-holmes     17.0 GB/s (3.71x)    63.1 GB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock            11.8 GB/s (5.18x)    60.9 GB/s (1.00x)
memmem/subtitles/rare/huge-en-huge-needle         19.3 GB/s (2.02x)    38.9 GB/s (1.00x)
memmem/subtitles/rare/huge-ru-sherlock-holmes     6.5 GB/s (9.47x)     61.5 GB/s (1.00x)
memmem/subtitles/rare/huge-ru-sherlock            3.8 GB/s (16.23x)    61.6 GB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock            10.8 GB/s (5.48x)    59.1 GB/s (1.00x)
```

Differences with the [`bytecount`](https://crates.io/crates/bytecount)
crate as `memchr_iter(needle, haystack).count()` is now specialized
to its own vector implementation just for counting the number
of matches (instead of reporting the offset of each match). The
throughput improvements as compared to `bytecount` on large haystacks
are most interesting IMO. (I was somewhat surprised by this, as
`bytecount` seems to do something clever while `memchr_iter(needle,
haystack).count()` is basically just `memchr` but with the branching
for reporting matches removed.) Either way, I expect this to translate
directly to improvements in ripgrep, although I haven't measured that
yet.

```
$ rebar cmp benchmarks/record/x86_64/2023-08-26.csv -e '^rust/bytecount/memchr/oneshot$' -e '^rust/memchr/memchr/onlycount$'
benchmark                          rust/bytecount/memchr/oneshot  rust/memchr/memchr/onlycount
---------                          -----------------------------  ----------------------------
memchr/sherlock/common/huge1       28.5 GB/s (1.94x)              55.3 GB/s (1.00x)
memchr/sherlock/common/small1      17.7 GB/s (1.25x)              22.1 GB/s (1.00x)
memchr/sherlock/common/tiny1       4.3 GB/s (1.00x)               3.8 GB/s (1.13x)
memchr/sherlock/never/huge1        28.4 GB/s (2.09x)              59.3 GB/s (1.00x)
memchr/sherlock/never/small1       17.7 GB/s (1.25x)              22.1 GB/s (1.00x)
memchr/sherlock/never/tiny1        4.3 GB/s (1.00x)               3.8 GB/s (1.13x)
memchr/sherlock/never/empty1       11.00ns (1.00x)                11.00ns (1.00x)
memchr/sherlock/rare/huge1         28.5 GB/s (1.94x)              55.2 GB/s (1.00x)
memchr/sherlock/rare/small1        17.7 GB/s (1.25x)              22.1 GB/s (1.00x)
memchr/sherlock/rare/tiny1         4.3 GB/s (1.00x)               3.8 GB/s (1.13x)
memchr/sherlock/uncommon/huge1     26.9 GB/s (2.20x)              59.3 GB/s (1.00x)
memchr/sherlock/uncommon/small1    17.7 GB/s (1.25x)              22.1 GB/s (1.00x)
memchr/sherlock/uncommon/tiny1     4.3 GB/s (1.00x)               3.8 GB/s (1.13x)
memchr/sherlock/verycommon/huge1   28.4 GB/s (2.09x)              59.3 GB/s (1.00x)
memchr/sherlock/verycommon/small1  17.7 GB/s (1.25x)              22.1 GB/s (1.00x)
```
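
As a rough scalar illustration of the count-only idea discussed above (every byte is compared unconditionally and nothing branches per match), here is a small sketch. It is not the vector implementation in this crate. The function name `count_byte` and the 16-byte chunk size are assumptions chosen to loosely mimic a 128-bit vector lane count.

```rust
/// Counts occurrences of `needle` without reporting any offsets.
/// Hypothetical helper for illustration only.
fn count_byte(needle: u8, haystack: &[u8]) -> usize {
    let mut total = 0usize;
    let mut chunks = haystack.chunks_exact(16);
    for chunk in &mut chunks {
        // Compare every byte in the chunk and add the 0/1 result.
        // At most 16 matches per chunk, so a `u8` accumulator cannot overflow.
        let mut n = 0u8;
        for &b in chunk {
            n += (b == needle) as u8;
        }
        total += n as usize;
    }
    // Handle the trailing bytes that did not fill a whole chunk.
    for &b in chunks.remainder() {
        total += (b == needle) as usize;
    }
    total
}
```

With the crate itself, the equivalent call is simply `memchr_iter(needle, haystack).count()`. The sketch only shows why skipping offset reporting leaves a tight compare-and-accumulate loop.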

Differences across the board from the status quo on `aarch64`. Note that here,
I've only included measurements with a 4x difference from the old memchr
crate. Otherwise, pretty much every benchmark has a pretty sizeable
improvement from the old version. (Because previously, `aarch64` had no
vector implementations at all.)

```
$ rebar diff tmp/old-aarch64.csv tmp/new-aarch64.csv -t 4 -E oneshot
benchmark                                         engine                       tmp/old-aarch64.csv   tmp/new-aarch64.csv
---------                                         ------                       -------------------   -------------------
memchr/sherlock/never/huge2                       rust/memchr/memchr2          10.8 GB/s (4.27x)     46.3 GB/s (1.00x)
memchr/sherlock/never/small1                      rust/memchr/memchr/prebuilt  15.1 GB/s (41.00x)    618.4 GB/s (1.00x)
memchr/sherlock/never/small1                      rust/memchr/memrchr          14.7 GB/s (42.00x)    618.4 GB/s (1.00x)
memchr/sherlock/never/small2                      rust/memchr/memchr2          7.5 GB/s (83.00x)     618.4 GB/s (1.00x)
memchr/sherlock/never/small2                      rust/memchr/memrchr2         7.5 GB/s (83.00x)     618.4 GB/s (1.00x)
memchr/sherlock/never/small3                      rust/memchr/memchr3          7.5 GB/s (83.00x)     618.4 GB/s (1.00x)
memchr/sherlock/never/small3                      rust/memchr/memrchr3         7.5 GB/s (83.00x)     618.4 GB/s (1.00x)
memchr/sherlock/rare/small1                       rust/memchr/memchr/prebuilt  14.7 GB/s (42.00x)    618.4 GB/s (1.00x)
memchr/sherlock/rare/small1                       rust/memchr/memrchr          14.7 GB/s (42.00x)    618.4 GB/s (1.00x)
memchr/sherlock/rare/small2                       rust/memchr/memchr2          7.5 GB/s (83.00x)     618.4 GB/s (1.00x)
memchr/sherlock/rare/small2                       rust/memchr/memrchr2         7.5 GB/s (83.00x)     618.4 GB/s (1.00x)
memchr/sherlock/uncommon/tiny1                    rust/memchr/memchr/prebuilt  1605.0 MB/s (41.00x)  64.3 GB/s (1.00x)
memchr/sherlock/uncommon/tiny1                    rust/memchr/memrchr          1605.0 MB/s (41.00x)  64.3 GB/s (1.00x)
memmem/code/rust-library-never-fn-strength        rust/memchr/memmem/prebuilt  7.1 GB/s (4.17x)      29.6 GB/s (1.00x)
memmem/code/rust-library-never-fn-strength-paren  rust/memchr/memmem/prebuilt  6.9 GB/s (4.19x)      29.0 GB/s (1.00x)
memmem/code/rust-library-rare-fn-from-str         rust/memchr/memmem/prebuilt  6.5 GB/s (4.42x)      28.7 GB/s (1.00x)
memmem/code/rust-library-common-fn                rust/memchr/memmem/prebuilt  3.2 GB/s (5.58x)      18.0 GB/s (1.00x)
memmem/code/rust-library-common-let               rust/memchr/memmem/prebuilt  2012.9 MB/s (6.45x)   12.7 GB/s (1.00x)
memmem/pathological/md5-huge-no-hash              rust/memchr/memmem/prebuilt  1070.2 MB/s (24.69x)  25.8 GB/s (1.00x)
memmem/pathological/md5-huge-last-hash            rust/memchr/memmem/prebuilt  1148.2 MB/s (22.85x)  25.6 GB/s (1.00x)
memmem/pathological/rare-repeated-huge-tricky     rust/memchr/memmem/prebuilt  1299.3 MB/s (23.87x)  30.3 GB/s (1.00x)
memmem/pathological/rare-repeated-small-tricky    rust/memchr/memmem/prebuilt  1146.0 MB/s (19.83x)  22.2 GB/s (1.00x)
memmem/sliceslice/seemingly-random                rust/memchr/memmem/prebuilt  1485.7 KB/s (4.13x)   6.0 MB/s (1.00x)
memmem/sliceslice/i386                            rust/memchr/memmem/prebuilt  6.0 MB/s (5.07x)      30.3 MB/s (1.00x)
memmem/subtitles/common/huge-en-that              rust/memchr/memmem/prebuilt  1418.2 MB/s (11.50x)  15.9 GB/s (1.00x)
memmem/subtitles/common/huge-ru-that              rust/memchr/memmem/prebuilt  1389.1 MB/s (13.44x)  18.2 GB/s (1.00x)
memmem/subtitles/common/huge-ru-not               rust/memchr/memmem/prebuilt  1482.7 MB/s (7.06x)   10.2 GB/s (1.00x)
memmem/subtitles/never/huge-en-all-common-bytes   rust/memchr/memmem/prebuilt  1813.7 MB/s (12.81x)  22.7 GB/s (1.00x)
memmem/subtitles/never/huge-en-two-space          rust/memchr/memmem/prebuilt  1370.2 MB/s (25.23x)  33.8 GB/s (1.00x)
memmem/subtitles/never/teeny-en-two-space         rust/memchr/memmem/prebuilt  651.3 MB/s (41.00x)   26.1 GB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock            rust/memchr/memmem/prebuilt  7.0 GB/s (4.40x)      30.6 GB/s (1.00x)
memmem/subtitles/rare/huge-en-medium-needle       rust/memchr/memmem/prebuilt  6.4 GB/s (4.43x)      28.3 GB/s (1.00x)
memmem/subtitles/rare/huge-en-long-needle         rust/memchr/memmem/prebuilt  7.1 GB/s (4.64x)      32.8 GB/s (1.00x)
memmem/subtitles/rare/teeny-en-sherlock-holmes    rust/memchr/memmem/prebuilt  651.3 MB/s (41.00x)   26.1 GB/s (1.00x)
memmem/subtitles/rare/teeny-en-sherlock           rust/memchr/memmem/prebuilt  651.3 MB/s (41.00x)   26.1 GB/s (1.00x)
memmem/subtitles/rare/teeny-ru-sherlock-holmes    rust/memchr/memmem/prebuilt  953.7 MB/s (42.00x)   39.1 GB/s (1.00x)
memmem/subtitles/rare/teeny-ru-sherlock           rust/memchr/memmem/prebuilt  976.9 MB/s (41.00x)   39.1 GB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock-holmes     rust/memchr/memmem/prebuilt  4.1 GB/s (7.06x)      28.8 GB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock            rust/memchr/memmem/prebuilt  6.1 GB/s (4.81x)      29.6 GB/s (1.00x)
memmem/subtitles/rare/teeny-zh-sherlock-holmes    rust/memchr/memmem/prebuilt  721.1 MB/s (41.00x)   28.9 GB/s (1.00x)
memmem/subtitles/rare/teeny-zh-sherlock           rust/memchr/memmem/prebuilt  721.1 MB/s (41.00x)   28.9 GB/s (1.00x)
```

A comparison with the
[`sliceslice`](https://crates.io/crates/sliceslice) crate, which has
its own custom `aarch64` vector implementation of substring search. We
only show measurements with 1.2x or greater difference.

```
$ rebar cmp benchmarks/record/aarch64/2023-08-26.csv -e sliceslice/memmem/prebuilt -e rust/memchr/memmem/prebuilt -t 1.2
benchmark                                                   rust/memchr/memmem/prebuilt  rust/sliceslice/memmem/prebuilt
---------                                                   ---------------------------  -------------------------------
memmem/byterank/binary                                      3.1 GB/s (1.00x)             1586.4 MB/s (2.01x)
memmem/code/rust-library-never-fn-strength                  29.6 GB/s (1.00x)            16.1 GB/s (1.84x)
memmem/code/rust-library-never-fn-strength-paren            29.0 GB/s (1.00x)            15.6 GB/s (1.86x)
memmem/code/rust-library-never-fn-quux                      30.2 GB/s (1.00x)            15.1 GB/s (2.00x)
memmem/code/rust-library-rare-fn-from-str                   28.7 GB/s (1.93x)            55.5 GB/s (1.00x)
memmem/pathological/md5-huge-no-hash                        25.8 GB/s (1.00x)            13.6 GB/s (1.89x)
memmem/pathological/md5-huge-last-hash                      25.6 GB/s (1.00x)            13.5 GB/s (1.90x)
memmem/pathological/rare-repeated-huge-tricky               30.3 GB/s (1.00x)            16.6 GB/s (1.83x)
memmem/pathological/rare-repeated-small-tricky              22.2 GB/s (1.00x)            11.2 GB/s (1.98x)
memmem/pathological/defeat-simple-vector-alphabet           3.0 GB/s (1.00x)             1114.1 MB/s (2.77x)
memmem/pathological/defeat-simple-vector-freq-alphabet      14.8 GB/s (1.00x)            2.2 GB/s (6.72x)
memmem/pathological/defeat-simple-vector-repeated-alphabet  835.1 MB/s (1.00x)           173.8 MB/s (4.80x)
memmem/sliceslice/short                                     7.33ms (1.00x)               36.55ms (4.99x)
memmem/sliceslice/seemingly-random                          6.0 MB/s (1.00x)             3.6 MB/s (1.67x)
memmem/sliceslice/i386                                      30.3 MB/s (1.00x)            15.1 MB/s (2.00x)
memmem/subtitles/never/huge-en-john-watson                  30.9 GB/s (1.00x)            16.6 GB/s (1.86x)
memmem/subtitles/never/huge-en-all-common-bytes             22.7 GB/s (1.00x)            13.8 GB/s (1.64x)
memmem/subtitles/never/huge-en-some-rare-bytes              30.9 GB/s (1.00x)            16.6 GB/s (1.86x)
memmem/subtitles/never/huge-en-two-space                    33.8 GB/s (1.00x)            16.6 GB/s (2.03x)
memmem/subtitles/never/huge-ru-john-watson                  30.3 GB/s (1.00x)            7.1 GB/s (4.25x)
memmem/subtitles/never/huge-zh-john-watson                  29.2 GB/s (1.00x)            16.0 GB/s (1.83x)
memmem/subtitles/rare/huge-en-sherlock-holmes               30.3 GB/s (1.00x)            16.3 GB/s (1.86x)
memmem/subtitles/rare/huge-en-sherlock                      30.6 GB/s (1.00x)            16.6 GB/s (1.85x)
memmem/subtitles/rare/huge-en-medium-needle                 28.3 GB/s (1.00x)            12.4 GB/s (2.28x)
memmem/subtitles/rare/huge-en-long-needle                   32.8 GB/s (1.00x)            15.7 GB/s (2.08x)
memmem/subtitles/rare/huge-en-huge-needle                   32.9 GB/s (1.00x)            16.1 GB/s (2.05x)
memmem/subtitles/rare/huge-ru-sherlock-holmes               30.3 GB/s (1.00x)            8.0 GB/s (3.80x)
memmem/subtitles/rare/huge-ru-sherlock                      30.2 GB/s (1.00x)            10.1 GB/s (3.00x)
memmem/subtitles/rare/huge-zh-sherlock-holmes               28.8 GB/s (1.00x)            14.7 GB/s (1.95x)
memmem/subtitles/rare/huge-zh-sherlock                      29.6 GB/s (1.00x)            14.0 GB/s (2.12x)
```

Differences between this crate's substring search implementation and
`memmem` as provided by macOS's libc. Showing only measurements
with 2x difference or greater. This is what utter destruction
looks like. (I'm not sure what's going on in benchmarks like
`memmem/subtitles/rare/teeny-en-sherlock-holmes`. It's a tiny haystack
and macOS seems to either measure 1ns or 41ns. I wonder if there's
something odd about time precision on macOS? You can see the reverse
happen in `memmem/subtitles/rare/teeny-zh-sherlock`.)

```
$ rebar cmp benchmarks/record/aarch64/2023-08-26.csv -e libc/memmem/oneshot -e rust/memchr/memmem/oneshot -t 2
benchmark                                                   libc/memmem/oneshot   rust/memchr/memmem/oneshot
---------                                                   -------------------   --------------------------
memmem/byterank/binary                                      626.1 MB/s (5.11x)    3.1 GB/s (1.00x)
memmem/code/rust-library-never-fn-strength                  1320.8 MB/s (22.98x)  29.6 GB/s (1.00x)
memmem/code/rust-library-never-fn-strength-paren            1320.8 MB/s (22.49x)  29.0 GB/s (1.00x)
memmem/code/rust-library-never-fn-quux                      1332.0 MB/s (23.25x)  30.2 GB/s (1.00x)
memmem/code/rust-library-rare-fn-from-str                   1442.0 MB/s (20.37x)  28.7 GB/s (1.00x)
memmem/code/rust-library-common-fn-is-empty                 1320.8 MB/s (22.02x)  28.4 GB/s (1.00x)
memmem/code/rust-library-common-fn                          1320.8 MB/s (11.44x)  14.8 GB/s (1.00x)
memmem/code/rust-library-common-let                         1114.7 MB/s (8.59x)   9.4 GB/s (1.00x)
memmem/pathological/md5-huge-no-hash                        994.0 MB/s (26.39x)   25.6 GB/s (1.00x)
memmem/pathological/md5-huge-last-hash                      994.3 MB/s (26.39x)   25.6 GB/s (1.00x)
memmem/pathological/rare-repeated-huge-tricky               1670.8 MB/s (18.56x)  30.3 GB/s (1.00x)
memmem/pathological/rare-repeated-huge-match                1353.0 MB/s (1.00x)   378.5 MB/s (3.57x)
memmem/pathological/rare-repeated-small-tricky              1637.4 MB/s (13.88x)  22.2 GB/s (1.00x)
memmem/pathological/rare-repeated-small-match               1348.3 MB/s (1.00x)   394.5 MB/s (3.42x)
memmem/pathological/defeat-simple-vector-alphabet           568.1 MB/s (5.43x)    3.0 GB/s (1.00x)
memmem/pathological/defeat-simple-vector-freq-alphabet      1027.2 MB/s (14.55x)  14.6 GB/s (1.00x)
memmem/pathological/defeat-simple-vector-repeated-alphabet  173.8 MB/s (4.80x)    834.2 MB/s (1.00x)
memmem/subtitles/common/huge-en-that                        841.6 MB/s (13.19x)   10.8 GB/s (1.00x)
memmem/subtitles/common/huge-en-you                         1161.7 MB/s (4.00x)   4.5 GB/s (1.00x)
memmem/subtitles/common/huge-ru-that                        590.9 MB/s (19.48x)   11.2 GB/s (1.00x)
memmem/subtitles/common/huge-ru-not                         334.3 MB/s (18.62x)   6.1 GB/s (1.00x)
memmem/subtitles/common/huge-zh-that                        1340.1 MB/s (11.49x)  15.0 GB/s (1.00x)
memmem/subtitles/common/huge-zh-do-not                      858.5 MB/s (9.15x)    7.7 GB/s (1.00x)
memmem/subtitles/never/huge-en-john-watson                  1648.3 MB/s (19.14x)  30.8 GB/s (1.00x)
memmem/subtitles/never/huge-en-all-common-bytes             1075.4 MB/s (21.65x)  22.7 GB/s (1.00x)
memmem/subtitles/never/huge-en-some-rare-bytes              1655.7 MB/s (19.10x)  30.9 GB/s (1.00x)
memmem/subtitles/never/huge-en-two-space                    541.6 MB/s (63.83x)   33.8 GB/s (1.00x)
memmem/subtitles/never/teeny-en-two-space                   651.3 MB/s (41.00x)   26.1 GB/s (1.00x)
memmem/subtitles/never/huge-ru-john-watson                  427.0 MB/s (72.56x)   30.3 GB/s (1.00x)
memmem/subtitles/never/huge-zh-john-watson                  1155.4 MB/s (25.81x)  29.1 GB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock-holmes               1577.4 MB/s (19.60x)  30.2 GB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock                      1577.4 MB/s (19.78x)  30.5 GB/s (1.00x)
memmem/subtitles/rare/huge-en-medium-needle                 1155.6 MB/s (24.95x)  28.2 GB/s (1.00x)
memmem/subtitles/rare/huge-en-long-needle                   1488.8 MB/s (20.77x)  30.2 GB/s (1.00x)
memmem/subtitles/rare/huge-en-huge-needle                   1609.5 MB/s (17.27x)  27.1 GB/s (1.00x)
memmem/subtitles/rare/teeny-en-sherlock-holmes              26.1 GB/s (1.00x)     651.3 MB/s (41.00x)
memmem/subtitles/rare/huge-ru-sherlock-holmes               427.0 MB/s (72.41x)   30.2 GB/s (1.00x)
memmem/subtitles/rare/huge-ru-sherlock                      348.2 MB/s (91.21x)   31.0 GB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock-holmes               955.8 MB/s (31.66x)   29.6 GB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock                      853.4 MB/s (35.46x)   29.6 GB/s (1.00x)
memmem/subtitles/rare/teeny-zh-sherlock-holmes              28.9 GB/s (1.00x)     721.1 MB/s (41.00x)
memmem/subtitles/rare/teeny-zh-sherlock                     721.1 MB/s (41.00x)   28.9 GB/s (1.00x)
```

Differences with the [`bytecount`](https://crates.io/crates/bytecount)
crate, now that `memchr_iter(needle, haystack).count()` is specialized to
its own vector implementation that only counts the number of matches
(instead of reporting the offset of each match):

```
$ rebar cmp benchmarks/record/aarch64/2023-08-26.csv -e '^rust/bytecount/memchr/oneshot$' -e '^rust/memchr/memchr/onlycount$'
benchmark                          rust/bytecount/memchr/oneshot  rust/memchr/memchr/onlycount
---------                          -----------------------------  ----------------------------
memchr/sherlock/common/huge1       29.5 GB/s (1.40x)              41.4 GB/s (1.00x)
memchr/sherlock/common/small1      618.4 GB/s (1.00x)             618.4 GB/s (1.00x)
memchr/sherlock/common/tiny1       64.3 GB/s (1.00x)              64.3 GB/s (1.00x)
memchr/sherlock/never/huge1        29.5 GB/s (1.40x)              41.4 GB/s (1.00x)
memchr/sherlock/never/small1       618.4 GB/s (1.00x)             618.4 GB/s (1.00x)
memchr/sherlock/never/tiny1        64.3 GB/s (1.00x)              64.3 GB/s (1.00x)
memchr/sherlock/never/empty1       1.00ns (1.00x)                 1.00ns (1.00x)
memchr/sherlock/rare/huge1         29.5 GB/s (1.40x)              41.4 GB/s (1.00x)
memchr/sherlock/rare/small1        618.4 GB/s (1.00x)             618.4 GB/s (1.00x)
memchr/sherlock/rare/tiny1         64.3 GB/s (1.00x)              64.3 GB/s (1.00x)
memchr/sherlock/uncommon/huge1     29.5 GB/s (1.40x)              41.4 GB/s (1.00x)
memchr/sherlock/uncommon/small1    618.4 GB/s (1.00x)             618.4 GB/s (1.00x)
memchr/sherlock/uncommon/tiny1     64.3 GB/s (1.00x)              64.3 GB/s (1.00x)
memchr/sherlock/verycommon/huge1   28.7 GB/s (1.44x)              41.4 GB/s (1.00x)
memchr/sherlock/verycommon/small1  618.4 GB/s (1.00x)             618.4 GB/s (1.00x)
```
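
To make the distinction between counting and reporting offsets concrete, here is a minimal scalar sketch that uses only the standard library. The function names `count_via_offsets` and `count_only` are hypothetical and are not part of this crate's API, and the crate's actual specialization is vectorized rather than a byte-at-a-time loop. The sketch only illustrates that a count can be computed without ever materializing a match offset.

```rust
/// Counts matches by locating the offset of every match first. This is
/// roughly the work a generic offset-reporting iterator has to do.
/// (Hypothetical helper for illustration only.)
fn count_via_offsets(needle: u8, haystack: &[u8]) -> usize {
    let mut count = 0;
    let mut start = 0;
    // Repeatedly search the remaining haystack for the next match offset.
    while let Some(i) = haystack[start..].iter().position(|&b| b == needle) {
        count += 1;
        // Resume the search just past the match that was found.
        start += i + 1;
    }
    count
}

/// Counts matches directly, without computing any offsets.
/// (Hypothetical helper for illustration only.)
fn count_only(needle: u8, haystack: &[u8]) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

fn main() {
    let haystack = b"sherlock holmes and john watson";
    // Both strategies must agree on the number of matches.
    assert_eq!(count_via_offsets(b'o', haystack), count_only(b'o', haystack));
    println!("matches: {}", count_only(b'o', haystack));
}
```

With this crate, the equivalent call is `memchr_iter(needle, haystack).count()`, which is the expression the benchmarks above measure under `onlycount`.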
BurntSushi committed Aug 27, 2023
1 parent abcc473 commit 51ed4b6
Showing 201 changed files with 39,291 additions and 397,145 deletions.
264 changes: 169 additions & 95 deletions .github/workflows/ci.yml
@@ -8,46 +8,63 @@ on:
- master
schedule:
- cron: '00 01 * * *'

# The section is needed to drop write-all permissions that are granted on
# `schedule` event. By specifying any permission explicitly all others are set
# to none. By using the principle of least privilege the damage a compromised
# workflow can do (because of an injection or compromised third party tool or
# action) is restricted. Currently the workflow doesn't need any additional
# permission except for pulling the code. Adding labels to issues, commenting
# on pull-requests, etc. may need additional permissions:
#
# Syntax for this section:
# https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#permissions
#
# Reference for how to assign permissions on a job-by-job basis:
# https://docs.github.com/en/actions/using-jobs/assigning-permissions-to-jobs
#
# Reference for available permissions that we can enable if needed:
# https://docs.github.com/en/actions/security-guides/automatic-token-authentication#permissions-for-the-github_token
permissions:
# to fetch code (actions/checkout)
contents: read

jobs:
# Baseline testing across a number of different targets.
test:
name: test
env:
# For some builds, we use cross to test on 32-bit and big-endian
# systems.
CARGO: cargo
# When CARGO is set to CROSS, TARGET is set to `--target matrix.target`.
# Note that we only use cross on Linux, so setting a target on a
# different OS will just use normal cargo.
TARGET:
# Bump this as appropriate. We pin to a version to make sure CI
# continues to work as cross releases in the past have broken things
# in subtle ways.
CROSS_VERSION: v0.2.5
# Make quickcheck run more tests for hopefully better coverage.
QUICKCHECK_TESTS: 100000
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
build:
- pinned
- stable
- stable-32
- stable-mips
- wasm
- beta
- nightly
- macos
- win-msvc
- win-gnu
- stable-x86
- stable-aarch64
- stable-powerpc64
- stable-s390x
include:
- build: pinned
os: ubuntu-latest
rust: 1.41.1
- build: stable
os: ubuntu-latest
rust: stable
- build: stable-32
os: ubuntu-latest
rust: stable
target: i686-unknown-linux-gnu
- build: stable-mips
os: ubuntu-latest
rust: stable
target: mips64-unknown-linux-gnuabi64
- build: beta
os: ubuntu-latest
rust: beta
@@ -63,10 +80,24 @@ jobs:
- build: win-gnu
os: windows-latest
rust: stable-x86_64-gnu
- build: wasm
- build: stable-x86
os: ubuntu-latest
rust: stable-x86_64-gnu
wasm: true
rust: stable
target: i686-unknown-linux-gnu
# This is kind of a stand-in for Apple silicon since we can't currently
# use GitHub Actions with Apple silicon.
- build: stable-aarch64
os: ubuntu-latest
rust: stable
target: aarch64-unknown-linux-gnu
- build: stable-powerpc64
os: ubuntu-latest
rust: stable
target: powerpc64-unknown-linux-gnu
- build: stable-s390x
os: ubuntu-latest
rust: stable
target: s390x-unknown-linux-gnu
steps:
- name: Checkout repository
uses: actions/checkout@v3
@@ -75,83 +106,76 @@ jobs:
with:
toolchain: ${{ matrix.rust }}
- name: Use Cross
if: matrix.target != ''
if: matrix.os == 'ubuntu-latest' && matrix.target != ''
run: |
# We used to install 'cross' from master, but it kept failing. So now
# we build from a known-good version until 'cross' becomes more stable
# or we find an alternative. Notably, between v0.2.1 and current
# master (2022-06-14), the number of Cross's dependencies has doubled.
cargo install --bins --git https://github.com/rust-embedded/cross --tag v0.2.1
# In the past, new releases of 'cross' have broken CI. So for now, we
# pin it. We also use their pre-compiled binary releases because cross
# has over 100 dependencies and takes a bit to compile.
dir="$RUNNER_TEMP/cross-download"
mkdir "$dir"
echo "$dir" >> $GITHUB_PATH
cd "$dir"
curl -LO "https://github.com/cross-rs/cross/releases/download/$CROSS_VERSION/cross-x86_64-unknown-linux-musl.tar.gz"
tar xf cross-x86_64-unknown-linux-musl.tar.gz
echo "CARGO=cross" >> $GITHUB_ENV
echo "TARGET=--target ${{ matrix.target }}" >> $GITHUB_ENV
- name: Download Wasmtime
if: matrix.wasm
run: |
rustup target add wasm32-wasi
echo "CARGO_BUILD_TARGET=wasm32-wasi" >> $GITHUB_ENV
echo "RUSTFLAGS=-Ctarget-feature=+simd128" >> $GITHUB_ENV
curl -LO https://github.com/bytecodealliance/wasmtime/releases/download/v0.32.0/wasmtime-v0.32.0-x86_64-linux.tar.xz
tar xvf wasmtime-v0.32.0-x86_64-linux.tar.xz
echo `pwd`/wasmtime-v0.32.0-x86_64-linux >> $GITHUB_PATH
echo "CARGO_TARGET_WASM32_WASI_RUNNER=wasmtime run --wasm-features simd --" >> $GITHUB_ENV
- name: Show command used for Cargo
run: |
echo "cargo command is: ${{ env.CARGO }}"
echo "target flag is: ${{ env.TARGET }}"
- name: Show CPU info for debugging
if: matrix.os == 'ubuntu-latest'
run: lscpu
- run: ${{ env.CARGO }} build --verbose $TARGET
- run: ${{ env.CARGO }} build --verbose $TARGET --no-default-features
- run: ${{ env.CARGO }} doc --verbose $TARGET
# Our dev dependencies evolve more rapidly than we'd like, so only run
# tests when we aren't pinning the Rust version.
- if: matrix.build != 'pinned'
name: Show byte order for debugging
- name: Basic build
run: ${{ env.CARGO }} build --verbose $TARGET
- name: Build docs
run: ${{ env.CARGO }} doc --verbose $TARGET
- name: Show byte order for debugging
run: ${{ env.CARGO }} test --verbose $TARGET byte_order -- --nocapture
- if: matrix.build != 'pinned'
name: Run tests under default configuration
run: ${{ env.CARGO }} test --verbose $TARGET
- if: matrix.build != 'pinned'
name: Run tests with just alloc feature
run: ${{ env.CARGO }} test --verbose --no-default-features --features alloc $TARGET
- if: matrix.build == 'stable'
name: Run under different SIMD configurations
run: |
set -x
# Enable libc while using SIMD, libc won't be used.
# (This is to ensure valid logic in the picking process.)
cargo test --verbose --features libc
preamble="--cfg memchr_disable_auto_simd"
- name: Run tests
run: cargo test --verbose
- name: Run with only 'alloc' enabled
run: cargo test --verbose --no-default-features --features alloc
- name: Run tests without any features enabled (core-only)
run: cargo test --verbose --no-default-features

# Force use of fallback without libc.
RUSTFLAGS="$preamble" cargo test --verbose
# Force use of libc.
RUSTFLAGS="$preamble" cargo test --verbose --features libc
preamble="$preamble --cfg memchr_runtime_simd"
# Force use of fallback even when SIMD is enabled.
RUSTFLAGS="$preamble" cargo test --verbose
# For some reason, cargo seems to get confused which results in
# link errors. So wipe the slate clean.
cargo clean
# Force use of sse2 only
RUSTFLAGS="$preamble --cfg memchr_runtime_sse2" cargo test --verbose
# ... and wipe it again. So weird.
cargo clean
# Force use of avx only
RUSTFLAGS="$preamble --cfg memchr_runtime_avx" cargo test --verbose
- if: matrix.build == 'nightly'
name: Run benchmarks as tests
run: cargo bench --manifest-path bench/Cargo.toml --verbose -- --test
# Setup and run tests on the wasm32-wasi target via wasmtime.
wasm:
runs-on: ubuntu-latest
env:
# The version of wasmtime to download and install.
WASMTIME_VERSION: 12.0.1
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Install Rust
uses: dtolnay/rust-toolchain@master
with:
toolchain: stable
- name: Add wasm32-wasi target
run: rustup target add wasm32-wasi
- name: Download and install Wasmtime
run: |
echo "CARGO_BUILD_TARGET=wasm32-wasi" >> $GITHUB_ENV
echo "RUSTFLAGS=-Ctarget-feature=+simd128" >> $GITHUB_ENV
curl -LO https://github.com/bytecodealliance/wasmtime/releases/download/v$WASMTIME_VERSION/wasmtime-v$WASMTIME_VERSION-x86_64-linux.tar.xz
tar xvf wasmtime-v$WASMTIME_VERSION-x86_64-linux.tar.xz
echo `pwd`/wasmtime-v$WASMTIME_VERSION-x86_64-linux >> $GITHUB_PATH
echo "CARGO_TARGET_WASM32_WASI_RUNNER=wasmtime run --wasm-features simd --" >> $GITHUB_ENV
- name: Basic build
run: cargo build --verbose
- name: Run tests
run: cargo test --verbose
- name: Run with only 'alloc' enabled
run: cargo test --verbose --no-default-features --features alloc
- name: Run tests without any features enabled (core-only)
run: cargo test --verbose --no-default-features

build-for-non_sse-target:
name: build for non-SSE target
# This job uses a custom target file to build the memchr crate on x86-64
# but *without* SSE/AVX target features. This is a somewhat strange
# configuration, but it pops up now and then. Particularly in kernels that
# don't support SSE/AVX registers.
build-for-x86-64-but-non-sse-target:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
@@ -163,25 +187,78 @@ jobs:
components: rust-src
- run: cargo build -Z build-std=core --target=src/tests/x86_64-soft_float.json --verbose --no-default-features

test-with-miri:
name: test with miri
# This job runs a stripped down version of CI to test the MSRV. The specific
# reason for doing this is that dev-dependencies tend to evolve more quickly.
# There isn't as tight of a control on them because, well, they're only used
# in tests and their MSRV doesn't matter as much.
#
# It is a bit unfortunate that our MSRV test is basically just "build it"
# and pass if that works. But usually MSRV is broken by compilation problems
# and not runtime behavior. So this is in practice good enough.
msrv:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Install Rust
uses: dtolnay/rust-toolchain@master
with:
toolchain: 1.60.0
- name: Basic build
run: cargo build --verbose
- name: Build docs
run: cargo doc --verbose

# Runs miri on memchr's test suite. This doesn't quite cover everything. Some
# tests (especially quickcheck) are disabled when building with miri because
# of how slow miri runs. But it still gives us decent coverage.
miri:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Install Rust
uses: dtolnay/rust-toolchain@master
with:
# We use nightly here so that we can use miri I guess?
toolchain: nightly
components: miri
- name: Show CPU info for debugging
run: lscpu
- run: cargo miri test --verbose
- run: cargo miri test --verbose --no-default-features
- run: cargo miri test --verbose --features libc
- name: Run full test suite
run: cargo miri test --verbose

# Tests that memchr's benchmark suite builds and passes all tests.
rebar:
runs-on: ubuntu-latest
env:
# The version of wasmtime to download and install.
WASMTIME_VERSION: 12.0.1
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Install Rust
uses: dtolnay/rust-toolchain@master
with:
toolchain: stable
- name: Add wasm32-wasi target
run: rustup target add wasm32-wasi
- name: Download and install Wasmtime
run: |
# Note that we don't have to set CARGO_BUILD_TARGET and other
# environment variables like we do for the `wasm` job. This is because
# `rebar` knows how to set them itself and only when running the wasm
# engines.
curl -LO https://github.com/bytecodealliance/wasmtime/releases/download/v$WASMTIME_VERSION/wasmtime-v$WASMTIME_VERSION-x86_64-linux.tar.xz
tar xvf wasmtime-v$WASMTIME_VERSION-x86_64-linux.tar.xz
echo `pwd`/wasmtime-v$WASMTIME_VERSION-x86_64-linux >> $GITHUB_PATH
- name: Install rebar
run: cargo install --git https://github.com/BurntSushi/rebar rebar
- name: Build all rebar engines
run: rebar build
- name: Run all benchmarks as tests
run: rebar measure --test

# Tests that everything is formatted correctly.
rustfmt:
name: rustfmt
runs-on: ubuntu-latest
steps:
- name: Checkout repository
@@ -193,7 +270,4 @@ jobs:
components: rustfmt
- name: Check formatting
run: |
cargo fmt -- --check
- name: Check formatting on benchmarks
run: |
cargo fmt --manifest-path bench/Cargo.toml -- --check
cargo fmt --all -- --check
14 changes: 13 additions & 1 deletion .vim/coc-settings.json
@@ -1,3 +1,15 @@
 {
-  "rust-analyzer.cargo.allFeatures": false
+  "rust-analyzer.cargo.allFeatures": false,
+  "rust-analyzer.linkedProjects": [
+    "benchmarks/engines/libc/Cargo.toml",
+    "benchmarks/engines/rust-bytecount/Cargo.toml",
+    "benchmarks/engines/rust-jetscii/Cargo.toml",
+    "benchmarks/engines/rust-memchr/Cargo.toml",
+    "benchmarks/engines/rust-memchrold/Cargo.toml",
+    "benchmarks/engines/rust-sliceslice/Cargo.toml",
+    "benchmarks/engines/rust-std/Cargo.toml",
+    "benchmarks/shared/Cargo.toml",
+    "fuzz/Cargo.toml",
+    "Cargo.toml"
+  ]
 }