Commit

update docs
Signed-off-by: Suraj Aralihalli <[email protected]>
SurajAralihalli committed Oct 28, 2024
1 parent 370b870 commit 0b3a2c9
Showing 2 changed files with 5 additions and 7 deletions.
5 changes: 1 addition & 4 deletions docs/compatibility.md
@@ -484,15 +484,12 @@ These are the known edge cases where running on the GPU will produce different r
next to a newline or a repetition that produces zero or more results
([#5610](https://github.com/NVIDIA/spark-rapids/pull/5610))`
- Word and non-word boundaries, `\b` and `\B`
- - Line anchor `$` will incorrectly match any of the unicode characters `\u0085`, `\u2028`, or `\u2029` followed by
-   another line-terminator, such as `\n`. For example, the pattern `TEST$` will match `TEST\u0085\n` on the GPU but
-   not on the CPU ([#7585](https://github.com/NVIDIA/spark-rapids/issues/7585)).

The following regular expression patterns are not yet supported on the GPU and will fall back to the CPU.

- Line anchors `^` and `$` are not supported in some contexts, such as when combined with a choice (`^|a` or `$|a`).
- String anchor `\Z` is not supported by `regexp_replace`, and in some rare contexts.
- - String anchor `\z` is not supported
+ - String anchor `\z` is not supported.
- Patterns containing an end of line or string anchor immediately next to a newline or repetition that produces zero
or more results
- Line anchor `$` and string anchors `\Z` are not supported in patterns containing `\W` or `\D`
@@ -335,10 +335,11 @@ class RegularExpressionTranspilerSuite extends AnyFunSuite {
}

test("line anchor $ - find") {
- val patterns = Seq("a$", "a$b", "\f$", "$\f")
+ val patterns = Seq("a$", "a$b", "\f$", "$\f","TEST$")
  val inputs = Seq("a", "a\n", "a\r", "a\r\n", "a\f", "\f", "\r", "\u0085", "\u2028",
- "\u2029", "\n", "\r\n", "\r\n\r", "\r\n\u0085", "\n\r",
- "\n\u0085", "\n\u2028", "\n\u2029", "2+|+??wD\n", "a\r\nb")
+ "\u2029", "\n", "\r\n", "\r\n\r", "\r\n\u0085", "\n\r",
+ "\n\u0085", "\n\u2028", "\n\u2029", "2+|+??wD\n", "a\r\nb",
+ "TEST\u0085\n", "TEST\u0085\r", "TEST\u2028\r","TEST\u2028\u2029")
assertCpuGpuMatchesRegexpFind(patterns, inputs)
val unsupportedPatterns = Seq("[\r\n]?$", "$\r", "\r$",
// "\u0085$", "\u2028$", "\u2029$", "\n$", "\r\n$", "[D$3]$")
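
The new `TEST$` inputs target the edge case removed from `docs/compatibility.md`: a Unicode line terminator followed by a second line terminator. The CPU side of that comparison can be reproduced outside the test suite with a minimal sketch like the one below. It assumes the CPU result follows plain `java.util.regex` semantics with default flags; `DollarAnchorCpuCheck` and `cpuFind` are hypothetical names used only for illustration and are not part of spark-rapids. It does not exercise the GPU path.

```scala
import java.util.regex.Pattern

// Hypothetical helper, not part of spark-rapids: checks the CPU-side
// (java.util.regex) behavior of the `$` line anchor on the inputs
// added in this commit.
object DollarAnchorCpuCheck {

  // True if `pattern` is found anywhere in `input`, using default flags
  // (no MULTILINE, no UNIX_LINES).
  def cpuFind(pattern: String, input: String): Boolean =
    Pattern.compile(pattern).matcher(input).find()

  def main(args: Array[String]): Unit = {
    val cases = Seq(
      "TEST + U+0085"          -> "TEST\u0085",
      "TEST + CR LF"           -> "TEST\r\n",
      "TEST + U+0085 + LF"     -> "TEST\u0085\n",
      "TEST + U+0085 + CR"     -> "TEST\u0085\r",
      "TEST + U+2028 + CR"     -> "TEST\u2028\r",
      "TEST + U+2028 + U+2029" -> "TEST\u2028\u2029"
    )
    cases.foreach { case (label, input) =>
      val found = cpuFind("TEST$", input)
      println(s"$label: $found")
    }
  }
}
```

Without `MULTILINE`, `$` in `java.util.regex` matches only at the end of input or before a single final line terminator, and `\r\n` is the only two-character sequence treated as one terminator. The first two cases therefore match, while the four inputs added to the test do not, which is the CPU result the GPU is now expected to reproduce.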