From a38fdcc3342bd00c07dd5a0a75094548666b60d8 Mon Sep 17 00:00:00 2001
From: Raimo Niskanen
Date: Fri, 6 Oct 2023 16:00:40 +0200
Subject: [PATCH] Improve examples and describe `~c` and `~C` sigils

---
 eeps/eep-0066.md | 69 +++++++++++++++++++++++++++++++++++-------------
 1 file changed, 51 insertions(+), 18 deletions(-)

diff --git a/eeps/eep-0066.md b/eeps/eep-0066.md
index cf160d9..009a62d 100644
--- a/eeps/eep-0066.md
+++ b/eeps/eep-0066.md
@@ -188,7 +188,7 @@ shall be interpreted. The suggested Sigil Types are:
   Creates an Erlang `unicode:unicode_binary()`, handling escape
   characters in the string content. How other features like
   string interpolation would work is still an open question.
-  
+
   Escape characters and other features are the same regardless
   of which [String Delimiters][] that are used.
 
@@ -196,20 +196,38 @@ shall be interpreted. The suggested Sigil Types are:
 
   Creates an Erlang `unicode:unicode_binary()`, with verbatim string
   content in that only the [end delimiter][] character
-  can be escaped with a «`\`» character. How other features
-  like string interpolation would work is still an open question.
+  can be escaped with a «`\`» character.
+
+  Which [String Delimiters][] that are used does not matter,
+  except that between triple-quote delimiters according to
+  [EEP 64][] there is no end delimiter character to escape.
+
+* «`c`»: [charlist in Elixir][4].
+
+  Creates an Erlang `string()`, handling escape characters
+  in the string content. How other features like
+  string interpolation would work is still an open question.
 
   Escape characters and other features are the same regardless
-  of which [String Delimiters][] that are used, except that
-  between triple-quote delimiters according to [EEP 64][]
-  there is no end delimiter character to escape.
+  of which [String Delimiters][] that are used.
+
+* «`C`»: [charlist in Elixir][4], verbatim.
+
+  Creates an Erlang `string()`, with verbatim string content
+  in that only the [end delimiter][] character can be escaped
+  with a «`\`» character.
+
+  Which [String Delimiters][] that are used does not matter,
+  except that between triple-quote delimiters according to
+  [EEP 64][] there is no end delimiter character to escape.
 
 * «`r`»: regular expression.
 
-  Creates a term `{RE::unicode:charlist(),Flags::[unicode:latin1_char()]}`
+  Creates a term `{re,RE::unicode:charlist(),Flags::[unicode:latin1_char()]}`
   that is an uncompiled regular expression with compile flags,
-  suitable for functions in the `re` module. The `RE` element is
-  the [String Content][], and the `Flags` element is the [Sigil Suffix][].
+  suitable for (yet to be implemented) functions in the `re` module.
+  The `RE` element is the [String Content][], and the `Flags` element
+  is the [Sigil Suffix][].
 
   See the [Regular Expressions][] section about the reasoning
   behind this proposed term type.
@@ -221,11 +239,15 @@ shall be interpreted. The suggested Sigil Types are:
   there is no end delimiter character to escape.
 
   The main advantage of a regular expression [Sigil][] is to avoid
-  the additional escaping of «'\'» that regular erlang strings add.
+  the additional escaping of «`\`» that regular erlang strings add.
+
+  Today: `re:run(Subject, "^\\s*\"[a-z]+\\\\\\d+\"", [caseless,unicode])`
 
-  Today: `re:run(Subject, "^[ \\t]*\\[a-z]*\\\\s+", [caseless,unicode])`
+  Sigil: `re:run(Subject, ~r'^\s*"[a-z]+\\\d+"'iu)`
 
-  Sigil: `re:run(Subject, ~r"^[ \t]*\[a-z]*\\s+"iu)`
+  Other advantages are possible tools and library integration features
+  such as making the `re` module recognize this tuple format,
+  and having the code loader pre-compile them.
 
 ### String Delimiters
 
@@ -276,19 +298,24 @@ the [String Content][] when it sees the Sigil Suffix.
 
 ### Regular Expressions
 
+A regular expression sigil «`~r"expression"flags`» should
+be translated to something useful for tools/libraries.
+There are at least two ways; [uncompiled regular expressions][],
+or [compiled regular expressions][].
+
 #### Uncompiled Regular Expression
 
-The value of a regular expression [Sigil][] is a 2-tuple
-with the uncompiled regular expression and its compile flags
-(in the guise of a sequence of character flags).
+The value of a regular expression [Sigil][] is chosen
+to be a tuple `{re,RE,Flags}`.
 
 With this representation, the `re` module can be augmented
-with functions that accept this tuple format. These functions
+with functions that accept this tuple format that bundles
+a regular expression with compile flags. These functions
 are `re:compile/1,2`, `re:replace/3,4` `re:run/2,3`,
-and `re:split/2,3`. Translation of the compile flag characters
+and `re:split/2,3`. Translation of the `Flags`' characters
 into `re:compile_option()`s should be done by these functions.
 
-Example:
+Example of calling a yet to be implemented `re:run/3`:
 
     1> re:run("ABC123", ~r"abc\d+"i, [{capture,first,list}]).
     {match,["ABC123"]}
@@ -422,6 +449,12 @@ more tokenizer rewriting.
 [Regular Expressions]: #regular-expressions
     "Regular Expressions"
 
+[uncompiled regular expressions]: #uncompiled-regular-expressions
+    "Uncompiled Regular Expressions"
+
+[compiled regular expressions]: #compiled-regular-expressions
+    "Compiled Regular Expressions"
+
 
 Copyright
 =========
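
The patched text says that translating the `Flags` characters into `re:compile_option()`s should be done by the augmented `re` functions, which are not implemented yet. A minimal sketch of what that translation could look like follows, assuming a hypothetical module `re_sigil_sketch` that is neither part of OTP nor of this patch. Only the mappings `i` to `caseless` and `u` to `unicode` follow from the `Today:`/`Sigil:` example in the patch; the letters `m`, `s` and `x` are assumptions borrowed from common regular expression conventions.

```erlang
%% Hypothetical helper module; not part of OTP or of this patch.
-module(re_sigil_sketch).
-export([run/3, flags_to_options/1]).

%% Translate sigil suffix characters into re compile options.
%% $i and $u mirror the [caseless,unicode] example in the EEP text;
%% $m, $s and $x are assumed mappings, for illustration only.
-spec flags_to_options(string()) -> [atom()].
flags_to_options(Flags) ->
    [flag_to_option(Flag) || Flag <- Flags].

flag_to_option($i) -> caseless;
flag_to_option($u) -> unicode;
flag_to_option($m) -> multiline;
flag_to_option($s) -> dotall;
flag_to_option($x) -> extended;
flag_to_option(Other) -> erlang:error({bad_sigil_flag, Other}).

%% Accept the proposed {re,RE,Flags} tuple and delegate to the existing
%% re:run/3, which takes compile options alongside run options when the
%% regular expression is not precompiled.
run(Subject, {re, RE, Flags}, Options) ->
    re:run(Subject, RE, flags_to_options(Flags) ++ Options).
```

Because the `~r` sigil does not exist yet, the tuple has to be written out by hand when trying this: `re_sigil_sketch:run("ABC123", {re, "abc\\d+", "i"}, [{capture,first,list}])` corresponds to the `re:run/3` example in the last hunk above and should return the same `{match,["ABC123"]}`.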