diff --git a/readme.md b/readme.md
index 83a9cd8..1dfc015 100644
--- a/readme.md
+++ b/readme.md
@@ -6,23 +6,27 @@
 
 __Statically-checked inline matching on regular expressions__
 
-Kaleidoscope is a small library to make pattern matching against strings more pleasant. Regular
-expressions can be written directly in patterns, and capturing groups bound directly to variables,
-typed according to the group's repetition. Here is an example:
-```amok scala
+Kaleidoscope is a small library to make pattern matching against strings more
+pleasant. Regular expressions can be written directly in patterns, and
+capturing groups bound directly to variables, typed according to the group's
+repetition. Here is an example:
+```scala
 case class Email(user: Text, domain: Text)
 
 email match
   case r"$user([^@]+)@$domain(.*)" => Email(name, domain)
 ```
 
+Strings are widely used to carry complex data, when it's wiser to use
+structured objects. Kaleidoscope makes it easier to move away from strings.
+
 ## Features
 
 - pattern match strings against regular expressions
-- regular expressions can be written inline in patterns
-- extraction of capturing groups in patterns
-- typed extraction (into `List`s or `Option`s) of variable-length capturing groups
-- static verification of regular expression syntax
+- regular expressions can be written inline in patterns, anywhere a string could match
+- direct extraction of capturing groups in patterns
+- typed extraction (into `List`s or [Vacuous](https://github.com/propensive/vacuous/) `Optional`s) of variable-length capturing groups
+- static checking of regular expression syntax
 - simpler "glob" syntax is also provided
 
 
@@ -40,13 +44,26 @@ For the overeager, curious and impatient, see [building](#building).
 
 ## Getting Started
 
-To use Kaleidoscope, first import its package,
+Kaleidoscope is included in the `kaleidoscope` package, and exported to the `soundness` package.
+
+To use Kaleidoscope alone, you can include the import,
 ```scala
 import kaleidoscope.*
 ```
+or to use it with other [Soundness](https://github.com/propensive/soundness/) libraries, include:
+```scala
+import soundness.*
+```
+
+> Note that Kaleidoscope uses the `Text` type from
+> [Anticipation](https://github.com/propensive/anticipation) and the `Optional`
+> type from [Vacuous](https://github.com/propensive/vacuous/). These offer some
+> advantages, but they can be easily converted: `Text#s` converts a `Text` to a
+> `String` and `Optional#option` converts an `Optional` value to its equivalent
+> `Option`. The necessary imports are shown in the examples.
 
-and you can then use a Kaleidoscope regular expression—a string prefixed with
-the letter `r`—anywhere you can use a pattern in Scala. For example,
+You can then use a Kaleidoscope regular expression—a string prefixed with
+the letter `r`—anywhere you can pattern match against a string in Scala. For example,
 ```scala
 import anticipation.Text
 
@@ -73,7 +90,7 @@ with the exception that a capturing group (enclosed within `(` and `)`) may be
 bound to an identifier by placing it, like an interpolated string substitution,
 immediately prior to the capturing group, as `$identifier` or `${identifier}`.
 
-Here is an example:
+Here is an example of using a pattern match against filenames:
 ```scala
 enum FileType:
   case Image(text: Text)
@@ -84,29 +101,41 @@ def identify(path: Text): FileType = path match
   case r"/styles/$styles(.*)" => FileType.Stylesheet(styles)
 ```
 
-Alternatively, this can be extracted directly in a `val` definition, like so:
+Alternatively, as with patterns in general, this can be extracted directly in a
+`val` definition.
+
+Here is an example of matching an email address:
 ```scala
 val r"^[a-z0-9._%+-]+@$domain([a-z0-9.-]+\.$tld([a-z]{2,6}))$$" = "test@example.com": @unchecked
 ```
-In the REPL, this would bind the following values:
+
+The `@unchecked` annotation ascribed to the result is standard Scala, and
+acknowledges to the compiler that the match is _partial_ and may fail at
+runtime.
+
+If you try this example in the Scala REPL, it would bind the following values:
 ```
 > domain: Text = t"example.com"
 > tld: Text = t"com"
 ```
 
-In addition, the syntax of the regular expressionwill be checked at compile-time, and any
-issues will be reported then.
+In addition, the syntax of the regular expression will be checked at
+compile-time, and any issues will be reported then.
 
 ### Repeated and optional capture groups
 
-A normal, unitary capturing group will extract into a `Text` value. But if a capturing group has
-a repetition suffix, such as `*` or `+`, then the extracted type will be a `List[Text]`. This also
-applies to repetition ranges, such as `{3}`, `{2,}` or `{1,9}`. Note that `{1}` will still extract
-a `Text` value.
+A normal, _unitary_ capturing group, like `domain` and `tld` above, will
+extract into `Text` values. But if a capturing group has a repetition suffix,
+such as `*` or `+`, then the extracted type will be a `List[Text]`. This also
+applies to repetition ranges, such as `{3}`, `{2,}` or `{1,9}`.
 
-A capture group may be marked as optional, meaning it can appear either zero or one times. This
-will extract a value with the type `Option[Text]`.
+Note that `{1}` will still extract a `Text` value. The type is determined
+statically from the pattern, and not dynamically from the runtime scrutinee.
+
+A capture group may be marked as optional, meaning it can appear either zero or
+one times. This will extract a value with the type `Optional[Text]`; that is,
+if it is present it will be a `Text` value, and if not, it will be `Unset`.
 
 For example, see how `init` is extracted as a `List[Text]`, below:
 ```scala
@@ -122,11 +151,17 @@ def parseList(): List[Text] = "parsley, sage, rosemary, and thyme" match
 
 Note that inside an extractor pattern string, whether it is single- (`r"..."`) or
 triple-quoted (`r"""..."""`), special characters, notably `\`, do not need
-to be escaped, with the exception of `$` which should be written as `$$`. It is
-still necessary, however, to follow the regular expression escaping rules, for
-example, an extractor matching a single opening parenthesis would be written as
-`r"\("` or `r"""\("""`.
+to be escaped, with the exception of `$` which should be written as `$$`.
+
+It is still necessary, however, to follow the regular expression escaping
+rules, for example, an extractor matching a single opening parenthesis would be
+written as `r"\("` or `r"""\("""`.
+
+## Globs
+Globs offer a simplified and limited form of regular expression. You can use
+these in exactly the same way as a standard regular expression, using the
+`g"..."` interpolator instead.
 
 
 
@@ -146,7 +181,7 @@ as long as caution is taken to avoid a mismatch between the project's stability
 level and the required stability and maintainability of your own project.
 
 Kaleidoscope is designed to be _small_. Its entire source code currently consists
-of 532 lines of code.
+of 530 lines of code.
 
 ## Building