Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'main' of github.com:propensive/kaleidoscope
Showing 1 changed file with 62 additions and 27 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,23 +6,27 @@ | |
|
||
__Statically-checked inline matching on regular expressions__ | ||
|
||
Kaleidoscope is a small library to make pattern matching against strings more pleasant. Regular | ||
expressions can be written directly in patterns, and capturing groups bound directly to variables, | ||
typed according to the group's repetition. Here is an example: | ||
```amok scala | ||
Kaleidoscope is a small library to make pattern matching against strings more | ||
pleasant. Regular expressions can be written directly in patterns, and | ||
capturing groups bound directly to variables, typed according to the group's | ||
repetition. Here is an example: | ||
```scala | ||
case class Email(user: Text, domain: Text) | ||
|
||
email match | ||
  case r"$user([^@]+)@$domain(.*)" => Email(user, domain) | ||
``` | ||
|
||
Strings are widely used to carry complex data, when it's wiser to use | ||
structured objects. Kaleidoscope makes it easier to move away from strings. | ||
|
||
## Features | ||
|
||
- pattern match strings against regular expressions | ||
- regular expressions can be written inline in patterns | ||
- extraction of capturing groups in patterns | ||
- typed extraction (into `List`s or `Option`s) of variable-length capturing groups | ||
- static verification of regular expression syntax | ||
- regular expressions can be written inline in patterns, anywhere a string could match | ||
- direct extraction of capturing groups in patterns | ||
- typed extraction (into `List`s or [Vacuous](https://github.com/propensive/vacuous/) `Optional`s) of variable-length capturing groups | ||
- static checking of regular expression syntax | ||
- simpler "glob" syntax is also provided | ||
|
||
|
||
|
@@ -40,13 +44,26 @@ For the overeager, curious and impatient, see [building](#building). | |
|
||
## Getting Started | ||
|
||
To use Kaleidoscope, first import its package, | ||
Kaleidoscope is included in the `kaleidoscope` package, and exported to the `soundness` package. | ||
|
||
To use Kaleidoscope alone, you can include the import, | ||
```scala | ||
import kaleidoscope.* | ||
``` | ||
or to use it with other [Soundness](https://github.com/propensive/soundness/) libraries, include: | ||
```scala | ||
import soundness.* | ||
``` | ||
|
||
> Note that Kaleidoscope uses the `Text` type from | ||
> [Anticipation](https://github.com/propensive/anticipation) and the `Optional` | ||
> type from [Vacuous](https://github.com/propensive/vacuous/). These offer some | ||
> advantages, but they can be easily converted: `Text#s` converts a `Text` to a | ||
> `String` and `Optional#option` converts an `Optional` value to its equivalent | ||
> `Option`. The necessary imports are shown in the examples. | ||
and you can then use a Kaleidoscope regular expression—a string prefixed with | ||
the letter `r`—anywhere you can use a pattern in Scala. For example, | ||
You can then use a Kaleidoscope regular expression—a string prefixed with | ||
the letter `r`—anywhere you can pattern match against a string in Scala. For example, | ||
```scala | ||
import anticipation.Text | ||
|
||
|
@@ -73,7 +90,7 @@ with the exception that a capturing group (enclosed within `(` and `)`) may be | |
bound to an identifier by placing it, like an interpolated string substitution, | ||
immediately prior to the capturing group, as `$identifier` or `${identifier}`. | ||
|
||
Here is an example: | ||
Here is an example of using a pattern match against filenames: | ||
```scala | ||
enum FileType: | ||
case Image(text: Text) | ||
|
@@ -84,29 +101,41 @@ def identify(path: Text): FileType = path match | |
case r"/styles/$styles(.*)" => FileType.Stylesheet(styles) | ||
``` | ||
|
||
Alternatively, this can be extracted directly in a `val` definition, like so: | ||
Alternatively, as with patterns in general, this can be extracted directly in a | ||
`val` definition. | ||
|
||
Here is an example of matching an email address: | ||
```scala | ||
val r"^[a-z0-9._%+-]+@$domain([a-z0-9.-]+\.$tld([a-z]{2,6}))$$" = | ||
"[email protected]": @unchecked | ||
``` | ||
In the REPL, this would bind the following values: | ||
|
||
The `@unchecked` annotation ascribed to the result is standard Scala, and | ||
acknowledges to the compiler that the match is _partial_ and may fail at | ||
runtime. | ||
|
||
If you try this example in the Scala REPL, it would bind the following values: | ||
``` | ||
> domain: Text = t"example.com" | ||
> tld: Text = t"com" | ||
``` | ||
|
||
In addition, the syntax of the regular expressionwill be checked at compile-time, and any | ||
issues will be reported then. | ||
In addition, the syntax of the regular expression will be checked at | ||
compile-time, and any issues will be reported then. | ||
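
For comparison, here is a minimal sketch of the same kind of `val` extraction using only `scala.util.matching.Regex` from the Scala standard library. It is not Kaleidoscope's API: the names `emailPattern` and `domainAndTld` and the sample address are hypothetical, the pattern is declared apart from the place it is used, the groups are bound by position, and the pattern is checked only when it is compiled at runtime.

```scala
import scala.util.matching.Regex

// Standard-library comparison only; this is not Kaleidoscope's API.
// The pattern is declared separately, and groups are bound by position.
val emailPattern: Regex = """[a-z0-9._%+-]+@([a-z0-9.-]+\.([a-z]{2,6}))""".r

def domainAndTld(address: String): (String, String) =
  // Partial match: throws a MatchError if `address` does not match,
  // which is what `@unchecked` acknowledges.
  val emailPattern(domain, tld) = address: @unchecked
  (domain, tld)
```

Calling `domainAndTld("user@example.com")` would bind the outer group to the domain and the nested group to the top-level domain, as in the REPL output above, but as `String` values and not `Text`.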
|
||
### Repeated and optional capture groups | ||
|
||
A normal, unitary capturing group will extract into a `Text` value. But if a capturing group has | ||
a repetition suffix, such as `*` or `+`, then the extracted type will be a `List[Text]`. This also | ||
applies to repetition ranges, such as `{3}`, `{2,}` or `{1,9}`. Note that `{1}` will still extract | ||
a `Text` value. | ||
A normal, _unitary_ capturing group, like `domain` and `tld` above, will | ||
extract into `Text` values. But if a capturing group has a repetition suffix, | ||
such as `*` or `+`, then the extracted type will be a `List[Text]`. This also | ||
applies to repetition ranges, such as `{3}`, `{2,}` or `{1,9}`. | ||
|
||
A capture group may be marked as optional, meaning it can appear either zero or one times. This | ||
will extract a value with the type `Option[Text]`. | ||
Note that `{1}` will still extract a `Text` value. The type is determined | ||
statically from the pattern, and not dynamically from the runtime scrutinee. | ||
|
||
A capture group may be marked as optional, meaning it can appear either zero or | ||
one times. This will extract a value with the type `Optional[Text]`; that is, | ||
if it is present, it will be a `Text` value, and if not, it will be `Unset`. | ||
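
To illustrate why statically typed extraction helps, the following sketch shows how the standard library alone (not Kaleidoscope) treats the same two cases. An optional group that does not take part in the match is bound to `null`, and a repeated group retains only its final iteration. The patterns and the names `version`, `parseVersion`, `letters` and `lastLetter` are hypothetical.

```scala
import scala.util.matching.Regex

// Hypothetical version-string pattern: the minor component is optional.
val version: Regex = """(\d+)(?:\.(\d+))?""".r

def parseVersion(input: String): Option[(String, Option[String])] =
  input match
    // An unmatched optional group is bound to null, so wrap it in Option.
    case version(major, minor) => Some((major, Option(minor)))
    case _                     => None

// A repeated group keeps only its final iteration in java.util.regex.
val letters: Regex = """([a-c])+""".r

def lastLetter(input: String): Option[String] =
  input match
    case letters(last) => Some(last)
    case _             => None
```

With the standard library, the `null` check and the loss of earlier repetitions are the caller's problem. The `Optional[Text]` and `List[Text]` types described above make both cases explicit in the type of the bound variable.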
|
||
For example, see how `init` is extracted as a `List[Text]`, below: | ||
```scala | ||
|
@@ -122,11 +151,17 @@ def parseList(): List[Text] = "parsley, sage, rosemary, and thyme" match | |
|
||
Note that inside an extractor pattern string, whether it is single- (`r"..."`) | ||
or triple-quoted (`r"""..."""`), special characters, notably `\`, do not need | ||
to be escaped, with the exception of `$` which should be written as `$$`. It is | ||
still necessary, however, to follow the regular expression escaping rules, for | ||
example, an extractor matching a single opening parenthesis would be written as | ||
`r"\("` or `r"""\("""`. | ||
to be escaped, with the exception of `$` which should be written as `$$`. | ||
|
||
It is still necessary, however, to follow the regular expression escaping | ||
rules. For example, an extractor matching a single opening parenthesis would be | ||
written as `r"\("` or `r"""\("""`. | ||
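
As a point of comparison, this sketch shows the two layers of escaping in ordinary Scala string literals, using only the standard library. It is not Kaleidoscope code, and the names `singleQuoted`, `tripleQuoted` and `isOpeningParenthesis` are hypothetical. In a plain single-quoted literal the backslash must itself be escaped, whereas the `r"..."` extractor described above needs only the regular-expression escape in both forms.

```scala
import scala.util.matching.Regex

// Ordinary string literal: the backslash is escaped for Scala as well as for the regex.
val singleQuoted: Regex = "\\(".r

// Triple-quoted literal: only the regex-level escape remains.
val tripleQuoted: Regex = """\(""".r

def isOpeningParenthesis(input: String): Boolean =
  singleQuoted.matches(input) && tripleQuoted.matches(input)
```

Both values compile to the same regular expression, which matches exactly one opening parenthesis.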
|
||
## Globs | ||
|
||
Globs offer a simplified and limited form of regular expression. You can use | ||
these in exactly the same way as a standard regular expression, using the | ||
`g"..."` interpolator instead. | ||
|
||
|
||
|
||
|
@@ -146,7 +181,7 @@ as long as caution is taken to avoid a mismatch between the project's stability | |
level and the required stability and maintainability of your own project. | ||
|
||
Kaleidoscope is designed to be _small_. Its entire source code currently consists | ||
of 532 lines of code. | ||
of 530 lines of code. | ||
|
||
## Building | ||
|
||
|