Merge branch 'main' of github.com:propensive/kaleidoscope
propensive committed May 11, 2024
2 parents 3ea0b26 + 44a9816 commit 136da1a
Showing 1 changed file with 62 additions and 27 deletions.
89 changes: 62 additions & 27 deletions readme.md
@@ -6,23 +6,27 @@

__Statically-checked inline matching on regular expressions__

Kaleidoscope is a small library to make pattern matching against strings more pleasant. Regular
expressions can be written directly in patterns, and capturing groups bound directly to variables,
typed according to the group's repetition. Here is an example:
```amok scala
Kaleidoscope is a small library to make pattern matching against strings more
pleasant. Regular expressions can be written directly in patterns, and
capturing groups bound directly to variables, typed according to the group's
repetition. Here is an example:
```scala
case class Email(user: Text, domain: Text)

email match
case r"$user([^@]+)@$domain(.*)" => Email(user, domain)
```

Strings are widely used to carry complex data, when it's wiser to use
structured objects. Kaleidoscope makes it easier to move away from strings.

## Features

- pattern match strings against regular expressions
- regular expressions can be written inline in patterns
- extraction of capturing groups in patterns
- typed extraction (into `List`s or `Option`s) of variable-length capturing groups
- static verification of regular expression syntax
- regular expressions can be written inline in patterns, anywhere a string could match
- direct extraction of capturing groups in patterns
- typed extraction (into `List`s or [Vacuous](https://github.com/propensive/vacuous/) `Optional`s) of variable-length capturing groups
- static checking of regular expression syntax
- simpler "glob" syntax is also provided


@@ -40,13 +44,26 @@ For the overeager, curious and impatient, see [building](#building).

## Getting Started

To use Kaleidoscope, first import its package,
Kaleidoscope is included in the `kaleidoscope` package, and exported to the `soundness` package.

To use Kaleidoscope alone, you can include the import,
```scala
import kaleidoscope.*
```
or to use it with other [Soundness](https://github.com/propensive/soundness/) libraries, include:
```scala
import soundness.*
```

> Note that Kaleidoscope uses the `Text` type from
> [Anticipation](https://github.com/propensive/anticipation) and the `Optional`
> type from [Vacuous](https://github.com/propensive/vacuous/). These offer some
> advantages, but they can be easily converted: `Text#s` converts a `Text` to a
> `String` and `Optional#option` converts an `Optional` value to its equivalent
> `Option`. The necessary imports are shown in the examples.
and you can then use a Kaleidoscope regular expression—a string prefixed with
the letter `r`—anywhere you can use a pattern in Scala. For example,
You can then use a Kaleidoscope regular expression—a string prefixed with
the letter `r`—anywhere you can pattern match against a string in Scala. For example,
```scala
import anticipation.Text

@@ -73,7 +90,7 @@ with the exception that a capturing group (enclosed within `(` and `)`) may be
bound to an identifier by placing it, like an interpolated string substitution,
immediately prior to the capturing group, as `$identifier` or `${identifier}`.

Here is an example:
Here is an example of using a pattern match against filenames:
```scala
enum FileType:
case Image(text: Text)
@@ -84,29 +101,41 @@ def identify(path: Text): FileType = path match
case r"/styles/$styles(.*)" => FileType.Stylesheet(styles)
```

Alternatively, this can be extracted directly in a `val` definition, like so:
Alternatively, as with patterns in general, this can be extracted directly in a
`val` definition.

Here is an example of matching an email address:
```scala
val r"^[a-z0-9._%+-]+@$domain([a-z0-9.-]+\.$tld([a-z]{2,6}))$$" =
"[email protected]": @unchecked
```
In the REPL, this would bind the following values:

The `@unchecked` annotation ascribed to the result is standard Scala, and
acknowledges to the compiler that the match is _partial_ and may fail at
runtime.

If you try this example in the Scala REPL, it will bind the following values:
```
> domain: Text = t"example.com"
> tld: Text = t"com"
```

In addition, the syntax of the regular expressionwill be checked at compile-time, and any
issues will be reported then.
In addition, the syntax of the regular expression will be checked at
compile-time, and any issues will be reported then.
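
For comparison, here is a minimal sketch of the same extraction using only the Scala standard library's `scala.util.matching.Regex`, not Kaleidoscope. The names `emailPattern` and `domainAndTld` and the address `user@example.com` are hypothetical, chosen for illustration. In this version the groups are bound by position rather than by name, and the pattern is compiled only when the program runs, so a syntax error in the regular expression would surface at runtime rather than at compile-time.

```scala
import scala.util.matching.Regex

// Standard-library analogue (not Kaleidoscope's API): capturing groups are
// positional, and the pattern string is compiled at runtime by `.r`.
val emailPattern: Regex =
  """^[a-z0-9._%+-]+@([a-z0-9.-]+\.([a-z]{2,6}))$""".r

// Binds the two capturing groups in a `val` pattern. The `@unchecked`
// annotation acknowledges that the match is partial: an input which does not
// match throws a `MatchError`.
def domainAndTld(address: String): (String, String) =
  val emailPattern(domain, tld) = address: @unchecked
  (domain, tld)

@main def showEmailParts(): Unit =
  // "user@example.com" is a hypothetical address used only for illustration.
  val (domain, tld) = domainAndTld("user@example.com")
  println(s"domain = $domain, tld = $tld")
```

The outer group captures the whole domain and the nested group captures the top-level domain, mirroring the nesting of `$domain(...)` and `$tld(...)` in the pattern above.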

### Repeated and optional capture groups

A normal, unitary capturing group will extract into a `Text` value. But if a capturing group has
a repetition suffix, such as `*` or `+`, then the extracted type will be a `List[Text]`. This also
applies to repetition ranges, such as `{3}`, `{2,}` or `{1,9}`. Note that `{1}` will still extract
a `Text` value.
A normal, _unitary_ capturing group, like `domain` and `tld` above, will
extract into `Text` values. But if a capturing group has a repetition suffix,
such as `*` or `+`, then the extracted type will be a `List[Text]`. This also
applies to repetition ranges, such as `{3}`, `{2,}` or `{1,9}`.

A capture group may be marked as optional, meaning it can appear either zero or one times. This
will extract a value with the type `Option[Text]`.
Note that `{1}` will still extract a `Text` value. The type is determined
statically from the pattern, and not dynamically from the runtime scrutinee.

A capture group may be marked as optional, meaning it can appear either zero or
one times. This will extract a value with the type `Optional[Text]`; that is,
if it is present it will be a `Text` value, and if not, it will be `Unset`.
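
To illustrate why static typing of groups is useful, here is a rough sketch of how the standard library's `scala.util.matching.Regex` behaves, for contrast with Kaleidoscope. The patterns and the names `fileName`, `extension`, `letters` and `lastLetter` are hypothetical. With the standard library, every group is bound as a plain `String`: an optional group that did not take part in the match is `null`, and a repeated group retains only its last repetition.

```scala
import scala.util.matching.Regex

// Hypothetical pattern: a file name with an optional extension.
val fileName: Regex = """([^.]+)(?:\.(.+))?""".r

// The optional group is bound to `null` when it did not participate in the
// match, so it has to be wrapped by hand to get an `Option`.
def extension(name: String): Option[String] = name match
  case fileName(_, ext) => Option(ext)
  case _                => None

// Hypothetical pattern: comma-terminated single characters, repeated.
val letters: Regex = """(?:(\w),)+""".r

// A repeated capturing group keeps only its final repetition.
def lastLetter(input: String): Option[String] = input match
  case letters(last) => Option(last)
  case _             => None

@main def showGroups(): Unit =
  println(extension("readme.md"))
  println(extension("readme"))
  println(lastLetter("a,b,c,"))
```

Neither the `null` check nor the loss of earlier repetitions is visible in the types here, which is the gap that extracting into `Optional[Text]` and `List[Text]` is described as closing.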

For example, see how `init` is extracted as a `List[Text]`, below:
```scala
@@ -122,11 +151,17 @@ def parseList(): List[Text] = "parsley, sage, rosemary, and thyme" match

Note that inside an extractor pattern string, whether it is single- (`r"..."`)
or triple-quoted (`r"""..."""`), special characters, notably `\`, do not need
to be escaped, with the exception of `$` which should be written as `$$`. It is
still necessary, however, to follow the regular expression escaping rules, for
example, an extractor matching a single opening parenthesis would be written as
`r"\("` or `r"""\("""`.
to be escaped, with the exception of `$` which should be written as `$$`.

It is still necessary, however, to follow the regular expression escaping
rules. For example, an extractor matching a single opening parenthesis would be
written as `r"\("` or `r"""\("""`.

## Globs

Globs offer a simplified and limited form of regular expression. You can use
these in exactly the same way as a standard regular expression, using the
`g"..."` interpolator instead.



@@ -146,7 +181,7 @@ as long as caution is taken to avoid a mismatch between the project's stability
level and the required stability and maintainability of your own project.

Kaleidoscope is designed to be _small_. Its entire source code currently consists
of 532 lines of code.
of 530 lines of code.

## Building
