Home
The core of RegexToolbox.kt is the `regex` builder function and the `RegexBuilder` class. They make it easy to build complicated regular expressions in a way that's far more readable to Kotlin developers. `regex` is a super-Kotliny type-safe builder that leads to fluent, readable syntax.
Let's see an example. Say we want to use a regular expression to match people's names in a text file. We'll define a person's name as two words next to each other, both beginning with a capital letter. Here's how we might do it without using RegexToolbox.kt:
val regex = Regex("\\b[A-Z][a-z]+\\s+[A-Z][a-z]+\\b")
Or using a raw string:
val regex = Regex("""\b[A-Z][a-z]+\s+[A-Z][a-z]+\b""")
That's a pretty simple regular expression, but unless you're familiar with the syntax it can still look confusing and be difficult to read, understand and maintain. Here it is again with `regex`:
val regex = regex {
    wordBoundary()
    uppercaseLetter()
    lowercaseLetter(OneOrMore)
    whitespace(OneOrMore)
    uppercaseLetter()
    lowercaseLetter(OneOrMore)
    wordBoundary()
}
Some things you'll notice straight off the bat:
- There's no regex syntax on display here at all - just simple, clearly-named building blocks such as `lowercaseLetter()`.
- `regex` returns a standard `kotlin.text.Regex` object, so you can treat it just the same as an object built with the regular syntax. (There's also `pattern` for legacy cases where you need to build a `java.util.regex.Pattern`.)
- Matching an element conditionally or repeatedly is done by passing in a `RegexQuantifier`: more about those in Quantifiers.
- The code got longer. That's unavoidable, but a worthwhile trade-off for cleaner, more maintainable code. We're not trying to win a round of code golf here. If you are, then bare regex syntax is definitely the way to go. 😉
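Because the result is an ordinary `kotlin.text.Regex`, the usual standard-library calls work on it. The sketch below uses the hand-written pattern from earlier as a stand-in for the builder's output, so it runs without RegexToolbox.kt on the classpath; the pattern the builder actually generates may differ in detail. The names `nameRegex`, `findNames` and `redactNames` are made up for illustration.

```kotlin
// Stand-in for the Regex produced by the builder example: the same idea
// written by hand, so this sketch needs only the Kotlin standard library.
val nameRegex = Regex("""\b[A-Z][a-z]+\s+[A-Z][a-z]+\b""")

// Collect every name-like match in the text.
fun findNames(text: String): List<String> =
    nameRegex.findAll(text).map { it.value }.toList()

// Replace every name-like match with a placeholder.
fun redactNames(text: String): String =
    nameRegex.replace(text, "[name]")

fun main() {
    val text = "we saw Ada Lovelace and Charles Babbage in town."
    println(findNames(text))   // [Ada Lovelace, Charles Babbage]
    println(redactNames(text)) // we saw [name] and [name] in town.

    // On the JVM, toPattern() bridges to java.util.regex.Pattern.
    println(nameRegex.toPattern().pattern())
}
```

Note that the definition of a name is deliberately naive: any two adjacent capitalised words match, so a capitalised word at the start of a sentence followed by a name would be picked up too.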
`regex` takes an optional, variable array of `RegexOptions` as a parameter. Supported values are:
| Value | Description |
|---|---|
| `IGNORE_CASE` | Makes the regex case-insensitive. Note that this causes element methods like `uppercaseLetter()` to lose their case sensitivity. |
| `MULTILINE` | Causes `startOfString()` and `endOfString()` to also match line breaks within a multi-line string. |
You use them like this:
val regex = regex(IGNORE_CASE, MULTILINE) {
    letter()
    digit()
}
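To get a feel for what the two options change, here is a rough sketch using the standard library's `RegexOption.IGNORE_CASE` and `RegexOption.MULTILINE`. It assumes the RegexToolbox.kt options behave like the standard ones of the same name, which this page does not spell out. The hand-written patterns `[A-Z][0-9]` and `^[a-z]+$` stand in for builder calls, and every identifier here is hypothetical.

```kotlin
// Assumption: the library's IGNORE_CASE and MULTILINE behave like the
// standard kotlin.text.RegexOption values with the same names.

// "[A-Z]" plays the role of uppercaseLetter() here.
val caseSensitive = Regex("[A-Z][0-9]")
val caseInsensitive = Regex("[A-Z][0-9]", RegexOption.IGNORE_CASE)

// "^" and "$" play the roles of startOfString() and endOfString().
val wholeStringOnly = Regex("^[a-z]+\$")
val perLine = Regex("^[a-z]+\$", RegexOption.MULTILINE)

// Return every line that consists only of lowercase letters.
fun linesOfLetters(text: String): List<String> =
    perLine.findAll(text).map { it.value }.toList()

fun main() {
    println(caseSensitive.containsMatchIn("a1"))           // false
    println(caseInsensitive.containsMatchIn("a1"))         // true
    println(wholeStringOnly.containsMatchIn("one\ntwo"))   // false
    println(linesOfLetters("one\ntwo"))                    // [one, two]
}
```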
RegexToolbox supports the most commonly used features of regular expressions. Some advanced features that are rarely used are omitted for the sake of simplicity, but they may be added in future if there's enough demand. (Or of course you can fork this repo and add all you like. 😃)
The current features of RegexToolbox are described in the following pages:
- Elements. These are the building blocks that we make regexes from: things like letters, numbers, whitespace and so on.
- Quantifiers. These are used to match multiple occurrences of an element in a regex.
- Groups. These are used either a) to bunch together a set of elements so you can apply quantifiers to the whole lot, or b) to "remember" part of a regex so you can extract it from the match later. Or both of the above.
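The two roles of groups can be illustrated with plain regex syntax and the standard library alone. This is not the RegexToolbox.kt API (see the Groups page for that), only a sketch of the underlying idea; `versionRegex` and `parseVersion` are made-up names.

```kotlin
// Plain-regex illustration of the two roles of groups.
// (?:\.\d+)+  bunches "dot plus digits" together so that the + quantifier
//             applies to the whole lot, without remembering anything.
// (\d+) and the outer (...) remember what they matched for later extraction.
val versionRegex = Regex("""v(\d+)((?:\.\d+)+)""")

// Returns the major version and the remaining dotted part, or null if the
// text contains no version string.
fun parseVersion(text: String): Pair<String, String>? {
    val match = versionRegex.find(text) ?: return null
    val (major, rest) = match.destructured
    return major to rest
}

fun main() {
    println(parseVersion("released v2.10.3 today")) // (2, .10.3)
    println(parseVersion("no version here"))        // null
}
```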
RegexToolbox: Now you can be a hero without knowing regular expressions.