Mark Whitaker edited this page Sep 3, 2019 · 4 revisions

Overview

Groups are used in a regex for one (or both) of two purposes:

  1. To group a number of elements together so a quantifier can be applied to the whole group.
  2. To "remember" part of the text matched by the regex so we can extract it later using the groups and groupValues properties of the kotlin.text.MatchResult class.

A simple group to include a single letter followed by a single digit is defined like this:

val regex = regex {
    group {
        letter()
        digit()
    }
}

Quantifiers for groups

The group function takes an optional RegexQuantifier which applies to the whole group. As an example, here's how we'd define a regex to match the pattern {letter}{letter}{digit} exactly 4 times:

val regex = regex {
    group(Exactly(4)) {
        letter(Exactly(2))
        digit()
    }
}
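
To make the effect of the quantifier concrete, here is a minimal sketch using only the Kotlin standard library. The hand-written pattern is an assumption about what the builder produces (in particular, that letter() corresponds to [a-zA-Z]); it is not taken from the library's output. It also shows that a quantified capturing group only remembers its last repetition.

```kotlin
// Hand-written pattern ASSUMED to be equivalent to
// group(Exactly(4)) { letter(Exactly(2)); digit() }.
val fourBlocks = Regex("""([a-zA-Z]{2}\d){4}""")

// True only when the whole input is exactly four letter-letter-digit blocks.
fun isFourBlocks(input: String): Boolean = fourBlocks.matches(input)

// A repeated capturing group keeps only the text of its final repetition.
fun lastBlock(input: String): String? =
    fourBlocks.find(input)?.groupValues?.get(1)

fun main() {
    println(isFourBlocks("ab1cd2ef3gh4")) // true
    println(isFourBlocks("ab1cd2ef3"))    // false (only three blocks)
    println(lastBlock("ab1cd2ef3gh4"))    // gh4
}
```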

Remembering parts of the match

Say we want to match a person's name (two consecutive words, each beginning with a capital letter) and then greet them by their first name. We could build a regex like this:

val regex = regex {
    wordBoundary()
    group {
        uppercaseLetter()
        lowercaseLetter(OneOrMore)
    }
    whitespace()
    uppercaseLetter()
    lowercaseLetter(OneOrMore)
    wordBoundary()
}

We can then extract the first name from a successful match like this:

// find() returns null when there is no match, so every step uses a safe call
val firstName = regex.find(inputString)?.groupValues?.get(1)

Note that captured groups are numbered from 1, not 0: groupValues[0] always holds the whole matched string, so the first group we defined is at groupValues[1].
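
The sketch below shows the same extraction end to end with the standard library's Regex class. The raw pattern and the greet function are illustrative assumptions standing in for the builder output above, not part of the library.

```kotlin
// Hand-written pattern ASSUMED to be equivalent to the builder above.
val nameRegex = Regex("""\b([A-Z][a-z]+)\s[A-Z][a-z]+\b""")

// Hypothetical helper: returns a greeting, or null when no name is found.
fun greet(input: String): String? {
    val match = nameRegex.find(input) ?: return null
    // groupValues[0] is the whole match; groupValues[1] is the first group.
    return "Hello, ${match.groupValues[1]}!"
}

fun main() {
    println(greet("please welcome Jane Smith to the stage")) // Hello, Jane!
    println(greet("no names here"))                          // null
}
```

Returning early on a null match keeps the indexing on a non-null MatchResult, which avoids chaining safe calls.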

Named groups

If you prefer to avoid numerical indices altogether, you can define named groups, which are then looked up by name. Using named groups, our code would look like this:

val regex = regex {
    wordBoundary()
    namedGroup("firstName") {
        uppercaseLetter()
        lowercaseLetter(OneOrMore)
    }
    whitespace()
    uppercaseLetter()
    lowercaseLetter(OneOrMore)
    wordBoundary()
}

val firstName = regex.find(inputString)?.groups?.get("firstName")?.value
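
For comparison, a named group written directly against the standard library might look like the sketch below. It assumes a recent Kotlin version in which groups can be indexed by name; on older versions the collection may need a cast to MatchNamedGroupCollection. The raw pattern and the firstNameOf helper are illustrative assumptions.

```kotlin
// Hand-written pattern ASSUMED to be equivalent to the namedGroup example.
val namedRegex = Regex("""\b(?<firstName>[A-Z][a-z]+)\s[A-Z][a-z]+\b""")

// Hypothetical helper: looks the group up by name instead of by index.
// Indexing groups by name assumes a recent Kotlin standard library.
fun firstNameOf(input: String): String? =
    namedRegex.find(input)?.groups?.get("firstName")?.value

fun main() {
    println(firstNameOf("please ask Jane Smith")) // Jane
    println(firstNameOf("nobody here"))           // null
}
```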

Nesting groups

As with raw regexes, the regex builder lets you nest groups to arbitrary depth. Capturing groups are numbered in the order in which they open, so groupValues[1] refers to the first group to open, groupValues[2] to the second, and so on. For example:

val regex = regex {
    wordBoundary()
    // Group 1, to capture a name
    group {
        // Group 2, to capture the first letter of that name
        group {
            uppercaseLetter()
        }
        lowercaseLetter(OneOrMore)
    }
    wordBoundary()
}

val matchResult = regex.find("sorry Dave, I can't do that")
val name = matchResult?.groupValues?.get(1)    // "Dave"
val initial = matchResult?.groupValues?.get(2) // "D"
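
The numbering rule can be checked against a raw pattern with the standard library. This is a sketch under the assumption that the builder above corresponds to the pattern shown; the captures helper is hypothetical.

```kotlin
// Hand-written pattern ASSUMED to match the nested builder example:
// group 1 opens first (the name), group 2 opens inside it (the initial).
val nested = Regex("""\b(([A-Z])[a-z]+)\b""")

// Hypothetical helper: all group values of the first match, or an empty list.
fun captures(input: String): List<String> =
    nested.find(input)?.groupValues.orEmpty()

fun main() {
    // Index 0 is the whole match, 1 the outer group, 2 the inner group.
    println(captures("sorry Dave, I can't do that")) // [Dave, Dave, D]
}
```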

Grouping functions

| Function | Description | Raw regex equivalent |
| --- | --- | --- |
| group | Create a group which can be extracted later by calling MatchResult.groupValues[index: Int]. | (...) |
| namedGroup(name: String) | Create a group which can be extracted later by calling MatchResult.groups[name: String]?.value. | (?<name>...) |
| nonCapturingGroup | Create a group which cannot be extracted later with MatchResult.groups or MatchResult.groupValues. This is useful if you have more than one group in a regex and you don't want a group that exists purely for a quantifier to disrupt the indices of your capturing groups. | (?:...) |
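
The effect of a non-capturing group on indices is easiest to see side by side. The sketch below uses raw patterns with the standard library; the patterns and the sample input are made up for illustration and are not taken from the library.

```kotlin
// Same pattern twice: the quantified group is capturing in one version
// and non-capturing in the other. Both patterns are illustrative only.
val capturing = Regex("""([A-Z]\d)+-(\d+)""")
val nonCapturing = Regex("""(?:[A-Z]\d)+-(\d+)""")

// Hypothetical helper: group values of the first match, or an empty list.
fun valuesOf(regex: Regex, input: String): List<String> =
    regex.find(input)?.groupValues.orEmpty()

fun main() {
    // The number we care about sits at index 2 here...
    println(valuesOf(capturing, "A1B2-42"))    // [A1B2-42, B2, 42]
    // ...and at index 1 once the quantifier-only group stops capturing.
    println(valuesOf(nonCapturing, "A1B2-42")) // [A1B2-42, 42]
}
```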
