Elements

Mark Whitaker edited this page Sep 17, 2019 · 7 revisions

Overview

Elements are the building blocks that make up a regex. To build a regex, you call element methods inside a `regex` block:

```kotlin
val regex = regex {
    // call element methods here
}
```
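The block above relies on a Kotlin "lambda with receiver": the code inside the braces runs against a builder object, so element methods can be called unqualified. The sketch below is a hypothetical, heavily reduced builder (`MiniRegexBuilder` and `miniRegex` are illustrative names, not this library's classes). It assumes each element simply appends its raw regex equivalent, as listed in the tables further down.

```kotlin
// Hypothetical sketch: NOT the library's implementation. It only shows how a
// `regex { ... }` style builder can work using a Kotlin lambda with receiver.
class MiniRegexBuilder {
    private val pattern = StringBuilder()

    // Each element method appends its raw regex equivalent to the pattern.
    fun startOfString() { pattern.append('^') }
    fun endOfString() { pattern.append('$') }
    fun letter() { pattern.append("\\p{L}") }
    fun digit() { pattern.append("[0-9]") }

    // Escape characters that have a special meaning in regex syntax.
    fun text(text: String) {
        for (c in text) {
            if (c in "\\^\$.|?*+()[]{}") pattern.append('\\')
            pattern.append(c)
        }
    }

    fun build(): Regex = Regex(pattern.toString())
}

// The lambda runs with a fresh builder as its receiver, so element methods
// can be called without a qualifier, just like in the snippet above.
fun miniRegex(block: MiniRegexBuilder.() -> Unit): Regex =
    MiniRegexBuilder().apply(block).build()

fun main() {
    val regex = miniRegex {
        startOfString()
        letter()
        digit()
        text(":)")
        endOfString()
    }
    println(regex.pattern)           // ^\p{L}[0-9]:\)$
    println(regex.matches("a1:)"))   // true
    println(regex.matches("11:)"))   // false: '1' is not a letter
}
```

The builder only accumulates a pattern string; all matching is delegated to Kotlin's standard `Regex`.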

With the exception of the anchor methods, all of the methods below take an optional `RegexQuantifier` parameter that defines how many instances of the element should be matched. Without a quantifier, each method matches the element exactly once. See the Quantifiers page for more about quantifiers.
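The "exactly once unless quantified" rule can be illustrated in plain Kotlin. This is a sketch under stated assumptions: `Quantifier` and `quantify` are hypothetical names, not the library's `RegexQuantifier` API, and each element is assumed to be a raw regex fragment. It also shows why a quantifier has to cover the whole element rather than its last character.

```kotlin
// Hypothetical sketch: NOT the library's RegexQuantifier API.
sealed class Quantifier(val suffix: String) {
    object ZeroOrOne : Quantifier("?")
    object ZeroOrMore : Quantifier("*")
    object OneOrMore : Quantifier("+")
    class Exactly(n: Int) : Quantifier("{$n}")
    class Between(min: Int, max: Int) : Quantifier("{$min,$max}")
}

// Applies an optional quantifier to one element's raw regex fragment.
fun quantify(element: String, quantifier: Quantifier? = null): String {
    // No quantifier: emit the element unchanged, so it matches exactly once.
    if (quantifier == null) return element
    // Wrap in a non-capturing group so the quantifier covers the whole element,
    // not just its last character ("ab+" would mean "a" then one or more "b").
    return "(?:$element)${quantifier.suffix}"
}

fun main() {
    val threeDigits = Regex(quantify("[0-9]", Quantifier.Exactly(3)))
    println(threeDigits.pattern)          // (?:[0-9]){3}
    println(threeDigits.matches("123"))   // true
    println(threeDigits.matches("12"))    // false
    println(Regex(quantify("ab", Quantifier.OneOrMore)).matches("abab")) // true
    println(Regex("ab+").matches("abab"))                                // false
}
```

Always wrapping in `(?:...)` is the simplest safe choice; a real builder might skip the group for single-token elements such as `[0-9]`.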

Any element may also be added to a group; see the Groups page for details.

Simple text matches

| Method | Matches | Raw regex equivalent |
|---|---|---|
| `letter()` | Any uppercase or lowercase Unicode letter | `\p{L}` |
| `lowercaseLetter()` | Any lowercase Unicode letter | `\p{Ll}` |
| `uppercaseLetter()` | Any uppercase Unicode letter | `\p{Lu}` |
| `nonLetter()` | Any character that is not a Unicode letter (including white space and control characters) | `\P{L}` |
| `digit()` | Any decimal digit (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) | `[0-9]` |
| `nonDigit()` | Any character that is not a decimal digit (including white space and control characters) | `[^0-9]` |
| `letterOrDigit()` | Any Unicode letter (uppercase or lowercase) or decimal digit | `[\p{L}0-9]` |
| `nonLetterOrDigit()` | Any character that is not a Unicode letter or decimal digit (including white space and control characters) | `[^\p{L}0-9]` |
| `hexDigit()` | Any hexadecimal digit (0 to 9, plus a to f in either case) | `[a-fA-F0-9]` |
| `lowercaseHexDigit()` | Any hexadecimal digit (0 to 9, plus lowercase a to f only) | `[a-f0-9]` |
| `uppercaseHexDigit()` | Any hexadecimal digit (0 to 9, plus uppercase A to F only) | `[A-F0-9]` |
| `nonHexDigit()` | Any character that is not a hexadecimal digit | `[^a-fA-F0-9]` |
| `anyCharacter()` | Any character, including white space and control characters (note that a raw `.` does not match line terminators unless dot-matches-all mode is enabled) | `.` |
| `whitespace()` | Any white space character, such as space, tab, line feed or carriage return | `\s` |
| `nonWhitespace()` | Any character that is not white space (including control characters that are not white space) | `\S` |
| `space()` | A space character | A literal space |
| `tab()` | A tab character | `\t` |
| `lineFeed()` | A line feed character | `\n` |
| `carriageReturn()` | A carriage return character | `\r` |
| `wordCharacter()` | Any Unicode letter, decimal digit or underscore | `[\p{L}0-9_]` |
| `nonWordCharacter()` | Any character that is not a Unicode letter, decimal digit or underscore (including white space and control characters) | `[^\p{L}0-9_]` |
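Because each method has a raw regex equivalent, its behaviour on individual characters can be checked with Kotlin's standard `Regex`. The sketch below assumes the builder emits exactly the raw equivalents in the table above; `elements` and `matchesElement` are hypothetical names used only for this illustration. It highlights two points from the table: `\p{L}` covers non-ASCII letters, while `[0-9]` covers ASCII digits only.

```kotlin
// Raw regex equivalents copied from the table above (a small subset).
val elements = mapOf(
    "letter" to "\\p{L}",
    "digit" to "[0-9]",
    "hexDigit" to "[a-fA-F0-9]",
    "whitespace" to "\\s",
    "wordCharacter" to "[\\p{L}0-9_]"
)

// True if the whole of `input` is one character matched by the named element.
fun matchesElement(name: String, input: String): Boolean =
    Regex(elements.getValue(name)).matches(input)

fun main() {
    println(matchesElement("letter", "\u00E9"))    // true: e-acute is a Unicode letter
    println(matchesElement("digit", "\u0663"))     // false: Arabic-Indic three is not in [0-9]
    println(matchesElement("hexDigit", "F"))       // true
    println(matchesElement("wordCharacter", "_"))  // true
    println(matchesElement("whitespace", "\t"))    // true
    // A raw "." skips line terminators unless DOT_MATCHES_ALL is set.
    println(Regex(".").matches("\n"))                              // false
    println(Regex(".", RegexOption.DOT_MATCHES_ALL).matches("\n")) // true
}
```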

User-defined text matches

| Method | Matches |
|---|---|
| `text(text: String)` | Any arbitrary text. Reserved regex characters in the string are escaped so that they match literally instead of making the regex do unexpected things. For example, the string `:)` is escaped to `:\)`. |
| `regexText(text: String)` | Raw regex text. Reserved regex characters are not escaped, so this is only for tinkerers who know what they're doing. |
| `anyCharacterFrom(characters: String)` | Any one of the characters in the supplied string. For example, `anyCharacterFrom("abc")` matches `a`, `b` or `c`. |
| `anyCharacterExcept(characters: String)` | Any character not in the supplied string (including white space and control characters). For example, `anyCharacterExcept("abc")` matches `1`, `d` or `&` but not `a`. |
| `anyOf(vararg strings: String)` | Any one of the supplied strings, in its entirety. For example, `anyOf("Mr", "Mrs", "Ms")` matches `Mr`, `Mrs` or `Ms` but not `M`. |
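The methods in the table above can be approximated as functions that produce raw regex strings. This is a hypothetical sketch, not the library's code: `textPattern`, `anyCharacterFromPattern`, `anyCharacterExceptPattern` and `anyOfPattern` are illustrative names, and the escaping rules assume Java/Kotlin regex syntax. It shows escaping outside and inside a character class, and an alternation for `anyOf`.

```kotlin
// Hypothetical helpers: sketches of raw regex the methods above could produce.

// Characters that are special in Java/Kotlin regex syntax outside [...].
const val RESERVED = "\\^\$.|?*+()[]{}"
// Characters that are special inside [...] (& because "&&" means intersection).
const val CLASS_RESERVED = "\\[]^-&"

fun escapeWith(text: String, reserved: String): String = buildString {
    for (c in text) {
        if (c in reserved) append('\\')
        append(c)
    }
}

// text(): reserved characters are escaped so they match literally.
fun textPattern(text: String) = escapeWith(text, RESERVED)

// anyCharacterFrom() / anyCharacterExcept(): a (negated) character class.
fun anyCharacterFromPattern(chars: String) = "[${escapeWith(chars, CLASS_RESERVED)}]"
fun anyCharacterExceptPattern(chars: String) = "[^${escapeWith(chars, CLASS_RESERVED)}]"

// anyOf(): an escaped alternation in a non-capturing group. Longer strings go
// first so that, when searching, "Mrs" is preferred over its prefix "Mr".
fun anyOfPattern(vararg strings: String) =
    strings.sortedByDescending { it.length }
        .joinToString("|", prefix = "(?:", postfix = ")") { textPattern(it) }

fun main() {
    println(textPattern(":)"))                  // :\)
    println(anyCharacterFromPattern("abc"))     // [abc]
    println(anyOfPattern("Mr", "Mrs", "Ms"))    // (?:Mrs|Mr|Ms)
    println(Regex(anyCharacterExceptPattern("abc")).matches("d"))            // true
    println(Regex(anyOfPattern("Mr", "Mrs", "Ms")).find("Mrs Smith")?.value) // Mrs
}
```

Ordering the alternatives longest-first is a design choice of this sketch: regex alternation takes the first alternative that succeeds, so `(?:Mr|Mrs)` would find only `Mr` in "Mrs Smith".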

Anchors

Anchors (known in the regex world as "zero-width assertions") match a position in a string rather than a character (hence "zero-width"). They are useful for matching text that occurs at a particular position within a string, rather than anywhere.

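| Method | Matches | Raw regex equivalent |
|---|---|---|
| `startOfString()` | The start of the string. | `^` |
| `endOfString()` | The end of the string. | `$` |
| `wordBoundary()` | The boundary between a word character (letter, digit or underscore) and a non-word character, or between a word character and the start or end of the string. | `\b` |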
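The raw equivalents of the anchors can be tried directly with Kotlin's standard `Regex`. This sketch assumes the builder emits `^`, `$` and `\b` as listed in the table above; `startsWithCat`, `endsWithCat`, `wholeWordCat` and `wordBoundaryPositions` are hypothetical names. The last function makes the "zero-width" idea concrete: every `\b` match is an empty string located at a position between characters.

```kotlin
// Raw equivalents from the table above, combined with plain text.
val startsWithCat = Regex("^cat")       // startOfString(), then "cat"
val endsWithCat = Regex("cat\$")        // "cat", then endOfString()
val wholeWordCat = Regex("\\bcat\\b")   // wordBoundary() on both sides of "cat"

// Anchors are zero-width: each \b match is an empty string at a position.
fun wordBoundaryPositions(text: String): List<Int> =
    Regex("\\b").findAll(text).map { it.range.first }.toList()

fun main() {
    println(startsWithCat.containsMatchIn("cat food"))    // true
    println(startsWithCat.containsMatchIn("bobcat"))      // false
    println(endsWithCat.containsMatchIn("bobcat"))        // true
    println(wholeWordCat.containsMatchIn("the cat sat"))  // true
    println(wholeWordCat.containsMatchIn("concatenate"))  // false
    println(wordBoundaryPositions("hi there"))            // [0, 2, 3, 8]
}
```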
