Elements
Elements are the building blocks that make up a regex. Building a regex with the `regex` function looks like this:

```kotlin
val regex = regex {
    // call element methods here
}
```
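To show how a `regex { }` block like this can work, here is a minimal sketch of a Kotlin builder that uses a lambda with receiver. `SketchRegexBuilder` and `sketchRegex` are hypothetical names, not part of RegexToolbox, and the library's real internals may differ. Each method simply appends the raw regex equivalent listed in the tables below.

```kotlin
// Hypothetical sketch: NOT the RegexToolbox source. It only shows how a
// builder block can be implemented with a lambda with receiver.
class SketchRegexBuilder {
    private val pattern = StringBuilder()

    // Each element method appends its raw regex equivalent.
    fun letter() { pattern.append("\\p{L}") }
    fun digit() { pattern.append("[0-9]") }

    // Regex.escape quotes the literal (on the JVM it wraps it in \Q...\E).
    fun text(text: String) { pattern.append(Regex.escape(text)) }

    fun build(): Regex = Regex(pattern.toString())
}

// The block runs with a SketchRegexBuilder as its receiver (`this`),
// so element methods can be called without a qualifier.
fun sketchRegex(block: SketchRegexBuilder.() -> Unit): Regex =
    SketchRegexBuilder().apply(block).build()

fun main() {
    val regex = sketchRegex {
        letter()
        text("-")
        digit()
    }
    println(regex.matches("A-7")) // true
}
```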
With the exception of the anchor methods, all the methods below take an optional `RegexQuantifier` parameter that defines how many instances of the element should be matched. Without a quantifier, each method matches its element exactly once. Read more about quantifiers in Quantifiers.
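As a rough illustration of this optional-parameter design, the sketch below models a quantifier as a nullable default argument that appends a suffix such as `+` or `{3}` to an element's pattern. `SketchQuantifier` and `digitPattern` are hypothetical stand-ins; RegexToolbox's actual `RegexQuantifier` type may be shaped quite differently.

```kotlin
// Hypothetical sketch: not RegexToolbox's real RegexQuantifier API.
sealed class SketchQuantifier(val suffix: String) {
    object ZeroOrMore : SketchQuantifier("*")
    object OneOrMore : SketchQuantifier("+")
    object ZeroOrOne : SketchQuantifier("?")
    class Exactly(times: Int) : SketchQuantifier("{$times}")
    class Between(min: Int, max: Int) : SketchQuantifier("{$min,$max}")
}

// A null default means "no quantifier", so the element is matched exactly once.
fun digitPattern(quantifier: SketchQuantifier? = null): String =
    "[0-9]" + (quantifier?.suffix ?: "")

fun main() {
    println(digitPattern())                                                  // [0-9]
    println(digitPattern(SketchQuantifier.Exactly(3)))                       // [0-9]{3}
    println(Regex(digitPattern(SketchQuantifier.OneOrMore)).matches("2024")) // true
    println(Regex(digitPattern()).matches("12"))                             // false
}
```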
All elements can be added to a group; see Groups for details.
Method | Matches | Raw regex equivalent |
---|---|---|
`letter()` | Any uppercase or lowercase Unicode letter | `\p{L}` |
`lowercaseLetter()` | Any lowercase Unicode letter | `\p{Ll}` |
`uppercaseLetter()` | Any uppercase Unicode letter | `\p{Lu}` |
`nonLetter()` | Any character that is not a Unicode letter (including white space and control characters) | `\P{L}` |
`digit()` | Any decimal digit (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) | `[0-9]` |
`nonDigit()` | Any character that is not a decimal digit (including white space and control characters) | `[^0-9]` |
`letterOrDigit()` | Any Unicode letter (uppercase or lowercase) or decimal digit | `[\p{L}0-9]` |
`nonLetterOrDigit()` | Any character that is not a Unicode letter or decimal digit (including white space and control characters) | `[^\p{L}0-9]` |
`hexDigit()` | Any hexadecimal digit (letters in either case) | `[a-fA-F0-9]` |
`lowercaseHexDigit()` | Any hexadecimal digit (lowercase letters only) | `[a-f0-9]` |
`uppercaseHexDigit()` | Any hexadecimal digit (uppercase letters only) | `[A-F0-9]` |
`nonHexDigit()` | Any character that is not a hexadecimal digit | `[^a-fA-F0-9]` |
`anyCharacter()` | Any character, including white space and control characters; note that `.` does not match line terminators such as `\n` unless dot-all mode is enabled | `.` |
`whitespace()` | Any white space character, such as a space, tab, line feed or carriage return | `\s` |
`nonWhitespace()` | Any character that is not white space (including control characters) | `\S` |
`space()` | A space character | ` ` (a literal space) |
`tab()` | A tab character | `\t` |
`lineFeed()` | A line feed character | `\n` |
`carriageReturn()` | A carriage return character | `\r` |
`wordCharacter()` | Any Unicode letter, decimal digit or underscore | `[\p{L}0-9_]` |
`nonWordCharacter()` | Any character that is not a Unicode letter, decimal digit or underscore (including white space and control characters) | `[^\p{L}0-9_]` |
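The raw equivalents above can be checked directly with Kotlin's standard `Regex` class. This sketch assumes the JVM (`java.util.regex`) and does not use RegexToolbox itself; it highlights a few edge cases, such as `\p{L}` accepting non-ASCII letters while `[0-9]` accepts only ASCII digits.

```kotlin
// Raw equivalents from the table, compiled with Kotlin's Regex (JVM / java.util.regex).
val letter = Regex("\\p{L}")
val digit = Regex("[0-9]")
val whitespace = Regex("\\s")
val hexDigit = Regex("[a-fA-F0-9]")
val wordCharacter = Regex("[\\p{L}0-9_]")

fun main() {
    println(letter.matches("\u00E9"))    // true: \p{L} includes non-ASCII letters such as é
    println(digit.matches("\u0663"))     // false: U+0663 (Arabic-Indic three) is not in [0-9]
    println(whitespace.matches("\t"))    // true
    println(hexDigit.matches("g"))       // false
    println(wordCharacter.matches("_"))  // true
    println(Regex(".").matches("\n"))    // false: . skips line terminators by default
    println(Regex(".", RegexOption.DOT_MATCHES_ALL).matches("\n")) // true
}
```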
Method | Matches |
---|---|
`text(text: String)` | Any arbitrary text. Reserved regex characters in the string are escaped so that the regex matches them literally instead of doing unexpected things. For example, passing the string `":)"` escapes it to `":\)"`. |
`regexText(text: String)` | Raw regex text. Reserved regex characters are not escaped, so this is only for tinkerers who know what they're doing. |
`anyCharacterFrom(characters: String)` | Any one of the characters in the supplied string. For example, `anyCharacterFrom("abc")` matches `"a"`, `"b"` or `"c"`. |
`anyCharacterExcept(characters: String)` | Any character not in the supplied string (including white space and control characters). For example, `anyCharacterExcept("abc")` matches `"1"`, `"d"` or `"&"` but not `"a"`. |
`anyOf(vararg strings: String)` | Any one of the supplied strings, in its entirety. For example, `anyOf("Mr", "Mrs", "Ms")` matches `"Mr"`, `"Mrs"` or `"Ms"` but not `"M"`. |
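The following sketch approximates the patterns these methods describe, assuming JVM regex syntax. `escapeText`, `characterClass` and `anyOf` are hypothetical helpers written for illustration; RegexToolbox may produce differently shaped (but equivalent) patterns.

```kotlin
// Hypothetical helpers (not RegexToolbox's API) approximating the patterns
// described above, using java.util.regex syntax.

// Characters that are special outside a character class.
val RESERVED = "\\^\$.|?*+()[]{}"

// text(): escape reserved characters so they match literally.
fun escapeText(text: String): String = buildString {
    for (c in text) {
        if (c in RESERVED) append('\\')
        append(c)
    }
}

// anyCharacterFrom() / anyCharacterExcept(): a (negated) character class.
// Every character that is not a letter or digit is escaped; java.util.regex
// allows a backslash before any non-alphabetic character.
fun characterClass(characters: String, negated: Boolean = false): String = buildString {
    append(if (negated) "[^" else "[")
    for (c in characters) {
        if (!c.isLetterOrDigit()) append('\\')
        append(c)
    }
    append(']')
}

// anyOf(): a non-capturing group of escaped alternatives.
fun anyOf(vararg strings: String): String =
    strings.joinToString(separator = "|", prefix = "(?:", postfix = ")") { escapeText(it) }

fun main() {
    println(escapeText(":)"))                                          // :\)
    println(Regex(characterClass("abc")).matches("b"))                 // true
    println(Regex(characterClass("abc", negated = true)).matches("&")) // true
    println(Regex(characterClass("abc", negated = true)).matches("a")) // false
    println(Regex(anyOf("Mr", "Mrs", "Ms")).matches("Mrs"))            // true
    println(Regex(anyOf("Mr", "Mrs", "Ms")).matches("M"))              // false
}
```

When searching with `find` rather than matching a whole string, alternatives are tried left to right, so `(?:Mr|Mrs|Ms)` finds `"Mr"` at the start of `"Mrs Smith"`. Listing longer alternatives first, or following the group with a `\b` word boundary, avoids this.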
Anchors (known in the regex world as "zero-width assertions") match a position in a string rather than a character (hence "zero-width"). They're useful for crafting regexes that match text at a particular position within a string, rather than just anywhere.
Method | Matches | Raw regex equivalent |
---|---|---|
`startOfString()` | The start of the string. | `^` |
`endOfString()` | The end of the string. | `$` |
`wordBoundary()` | The boundary between a word character (letter, digit or underscore) and a non-word character. | `\b` |
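Here is a short sketch of the anchor equivalents using Kotlin's standard `Regex` on the JVM, independent of RegexToolbox. It also shows the "zero-width" idea: `\b` matches an empty string, so in `"hi there"` it matches at four positions without consuming any characters.

```kotlin
// Raw anchor equivalents from the table, compiled with Kotlin's Regex (JVM).
val wholeNumber = Regex("^[0-9]+\$")  // startOfString(), one or more digits, endOfString()
val catWord = Regex("\\bcat\\b")      // wordBoundary(), "cat", wordBoundary()

fun main() {
    println(wholeNumber.containsMatchIn("2024"))       // true
    println(wholeNumber.containsMatchIn("in 2024"))    // false: ^ pins the match to the start
    println(catWord.containsMatchIn("the cat sat"))    // true
    println(catWord.containsMatchIn("concatenate"))    // false: "cat" sits inside a word
    println(Regex("\\b").findAll("hi there").count())  // 4 zero-width matches
}
```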
RegexToolbox: Now you can be a hero without knowing regular expressions.