Regular Expressions

 

 

 

Regular expressions are search patterns that can perform advanced pattern matching and validation on text.

 

The following table defines the meta characters of the regular expression language:

 

Character  Definition                                        Pattern   Sample Matches
^          Start of a string.                                ^abc      abc, abcdefg, abc123, ...
$          End of a string.                                  abc$      abc, endsinabc, 123abc, ...
.          Any character except a newline (\n).              a.c       abc, aac, acc, adc, aec, ...
|          Alternation.                                      bill|ted  ted, bill
{...}      Explicit quantifier notation.                     ab{2}c    abbc
[...]      Explicit set of characters to match.              a[bB]c    abc, aBc
(...)      Logical grouping of part of an expression.        (abc){2}  abcabc
*          0 or more of the previous expression.             ab*c      ac, abc, abbc, abbbc, ...
+          1 or more of the previous expression.             ab+c      abc, abbc, abbbc, ...
?          0 or 1 of the previous expression. Placed after   ab?c      ac, abc
           another quantifier, forces minimal matching when
           the expression could match several strings
           within a search string.
\          Before one of the characters above, makes it a    a\sc      a c
           literal instead of a special character. Before
           certain other characters, forms an escape
           sequence or character class (see the tables
           below).
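The sample matches in the table above can be checked mechanically. The sketch below uses Python's `re` module as a stand-in engine, which is an assumption: the flavor documented here may differ from Python's in some details, although the meta characters shown behave the same way. The names `SAMPLES` and `check_samples` are hypothetical and exist only for this illustration.

```python
import re

# Each pattern from the table, paired with the sample matches listed for it.
SAMPLES = [
    (r"^abc", ["abc", "abcdefg", "abc123"]),
    (r"abc$", ["abc", "endsinabc", "123abc"]),
    (r"a.c", ["abc", "aac", "acc", "adc", "aec"]),
    (r"bill|ted", ["ted", "bill"]),
    (r"ab{2}c", ["abbc"]),
    (r"a[bB]c", ["abc", "aBc"]),
    (r"(abc){2}", ["abcabc"]),
    (r"ab*c", ["ac", "abc", "abbc", "abbbc"]),
    (r"ab+c", ["abc", "abbc", "abbbc"]),
    (r"ab?c", ["ac", "abc"]),
    (r"a\sc", ["a c"]),
]


def check_samples(samples=SAMPLES):
    """Map each pattern to True when every listed sample matches it."""
    return {
        pattern: all(re.search(pattern, s) for s in strings)
        for pattern, strings in samples
    }


results = check_samples()

# '?' after another quantifier switches from greedy to minimal matching.
greedy = re.search(r"<.+>", "<a><b>").group()   # longest possible match
lazy = re.search(r"<.+?>", "<a><b>").group()    # shortest possible match

# A backslash turns the meta character '.' into a literal period.
literal_dot_on_abc = re.search(r"a\.c", "abc")   # no match: 'b' is not '.'
literal_dot_on_a_c = re.search(r"a\.c", "a.c")   # matches the literal period

print(results)
print(greedy, lazy)
```

Raw strings (`r"..."`) are used so that Python passes each backslash through to the regular expression engine unchanged.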

The following table contains the escape sequences used in authoring regular expressions:

Character            Description
ordinary characters  Characters other than . $ ^ { [ ( | ) ] } * + ? \ match themselves.
\a                   Matches a bell (alarm), \u0007.
\b                   Matches a backspace, \u0008, when used inside []; otherwise matches a word boundary (between \w and \W characters).
\t                   Matches a tab, \u0009.
\r                   Matches a carriage return, \u000D.
\v                   Matches a vertical tab, \u000B.
\f                   Matches a form feed, \u000C.
\n                   Matches a new line, \u000A.
\e                   Matches an escape, \u001B.
\040                 Matches an ASCII character given in octal (up to three digits). Numbers with no leading zero are back-references if they have only one digit or if they correspond to a capturing group number. For example, \040 represents a space.
\x20                 Matches an ASCII character given in hexadecimal (exactly two digits).
\cC                  Matches an ASCII control character; for example, \cC is Control-C.
\u0020               Matches a Unicode character given in hexadecimal (exactly four digits).
\*                   A backslash followed by a character that is not recognized as an escape sequence matches that character. For example, \* is the same as \x2A.
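As a rough illustration of the escape sequences above, the sketch below pairs each escape with the character it is documented to match. It again assumes Python's `re` module as a stand-in engine. Python's `re` does not accept `\e` or `\cC`, so those two rows are written with the equivalent hexadecimal escapes `\x1B` and `\x03`. The name `ESCAPES` is hypothetical.

```python
import re

# Escape sequence -> the single character it is documented to match.
ESCAPES = {
    r"\a": "\u0007",      # bell
    r"\t": "\u0009",      # tab
    r"\r": "\u000D",      # carriage return
    r"\v": "\u000B",      # vertical tab
    r"\f": "\u000C",      # form feed
    r"\n": "\u000A",      # new line
    r"\x1B": "\u001B",    # escape (stand-in for \e, not supported by Python's re)
    r"\x03": "\u0003",    # Control-C (stand-in for \cC, not supported by Python's re)
    r"\040": " ",         # octal
    r"\x20": " ",         # hexadecimal, two digits
    r"\u0020": " ",       # Unicode, four digits
    r"\*": "*",           # escaped meta character
    r"\x2A": "*",         # same character written in hexadecimal
}

escape_results = {
    pattern: re.fullmatch(pattern, char) is not None
    for pattern, char in ESCAPES.items()
}

# \b is a word boundary outside [] and a backspace inside [].
whole_words = re.findall(r"\bcat\b", "cat concat cat.")
backspace_in_class = re.fullmatch(r"[\b]", "\u0008") is not None

# A single digit with no leading zero is a back-reference to that group.
doubled_letter = re.search(r"(\w)\1", "hello").group()

print(escape_results)
print(whole_words, backspace_in_class, doubled_letter)
```

The word-boundary search skips `concat` because the `c` of `cat` there is preceded by another word character.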

The following table contains character classes used in regular expressions:

Char Class   Description
.            Matches any character except \n. If modified by the Singleline option, a period matches any character. For more information, see Regular Expression Options.
[aeiou]      Matches any single character included in the specified set of characters.
[^aeiou]     Matches any single character not in the specified set of characters.
[0-9a-fA-F]  A hyphen (-) specifies a contiguous range of characters.
\p{name}     Matches any character in the named character class specified by {name}. Supported names are Unicode groups and block ranges, for example Ll, Nd, Z, IsGreek, IsBoxDrawing.
\P{name}     Matches any character not included in the groups and block ranges specified in {name}.
\w           Matches any word character. Equivalent to the Unicode character categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \w is equivalent to [a-zA-Z_0-9].
\W           Matches any nonword character. Equivalent to the Unicode character categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \W is equivalent to [^a-zA-Z_0-9].
\s           Matches any white-space character. Equivalent to [\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \s is equivalent to [ \f\n\r\t\v].
\S           Matches any non-white-space character. Equivalent to [^\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \S is equivalent to [^ \f\n\r\t\v].
\d           Matches any decimal digit. Equivalent to \p{Nd} for Unicode and to [0-9] for non-Unicode, ECMAScript behavior.
\D           Matches any non-digit. Equivalent to \P{Nd} for Unicode and to [^0-9] for non-Unicode, ECMAScript behavior.
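The character classes above can be tried out with a short sketch, again assuming Python's `re` module as a stand-in engine. Two caveats apply: Python's `re` does not support `\p{name}` or `\P{name}`, so those rows are not shown, and its `re.ASCII` flag is used here only as a rough analogue of the ECMAScript option described in the table. The exact character sets may differ slightly between engines.

```python
import re

# Explicit sets, negated sets, and ranges.
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")
hex_runs = re.findall(r"[0-9a-fA-F]+", "0x1F")

# \d: Unicode decimal digits by default, [0-9] only with re.ASCII.
sample_digits = "a1\u0663"   # U+0663 is ARABIC-INDIC DIGIT THREE
unicode_digits = re.findall(r"\d", sample_digits)
ascii_digits = re.findall(r"\d", sample_digits, flags=re.ASCII)

# \w: Unicode word characters by default, [a-zA-Z_0-9] with re.ASCII.
sample_words = "caf\u00e9 au_lait"   # U+00E9 is a precomposed e with acute accent
unicode_words = re.findall(r"\w+", sample_words)
ascii_words = re.findall(r"\w+", sample_words, flags=re.ASCII)

# \s and \S: white space and its complement.
fields = re.split(r"\s+", "a \t\n b")
tokens = re.findall(r"\S+", "a \t b")

# \D: strip everything that is not a digit.
digits_only = re.sub(r"\D", "", "tel: 555-0100")

print(vowels, non_vowels, hex_runs)
print(unicode_digits, ascii_digits)
print(unicode_words, ascii_words)
print(fields, tokens, digits_only)
```

The difference between the default and `re.ASCII` results mirrors the Unicode versus ECMAScript distinction drawn in the table for `\w` and `\d`.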

 


Copyright © 2024 pasUNITY, Inc.

 
