C# regular expressions are used for pattern matching in C#. The regex classes in the System.Text.RegularExpressions namespace, chiefly Regex, provide this functionality. A regular expression is a pattern that the regex engine uses to check whether a given input text matches.
Regular expression syntax is built from a few basic kinds of elements:
1- Quantifiers
2- Alternatives / Grouping
3- Character Classes
4- Special Characters
5- Character Escapes
6- Anchors
Each of these is described in detail below.
1- Quantifiers:
Quantifiers specify how many occurrences of the preceding element (a character, group, or character class) must be present in the input string for a match to happen.
Given below are the quantifiers used in C#:
Quantifiers | Description |
* | Matches the preceding element zero or more times. |
? | Matches the preceding element zero or one time. |
+ | Matches the preceding element one or more times. |
{n} | Matches the preceding element exactly n times. |
{n,} | Matches the preceding element at least n times. |

    // C# program demonstrating the * quantifier
    using System;
    using System.Text.RegularExpressions;

    public class Csharp
    {
        public static void Main(string[] args)
        {
            // x*y matches zero or more 'x' followed by 'y': y, xy, xxy, ...
            Regex regex = new Regex(@"x*y");
            Match match = regex.Match("xxxyzy");

            if (match.Success)
            {
                // Prints "Match Value: xxxy" (the first match in the input)
                Console.WriteLine("Match Value: " + match.Value);
            }
        }
    }

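The remaining quantifiers in the table can be tried the same way. The following is a minimal sketch, not a definitive implementation: the class name QuantifierSketch and the helper FirstMatch are hypothetical names introduced here for illustration, and the sample inputs are made up.

```csharp
using System;
using System.Text.RegularExpressions;

public class QuantifierSketch
{
    // Hypothetical helper: returns the first match of pattern in input,
    // or the text "(no match)" when the pattern does not match.
    public static string FirstMatch(string pattern, string input)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Value : "(no match)";
    }

    public static void Main()
    {
        // ?    : the 'u' is optional, so "color" matches
        Console.WriteLine(FirstMatch(@"colou?r", "color"));   // color
        // +    : at least one 'x' is required before 'y'
        Console.WriteLine(FirstMatch(@"x+y", "yxxy"));        // xxy
        // {n}  : exactly three 'a' characters are taken
        Console.WriteLine(FirstMatch(@"a{3}", "aaaaa"));      // aaa
        // {n,} : the single 'a' is skipped, the first run of two or more matches
        Console.WriteLine(FirstMatch(@"a{2,}", "a aa aaaa")); // aa
    }
}
```

Note that the quantifier is written {n,} without a space before the closing brace.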
2- Alternatives / Grouping:
Parts of a pattern can be grouped together using the parentheses ( and ). A group can then be quantified as a unit or used to offer alternatives.
Alternatives / Grouping: | Description |
() | Groups a subexpression so that it is treated as a single element. |
(a|b) | Alternation: matches either a or b. |
(?(exp)yes|no) | Conditional: if exp matches at the current position, the yes pattern is matched; otherwise the no pattern is matched. |

    // C# program for grouping in regex
    using System;
    using System.Text.RegularExpressions;

    public class Csharp
    {
        public static void Main()
        {
            // (xy)+ matches one or more repetitions of "xy": xy, xyxy, ...
            Regex regex = new Regex(@"(xy)+");
            Match match = regex.Match("xyyx");

            if (match.Success)
            {
                // Prints "Value matched: xy"
                Console.WriteLine("Value matched: " + match.Value);
            }
        }
    }

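Alternation and the conditional construct from the table above can be sketched as follows. This is an illustrative example under assumptions: the class name GroupingSketch and the helper AllMatches are hypothetical, the inputs are made up, and the conditional is written with an explicit lookahead, (?(?=A)...), so that the expression cannot be mistaken for a group name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class GroupingSketch
{
    // Hypothetical helper: collects the text of every match of pattern in input.
    public static List<string> AllMatches(string pattern, string input)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            values.Add(m.Value);
        }
        return values;
    }

    public static void Main()
    {
        // (cat|dog) matches either alternative; s? makes the plural optional.
        Match pet = Regex.Match("I have two dogs", @"(cat|dog)s?");
        Console.WriteLine(pet.Value);           // dogs
        Console.WriteLine(pet.Groups[1].Value); // dog (text captured by the group)

        // Conditional: if the next character is 'A', match 'A' plus two digits;
        // otherwise match a standalone three-digit number.
        List<string> codes = AllMatches(@"(?(?=A)A\d{2}\b|\b\d{3}\b)", "A10 C103 910");
        Console.WriteLine(string.Join(", ", codes)); // A10, 910
    }
}
```

The group also captures what it matched, which is why Groups[1] is available on the Match object.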
3- Character Classes:
A character class is written by placing a set of characters inside square brackets; it matches any one character from that set. The character classes are given below:
Character Classes | Description |
[] | Matches any single character from the set listed inside the brackets. |
\ | Escapes a special character so that it is matched literally. |
[a-z] | Matches any single character in the range a-z. |
[^a-z] | Matches any single character that is not in the range a-z. |

    // C# program to demonstrate
    // the [] character class
    using System;
    using System.Text.RegularExpressions;

    public class Csharp
    {
        public static void Main()
        {
            // [xyz] matches a single character: the first x, y or z in the input
            Regex regex = new Regex(@"[xyz]");
            Match match = regex.Match("xzywtv");

            if (match.Success)
            {
                // Prints "Value matched: x"
                Console.WriteLine("Value matched: " + match.Value);
            }
        }
    }

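Ranges, negated ranges, and the escaping backslash from the table above might be exercised like this. It is a small sketch only; the class name CharClassSketch, the helper FirstMatch, and the sample inputs are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public class CharClassSketch
{
    // Hypothetical helper: first match of pattern in input, or "(no match)".
    public static string FirstMatch(string pattern, string input)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Value : "(no match)";
    }

    public static void Main()
    {
        // [a-z] : the first lowercase letter
        Console.WriteLine(FirstMatch(@"[a-z]", "ABCdEF"));  // d
        // [^a-z] : the first character that is not a lowercase letter
        Console.WriteLine(FirstMatch(@"[^a-z]", "abc7de")); // 7
        // An unescaped dot matches any character, so the first character wins
        Console.WriteLine(FirstMatch(@".", "a.b"));         // a
        // \. : the backslash makes the dot literal
        Console.WriteLine(FirstMatch(@"\.", "a.b"));        // .
    }
}
```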
4- Special Characters:
Given below are the special characters with descriptions:
Special Characters | Description |
\n | Matches a newline character. |
\d | Matches any digit character. |
\D | Matches any non-digit character. |
. (dot) | Matches any single character except \n (newline). |
$ | Matches the end of the string or line; the pattern before it must end there. |
^ | Matches the beginning of the string or line; the pattern after it must start there. |
\w | Matches any word character (alphanumeric or underscore). |
\W | Matches any non-word character. |
\s | Matches any white-space character. |
\S | Matches any non-white-space character. |

    // C# program to demonstrate
    // the ^ special character
    using System;
    using System.Text.RegularExpressions;

    public class Csharp
    {
        public static void Main()
        {
            // ^XYZ matches only if "XYZ" is at the beginning of the input
            Regex regex = new Regex(@"^XYZ");
            Match match = regex.Match("XYZ");

            if (match.Success)
            {
                // Prints "Value Matched: XYZ"
                Console.WriteLine("Value Matched: " + match.Value);
            }
        }
    }

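The shorthand classes from the table above come in complementary pairs (\d and \D, \w and \W, \s and \S). The sketch below runs each against one made-up input; the class name SpecialCharSketch and the helper FirstMatch are hypothetical, and results are printed in square brackets so that white space is visible.

```csharp
using System;
using System.Text.RegularExpressions;

public class SpecialCharSketch
{
    // Hypothetical helper: first match of pattern in input, or "(no match)".
    public static string FirstMatch(string pattern, string input)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Value : "(no match)";
    }

    public static void Main()
    {
        string input = "Room 42!";

        Console.WriteLine("[" + FirstMatch(@"\d+", input) + "]");   // [42]    digits
        Console.WriteLine("[" + FirstMatch(@"\D+", input) + "]");   // [Room ] non-digits
        Console.WriteLine("[" + FirstMatch(@"\w+", input) + "]");   // [Room]  word characters
        Console.WriteLine("[" + FirstMatch(@"\W", input) + "]");    // [ ]     first non-word character
        Console.WriteLine("[" + FirstMatch(@"\s", input) + "]");    // [ ]     white space
        Console.WriteLine("[" + FirstMatch(@"\S+", input) + "]");   // [Room]  non-white-space
        Console.WriteLine("[" + FirstMatch(@".", input) + "]");     // [R]     any single character
        Console.WriteLine("[" + FirstMatch(@"\d+!$", input) + "]"); // [42!]   must end at end of string
    }
}
```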
5- Character Escapes:
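Character escapes let a pattern refer to characters that are hard to type or that would otherwise have a special meaning. The backslash character (\) in a regular expression indicates either that the character that follows it is a special character or that it should be interpreted literally.
The character escapes are given below: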
Escape character | Description | Pattern | Matches |
---|---|---|---|
\t | Matches a tab, \u0009. | (\w+)\t | "Name\t", "Addr\t" in "Name\tAddr\t" |
\a | Matches a bell character, \u0007. | \a | "\u0007" in "Warning!" + '\u0007' |
\n | Matches a new line, \u000A. | \r\n(\w+) | "\r\nHello" in "\r\nHello\nWorld." |
\f | Matches a form feed, \u000C. | [\f]{2,} | "\f\f\f" in "\f\f\f" |
\e | Matches an escape, \u001B. | \e | "\x001B" in "\x001B" |
\v | Matches a vertical tab, \u000B. | [\v]{2,} | "\v\v\v" in "\v\v\v" |
\r | Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.) | \r\n(\w+) | "\r\nHello" in "\r\nHello\nWorld." |
\b | In a character class, matches a backspace, \u0008. | [\b]{3,} | "\b\b\b\b" in "\b\b\b\b" |
\nnn | Uses octal representation to specify a character (nnn consists of two or three digits). | \w\040\w | "a b", "c d" in "a bc d" |
\xnn | Uses hexadecimal representation to specify a character (nn consists of exactly two digits). | \w\x20\w | "a b", "c d" in "a bc d" |
\ | When followed by a character that is not recognized as an escaped character, matches that character. | \d+[\+-x\*]\d+ | "2+2" and "3*9" in "(2+2) * 3*9" |
\unnnn | Matches a Unicode character by using hexadecimal representation (exactly four digits, as represented by nnnn). | \w\u0020\w | "a b", "c d" in "a bc d" |
\cX, \cx | Matches the ASCII control character that is specified by X or x, where X or x is the letter of the control character. | \cC | "\x0003" in "\x0003" (Ctrl-C) |
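A few rows of the table can be checked with a short program. This is a sketch under assumptions: the class name EscapeSketch and the helper AllMatches are hypothetical names introduced here. The patterns are verbatim strings (prefixed with @) so that the backslashes reach the regex engine unchanged, while the input uses an ordinary string so that \t becomes a real tab.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class EscapeSketch
{
    // Hypothetical helper: collects the text of every match of pattern in input.
    public static List<string> AllMatches(string pattern, string input)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            values.Add(m.Value);
        }
        return values;
    }

    public static void Main()
    {
        // \t : each word followed by a tab; tabs are shown as \t when printing
        foreach (string s in AllMatches(@"(\w+)\t", "Name\tAddr\t"))
        {
            Console.WriteLine(s.Replace("\t", "\\t")); // Name\t, then Addr\t
        }

        // \040, \x20 and \u0020 are three spellings of the space character
        Console.WriteLine(string.Join(", ", AllMatches(@"\w\040\w", "a bc d")));   // a b, c d
        Console.WriteLine(string.Join(", ", AllMatches(@"\w\x20\w", "a bc d")));   // a b, c d
        Console.WriteLine(string.Join(", ", AllMatches(@"\w\u0020\w", "a bc d"))); // a b, c d
    }
}
```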
6- Anchors:
Anchors cause a match to succeed or fail depending on the current position in the string; they do not consume any characters themselves.
Some of the anchors are listed below:
Assertion | Description | Pattern | Matches |
---|---|---|---|
\z | The match must occur at the end of the string. | -\d{3}\z | "-333" in "-901-333" |
\A | The match must occur at the start of the string. | \A\w{3} | "Cod" in "Code-007-" |
\B | The match must not occur on a \b boundary. | \Bend\w*\b | "ends", "ender" in "end sends endure lender" |
^ | The match must start at the beginning of the string or, in multiline mode, at the beginning of a line. | ^\d{3} | "567" in "567-777-" |
\G | The match must occur at the point where the previous match ended. | \G\(\d\) | "(1)", "(3)", "(5)" in "(1)(3)(5)[7](9)" |
\Z | The match must occur at the end of the string or before \n at the end of the string. | -\d{3}\Z | "-007" in "Bond-901-007" |
\b | The match must occur on a boundary between a \w (alphanumeric) and a \W (nonalphanumeric) character. | \b\w | "R" and "1" in "Room#1" |
$ | The match must occur at the end of the string or before \n at the end of the string; in multiline mode, at the end of a line. | -\d{4}$ | "-2012" in "8-12-2012" |
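Several of these anchors only differ in edge cases, such as a trailing newline or multiline input. The following sketch illustrates a few of them; the class name AnchorSketch, the helper AllMatches, and the inputs that contain \n are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class AnchorSketch
{
    // Hypothetical helper: collects the text of every match of pattern in input.
    public static List<string> AllMatches(string pattern, string input,
                                          RegexOptions options = RegexOptions.None)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            values.Add(m.Value);
        }
        return values;
    }

    public static void Main()
    {
        // \A : only the start of the string qualifies
        Console.WriteLine(string.Join(", ", AllMatches(@"\A\w{3}", "Code-007-"))); // Cod

        // \Z tolerates one trailing newline, \z does not
        Console.WriteLine(AllMatches(@"-\d{3}\Z", "Bond-901-007\n").Count); // 1
        Console.WriteLine(AllMatches(@"-\d{3}\z", "Bond-901-007\n").Count); // 0

        // \G : each match must start where the previous one ended,
        // so matching stops at "[7]" and "(9)" is never reached
        Console.WriteLine(string.Join(", ", AllMatches(@"\G\(\d\)", "(1)(3)(5)[7](9)"))); // (1), (3), (5)

        // ^ with RegexOptions.Multiline matches at the start of every line
        Console.WriteLine(string.Join(", ",
            AllMatches(@"^\d{3}", "567-777\n890-111", RegexOptions.Multiline))); // 567, 890
    }
}
```

Without RegexOptions.Multiline, the last pattern would match only "567", because ^ then refers to the start of the whole string.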