Regular expressions (REs), unlike simple queries, allow you to search for text that matches a particular pattern.
REs are similar to (but more powerful than) the "wildcards" used in the command-line interfaces of operating systems such as Unix and MS-DOS. REs are used by sophisticated search engines, as well as by many Unix-based languages and tools (e.g., `awk`, `grep`, `lex`, `perl`, and `sed`).
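Several of the example patterns in the table can be tried out with Python's `re` module. This is a minimal sketch assuming Python 3. For alternation, character classes, and `*`, Python's syntax agrees with the patterns shown. The helper `matches` is hypothetical and exists only for illustration. It reports whether a pattern occurs anywhere in a string, which is how grep tests each line.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if `pattern` matches anywhere in `text`."""
    return re.search(pattern, text) is not None


# Alternation inside parentheses: "company" or "companies"
print(matches(r"compan(y|ies)", "three companies"))  # True
print(matches(r"(peter|paul)", "ask paul"))          # True

# Negated class: any character that is neither a digit nor a letter
print(matches(r"[^0-9a-zA-Z]", "abc123"))   # False
print(matches(r"[^0-9a-zA-Z]", "abc-123"))  # True

# An uppercase letter followed by zero or more uppercase letters
print(re.search(r"[A-Z][A-Z]*", "see NASA docs").group())  # NASA

# US social security number layout: 3 digits, 2 digits, 4 digits
ssn = r"[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]"
print(matches(ssn, "SSN: 123-45-6789"))  # True
print(matches(ssn, "SSN: 12-345-6789"))  # False
```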
<td> compan(y|ies) </td>
<td> Search for <em>company</em>, <em>companies</em></td>
<td> (peter|paul) </td>
<td> Search for <em>peter</em>, <em>paul</em></td>
<td> Search for <em>bug</em>, <em>bugs</em>, <em>bugfix</em></td>
<td> Search for <em>Bag</em>, <em>bag</em></td>
<td> Second letter is a vowel. Matches <em>bag</em>, <em>bug</em>, <em>big</em></td>
<td> Second letter is any character. Also matches <em>b&g</em></td>
<td> Matches any one letter (not a digit or a symbol) </td>
<td> [^0-9a-zA-Z] </td>
<td> Matches any character that is not a letter or a digit (e.g., a symbol or a space) </td>
<td> [A-Z][A-Z]* </td>
<td> Matches one or more uppercase letters </td>
<td> [0-9][0-9][0-9]-[0-9][0-9]- <br /> [0-9][0-9][0-9][0-9] </td>
<td valign="top"> US social security number, e.g. 123-45-6789 </td>
Here is stuff for our UNIX freaks: <br /> (copied from 'man grep')
\c A backslash (\) followed by any special character is a
one-character regular expression that matches the special
character itself. The special characters are:

+ `.', `*', `[', and `\' (period, asterisk,
left square bracket, and backslash, respectively),
which are always special, except
when they appear within square brackets ([]).

+ `^' (caret or circumflex), which is special
at the beginning of an entire regular expression,
or when it immediately follows the left
of a pair of square brackets ([]).

+ $ (currency symbol), which is special at the
end of an entire regular expression.
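Escaping works the same way in Python's `re` module, which this sketch assumes. One caveat: Python treats more characters as special than basic grep does. For example, `+`, `?`, `|`, `(` and `)` are special in Python, as they are in egrep. `re.escape` is a standard-library function that adds the needed backslashes for Python's syntax.

```python
import re

# Unescaped "." is special: it matches any character, so "a.b" also matches "axb".
print(re.search(r"a.b", "axb") is not None)   # True
# "\." matches only a literal period.
print(re.search(r"a\.b", "axb") is not None)  # False
print(re.search(r"a\.b", "a.b") is not None)  # True

# Inside square brackets, "." and "*" stand for themselves.
print(re.search(r"a[.*]b", "axb") is not None)  # False
print(re.search(r"a[.*]b", "a*b") is not None)  # True

# re.escape adds backslashes before Python's special characters,
# so the text is matched literally.
pattern = re.escape("1+1=2?")
print(re.fullmatch(pattern, "1+1=2?") is not None)  # True
```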
. A `.' (period) is a one-character regular expression
that matches any character except NEWLINE.
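This sketch assumes Python 3's `re` module. There, `.` also excludes the newline by default, so it reproduces the table's `b&g` case. The `re.DOTALL` flag is a Python-specific switch that lifts the newline restriction.

```python
import re

# "." matches any single character except a newline.
print(re.findall(r"b.g", "bag big b&g"))  # ['bag', 'big', 'b&g']
print(re.search(r"b.g", "b\ng"))          # None

# Python-specific: the re.DOTALL flag lets "." match a newline as well.
print(re.search(r"b.g", "b\ng", re.DOTALL) is not None)  # True
```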
A non-empty string of characters enclosed in square
brackets is a one-character regular expression that
matches any one character in that string. If, however,
the first character of the string is a `^' (a circumflex
or caret), the one-character regular expression
matches any character except NEWLINE and the remaining
characters in the string. The `^' has this special
meaning only if it occurs first in the string. The `-'
(minus) may be used to indicate a range of consecutive
ASCII characters; for example, [0-9] is equivalent to
[0123456789]. The `-' loses this special meaning if it
occurs first (after an initial `^', if any) or last in
the string. The `]' (right square bracket) does not
terminate such a string when it is the first character
within it (after an initial `^', if any); that is,
[]a-f] matches either `]' (a right square bracket) or
one of the letters a through f inclusive. The four
characters `.', `*', `[', and `\' stand for themselves
within such a string of characters.
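The bracket rules can be checked with Python's `re` module. This is a sketch assuming Python 3. Ranges, a leading `^`, a trailing `-` and a leading `]` behave as described above. There are two differences from grep. In Python, a negated set such as `[^aeiou]` also matches a newline. A backslash inside brackets is an escape character in Python, whereas the man page says it stands for itself.

```python
import re

# A range: [0-9] is the same set as [0123456789].
print(re.findall(r"[0-9]", "a1b22"))  # ['1', '2', '2']

# A leading "^" negates the set: any character except a, e, i, o, u.
print(re.findall(r"[^aeiou]", "bug"))  # ['b', 'g']

# "-" is literal when it comes last in the set.
print(re.findall(r"[a-c-]", "a-z"))  # ['a', '-']

# "]" is literal when it comes first: "]" or one of a through f.
print(re.findall(r"[]a-f]", "x]be"))  # [']', 'b', 'e']
```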
The following rules may be used to construct regular expressions:
* A one-character regular expression followed by `*' (an
asterisk) is a regular expression that matches zero or
more occurrences of the one-character regular expression.
If there is any choice, the longest leftmost
string that permits a match is chosen.
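Here is a short sketch of `*` on a one-character RE, assuming Python 3's `re` module. For a single-character `*` like this, Python's greedy matching picks the same leftmost, longest string the man page describes. Python is not fully leftmost-longest, however. With alternation it takes the first alternative that succeeds, so `a|ab` on `ab` yields `a`, and results can differ from grep.

```python
import re

# "ab*": an "a" followed by zero or more b's; the longest run is taken.
print(re.search(r"ab*", "xabbbc").group())  # abbb
print(re.search(r"ab*", "xac").group())     # a (zero b's is allowed)

# The leftmost match wins, even if a longer one exists further right.
print(re.search(r"ab*", "ab abbbb").group())  # ab
```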
^ A circumflex or caret (^) at the beginning of an entire
regular expression constrains that regular expression
to match an initial segment of a line.
$ A currency symbol ($) at the end of an entire regular
expression constrains that regular expression to match
a final segment of a line.
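grep applies the anchors to each input line separately. This sketch assumes Python 3's `re` module, where `^` and `$` anchor to the whole string by default. The standard `re.MULTILINE` flag makes them match at every line boundary instead, which mimics grep on multi-line text. The sample lines are made up for illustration.

```python
import re

lines = ["bug report", "debug", "fix the bug"]

# "^bug": "bug" only at the start of a line
print([s for s in lines if re.search(r"^bug", s)])  # ['bug report']

# "bug$": "bug" only at the end of a line
print([s for s in lines if re.search(r"bug$", s)])  # ['debug', 'fix the bug']

# On multi-line text, re.MULTILINE makes ^ and $ work per line, as grep does.
text = "\n".join(lines)
print(re.findall(r"^\w+", text, re.MULTILINE))  # ['bug', 'debug', 'fix']
```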
* A regular expression (not just a one-character
regular expression) followed by `*' (an asterisk)
is a regular expression that matches zero or more
occurrences of that regular expression. If there is
any choice, the longest leftmost string that
permits a match is chosen.
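Applying `*` to a whole parenthesized expression is extended (egrep-style) syntax, and Python's `re` module accepts it directly. This sketch assumes Python 3. Note that in Python the parentheses also capture the matched text. `re.fullmatch` is used so the whole string must match.

```python
import re

# "(ab)*" repeats the whole parenthesized expression.
print(re.fullmatch(r"(ab)*", "ababab") is not None)  # True
print(re.fullmatch(r"(ab)*", "") is not None)        # True (zero occurrences)
print(re.fullmatch(r"(ab)*", "abb") is not None)     # False

# Without parentheses, "*" applies only to the "b".
print(re.fullmatch(r"ab*", "abb") is not None)       # True
```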
+ A regular expression followed by `+' (a plus
sign) is a regular expression that matches
one or more occurrences of that regular expression.
If there is any choice, the longest leftmost string
that permits a match is chosen.
? A regular expression followed by `?' (a question
mark) is a regular expression that matches zero or
one occurrence of that regular expression. If there is
any choice, the longest leftmost string that
permits a match is chosen.
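The `+` and `?` closures can be compared with `*` in Python's `re` module, which this sketch assumes (Python 3). The sample strings are made up for illustration. The second pair of lines shows that the table's `[A-Z][A-Z]*` and `[A-Z]+` find the same runs.

```python
import re

# "+" requires at least one occurrence; "*" also allows zero.
print(re.findall(r"bugs+", "bug bugs bugss"))  # ['bugs', 'bugss']
print(re.findall(r"bugs*", "bug bugs bugss"))  # ['bug', 'bugs', 'bugss']

# "[A-Z]+" is shorthand for the table's "[A-Z][A-Z]*".
print(re.findall(r"[A-Z]+", "NASA and ESA"))       # ['NASA', 'ESA']
print(re.findall(r"[A-Z][A-Z]*", "NASA and ESA"))  # ['NASA', 'ESA']

# "?" makes the preceding expression optional (zero or one occurrence).
words = ["color", "colour", "colouur"]
print([w for w in words if re.fullmatch(r"colou?r", w)])  # ['color', 'colour']
```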
| Alternation: two regular expressions
separated by `|' or NEWLINE match either a
match for the first or a match for the second.
() A regular expression enclosed in parentheses
matches whatever the enclosed regular expression matches.
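Alternation and grouping can be tried with Python's `re` module, assuming Python 3. Python has no equivalent of egrep's NEWLINE-as-alternation. As an assumption of this sketch, a newline-separated list of patterns is joined with `|` by hand, which is safe because `|` has the lowest precedence.

```python
import re

# "|" matches either alternative.
print(re.findall(r"cat|dog", "cats and dogs"))  # ['cat', 'dog']

# Parentheses limit the alternation to part of the pattern:
# "gr(a|e)y" is "gr", then "a" or "e", then "y".
m = re.search(r"gr(a|e)y", "a grey cat")
print(m.group(0), m.group(1))  # grey e

# Assumption: emulate egrep's NEWLINE-separated alternatives by joining with "|".
patterns = "peter\nmary".splitlines()
combined = "|".join(patterns)
print(re.findall(combined, "peter, paul and mary"))  # ['peter', 'mary']
```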
The order of precedence of operators at the same parenthesis
level is `[ ]' (character classes), then `*' `+' `?'
(closures), then concatenation, then `|' (alternation) and NEWLINE.
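The precedence order can be checked with Python's `re` module, which uses the same ordering for these operators. This is a sketch assuming Python 3. The helper `full` is hypothetical and introduced only for illustration; it tests whether a pattern matches an entire string.

```python
import re


def full(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if `pattern` matches the whole of `text`."""
    return re.fullmatch(pattern, text) is not None


# "[ab]c*|d" parses as ([ab](c*))|d: the bracket expression is one unit,
# "*" applies only to the "c" before it, concatenation joins them,
# and "|" is applied last.
print(full(r"[ab]c*|d", "bcc"))  # True
print(full(r"[ab]c*|d", "d"))    # True
print(full(r"[ab]c*|d", "bcd"))  # False

# "x|yz" is x|(yz), not (x|y)z; parentheses are needed for the latter.
print(full(r"x|yz", "xz"))    # False
print(full(r"(x|y)z", "xz"))  # True
```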