Regular Expression Syntax

 

Easy GREP regular expressions are written in the regex syntax of JavaScript and VBScript, which covers most of the commonly used features.

Essential Grammar & Syntax

Key Words

Metacharacters
Key Description Example Explanation
^ Start of a line (or a string*) ^On The keyword "On" must be at the start of a line (or string)
$ End of a line (or a string*) \d$ A digit at the end of a line (or string)
. Any character except a line feed (or any character*) One character, similar to ? in a Windows wildcard
X* Preceding expression occurs zero or more times be* b, be, bee, beee, etc.
X+ Preceding expression occurs one or more times be+ be, bee, beee, etc.
X? Preceding expression occurs zero or one time be? b or be
(XX) Groups characters so that a metacharacter can be applied to the whole group, or so that the group can be referenced in the result (long )+ago long ago, long long ago, long long long ago, etc.
[XX] Defines a class of characters; matches any one character inside. [AD]C AC or DC
| Either side can match. A|DC A or DC
(A|B) Alternation inside a group; A and B can each be of any length. (AC|DC|XYZ) AC or DC or XYZ
[^X] Negated class. Any character except the listed ones. [^a] Any character except the letter "a"
{n} Preceding expression must be repeated exactly n times. 20{3} Only 2000 matches.
{n,m} Preceding expression must be repeated from n to m times. 20{1,3} 20, 200, 2000
? Lazy match: placed after a quantifier, matches only up to the nearest occurrence of the expression that follows .+?> Up to and including the first ">"
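
The metacharacters above can be tried out with a short JavaScript sketch. It assumes a standard JavaScript engine rather than Easy GREP itself, and fullMatch is a hypothetical helper introduced only for illustration; it wraps a pattern in ^(?:...)$ so that the whole input must match.

```javascript
// fullMatch is a hypothetical helper: true only when the pattern matches the whole input.
function fullMatch(pattern, input) {
  return new RegExp("^(?:" + pattern + ")$").test(input);
}

// Quantifiers: *, +, ?, {n}, {n,m}
const star = ["b", "be", "bee"].map((s) => fullMatch("be*", s));   // zero or more "e"
const plus = ["b", "be", "bee"].map((s) => fullMatch("be+", s));   // one or more "e"
const optional = ["b", "be", "bee"].map((s) => fullMatch("be?", s)); // zero or one "e"
const exact = ["200", "2000"].map((s) => fullMatch("20{3}", s));   // "2" then exactly three "0"
const range = ["2", "20", "2000", "20000"].map((s) => fullMatch("20{1,3}", s));

// Groups, classes, and alternation
const group = fullMatch("(long )+ago", "long long ago");
const charClass = ["AC", "DC", "BC"].map((s) => fullMatch("[AD]C", s));
// Without a group, | splits the whole pattern: A|DC means "A" or "DC".
const bareAlternation = ["A", "DC", "AC"].map((s) => fullMatch("A|DC", s));
// With a group, the alternation is limited to the parentheses.
const groupedAlternation = ["AC", "DC"].map((s) => fullMatch("(A|D)C", s));
const negated = ["a", "b"].map((s) => fullMatch("[^a]", s));

// Anchors
const startsWithOn = [/^On/.test("On the road"), /^On/.test("Go On")];
const endsWithDigit = /\d$/.test("room 7");

// Greedy versus lazy: the trailing ? stops at the nearest ">".
const greedy = "<b>bold</b>".match(/.+>/)[0];
const lazy = "<b>bold</b>".match(/.+?>/)[0];

console.log({ star, plus, optional, exact, range, group, charClass,
  bareAlternation, groupedAlternation, negated, startsWithOn, endsWithDigit,
  greedy, lazy });
```

Here greedy holds the whole string "<b>bold</b>", while lazy holds only "<b>".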
Escaped Characters
Key Abbr. of Description Example Explanation
\d Digit means a single digit Such as 1, 2, 3
\b Boundary means a word boundary. (Some other types of RegEx use \< and \> to do the same thing.) \bsome\b Only the word "some" matches; "something" and "handsome" don't match
\r Return means carriage return Line break on classic Mac OS
\n Newline means line feed Line break on Unix, Linux, and modern macOS
\r\n Whole line break Line break on Windows; also found in many files meant for cross-platform use
\w Word includes Latin letters, digits, and _ Such as a, b, c, A, B, C, 1, 2, 3, _
\s Space includes spaces, tabs, and line breaks. =[ \t\r\n\f\v]
\t Tab
\X Negated class An uppercase letter means the negated class of the lowercase one, i.e. the reversed range. \S =[^\s] Any character except \s
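
As a rough illustration of the escaped characters in this table, the following JavaScript sketch applies each one to a small sample string. It assumes a standard JavaScript engine rather than Easy GREP itself, and the sample strings and variable names are made up for the example.

```javascript
// \b: only the standalone word "some" matches, not "handsome" or "something".
const sampleText = "some handsome thing, something 42";
const wholeWords = sampleText.match(/\bsome\b/g);

// \d: one digit per match; \d+ takes a run of digits.
const singleDigits = "a1b22".match(/\d/g);
const digitRuns = "a1b22".match(/\d+/g);

// \w: Latin letters, digits, and underscore; "-" is not a word character.
const wordRuns = "a_1-b".match(/\w+/g);

// \s: spaces, tabs, and line breaks; \S is everything else.
const collapsed = "a \t\nb".replace(/\s+/g, " ");
const nonSpaces = "a b".match(/\S/g);

// Line breaks: try \r\n first so a Windows line break is not split in two.
const lines = "one\r\ntwo\nthree\rfour".split(/\r\n|\r|\n/);

// \t: a tab character.
const columns = "name\tvalue".split(/\t/);

console.log({ wholeWords, singleDigits, digitRuns, wordRuns,
  collapsed, nonSpaces, lines, columns });
```

Putting \r\n before \r and \n in the alternation is a design choice: alternatives are tried left to right, so the two-character line break is consumed as one unit.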
  1. In regular expressions, all escaped characters are case sensitive. The general expression for a class is a backslash followed by the lowercase first letter of the class name, while the same expression with an uppercase letter means the reverse: everything except the class marked by the lowercase one.
  2. Be careful: although it is written with two characters, a backslash followed by a Latin letter represents only one character. So the subject of any following metacharacter is not the preceding letter but the whole backslash-escaped class. For example, \s+ should be interpreted as (\s)+ instead of \(s+)
  3. Not every \x has a corresponding \X: \W means anything except \w, but \N doesn't mean anything except \n
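
The three notes above can be checked with a small JavaScript sketch. It assumes a standard JavaScript engine and patterns without the u flag; the \N result in particular is JavaScript behaviour and may differ in VBScript or in Easy GREP itself.

```javascript
// Note 1: case matters. \s matches a space, \S matches everything except whitespace.
const lowerMatchesSpace = /\s/.test(" ");
const upperMatchesSpace = /\S/.test(" ");

// Note 2: in \s+ the quantifier applies to the whole class \s, not to a letter "s".
const whitespaceRun = "a \t b".match(/\s+/)[0]; // space, tab, space
const matchesLetterS = /\s+/.test("sss");       // no whitespace in "sss"
const sameAsGrouped = "a \t b".match(/(\s)+/)[0] === whitespaceRun;

// Note 3: \W is the reverse of \w ...
const nonWordHyphen = /\W/.test("-");
const nonWordLetter = /\W/.test("a");
// ... but \N is not the reverse of \n. In JavaScript without the u flag,
// \N is treated as the plain letter "N".
const bigNMatchesLetterN = /\N/.test("N");
const bigNMatchesLetterA = /\N/.test("a");

console.log({ lowerMatchesSpace, upperMatchesSpace, whitespaceRun, matchesLetterS,
  sameAsGrouped, nonWordHyphen, nonWordLetter, bigNMatchesLetterN, bigNMatchesLetterA });
```

If \N really meant "anything except \n", it would match the letter "a"; the sketch shows that it does not.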