Regular Expressions: LookAround
Regular expression lookaround assertions check the content immediately before or after the main portion of the regular expression, requiring that it either matches (positive) or does not match (negative). The checked content is not included in the match itself. These can be useful when the desired content is only distinguishable from other content on your page by what consistently surrounds it.
Negative Lookahead | ?! |
Positive Lookahead | ?= |
Negative Lookbehind | ?<! |
Positive Lookbehind | ?<= |
Example:
Observe the list of order numbers below. Notice that some have white space before them, some have a letter at the end of the number, and one has a typo (a stray a before Order).
Order No: 1138a
Order No: 1138b
aOrder No: 1973
Order No: 1973b
For this example, we will use variations of the same regex pattern: Order\sNo:\s\d{4}.
Order\sNo: matches the literal text Order No:, with \s standing in for the whitespace character between the two words.
The remainder of the pattern, \s\d{4}, requires one whitespace character followed by four digits for a match.
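As a rough illustration of the base pattern, here is a minimal sketch using Python's standard re module. The choice of Python and the names orders, base, and base_matches are assumptions made for this example and are not part of the original walkthrough.

```python
import re

# Sample lines copied from the example list above.
orders = [
    "Order No: 1138a",
    "Order No: 1138b",
    "aOrder No: 1973",
    "Order No: 1973b",
]

# The base pattern with no lookaround assertions.
base = re.compile(r"Order\sNo:\s\d{4}")

# search() scans the whole line, so the stray leading "a" does not prevent a match.
base_matches = []
for line in orders:
    m = base.search(line)
    base_matches.append(m.group(0) if m else None)

print(base_matches)
# ['Order No: 1138', 'Order No: 1138', 'Order No: 1973', 'Order No: 1973']
```

Without any lookaround, every line produces a match, which is why the variations that follow add an assertion before or after the pattern.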
For more on Regular Expressions, go to the main Regex page.
Positive Lookahead
To match on all the order numbers followed by a letter, we will use a Positive Lookahead (?=).
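The regular expression Order\sNo:\s\d{4}(?=[a-z]) matches in the first, second, and fourth lines below (Order No: 1138a, Order No: 1138b, and Order No: 1973b). The trailing letter is checked but is not part of the match.
*Note: [a-z] matches any single lowercase letter from a to z. Brackets may contain any set of characters to match on. In this instance, [ab] would have returned the same result.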
Order No: 1138a
Order No: 1138b
aOrder No: 1973
Order No: 1973b
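The positive lookahead can be tried out with a short Python sketch. It assumes the standard re module; the names followed_by_letter, lookahead_hits, and matched_text are hypothetical and chosen only for this illustration.

```python
import re

orders = [
    "Order No: 1138a",
    "Order No: 1138b",
    "aOrder No: 1973",
    "Order No: 1973b",
]

# Positive lookahead: the four digits must be followed by a lowercase letter.
followed_by_letter = re.compile(r"Order\sNo:\s\d{4}(?=[a-z])")

# Keep only the lines in which the pattern finds a match.
lookahead_hits = [line for line in orders if followed_by_letter.search(line)]
print(lookahead_hits)
# ['Order No: 1138a', 'Order No: 1138b', 'Order No: 1973b']

# The letter is checked by the lookahead but is not part of the matched text.
matched_text = followed_by_letter.search("Order No: 1138a").group(0)
print(matched_text)
# Order No: 1138
```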
Negative Lookahead
Conversely, to match only the order numbers without a letter following the digits, a negative lookahead (?!) can be used.
The pattern Order\sNo:\s\d{4}(?![a-z]) matches only in the third line below, aOrder No: 1973, because its digits are not followed by a letter.
Order No: 1138a
Order No: 1138b
aOrder No: 1973
Order No: 1973b
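A similar sketch for the negative lookahead, again assuming Python's re module; no_letter_after and negative_lookahead_hits are illustrative names only.

```python
import re

orders = [
    "Order No: 1138a",
    "Order No: 1138b",
    "aOrder No: 1973",
    "Order No: 1973b",
]

# Negative lookahead: the four digits must NOT be followed by a lowercase letter.
no_letter_after = re.compile(r"Order\sNo:\s\d{4}(?![a-z])")

negative_lookahead_hits = [line for line in orders if no_letter_after.search(line)]
print(negative_lookahead_hits)
# ['aOrder No: 1973']
```

Only the line whose digits end the string survives, since the end of the line is not a letter and so satisfies the negative assertion.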
Lookbehinds work in a similar manner to lookahead assertions. Rather than being placed after the main pattern, as a lookahead is, the lookbehind pattern goes before it and checks the text immediately preceding the match.
Positive Lookbehind
To look behind Order No: for an extra letter (the typo), a positive lookbehind (?<=) can be used.
The pattern (?<=a)Order\sNo:\s\d{4} matches only in the third line below, aOrder No: 1973, where Order is immediately preceded by the letter a.
Order No: 1138a
Order No: 1138b
aOrder No: 1973
Order No: 1973b
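The positive lookbehind can be sketched in Python as well. This assumes the standard re module, which accepts fixed-width lookbehind patterns such as the single letter used here; preceded_by_a, lookbehind_hits, and typo_match are hypothetical names.

```python
import re

orders = [
    "Order No: 1138a",
    "Order No: 1138b",
    "aOrder No: 1973",
    "Order No: 1973b",
]

# Positive lookbehind: "Order" must be immediately preceded by the letter a.
preceded_by_a = re.compile(r"(?<=a)Order\sNo:\s\d{4}")

lookbehind_hits = [line for line in orders if preceded_by_a.search(line)]
print(lookbehind_hits)
# ['aOrder No: 1973']

# The match starts at index 1: the preceding "a" is checked but not consumed.
typo_match = preceded_by_a.search("aOrder No: 1973")
print(typo_match.start(), typo_match.group(0))
# 1 Order No: 1973
```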
Negative Lookbehind
The inverse can be matched by using a negative lookbehind instead of a positive lookbehind.
The pattern (?<!a)Order\sNo:\s\d{4} matches in the first, second, and fourth lines below, where Order is not preceded by the letter a.
Order No: 1138a
Order No: 1138b
aOrder No: 1973
Order No: 1973b
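Finally, a minimal Python sketch of the negative lookbehind, under the same assumptions as before (standard re module, illustrative names not_preceded_by_a and negative_lookbehind_hits).

```python
import re

orders = [
    "Order No: 1138a",
    "Order No: 1138b",
    "aOrder No: 1973",
    "Order No: 1973b",
]

# Negative lookbehind: "Order" must NOT be immediately preceded by the letter a.
not_preceded_by_a = re.compile(r"(?<!a)Order\sNo:\s\d{4}")

negative_lookbehind_hits = [line for line in orders if not_preceded_by_a.search(line)]
print(negative_lookbehind_hits)
# ['Order No: 1138a', 'Order No: 1138b', 'Order No: 1973b']
```

A line that begins with Order also passes, because there is no character before it that could be the letter a.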