Regular expressions

From Algorithmist
Jump to: navigation, search
This is a stub or unfinished. Contribute by editing me.

A regular expression (or regex) is a pattern that is used in matching a string. Regular expressions have the power to match regular languages — the type 3 grammars of the Chomsky hierarchy. They are invaluable in string parsing problems. By convention, a regular expression is enclosed in forward slashes, such as in /(foo|bar)+/. A library that actually determines whether a regular expression matches a string is called a regex engine. Most modern languages have some way of letting you use regular expressions. For example, Perl's regular expression support is built directly into its syntax.


Some of these caveats may not apply to some engines. For example, there may be an engine that uses nongreedy quantifiers by default.

  • By default, quantifiers are greedy. They will attempt to slurp up as much as the string as long as it will still match. You can get nongreedy behavior in some engines by appending ? to the quantifier, as in /foo .*? bar/.
  • The '.' metacharacter will generally fail to match a newline.
  • You may have to escape the grouping/capturing and alternation operators (that is, '(', ')', and '|').
  • Use as few quantifiers as possible. Too many can cause the engine to slow to a crawl. For example, /f*o*o*b*a*r*b*a*z*/ would be a nightmare for any regex engine.
  • Most good engines will detect when no metacharacters are used in a pattern and perform an optimized string searching algorithm such as Boyer-Moore.