Using Regular Expressions
Regular expressions are a powerful tool, but they are also very expensive in terms of memory. Correct and useful functionality is the priority, but the following tips help minimize the impact without reducing capability.
- Consider non-regular expression options. `strings.Contains()`, `strings.Replace()`, and `strings.ReplaceAll()` are dramatically faster and less memory intensive than regular expressions. If one of these will work equally well, use the non-regular expression option.
- Order character classes consistently. We use regular expression caching to reduce our memory footprint. This is more effective if character classes are consistently ordered. Since a character class is a set, order does not affect functionality. We have many equivalent regular expressions that differ only by character class order. Below is the order we recommend for consistency:
  - Numeric range, i.e., digits (e.g., `0-9`)
  - Uppercase alphabetic range (e.g., `A-Z`, `A-F`)
  - Lowercase alphabetic range (e.g., `a-z`, `a-f`)
  - Underscore (`_`)
  - Everything else (except dash, `-`) in ASCII order: `\t\n\r !"#$%&()*+,./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^abcdefghijklmnopqrstuvwxyz{|}~`
  - Last, dash (`-`)

  For example, consider the following expressions, which are equivalent but vary in character class ordering:

  - `[_a-zA-Z0-9-,.]` // wrong ordering
  - `[0-9A-Za-z_,.-]` // correct
  - `[;a-z0-9]` // wrong ordering
  - `[0-9a-z;]` // correct
- Inside character classes, avoid unnecessary character escaping. Go does not complain about extra character escaping, but avoid it so that equivalent expressions are written identically, which improves cache performance. Inside a character class, most characters do not need to be escaped, because Go assumes you mean the literal character.
  - These characters, which normally have special meaning in regular expressions, do not need to be escaped inside character classes: `$`, `(`, `)`, `*`, `+`, `.`, `?`, `^`, `{`, `|`, `}`. Note that a caret (`^`) is literal only when it is not the first character in the class; in first position it negates the class.
  - Dash (`-`), when it is last in the character class or otherwise unambiguously not part of a range, does not need to be escaped. If in doubt, place the dash last in the character class (e.g., `[a-c-]`) or escape the dash (e.g., `\-`).
  - Square brackets (`[`, `]`) should always be escaped in a character class.

  For example, consider the following expressions, which are equivalent but include unnecessary character escapes:
  - `[\$\(\.\?\|]` // unnecessary escapes
  - `[$(.?|]` // correct
  - `[a-z\-0-9_A-Z\.]` // unnecessary escapes, wrong order
  - `[0-9A-Za-z_.-]` // correct
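When reordering a character class or removing escapes, it can help to confirm that the rewritten expression still accepts the same characters. The following is a minimal sketch of one way to do that, not an established tool: `sameASCIIMatches` is a hypothetical helper introduced here for illustration, and it compares only single ASCII characters, so it says nothing about non-ASCII input.

```go
package main

import (
	"fmt"
	"regexp"
)

// sameASCIIMatches is a hypothetical helper: it reports whether two
// character-class patterns accept exactly the same single ASCII characters.
func sameASCIIMatches(a, b string) bool {
	// Anchor both patterns so that each must match exactly one character.
	ra := regexp.MustCompile(`^` + a + `$`)
	rb := regexp.MustCompile(`^` + b + `$`)
	for c := 0; c < 128; c++ {
		s := string(rune(c))
		if ra.MatchString(s) != rb.MatchString(s) {
			return false
		}
	}
	return true
}

func main() {
	// Each pair is an original class and its recommended rewrite,
	// taken from the examples in this section.
	pairs := [][2]string{
		{`[_a-zA-Z0-9-,.]`, `[0-9A-Za-z_,.-]`},
		{`[;a-z0-9]`, `[0-9a-z;]`},
		{`[\$\(\.\?\|]`, `[$(.?|]`},
		{`[a-z\-0-9_A-Z\.]`, `[0-9A-Za-z_.-]`},
	}
	for _, p := range pairs {
		fmt.Printf("%s vs %s: equivalent=%v\n", p[0], p[1], sameASCIIMatches(p[0], p[1]))
	}
}
```

A check like this could run in a unit test alongside a rewrite, so that a change made for consistency does not change what the expression matches.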
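To illustrate the first tip, the sketch below places a regular expression next to its `strings` equivalent for a literal search and a literal replacement. The function names, the search term `error`, and the tab-to-space replacement are all assumptions made for this example; they do not come from any existing code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical patterns that contain no regular expression features,
// which makes them candidates for the strings package instead.
var (
	errorRe = regexp.MustCompile(`error`)
	tabRe   = regexp.MustCompile(`\t`)
)

// containsRegexp searches for a literal substring with a regular expression.
func containsRegexp(s string) bool { return errorRe.MatchString(s) }

// containsPlain performs the same search with strings.Contains.
func containsPlain(s string) bool { return strings.Contains(s, "error") }

// replaceRegexp replaces every tab with a space using a regular expression.
func replaceRegexp(s string) string { return tabRe.ReplaceAllString(s, " ") }

// replacePlain performs the same replacement with strings.ReplaceAll.
func replacePlain(s string) string { return strings.ReplaceAll(s, "\t", " ") }

func main() {
	line := "fatal\terror: disk full"
	fmt.Println(containsRegexp(line) == containsPlain(line))
	fmt.Println(replaceRegexp(line) == replacePlain(line))
}
```

When the pattern is a fixed string like these, the two forms behave the same, so the `strings` version is the one to prefer.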