Using regular expressions is security-sensitive and has led to vulnerabilities in the past.
Regular expressions are subject to different kinds of vulnerabilities.
First, evaluating regular expressions against input strings can be an extremely CPU-intensive task. Specially crafted regular expressions
such as (a+)+ can take several seconds to reject the input string aaaaaaaaaaaaaaaaaaaaaaaaaaaaa! when the whole string must match. The problem is that
with every additional a character added to the input, the time required to evaluate the regex doubles. However, the equivalent regular
expression a+ (without grouping) is evaluated in milliseconds and scales linearly with the input size.
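The growth described above can be observed with a small timing sketch. It assumes Python's built-in re module (a backtracking engine) and uses re.fullmatch so that the trailing ! forces the engine to exhaust every way of splitting the run of a characters. The helper name time_fullmatch and the input sizes are illustrative choices, kept small so the script finishes quickly; absolute timings will vary by machine.

```python
import re
import time


def time_fullmatch(pattern, text):
    """Return (elapsed_seconds, matched) for one whole-string match attempt."""
    start = time.perf_counter()
    result = re.fullmatch(pattern, text)
    elapsed = time.perf_counter() - start
    return elapsed, result is not None


# Hypothetical input sizes; larger values get slow very quickly for (a+)+.
for n in (12, 14, 16, 18):
    text = "a" * n + "!"
    nested_time, _ = time_fullmatch(r"(a+)+", text)  # nested quantifier
    flat_time, _ = time_fullmatch(r"a+", text)       # equivalent, no grouping
    print(f"n={n:2d}  (a+)+: {nested_time:.5f}s  a+: {flat_time:.6f}s")
```

On a backtracking engine the first column should grow by roughly a constant factor for each extra pair of a characters, while the second stays close to flat.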
Evaluating user-provided strings as regular expressions opens the door to Regular expression Denial of Service (ReDoS) attacks. In the context of a web application, attackers can force the web server to spend all of its resources evaluating regular expressions, thereby making the service inaccessible to legitimate users.
Another type of vulnerability can occur when regular expressions are used to validate user input. A regular expression can be used to filter unsafe
input by either matching a whole input when it is valid (example: the whole string should only contain alphanumeric characters) or by detecting
dangerous parts of an input. In both cases it is possible to let dangerous values through. For example, searching for <script> tags
in some HTML code with the regular expression .*<script>.* will miss <script id="test">.
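The miss described above can be reproduced in a few lines. The sketch below assumes Python's re module. The NAIVE pattern is the one from the text, while BROADER and the helper flags are hypothetical names introduced only for illustration. BROADER also tolerates attributes, but it is not meant as a complete HTML filter.

```python
import re

# Pattern from the text: only matches the exact literal "<script>".
NAIVE = re.compile(r".*<script>.*")

# Illustrative assumption: allow attributes after the tag name.
# This is still not a robust way to sanitize HTML.
BROADER = re.compile(r"<script\b[^>]*>", re.IGNORECASE)


def flags(pattern, html):
    """Return True if `pattern` finds something suspicious in `html`."""
    return pattern.search(html) is not None


plain = "<p>hi</p><script>alert(1)</script>"
with_attr = '<p>hi</p><script id="test">alert(1)</script>'

print(flags(NAIVE, plain))        # True
print(flags(NAIVE, with_attr))    # False: the dangerous tag slips through
print(flags(BROADER, with_attr))  # True
```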
This rule flags any regular expression execution or compilation for review.
You may be at risk if you answered yes to any of those questions.
Avoid executing a user input string as a regular expression. If this is required, restrict the allowed regular expressions.
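When the user only needs to search for literal text, one way to avoid executing their input as a pattern is to escape it first. The sketch below uses re.escape from Python's standard library. The function name find_literal is a hypothetical helper, not part of any API.

```python
import re


def find_literal(user_text, document):
    """Find occurrences of `user_text` in `document`, treated as plain text."""
    # re.escape neutralizes metacharacters, so "(a+)+" is searched literally
    # instead of being compiled as a nested quantifier.
    safe_pattern = re.escape(user_text)
    return re.findall(safe_pattern, document)


print(find_literal("(a+)+", "the pattern (a+)+ is slow on aaaa!"))
```

If users genuinely need pattern syntax, escaping is not enough, and the set of accepted constructs has to be restricted in some other way.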
Check whether your regular expression engine (the algorithm executing your regular expression) has any known vulnerabilities. Search for vulnerability reports mentioning the engine you are using.
Test your regular expressions with techniques such as equivalence partitioning and boundary value analysis, and test for robustness. Avoid
complex regular expressions, as they are difficult to understand and test. Note that some regular expression engines will match only part of the
input if no anchors are used. In PHP, for example, preg_match("/[A-Za-z0-9]+/", $text) will accept any string containing at least one
alphanumeric character because it has no anchors.
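The same pitfall exists in Python: re.search matches anywhere in the string and re.match anchors only at the start, while re.fullmatch requires the whole input to match. The following sketch mirrors the PHP example. The function names are hypothetical and chosen only for illustration.

```python
import re

ALNUM = r"[A-Za-z0-9]+"


def is_alnum_unanchored(text):
    """Behaves like the PHP example: one alphanumeric character is enough."""
    return re.search(ALNUM, text) is not None


def is_alnum_whole(text):
    """Accepts only strings made up entirely of alphanumeric characters."""
    return re.fullmatch(ALNUM, text) is not None


suspicious = "abc; DROP TABLE users"
print(is_alnum_unanchored(suspicious))  # True: partial match accepted
print(is_alnum_whole(suspicious))       # False
print(is_alnum_whole("abc123"))         # True
```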
Django
from django.core.validators import RegexValidator
from django.urls import re_path
def build_validator(regex):
    RegexValidator(regex)  # Questionable

RegexValidator('(a*)*')  # Questionable

def define_http_endpoint(path, view):
    re_path(path, view)  # Questionable
re module
import re
from re import compile, match, search, fullmatch, split, findall, finditer, sub, subn
input = 'input string'
replacement = 'replacement'
re.compile # Questionable
re.match # Questionable
re.search # Questionable
re.fullmatch # Questionable
re.split # Questionable
re.findall # Questionable
re.finditer # Questionable
re.sub # Questionable
re.subn # Questionable
compile # Questionable
match # Questionable
search # Questionable
fullmatch # Questionable
split # Questionable
findall # Questionable
finditer # Questionable
sub # Questionable
subn # Questionable
def dynamic_pattern(pattern):
    re.compile(pattern)  # Questionable
    re.match(pattern, input)  # Questionable
    re.search(pattern, input)  # Questionable
    re.fullmatch(pattern, input)  # Questionable
    re.split(pattern, input)  # Questionable
    re.findall(pattern, input)  # Questionable
    re.finditer(pattern, input)  # Questionable
    re.sub(pattern, replacement, input)  # Questionable
    re.subn(pattern, replacement, input)  # Questionable
regex module
import regex
from regex import compile, match, search, fullmatch, split, findall, finditer, sub, subn, subf, subfn, splititer
input = 'input string'
replacement = 'replacement'
regex.subf # Questionable
regex.subfn # Questionable
regex.splititer # Questionable
subf # Questionable
subfn # Questionable
splititer # Questionable
def dynamic_pattern(pattern):
    regex.subf(pattern, replacement, input)  # Questionable
    regex.subfn(pattern, replacement, input)  # Questionable
    regex.splititer(pattern, input)  # Questionable
regex.compile # Questionable
regex.match # Questionable
regex.search # Questionable
regex.fullmatch # Questionable
regex.split # Questionable
regex.findall # Questionable
regex.finditer # Questionable
regex.sub # Questionable
regex.subn # Questionable
compile # Questionable
match # Questionable
search # Questionable
fullmatch # Questionable
split # Questionable
findall # Questionable
finditer # Questionable
sub # Questionable
subn # Questionable
def dynamic_pattern(pattern):
    regex.compile(pattern)  # Questionable
    regex.match(pattern, input)  # Questionable
    regex.search(pattern, input)  # Questionable
    regex.fullmatch(pattern, input)  # Questionable
    regex.split(pattern, input)  # Questionable
    regex.findall(pattern, input)  # Questionable
    regex.finditer(pattern, input)  # Questionable
    regex.sub(pattern, replacement, input)  # Questionable
    regex.subn(pattern, replacement, input)  # Questionable