Using regular expressions is security-sensitive. It has led in the past to the following vulnerabilities:

Regular expressions are subject to several kinds of vulnerabilities.

First, evaluating a regular expression against an input string can be extremely CPU-intensive. In a backtracking engine, a specially crafted expression such as (a+)+ can take several seconds to reject the input string aaaaaaaaaaaaaaaaaaaaaaaaaaaaa! when the whole input is required to match. The problem is that each additional a character in the input roughly doubles the time needed to evaluate the regex. By contrast, the equivalent expression a+ (without the nested group) is evaluated in milliseconds and scales linearly with the input size.
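In PHP, the PCRE engine behind preg_match caps the amount of backtracking (see the pcre.backtrack_limit setting), so a runaway match may surface as a false return value rather than as a long hang. The sketch below shows one way to tell such a failure apart from an ordinary non-match. The helper name match_or_null is hypothetical, and the exact outcome for the nested pattern depends on the PCRE version and JIT settings, so treat this as an illustration only.

```php
<?php
// Hypothetical helper: returns true/false for match/no match, or null when
// the engine gave up (for example after hitting pcre.backtrack_limit).
function match_or_null(string $pattern, string $subject): ?bool
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        // preg_last_error() reports why the engine stopped,
        // e.g. PREG_BACKTRACK_LIMIT_ERROR.
        return null;
    }
    return $result === 1;
}

// A run of "a" characters followed by a character that forces the match to fail.
$subject = str_repeat('a', 29) . '!';

// Nested quantifier anchored to the whole input: prone to catastrophic backtracking.
$nested = match_or_null('/^(a+)+$/', $subject);

// Equivalent pattern without the nested group: fails fast.
$flat = match_or_null('/^a+$/', $subject);

var_dump($nested); // NULL or bool(false), depending on engine limits
var_dump($flat);   // bool(false)
```

A null result should be handled as an error (reject the input), not as "no match", since the engine never finished evaluating the expression.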

Evaluating user-provided strings as regular expressions opens the door to Regular expression Denial of Service (ReDoS) attacks. In a web application, attackers can force the server to spend all of its resources evaluating regular expressions, making the service inaccessible to genuine users.

Another type of vulnerability can occur when regular expressions are used to validate user input. A regular expression can be used to filter unsafe input by either matching a whole input when it is valid (example: the whole string should only contain alphanumeric characters) or by detecting dangerous parts of an input. In both cases it is possible to let dangerous values through. For example, searching for <script> tags in some HTML code with the regular expression .*<script>.* will miss <script id="test">.
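The filter described above can be reproduced in a few lines. This is a minimal sketch using the pattern from the text; the variable names and sample inputs are made up for illustration.

```php
<?php
// Naive filter taken from the text above: looks for a literal "<script>" tag.
$filter = '/.*<script>.*/';

$plain = '<script>alert(1)</script>';
$withAttribute = '<script id="test">alert(1)</script>';

var_dump(preg_match($filter, $plain));         // int(1): detected
var_dump(preg_match($filter, $withAttribute)); // int(0): slips through
```

The second input is just as dangerous as the first, yet the filter reports nothing, because the pattern only recognizes the tag in its exact attribute-free form.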

This rule flags any regular expression execution, which means that an issue will be created whenever one of the following functions is called:

Note that the ereg* functions were removed in PHP 7, and that the end-of-life date of PHP 5 is January 1, 2019. Using PHP 5 after this date is dangerous because it will no longer receive security fixes.

This rule's goal is to guide security code reviews.

Ask Yourself Whether

You may be at risk if you answered yes to any of these questions.

Recommended Secure Coding Practices

Avoid executing user-provided strings as regular expressions, or at least use preg_quote to escape regular expression metacharacters.
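As a rough illustration of the preg_quote advice, the sketch below treats user input as literal text inside a pattern. The helper name contains_literal and the sample strings are hypothetical.

```php
<?php
// Hypothetical helper: searches $haystack for $term as literal text.
function contains_literal(string $haystack, string $term): bool
{
    // preg_quote escapes regex metacharacters; the second argument also
    // escapes the delimiter used in the pattern below.
    $pattern = '/' . preg_quote($term, '/') . '/';
    return preg_match($pattern, $haystack) === 1;
}

// Would be a dangerous pattern if it were executed unescaped.
$userInput = '(a+)+';

var_dump(preg_quote($userInput, '/'));                                // string(9) "\(a\+\)\+"
var_dump(contains_literal('the pattern (a+)+ is slow', $userInput)); // bool(true)
var_dump(contains_literal('aaaaaaaa', $userInput));                  // bool(false)
```

When only a plain substring search is needed, a string function such as strpos (or str_contains in PHP 8) avoids the regular expression engine entirely.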

Check whether your regular expression engine (the algorithm executing your regular expression) has any known vulnerabilities. Search for vulnerability reports mentioning the engine you are using.

Test your regular expressions with techniques such as equivalence partitioning and boundary value analysis, and test them for robustness. Avoid complex regular expressions, as they are difficult to understand and test. Note that some regular expression engines will match only part of the input if no anchors are used. In PHP, for example, preg_match("/[A-Za-z0-9]+/", $text) will accept any string containing at least one alphanumeric character because the pattern has no anchors.
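The effect of anchors can be checked with a few boundary inputs. The sketch below compares the unanchored pattern from the text with two anchored variants; the variable names and test strings are chosen for illustration. It also shows a PCRE detail worth testing for: by default, $ matches before a trailing newline as well, whereas \z matches only at the very end of the string.

```php
<?php
$unanchored = '/[A-Za-z0-9]+/';       // pattern from the text above
$anchored   = '/^[A-Za-z0-9]+$/';     // whole-string match, but tolerant of a final newline
$strict     = '/\A[A-Za-z0-9]+\z/';   // whole-string match, nothing else allowed

var_dump(preg_match($unanchored, '<b>x</b>')); // int(1): partial match is accepted
var_dump(preg_match($anchored, '<b>x</b>'));   // int(0)
var_dump(preg_match($anchored, "abc\n"));      // int(1): $ also matches before a final newline
var_dump(preg_match($strict, "abc\n"));        // int(0)
var_dump(preg_match($strict, 'abc123'));       // int(1)
```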

Exceptions

An issue will be created for the functions mb_ereg_search_pos, mb_ereg_search_regs and mb_ereg_search if and only if at least the first argument, i.e. the $pattern, is provided.

See