Using regular expressions is security-sensitive, and it has led to vulnerabilities in the past.

Regular expressions are subject to different kinds of vulnerabilities.

First, evaluating regular expressions against input strings can be an extremely CPU-intensive task. A specially crafted regular expression such as (a+)+ can take several seconds to evaluate against the input string aaaaaaaaaaaaaaaaaaaaaaaaaaaaa!. The problem is that with every additional a character added to the input, the time required to evaluate the regex doubles. However, the equivalent regular expression a+ (without grouping) is evaluated in milliseconds and scales linearly with the input size.
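The growth described above can be observed with a small timing sketch. The class name, helper names and input sizes below are illustrative assumptions; the sizes are deliberately small so the demo finishes quickly. Absolute timings depend on the machine and on the regular expression engine, and some runtimes may mitigate this particular pattern, so no expected output is shown.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and input sizes are illustrative assumptions.
class BacktrackingDemo {

  // Returns the time in nanoseconds taken to evaluate `regex` against `input`.
  static long timeMatchNanos(String regex, String input) {
    Pattern pattern = Pattern.compile(regex);
    long start = System.nanoTime();
    pattern.matcher(input).matches();
    return System.nanoTime() - start;
  }

  // Builds `count` copies of 'a' followed by '!' so that the match must fail.
  static String nonMatchingInput(int count) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < count; i++) {
      sb.append('a');
    }
    return sb.append('!').toString();
  }

  public static void main(String[] args) {
    // Compare the nested quantifier with the flat one on growing inputs.
    for (int count = 14; count <= 22; count += 2) {
      String input = nonMatchingInput(count);
      long nested = timeMatchNanos("(a+)+", input);
      long flat = timeMatchNanos("a+", input);
      System.out.printf("n=%d nested=%d ns flat=%d ns%n", count, nested, flat);
    }
  }
}
```

If the engine backtracks naively, the "nested" column should grow much faster than the "flat" column as n increases.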

Evaluating user-provided strings as regular expressions opens the door to Regular expression Denial of Service (ReDoS) attacks. In the context of a web application, attackers can force the web server to spend all of its resources evaluating regular expressions, thereby making the service inaccessible to genuine users.
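One possible way to bound the cost of a single evaluation, sketched here as an assumption rather than a recommendation from this rule, is to wrap the input in a CharSequence that aborts once a time budget is exhausted. The class names, the exception type and the budget parameter are all hypothetical. The sketch relies on java.util.regex reading the input through charAt.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the class names and the time budget are assumptions.
class TimeBoxedRegex {

  // CharSequence view that aborts matching once the deadline has passed.
  static final class DeadlineCharSequence implements CharSequence {
    private final CharSequence inner;
    private final long deadlineNanos;

    DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
      this.inner = inner;
      this.deadlineNanos = deadlineNanos;
    }

    @Override
    public char charAt(int index) {
      // The matcher reads characters through charAt, so this check runs often.
      if (System.nanoTime() - deadlineNanos > 0) {
        throw new IllegalStateException("regex evaluation exceeded its time budget");
      }
      return inner.charAt(index);
    }

    @Override
    public int length() {
      return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
      return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
    }

    @Override
    public String toString() {
      return inner.toString();
    }
  }

  // Matches the whole input, throwing IllegalStateException if the budget runs out.
  static boolean matchesWithin(Pattern pattern, CharSequence input, long budgetMillis) {
    long deadline = System.nanoTime() + budgetMillis * 1_000_000L;
    return pattern.matcher(new DeadlineCharSequence(input, deadline)).matches();
  }
}
```

A caller would catch the IllegalStateException and reject the request. This only limits the damage of a slow evaluation; it does not make an unsafe expression safe.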

Another type of vulnerability can occur when regular expressions are used to validate user input. A regular expression can be used to filter unsafe input by either matching a whole input when it is valid (example: the whole string should only contain alphanumeric characters) or by detecting dangerous parts of an input. In both cases it is possible to let dangerous values through. For example, searching for <script> tags in some HTML code with the regular expression .*<script>.* will miss <script id="test">.
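The <script> example above can be reproduced with a few lines of Java. The class and method names are hypothetical; the pattern is the one quoted in the text.

```java
import java.util.regex.Pattern;

// Hypothetical demo class showing how the naive filter lets a value through.
class ScriptFilterDemo {
  // The pattern quoted in the text: it only recognizes the exact tag "<script>".
  static final Pattern NAIVE_FILTER = Pattern.compile(".*<script>.*");

  // Returns true when the naive filter detects a script tag in the input.
  static boolean looksDangerous(String html) {
    return NAIVE_FILTER.matcher(html).matches();
  }

  public static void main(String[] args) {
    // prints "true": the plain tag is detected
    System.out.println(looksDangerous("<p><script>alert(1)</script></p>"));
    // prints "false": the tag with an attribute slips through
    System.out.println(looksDangerous("<p><script id=\"test\">alert(1)</script></p>"));
  }
}
```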

This rule flags any regular expression execution or compilation for review.

Ask Yourself Whether

You may be at risk if you answered yes to any of these questions.

Recommended Secure Coding Practices

Avoid executing a user input string as a regular expression. If this is required, restrict the allowed regular expressions.
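When the user only needs to search for literal text, one way to avoid executing their input as an expression is to quote it first. The sketch below assumes that use case; the class and method names are hypothetical, while Pattern.quote is part of java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical helper: searches for user-supplied text without interpreting it.
class LiteralSearch {
  // Pattern.quote turns the user's text into a literal, so metacharacters
  // such as ( ) + lose their special meaning.
  static boolean containsLiteral(String haystack, String userText) {
    Pattern literal = Pattern.compile(Pattern.quote(userText));
    return literal.matcher(haystack).find();
  }

  public static void main(String[] args) {
    // prints "true": the characters "(a+)+" appear verbatim in the text
    System.out.println(containsLiteral("price is (a+)+ today", "(a+)+"));
    // prints "false": the input is not evaluated as a quantified group
    System.out.println(containsLiteral("aaaa", "(a+)+"));
  }
}
```

If users genuinely need pattern syntax, quoting is not an option, and the set of allowed expressions has to be restricted by other means.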

Check whether your regular expression engine (the algorithm executing your regular expression) has any known vulnerabilities. Search for vulnerability reports mentioning the engine you are using.

Test your regular expressions with techniques such as equivalence partitioning and boundary value analysis, and test them for robustness. Avoid complex regular expressions, as they are difficult to understand and test. Note that some regular expression engines will match only part of the input if no anchors are used. In PHP, for example, preg_match("/[A-Za-z0-9]+/", $text) will accept any string containing at least one alphanumeric character because it has no anchors.
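The same partial-match pitfall exists in Java, depending on which Matcher method is called. In the sketch below (class and method names are hypothetical), find() behaves like the unanchored PHP call, while matches() requires the whole input to match.

```java
import java.util.regex.Pattern;

// Hypothetical demo class contrasting partial and whole-input matching.
class AnchorDemo {
  static final Pattern ALNUM = Pattern.compile("[A-Za-z0-9]+");

  // find() succeeds as soon as any part of the input matches.
  static boolean partialMatch(String text) {
    return ALNUM.matcher(text).find();
  }

  // matches() succeeds only if the entire input matches.
  static boolean fullMatch(String text) {
    return ALNUM.matcher(text).matches();
  }

  public static void main(String[] args) {
    String suspicious = "<script>alert(1)</script>";
    System.out.println(partialMatch(suspicious)); // prints "true"
    System.out.println(fullMatch(suspicious));    // prints "false"
    System.out.println(fullMatch("abc123"));      // prints "true"
  }
}
```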

Questionable Code Example

import java.util.regex.Pattern;

class BasePattern {
  String regex; // a regular expression
  String input; // a user input

  void foo(CharSequence htmlString) {
    input.matches(regex);  // Questionable
    Pattern.compile(regex);  // Questionable
    Pattern.compile(regex, Pattern.CASE_INSENSITIVE);  // Questionable

    String replacement = "test";
    input.replaceAll(regex, replacement);  // Questionable
    input.replaceFirst(regex, replacement);  // Questionable

    if (!Pattern.matches(".*<script>.*", htmlString)) { // Questionable, even if the pattern is hard-coded
    }
  }
}

This also applies to bean validation, where a regexp can be specified:

import java.io.Serializable;
import javax.validation.constraints.Pattern;
import javax.validation.constraints.Email;
import org.hibernate.validator.constraints.URL;

class BeansRegex implements Serializable {
  @Pattern(regexp=".+@.+")  // Questionable
  private String email;

  @Email(regexp=".+@.+")  // Questionable
  private String email2;

  @URL(regexp=".*") // Questionable
  private String url;
  // ...
}

Exceptions

Calls to java.util.regex.Pattern.matcher(...), java.util.regex.Pattern.split(...) and all methods of java.util.regex.Matcher are not highlighted as the pattern compilation is already highlighted.

Calls to String.split(regex) and String.split(regex, limit) will not raise an issue despite their use of a regular expression. These methods are most often used to split on a single character, which does not create any vulnerability.
