Using regular expressions is security-sensitive and has led to vulnerabilities in the past.
Regular Expressions are subject to different kinds of vulnerabilities.
First, evaluating regular expressions against input strings can be an extremely CPU-intensive task. In a backtracking engine, a specially
crafted regular expression such as (a+)+ can take several seconds to evaluate the input string aaaaaaaaaaaaaaaaaaaaaaaaaaaaa!. The problem is that
with every additional a character added to the input, the time required to evaluate the regex doubles. However, the equivalent regular
expression a+ (without grouping) is evaluated in milliseconds and scales linearly with the input size.
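The growth described above can be observed with a small timing harness. This is only a sketch: the class name BacktrackingDemo and its helper methods are hypothetical, the input sizes are kept small so the run stays short, and the measured numbers depend on the machine and on the regex engine version (some runtimes optimise this particular pattern, so the growth may be less dramatic than the text suggests).

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the rule itself.
final class BacktrackingDemo {

    // Returns the wall-clock time, in nanoseconds, of one full-match attempt.
    static long timeMatchNanos(Pattern pattern, String input) {
        long start = System.nanoTime();
        pattern.matcher(input).matches();
        return System.nanoTime() - start;
    }

    // Builds n 'a' characters followed by '!', which forces the match to fail.
    static String attackInput(int n) {
        StringBuilder sb = new StringBuilder(n + 1);
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.append('!').toString();
    }

    public static void main(String[] args) {
        Pattern nested = Pattern.compile("(a+)+"); // nested quantifier
        Pattern flat = Pattern.compile("a+");      // equivalent, no grouping
        for (int n = 14; n <= 22; n += 2) {
            String input = attackInput(n);
            System.out.printf("n=%d nested=%d us flat=%d us%n", n,
                    timeMatchNanos(nested, input) / 1_000,
                    timeMatchNanos(flat, input) / 1_000);
        }
    }
}
```

Both patterns reject every generated input; only the time spent reaching that answer differs.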
Evaluating user-provided strings as regular expressions opens the door to Regular expression Denial of Service (ReDoS) attacks. In the context of a web application, attackers can force the web server to spend all of its resources evaluating regular expressions, thereby making the service inaccessible to genuine users.
Another type of vulnerability can occur when regular expressions are used to validate user input. A regular expression can be used to filter unsafe
input by either matching a whole input when it is valid (example: the whole string should only contain alphanumeric characters) or by detecting
dangerous parts of an input. In both cases it is possible to let dangerous values through. For example, searching for <script> tags
in some HTML code with the regular expression .*<script>.* will miss <script id="test">.
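The filtering gap mentioned above is easy to reproduce. The following sketch uses a hypothetical class ScriptTagFilterDemo with a hypothetical looksDangerous method; it applies the exact pattern from the text and shows two inputs that slip through.

```java
import java.util.regex.Pattern;

// Hypothetical demo class illustrating the weak filter from the text.
final class ScriptTagFilterDemo {

    // The pattern from the text: it only recognises the exact literal "<script>".
    static final Pattern NAIVE = Pattern.compile(".*<script>.*");

    static boolean looksDangerous(String html) {
        return NAIVE.matcher(html).matches();
    }

    public static void main(String[] args) {
        // Detected: the tag appears exactly as written in the pattern.
        System.out.println(looksDangerous("<p><script>alert(1)</script></p>"));             // true
        // Missed: the tag carries an attribute.
        System.out.println(looksDangerous("<p><script id=\"test\">alert(1)</script></p>")); // false
        // Missed: the tag is written in upper case.
        System.out.println(looksDangerous("<SCRIPT>alert(1)</SCRIPT>"));                    // false
    }
}
```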
This rule flags any regular expression execution or compilation for review.
You may be at risk if you answered yes to any of those questions.
Avoid executing a user input string as a regular expression. If this is required, restrict the allowed regular expressions.
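When the user only needs to search for a piece of text, one way to avoid executing their input as a regular expression is to quote it so that it is matched literally. The sketch below relies on the standard Pattern.quote method; the class LiteralSearch and its methods are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper: searches for user text without interpreting it as a regex.
final class LiteralSearch {

    // Quotes the user-supplied text so that metacharacters such as '(' or '+'
    // lose their special meaning and are matched literally.
    static Pattern literalPattern(String userText) {
        return Pattern.compile(Pattern.quote(userText));
    }

    static boolean containsLiteral(String haystack, String userText) {
        return literalPattern(userText).matcher(haystack).find();
    }

    public static void main(String[] args) {
        // The user text is found only where it occurs character for character.
        System.out.println(containsLiteral("price (a+)+ tag", "(a+)+")); // true
        System.out.println(containsLiteral("aaaa", "(a+)+"));            // false
    }
}
```

For a plain substring test, String.contains avoids the regex engine entirely; quoting is mainly useful when the literal has to be embedded in a larger, server-defined pattern.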
Check whether your regular expression engine (the algorithm executing your regular expression) has any known vulnerabilities. Search for vulnerability reports mentioning the engine you are using.
Test your regular expressions with techniques such as equivalence partitioning and boundary value analysis, and test for robustness. Avoid
complex regular expressions, as they are difficult to understand and test. Note that some regular expression engines will match only part of the
input if no anchors are used. In PHP, for example, preg_match("/[A-Za-z0-9]+/", $text) will accept any string containing at least one
alphanumeric character because it has no anchors.
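The same pitfall exists in Java when Matcher.find() is used instead of Matcher.matches(). The sketch below, with a hypothetical class AnchorDemo, contrasts an unanchored search with a whole-input match and with an explicitly anchored pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo class contrasting partial and whole-input matching.
final class AnchorDemo {

    static final Pattern ALNUM = Pattern.compile("[A-Za-z0-9]+");
    // \A and \z pin the match to the very start and very end of the input.
    // \z is used rather than $ because $ also matches just before a trailing
    // line terminator.
    static final Pattern ALNUM_ANCHORED = Pattern.compile("\\A[A-Za-z0-9]+\\z");

    // Unanchored search: true if any alphanumeric run occurs somewhere.
    static boolean partial(String s) {
        return ALNUM.matcher(s).find();
    }

    // matches() requires the whole input to match the pattern.
    static boolean whole(String s) {
        return ALNUM.matcher(s).matches();
    }

    // Explicit anchors make find() behave like a whole-input check.
    static boolean anchoredFind(String s) {
        return ALNUM_ANCHORED.matcher(s).find();
    }

    public static void main(String[] args) {
        String suspicious = "abc; DROP TABLE users";
        System.out.println(partial(suspicious));      // true
        System.out.println(whole(suspicious));        // false
        System.out.println(anchoredFind(suspicious)); // false
        System.out.println(anchoredFind("abc123"));   // true
    }
}
```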
import java.util.regex.Pattern;

class BasePattern {
    String regex; // a regular expression
    String input; // a user input

    void foo(CharSequence htmlString) {
        input.matches(regex); // Questionable
        Pattern.compile(regex); // Questionable
        Pattern.compile(regex, Pattern.CASE_INSENSITIVE); // Questionable

        String replacement = "test";
        input.replaceAll(regex, replacement); // Questionable
        input.replaceFirst(regex, replacement); // Questionable

        if (!Pattern.matches(".*<script>.*", htmlString)) { // Questionable, even if the pattern is hard-coded
        }
    }
}
This also applies to bean validation, where a regexp can be specified:
import java.io.Serializable;
import javax.validation.constraints.Pattern;
import javax.validation.constraints.Email;
import org.hibernate.validator.constraints.URL;

class BeansRegex implements Serializable {
    @Pattern(regexp=".+@.+") // Questionable
    private String email;

    @Email(regexp=".+@.+") // Questionable
    private String email2;

    @URL(regexp=".*") // Questionable
    private String url;
    // ...
}
Calls to java.util.regex.Pattern.matcher(...), java.util.regex.Pattern.split(...) and all methods of
java.util.regex.Matcher are not highlighted as the pattern compilation is already highlighted.
Calls to String.split(regex) and String.split(regex, limit) will not raise an issue despite their use of a regular
expression. These methods are used most of the time to split on a single character, which doesn't create any vulnerability.