
Are you also validating a JavaScript URL using RegEx?

What do you think of the following JavaScript URL validation function code? Are you accidentally adding security issues while trying to build a feature?

The Snyk blog features an article, Secure JavaScript URL validation, about security considerations and secure best practices for handling URLs in JavaScript.

I shared the following code snippet on Twitter to see what folks would make of it and whether anyone would call out security issues:

function checkUrlIsValid(string) {
    let givenURL;
    try {
        givenURL = new URL(string);
    } catch (error) {
        console.log("error is", error);
        return false;
    }
    return true;
}

The replies varied widely and included some interesting perspectives and potential security vectors that folks might not be aware of, which we will cover in this article. What stood out, though, were the replies suggesting regular expressions (RegEx) to validate a URL.

Using a RegEx to perform validation isn’t new; in fact, it is an approach developers often reach for when they need string matching or string manipulation. Even the popular validator npm package uses RegEx to validate data formats in strings. But is it the right approach? What sort of security concerns does RegEx expose us to? Let’s find out.

Regular Expression Denial of Service

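Due to how some RegEx engines work, they can be vulnerable to a type of attack called Regular Expression Denial of Service (ReDoS). Backtracking engines, including the one JavaScript uses, can be pushed by certain patterns and inputs into trying a number of match paths that grows exponentially with the input length. This behavior is known as catastrophic backtracking.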

The fact that escapes most people when dealing with regular expressions is that evaluating them is CPU-bound work.

In browsers and in Node.js, JavaScript code runs on a single-threaded main event loop, so this can be disastrous. A ReDoS attack can cause a Node.js process to halt completely and stop responding to any HTTP requests.
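To make catastrophic backtracking concrete, here is a minimal sketch that times a well-known textbook pattern with nested quantifiers. The pattern `/^(a+)+$/` and the helper name `timeMatch` are illustrative assumptions, not taken from any real library. The input lengths are deliberately small so the script finishes quickly, and the actual timings will depend on your machine and engine version.

```javascript
// Textbook example of a pattern prone to catastrophic backtracking:
// the nested quantifiers give the engine many ways to split a run of "a"s.
const evilPattern = /^(a+)+$/;

// Hypothetical helper: time a single match attempt against an input
// made of `length` "a" characters followed by one character that
// forces the overall match to fail.
function timeMatch(length) {
  const input = 'a'.repeat(length) + '!';
  const start = process.hrtime.bigint();
  const matched = evilPattern.test(input);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { length, matched, elapsedMs };
}

// Each extra pair of characters should take noticeably longer on a
// backtracking engine, while the event loop is blocked the whole time.
for (const length of [16, 18, 20, 22]) {
  const { matched, elapsedMs } = timeMatch(length);
  console.log(`length=${length} matched=${matched} took ${elapsedMs.toFixed(1)}ms`);
}
```

While each `test` call runs, nothing else on the event loop can make progress, which is why a long enough input can stall a whole server.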

Consider the following function that uses a RegEx to validate a URL:

function checkUrlIsValidFast (string) {

    var ip = '(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])(?:\\.(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])){3}';
    var protocol = '(?:http(s?)\:\/\/)?';
    var auth = '(?:\\S+(?::\\S*)?@)?';
    var host = '(?:(?:[a-z\\u00a1-\\uffff0-9_]-*)*[a-z\\u00a1-\\uffff0-9]+)';
    var domain = '(?:\\.(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)*';
    var tld = '(?:\\.(?:[a-z\\u00a1-\\uffff]{2,}))\\.?';
    var port = '(?::\\d{2,5})?';
    var path = '(?:[/?#][^\\s"]*)?';
    var regex = '(?:' + protocol + '|www\\.)' + auth + '(?:localhost|' + ip + '|' + host + domain + tld + ')' + port + path;
  
    return new RegExp(regex, 'ig').test(string);
  }

  console.log(checkUrlIsValidFast('https://example.com'))
  // returns true

The regular expression validation above looks great, right?

Well, it’s not. It’s vulnerable to a ReDoS attack. Let’s see how. What if the attacker sends the following string as input for a URL:

console.log(checkUrlIsValidFast('018137.113.215.4074.138.129.172220.179.206.94180.213.144.175250.45.147.1364868726sgdm6nohQ'))
// returns true
// but after like a million years.
// good luck ;-)

Your npm package validator is vulnerable to ReDoS

My personal advice, when I’m asked how to handle RegEx in situations where you need to validate a string, is to avoid it completely if you can and use lower-level string manipulation functions instead.

The reason is that RegEx is a very powerful tool, but it’s also very complicated and can be very hard to get right. If you want some supporting evidence, I can offer at least two:

If smart maintainers, many collaborators, and talented developers employed by Fortune 500 public companies can’t get RegEx right, how can we expect the average developer to do so?
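As a sketch of the advice above to avoid RegEx where possible, the following validates a URL using only the built-in `URL` parser and a plain lookup. The helper name `isHttpUrl` and the protocol allowlist are my own assumptions about a sensible check for inputs that should be web links. They are not part of the snippet shared on Twitter, and your application may need a different set of protocols.

```javascript
// Hypothetical allowlist: only absolute http(s) URLs are accepted.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:']);

// Hypothetical helper: validate without any regular expression.
function isHttpUrl(input) {
  if (typeof input !== 'string') {
    return false;
  }
  let parsed;
  try {
    // The URL constructor throws a TypeError for input it cannot parse.
    parsed = new URL(input);
  } catch (error) {
    return false;
  }
  // Parsing alone is not enough: 'javascript:alert(1)' parses fine,
  // so the protocol is checked explicitly.
  return ALLOWED_PROTOCOLS.has(parsed.protocol);
}

console.log(isHttpUrl('https://example.com')); // true
console.log(isHttpUrl('javascript:alert(1)')); // false
console.log(isHttpUrl('not a url'));           // false
```

The trade-off is that parsing is delegated to the runtime instead of a hand-written pattern, so there is no pattern of your own that can backtrack.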

What else to worry about when validating URLs?

Other security aspects to consider when validating URLs:

Getting better at RegEx

As I have mentioned in a follow-up tweet to the discussion about JavaScript URL validation:

for most of us, unless you are @TheDavisJam who practically wrote the book on regular expression denial of service and who is familiar with internal regex state machine engines.

I’ve put together a few resources that will be helpful to better understand ReDoS, its impact and how to avoid it: