Skip to content
This repository has been archived by the owner on Dec 16, 2019. It is now read-only.

Username portion causes failure if > 16 characters #4

Open
jaydiablo opened this issue Jul 8, 2016 · 10 comments
Open

Username portion causes failure if > 16 characters #4

jaydiablo opened this issue Jul 8, 2016 · 10 comments
Assignees
Labels

Comments

@jaydiablo
Copy link
Contributor

I've looked over the regex and can't seem to figure out why this would happen.

We had a case where php-url-validator was rejecting a URL because the username was too long.

For example, if I put this URL in the test suite, it fails:

https://abcefghijklmnopst:dklsfjkdsfjkldsfdsdsfdsffsdfsafsdafdsafdsafdsasfdsfdsfsdb@api.twilio.com

However, if I reduce the username by 1 character, it passes:

https://abcefghijklmnops:dklsfjkdsfjkldsfdsdsfdsffsdfsafsdafdsafdsafdsasfdsfdsfsdb@api.twilio.com

Any idea why this might be happening? (Notice that the password field can be any length, i.e., this still fails)

https://abcefghijklmnopst:[email protected]

but this passes:

https://abcefghijklmnops:[email protected]

I tried changing the named parameter in the regex to something other than "username", just in case, but it still fails if over 16 characters.

@jaydiablo
Copy link
Contributor Author

My colleague @aripringle was doing some additional testing on this and he discovered that the preg_match call that the URL validator is doing is actually returning false for the URLs that have a username component longer than 16 characters rather than 0.

preg_last_error() shows "2", which apparently is this constant: PREG_BACKTRACK_LIMIT_ERROR.

He was able to make the URL pass validation if he bumped the backtrack limit as high as 5,000,000:

ini_set('pcre.backtrack_limit', 5000000); 1,000,000 wasn't enough, which apparently is the default (https://secure.php.net/manual/en/pcre.configuration.php#ini.pcre.backtrack-limit)

@Fleshgrinder
Copy link
Owner

This is the result of the current .+ rule for both username and password. Limiting it to the exact range of characters that are allowed in there (unreserved + percentage encoded + colon) would overcome the catastrophic backtracking issue. The thing is, a modern URL validator should not even parse the username and password and simply treat it as user info. Splitting of the user info into parts is up to the software. This is because submitting such information within a URL is a security problem.

In other words, username and password should be removed in favor of user info and the group should be:

(?'user_info'(?:[a-z0-9\-\._~:]|%[a-f0-9][a-f0-9])++@)?

This avoids the backtracking issue and complies better with the actual RFC. If you want to keep support for direct splitting, do the following:

(?:(?'username'(?:[a-z0-9\-\._~]|%[a-f0-9][a-f0-9])++):(?'password'(?:[a-z0-9\-\._~:]|%[a-f0-9][a-f0-9])++)@)?

@jaydiablo
Copy link
Contributor Author

jaydiablo commented Jul 11, 2016

I tried both of those approaches and still encounter the backtrack limit error (at the default limit of 1,000,000).

@Fleshgrinder
Copy link
Owner

I will investigate further but I'm on vacation right now and will be back home next week. I'll get back to you then.

@jaydiablo
Copy link
Contributor Author

No worries, enjoy your vacation!

I've been messing with it a little bit, no progress yet though.

@Fleshgrinder
Copy link
Owner

An atomic group should solve it:

            (?>
                (?'username'[^@]+)
                (?::(?'password'[^@]+))?
            @)?

I have a very tight schedule these days and not much time. You might want to consider using another, better maintained package instead (e.g. thephpleague/uri although a bit over engineered).

@Fleshgrinder Fleshgrinder self-assigned this Aug 1, 2016
@jaydiablo
Copy link
Contributor Author

Unfortunately that change doesn't seem to affect the backtracking behavior, I still get the backtrack limit exceeded error on this url:

https://abcefghijklmnopst:[email protected]

I can certainly look at the League URI package. I'm curious how it performs against the URLs in your test suite.

@Fleshgrinder
Copy link
Owner

Making sure that the atomic group only matches once will help:

            (?>
                (?'username'[^@]+)
                (?::(?'password'[^@]+))?
            @)??

I will overhaul the library a bit and release a new version soon.

@jaydiablo
Copy link
Contributor Author

Ah, nice! That fixes it.

It doesn't seem necessary for the tests to pass, but would it affect performance/efficiency at all to exclude a colon from the username portion as well?:

            (?>
                (?\'username\'[^@:]+)
                (?::(?\'password\'[^@]+))?
            @)??

@Fleshgrinder
Copy link
Owner

Fleshgrinder commented Oct 30, 2016

The RegEx is not very performant nor efficient and the library should not distinguish between username and password but treat it all simply as user info (RFC 3986).

This is what I have now for the next major of this lib:

~^
    (?:([a-z][a-z\d+.-]+):)?
    (?://
        (?:([^@]+)@)?
        (
            (?:xn--[[:alnum:]-]+|(?!..--)[[:alnum:]\x{00a1}-\x{ffff}-]+(?<!-))
            (?:\.(?>xn--[[:alnum:]-]+|(?!..--)[[:alnum:]\x{00a1}-\x{ffff}-]+)(?<!-))*
        |
            \[[\.:\da-f]+\]
        )
        (?::(\d+))?
    )?
    (/(?!/)[^?\#\s[:cntrl:]]*)?
    (?:\?([^?\#\s[:cntrl:]]+))?
    (?:\#([^\s[:cntrl:]]+))?
$~Dixu

It's not fully tested yet but much closer to RFC 3986 and really fast. It parser your https://abcefghijklmnopst:[email protected] in 66 steps in less than a millisecond. The old one takes 427 steps with the fix I provided and more than 4 milliseconds.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

2 participants