
bdehamer
Contributor

Closes #745

Expands the URL regex used in validate_download_location to allow for a userinfo specifier to appear before the hostname. Previously, a value like git+ssh://git@git.myproject.org/MyProject.git would fail validation due to the inclusion of the git@ portion of the URL.

In the URI spec (RFC 2396), the userinfo portion of the server component is defined as follows:

userinfo = *( unreserved | escaped | ";" | ":" | "&" | "=" | "+" | "$" | "," )
unreserved = alphanum | mark
alphanum = alpha | digit
alpha = lowalpha | upalpha
lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
upalpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
escaped = "%" hex hex
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f"

I translated this into the following regex:

[\w\-.!~*'()%;:&=+$,]+

and then placed it as an optional prefix to the hostname (with a trailing @ character):

([\w\-.!~*'()%;:&=+$,]+@)?
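
For reviewers who want to try the optional userinfo group in isolation, here is a minimal sketch. The `HOST` pattern, the `URL_SKETCH` name, and the `matches` helper are hypothetical simplifications for illustration; they are not the actual regex used in validate_download_location, and only the `git+ssh` scheme is covered.

```python
import re

# Optional userinfo group exactly as proposed in this PR.
USERINFO = r"([\w\-.!~*'()%;:&=+$,]+@)?"

# Hypothetical, simplified hostname pattern (an assumption for this sketch,
# NOT the pattern used by the real validator).
HOST = r"[A-Za-z0-9.\-]+"

# Hypothetical stand-in for the full download-location regex.
URL_SKETCH = re.compile(r"^git\+ssh://" + USERINFO + HOST + r"/\S+$")


def matches(url: str) -> bool:
    """Return True if the URL fits the simplified sketch pattern."""
    return URL_SKETCH.match(url) is not None


print(matches("git+ssh://git@git.myproject.org/MyProject.git"))  # with userinfo
print(matches("git+ssh://git.myproject.org/MyProject.git"))      # without userinfo
```

Two caveats on the character class: in Python 3, `\w` also matches non-ASCII word characters unless `re.ASCII` is set, and `%` is accepted on its own without checking that two hex digits follow, so the class is slightly more permissive than the RFC grammar.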
Signed-off-by: Brian DeHamer <bdehamer@github.com>
Collaborator

@armintaenzertng armintaenzertng left a comment

LGTM, thanks for the addition! :)

@armintaenzertng armintaenzertng merged commit 1ecc6f6 into spdx:main Aug 17, 2023
@bdehamer bdehamer deleted the bdehamer/uri-validator branch September 15, 2023 16:52