Skip to content

Conversation

@davidhewitt
Copy link
Contributor

@davidhewitt davidhewitt commented Nov 3, 2025

Change Summary

This carries out the partial revert described in #1829 (comment), moving encode_credentials behaviour to a keyword argument, defaulting False for backwards compatbility with pre-2.12.1 behaviour.

EDIT: while implementing this, the points of discussion below made it clear the semantics need further work. Here we now fully revert the URL credential encoding in Pydantic 2.12. We plan to introduce this option in Pydantic 2.13 when we have had time to design it properly.

Note: while implementing this (and the test) I observed how broken the URL parsing can be when encoding is not done:

  • We should probably implement encoding for the other components.
    • Maybe the argument should be percent_encode: bool and affect all components? I can rework this PR and extend tests if so.
  • The URL parsing is extremely broken when components are not escaped properly. I wonder if we should raise an error in these build methods if components are not properly escaped and percent_encode is False?
    • URL validation cannot have a percent_encode option because the component boundaries are not known.

Related issue number

Closes pydantic/pydantic#12457
Closes pydantic/pydantic#12434

Checklist

  • Unit tests for the changes exist
  • Documentation reflects the changes where applicable
  • Pydantic tests pass with this pydantic-core (except for expected changes)
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers
@davidhewitt
Copy link
Contributor Author

davidhewitt commented Nov 3, 2025

@Viicos I'd love your opinions on my points of discussion above.

@codspeed-hq
Copy link

codspeed-hq bot commented Nov 3, 2025

CodSpeed Performance Report

Merging #1882 will not alter performance

Comparing dh/optional-encode-credentials (8e4ba66) with main (2e932c6)

Summary

✅ 163 untouched

@Viicos
Copy link
Member

Viicos commented Nov 4, 2025

  • We should probably implement encoding for the other components.

    • Maybe the argument should be percent_encode: bool and affect all components? I can rework this PR and extend tests if so.

I've read through RFC3986 in more detail:

  • It seems we should apply percent-encoding on every component. For the scheme however, it shouldn't be necessary as only a specific set of characters is allowed (and the url crate seems to be validating it). For the host component, it is not clear to me. RFC3986 states:

    The reg-name syntax allows percent-encoded octets in order to
    represent non-ASCII registered names in a uniform way that is
    independent of the underlying name resolution technology. Non-ASCII
    characters must first be encoded according to UTF-8 [STD63], and then
    each octet of the corresponding UTF-8 sequence must be percent-
    encoded to be represented as URI characters. URI producing
    applications must not use percent-encoding in host unless it is used
    to represent a UTF-8 character sequence.

    So the logic is probably different?

  • The URL parsing is extremely broken when components are not escaped properly. I wonder if we should raise an error in these build methods if components are not properly escaped and percent_encode is False?

    • URL validation cannot have a percent_encode option because the component boundaries are not known.

We might raise an error, but not until V3. And yes for the url validation the parsing should just assume components were properly encoded first, and it will do best effort to provide meaningful validation errors (but at least should error on every invalid case).

@davidhewitt
Copy link
Contributor Author

We might raise an error, but not until V3. And yes for the url validation the parsing should just assume components were properly encoded first, and it will do best effort to provide meaningful validation errors (but at least should error on every invalid case).

I wouldn't think we need to wait until v3 to raise errors from .build methods for bad data.

The point is that in the .build methods we can error when the validation of the URL might be successful but not actually build the URL you think. e.g. @ in middle of password will currently cut the password in two; the bit before is treated as password and the bit after treated as the host.

So calling .host on the "valid" built URL would give you something different to the host you passed to .build() - this is a legitimate bug in my view (and it's exactly the issue which got us in this mess).

@davidhewitt
Copy link
Contributor Author

I have now made this PR disable the behaviour again while we figure out correct semantics for pydantic 2.13.

@davidhewitt davidhewitt marked this pull request as ready for review November 4, 2025 10:13
@davidhewitt davidhewitt changed the title make url credential encoding optional revert url credential encoding (to be reintroduced as an option in future) Nov 4, 2025
@davidhewitt davidhewitt requested a review from Viicos November 4, 2025 10:18
@davidhewitt davidhewitt merged commit 4d23017 into main Nov 4, 2025
32 checks passed
@davidhewitt davidhewitt deleted the dh/optional-encode-credentials branch November 4, 2025 10:48
davidhewitt added a commit to pydantic/pydantic that referenced this pull request Nov 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants