- Notifications
You must be signed in to change notification settings - Fork 252
refactor: Use a common from_value method #472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
refactor: Use a common from_value method #472
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
6ca707b to 1c8ed59 Compare Explicit `ValueKind` mapping instead of implicitly inferred `Value` type. Matches the `format/json5.rs` logic. Signed-off-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
DRY: The `json`, `json5` and `toml` parsers all leverage `serde` and can share a common enum to deserialize data into, instead of individual methods performing roughly the same transformations. - While `ron` improves their support for serde untagged enums with `v0.9`, it is still not compatible with this approach (_Their README details why_). - The `yaml` support doesn't leverage `serde`, thus is not compatible. `from_parsed_value()` is based on the approached used by `format/json5.rs`: - It has been adjusted to reflect the `ValueKind` enum, which could not directly be used due to the `Table` and `Array` types using `Value` as their value storage type instead of self-referencing the enum. - Very similar to a `impl From`, but supports the complimentary `uri` parameter for each `Value` derived. Signed-off-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
- The enum did not properly handle the `Datetime` TOML value which needed to be converted into a `String` type. - This workaround approach avoids duplicating `from_parsed_value()` logic to support one enum variant. Signed-off-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
The enum is only intended as a helper for this deserializer into `String` type, it can be bundled inside. Likewise, no need to `impl Display`. Signed-off-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
If no other variant could be deserialized into successfully, the type is not supported and treated as the `Nil` type. This better communicates failures within tests when a type is compared and is not expected to be `Nil`. Signed-off-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
1c8ed59 to cfabdba Compare - `Char` variant needed to be converted to `String`. - `Option` variant could be introduced generically. NOTE: With `v0.9` of Ron, more types are introduced. Without tests, if these do not deserialize into a supported type they will be treated as `Nil` type. Signed-off-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
68df817 to c82013e Compare - Required switching to `serde_yaml`. - Like with `yaml-rust`, requires special handling for table keys. Signed-off-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
c82013e to d4f2f35 Compare | This PR has been updated to also include support for formats Ron and Yaml.
Ron:
Yaml:
During the integration of Ron and Yaml into this PR (like with earlier TOML test for datetime), both failed running their tests due to the more implicit Resolving requires either:
It's not too different from the prior approach, these types were in the formats own For TOML with the Which IIRC I had read was due to how
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some inline context if it assists with review.
If it's easier to follow/digest, I've staged out changes into scoped commits. Each commit has an associated commit message describing the changes 👍
| let key = match key { | ||
| serde_yaml::Value::Number(k) => Some(k.to_string()), | ||
| serde_yaml::Value::String(k) => Some(k), | ||
| _ => None, | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally I tried to use:
let key = serde_yaml::from_value::<String>(k).ok();However this resulted in a "stack overflow" according to my notes at the time.
IIRC this failure wasn't restricted to the untagged enum approach here, and the explicit match is also required for the existing approach used (including for the map variant in the previous yaml-rust crate).
I also have this failure in the notes that was caused by the keys string being mismatched (eg: instead of the string expected_key, a similar string like unexpected_key) was not caught well by the tests failure output:
Test failure output
running 7 tests test test_error_parse ... ok test test_override_uppercase_value_for_struct ... FAILED test test_override_uppercase_value_for_enums ... FAILED test test_override_lowercase_value_for_struct ... FAILED test test_file ... FAILED test test_yaml_parsing_key ... FAILED test test_override_lowercase_value_for_enums ... FAILED failures: ---- test_override_uppercase_value_for_struct stdout ---- thread 'test_override_uppercase_value_for_struct' panicked at tests/file_yaml.rs:151:66: called `Result::unwrap()` on an `Err` value: missing field `bar` ---- test_override_uppercase_value_for_enums stdout ---- thread 'test_override_uppercase_value_for_enums' panicked at tests/file_yaml.rs:203:54: called `Result::unwrap()` on an `Err` value: value of enum EnumSettings should be represented by either string or table with exactly one key ---- test_override_lowercase_value_for_struct stdout ---- thread 'test_override_lowercase_value_for_struct' panicked at tests/file_yaml.rs:186:56: called `Result::unwrap()` on an `Err` value: missing field `foo` ---- test_file stdout ---- thread 'test_file' panicked at tests/file_yaml.rs:43:43: called `Result::unwrap()` on an `Err` value: missing field `debug` ---- test_yaml_parsing_key stdout ---- thread 'test_yaml_parsing_key' panicked at tests/file_yaml.rs:115:10: called `Result::unwrap()` on an `Err` value: missing field `inner_string` note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace ---- test_override_lowercase_value_for_enums stdout ---- thread 'test_override_lowercase_value_for_enums' panicked at tests/file_yaml.rs:221:54: called `Result::unwrap()` on an `Err` value: value of enum EnumSettings should be represented by either string or table with exactly one key | // Option to Result: | ||
| match (key, value) { | ||
| (Some(k), Some(v)) => Ok((k, v)), | ||
| _ => Err(serde::de::Error::custom("should not be serialized to Map")), | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit verbose, but easier to grok?
An alternative with .zip().ok_or_else() was considered, including here if you prefer it.
When I completed my first iteration and was after some feedback for better handling this .map() call having mixed error return types (serde_yaml + de::Error::custom), it was advised:
-
Have the
key+valueconversions convert theirResulttoOption. -
Then if either
keyorvaluewasNoneuse the desired serde deserialization error here (whichdeserialize_anyuses to understand this variant was unsuccessful and to try deserializing the next variant of theParsedValueenum). -
They had refactored to use separate functions for
keyandvalue, instead of nested inline within themap()closure.While instead of this
matchthey usedzip()on the tuple:let key = key_to_string(k); let value = val_to_parsed_value(v); key.zip(value).ok_or_else(|| serde::de::Error::custom("should not be serialized to Map") )
Which accomplishes the same by creating an iterator on the single value of key and value:
- Produces a tuple (
a.zip(b)) ofSome(key, value)until one of the iterators encounters aNone. - Return the option as a
Result(ok_or_else()).
| match ParsedMap::deserialize(deserializer)? { | ||
| ParsedMap::Table(v) => Ok(v), | ||
| #[cfg(feature = "yaml")] | ||
| ParsedMap::YamlMap(table) => { | ||
| table | ||
| .into_iter() | ||
| .map(|(key, value)| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .map() will produce a collection of Result<String, ParsedValue> items, and .collect() will transform that into a single Result<T, E> (where T would be Map<String, ParsedValue>).
An implicit feature of Rust, no explicit transform of each Result to a collection required.
I wasn't sure if it'd short-circuit on first error encountered, but due to the return type with Result for collect() it appears this is supported via FromIterator trait, thus doesn't require try_collect().
| [dependencies] | ||
| lazy_static = "1.0" | ||
| serde = "1.0.8" | ||
| serde_with = "3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Required for the Nil variant fallback (scoped to it's own commit) if deserialize_any fails to match anything prior.
The crate could be avoided if you wanted to implement the equivalent directly 🤷♂️
Any format that supports a `char` type would now be matched for conversion to the expected `String` type. Signed-off-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
| @matthiasbeyer this is ready for review, I know you're quite busy so here's a TLDR:
|
| For custom format support, the As it's all focused on using Serde with untagged enum, perhaps it shouldn't live in I also need to verify an assumption that other number types like I will also share the other formats I integrated locally, if desired they can be contributed via follow-up PR. |
| So from what I read, we want this in. |
I still need to address my concerns from my prior comment here. I will return to After that's sorted locally, I'll rebase as requested and it'd be great to see this get merged :) Regarding custom format support, how would you approach verifying that? Mostly so that you can catch that |
| Note to self, since this PR was raised the proposed yaml alternative (serde compatible) has been archived and a different yaml crate has been introduced instead, so this PR will need to drop that support (unfortunate, since the serde approach was preferrable I think). |
| // NOTE: Order of variants is important. Serde will use whichever | ||
| // the input successfully deserializes into first. | ||
| #[derive(serde::Deserialize, Debug)] | ||
| #[serde(untagged)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
serde(untagged) is a performance footgun in situations like this (easily noticeable in benchmarks when I made the same mistake in toml::Value).
We'd likely want to use serde-untagged.
Yes, json5 did this already but I care more if it spreads to all formats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
serde(untagged)is a performance footgun in situations like this (easily noticeable in benchmarks when I made the same mistake intoml::Value).
Ah, I think when I contributed this I favored it for being more convenient maintenance wise and thinking of it from an application config viewpoint where you'd usually not have so much config to parse that it'd be a perf concern? 😅
We'd likely want to use
serde-untagged.
Glancing over at the crate docs, I assume this means losing the convenience of serde(untagged) and requires a bit more explicit mapping to each variant?
I don't have a problem with that, but it was nice to have for simple mappings of types to their variant.
Yes, json5 did this already but I care more if it spreads to all formats
This wasn't clear to me at first where you were referencing as I thought you meant upstream json5 crate (that only uses serde(untagged) in their tests) 😅 but I see you meant the config-rs format implementation for json5 uses serde(untagged) (and is removed via this PR too) 👍 (EDIT: I see it's also the first thing my PR description mentioned, I forgot as it's been a while 😓)
So as a bonus perhaps, this would address your concern about the json5 format being deferred to this common serde handler for supported formats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless there is a significant problem we run into, I stand by using serde-untaggedas I don't want the other formats to regress by adding this performance hit.
I don't have a problem with that, but it was nice to have for simple mappings of types to their variant.
I tried to push for serde_derive to have attributes to force type-variant mappings to avoid the performance overhead of untagged but the maintainers weren't interested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I stand by using
serde-untaggedas I don't want the other formats to regress by adding this performance hit.
That's fair.
It could be useful as an alternative format helper to implement additional formats, which although might not be used by any official format, if a user could extend config-rs support separately without requiring upstreaming of a format into the crate?
I'll look into adapting to serde-untagged when I can find the time, but if anyone else wants to tackle that it'd be appreciated 😅
I tried to push for
serde_deriveto have attributes to force type-variant mappings to avoid the performance overhead ofuntaggedbut the maintainers weren't interested.
That's unfortunate, sounds like a very reasonable request 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be useful as an alternative format helper to implement additional formats, which although might not be used by any official format, if a user could extend config-rs support separately without requiring upstreaming of a format into the crate?
They can with https://docs.rs/config/latest/config/trait.Format.html
They won't get our extension-agnostic loader but that isn't always necessary and can easily be implemented on its own.
| // Equivalent to ValueKind, except Table + Array store the same enum | ||
| // Useful for serde to serialize values into, then convert to Value. | ||
| // NOTE: Order of variants is important. Serde will use whichever | ||
| // the input successfully deserializes into first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this just replaces our existing type we are deserializing to, I'll give this a pass. Overall, I'd rather we deserialize directly into our needed type.
That would either require
- Putting a dummy value in for the origin
- Building up my
serde-basepathidea I've been toying with, see Designing for layered configs clap-rs/clap#2763 (reply in thread)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this just replaces our existing type we are deserializing to, I'll give this a pass. Overall, I'd rather we deserialize directly into our needed type.
I believe that I wanted to deserialize directly too, but I think the comment here highlighted the reasoning that did not happen was due to config::Value being a struct with a value: ValueKind field while config::ValueKind enum has the Table + Array variants reference config::Value instead of Self to self-reference the enum, which complicated adapting to that?
If you look at what this PR does, it effectively replaces the format specific methods (like from_yaml_value() / from_toml_value()) with a single from_parsed_value() method instead.
Most of those format from_*_value() methods (that are serde reliant) took in a deserialized format value type as a parameter, delegating away error handling involved (except partially for Ron and the non-serde Yaml format).
In both cases the before/after approach of these methods are only really performing the task mapping to config::Value. That logic doesn't belong in value.rs when it's format/type specific though? You can see at the end of format.rs in this PR how the additional serde deserializers are added for format specific support concerns, with types/crates not really relevant to bundle into value.rs?
Hopefully that clarifies why ParsedValue was introduced as an intermediary convenience for the deserializing + mapping step. With from_parsed_value() it effectively consolidates the associated formats it can support, making this logic in config-rs much easier to grok and maintain?
Last I recall the various from_*_value() implementations were (or at least at some point when I was contributing) not particularly consistent in style/quality despite having roughly the same logic. One could tidy that up, or just make it DRY (which this PR favored), such that any new serde compatible formats would be much simpler to add/support, and thus maintain without as much boilerplate prone to diverging.
Existing (and new) serde compatible formats would have this common call going forward:
let value = format::from_parsed_value(uri, serde_json::from_str(text)?);If format.rs can be adapted to make needing ParsedValue redundant (by adapting the same improvement to config::Value / ValueKind directly), that'd be great too! This PR was just focused on that initial step to take a DRY approach without being too invasive of a change to the codebase 😅
That said some of the PR value was lost since as the switch to serde-yaml didn't age well. It would have been nice to shift the added error handling there to serde.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting a dummy value in for the origin
NOTE: I used uri in my responses since it's frequently the variable name used instead of origin and was what came to mind when writing out my thoughts 😅
from_parsed_value() (like how other format methods before it did) provides the uri value to use when mapping from a ParsedValue to a config::Value (with it's associated uri + ValueKind variant).
Adding a placeholder uri (effectively None?) would then need a step that updates the uri for each Value (along with it's nested Value items depending on type), and I'm not sure how you'd like to approach that in the context of a method like from_parsed_value() where the uri is already given?
ParsedValue enum leveraged the serde untagged convenience to map the formats own Value type, but to have that done for config::Value instead just to add the uri would be verbose of an implementation?
I understand if using an intermediary this way instead of skipping that may seem as a codesmell, and I think I may have originally intended to look into how I could improve that after this PR was resolved. Partly because of a lack of familiarity with the project as a whole and not wanting to introduce breaking changes or deal with extra churn / considerations that may cause vs this reduced scope that was easier for me to reason about and get through review 😅
As such, I still think the current approach with this PR using the intermediary enum would be valid and keeps the PR focused on the goal of unifying the serde compatible formats to a common parse method.
Refactoring that to then make ParsedValue redundant and work directly with config::Value + ValueKind seems better suited to a follow-up PR, and would be more pleasant in commit/merge history as context for these distinct changes to any interested party, as opposed to munged together (but that could just be more perspective as I tend to encourage more focused PRs with all commits squashed via the PR merge strategy).
All the uri value represents is the source, so for these formats the config::Value returned and any nested value produced, that uri is the same throughout.
As such it's not that meaningful during the format mapping to config::Value, but collections of config::Value need each item annotated with that same uri value/ref, so that merging/layering has that traceability of source.
If that were something better handled to avoid the need for ValueKind::Array and ValueKind::Table to require config::Value items just to track the associated uri, then this whole thing could be simplified as desired 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Building up my
serde-basepathidea I've been toying with
I glanced through the linked ref but that still didn't click for me sorry.
In the current state of this PR, I could have from_parsed_value() assign None as the uri (in lieu of a serde-untagged implementation effectively transforming to Value::new(uri, kind) for deserializing to config::ValueKind on the Array/Table variants, or kind.into() if deserializing straight to config::Value?), and then I could call a separate helper to update all the uriorigin to the desired one before returning the final config::Value result?
// In value.rs: impl Value { pub fn set_origin_recursively(&mut self, origin: Option<&String>) { self.origin = origin.cloned(); // If kind is a collection, update the `origin` of their `Value`s too: match self.kind { ValueKind::Array(ref mut collection) => { for v in collection.iter_mut() { v.set_origin_recursively(origin) } } ValueKind::Table(ref mut collection) => { for (_, v) in collection.iter_mut() { v.set_origin_recursively(origin) } } // This is presumably unacceptable for handling all other variants // of `ValueKind` that don't contain nested `Value`s, but short of listing // all remaining variants here I'm not sure how you'd want to approach it _ => {}, }; } } // format.rs: // Nothing to map if already deserialized into Value: pub fn from_parsed_value(uri: Option<&String>, mut value: Value) -> Value { value.set_origin_recursively(uri); value } // Alternatively for ValueKind (implies Array/Table variants already have `v.origin = None`): pub fn from_parsed_value(uri: Option<&String>, value: ValueKind) -> Value { let mut v = Value::from(value); v.set_origin_recursively(uri); v }Both would make from_parsed_value() largely redundant then 😅
The above set_origin_recursively() utility method could be added via a separate PR in the meantime? (origin isn't public, so updating the origin after it's set isn't supported I think?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose with the more verbose addition of serde-untagged, handling the Array + Table types of ValueKind would be a non-issue as .map/.seq calls would could leverage .into() similar to how you could construct ValueKind like this:
// ValueKind::Table(T) storing `Vec<T> => ValueKind::Array(T) => Value` let map_of_vecs: ValueKind = HashMap::<String, Value>::from([ ("abc".to_string(), vec!["a", "b", "c"].into()), ("xyz".to_string(), vec!["x", "y", "z"].into()), ]).into(); // ValueKind::Table(T) => Value let v = Value::from(map_of_vecs);Value::from will provide the None placeholder for origin and the added recursive method can patch that afterwards 👍
I think these two collection variants of ValueKind were only problematic with serde(untagged) when paired with the from_parsed_value() recursively calling itself to map the enum value, since they were not wrapping Self 😅
Other than that, there's the Option value type from Ron which I'm not sure how you'd approach that with serde-untagged, for serde(untagged) with ParsedValue the recursive call through from_parsed_value() worked with the type Option<Box<Self>>.
Likewise, while serde-untagged and formats like Ron support char, ValueKind lacks it and casts to ValueKind::String. A similar concern exists for Toml support when parsing it's date value as there is no equivalent in ValueKind it is mapped to the string variant too. Both cases were handled in this PR with a separate ParsedString enum delegated to serde(untagged), all wrapped in a custom deserializer method for the ParsedValue::String variant and the deserialize_with annotation (that I've not looked into if that's compatible with serde-untagged, I assume it is?).
So a few gaps on how to proceed there with serde-untagged, but I think it might still be better deferred to a follow-up PR replacing serde(untagged). It's diff might then be more straight-forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the verbosity, limited time to condense all my comments but I'll try 💪
The gist of my response is as follows:
ParsedValue+from_parsed_value()unifies the equivalent code it replaces. Deserializing directly toconfig::Valuewould be a separate improvement, this PR just consolidates / streamlines existing mapping. It might not be ideal, but I think it's still worthwhile?- I don't think performance is the deciding factor for choosing
config-rsover the competition. I'm open to adoptingserde-untaggedhere if someone could fill the gaps in proceeding, but it might be better deferred to a separate PR at a later date?
ParsedValue is not replacing ValueKind
As this just replaces our existing type we are deserializing to, I'll give this a pass.
The PR DRYs up existing format implementations (which are inconsistent in style/approach for blocks that accomplish the same task).
Assuming that your comment was referring to the ValueKind type being replaced (but I could be mistaken and you actually meant each individual format value type):
ParsedValuesubstitutes each formats own Value type, whilefrom_parsed_value()unifies the existing equivalent format methods mapping of a format specificValuetype toconfig::ValueKind(and subsequent conversion toconfig::Value).- The
from_*_value()format methods tend to be recursively called for mapping to the finalconfig::Valuetype (optionally with nestedconfig::ValuewhenValueKindis a collection type) and associating the fixedoriginsource to each mappedconfig::Value. ValueKindcontains collection variants that store items ofconfig::ValuenotSelf, which is problematic for recursive calls offrom_*_value(). You would instead need to deserialize directly intoconfig::ValueAFAIK and then recursively updateorigin.
This transition of existing formats to a common ParsedValue enum + from_parsed_value() method is AFAIK an improvement for maintainability and anyone referencing the format support to contribute new formats.
When I authored this PR I was able to easily add a few other formats without copy/pasting boilerplate of one of the existing implementations. That seems worthwhile?
serde-untagged / performance
serde(untagged)is a performance footgun in situations like this (easily noticeable in benchmarks when I made the same mistake intoml::Value).We'd likely want to use
serde-untagged.
Personally that may not be as digestible of a change within this PR?
- The
serde(untagged)vsserde-untaggedconcern on performance is valid, but IMO is better served as a follow-up PR to this one if it will result in additions akin to this support forconfig::Value(and this much more verbose related serde support). - No one has tackled the concern this PR addresses since I raised it, so I'm not sure if blocking on requiring additional complexity will be beneficial 😓
That said, should anyone find the performance an issue when parsing configs in their real-world projects, they'd likely be motivated to contribute that suggested improvement (which is either reverting to the current state we have now prior to this PR with it's drawbacks, or adapting from the changes of this PR to use serde-untagged and skip ParsedValue?), you're an example of that with your contributions to the toml crates, so I completely understand the reluctance 😅
Config parsing benchmarks are presumably a rather niche concern for the bulk of projects though (which might prefer a config crate that is easier to grok and maintain, as existing issues tend to be focused on).
As you're effectively the core maintainer of this crate now, that is totally up to you. I am in support of your requests if you're familiar enough with how to approach the concerns I raised in my feedback that are unclear to me, but I get the impression it'd still be better delegated to a subsequent PR? (it has already been suggested that this PR alone in it's current state would benefit from splitting)
| // NOTE: Order of variants is important. Serde will use whichever | ||
| // the input successfully deserializes into first. | ||
| #[derive(serde::Deserialize, Debug)] | ||
| #[serde(untagged)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
serde(untagged)is a performance footgun in situations like this (easily noticeable in benchmarks when I made the same mistake intoml::Value).
Ah, I think when I contributed this I favored it for being more convenient maintenance wise and thinking of it from an application config viewpoint where you'd usually not have so much config to parse that it'd be a perf concern? 😅
We'd likely want to use
serde-untagged.
Glancing over at the crate docs, I assume this means losing the convenience of serde(untagged) and requires a bit more explicit mapping to each variant?
I don't have a problem with that, but it was nice to have for simple mappings of types to their variant.
Yes, json5 did this already but I care more if it spreads to all formats
This wasn't clear to me at first where you were referencing as I thought you meant upstream json5 crate (that only uses serde(untagged) in their tests) 😅 but I see you meant the config-rs format implementation for json5 uses serde(untagged) (and is removed via this PR too) 👍 (EDIT: I see it's also the first thing my PR description mentioned, I forgot as it's been a while 😓)
So as a bonus perhaps, this would address your concern about the json5 format being deferred to this common serde handler for supported formats?
| // Equivalent to ValueKind, except Table + Array store the same enum | ||
| // Useful for serde to serialize values into, then convert to Value. | ||
| // NOTE: Order of variants is important. Serde will use whichever | ||
| // the input successfully deserializes into first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this just replaces our existing type we are deserializing to, I'll give this a pass. Overall, I'd rather we deserialize directly into our needed type.
I believe that I wanted to deserialize directly too, but I think the comment here highlighted the reasoning that did not happen was due to config::Value being a struct with a value: ValueKind field while config::ValueKind enum has the Table + Array variants reference config::Value instead of Self to self-reference the enum, which complicated adapting to that?
If you look at what this PR does, it effectively replaces the format specific methods (like from_yaml_value() / from_toml_value()) with a single from_parsed_value() method instead.
Most of those format from_*_value() methods (that are serde reliant) took in a deserialized format value type as a parameter, delegating away error handling involved (except partially for Ron and the non-serde Yaml format).
In both cases the before/after approach of these methods are only really performing the task mapping to config::Value. That logic doesn't belong in value.rs when it's format/type specific though? You can see at the end of format.rs in this PR how the additional serde deserializers are added for format specific support concerns, with types/crates not really relevant to bundle into value.rs?
Hopefully that clarifies why ParsedValue was introduced as an intermediary convenience for the deserializing + mapping step. With from_parsed_value() it effectively consolidates the associated formats it can support, making this logic in config-rs much easier to grok and maintain?
Last I recall the various from_*_value() implementations were (or at least at some point when I was contributing) not particularly consistent in style/quality despite having roughly the same logic. One could tidy that up, or just make it DRY (which this PR favored), such that any new serde compatible formats would be much simpler to add/support, and thus maintain without as much boilerplate prone to diverging.
Existing (and new) serde compatible formats would have this common call going forward:
let value = format::from_parsed_value(uri, serde_json::from_str(text)?);If format.rs can be adapted to make needing ParsedValue redundant (by adapting the same improvement to config::Value / ValueKind directly), that'd be great too! This PR was just focused on that initial step to take a DRY approach without being too invasive of a change to the codebase 😅
That said some of the PR value was lost since as the switch to serde-yaml didn't age well. It would have been nice to shift the added error handling there to serde.
| // Equivalent to ValueKind, except Table + Array store the same enum | ||
| // Useful for serde to serialize values into, then convert to Value. | ||
| // NOTE: Order of variants is important. Serde will use whichever | ||
| // the input successfully deserializes into first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting a dummy value in for the origin
NOTE: I used uri in my responses since it's frequently the variable name used instead of origin and was what came to mind when writing out my thoughts 😅
from_parsed_value() (like how other format methods before it did) provides the uri value to use when mapping from a ParsedValue to a config::Value (with it's associated uri + ValueKind variant).
Adding a placeholder uri (effectively None?) would then need a step that updates the uri for each Value (along with it's nested Value items depending on type), and I'm not sure how you'd like to approach that in the context of a method like from_parsed_value() where the uri is already given?
ParsedValue enum leveraged the serde untagged convenience to map the formats own Value type, but to have that done for config::Value instead just to add the uri would be verbose of an implementation?
I understand if using an intermediary this way instead of skipping that may seem as a codesmell, and I think I may have originally intended to look into how I could improve that after this PR was resolved. Partly because of a lack of familiarity with the project as a whole and not wanting to introduce breaking changes or deal with extra churn / considerations that may cause vs this reduced scope that was easier for me to reason about and get through review 😅
As such, I still think the current approach with this PR using the intermediary enum would be valid and keeps the PR focused on the goal of unifying the serde compatible formats to a common parse method.
Refactoring that to then make ParsedValue redundant and work directly with config::Value + ValueKind seems better suited to a follow-up PR, and would be more pleasant in commit/merge history as context for these distinct changes to any interested party, as opposed to munged together (but that could just be more perspective as I tend to encourage more focused PRs with all commits squashed via the PR merge strategy).
All the uri value represents is the source, so for these formats the config::Value returned and any nested value produced, that uri is the same throughout.
As such it's not that meaningful during the format mapping to config::Value, but collections of config::Value need each item annotated with that same uri value/ref, so that merging/layering has that traceability of source.
If that were something better handled to avoid the need for ValueKind::Array and ValueKind::Table to require config::Value items just to track the associated uri, then this whole thing could be simplified as desired 😅
| // Equivalent to ValueKind, except Table + Array store the same enum | ||
| // Useful for serde to serialize values into, then convert to Value. | ||
| // NOTE: Order of variants is important. Serde will use whichever | ||
| // the input successfully deserializes into first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Building up my
serde-basepathidea I've been toying with
I glanced through the linked ref but that still didn't click for me sorry.
In the current state of this PR, I could have from_parsed_value() assign None as the uri (in lieu of a serde-untagged implementation effectively transforming to Value::new(uri, kind) for deserializing to config::ValueKind on the Array/Table variants, or kind.into() if deserializing straight to config::Value?), and then I could call a separate helper to update all the uriorigin to the desired one before returning the final config::Value result?
// In value.rs: impl Value { pub fn set_origin_recursively(&mut self, origin: Option<&String>) { self.origin = origin.cloned(); // If kind is a collection, update the `origin` of their `Value`s too: match self.kind { ValueKind::Array(ref mut collection) => { for v in collection.iter_mut() { v.set_origin_recursively(origin) } } ValueKind::Table(ref mut collection) => { for (_, v) in collection.iter_mut() { v.set_origin_recursively(origin) } } // This is presumably unacceptable for handling all other variants // of `ValueKind` that don't contain nested `Value`s, but short of listing // all remaining variants here I'm not sure how you'd want to approach it _ => {}, }; } } // format.rs: // Nothing to map if already deserialized into Value: pub fn from_parsed_value(uri: Option<&String>, mut value: Value) -> Value { value.set_origin_recursively(uri); value } // Alternatively for ValueKind (implies Array/Table variants already have `v.origin = None`): pub fn from_parsed_value(uri: Option<&String>, value: ValueKind) -> Value { let mut v = Value::from(value); v.set_origin_recursively(uri); v }Both would make from_parsed_value() largely redundant then 😅
The above set_origin_recursively() utility method could be added via a separate PR in the meantime? (origin isn't public, so updating the origin after it's set isn't supported I think?)
| // Equivalent to ValueKind, except Table + Array store the same enum | ||
| // Useful for serde to serialize values into, then convert to Value. | ||
| // NOTE: Order of variants is important. Serde will use whichever | ||
| // the input successfully deserializes into first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose with the more verbose addition of serde-untagged, handling the Array + Table types of ValueKind would be a non-issue as .map/.seq calls would could leverage .into() similar to how you could construct ValueKind like this:
// ValueKind::Table(T) storing `Vec<T> => ValueKind::Array(T) => Value` let map_of_vecs: ValueKind = HashMap::<String, Value>::from([ ("abc".to_string(), vec!["a", "b", "c"].into()), ("xyz".to_string(), vec!["x", "y", "z"].into()), ]).into(); // ValueKind::Table(T) => Value let v = Value::from(map_of_vecs);Value::from will provide the None placeholder for origin and the added recursive method can patch that afterwards 👍
I think these two collection variants of ValueKind were only problematic with serde(untagged) when paired with the from_parsed_value() recursively calling itself to map the enum value, since they were not wrapping Self 😅
Other than that, there's the Option value type from Ron which I'm not sure how you'd approach that with serde-untagged, for serde(untagged) with ParsedValue the recursive call through from_parsed_value() worked with the type Option<Box<Self>>.
Likewise, while serde-untagged and formats like Ron support char, ValueKind lacks it and casts to ValueKind::String. A similar concern exists for Toml support when parsing it's date value as there is no equivalent in ValueKind it is mapped to the string variant too. Both cases were handled in this PR with a separate ParsedString enum delegated to serde(untagged), all wrapped in a custom deserializer method for the ParsedValue::String variant and the deserialize_with annotation (that I've not looked into if that's compatible with serde-untagged, I assume it is?).
So a few gaps on how to proceed there with serde-untagged, but I think it might still be better deferred to a follow-up PR replacing serde(untagged). It's diff might then be more straight-forward?
| I'm detailing this as a refresher for future me or anyone else who may be interested in pushing this PR forward. My time is often sparse, so with the additional requirements to adopt Remaining tasks:
|
Note that I said this was non-blocking but we will eventually want to do this. |
It is non-blocking to update all formats to this |
| Use of |
Prep for #472 `serde(untagged)` translates to an `if`/`else if` ladder that ends up being pretty expensive, including the error generation for each branch that gets skipped. This switches us to dispatch on type instead, speeding things up. This is also prep for speeding up builds by removing the need for `serde_derive`. Once `serde_core` is published, we can depend on that and not wait for `syn`, `quote`, etc.
| #677 was merged |
| Although I now have the time available, recent experiences have led me to shift my focus away from Feel free to close this PR or leave it open for visibility should someone else want to progress it. |
Inspired from the
json5feature (Original PR July 2020, revised PR May 2021) with the refactoring in the revision by @matthiasbeyerIt looked useful for other formats that use
serdeto simplify their logic (so long as test coverage is informative of no issues 🙏 )DRY: The
json,json5andtomlparsers all leverageserdeand can share a common enum to deserialize data into, instead of individual methods performing roughly the same transformations.ronimproves their support for serde untagged enums with v0.9,it is still not compatible with this approach (Their README details why).(UPDATE: At least for the currentconfig-rstest coverage, this appears to be passing now)yamlsupport doesn't leverageserdethus is not compatible. (UPDATE: Since there is talk about the unmaintained status, it could be switched forserde-yaml, I've verified it can pass the tests with an extra deserializer method)from_parsed_value()is based on the approached used byformat/json5.rs:ValueKindenum, which could not directly be used due to theTableandArraytypes usingValueas their value storage type instead of self-referencing the enum.impl From, but supports the complimentaryuriparameter for eachValuederived.Resolves: #394