@phadej
Contributor

@phadej phadej commented Mar 28, 2018

Canonical JSON and RFC 7159 still don't agree, even when strings are ASCII: per the RFC, newlines have to be escaped.

Related comment is theupdateframework/python-tuf#457 (comment)

Yet, e.g., aeson leniently accepts a raw \x0a, but doesn't produce it:

λ Text.JSON.Canonical> prettyCanonicalJSON (JSString "\n")
"\"\n\""
λ Data.Aeson> decode "\"\n\"" :: Maybe String
Just "\n"
λ Data.Aeson> encode (decode "\"\n\"" :: Maybe String)
"\"\\n\""

I'm tempted to restrict hackage-security to only allow characters in the 0x20-0x7f range. That's enough for our use, isn't it?
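
For illustration, a minimal sketch of what such a restriction could look like. The names `isAllowedChar` and `validateJSString` are hypothetical and not part of hackage-security; the range is taken literally from the proposal above, so it includes 0x7f (DEL).

```haskell
import Data.Char (ord)

-- Hypothetical helper, not hackage-security API.
-- Accepts only characters in the proposed 0x20-0x7f range.
isAllowedChar :: Char -> Bool
isAllowedChar c = n >= 0x20 && n <= 0x7f
  where
    n = ord c

-- Returns Left with the first offending character,
-- or Right with the unchanged string if every character is allowed.
validateJSString :: String -> Either Char String
validateJSString s =
  case filter (not . isAllowedChar) s of
    []      -> Right s
    (c : _) -> Left c
```

Under these assumptions, `validateJSString "foo\nbar"` yields `Left '\n'`, which a parser could report as a failure instead of emitting an unescaped newline.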

@hvr
Member

hvr commented Mar 29, 2018

What does "restrict" mean? Consider it a parsing failure? I don't think this works, as we want to be able to support Unicode package names (even if we disallow them on the central Hackage server as a policy, there are users who want Unicode supported on their private package repos).

@phadej
Contributor Author

phadej commented Mar 29, 2018

@hvr if we want to support Unicode, we have to wait and see what the CJSON folks come up with. They have very conflicting specs: "strings is sequence of bytes", while OTOH JSON proper mandates that a JSON document is valid Unicode:

> It is suggested that unicode strings be represented as the UTF-8 encoding of unicode Normalization Form C (UAX #15). However, arbitrary content may be represented as a string: it is not guaranteed that string contents can be meaningfully parsed as UTF-8.

That strongly assumes that a CJSON document is binary rather than text, whereas a JSON document is a text document.

> A previous version of this specification required strings to be valid unicode, and relied on JSON's \u escape. This was abandoned as it doesn't allow representing arbitrary binary data in a string, and it doesn't preserve the identity of non-canonical unicode strings.

And this is crap, everyone base64/base16 encodes binary blobs when they are in JSON.

Also

λ D.T.Normalize D.Aeson D.Text> normalize NFD "äiti"
"a\776iti"
λ D.T.Normalize D.Aeson D.Text> encode (normalize NFD "äiti")
"\"a\204\136iti\""
λ D.T.Normalize D.Aeson D.Text> decode (encode (normalize NFD "äiti")) :: Maybe Text
Just "a\776iti"

I don't see what fails to preserve the identity of non-canonical Unicode strings (a "broken" JSON implementation somewhere?).

@phadej
Contributor Author

phadej commented Mar 29, 2018

One way is to adhere to CJSON as it is now, which means:

- | JSString String
+ | JSString ByteString

@hvr
Member

hvr commented Mar 29, 2018

At least they've acknowledged the issue; we've known about this inconsistency in the OLPC spec since at least Jan 2016, see 867f2e5

@phadej
Contributor Author

phadej commented Mar 30, 2018

Anyway, is there anything blocking this from merging? What to do with CJSON can wait; using a newer QuickCheck will save me from building the old QuickCheck the next time I wipe my store ;)

EDIT: the 0xff bug is now "documented" in the tests too.

@hvr
Member

hvr commented Mar 31, 2018

> Anyway, is there anything blocking this from merging?

Nope; but there's no need for a new release yet... so it may take a while till this hits Hackage.

@hvr hvr merged commit 2aaf239 into haskell:master Mar 31, 2018