|
|
| Created: 11 years, 8 months ago by djd Modified: 10 years, 10 months ago Reviewers: CC: mpvl, golang-codereviews Visibility: Public. | Descriptiongo.text/language: add plural and ordinal rule functions. Parse the plural/ordinal rules from CLDR to form equivalent Go functions for each language. Patch Set 1 #Patch Set 2 : diff -r 9e44a3c23558 https://code.google.com/p/go.text #Patch Set 3 : diff -r 9e44a3c23558 https://code.google.com/p/go.text #Patch Set 4 : diff -r 9e44a3c23558 https://code.google.com/p/go.text #Patch Set 5 : diff -r 9e44a3c23558 https://code.google.com/p/go.text # Total comments: 11
MessagesTotal messages: 7
Hello mpvl (cc: golang-codereviews@googlegroups.com), I'd like you to review this change to https://code.google.com/p/go.text Sign in to reply to this message.
A few general comments: - I have a very strong preference to not use Yacc. This is not a complicated grammar, and using Yacc is just a maintenance headache, in my opinion. If you have a very compelling reason to use it, or is there is precedence for it in the Go source code, you should at least include a Makefile to generate the code. - The language package is supposed to be as lightweight as possible, as it needs to be included for many different packages. The plural information should therefore not be in there. (Even the currency information should probably move out, although that is arguably needed to parse BCP 47 u extensions.) Is there any reason you included it in this package? https://codereview.appspot.com/66840043/diff/60001/cldr/plural_parse.go File cldr/plural_parse.go (right): https://codereview.appspot.com/66840043/diff/60001/cldr/plural_parse.go#newcode4 cldr/plural_parse.go:4: import __yyfmt__ "fmt" Not sure I'm getting this renaming. https://codereview.appspot.com/66840043/diff/60001/cldr/plural_parse.go#newco... cldr/plural_parse.go:26: const NUM = 57346 Ideally this is a const block. It is unidiomatic to have all uppercase constants. But should these be exported at all? It seems to me this is not the case. https://codereview.appspot.com/66840043/diff/60001/cldr/plural_parse.go#newco... cldr/plural_parse.go:73: func (x *pluralLex) Lex(yylval *pluralSymType) int { Why is this an uppercase name? It is more idiomatic to user lowercase names for internal types. A possible exception is, for example, if the struct is intended to be made public, but I don't think this is the case here. https://codereview.appspot.com/66840043/diff/60001/cldr/plural_parse.go#newco... cldr/plural_parse.go:145: Lower, Upper int The same holds for field names wrt to upper vs lower casing. https://codereview.appspot.com/66840043/diff/60001/cldr/plural_parse.go#newco... cldr/plural_parse.go:306: __yyfmt__.Printf("lex %s(%d)\n", pluralTokname(c), uint(char)) It is probably nicer to write debug output to os.Stderr. https://codereview.appspot.com/66840043/diff/60001/cldr/plural_parse.y File cldr/plural_parse.y (right): https://codereview.appspot.com/66840043/diff/60001/cldr/plural_parse.y#newcod... cldr/plural_parse.y:180: // The generated expression assumes a single integer variable 'n'. As this is a public API and considering we know that the value to be passed is likely going to be a float, it probably makes sense to have the user pass floats and document that only integer values are supported for now. https://codereview.appspot.com/66840043/diff/60001/language/maketables.go File language/maketables.go (right): https://codereview.appspot.com/66840043/diff/60001/language/maketables.go#new... language/maketables.go:1: // Copyright 2013 The Go Authors. All rights reserved.getLangID([]byte(l)) I don't think this has any effect. :) https://codereview.appspot.com/66840043/diff/60001/language/maketables.go#new... language/maketables.go:538: func (b *builder) writeFuncSlice(name string, ss []string) { This is not adding size information. But see other comments. https://codereview.appspot.com/66840043/diff/60001/language/tables.go File language/tables.go (right): https://codereview.appspot.com/66840043/diff/60001/language/tables.go#newcode295 language/tables.go:295: fromTo{from: 0x9e, to: 0x98}, Please create with "make tables". This will run gofmt with the -s option, which removes this. Sign in to reply to this message.
https://codereview.appspot.com/66840043/diff/60001/cldr/plural_parse.y File cldr/plural_parse.y (right): https://codereview.appspot.com/66840043/diff/60001/cldr/plural_parse.y#newcod... cldr/plural_parse.y:179: // For example "n is 3 or n in 5..10" becomes "n == 3 || n >= 5 && n <= 10". This seems like a fairly simple rewrite. Why would you need such a complex solution for this? https://codereview.appspot.com/66840043/diff/60001/cldr/plural_parse.y#newcod... cldr/plural_parse.y:180: // The generated expression assumes a single integer variable 'n'. Come to think of it, I believe a better solution is to not do anything special in the cldr package and rather move this code to the respective maketables. Different implementation may have very different approaches on how they encode plural rules, so this code doesn't really belong in the cldr package. It should be fairly agnostic to how users want to encode their data. Saying that they should be encoded as specific Go code already is making a strong statement about preferred implementation. The RuleProcessor for collation, for example, deals with an unusual XML representation where not having it would make the usage awkward. Still, it doesn't impose a representation of the data on the user. So I would suggest moving pretty much all of your CL to a single different package. On 2014/02/21 09:50:15, mpvl wrote: > As this is a public API and considering we know that the value to be passed is > likely going to be a float, it probably makes sense to have the user pass floats > and document that only integer values are supported for now. Sign in to reply to this message.
I'll just respond to the meta comments now, and update the code tomorrow. I put it in the language package mostly so I could access Tag.lang to index into the tables, and also because it seemed that the tables.go file was the where equivalent data lived. Happy to move it all to a new package (the cldr + the language things): would something like "language/plural" work? I can also drop this change entirely, and pursue this outside go.text for now. What would you suggest as a way to index into the tables without being able to access Tag.lang? The yacc file: happy to change this to something simpler. I had a grammar I'd written previously to convert these rules to JS, and it was easy enough to adapt for Go. The grammar for the next version of CLDR is quite different, but I don't think that will be much harder to handle with a pure-Go version. Sign in to reply to this message.
On Mon, Feb 24, 2014 at 12:13 PM, <djd@golang.org> wrote: > I'll just respond to the meta comments now, and update the code > tomorrow. > > I put it in the language package mostly so I could access Tag.lang to > index into the tables, and also because it seemed that the tables.go > file was the where equivalent data lived. Happy to move it all to a new > package (the cldr + the language things): would something like > "language/plural" work? I would probably just put it at the top level or as a subdirectory of localization. Haven't given it much thought. But probably not language, as that would mean almost all package would move under there. > I can also drop this change entirely, and pursue > this outside go.text for now. Either way. I would at least mention it is experimental if you keep it public. > What would you suggest as a way to index > into the tables without being able to access Tag.lang? > If you use a Matcher to find plural rules, it returns an index of the Tag registered with the Matcher. Another approach is to keep tags in a sorted list (could be a long string using 2 or 4 bytes per tag) and then doing a binary search to find the index. This is more compact than using Tags. I'm running into this topic more often, so I could expose an integer value for Script, Region, and Base. Note that Base is not guaranteed to be sparse, though. One of the main reasons I haven't done this is that the numbers will change, which may either confuse people, break builds, or force people to recompile their tables. In many cases, people will need to use a Matcher anyway, though. The binary search approach actually works quite well and is pretty compact. > The yacc file: happy to change this to something simpler. I had a > grammar I'd written previously to convert these rules to JS, and it was > easy enough to adapt for Go. The grammar for the next version of CLDR is > quite different, but I don't think that will be much harder to handle > with a pure-Go version. > > https://codereview.appspot.com/66840043/ > Sign in to reply to this message.
On Tue, Feb 25, 2014 at 12:12 AM, Marcel van Lohuizen <mpvl@golang.org>wrote: > > > > On Mon, Feb 24, 2014 at 12:13 PM, <djd@golang.org> wrote: > >> I'll just respond to the meta comments now, and update the code >> tomorrow. >> >> I put it in the language package mostly so I could access Tag.lang to >> index into the tables, and also because it seemed that the tables.go >> file was the where equivalent data lived. Happy to move it all to a new >> package (the cldr + the language things): would something like >> "language/plural" work? > > I would probably just put it at the top level or as a subdirectory of > localization. Haven't given it much thought. But probably not language, as > that would mean almost all package would move under there. > Another solution is to put it in go.exp/locale/plurals for now. This also includes search. > > >> I can also drop this change entirely, and pursue >> this outside go.text for now. > > Either way. I would at least mention it is experimental if you keep it > public. > > >> What would you suggest as a way to index >> into the tables without being able to access Tag.lang? >> > If you use a Matcher to find plural rules, it returns an index of the Tag > registered with the Matcher. Another approach is to keep tags in a sorted > list (could be a long string using 2 or 4 bytes per tag) and then doing a > binary search to find the index. This is more compact than using Tags. > > I'm running into this topic more often, so I could expose an integer value > for Script, Region, and Base. Note that Base is not guaranteed to be > sparse, though. One of the main reasons I haven't done this is that the > numbers will change, which may either confuse people, break builds, or > force people to recompile their tables. In many cases, people will need to > use a Matcher anyway, though. The binary search approach actually works > quite well and is pretty compact. > > >> The yacc file: happy to change this to something simpler. I had a >> grammar I'd written previously to convert these rules to JS, and it was >> easy enough to adapt for Go. The grammar for the next version of CLDR is >> quite different, but I don't think that will be much harder to handle >> with a pure-Go version. >> >> https://codereview.appspot.com/66840043/ >> > > Sign in to reply to this message.
R=close To the author of this CL: The Go project has moved to Gerrit Code Review. If this CL should be continued, please see the latest version of https://golang.org/doc/contribute.html for instructions on how to set up Git and the Go project's Gerrit codereview plugin, and then create a new change with your current code. If there has been discussion on this CL, please give a link to it (golang.org/cl/66840043 is best) in the description in your new CL. Thanks very much. Sign in to reply to this message. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
