Skip to content

Conversation

@alicebob
Copy link

Isn't really faster, somehow, but it saves a few allocs. The disadvantage is there is now a max key size (which I've set to 32).

master:

BenchmarkSlurp 300000 4338 ns/op 540 B/op 21 allocs/op 

this PR:

BenchmarkSlurp 300000 4448 ns/op 414 B/op 9 allocs/op 
@alicebob
Copy link
Author

Together with #8 and the change from yesterday I now get this benchmark:

BenchmarkSlurp 500000 3529 ns/op 89 B/op 2 allocs/op 

for this stripped City struct:

type GeoipCity struct { City struct { Names struct { En string `maxminddb:"en"` } `maxminddb:"names"` } `maxminddb:"city"` Country struct { GeoNameID uint `maxminddb:"geoname_id"` IsoCode string `maxminddb:"iso_code"` } `maxminddb:"country"` Location struct { Latitude float64 `maxminddb:"latitude"` Longitude float64 `maxminddb:"longitude"` MetroCode uint `maxminddb:"metro_code"` } `maxminddb:"location"` Postal struct { Code string `maxminddb:"code"` } `maxminddb:"postal"` } 
@oschwald
Copy link
Owner

I am a bit unsure on this one. I think a better approach might be to cache the key string offsets for the open database. However, we would have to be careful when doing this as short key string may be reproduced and nothing in the file spec prevents long strings from being reproduced.

@alicebob
Copy link
Author

Fair enough, but the problem is that you can't use byte slices as a map key. I'll think about this a bit more.

@oschwald
Copy link
Owner

I think we would only need to cache the mapping between the offset and the field index for the type. However, the scope of the cache will need to be limited to the particular reader instance.

@alicebob
Copy link
Author

I'm not sure I follow. AFAICS the offsets of the fields depend on the length of the values, which vary for every object in the db.

We use unsafe, but the lifetime of the strings build with unsafe is very short, they are never returned.
@alicebob
Copy link
Author

Using unsafe we can use strings of arbitrary lengths, see the last commit. Unsafe should be fine, since it's used to build strings we only use to compare; they are never returned to the user.

That said, it is use of unsafe ;)

@oschwald
Copy link
Owner

With the MaxMind DB writer, the strings over a certain size occur only once in the database. If you cache the offset within the file to the string, you don't actually need to read out the bytes to know it is that string.

@alicebob
Copy link
Author

As long as all map keys are always pointers, even small ones, and all equivalent keys point to the same address in the database, that would be a nice solution indeed.

@oschwald
Copy link
Owner

The issue isn't so much whether they are pointers, as we would use the offset to the actual string in the file, but that small strings get repeated. We would need some way to limit the cache size or not cache the repeated strings.

@alicebob
Copy link
Author

In that case I think the updated PR as it currently stands avoids all those problems, while still not needlessly allocating struct field names. It might not be the optimal way, but it's also reasonably simple.

Up to you, of course :)

@oschwald oschwald merged commit ba218fe into oschwald:master Jan 31, 2015
@oschwald
Copy link
Owner

I've merged this. The use of the unsafe seems safe in this situation and I don't have any serious concerns now that the key size limitation has been removed. That said, I think it is still worth testing the caching of the keys by offset, but the changes here don't conflict with that.

@alicebob alicebob deleted the bytekey branch January 31, 2015 11:25
@alicebob
Copy link
Author

ok, thanks for the quick responses!

@oschwald
Copy link
Owner

Thanks for the PRs! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants