This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author vstinner
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, jcea, lemburg, mark.dickinson, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date 2012-01-19.13:13:41
SpamBayes Score 7.903199e-06
Marked as misclassified No
Message-id <CAMpsgwbsVX0g9o40EvjXbXt2FBugdd35AkH7zFOPVf99VNqB8A@mail.gmail.com>
In-reply-to <4F176E88.3090503@udel.edu>
Content
I tried the collision counting with a low number of collisions: less than 15 collisions ----------------------- Fail at startup. 5 collisions (32 buckets, 21 used=65.6%): hash=ceb3152f => f 10 collisions (32 buckets, 21 used=65.6%): hash=ceb3152f => f dict((str(k), 0) for k in range(2000000)) ----------------------------------------- 15 collisions (32,768 buckets, 18024 used=55.0%): hash=0e4631d2 => 31d2 20 collisions (131,072 buckets, 81568 used=62.2%): hash=12660719 => 719 25 collisions (1,048,576 buckets, 643992 used=61.4%): hash=6a1f6d21 => f6d21 30 collisions (1,048,576 buckets, 643992 used=61.4%): hash=6a1f6d21 => f6d21 35 collisions => ? (more than 10,000,000 integers) random_dict('', 50000, charset, 1, 3) -------------------------------------- charset = 'abcdefghijklmnopqrstuvwxyz0123456789' 15 collisions (8192 buckets, 5083 used=62.0%): hash=1526677a => 77a 20 collisions (32768 buckets, 19098 used=58.3%): hash=5d7760e6 => 60e6 25 collisions => <unable to generate a new key> random_dict('', 50000, charset, 1, 3) -------------------------------------- charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.=+_(){}%' 15 collisions (32768 buckets, 20572 used=62.8%): hash=789fe1e6 => 61e6 20 collisions (2048 buckets, 1297 used=63.3%): hash=2052533d => 33d 25 collisions => nope random_dict('', 50000, charset, 1, 10) -------------------------------------- charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.=+_(){}%' 15 collisions (32768 buckets, 18964 used=57.9%): hash=94d7c4f5 => 44f5 20 collisions (32768 buckets, 21548 used=65.8%): hash=acb5b39e => 339e 25 collisions (8192 buckets, 5395 used=65.9%): hash=04d367ae => 7ae 30 collisions => nope random_dict() comes from the following script: *** import random def random_string(charset, minlen, maxlen): strlen = random.randint(minlen, maxlen) return ''.join(random.choice(charset) for index in xrange(strlen)) def random_dict(prefix, count, charset, minlen, maxlen): dico = {} keys = set() for index in xrange(count): for tries in xrange(10000): key = prefix + random_string(charset, minlen, maxlen) if key in keys: continue keys.add(key) break else: raise ValueError("unable to generate a new key") dico[key] = None charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.=+_(){}%' charset = 'abcdefghijklmnopqrstuvwxyz0123456789' random_dict('', 50000, charset, 1, 3) *** I ran the Django test suite. With a limit of 20 collisions, 60 tests fail. With a limit of 50 collisions, there is no failure. But I don't think that the test suite uses large data sets. I also triend the Django test suite with a randomized hash function. There are 46 failures. Many (all?) are related to the order of dict keys: repr(dict) or indirectly in a HTML output. I didn't analyze all failures. I suppose that Django can simply run the test suite using PYTHONHASHSEED=0 (disable the randomized hash function), at least in a first time.
History
Date User Action Args
2012-01-19 13:13:43vstinnersetrecipients: + vstinner, lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, gregory.p.smith, jcea, mark.dickinson, pitrou, christian.heimes, benjamin.peterson, eric.araujo, grahamd, Arfrever, v+python, alex, zbysz, skrah, dmalcolm, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-19 13:13:42vstinnerlinkissue13703 messages
2012-01-19 13:13:41vstinnercreate