Message378233
Indeed, this is just a very unlucky case.

    >>> n = len(longer)
    >>> from collections import Counter
    >>> Counter(s[:n])
    Counter({0: 9056995, 255: 6346813})
    >>> s[n-30:n+30].replace(b'\x00', b'.').replace(b'\xff', b'@')
    b'..............................@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'
    >>> Counter(s[n:])
    Counter({255: 18150624})

When checking "base", we're in this situation:

           pattern: @@@@@@@@
            string: .........@@@@@@@@
    Algorithm says:        ^ these last characters don't match.
                            ^ this next character is not in the pattern.
                            Therefore, skip ahead a bunch:

           pattern:          @@@@@@@@
            string: .........@@@@@@@@

    This is a match!

Whereas when checking "longer", we're in this situation:

           pattern: @@@@@@@@@
            string: .........@@@@@@@@
    Algorithm says:         ^ these last characters don't match.
                             ^ this next character *is* in the pattern.
                             We can't jump forward.

           pattern:  @@@@@@@@
            string: .........@@@@@@@@

    Start comparing at every single alignment...

I'm attaching reproducer.py, which replicates this from scratch without loading data from a file.
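To make the two cases concrete, here is a minimal Python sketch of the skip rule described above. It is not the actual C implementation: the function name `find_with_skips` and the returned alignment counter are hypothetical, an exact `set` stands in for whatever membership test the real code uses, and every non-skipping step advances by one position. It only illustrates why "next character is not in the pattern" allows a big jump while "next character is in the pattern" does not.

```python
def find_with_skips(s: bytes, p: bytes):
    """Return (index, alignments_tried); index is -1 if p is not in s.

    Simplified illustration only, not CPython's real search code.
    """
    n, m = len(s), len(p)
    if m == 0:
        return 0, 0
    in_pattern = set(p)  # assumption: exact membership test
    last = p[-1]
    i = 0
    tried = 0
    while i <= n - m:
        tried += 1
        # Check the last character of this alignment first.
        if s[i + m - 1] == last and s[i:i + m] == p:
            return i, tried
        # Mismatch: look at the character just past this alignment.
        if i + m < n and s[i + m] not in in_pattern:
            # Every alignment that covers s[i + m] must fail, so jump past it.
            i += m + 1
        else:
            # That character could be part of a match: advance by one only.
            i += 1
    return -1, tried


string = b"." * 9 + b"@" * 8
print(find_with_skips(string, b"@" * 8))  # (9, 2): one skip, then a match
print(find_with_skips(string, b"@" * 9))  # (-1, 9): every alignment is tried
```

In this toy version each failed alignment is a single slice comparison, so the cost per alignment is hidden; in the unlucky case from the report, each of those alignments can also compare a long run of matching bytes before failing.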
| Date | User | Action | Args |
| 2020-10-08 11:06:59 | Dennis Sweeney | set | recipients: + Dennis Sweeney, tim.peters, vstinner, pmpp, serhiy.storchaka, josh.r, ammar2, Zeturic |
| 2020-10-08 11:06:59 | Dennis Sweeney | set | messageid: <1602155219.38.0.293310549875.issue41972@roundup.psfhosted.org> |
| 2020-10-08 11:06:59 | Dennis Sweeney | link | issue41972 messages |
| 2020-10-08 11:06:59 | Dennis Sweeney | create | |