@renovate renovate bot commented Dec 11, 2020

This PR contains the following updates:

Package   Change
chardet   ==3.0.4 -> ==4.0.0

Release Notes

chardet/chardet

v4.0.0

Compare Source

⚠️ This will be the last release of chardet to support Python 2.7. chardet 5.0 will only support Python 3.6+ ⚠️

Major Changes

This release is multiple years in the making, and provides some quality of life improvements to chardet. The primary user-facing changes are:

  1. Single-byte charset probers now use nested dictionaries under the hood, so they are usually a little faster than before. (See #​121 for details)
  2. The CharsetGroupProber class now properly short-circuits when one of the probers in the group is considered a definite match. This led to a substantial speedup.
  3. There is now a chardet.detect_all function that returns a list of possible encodings for the input with associated confidences.
  4. We have dropped support for Python 2.6, 3.4, and 3.5 as they are all past end-of-life.
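
As an illustration of item 3, here is a minimal sketch of how the results of chardet.detect_all might be consumed. It assumes chardet >= 4.0.0 is installed and that each result is a dict with "encoding" and "confidence" keys, as with chardet.detect. The helper names (usable_encodings, decode_with_fallback, detect_and_decode) and the confidence thresholds are hypothetical and not part of chardet.

```python
from typing import Dict, List, Optional, Union

# Shape of one detect_all-style result, e.g.
# {"encoding": "utf-8", "confidence": 0.87, "language": ""}
Result = Dict[str, Union[str, float, None]]


def usable_encodings(results: List[Result], min_confidence: float = 0.5) -> List[str]:
    """Return encoding names from detect_all-style results, best first.

    Entries without an encoding, or below min_confidence, are dropped.
    (Hypothetical helper; the threshold is an arbitrary choice.)
    """
    kept = [
        r for r in results
        if r.get("encoding") and (r.get("confidence") or 0.0) >= min_confidence
    ]
    kept.sort(key=lambda r: r["confidence"], reverse=True)
    return [str(r["encoding"]) for r in kept]


def decode_with_fallback(data: bytes, encodings: List[str]) -> Optional[str]:
    """Try each candidate encoding in order; return the first successful decode."""
    for name in encodings:
        try:
            return data.decode(name)
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name or bytes invalid for this codec: try the next one.
            continue
    return None


def detect_and_decode(data: bytes, min_confidence: float = 0.2) -> Optional[str]:
    """Detect candidate encodings with chardet and decode with the best one that works."""
    import chardet  # imported here so the helpers above work without chardet installed

    candidates = chardet.detect_all(data)
    return decode_with_fallback(data, usable_encodings(candidates, min_confidence))
```

Calling detect_and_decode(raw_bytes) returns the decoded text, or None if no candidate passes the threshold and decodes cleanly. Trying several candidates in order is the main thing detect_all makes possible compared with the single answer from chardet.detect.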

The changes in this release have also laid the groundwork for retraining the models to make them more accurate, and to support some more encodings/languages (see #​99 for progress). This is our main focus for chardet 5.0 (beyond dropping Python 2 support).

Benchmarks

Running on a MacBook Pro (15-inch, 2018) with 2.2GHz 6-core i7 processor and 32GB RAM

old version (chardet 3.0.4)
Benchmarking chardet 3.0.4 on CPython 3.7.5 (default, Sep 8 2020, 12:19:42)
[Clang 11.0.3 (clang-1103.0.32.62)]
--------------------------------------------------------------------------------
Calls per second for each encoding:
ascii: 25559.439366240098
big5: 7.187002209518091
cp932: 4.71090956645177
cp949: 2.937256786994428
euc-jp: 4.870580412090848
euc-kr: 6.6910755971933416
euc-tw: 87.71098043480079
gb2312: 6.614302607154443
ibm855: 27.595893549680685
ibm866: 29.93483661732791
iso-2022-jp: 3379.5052775763434
iso-2022-kr: 26181.67290886392
iso-8859-1: 120.63424740403983
iso-8859-5: 32.65106262196898
iso-8859-7: 62.480089080556084
koi8-r: 13.72481001727257
maccyrillic: 33.018537255804496
shift_jis: 4.996013583677438
tis-620: 14.323112928341818
utf-16: 166771.53081510935
utf-32: 198782.18009478672
utf-8: 13.966236809766901
utf-8-sig: 193732.28637413395
windows-1251: 23.038910006925768
windows-1252: 99.48409117053738
windows-1255: 6.336261495718825

Total time: 357.05358052253723s (10.054513372323958 calls per second)

new version (chardet 4.0.0)
Benchmarking chardet 4.0.0 on CPython 3.7.5 (default, Sep 8 2020, 12:19:42)
[Clang 11.0.3 (clang-1103.0.32.62)]
--------------------------------------------------------------------------------
Calls per second for each encoding:
ascii: 38176.31067961165
big5: 12.86915132656389
cp932: 4.656400877065864
cp949: 7.282976434315926
euc-jp: 4.329381447610525
euc-kr: 8.16386823884839
euc-tw: 90.230745070368
gb2312: 14.248865889128146
ibm855: 33.30225548069821
ibm866: 44.181691968506
iso-2022-jp: 3024.2295767539117
iso-2022-kr: 25055.57945041816
iso-8859-1: 59.25262902122995
iso-8859-5: 39.7069713674529
iso-8859-7: 61.008422013862194
koi8-r: 41.21560517643845
maccyrillic: 31.402474369805002
shift_jis: 4.9091652743515155
tis-620: 14.408875278821073
utf-16: 177349.00634249471
utf-32: 186413.51111111112
utf-8: 108.62174360115105
utf-8-sig: 181965.46637744035
windows-1251: 43.16933400329809
windows-1252: 211.27653358317968
windows-1255: 16.15113643694104

Total time: 268.0230791568756s (13.394368915143872 calls per second)
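
To put the two benchmark runs side by side, the following sketch computes speedup ratios from a handful of the figures reported above. The numbers are copied from the benchmark output; the selection of encodings and the function names are illustrative choices, not part of chardet or its benchmark script.

```python
from typing import Dict, Tuple

# (chardet 3.0.4, chardet 4.0.0) calls per second, copied from the benchmark output.
CALLS_PER_SECOND: Dict[str, Tuple[float, float]] = {
    "utf-8": (13.966236809766901, 108.62174360115105),
    "koi8-r": (13.72481001727257, 41.21560517643845),
    "windows-1252": (99.48409117053738, 211.27653358317968),
    "iso-8859-1": (120.63424740403983, 59.25262902122995),
}

# (chardet 3.0.4, chardet 4.0.0) total benchmark time in seconds.
TOTAL_SECONDS: Tuple[float, float] = (357.05358052253723, 268.0230791568756)


def per_encoding_speedups() -> Dict[str, float]:
    """Ratio of new to old calls per second; values above 1.0 mean 4.0.0 is faster."""
    return {name: new / old for name, (old, new) in CALLS_PER_SECOND.items()}


def overall_speedup() -> float:
    """Ratio of old to new total time; values above 1.0 mean 4.0.0 is faster."""
    old_seconds, new_seconds = TOTAL_SECONDS
    return old_seconds / new_seconds


def print_report() -> None:
    """Print the ratios, largest per-encoding speedup first."""
    ranked = sorted(per_encoding_speedups().items(), key=lambda kv: kv[1], reverse=True)
    for name, ratio in ranked:
        print(f"{name:>14}: {ratio:.2f}x")
    print(f"{'overall':>14}: {overall_speedup():.2f}x")
```

On these figures the total run is roughly 1.33 times faster, with the gain concentrated in a few encodings such as utf-8, while iso-8859-1 detection is slower in 4.0.0 than in 3.0.4.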

Thank you to @aaaxx, @edumco, @hrnciar, @hroncok, @jdufresne, @mdamien, @saintamh, and @xeor for submitting pull requests, and to all of our users for being patient with how long this release has taken.

Full changelog

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, click this checkbox.

This PR has been generated by Mend Renovate. View repository job log here.

@renovate renovate bot changed the title from "chore(deps): update dependency chardet to v4" to "chore(deps): update dependency chardet to v4 - autoclosed" Sep 25, 2022
@renovate renovate bot closed this Sep 25, 2022
@renovate renovate bot deleted the renovate/chardet-4.x branch September 25, 2022 16:26