| msg209524 - (view) | Author: STINNER Victor (vstinner) *  | Date: 2014-01-28 10:10 |
The attached patch disables references for the int and float types. |
| msg209525 - (view) | Author: STINNER Victor (vstinner) *  | Date: 2014-01-28 10:14 |
Use the attached bench.py to compare performance.

Without the patch:
---
dumps v0: 389.6 ms
data size v0: 45582.9 kB
loads v0: 573.3 ms
dumps v1: 391.4 ms
data size v1: 45582.9 kB
loads v1: 558.0 ms
dumps v2: 166.9 ms
data size v2: 41395.4 kB
loads v2: 482.2 ms
dumps v3: 431.2 ms
data size v3: 41395.4 kB
loads v3: 543.8 ms
dumps v4: 361.8 ms
data size v4: 37000.9 kB
loads v4: 560.4 ms
---

With the patch:
---
dumps v0: 391.4 ms
data size v0: 45582.9 kB
loads v0: 578.2 ms
dumps v1: 392.3 ms
data size v1: 45582.9 kB
loads v1: 556.8 ms
dumps v2: 167.7 ms
data size v2: 41395.4 kB
loads v2: 484.6 ms
dumps v3: 170.3 ms
data size v3: 41395.4 kB
loads v3: 467.0 ms
dumps v4: 122.8 ms
data size v4: 37000.9 kB
loads v4: 468.9 ms
---

dumps v3 is 60% faster, and loads v3 is also 14% *faster*. dumps v4 is 66% faster, and loads v4 is 16% faster. |
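bench.py itself is attached to the tracker rather than quoted here; a minimal sketch of the kind of comparison it performs, using a made-up payload in place of its generated data (which is not shown):

```python
import marshal
import time

# Hypothetical payload standing in for bench.py's generated data.
data = {i: (float(i), str(i)) for i in range(10000)}

for version in range(marshal.version + 1):  # format versions 0..4 on 3.4+
    t0 = time.perf_counter()
    payload = marshal.dumps(data, version)
    t1 = time.perf_counter()
    restored = marshal.loads(payload)
    t2 = time.perf_counter()
    print("dumps v%d: %.1f ms  data size v%d: %.1f kB  loads v%d: %.1f ms"
          % (version, (t1 - t0) * 1e3,
             version, len(payload) / 1024.0,
             version, (t2 - t1) * 1e3))
```

The timings above come from a larger payload, so absolute numbers will differ; only the relative shape of the results is reproduced by this sketch.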
| msg209526 - (view) | Author: STINNER Victor (vstinner) *  | Date: 2014-01-28 10:19 |
Performance of Python 3.3.3+:
---
dumps v0: 374.8 ms
data size v0: 45582.9 kB
loads v0: 625.3 ms
dumps v1: 374.6 ms
data size v1: 45582.9 kB
loads v1: 605.1 ms
dumps v2: 152.9 ms
data size v2: 41395.4 kB
loads v2: 556.5 ms
---
So with the patch, the Python 3.4 default version (4) is *faster* (dump 20% faster, load 16% faster) and produces *smaller files* (10% smaller). |
| msg209528 - (view) | Author: STINNER Victor (vstinner) *  | Date: 2014-01-28 10:20 |
Oh, by the way: on Python 3.4, the file size (with versions 3 and 4) is unchanged by my patch. Writing a reference takes exactly the same size as writing an integer. |
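The size argument can be checked directly: a back-reference is one type byte plus a 4-byte index, and a boxed int is one type byte plus a 4-byte value, so sharing doesn't change the output width. This is an illustration of Victor's point, not part of the patch:

```python
import marshal

# The compiler merges the two equal constants into one int object, so with
# version 3 the second element is written as a back-reference. Since a
# reference (1 type byte + 4-byte index) is exactly as wide as a boxed int
# (1 type byte + 4-byte value), both versions serialize to the same length.
value = [1000, 1000]
v2 = marshal.dumps(value, 2)  # no references
v3 = marshal.dumps(value, 3)  # references enabled
print(len(v2), len(v3))
```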
| msg209530 - (view) | Author: Vajrasky Kok (vajrasky) * | Date: 2014-01-28 10:25 |
I am doing the clinic conversion for the marshal module, so I am adding myself to the nosy list to make sure both tickets stay synchronized. http://bugs.python.org/issue20185 |
| msg209531 - (view) | Author: STINNER Victor (vstinner) *  | Date: 2014-01-28 10:36 |
> I am doing the clinic conversion for the marshal module, so I am adding myself to the nosy list to make sure both tickets stay synchronized.

The Derby is suspended until the release of Python 3.4 final. I consider this issue an important performance regression that should be fixed before Python 3.4 final. |
| msg209532 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2014-01-28 10:54 |
Did you test with numerous shared ints and floats? [1000] * 1000000 and [1000.0] * 1000000? AFAIK these were important use cases for adding versions 3 and 4. |
| msg209533 - (view) | Author: STINNER Victor (vstinner) *  | Date: 2014-01-28 11:34 |
> Did you test with numerous shared ints and floats? [1000] * 1000000 and [1000.0] * 1000000? AFAIK these were important use cases for adding versions 3 and 4.

Here are new benchmarks on Python 3.4 with:

Integers: [1000] * 1000000
Floats: [1000.0] * 1000000

Integers, without the patch:

dumps v3: 62.8 ms
data size v3: 4882.8 kB
loads v3: 10.7 ms

Integers, with the patch:

dumps v3: 18.6 ms (-70%)
data size v3: 4882.8 kB (same size)
loads v3: 27.7 ms (+158%)

Floats, without the patch:

dumps v3: 62.5 ms
data size v3: 4882.8 kB
loads v3: 11.0 ms

Floats, with the patch:

dumps v3: 29.3 ms (-53%)
data size v3: 8789.1 kB (+80%)
loads v3: 25.5 ms (+132%)

Version 3 was added by:
---
changeset: 82816:01372117a5b4
user: Kristján Valur Jónsson <sweskman@gmail.com>
date: Tue Mar 19 18:02:10 2013 -0700
files: Doc/library/marshal.rst Include/marshal.h Lib/test/test_marshal.py Misc/NEWS Python/marshal.c
description: Issue #16475: Support object instancing, recursion and interned strings in marshal
---
That issue talks about "sharing string constants, common tuples, even common code objects", not about sharing numbers. For real data, here are interesting numbers: http://bugs.python.org/issue16475#msg176013 Integers only represent 4.8% of serialized data, and only 8.2% of these integers can be shared. (Floats represent 0.29%.) Strings, on the other hand, represent 58%, and 57% of them can be shared. |
| msg209534 - (view) | Author: Martin v. Löwis (loewis) *  | Date: 2014-01-28 11:35 |
For the record, format 3 was added through issue16475, and format 4 through issue19219. In msg175962, Kristjan argued that there is no reason _not_ to share int objects, e.g. across multiple code objects. Now it seems that this argument is flawed: there is a reason, namely the performance impact. OTOH, I consider both use cases (marshaling a large number of integers, and wanting to share ints across code objects) equally obscure: you shouldn't worry about marshal performance too much if you have loads of tiny int objects, and you shouldn't worry about whether those ints get shared or not. As a compromise, we could suppress the sharing for small int objects, since they are singletons anyway. This would allow marshal to preserve/copy the object graph, while not impacting the use case that the original poster on python-dev presented. |
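Martin's premise, that small ints are de facto singletons, rests on CPython's small-int cache (roughly -5..256). This is an implementation detail, not a language guarantee, but it is easy to observe:

```python
# CPython caches small ints (about -5..256) and hands out the same object
# every time; larger ints are allocated fresh. int(str) is used here to
# defeat compile-time constant folding, which would otherwise merge the
# literals into one object and hide the effect.
small_a = int("256")
small_b = int("256")
print(small_a is small_b)   # True: cached singleton

big_a = int("1000")
big_b = int("1000")
print(big_a is big_b)       # False: two distinct objects
```

Because small ints always come out of the cache, marshal references to them cannot change the loaded object graph, which is why suppressing sharing for them is safe.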
| msg209535 - (view) | Author: Antoine Pitrou (pitrou) *  | Date: 2014-01-28 11:39 |
> Integers, without the patch:
>
> dumps v3: 62.8 ms
> data size v3: 4882.8 kB
> loads v3: 10.7 ms
>
> Integers, with the patch:
>
> dumps v3: 18.6 ms (-70%)
> data size v3: 4882.8 kB (same size)
> loads v3: 27.7 ms (+158%)

As I wrote on python-dev, dumps performance isn't important for the pyc use case, but loads performance is. Therefore this patch appears to go in the wrong direction. You are also ignoring the *runtime* benefit of sharing objects: a smaller memory footprint for the actual Python process. |
| msg211164 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2014-02-13 19:29 |
> As a compromise, we could suppress the sharing for small int objects, since they are singletons, anyway.

That is an implementation detail. But we can use a more efficient way of memoizing small int objects. I also suppose that an efficient C implementation of IdentityDict would significantly decrease the overhead (maybe by a factor of 2). Since this is probably the first case in which an IdentityDict gives a significant benefit, I'll provide an implementation for 3.5. |
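The IdentityDict mentioned here keys entries on object identity rather than equality, which is what the marshal writer needs: only the *same object* may be replaced by a back-reference. A pure-Python sketch of the idea (the eventual C implementation is different and faster):

```python
class IdentityDict:
    """Map objects to values keyed by id(): only the very same object
    hits an existing entry, never a merely equal one."""
    def __init__(self):
        self._entries = {}

    def get(self, obj):
        entry = self._entries.get(id(obj))
        return entry[1] if entry is not None else None

    def add(self, obj, value):
        # Keep a reference to obj so its id() cannot be recycled
        # by a later allocation.
        self._entries[id(obj)] = (obj, value)

memo = IdentityDict()
x = float("1000.0")          # runtime construction avoids constant folding
memo.add(x, 0)
y = float("1000.0")          # equal value, but a different object
print(memo.get(x), memo.get(y))   # 0 None
```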
| msg211173 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2014-02-13 21:35 |
Here is a patch which special cases small integers. It decreases v3 and v4 dump time of bench.py by 10% without affecting load time. Of course in real cases the benefit will be much less. |
| msg211691 - (view) | Author: Larry Hastings (larry) *  | Date: 2014-02-20 06:05 |
This is not a release blocker. You guys can play with this for 3.5. FWIW I prefer Serhiy's approach. |
| msg234903 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2015-01-28 16:45 |
Here is a more general solution. For simple values (ints, floats, complex numbers, short strings) it is faster to use the value itself as a key than to create a new integer object (the id).

Without the patch:

data            ver. dumps(ms) loads(ms) size(KiB)
genData          2     103.9     186.4    4090.7
genData          3     250.3     196.8    4090.7
genData          4     223.5     182.5    3651.3
[1000]*10**6     2      98.6     134.8    4882.8
[1000]*10**6     3     491.1      62.2    4882.8
[1000]*10**6     4     494.9      62.1    4882.8
[1000.0]*10**6   2     173.5     158.4    8789.1
[1000.0]*10**6   3     494.8      62.2    4882.8
[1000.0]*10**6   4     491.4      62.8    4882.8
[1000.0j]*10**6  2     288.8     221.4   16601.6
[1000.0j]*10**6  3     493.6      62.4    4882.8
[1000.0j]*10**6  4     489.2      62.0    4882.8
20 pydecimals    2      85.0     114.7    3936.5
20 pydecimals    3      97.2      44.3    3373.4
20 pydecimals    4      86.2      40.0    3297.5

With marshal3_numbers.patch:

data            ver. dumps(ms) loads(ms) size(KiB)
genData          3     108.4     187.5    4090.7
genData          4      83.0     179.3    3651.3
[1000]*10**6     3     104.2     145.8    4882.8
[1000]*10**6     4     103.9     147.0    4882.8
[1000.0]*10**6   3     177.4     154.5    8789.1
[1000.0]*10**6   4     177.1     164.2    8789.1
[1000.0j]*10**6  3     501.6      61.1    4882.8
[1000.0j]*10**6  4     501.6      62.3    4882.8
20 pydecimals    3      95.2      41.9    3373.4
20 pydecimals    4      83.5      38.5    3297.5

With marshal_refs_by_value.patch:

data            ver. dumps(ms) loads(ms) size(KiB)
genData          3     150.4     197.0    4090.7
genData          4     122.1     184.8    3651.3
[1000]*10**6     3     169.1      62.3    4882.8
[1000]*10**6     4     169.2      62.2    4882.8
[1000.0]*10**6   3     313.5      62.2    4882.8
[1000.0]*10**6   4     312.8      62.3    4882.8
[1000.0j]*10**6  3     410.6      62.5    4882.8
[1000.0j]*10**6  4     410.5      62.3    4882.8
20 pydecimals    3      68.5      41.1    3373.4
20 pydecimals    4      57.5      39.8    3297.5

The marshal_refs_by_value.patch produces data as compact as the unpatched code does, but dumps faster. It usually dumps slower than marshal3_numbers.patch, but may produce smaller data and load faster. Surprisingly, it also dumps the code of the _pydecimal module faster. As a side effect, the patch can turn some simple equal values into identical objects. |
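The "refs by value" idea replaces the id()-keyed memo with a value key for simple immutables, so equal-but-distinct scalars collapse onto one entry. A sketch of such a keying rule (the function name and the short-string cutoff are hypothetical, not taken from the patch):

```python
def memo_key(obj):
    # Hypothetical keying rule: simple immutable scalars and short strings
    # are keyed by (type, value); everything else falls back to identity.
    # Including the type avoids collapsing 1 and 1.0, which compare equal.
    # (A real implementation must also distinguish 0.0 from -0.0.)
    if isinstance(obj, (int, float, complex)) or (
            isinstance(obj, str) and len(obj) < 20):
        return (type(obj), obj)
    return id(obj)

memo = {}
a = float("1000.0")
b = float("1000.0")           # equal value, different object
memo[memo_key(a)] = 0
print(memo_key(b) in memo)    # True: b reuses a's entry as a back-reference
```

This also explains the side effect Serhiy notes: once equal values share a memo entry, loading turns them into identical objects.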
| msg235050 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2015-01-30 19:43 |
Here are the results of a benchmark which measures the dump and load time for all pyc files in the stdlib (including tests). https://bitbucket.org/storchaka/cpython-stuff/src/default/marshal/marshalbench.py

$ find * -name __pycache__ -exec rm -rf '{}' +
$ ./python -m compileall -qq -r 99 Lib
$ find Lib -name '*.pyc' | xargs ./python marshalbench.py

Without the patch:

ver. dumps   loads    size
             770.3  19941.2
0    695.7  1178.4  19941.2
1    680.4  1180.9  19941.2
2    635.9  1115.9  19941.2
3    896.3   757.3  19941.2
4    748.0   700.6  19941.2

With marshal3_numbers.patch:

ver. dumps   loads    size
             750.5  19946.1
0    690.2  1173.7  19946.1
1    678.2  1166.6  19946.1
2    630.9  1105.2  19946.1
3    873.2   751.5  19946.1
4    724.6   687.6  19946.1

With marshal_refs_by_value.patch:

ver. dumps   loads    size
             780.2  19935.8
0    712.7  1202.6  19935.8
1    713.8  1195.1  19935.8
2    676.5  1126.0  19935.8
3    686.1   762.7  19935.8
4    515.1   697.6  19935.8 |
| msg235384 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2015-02-04 11:16 |
Here is a patch which adds a separate dict for interned strings (otherwise they can become uninterned) and another for bytes. It also slightly simplifies the code. |
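The reason interned strings need their own memo: the writer has to remember whether a string was emitted as interned, or the loader could hand back an uninterned copy. Interning itself is observable from Python via sys.intern:

```python
import sys

# Two equal strings built at runtime are distinct objects; sys.intern
# maps them onto one shared object. (The concatenation with str(42)
# prevents the compiler from folding both into the same constant.)
s1 = "marker_" + str(42)
s2 = "marker_" + str(42)
print(s1 is s2)                          # False: built separately
print(sys.intern(s1) is sys.intern(s2))  # True: one interned object
```

If marshal collapsed s1 and an uninterned equal string into one memo entry, a round trip could silently drop the interned property, which identity-sensitive code relies on.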
| msg235385 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2015-02-04 11:30 |
And here is an alternative patch which uses a hashtable. Both patches have about the same performance for *.pyc files, but marshal_hashtable.patch is much faster for duplicated values. Marshalling [1000]*10**6, [1000.0]*10**6 and [1000.0j]*10**6 with versions 3 and 4 is as fast as marshalling [1000]*10**6 with version 2 (i.e. 5 times faster than the current implementation).

data            ver. dumps(ms) loads(ms) size(KiB)
genData          2      99.9     188.9    4090.7
genData          3     148.2     189.1    4090.7
genData          4     121.4     177.4    3651.3
[1000]*10**6     2      97.7     131.6    4882.8
[1000]*10**6     3      95.1      63.1    4882.8
[1000]*10**6     4      95.1      64.4    4882.8
[1000.0]*10**6   2     172.9     153.5    8789.1
[1000.0]*10**6   3      97.4      61.9    4882.8
[1000.0]*10**6   4      95.7      61.6    4882.8
[1000.0j]*10**6  2     288.6     228.2   16601.6
[1000.0j]*10**6  3      94.9      61.6    4882.8
[1000.0j]*10**6  4      95.1      62.2    4882.8
20 pydecimals    2      88.0     111.4    3929.6
20 pydecimals    3      57.0      51.4    3368.5
20 pydecimals    4      46.6      39.9    3292.8 |
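The size effect of sharing duplicated values that these tables show is easy to reproduce at small scale: with references, every repeat of the same float shrinks from a 9-byte boxed value to a 5-byte back-reference.

```python
import marshal

data = [1000.0] * 1000       # one float object, repeated
v2 = marshal.dumps(data, 2)  # every element written in full (9 bytes each)
v4 = marshal.dumps(data, 4)  # first element in full, the rest as 5-byte refs
print(len(v2), len(v4))      # v4 is roughly 45% smaller here
restored = marshal.loads(v4)
```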
| msg235469 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2015-02-06 07:11 |
What are your thoughts, Victor? |
| msg235479 - (view) | Author: Antoine Pitrou (pitrou) *  | Date: 2015-02-06 12:39 |
Can you post the results of your hashtable patch with marshalbench.py? PS: we don't care about versions older than 4. |
| msg235485 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2015-02-06 14:11 |
Here are new results with the corrected marshalbench.py (more precise, and with the total size recalculated):

Without the patch:

ver. dumps   loads    size
             746.5  19971.2
0    669.1  1149.2  26202.9
1    682.0  1130.1  26202.9
2    652.0  1134.8  26202.4
3    920.8   757.9  21316.7
4    771.2   691.4  19971.2

With marshal3_numbers.patch:

3    894.6   778.4  21321.6
4    733.1   704.2  19976.2

With marshal_refs_by_value_3.patch:

3    687.6   777.7  21310.3
4    535.5   701.5  19966.3

With marshal_hashtable.patch:

3    656.0   764.2  21316.7
4    512.9   696.6  19971.2 |
| msg235713 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2015-02-10 22:00 |
The updated patch addresses Victor's comments. |
| msg235721 - (view) | Author: STINNER Victor (vstinner) *  | Date: 2015-02-11 00:09 |
Apart from the minor suggestion that I added on the review, the patch looks good to me. It's quite simple and makes dumps() 34% faster (for protocol 4). |
| msg235722 - (view) | Author: STINNER Victor (vstinner) *  | Date: 2015-02-11 00:09 |
(I didn't reproduce the benchmark; I just used Serhiy's numbers. I guess that using memcpy() doesn't make the code slower.) |
| msg235746 - (view) | Author: Roundup Robot (python-dev)  | Date: 2015-02-11 13:55 |
New changeset 012364df2712 by Serhiy Storchaka in branch 'default': Issue #20416: marshal.dumps() with protocols 3 and 4 is now 40-50% faster on https://hg.python.org/cpython/rev/012364df2712 |
| msg235753 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2015-02-11 15:53 |
Thanks for your review Antoine and Victor. |
|
| Date | User | Action | Args |
| 2022-04-11 14:57:57 | admin | set | github: 64615 |
| 2015-02-11 15:53:15 | serhiy.storchaka | set | status: open -> closed messages: + msg235753
assignee: serhiy.storchaka resolution: fixed stage: patch review -> resolved |
| 2015-02-11 13:55:58 | python-dev | set | nosy: + python-dev messages: + msg235746
|
| 2015-02-11 00:09:58 | vstinner | set | messages: + msg235722 |
| 2015-02-11 00:09:12 | vstinner | set | messages: + msg235721 |
| 2015-02-10 22:00:54 | serhiy.storchaka | set | files: + marshal_hashtable_2.patch
messages: + msg235713 |
| 2015-02-06 14:11:15 | serhiy.storchaka | set | messages: + msg235485 |
| 2015-02-06 12:39:25 | pitrou | set | messages: + msg235479 |
| 2015-02-06 07:11:58 | serhiy.storchaka | set | messages: + msg235469 title: Marshal: special case int and float, don't use references -> Marshal: performance regression with versions 3 and 4 |
| 2015-02-04 11:30:45 | serhiy.storchaka | set | files: + marshal_hashtable.patch
messages: + msg235385 |
| 2015-02-04 11:16:49 | serhiy.storchaka | set | files: + marshal_refs_by_value_3.patch
messages: + msg235384 |
| 2015-01-30 19:43:35 | serhiy.storchaka | set | messages: + msg235050 |
| 2015-01-28 16:46:03 | serhiy.storchaka | set | files: + bench_issue20416.py |
| 2015-01-28 16:45:22 | serhiy.storchaka | set | files: + marshal_refs_by_value.patch
messages: + msg234903 components: + Interpreter Core stage: patch review |
| 2014-02-20 06:05:12 | larry | set | priority: release blocker -> normal
messages: + msg211691 versions: + Python 3.5, - Python 3.4 |
| 2014-02-13 21:35:45 | serhiy.storchaka | set | files: + marshal_small_ints_refs.patch
messages: + msg211173 |
| 2014-02-13 19:29:29 | serhiy.storchaka | set | messages: + msg211164 |
| 2014-01-28 11:39:26 | pitrou | set | messages: + msg209535 |
| 2014-01-28 11:35:30 | loewis | set | messages: + msg209534 |
| 2014-01-28 11:34:43 | vstinner | set | messages: + msg209533 |
| 2014-01-28 10:54:50 | serhiy.storchaka | set | messages: + msg209532 |
| 2014-01-28 10:50:55 | serhiy.storchaka | set | nosy: + pitrou, kristjan.jonsson, serhiy.storchaka
|
| 2014-01-28 10:36:08 | vstinner | set | priority: normal -> release blocker nosy: + larry messages: + msg209531
|
| 2014-01-28 10:25:57 | vajrasky | set | nosy: + vajrasky messages: + msg209530
|
| 2014-01-28 10:20:21 | vstinner | set | messages: + msg209528 |
| 2014-01-28 10:19:31 | vstinner | set | messages: + msg209526 |
| 2014-01-28 10:14:45 | vstinner | set | nosy: + loewis
|
| 2014-01-28 10:14:40 | vstinner | set | files: + bench.py
messages: + msg209525 |
| 2014-01-28 10:10:48 | vstinner | create | |