This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Created on 2014-01-28 10:10 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
marshal3_numbers.patch vstinner, 2014-01-28 10:10 review
bench.py vstinner, 2014-01-28 10:14
marshal_small_ints_refs.patch serhiy.storchaka, 2014-02-13 21:35 review
marshal_refs_by_value.patch serhiy.storchaka, 2015-01-28 16:45 review
bench_issue20416.py serhiy.storchaka, 2015-01-28 16:46
marshal_refs_by_value_3.patch serhiy.storchaka, 2015-02-04 11:16 review
marshal_hashtable.patch serhiy.storchaka, 2015-02-04 11:30 review
marshal_hashtable_2.patch serhiy.storchaka, 2015-02-10 22:00 review
Messages (25)
msg209524 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-28 10:10
Attached patched disables references for int and float types.
msg209525 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-28 10:14
Use attached bench.py to compare performances. Without the patch: --- dumps v0: 389.6 ms data size v0: 45582.9 kB loads v0: 573.3 ms dumps v1: 391.4 ms data size v1: 45582.9 kB loads v1: 558.0 ms dumps v2: 166.9 ms data size v2: 41395.4 kB loads v2: 482.2 ms dumps v3: 431.2 ms data size v3: 41395.4 kB loads v3: 543.8 ms dumps v4: 361.8 ms data size v4: 37000.9 kB loads v4: 560.4 ms --- With the patch: --- dumps v0: 391.4 ms data size v0: 45582.9 kB loads v0: 578.2 ms dumps v1: 392.3 ms data size v1: 45582.9 kB loads v1: 556.8 ms dumps v2: 167.7 ms data size v2: 41395.4 kB loads v2: 484.6 ms dumps v3: 170.3 ms data size v3: 41395.4 kB loads v3: 467.0 ms dumps v4: 122.8 ms data size v4: 37000.9 kB loads v4: 468.9 ms --- dumps v3 is 60% faster, loads v3 is also 14% *faster*. dumps v4 is 66% faster, loads v4 is 16% faster.
msg209526 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-28 10:19
Performance of Python 3.3.3+: --- dumps v0: 374.8 ms data size v0: 45582.9 kB loads v0: 625.3 ms dumps v1: 374.6 ms data size v1: 45582.9 kB loads v1: 605.1 ms dumps v2: 152.9 ms data size v2: 41395.4 kB loads v2: 556.5 ms --- So with the patch, the Python 3.4 default version (4) is *faster* (dump 20% faster, load 16% faster) and produces *smaller files* (10% smaller).
msg209528 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-28 10:20
Oh by the way, on Python 3.4, the file size (on version 3 and 4) is unchanged with my patch. Writing a reference produces takes exactly the same size than an integer.
msg209530 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2014-01-28 10:25
I am doing clinic conversion for marshal module so I am adding myself to nosy list to make sure both tickets are synchronized. http://bugs.python.org/issue20185
msg209531 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-28 10:36
> I am doing clinic conversion for marshal module so I am adding myself to nosy list to make sure both tickets are synchronized. The Derby is suspended until the release of Python 3.4 final. I consider this issue as an important performance regression that should be fixed before Python 3.4 final.
msg209532 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-01-28 10:54
Did you tested for numerous shared int and floats? [1000] * 1000000 and [1000.0] * 1000000? AFAIK this was important use cases for adding 3 or 4 versions.
msg209533 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-28 11:34
> Did you tested for numerous shared int and floats? [1000] * 1000000 and [1000.0] * 1000000? AFAIK this was important use cases for adding 3 or 4 versions. Here are new benchmarks on Python 3.4 with: Integers: [1000] * 1000000 Floats: [1000.0] * 1000000 Integers, without the patch: dumps v3: 62.8 ms data size v3: 4882.8 kB loads v3: 10.7 ms Integers, with the patch: dumps v3: 18.6 ms (-70%) data size v3: 4882.8 kB (same size) loads v3: 27.7 ms (+158%) Floats, without the patch: dumps v3: 62.5 ms data size v3: 4882.8 kB loads v3: 11.0 ms Floats, with the patch: dumps v3: 29.3 ms (-53%) data size v3: 8789.1 kB (+80%) loads v3: 25.5 ms (+132%) The version 3 was added by: --- changeset: 82816:01372117a5b4 user: Kristján Valur Jónsson <sweskman@gmail.com> date: Tue Mar 19 18:02:10 2013 -0700 files: Doc/library/marshal.rst Include/marshal.h Lib/test/test_marshal.py Misc/NEWS Python/marshal.c description: Issue #16475: Support object instancing, recursion and interned strings in marshal --- This issue tells about "sharing string constants, common tuples, even common code objects", not sharing numbers. For real data, here are interesting numbers: http://bugs.python.org/issue16475#msg176013 Integers only represent 4.8% of serialized data, and only 8.2% of these integers can be shared. (Floats represent 0.29%.) Whereas strings repsent 58% and 57% can be shared.
msg209534 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2014-01-28 11:35
For the record, format 3 was added through issue16475, format 4 was added through issue19219. In msg175962, Kristjan argued that there is no reason _not_ to share int objects, e.g. across multiple code objects. Now it seems that this argument is flawed: there is a reason, namely the performance impact. OTOH, I consider both use case (marshaling a large number of integers, and desiring to share ints across code objects) equally obscure: you shouldn't worry about marshal performance too much if you have loads of tiny int objects, and you shouldn't worry whether these ints get shared or not. As a compromise, we could suppress the sharing for small int objects, since they are singletons, anyway. This would allow marshal to preserve/copy the object graph, while not impacting the use case that the original poster on python-dev presented.
msg209535 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-01-28 11:39
> Integers, without the patch: > > dumps v3: 62.8 ms > data size v3: 4882.8 kB > loads v3: 10.7 ms > > Integers, with the patch: > > dumps v3: 18.6 ms (-70%) > data size v3: 4882.8 kB (same size) > loads v3: 27.7 ms (+158%) As I wrote on python-dev, dumps performance isn't important for the pyc use case, but loads performance is. Therefore it appears this patch goes into the wrong direction. You are also ignoring the *runtime* benefit of sharing objects: smaller memory footprint of the actual Python process.
msg211164 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-13 19:29
> As a compromise, we could suppress the sharing for small int objects, since they are singletons, anyway. This is implementation detail. But we can use more efficient way to memoizating small int objects. I also suppose than efficient C implementation of IdentityDict will significant decrease an overhead (may be 2 times). As far as this is probably a first case for which IdentityDict give significant benefit, I'll provide an implementation for 3.5.
msg211173 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-13 21:35
Here is a patch which special cases small integers. It decreases v3 and v4 dump time of bench.py by 10% without affecting load time. Of course in real cases the benefit will be much less.
msg211691 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-02-20 06:05
This is not a release blocker. You guys can play with this for 3.5. FWIW I prefer Serhiy's approach.
msg234903 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-28 16:45
Here is more general solution. For simple values (ints, floats, complex numbers, short strings) it is faster to use the value itself as a key than create new integer object (id). Without the patch: data ver. dumps(ms) loads(ms) size(KiB) genData 2 103.9 186.4 4090.7 genData 3 250.3 196.8 4090.7 genData 4 223.5 182.5 3651.3 [1000]*10**6 2 98.6 134.8 4882.8 [1000]*10**6 3 491.1 62.2 4882.8 [1000]*10**6 4 494.9 62.1 4882.8 [1000.0]*10**6 2 173.5 158.4 8789.1 [1000.0]*10**6 3 494.8 62.2 4882.8 [1000.0]*10**6 4 491.4 62.8 4882.8 [1000.0j]*10**6 2 288.8 221.4 16601.6 [1000.0j]*10**6 3 493.6 62.4 4882.8 [1000.0j]*10**6 4 489.2 62.0 4882.8 20 pydecimals 2 85.0 114.7 3936.5 20 pydecimals 3 97.2 44.3 3373.4 20 pydecimals 4 86.2 40.0 3297.5 With marshal3_numbers.patch: data ver. dumps(ms) loads(ms) size(KiB) genData 3 108.4 187.5 4090.7 genData 4 83.0 179.3 3651.3 [1000]*10**6 3 104.2 145.8 4882.8 [1000]*10**6 4 103.9 147.0 4882.8 [1000.0]*10**6 3 177.4 154.5 8789.1 [1000.0]*10**6 4 177.1 164.2 8789.1 [1000.0j]*10**6 3 501.6 61.1 4882.8 [1000.0j]*10**6 4 501.6 62.3 4882.8 20 pydecimals 3 95.2 41.9 3373.4 20 pydecimals 4 83.5 38.5 3297.5 With marshal_refs_by_value.patch: data ver. dumps(ms) loads(ms) size(KiB) genData 3 150.4 197.0 4090.7 genData 4 122.1 184.8 3651.3 [1000]*10**6 3 169.1 62.3 4882.8 [1000]*10**6 4 169.2 62.2 4882.8 [1000.0]*10**6 3 313.5 62.2 4882.8 [1000.0]*10**6 4 312.8 62.3 4882.8 [1000.0j]*10**6 3 410.6 62.5 4882.8 [1000.0j]*10**6 4 410.5 62.3 4882.8 20 pydecimals 3 68.5 41.1 3373.4 20 pydecimals 4 57.5 39.8 3297.5 The marshal_refs_by_value.patch produces data so compact as unpatched code does, but dumps faster. Usually it dumps slower than marshal3_numbers.patch, but may produce smaller data and loads faster. Surprisingly it dumps the code of the _pydecimal module faster. As side effect the patch can turn some simple equal values to identical objects.
msg235050 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-30 19:43
Here are results of the benchmark which measures dump and load time for all pyc files in the stdlib (including tests). https://bitbucket.org/storchaka/cpython-stuff/src/default/marshal/marshalbench.py $ find * -name __pycache__ -exec rm -rf '{}' + $ ./python -m compileall -qq -r 99 Lib $ find Lib -name '*.pyc' | xargs ./python marshalbench.py Without the patch: ver. dumps loads size 770.3 19941.2 0 695.7 1178.4 19941.2 1 680.4 1180.9 19941.2 2 635.9 1115.9 19941.2 3 896.3 757.3 19941.2 4 748.0 700.6 19941.2 With marshal3_numbers.patch: ver. dumps loads size 750.5 19946.1 0 690.2 1173.7 19946.1 1 678.2 1166.6 19946.1 2 630.9 1105.2 19946.1 3 873.2 751.5 19946.1 4 724.6 687.6 19946.1 With marshal_refs_by_value.patch: ver. dumps loads size 780.2 19935.8 0 712.7 1202.6 19935.8 1 713.8 1195.1 19935.8 2 676.5 1126.0 19935.8 3 686.1 762.7 19935.8 4 515.1 697.6 19935.8
msg235384 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-04 11:16
Here is a patch which adds separate dict for interned strings (otherwise they can be uninterned) and for bytes. It also slightly simplify the code.
msg235385 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-04 11:30
And here is alternative patch which uses a hashtable. Both patches have about the same performance for *.pyc files, but marshal_hashtable.patch is much faster for duplicated values. Marshalling [1000]*10**6, [1000.0]*10**6 and [1000.0j]*10**6 with version 3 an 4 is so fast as marshalling [1000]*10**6 with version 2 (i.e. 5 times faster than current implementation). data ver. dumps(ms) loads(ms) size(KiB) genData 2 99.9 188.9 4090.7 genData 3 148.2 189.1 4090.7 genData 4 121.4 177.4 3651.3 [1000]*10**6 2 97.7 131.6 4882.8 [1000]*10**6 3 95.1 63.1 4882.8 [1000]*10**6 4 95.1 64.4 4882.8 [1000.0]*10**6 2 172.9 153.5 8789.1 [1000.0]*10**6 3 97.4 61.9 4882.8 [1000.0]*10**6 4 95.7 61.6 4882.8 [1000.0j]*10**6 2 288.6 228.2 16601.6 [1000.0j]*10**6 3 94.9 61.6 4882.8 [1000.0j]*10**6 4 95.1 62.2 4882.8 20 pydecimals 2 88.0 111.4 3929.6 20 pydecimals 3 57.0 51.4 3368.5 20 pydecimals 4 46.6 39.9 3292.8
msg235469 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-06 07:11
What are your thoughts Victor?
msg235479 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-02-06 12:39
Can you post the results of your hashtable patch with marshalbench.py? PS: we don't care about versions older than 4.
msg235485 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-06 14:11
Here are new results with corrected marshalbench.py (more precise and with recalculated total size): Without the patch: ver. dumps loads size 746.5 19971.2 0 669.1 1149.2 26202.9 1 682.0 1130.1 26202.9 2 652.0 1134.8 26202.4 3 920.8 757.9 21316.7 4 771.2 691.4 19971.2 With marshal3_numbers.patch: 3 894.6 778.4 21321.6 4 733.1 704.2 19976.2 With marshal_refs_by_value_3.patch: 3 687.6 777.7 21310.3 4 535.5 701.5 19966.3 With marshal_hashtable.patch: 3 656.0 764.2 21316.7 4 512.9 696.6 19971.2
msg235713 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-10 22:00
Update patch addresses Victor's comments.
msg235721 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-11 00:09
Except of the minor suggestion that I added on the review, the patch looks good the me. It's quite simple and makes dumps() 34% faster (for protocol 4).
msg235722 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-11 00:09
(I didn't reproduce the benchmark, I just used Serhy numbers. I guess that using memcpy() doesn't make the code slower.)
msg235746 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-02-11 13:55
New changeset 012364df2712 by Serhiy Storchaka in branch 'default': Issue #20416: marshal.dumps() with protocols 3 and 4 is now 40-50% faster on https://hg.python.org/cpython/rev/012364df2712
msg235753 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-11 15:53
Thanks for your review Antoine and Victor.
History
Date User Action Args
2022-04-11 14:57:57adminsetgithub: 64615
2015-02-11 15:53:15serhiy.storchakasetstatus: open -> closed
messages: + msg235753

assignee: serhiy.storchaka
resolution: fixed
stage: patch review -> resolved
2015-02-11 13:55:58python-devsetnosy: + python-dev
messages: + msg235746
2015-02-11 00:09:58vstinnersetmessages: + msg235722
2015-02-11 00:09:12vstinnersetmessages: + msg235721
2015-02-10 22:00:54serhiy.storchakasetfiles: + marshal_hashtable_2.patch

messages: + msg235713
2015-02-06 14:11:15serhiy.storchakasetmessages: + msg235485
2015-02-06 12:39:25pitrousetmessages: + msg235479
2015-02-06 07:11:58serhiy.storchakasetmessages: + msg235469
title: Marshal: special case int and float, don't use references -> Marshal: performance regression with versions 3 and 4
2015-02-04 11:30:45serhiy.storchakasetfiles: + marshal_hashtable.patch

messages: + msg235385
2015-02-04 11:16:49serhiy.storchakasetfiles: + marshal_refs_by_value_3.patch

messages: + msg235384
2015-01-30 19:43:35serhiy.storchakasetmessages: + msg235050
2015-01-28 16:46:03serhiy.storchakasetfiles: + bench_issue20416.py
2015-01-28 16:45:22serhiy.storchakasetfiles: + marshal_refs_by_value.patch

messages: + msg234903
components: + Interpreter Core
stage: patch review
2014-02-20 06:05:12larrysetpriority: release blocker -> normal

messages: + msg211691
versions: + Python 3.5, - Python 3.4
2014-02-13 21:35:45serhiy.storchakasetfiles: + marshal_small_ints_refs.patch

messages: + msg211173
2014-02-13 19:29:29serhiy.storchakasetmessages: + msg211164
2014-01-28 11:39:26pitrousetmessages: + msg209535
2014-01-28 11:35:30loewissetmessages: + msg209534
2014-01-28 11:34:43vstinnersetmessages: + msg209533
2014-01-28 10:54:50serhiy.storchakasetmessages: + msg209532
2014-01-28 10:50:55serhiy.storchakasetnosy: + pitrou, kristjan.jonsson, serhiy.storchaka
2014-01-28 10:36:08vstinnersetpriority: normal -> release blocker
nosy: + larry
messages: + msg209531

2014-01-28 10:25:57vajraskysetnosy: + vajrasky
messages: + msg209530
2014-01-28 10:20:21vstinnersetmessages: + msg209528
2014-01-28 10:19:31vstinnersetmessages: + msg209526
2014-01-28 10:14:45vstinnersetnosy: + loewis
2014-01-28 10:14:40vstinnersetfiles: + bench.py

messages: + msg209525
2014-01-28 10:10:48vstinnercreate