| msg209524 - (view) | Author: STINNER Victor (vstinner) *  | Date: 2014-01-28 10:10 |
The attached patch disables references for the int and float types. |
| msg209525 - (view) | Author: STINNER Victor (vstinner) *  | Date: 2014-01-28 10:14 |
Use the attached bench.py to compare performance.

Without the patch:
---
dumps v0: 389.6 ms
data size v0: 45582.9 kB
loads v0: 573.3 ms
dumps v1: 391.4 ms
data size v1: 45582.9 kB
loads v1: 558.0 ms
dumps v2: 166.9 ms
data size v2: 41395.4 kB
loads v2: 482.2 ms
dumps v3: 431.2 ms
data size v3: 41395.4 kB
loads v3: 543.8 ms
dumps v4: 361.8 ms
data size v4: 37000.9 kB
loads v4: 560.4 ms
---

With the patch:
---
dumps v0: 391.4 ms
data size v0: 45582.9 kB
loads v0: 578.2 ms
dumps v1: 392.3 ms
data size v1: 45582.9 kB
loads v1: 556.8 ms
dumps v2: 167.7 ms
data size v2: 41395.4 kB
loads v2: 484.6 ms
dumps v3: 170.3 ms
data size v3: 41395.4 kB
loads v3: 467.0 ms
dumps v4: 122.8 ms
data size v4: 37000.9 kB
loads v4: 468.9 ms
---

dumps v3 is 60% faster, and loads v3 is also 14% *faster*. dumps v4 is 66% faster, and loads v4 is 16% faster. |
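bench.py itself is attached to the tracker rather than quoted here; a minimal sketch of the kind of comparison it performs, using a made-up payload in place of its generated data (which is not shown):

```python
import marshal
import time

# Hypothetical payload standing in for bench.py's generated data.
data = {i: (float(i), str(i)) for i in range(10000)}

for version in range(marshal.version + 1):  # format versions 0..4 on 3.4+
    t0 = time.perf_counter()
    payload = marshal.dumps(data, version)
    t1 = time.perf_counter()
    restored = marshal.loads(payload)
    t2 = time.perf_counter()
    print("dumps v%d: %.1f ms  data size v%d: %.1f kB  loads v%d: %.1f ms"
          % (version, (t1 - t0) * 1e3,
             version, len(payload) / 1024.0,
             version, (t2 - t1) * 1e3))
```

The timings above come from a larger payload, so absolute numbers will differ; only the relative shape of the results is reproduced by this sketch.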
| msg209526 - (view) | Author: STINNER Victor (vstinner) *  | Date: 2014-01-28 10:19 |
Performance of Python 3.3.3+:
---
dumps v0: 374.8 ms
data size v0: 45582.9 kB
loads v0: 625.3 ms
dumps v1: 374.6 ms
data size v1: 45582.9 kB
loads v1: 605.1 ms
dumps v2: 152.9 ms
data size v2: 41395.4 kB
loads v2: 556.5 ms
---
So with the patch, the Python 3.4 default version (4) is *faster* (dump 20% faster, load 16% faster) and produces *smaller files* (10% smaller). |
| msg209528 - (view) | Author: STINNER Victor (vstinner) *  | Date: 2014-01-28 10:20 |
Oh, by the way: on Python 3.4, the file size (with versions 3 and 4) is unchanged by my patch. Writing a reference takes exactly the same size as writing an integer. |
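The size argument can be checked directly: a back-reference is one type byte plus a 4-byte index, and a boxed int is one type byte plus a 4-byte value, so sharing doesn't change the output width. This is an illustration of Victor's point, not part of the patch:

```python
import marshal

# The compiler merges the two equal constants into one int object, so with
# version 3 the second element is written as a back-reference. Since a
# reference (1 type byte + 4-byte index) is exactly as wide as a boxed int
# (1 type byte + 4-byte value), both versions serialize to the same length.
value = [1000, 1000]
v2 = marshal.dumps(value, 2)  # no references
v3 = marshal.dumps(value, 3)  # references enabled
print(len(v2), len(v3))
```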
| msg209530 - (view) | Author: Vajrasky Kok (vajrasky) * | Date: 2014-01-28 10:25 |
I am doing the clinic conversion for the marshal module, so I am adding myself to the nosy list to make sure both tickets stay synchronized. http://bugs.python.org/issue20185 |
| msg209531 - (view) | Author: STINNER Victor (vstinner) *  | Date: 2014-01-28 10:36 |
> I am doing the clinic conversion for the marshal module, so I am adding myself to the nosy list to make sure both tickets stay synchronized.

The Derby is suspended until the release of Python 3.4 final. I consider this issue an important performance regression that should be fixed before Python 3.4 final. |
| msg209532 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2014-01-28 10:54 |
Did you test with numerous shared ints and floats? [1000] * 1000000 and [1000.0] * 1000000? AFAIK these were important use cases for adding versions 3 and 4. |
| msg209533 - (view) | Author: STINNER Victor (vstinner) *  | Date: 2014-01-28 11:34 |
> Did you test with numerous shared ints and floats? [1000] * 1000000 and [1000.0] * 1000000? AFAIK these were important use cases for adding versions 3 and 4.

Here are new benchmarks on Python 3.4 with:

Integers: [1000] * 1000000
Floats: [1000.0] * 1000000

Integers, without the patch:

dumps v3: 62.8 ms
data size v3: 4882.8 kB
loads v3: 10.7 ms

Integers, with the patch:

dumps v3: 18.6 ms (-70%)
data size v3: 4882.8 kB (same size)
loads v3: 27.7 ms (+158%)

Floats, without the patch:

dumps v3: 62.5 ms
data size v3: 4882.8 kB
loads v3: 11.0 ms

Floats, with the patch:

dumps v3: 29.3 ms (-53%)
data size v3: 8789.1 kB (+80%)
loads v3: 25.5 ms (+132%)

Version 3 was added by:
---
changeset: 82816:01372117a5b4
user: Kristján Valur Jónsson <sweskman@gmail.com>
date: Tue Mar 19 18:02:10 2013 -0700
files: Doc/library/marshal.rst Include/marshal.h Lib/test/test_marshal.py Misc/NEWS Python/marshal.c
description: Issue #16475: Support object instancing, recursion and interned strings in marshal
---
That issue talks about "sharing string constants, common tuples, even common code objects", not about sharing numbers. For real data, here are interesting numbers: http://bugs.python.org/issue16475#msg176013 Integers only represent 4.8% of serialized data, and only 8.2% of these integers can be shared. (Floats represent 0.29%.) Strings, on the other hand, represent 58%, and 57% of them can be shared. |
| msg209534 - (view) | Author: Martin v. Löwis (loewis) *  | Date: 2014-01-28 11:35 |
For the record, format 3 was added through issue16475, and format 4 through issue19219. In msg175962, Kristjan argued that there is no reason _not_ to share int objects, e.g. across multiple code objects. Now it seems that this argument is flawed: there is a reason, namely the performance impact. OTOH, I consider both use cases (marshaling a large number of integers, and wanting to share ints across code objects) equally obscure: you shouldn't worry about marshal performance too much if you have loads of tiny int objects, and you shouldn't worry about whether those ints get shared or not. As a compromise, we could suppress the sharing for small int objects, since they are singletons anyway. This would allow marshal to preserve/copy the object graph, while not impacting the use case that the original poster on python-dev presented. |
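Martin's premise, that small ints are de facto singletons, rests on CPython's small-int cache (roughly -5..256). This is an implementation detail, not a language guarantee, but it is easy to observe:

```python
# CPython caches small ints (about -5..256) and hands out the same object
# every time; larger ints are allocated fresh. int(str) is used here to
# defeat compile-time constant folding, which would otherwise merge the
# literals into one object and hide the effect.
small_a = int("256")
small_b = int("256")
print(small_a is small_b)   # True: cached singleton

big_a = int("1000")
big_b = int("1000")
print(big_a is big_b)       # False: two distinct objects
```

Because small ints always come out of the cache, marshal references to them cannot change the loaded object graph, which is why suppressing sharing for them is safe.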
| msg209535 - (view) | Author: Antoine Pitrou (pitrou) *  | Date: 2014-01-28 11:39 |
> Integers, without the patch:
>
> dumps v3: 62.8 ms
> data size v3: 4882.8 kB
> loads v3: 10.7 ms
>
> Integers, with the patch:
>
> dumps v3: 18.6 ms (-70%)
> data size v3: 4882.8 kB (same size)
> loads v3: 27.7 ms (+158%)

As I wrote on python-dev, dumps performance isn't important for the pyc use case, but loads performance is. Therefore this patch appears to go in the wrong direction. You are also ignoring the *runtime* benefit of sharing objects: a smaller memory footprint for the actual Python process. |
| msg211164 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2014-02-13 19:29 |
> As a compromise, we could suppress the sharing for small int objects, since they are singletons, anyway.

That is an implementation detail. But we can use a more efficient way of memoizing small int objects. I also suppose that an efficient C implementation of IdentityDict would significantly decrease the overhead (maybe by a factor of 2). Since this is probably the first case in which an IdentityDict gives a significant benefit, I'll provide an implementation for 3.5. |
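The IdentityDict mentioned here keys entries on object identity rather than equality, which is what the marshal writer needs: only the *same object* may be replaced by a back-reference. A pure-Python sketch of the idea (the eventual C implementation is different and faster):

```python
class IdentityDict:
    """Map objects to values keyed by id(): only the very same object
    hits an existing entry, never a merely equal one."""
    def __init__(self):
        self._entries = {}

    def get(self, obj):
        entry = self._entries.get(id(obj))
        return entry[1] if entry is not None else None

    def add(self, obj, value):
        # Keep a reference to obj so its id() cannot be recycled
        # by a later allocation.
        self._entries[id(obj)] = (obj, value)

memo = IdentityDict()
x = float("1000.0")          # runtime construction avoids constant folding
memo.add(x, 0)
y = float("1000.0")          # equal value, but a different object
print(memo.get(x), memo.get(y))   # 0 None
```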
| msg211173 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2014-02-13 21:35 |
Here is a patch which special cases small integers. It decreases v3 and v4 dump time of bench.py by 10% without affecting load time. Of course in real cases the benefit will be much less. |
| msg211691 - (view) | Author: Larry Hastings (larry) *  | Date: 2014-02-20 06:05 |
This is not a release blocker. You guys can play with this for 3.5. FWIW I prefer Serhiy's approach. |
| msg234903 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2015-01-28 16:45 |
Here is a more general solution. For simple values (ints, floats, complex numbers, short strings) it is faster to use the value itself as a key than to create a new integer object (the id).

Without the patch:

data            ver. dumps(ms) loads(ms) size(KiB)
genData          2     103.9     186.4    4090.7
genData          3     250.3     196.8    4090.7
genData          4     223.5     182.5    3651.3
[1000]*10**6     2      98.6     134.8    4882.8
[1000]*10**6     3     491.1      62.2    4882.8
[1000]*10**6     4     494.9      62.1    4882.8
[1000.0]*10**6   2     173.5     158.4    8789.1
[1000.0]*10**6   3     494.8      62.2    4882.8
[1000.0]*10**6   4     491.4      62.8    4882.8
[1000.0j]*10**6  2     288.8     221.4   16601.6
[1000.0j]*10**6  3     493.6      62.4    4882.8
[1000.0j]*10**6  4     489.2      62.0    4882.8
20 pydecimals    2      85.0     114.7    3936.5
20 pydecimals    3      97.2      44.3    3373.4
20 pydecimals    4      86.2      40.0    3297.5

With marshal3_numbers.patch:

data            ver. dumps(ms) loads(ms) size(KiB)
genData          3     108.4     187.5    4090.7
genData          4      83.0     179.3    3651.3
[1000]*10**6     3     104.2     145.8    4882.8
[1000]*10**6     4     103.9     147.0    4882.8
[1000.0]*10**6   3     177.4     154.5    8789.1
[1000.0]*10**6   4     177.1     164.2    8789.1
[1000.0j]*10**6  3     501.6      61.1    4882.8
[1000.0j]*10**6  4     501.6      62.3    4882.8
20 pydecimals    3      95.2      41.9    3373.4
20 pydecimals    4      83.5      38.5    3297.5

With marshal_refs_by_value.patch:

data            ver. dumps(ms) loads(ms) size(KiB)
genData          3     150.4     197.0    4090.7
genData          4     122.1     184.8    3651.3
[1000]*10**6     3     169.1      62.3    4882.8
[1000]*10**6     4     169.2      62.2    4882.8
[1000.0]*10**6   3     313.5      62.2    4882.8
[1000.0]*10**6   4     312.8      62.3    4882.8
[1000.0j]*10**6  3     410.6      62.5    4882.8
[1000.0j]*10**6  4     410.5      62.3    4882.8
20 pydecimals    3      68.5      41.1    3373.4
20 pydecimals    4      57.5      39.8    3297.5

The marshal_refs_by_value.patch produces data as compact as the unpatched code does, but dumps faster. It usually dumps slower than marshal3_numbers.patch, but may produce smaller data and load faster. Surprisingly, it also dumps the code of the _pydecimal module faster. As a side effect, the patch can turn some simple equal values into identical objects. |
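The "refs by value" idea replaces the id()-keyed memo with a value key for simple immutables, so equal-but-distinct scalars collapse onto one entry. A sketch of such a keying rule (the function name and the short-string cutoff are hypothetical, not taken from the patch):

```python
def memo_key(obj):
    # Hypothetical keying rule: simple immutable scalars and short strings
    # are keyed by (type, value); everything else falls back to identity.
    # Including the type avoids collapsing 1 and 1.0, which compare equal.
    # (A real implementation must also distinguish 0.0 from -0.0.)
    if isinstance(obj, (int, float, complex)) or (
            isinstance(obj, str) and len(obj) < 20):
        return (type(obj), obj)
    return id(obj)

memo = {}
a = float("1000.0")
b = float("1000.0")           # equal value, different object
memo[memo_key(a)] = 0
print(memo_key(b) in memo)    # True: b reuses a's entry as a back-reference
```

This also explains the side effect Serhiy notes: once equal values share a memo entry, loading turns them into identical objects.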
| msg235050 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2015-01-30 19:43 |
Here are the results of a benchmark which measures the dump and load time for all pyc files in the stdlib (including tests). https://bitbucket.org/storchaka/cpython-stuff/src/default/marshal/marshalbench.py

$ find * -name __pycache__ -exec rm -rf '{}' +
$ ./python -m compileall -qq -r 99 Lib
$ find Lib -name '*.pyc' | xargs ./python marshalbench.py

Without the patch:

ver. dumps   loads    size
             770.3  19941.2
0    695.7  1178.4  19941.2
1    680.4  1180.9  19941.2
2    635.9  1115.9  19941.2
3    896.3   757.3  19941.2
4    748.0   700.6  19941.2

With marshal3_numbers.patch:

ver. dumps   loads    size
             750.5  19946.1
0    690.2  1173.7  19946.1
1    678.2  1166.6  19946.1
2    630.9  1105.2  19946.1
3    873.2   751.5  19946.1
4    724.6   687.6  19946.1

With marshal_refs_by_value.patch:

ver. dumps   loads    size
             780.2  19935.8
0    712.7  1202.6  19935.8
1    713.8  1195.1  19935.8
2    676.5  1126.0  19935.8
3    686.1   762.7  19935.8
4    515.1   697.6  19935.8 |
| msg235384 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2015-02-04 11:16 |
Here is a patch which adds a separate dict for interned strings (otherwise they can become uninterned) and another for bytes. It also slightly simplifies the code. |
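The reason interned strings need their own memo: the writer has to remember whether a string was emitted as interned, or the loader could hand back an uninterned copy. Interning itself is observable from Python via sys.intern:

```python
import sys

# Two equal strings built at runtime are distinct objects; sys.intern
# maps them onto one shared object. (The concatenation with str(42)
# prevents the compiler from folding both into the same constant.)
s1 = "marker_" + str(42)
s2 = "marker_" + str(42)
print(s1 is s2)                          # False: built separately
print(sys.intern(s1) is sys.intern(s2))  # True: one interned object
```

If marshal collapsed s1 and an uninterned equal string into one memo entry, a round trip could silently drop the interned property, which identity-sensitive code relies on.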
| msg235385 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2015-02-04 11:30 |
And here is an alternative patch which uses a hashtable. Both patches have about the same performance for *.pyc files, but marshal_hashtable.patch is much faster for duplicated values. Marshalling [1000]*10**6, [1000.0]*10**6 and [1000.0j]*10**6 with versions 3 and 4 is as fast as marshalling [1000]*10**6 with version 2 (i.e. 5 times faster than the current implementation).

data            ver. dumps(ms) loads(ms) size(KiB)
genData          2      99.9     188.9    4090.7
genData          3     148.2     189.1    4090.7
genData          4     121.4     177.4    3651.3
[1000]*10**6     2      97.7     131.6    4882.8
[1000]*10**6     3      95.1      63.1    4882.8
[1000]*10**6     4      95.1      64.4    4882.8
[1000.0]*10**6   2     172.9     153.5    8789.1
[1000.0]*10**6   3      97.4      61.9    4882.8
[1000.0]*10**6   4      95.7      61.6    4882.8
[1000.0j]*10**6  2     288.6     228.2   16601.6
[1000.0j]*10**6  3      94.9      61.6    4882.8
[1000.0j]*10**6  4      95.1      62.2    4882.8
20 pydecimals    2      88.0     111.4    3929.6
20 pydecimals    3      57.0      51.4    3368.5
20 pydecimals    4      46.6      39.9    3292.8 |
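The size effect of sharing duplicated values that these tables show is easy to reproduce at small scale: with references, every repeat of the same float shrinks from a 9-byte boxed value to a 5-byte back-reference.

```python
import marshal

data = [1000.0] * 1000       # one float object, repeated
v2 = marshal.dumps(data, 2)  # every element written in full (9 bytes each)
v4 = marshal.dumps(data, 4)  # first element in full, the rest as 5-byte refs
print(len(v2), len(v4))      # v4 is roughly 45% smaller here
restored = marshal.loads(v4)
```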
| msg235469 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2015-02-06 07:11 |
What are your thoughts, Victor? |
| msg235479 - (view) | Author: Antoine Pitrou (pitrou) *  | Date: 2015-02-06 12:39 |
Can you post the results of your hashtable patch with marshalbench.py? PS: we don't care about versions older than 4. |
| msg235485 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2015-02-06 14:11 |
Here are new results with the corrected marshalbench.py (more precise, and with the total size recalculated):

Without the patch:

ver. dumps   loads    size
             746.5  19971.2
0    669.1  1149.2  26202.9
1    682.0  1130.1  26202.9
2    652.0  1134.8  26202.4
3    920.8   757.9  21316.7
4    771.2   691.4  19971.2

With marshal3_numbers.patch:

3    894.6   778.4  21321.6
4    733.1   704.2  19976.2

With marshal_refs_by_value_3.patch:

3    687.6   777.7  21310.3
4    535.5   701.5  19966.3

With marshal_hashtable.patch:

3    656.0   764.2  21316.7
4    512.9   696.6  19971.2 |
| msg235713 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2015-02-10 22:00 |
The updated patch addresses Victor's comments. |
| msg235721 - (view) | Author: STINNER Victor (vstinner) *  | Date: 2015-02-11 00:09 |
Apart from the minor suggestion that I added on the review, the patch looks good to me. It's quite simple and makes dumps() 34% faster (for protocol 4). |
| msg235722 - (view) | Author: STINNER Victor (vstinner) *  | Date: 2015-02-11 00:09 |
(I didn't reproduce the benchmark; I just used Serhiy's numbers. I guess that using memcpy() doesn't make the code slower.) |
| msg235746 - (view) | Author: Roundup Robot (python-dev)  | Date: 2015-02-11 13:55 |
New changeset 012364df2712 by Serhiy Storchaka in branch 'default': Issue #20416: marshal.dumps() with protocols 3 and 4 is now 40-50% faster on https://hg.python.org/cpython/rev/012364df2712 |
| msg235753 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) *  | Date: 2015-02-11 15:53 |
Thanks for your review Antoine and Victor. |
|
| Date | User | Action | Args |
| 2022-04-11 14:57:57 | admin | set | github: 64615 |
| 2015-02-11 15:53:15 | serhiy.storchaka | set | status: open -> closed messages: + msg235753
assignee: serhiy.storchaka resolution: fixed stage: patch review -> resolved |
| 2015-02-11 13:55:58 | python-dev | set | nosy: + python-dev messages: + msg235746
|
| 2015-02-11 00:09:58 | vstinner | set | messages: + msg235722 |
| 2015-02-11 00:09:12 | vstinner | set | messages: + msg235721 |
| 2015-02-10 22:00:54 | serhiy.storchaka | set | files: + marshal_hashtable_2.patch
messages: + msg235713 |
| 2015-02-06 14:11:15 | serhiy.storchaka | set | messages: + msg235485 |
| 2015-02-06 12:39:25 | pitrou | set | messages: + msg235479 |
| 2015-02-06 07:11:58 | serhiy.storchaka | set | messages: + msg235469 title: Marshal: special case int and float, don't use references -> Marshal: performance regression with versions 3 and 4 |
| 2015-02-04 11:30:45 | serhiy.storchaka | set | files: + marshal_hashtable.patch
messages: + msg235385 |
| 2015-02-04 11:16:49 | serhiy.storchaka | set | files: + marshal_refs_by_value_3.patch
messages: + msg235384 |
| 2015-01-30 19:43:35 | serhiy.storchaka | set | messages: + msg235050 |
| 2015-01-28 16:46:03 | serhiy.storchaka | set | files: + bench_issue20416.py |
| 2015-01-28 16:45:22 | serhiy.storchaka | set | files: + marshal_refs_by_value.patch
messages: + msg234903 components: + Interpreter Core stage: patch review |
| 2014-02-20 06:05:12 | larry | set | priority: release blocker -> normal
messages: + msg211691 versions: + Python 3.5, - Python 3.4 |
| 2014-02-13 21:35:45 | serhiy.storchaka | set | files: + marshal_small_ints_refs.patch
messages: + msg211173 |
| 2014-02-13 19:29:29 | serhiy.storchaka | set | messages: + msg211164 |
| 2014-01-28 11:39:26 | pitrou | set | messages: + msg209535 |
| 2014-01-28 11:35:30 | loewis | set | messages: + msg209534 |
| 2014-01-28 11:34:43 | vstinner | set | messages: + msg209533 |
| 2014-01-28 10:54:50 | serhiy.storchaka | set | messages: + msg209532 |
| 2014-01-28 10:50:55 | serhiy.storchaka | set | nosy: + pitrou, kristjan.jonsson, serhiy.storchaka
|
| 2014-01-28 10:36:08 | vstinner | set | priority: normal -> release blocker nosy: + larry messages: + msg209531
|
| 2014-01-28 10:25:57 | vajrasky | set | nosy: + vajrasky messages: + msg209530
|
| 2014-01-28 10:20:21 | vstinner | set | messages: + msg209528 |
| 2014-01-28 10:19:31 | vstinner | set | messages: + msg209526 |
| 2014-01-28 10:14:45 | vstinner | set | nosy: + loewis
|
| 2014-01-28 10:14:40 | vstinner | set | files: + bench.py
messages: + msg209525 |
| 2014-01-28 10:10:48 | vstinner | create | |