Skip to content

Conversation

@brandtbucher
Copy link
Member

@brandtbucher brandtbucher commented Nov 12, 2019

At @methane's request, here is an alternate PR that takes place in the compiler, not the peephole optimizer. It would replace both #16498 and #16878.

I think I like this one better.

https://bugs.python.org/issue38328

@brandtbucher
Copy link
Member Author

@benjaminp, care to take a look at this when you have a chance?

@methane
Copy link
Member

methane commented Nov 25, 2019

Would you merge master and make regen-all?

@brandtbucher
Copy link
Member Author

Thanks @methane. This branch is no longer stale.

Python/compile.c Outdated
{
Py_ssize_t n = asdl_seq_LEN(elts);
Py_ssize_t i, nsubitems = 0, nseen = 0;
if (n > 1 && are_all_items_const(elts, 0, n)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is n > 1 really good threshold? How about n > 2?

$ python3 -m timeit -s "a=frozenset((1,2))" -- "{*a}" 5000000 loops, best of 5: 63.4 nsec per loop $ python3 -m timeit "{1,2}" 5000000 loops, best of 5: 61.5 nsec per loop $ python3 -m timeit "[*(1,2)]" 5000000 loops, best of 5: 46.5 nsec per loop $ python3 -m timeit "[1,2]" 5000000 loops, best of 5: 46.6 nsec per loop 
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only chose n > 1 because it always emits fewer opcodes than before the change. If you think n > 2 is a better heuristic, I'm fine with that.

Copy link
Member Author

@brandtbucher brandtbucher Nov 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your first example is only slower because it pays the price of a LOAD_NAME instead of a LOAD_CONST (just a guess, though). It's not a direct comparison, unfortunately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fewer opcodes is not always more efficient.
BUILD_LIST_UNPACK calls PyList_New(0) and PyList_Extend(iterable). It causes over-allocation.
On the other hand, BUILD_LIST just calls PyList_New(n).

So I think we should be conservative about choosing the threshold.

Copy link
Member Author

@brandtbucher brandtbucher Nov 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @methane. I think this makes sense.

I've bumped the threshold to your recommended value of n > 2. Anything else?

@methane methane merged commit 6dd9b64 into python:master Nov 26, 2019
@brandtbucher
Copy link
Member Author

Thanks for the guidance and merge @methane!

jacobneiltaylor pushed a commit to jacobneiltaylor/cpython that referenced this pull request Dec 5, 2019
shihai1991 pushed a commit to shihai1991/cpython that referenced this pull request Jan 31, 2020
@brandtbucher brandtbucher deleted the compile-constant-collections branch July 21, 2022 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Performance or resource usage

4 participants