Benchmarking Tools #64
So I'm generally pretty happy with this! At the point where we want to pursue benchmarking other modules I think we should consider whether some of the common infrastructure can be hoisted out to a new installable module, but for now that's a YAGNI concern and I'm happy for everything to live here.
It'd be nice to have some extra commenting that explains why things are being done, so that others who haven't taken part in this discussion can keep track of why certain things were done the way they were (e.g. generating 100 random frames of a given type). Otherwise, this is looking really good so far!
bench/utils/__init__.py Outdated
```python
        self._watch = time.time()

    def stop_watch(self):
        stop = time.time()
```
So time.time() is potentially a bit tricky. Ideally we'd use this, or something equivalent.
Sure, I'll look into getting that added.
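The link in the comment above isn't preserved here, but the usual recommendation is a monotonic, high-resolution clock such as `time.perf_counter()` on Python 3 (hence "or something equivalent" for older interpreters). A minimal sketch of what that swap might look like, purely as an illustration; the real stop-watch helper in `bench/utils/__init__.py` may be shaped differently:

```python
import time


class StopWatch:
    """Illustrative only; not the actual helper from the PR."""

    def start_watch(self):
        # perf_counter() is monotonic and high-resolution, so it is not
        # affected by system clock adjustments the way time.time() is.
        self._watch = time.perf_counter()

    def stop_watch(self):
        # Return the elapsed time in (fractional) seconds.
        return time.perf_counter() - self._watch
```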
bench/utils/__init__.py Outdated
```python
import time
import random

_RANDOM = random.Random(0)
```
I'm pretty nervous about using 0 as a seed here: older versions of Python have problems with the MT whereby seeds that have long sequences of NULL bytes initially produce random output that is not evenly distributed for quite a while. It'd be better to use a 128-bit random seed generated from urandom on your own machine and just hard-code that into the file with a comment that says why we did it.
While we're here, I'm not sure that having a single instance of the RNG that is seeded once actually makes that much sense. While nominally that makes results reproducible, they're only reproducible if you run the exact same suite of tests. Probably good enough, but worth noting.
Yes, I used seed 0 for testing purposes; we can go with whatever seeding method you think is best. The biggest reason I saw for using a seeded RNG is to get a better comparison between two identical benchmark runs on one machine, rather than similarity across machines or between two different sets of benchmarks, so using a randomly generated seed will be fine.
I think I have a better understanding of what you meant here; I will make the random generator per-benchmark now rather than global. If this is not what you meant, correct me, since this is a small change.
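A sketch of how the two suggestions could be combined: a hard-coded seed originally drawn from `os.urandom`, plus a fresh RNG per benchmark rather than one module-level instance. The seed value and helper name below are placeholders, not what was actually committed:

```python
import random

# Seed generated once on a development machine with
#     int.from_bytes(os.urandom(16), "big")
# and then hard-coded so benchmark runs stay reproducible. The value below
# is a placeholder, not the seed actually chosen in the PR.
_SEED = 0x9E3779B97F4A7C15F39CC0605CEDC834


def make_rng():
    """Return a fresh, deterministically seeded RNG for one benchmark."""
    return random.Random(_SEED)
```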
By the way, @python-hyper/contributors, if anyone has feedback or thoughts here please do share it. I'd like us to become much more aware of what is happening to our performance as time goes on, and that means making it as easy as possible to contribute benchmarks. =D
@Lukasa So here is the train of thought behind my design decisions for the benchmarking tool so far. The biggest design piece is using factories for the test frames. Each factory is a function that generates a random frame, and (hopefully) a set of these frames flexes all the muscles available to that frame: all flags set and unset, all options covering a good range of values, and body lengths both short and long. These factories generate the sample of frames that is either serialized into bytes, or serialized and then parsed into a new frame, to exercise both serializing and parsing. The main reason I used factories is that they are shared between the parsing and serializing benchmarks, which makes extending the benchmarks for each frame much easier and keeps the work in very few places.

When each parse/serialize benchmark runs, a sample of 100 frames is generated, but many parse/serialize operations occur within a single benchmark. The sample is kept small because generating one frame per operation is expensive and memory intensive, so it's best to have a sample just big enough to cover a full range of distinct frames and get a good benchmark of the "average" frame.

The next steps include an artifact that makes it easier to compare one benchmark run to another, especially in an automated way, or a way that allows automated comparison between the master branch and your local copy, but this will come after this PR goes through initially. I was thinking of passing the other benchmark in via the command line, with the comparison then reporting percentage differences and the statistical significance of those differences in addition to raw values.

I will add comments to the actual benchmarking tool, but I also wanted to explain my thoughts here. Thank you for the reviews @Lukasa, I look forward to working on this tool until it becomes big enough for its own repo. ;)
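For readers who haven't seen the diff, here is a schematic of the factory idea described above. Only `FRAME_FACTORIES` and `_create_frames` appear in the actual diff excerpts; the helper functions and the single `data` factory below are simplified stand-ins for the real code in `bench/utils/factories.py`, which covers every frame type:

```python
import random

from hyperframe.frame import DataFrame

_RNG = random.Random()


def get_int(low, high):
    return _RNG.randint(low, high)


def get_bool():
    return bool(get_int(0, 1))


def get_bytes(length):
    return bytes(get_int(0, 255) for _ in range(length))


def data_frame_factory():
    """Build one randomised DataFrame; flags and body length vary per call."""
    frame = DataFrame(stream_id=get_int(1, 2**31 - 1))
    frame.data = get_bytes(get_int(0, 2**14))  # bodies both short and long
    if get_bool():
        frame.flags.add("END_STREAM")
    return frame


# Shared by both the serialize and the parse benchmarks.
FRAME_FACTORIES = {"data": data_frame_factory}


def _create_frames(name, count=100):
    """Generate the sample of 100 frames used by a single benchmark."""
    return [FRAME_FACTORIES[name]() for _ in range(count)]
```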
@Lukasa I also understand you will be taking a vacation soon, so do not worry, I am not in a hurry to get this PR merged into master. It is good to get things like this correct. :)
Hello @python-hyper/contributors, I have changed this benchmark to use the pytest-benchmark fixture, as it already has a lot of good work done for us when it comes to comparing two benchmarks across different Python versions, different GitHub branches, etc. Let me know how you feel about this change.
@Lukasa Welcome home from vacation! Mind re-reviewing the changes I've made? I switched to the py.test-benchmark fixture over my own.
@SethMichaelLarson Yup, happy to do it, but will have to wait until tomorrow (my backlog was pretty massive!).
@Lukasa Take your time, I saw your tweets about email backlog, haha :)
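For context, pytest-benchmark covers the save-and-compare workflow discussed above out of the box. Roughly, using the same flags that appear in the Makefile excerpt further down (here `0001` stands for whichever autosaved run you want to compare against):

```sh
# Save a baseline run (e.g. on master), then compare a later run against it.
py.test bench/ --benchmark-only --benchmark-autosave
py.test bench/ --benchmark-only --benchmark-compare=0001
```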
Ok, some initial high level notes. This is looking good, though I haven't actually run it yet: want to address the tox/makefile changes first before I get too nitty-gritty.
```
@@ -0,0 +1,19 @@
from bench.utils.factories import FRAME_FACTORIES
```
It'd be better to use relative imports here, I think, rather than rely on the local directory being placed in sys.path.
Sure, I can change this.
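For example, assuming the benchmark module sits directly inside the `bench/` package, the absolute import above could become a relative one:

```python
from .utils.factories import FRAME_FACTORIES
```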
```python
def get_bool():
    return True if get_int(0, 1) == 1 else False
```
Given your own performance notes, let's reduce the overhead of this a bit by replacing it with `return bool(get_int(0, 1))`.
`True if x else False` is faster than `bool(x)` according to my notes.
Huh, ok then. =)
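Anyone curious can check the two spellings on their own interpreter with `timeit`; the result varies by Python version and machine, so neither claim holds universally:

```python
import timeit

# x is 0 or 1, mirroring get_int(0, 1) in the snippet above; smaller is faster.
print(timeit.timeit("True if x == 1 else False", setup="x = 1"))
print(timeit.timeit("bool(x)", setup="x = 1"))
```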
```python
if get_bool():
    frame.flags.add("END_HEADERS")
if get_bool():
    frame.flags.add("END_STREAM")
```
This logic is not quite right. You can have a HEADERS frame with END_STREAM set but not END_HEADERS: specifically, if that HEADERS frame is going to be followed by a CONTINUATION frame. CONTINUATION frames do not carry END_STREAM.
Ahhh, I had them mixed up; I will change these around. Thanks for the clarification.
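For anyone following along, here is a sketch of flag generation that is consistent with the rule quoted above; the helper and function names are illustrative stand-ins, not the PR's actual factory code:

```python
import random


def get_bool():
    # Stand-in for the factory helper of the same name.
    return random.random() < 0.5


def headers_frame_flags():
    flags = set()
    if get_bool():
        flags.add("END_HEADERS")
    # END_STREAM may be set independently of END_HEADERS: a HEADERS frame
    # without END_HEADERS is simply followed by CONTINUATION frames.
    if get_bool():
        flags.add("END_STREAM")
    return flags


def continuation_frame_flags():
    # CONTINUATION frames never carry END_STREAM, only (possibly) END_HEADERS.
    return {"END_HEADERS"} if get_bool() else set()
```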
Unrelated side note: I wish there were some way that all caps could appear less threatening. So many years of the Internet have conditioned me to see all-caps comments as angry, and that is my mind's initial knee-jerk reaction until I read HEADERS, haha :)
You have done nothing wrong; it was just a funny thing that I noticed. I know you would not get that livid about a PR. Although maybe you would, I dunno! ;)
Heh, that's ok, keep doing this programming lark and you'll get to the mindset I have, where ALL_CAPS means COMPILE_TIME_CONSTANT.
@Lukasa Oh I've been programming a long time, I've only recently joined Github. Been at it for about 12 years now. ;) I understand the convention but apparently social rules come first mentally when reading comments haha
;) Nice. Happily that's not a problem I have, though it may just be that I only spend time in the parts of the internet where all caps isn't used much.
```python
if get_bool():
    frame.flags.add("END_HEADERS")

if get_int(0, 10) != 0:
```
Any reason the odds of putting padding on this are 9/10?
Wanted to test both padded and non-padded, but I figured most frames with optional padding should be padded. Should I just increase this so that 100% of frames with optional padding are padded?
Yeah, we may as well. =)
Sounds good, I will make this change.
```diff
         self.body_len = len(data)
         self.data = (
-            data[priority_data_length:len(data)-self.total_padding].tobytes()
+          data[priority_data_length:self.body_len-self.total_padding].tobytes()
```
Why did two bytes of indentation get stripped here?
Line-length limits, and there being no good place to break the line.
We can save the extra characters by changing `priority_data_length` to `priority_data_len`.
Good plan, I will make that change here and above and revert the white space.
```diff
         self.body_len = len(data)
         self.data = (
-            data[padding_data_length:len(data)-self.total_padding].tobytes()
+          data[padding_data_length:self.body_len-self.total_padding].tobytes()
```
Why did two bytes of indentation get stripped here?
Same as above.
```make
	py.test -n 4 --cov hyperframe test/

bench:
	python -m pytest bench/ --benchmark-only --benchmark-group-by=name --benchmark-autosave --benchmark-compare
```
It would be really good to move this into a tox environment, and then to have the makefile delegate to tox. Saves people the effort of managing their virtual environments.
Sure, I can make this change.
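A sketch of what the suggested tox delegation might look like; the environment name and dependency list are guesses, not the configuration that was actually committed. The Makefile's `bench` target would then reduce to `tox -e bench`:

```ini
[testenv:bench]
deps =
    pytest
    pytest-benchmark
commands =
    py.test {toxinidir}/bench/ --benchmark-only
```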
```python
class TestSerializeBenchmarks:
    def test_serialize_data_frame(self, benchmark):
        benchmark(_serialize_function, _create_frames("data"))
```
Given that pytest is already in use elsewhere in the codebase, I wonder if it would be worth condensing this using @pytest.mark.parametrize, something like:
```python
@pytest.mark.parametrize('frame_name', [
    'data', 'priority', ..., 'rst_stream'
])
def test_serialize_frame(frame_name, benchmark):
    benchmark(_serialize_function, _create_frames(frame_name))
```
Probably, I will look into this tonight.
Hello, I am opening this PR with the intent of not merging this work immediately, but rather to get feedback on how benchmarking of hyperframe, hpack, etc. should be structured and where it should live. If another repository is best then I will move all the code there.
This current benchmarking tool is run using `make bench` or `python setup.py bench`. It runs all the classes that match the name `^[^_].*Benchmark$` in any modules in the `bench/` directory. The current output format is only graphical; in the future the output will be in a data format allowing an easy comparison between two different runs of the benchmarks. @Lukasa spoke about the possibility of this being automated in some way to keep our benchmarks up to date and to make it easier to make decisions regarding performance-related improvements.
The output of a single benchmark run currently looks like this:
Constructive criticism is very much welcomed!