-   Notifications  You must be signed in to change notification settings 
- Fork 35
Benchmarking Tools #64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm generally pretty happy with this! At the point where we want to pursue benchmarking other modules I think we should consider whether some of the common infrastructure can be hoisted out to a new installable module, but for now that's a YAGNI concern and I'm happy for everything to live here.
It'd be nice to have some extra commenting that explains why things are being done so that others who haven't taken part in this discussion can keep track of why certain things have been done the way they have (e.g. generating 100 random frames of a given type). However, otherwise, this is looking really good so far!
   bench/utils/__init__.py  Outdated    
 | self._watch = time.time() | ||
|  | ||
| def stop_watch(self): | ||
| stop = time.time() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So time.time() is potentially a bit tricky. Ideally we'd use this, or something equivalent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll look into getting that added.
   bench/utils/__init__.py  Outdated    
 | import time | ||
| import random | ||
|  | ||
| _RANDOM = random.Random(0) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty nervous about using 0 as a seed here: older versions of Python have problems with the MT whereby seeds that have long sequences of NULL bytes initially produce random output that is not evenly distributed for quite a while. It'd be better to use a 128-bit random seed generated from urandom on your own machine and just hard-code that into the file with a comment that says why we did it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're here, I'm not sure that having a single instance of the RNG that is seeded once actually makes that much sense. While nominally that makes results reproducible, they're only reproducible if you run the exact same suite of tests. Probably good enough, but worth noting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I used seed 0 for testing purposes, we can go with whatever seeding method you think is best. The biggest reason I saw for using a seeded RNG is for a better comparison between two identical benchmarks run on one machine rather than inter-machine similarities or similarities between two different sets of benchmarks so using a randomly generated key will be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have a better understanding of what you meant here, I will make the random generator be per-benchmark now rather than global. If this is not what you meant correct me for this is a small change.
| By the way, @python-hyper/contributors, if anyone has feedback or thoughts here please do share it. I'd like us to become much more aware of what is happening to our performance as time goes on, and that means making it as easy as possible to contribute benchmarks. =D | 
| @Lukasa So here is the train of though behind my design decisions for the benchmarking tool so far. The biggest design piece is using the factories for testing frames. Each factory is a function that generates a random frame and (hopefully) within a set of these frames would flex all the muscles that are available to the frame including all flags set and unset, all options set to a good full range of values, body lengths both short and long. These factories are used to generate our sample of frames that will either be serialized into bytes or serialized and then parsed into a new frame to exercise parsing and serializing frames. The main reason I used factories is because they are shared among both the parsing and serializing benchmarks and makes extending the benchmarks for each frame much easier and work is spread in very few places. When each parse/serialize benchmark is run, a sample of 100 frames is generated but many different parsings/serializings occur in a single benchmark. The reason that the sample is smaller is because generating one frame per serialize/parse is expensive and memory intensive so it's best to simply have a sample big enough to get a full range of distinct frames to get a good benchmark of the "average" frame. The next steps include making an artifact that will make it easier to compare one benchmark to another, especially in an automated way or a way that allows comparison between the master branch and your local copy in an automated way, but this will come after this PR goes through initially. I was thinking having a way to pass the other comparison benchmark in via command line and then the next comparison will report % differences and statistical significance of the differences in addition to raw values. I will add comments to the actual benchmarking tool but I also wanted to explain my thoughts here as well. Thank you for the reviews @Lukasa I look forward to working on this tool until it becomes big enough for it's own repo. ;) | 
| @Lukasa I also understand you will be taking a vacation soon so do not worry I am not in a hurry to get this PR merged into master. It is good to get things like this correct. :) | 
| Hello @python-hyper/contributors I have changed this benchmark to use pytest.benchmark fixture as it seems to already have a lot of good work done for us when it comes to comparing two benchmarks against different Python versions, different Github branches, etc. Let me know how you feel about this change? | 
| @Lukasa Welcome home from vacation! Mind re-reviewing the changes I've made? I switched to the py.test-benchmark fixture over my own. | 
| @SethMichaelLarson Yup, happy to do it, but will have to wait until tomorrow (my backlog was pretty massive!). | 
| @Lukasa Take your time, I saw your tweets about email backlog, haha :) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, some initial high level notes. This is looking good, though I haven't actually run it yet: want to address the tox/makefile changes first before I get too nitty-gritty.
| @@ -0,0 +1,19 @@ | |||
| from bench.utils.factories import FRAME_FACTORIES | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be better to use relative imports here, I think, rather than rely on the local directory being placed in sys.path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can change this.
|  | ||
|  | ||
| def get_bool(): | ||
| return True if get_int(0, 1) == 1 else False | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given your own performance notes, let's reduce the overhead of this a bit by replacing it with return bool(get_int(0, 1))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True if x else False is faster than bool(x) according to my notes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, ok then. =)
| if get_bool(): | ||
| frame.flags.add("END_HEADERS") | ||
| if get_bool(): | ||
| frame.flags.add("END_STREAM") | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is not quite right. You can have a HEADERS frame with END_STREAM set but not END_HEADERS: specifically, if that HEADERS frame is going to be followed by a CONTINUATION frame. CONTINUATION frames do not carry END_STREAM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh, I had them mixed up, I will change these around. Thanks for the clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated side note: I wish there was some way that all caps could appear less threatening. So many years of Internet have conditioned me to see all caps comments as angry and that is my mind's initial knee-jerk reaction until I read HEADERS haha :)
 You have done nothing wrong it was just a funny thing that I noticed. I know you would not get that livid about a PR. Although maybe you would, I dunno! ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, that's ok, keep doing this programming lark and you'll get to the mindset I have, where ALL_CAPS means COMPILE_TIME_CONSTANT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Lukasa Oh I've been programming a long time, I've only recently joined Github. Been at it for about 12 years now. ;) I understand the convention but apparently social rules come first mentally when reading comments haha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
;) Nice. Happily that's not a problem I have, though it may just be that I only spend time in the parts of the internet where all caps isn't used much.
| if get_bool(): | ||
| frame.flags.add("END_HEADERS") | ||
|  | ||
| if get_int(0, 10) != 0: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason the odds of putting padding on this are 9/10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wanted to test both padded and non-padded but I figured most frames with optional padding should be padded. Should I just increase this to 100% of frames with optional padding will be padded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we may as well. =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I will make this change.
| self.body_len = len(data) | ||
| self.data = ( | ||
| data[priority_data_length:len(data)-self.total_padding].tobytes() | ||
| data[priority_data_length:self.body_len-self.total_padding].tobytes() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did two bytes of indentation get stripped here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line length limits and being no good place to break the line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can save the extra characters by changing priority_data_length to priority_data_len.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good plan, I will make that change here and above and revert the white space.
| self.body_len = len(data) | ||
| self.data = ( | ||
| data[padding_data_length:len(data)-self.total_padding].tobytes() | ||
| data[padding_data_length:self.body_len-self.total_padding].tobytes() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did two bytes of indentation get stripped here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as below.
| py.test -n 4 --cov hyperframe test/ | ||
|  | ||
| bench: | ||
| python -m pytest bench/ --benchmark-only --benchmark-group-by=name --benchmark-autosave --benchmark-compare | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be really good to move this into a tox environment, and then to have the makefile delegate to tox. Saves people the effort of managing their virtual environments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can make this change.
|  | ||
| class TestSerializeBenchmarks: | ||
| def test_serialize_data_frame(self, benchmark): | ||
| benchmark(_serialize_function, _create_frames("data")) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that pytest is already in use elsewhere in the codebase, I wonder if it would be worth condensing this using @pytest.mark.parametrize, something like:
@pytest.mark.parametrize('frame_name', [  'data',  'priority',  ...,  'rst_stream' ]) def test_serialize_frame(frame_name, benchmark): benchmark(_serialize_function, _create_frames(frame_name))There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, I will look into this tonight.
Hello, I am opening this PR with the intent of not merging this work immediately, but rather to get feedback on how benchmarking of hyperframe, hpack, etc should be structured or put. If another repository is best then I will move all code there.
This current benchmarking tool is run using
make benchorpython setup.py bench. It runs all the classes that match the name^[^_].*Benchmark$in any modules in thebench/directory. The current output format is only graphical, in the future the output will be in a data format allowing an easy comparison between two different runs of the benchmarks.@Lukasa spoke about the possibility of this being automated in some way to keep our benchmarks up to date and to make it easier to make decisions regarding performance related improvements.
The output of a single benchmark run currently looks like this:
Constructive criticism is very much welcomed!