PERF: Switch to try/except in extract() rather than instance check #8202

jtratner · 2014-09-07T09:41:37Z

See #7269 for more. This doesn't completely resolve the speed difference, but it gives a 10-15% improvement.

Timing improvement is there, but it's small.

In [20]: %timeit test.str.match(pattern2)
10 loops, best of 3: 170 ms per loop

In [21]: %timeit test.str.extract(pattern2)
10 loops, best of 3: 317 ms per loop

In [22]: %timeit -n try_except_str_extract2(test, pattern2)
10 loops, best of 3: 265 ms per loop

@jreback - this is very minor, but if you think this needs a release note,
where should I put it?

This passed locally for me, but if Travis fails I'll fix it up.

isinstance check takes longer but accomplishes the same thing.
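For readers who want to see the two styles side by side, here is a minimal sketch of the idea, not the actual pandas implementation. The function names `extract_isinstance` and `extract_try_except` are hypothetical, and the sketch works on a plain list rather than a Series. It relies only on the fact that a compiled `re` pattern raises `TypeError` when asked to search a non-string such as `None` or NaN.

```python
import re


def extract_isinstance(values, pattern):
    """Check the type of every element before matching (hypothetical helper)."""
    regex = re.compile(pattern)
    out = []
    for x in values:
        if not isinstance(x, str):
            # Non-strings (None, NaN, numbers) never match.
            out.append(None)
            continue
        m = regex.search(x)
        out.append(m.group(1) if m else None)
    return out


def extract_try_except(values, pattern):
    """Attempt the match and let the regex engine reject non-strings."""
    regex = re.compile(pattern)
    out = []
    for x in values:
        try:
            m = regex.search(x)
        except TypeError:
            # Raised by re for None, NaN and other non-string inputs.
            m = None
        out.append(m.group(1) if m else None)
    return out
```

Both functions return the same result for the same input. Which one is faster depends on how often the exception path is taken, so it needs to be measured on representative data.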

jreback · 2014-09-07T11:04:04Z

Is there a vbench for this? (I think yes; please post results.)
Release notes go in the v0.15.0 performance section.

jreback · 2014-09-07T13:02:00Z

This performs much worse on the vbench. Maybe worth testing for NaNs a priori: if there are a lot, use the current method; if not too many, use your method.

master

In [8]: %timeit many.str.extract(r'(\w*)matchthis(\w*)')
10 loops, best of 3: 62.6 ms per loop

0.14.1

In [6]: %timeit many.str.extract(r'(\w*)matchthis(\w*)')
10 loops, best of 3: 37.1 ms per loop

In [1]: import string

In [2]: import itertools as IT

In [3]: def make_series(letters, strlen, size):
   ...:     return Series(
   ...:         np.fromiter(IT.cycle(letters), count=size*strlen, dtype='|S1')
   ...:         .view('|S{}'.format(strlen)))
   ...:

In [4]: many = make_series('matchthis'+string.uppercase, strlen=19, size=10000) # 31% matches

In [5]: few = make_series('matchthis'+string.uppercase*42, strlen=19, size=10000) # 1% matches
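The suggestion to test for NaNs up front and pick a method accordingly could be sketched roughly as follows. This is an illustration under assumptions, not code from the PR: `extract_hybrid`, `_is_missing` and the `null_threshold` parameter are hypothetical names, the threshold value is arbitrary, and the input is a plain list rather than a Series.

```python
import re


def _is_missing(x):
    # None, or NaN (NaN is the only value that is not equal to itself).
    return x is None or x != x


def extract_hybrid(values, pattern, null_threshold=0.25):
    """Choose the isinstance or try/except path from the share of missing values."""
    regex = re.compile(pattern)
    values = list(values)
    n_missing = sum(1 for x in values if _is_missing(x))
    many_missing = bool(values) and n_missing / len(values) > null_threshold

    out = []
    if many_missing:
        # Many missing values: an explicit type check avoids raising often.
        for x in values:
            m = regex.search(x) if isinstance(x, str) else None
            out.append(m.groups() if m else None)
    else:
        # Few missing values: skip the check and catch the rare TypeError.
        for x in values:
            try:
                m = regex.search(x)
            except TypeError:
                m = None
            out.append(m.groups() if m else None)
    return out
```

The counting pass has its own cost, so whether this dispatch pays off would have to be confirmed by a benchmark on data with different proportions of missing values.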

jtratner · 2014-09-07T16:41:42Z

Sure, good call - I didn't check for a vbench. I'll play around with it.


jtratner · 2014-09-07T16:47:56Z

going to close this until I actually find a way to speed this up.

jtratner · 2014-09-07T18:23:52Z

@jreback I think the difference there has more to do with how many matches were found than with actual speed differences (when I run the two versions with the same data, I get almost imperceptible differences - not really sure why I found differences previously).

Is there a good way to write a vbench that generates the same random data each time? Can I seed the generator in the setup phase or something?

jreback · 2014-09-07T18:50:05Z

sure out in a np.random.seed(...)

(In fact there is an issue to do this for all of the benches, but it needs some validation.)
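As a rough illustration of seeding in the setup phase, the sketch below generates the same random strings on every run. The helper name `make_random_strings` and the default seed are assumptions made for this example; only `np.random.seed` comes from the discussion.

```python
import string

import numpy as np


def make_random_strings(size, strlen, seed=1234):
    """Return `size` random uppercase strings of length `strlen`, reproducibly."""
    # Seeding first makes every call with the same arguments yield the same data.
    np.random.seed(seed)
    letters = np.array(list(string.ascii_uppercase))
    idx = np.random.randint(0, len(letters), size=(size, strlen))
    return [''.join(row) for row in letters[idx]]
```

Calling `make_random_strings(10000, 19)` in a benchmark's setup code would then give both versions under comparison identical input, so differences in timing are not caused by differences in how many matches the data happens to contain.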

jreback · 2014-09-07T18:50:09Z

put

jtratner · 2014-09-07T19:13:15Z

yeah I just saw that

PERF: switch to faster try/except in extract()

b209a51

isinstance check takes longer but accomplishes the same thing.

jreback added the Performance and Strings labels Sep 7, 2014

jreback added this to the 0.15.0 milestone Sep 7, 2014

jreback modified the milestones: 0.15.1, 0.15.0 Sep 7, 2014

jtratner closed this Sep 7, 2014
