Closed
Labels
Performance: Memory or execution speed performance
Strings: String extension data type and string data
Description
The regex-group-extraction functionality of match is being replaced by extract, but extract runs much slower when multiple groups are being extracted.
Here is some test code:
    import pandas as pd
    from datetime import datetime
    from pandas.util.print_versions import show_versions

    show_versions()

    test = pd.Series(['here is some sample text' for x in range(100000)])

    def test_regex(pattern):
        now = datetime.now()
        match_result = test.str.match(pattern)
        print "Using match:", datetime.now() - now
        now = datetime.now()
        extract_result = test.str.extract(pattern)
        print "Using extract:", datetime.now() - now

    print "SINGLE GROUP"
    test_regex('.*some (.).*')
    print "MULTIPLE GROUPS"
    test_regex('.*some (.)(.).*')

On my machine (running pandas: 0.13.0rc1-64-gceec8bf), this reports:
    SINGLE GROUP
    Using match: 0:00:00.090317
    Using extract: 0:00:00.116123
    MULTIPLE GROUPS
    Using match: 0:00:00.094432
    Using extract: 0:00:15.857041
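
For a rough baseline, here is a minimal sketch that does the same group extraction with only the standard library `re` module, so the cost of the matching itself can be compared with the timings above. It is written for Python 3, and `extract_groups` and `time_call` are hypothetical helper names for illustration, not part of pandas. It makes no claim about how `extract` is implemented internally.

```python
import re
import time


def extract_groups(strings, pattern):
    """Return one tuple of captured groups per input string.

    Strings that do not match yield a tuple of None values, one per group.
    """
    regex = re.compile(pattern)
    no_match = (None,) * regex.groups
    results = []
    for s in strings:
        m = regex.match(s)
        results.append(m.groups() if m else no_match)
    return results


def time_call(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Same sample data and patterns as in the report above.
sample = ['here is some sample text'] * 100000

single, single_seconds = time_call(extract_groups, sample, r'.*some (.).*')
print("Single group, plain re:", single_seconds)

multiple, multiple_seconds = time_call(extract_groups, sample, r'.*some (.)(.).*')
print("Multiple groups, plain re:", multiple_seconds)
```

If the plain `re` loop takes about the same time for one group as for two, that would suggest the slowdown comes from how the multi-group result is assembled rather than from the regex matching.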