Pandas's regular expression function result is so strange

cools0607 · Jun-05-2020, 09:20 AM

I am trying to write a regular expression for df1 (a DataFrame).
I want to remove the rows whose value in column 3 is related to NOPOP, NoPop or NONPOP.
To make the search quick, I set column 3 as the index of the DataFrame,
and then filtered it with df.filter and a regex.

import pandas as pd

k = [['a', 'b', 'c', 'NOPOP'],
     ['d', 'e', 'f', 'POP'],
     ['g', 'h', 'i', 'j'],
     ['k', 'l', 'm', 'Pop'],
     ['n', 'o', 'p', 'NoPop_AA'],
     ['q', 'r', 's', 'NONPOP']]
df_exp = pd.DataFrame(k)
df1 = df_exp.set_index([3])
df2 = df1.filter(regex='[^NOPOP]|[^NoPop]|[^NONPOP]', axis=0)

Output:
Out[263]:
          0  1  2
3
NOPOP     a  b  c
POP       d  e  f
j         g  h  i
Pop       k  l  m
NoPop_AA  n  o  p
NONPOP    q  r  s

The result still contains the rows related to NOPOP, NoPop and NONPOP. Why were they not removed?

My desired output is shown below.

Output:
     0  1  2
3
POP  d  e  f
j    g  h  i
Pop  k  l  m

***snippsat*** · Jun-05-2020, 11:56 AM

You can use str.contains for this.

import pandas as pd

k = [
    ["a", "b", "c", "NOPOP"],
    ["d", "e", "f", "POP"],
    ["g", "h", "i", "j"],
    ["k", "l", "m", "Pop"],
    ["n", "o", "p", "NoPop_AA"],
    ["q", "r", "s", "NONPOP"],
]
df_exp = pd.DataFrame(k)

>>> df_exp = df_exp[~df_exp[3].str.contains('NOPOP|NoPop|NONPOP')]
>>> df1 = df_exp.set_index([3])
>>> df1
     0  1  2
3
POP  d  e  f
j    g  h  i
Pop  k  l  m
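As for why the original df.filter call kept every row: square brackets in a regex define a character class, so '[^NOPOP]' means "one character that is not N, O or P", not "a string that is not NOPOP". Every index label contains at least one character that one of the three classes accepts, so every label matches. The sketch below shows this and, as one possible alternative that keeps the df.filter approach, uses a negative lookahead. The names original and unwanted are made up for the illustration.

```python
import re
import pandas as pd

k = [
    ["a", "b", "c", "NOPOP"],
    ["d", "e", "f", "POP"],
    ["g", "h", "i", "j"],
    ["k", "l", "m", "Pop"],
    ["n", "o", "p", "NoPop_AA"],
    ["q", "r", "s", "NONPOP"],
]
df1 = pd.DataFrame(k).set_index([3])

# Each [...] is a negated character class that matches ONE character.
# In 'NOPOP' the letter 'O' is accepted by [^NoPop] (which only excludes
# N, o, P and p), so even this label matches the original pattern.
original = re.compile('[^NOPOP]|[^NoPop]|[^NONPOP]')
print(bool(original.search('NOPOP')))

# Negative lookahead: match only labels that do not contain any of the
# unwanted substrings anywhere. df.filter keeps the labels that match.
unwanted = r'^(?!.*(?:NOPOP|NoPop|NONPOP))'
df2 = df1.filter(regex=unwanted, axis=0)
print(df2)
```

The str.contains solution above is easier to read; the lookahead is only worth it if you need to filter on the index directly.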

cools0607 · Jun-11-2020, 07:46 AM

Thank you for your quick reply. It works and achieves my goal.

cools0607 · Jun-12-2020, 09:35 AM

Sorry for another question.
I wonder whether .str.contains supports the same special syntax as the re module.
For example, '^AA' matches only strings that start with AA.

***snippsat*** · (This post was last modified: Jun-12-2020, 10:15 AM by snippsat.)

(Jun-12-2020, 09:35 AM)cools0607 Wrote: I wonder if .str.contains includes specified functions just like re module?

Yes str.contains can take regular expression patterns as in the re module.

Quote:For example: '^AA' expresses only searching words start with AA.

Yes, that would work. Pandas has a lot built in, so there is also str.startswith.
If you wonder whether something works, the best thing is to test it.

import pandas as pd

d = {
    'Quarters': ['quarter1', 'quarter2', 'quarter3', 'quarter4'],
    'Description': ['AA year', 'BB year', 'CC year', 'AA year'],
    'Revenue': [23.5, 54.6, 5.45, 41.87],
}
df = pd.DataFrame(d)

Test usage:

>>> df[df['Description'].str.contains(r'^AA')]
  Description  Quarters  Revenue
0     AA year  quarter1    23.50
3     AA year  quarter4    41.87
>>> df[df['Description'].str.contains(r'^AA|BB')]
  Description  Quarters  Revenue
0     AA year  quarter1    23.50
1     BB year  quarter2    54.60
3     AA year  quarter4    41.87
>>> # Using str.startswith
>>> df[df['Description'].str.startswith('AA')]
  Description  Quarters  Revenue
0     AA year  quarter1    23.50
3     AA year  quarter4    41.87
>>> df[df['Description'].str.startswith(('AA', 'BB'))]
  Description  Quarters  Revenue
0     AA year  quarter1    23.50
1     BB year  quarter2    54.60
3     AA year  quarter4    41.87
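One detail worth checking in the second test: in r'^AA|BB' the ^ anchor belongs only to the AA alternative, so BB is matched anywhere in the string. With the sample data the result looks the same because 'BB year' happens to start with BB. The small sketch below uses a made-up Series (the value 'CC BB' is added only to expose the difference) and groups the alternatives so that the anchor applies to both.

```python
import pandas as pd

# Hypothetical data: 'CC BB' contains BB but does not start with it.
s = pd.Series(['AA year', 'BB year', 'CC BB', 'AA year'])

# ^ binds only to AA here, so BB anywhere in the string is enough to match.
loose = s.str.contains(r'^AA|BB')
print(loose.tolist())   # [True, True, True, True]

# A non-capturing group makes the anchor apply to both alternatives.
anchored = s.str.contains(r'^(?:AA|BB)')
print(anchored.tolist())   # [True, True, False, True]

# regex=False treats the pattern as plain text rather than a regex.
literal = s.str.contains('BB', regex=False)
print(literal.tolist())   # [False, True, True, False]
```

If you only need prefix matching, str.startswith with a tuple, as shown above, avoids the question entirely.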

cools0607 · Jun-15-2020, 03:16 AM

Thank you for your reply. After trying your code, I got it. I think it is convenient for me to use .str.contains(r'^AA').

cools0607 · (This post was last modified: Jun-15-2020, 07:39 AM by cools0607.)

Sorry for another question.
I am searching a large amount of data imported from Excel. After loading the data into a list, I tried two methods:
1. Searching the list with the re module.
2. Converting the list to a DataFrame and then applying the .str.contains() method.
Both of them work, but the DataFrame approach is slower than the list approach. Is that expected?
PS: the Python console shows the user warning below.

UserWarning: This pattern has match groups. To actually get the groups, use str.extract.
  return func(self, *args, **kwargs)
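About the warning: pandas emits it when the pattern passed to str.contains has capturing parentheses, because str.contains only returns True/False and discards the groups. If the parentheses are there just for grouping, writing them as non-capturing groups, (?:...), should make the warning go away. The sketch below is a minimal check under that assumption; the pattern and the helper match_group_warnings are made up for the illustration, since the pattern that caused the warning was not posted.

```python
import warnings
import pandas as pd

s = pd.Series(['AA year', 'BB year', 'CC year'])

def match_group_warnings(pattern):
    """Run str.contains and return the 'match groups' warnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        s.str.contains(pattern)
    return [w for w in caught if 'match groups' in str(w.message)]

# Capturing group: pandas warns that the groups are not returned.
with_groups = match_group_warnings(r'^(AA|BB)')
print(len(with_groups))

# Non-capturing group: same matches, no match-group warning expected.
without_groups = match_group_warnings(r'^(?:AA|BB)')
print(len(without_groups))
```

If you do want the captured text rather than a boolean mask, str.extract is the method the warning points to.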
