cools0607 (Posts: 11 Threads: 4 Joined: Jun 2020)

I am trying to write a regular expression for df1 (a DataFrame). I want to remove the rows whose value in column 3 contains NOPOP, NoPop, or NONPOP. To make the search quick, I set column 3 as the index of the DataFrame and then used df.filter with a regex.

    import pandas as pd
    k = [['a','b','c','NOPOP'], ['d','e','f','POP'], ['g','h','i','j'],
         ['k','l','m','Pop'], ['n','o','p','NoPop_AA'], ['q','r','s','NONPOP']]
    df_exp = pd.DataFrame(k)
    df1 = df_exp.set_index([3])
    df2 = df1.filter(regex='[^NOPOP]|[^NoPop]|[^NONPOP]', axis=0)

Output:

    Out[263]:
              0  1  2
    3
    NOPOP     a  b  c
    POP       d  e  f
    j         g  h  i
    Pop       k  l  m
    NoPop_AA  n  o  p
    NONPOP    q  r  s
The result did not remove the rows related to NOPOP, NoPop, and NONPOP. Why not? My desired output is shown below.

Output:

         0  1  2
    3
    POP  d  e  f
    j    g  h  i
    Pop  k  l  m

snippsat (Posts: 7,398 Threads: 123 Joined: Sep 2016), Jun-05-2020, 11:56 AM

You can use str.contains for this.

    import pandas as pd

    k = [
        ["a", "b", "c", "NOPOP"],
        ["d", "e", "f", "POP"],
        ["g", "h", "i", "j"],
        ["k", "l", "m", "Pop"],
        ["n", "o", "p", "NoPop_AA"],
        ["q", "r", "s", "NONPOP"],
    ]
    df_exp = pd.DataFrame(k)

    >>> df_exp = df_exp[~df_exp[3].str.contains('NOPOP|NoPop|NONPOP')]
    >>> df1 = df_exp.set_index([3])
    >>> df1
         0  1  2
    3
    POP  d  e  f
    j    g  h  i
    Pop  k  l  m

cools0607

Thank you for your quick reply. It works and achieves my goal.

cools0607

Sorry, another question. I wonder if .str.contains supports the same pattern syntax as the re module? For example, '^AA' matches only words that start with AA.

snippsat, Jun-12-2020, 10:14 AM (This post was last modified: Jun-12-2020, 10:15 AM by snippsat.)

(Jun-12-2020, 09:35 AM) cools0607 wrote: I wonder if .str.contains supports the same pattern syntax as the re module?
Yes, str.contains accepts regular expression patterns, as in the re module.

Quote: For example, '^AA' matches only words that start with AA.

Yes, that would work. Pandas has a lot built in, so there is also str.startswith. If you wonder whether something works, it is best to run a test.

    import pandas as pd

    d = {
        'Quarters' : ['quarter1','quarter2','quarter3','quarter4'],
        'Description': ['AA year', 'BB year', 'CC year', 'AA year'],
        'Revenue': [23.5, 54.6, 5.45, 41.87]
    }
    df = pd.DataFrame(d)

Test usage:

    >>> df[df['Description'].str.contains(r'^AA')]
      Description  Quarters  Revenue
    0     AA year  quarter1    23.50
    3     AA year  quarter4    41.87

    >>> df[df['Description'].str.contains(r'^AA|BB')]
      Description  Quarters  Revenue
    0     AA year  quarter1    23.50
    1     BB year  quarter2    54.60
    3     AA year  quarter4    41.87

    >>> # Using str.startswith
    >>> df[df['Description'].str.startswith('AA')]
      Description  Quarters  Revenue
    0     AA year  quarter1    23.50
    3     AA year  quarter4    41.87

    >>> df[df['Description'].str.startswith(('AA', 'BB'))]
      Description  Quarters  Revenue
    0     AA year  quarter1    23.50
    1     BB year  quarter2    54.60
    3     AA year  quarter4    41.87

cools0607

Thank you for your reply. After trying your code, I got it. I think .str.contains(r'^AA') is the most convenient for me.
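Coming back to the first question in this thread (why did the df.filter pattern keep every row?): square brackets define a character class, so [^NOPOP] means "any single character other than N, O, or P", not "does not contain the word NOPOP". The pandas documentation describes DataFrame.filter(regex=...) as keeping the labels for which re.search finds a match, and every label in the example has at least one character that one of the three classes accepts (the O in NOPOP is allowed by [^NoPop], for instance). Here is a minimal sketch that reproduces this and shows two possible fixes. The names pattern_bad, pattern_ok, kept_filter, and kept_mask are mine, not from the posts above.

```python
import re

import pandas as pd

k = [
    ["a", "b", "c", "NOPOP"],
    ["d", "e", "f", "POP"],
    ["g", "h", "i", "j"],
    ["k", "l", "m", "Pop"],
    ["n", "o", "p", "NoPop_AA"],
    ["q", "r", "s", "NONPOP"],
]
df1 = pd.DataFrame(k).set_index([3])

# Original pattern: three negated character classes joined by "|".
pattern_bad = r"[^NOPOP]|[^NoPop]|[^NONPOP]"

# Each label contains at least one character accepted by one of the classes,
# so re.search succeeds for all of them and no row is dropped.
matches_bad = [bool(re.search(pattern_bad, label)) for label in df1.index]
print(matches_bad)  # [True, True, True, True, True, True]

# Possible fix 1: a negative lookahead rejects any label that contains
# one of the unwanted words, so df.filter keeps only the other labels.
pattern_ok = r"^(?!.*(?:NOPOP|NoPop|NONPOP))"
kept_filter = df1.filter(regex=pattern_ok, axis=0)
print(list(kept_filter.index))  # ['POP', 'j', 'Pop']

# Possible fix 2: build a boolean mask from the index and invert it,
# the same idea as the str.contains answer but applied to the index.
mask = df1.index.str.contains("NOPOP|NoPop|NONPOP")
kept_mask = df1[~mask]
print(list(kept_mask.index))  # ['POP', 'j', 'Pop']
```

Both fixes keep the index in place, so they fit the original plan of searching on the index rather than on a column.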
cools0607, Jun-15-2020, 07:34 AM (This post was last modified: Jun-15-2020, 07:39 AM by cools0607.)

Sorry, another question. I tried to search a lot of data from Excel. After importing the data into a list, I tried two methods:

1. Searching the list with the re module.
2. Converting the list to a DataFrame and then applying the .str.contains() method.

Both of them work, but the DataFrame method is slower than the list method. Is that reasonable?

PS: The Python console shows the user warning below.

    UserWarning: This pattern has match groups. To actually get the groups, use str.extract.
      return func(self, *args, **kwargs)
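On the last question: the UserWarning is not an error. str.contains emits it when the pattern contains a capturing group, that is, plain parentheses such as (AA|BB). The pattern that triggered it is not shown in the post, so the one below is only a guess at its shape. Writing the group as non-capturing, (?:...), should give the same matches without the warning. Parentheses are still worth having in a pattern like ^AA|BB, because without them the ^ applies only to AA. As for speed, on ordinary object-dtype columns str.contains still applies the regex to one element at a time, so I would not expect it to beat a list comprehension with a compiled pattern by much, if at all. The timing lines are a rough sketch, and the numbers will vary by machine and pandas version. The names words, series, and the three patterns are made up for illustration.

```python
import re
import timeit
import warnings

import pandas as pd

# Hypothetical data standing in for the Excel import.
words = ["AA year", "BB year", "CC year", "x BB", "AA year"] * 2000
series = pd.Series(words)

capturing = r"^(AA|BB)"        # plain parentheses: one capturing group
non_capturing = r"^(?:AA|BB)"  # same matches, no capturing group
unanchored = r"^AA|BB"         # ^ binds to AA only; BB may appear anywhere

# Number of capturing groups in each pattern.
print(re.compile(capturing).groups)
print(re.compile(non_capturing).groups)

with warnings.catch_warnings():
    # Silence the warning only for this comparison.
    warnings.simplefilter("ignore", UserWarning)
    with_groups = series.str.contains(capturing)
without_groups = series.str.contains(non_capturing)
loose = series.str.contains(unanchored)

# "x BB" is matched by the unanchored pattern but not by the grouped one.
print(bool(loose[3]), bool(without_groups[3]))

# Rough timing: plain list + compiled re versus pandas str.contains.
compiled = re.compile(non_capturing)
t_list = timeit.timeit(
    lambda: [bool(compiled.search(w)) for w in words], number=20
)
t_pandas = timeit.timeit(
    lambda: series.str.contains(non_capturing), number=20
)
print(f"list + re: {t_list:.4f}s, str.contains: {t_pandas:.4f}s")
```

If the only goal is a prefix test, str.startswith(('AA', 'BB')) from the earlier answer avoids the regex, and the warning, altogether.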