Commit f07f22a

Merge pull request #3 from StatguyUser/0.0.4
Optimized code for faster processing
2 parents 3505996 + 8d1ad6f commit f07f22a

8 files changed: 10 additions, 6 deletions
build/lib/TextFeatureSelection.py

Lines changed: 4 additions & 2 deletions
@@ -164,11 +164,13 @@ def getvalues_multiclass(self,unique_words,calc_df):
 label=[]
 word_presence=[]
 
+#get binary pandas series for word if it is present row-wise or not
+word_presence=calc_df['input_doc_list'].str.contains('\\b'+word+'\\b')
+
 for calc_base_label in set(self.target):
     ##get binary pandas series for label if it is present row-wise or not
     label=calc_df['target']==calc_base_label
-    #get binary pandas series for word if it is present row-wise or not
-    word_presence=calc_df['input_doc_list'].str.contains('\\b'+word+'\\b')
+
     ##check if word count is existing and labels have value, to be sure if any regex error for word.
     if sum(word_presence) and sum(label):
         A,B,C,D,N=self.custom_cross_tab(label,word_presence)
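
Note on the hunk above: the word mask does not depend on `calc_base_label`, so computing it once before the label loop gives the same result with fewer scans of the documents. The sketch below illustrates that with a plain-Python stand-in for `Series.str.contains`. The helper names (`word_mask`, `counts_mask_inside_loop`, `counts_mask_hoisted`) and the sample data are hypothetical and are not part of TextFeatureSelection.

```python
import re

def word_mask(docs, word, counter):
    """Plain-Python stand-in for Series.str.contains: one full scan of docs."""
    counter['scans'] += 1
    pattern = re.compile('\\b' + word + '\\b')
    return [bool(pattern.search(doc)) for doc in docs]

def counts_mask_inside_loop(docs, targets, word, counter):
    """Pre-change shape: the word mask is rebuilt for every label."""
    out = {}
    for lab in sorted(set(targets)):
        label = [t == lab for t in targets]
        word_presence = word_mask(docs, word, counter)
        out[lab] = sum(l and w for l, w in zip(label, word_presence))
    return out

def counts_mask_hoisted(docs, targets, word, counter):
    """Post-change shape: the word mask is built once, before the label loop."""
    word_presence = word_mask(docs, word, counter)
    out = {}
    for lab in sorted(set(targets)):
        label = [t == lab for t in targets]
        out[lab] = sum(l and w for l, w in zip(label, word_presence))
    return out

# Hypothetical sample data, for illustration only.
docs = ['good movie', 'bad movie', 'good plot', 'goodness me']
targets = ['pos', 'neg', 'pos', 'neg']

before_counter = {'scans': 0}
after_counter = {'scans': 0}
before = counts_mask_inside_loop(docs, targets, 'good', before_counter)
after = counts_mask_hoisted(docs, targets, 'good', after_counter)
```

Both functions return the same per-label counts. The hoisted version scans the documents once per word, while the original shape scans them once per word and label.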
Binary file not shown (-7.31 KB).
Binary file not shown (-5.05 KB).
Binary file not shown (7.31 KB).
Binary file not shown (5.04 KB).

setup.py

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 
 setup(
     name='TextFeatureSelection',
-    version='0.0.3',
+    version='0.0.4',
     description='Implementation of various algorithms for feature selection for text features, based on filter method',
     long_description=long_description,
     long_description_content_type='text/markdown', # This is important!

src/TextFeatureSelection.egg-info/PKG-INFO

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: TextFeatureSelection
-Version: 0.0.3
+Version: 0.0.4
 Summary: Implementation of various algorithms for feature selection for text features, based on filter method
 Home-page: https://github.com/StatguyUser/TextFeatureSelection
 Author: StatguyUser

src/TextFeatureSelection.py

Lines changed: 4 additions & 2 deletions
@@ -164,11 +164,13 @@ def getvalues_multiclass(self,unique_words,calc_df):
 label=[]
 word_presence=[]
 
+#get binary pandas series for word if it is present row-wise or not
+word_presence=calc_df['input_doc_list'].str.contains('\\b'+word+'\\b')
+
 for calc_base_label in set(self.target):
     ##get binary pandas series for label if it is present row-wise or not
     label=calc_df['target']==calc_base_label
-    #get binary pandas series for word if it is present row-wise or not
-    word_presence=calc_df['input_doc_list'].str.contains('\\b'+word+'\\b')
+
     ##check if word count is existing and labels have value, to be sure if any regex error for word.
     if sum(word_presence) and sum(label):
         A,B,C,D,N=self.custom_cross_tab(label,word_presence)
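
Note on the pattern `'\\b'+word+'\\b'`: the word is concatenated into the regex unescaped, which is presumably the "regex error for word" case the in-code comment guards against. A minimal sketch of what can go wrong, and of one possible mitigation with `re.escape`, follows. The commit does not use `re.escape`; it and the sample tokens are assumptions added here for illustration only.

```python
import re

# Hypothetical sample documents and token, for illustration only.
docs = ['the u.s economy', 'the uxs economy']
word = 'u.s'

# Unescaped: '.' is a regex wildcard, so the token also matches 'uxs'.
unescaped = re.compile('\\b' + word + '\\b')
unescaped_hits = [bool(unescaped.search(doc)) for doc in docs]

# Escaped: the token is matched literally.
escaped = re.compile('\\b' + re.escape(word) + '\\b')
escaped_hits = [bool(escaped.search(doc)) for doc in docs]

# A token with an unbalanced parenthesis does not compile at all unescaped.
try:
    re.compile('\\b' + 'foo(' + '\\b')
    compile_failed = False
except re.error:
    compile_failed = True
```

Escaping changes which rows match for tokens that contain regex metacharacters, so it would be a behavior change rather than a pure optimization.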
