
Commit dc1ce00

Update README.md
1 parent 2257b72 commit dc1ce00

1 file changed (+24 -4 lines)

README.md

Lines changed: 24 additions & 4 deletions
```diff
@@ -84,17 +84,15 @@ This project handles the task with minimal user interaction by analyzing your d
 >**Output:**<br />
 >None<br />
 
-**9) feature_transformation(train_data,test_data,continuous_features,discrete_features,transformation,dependent_feature):**<br /> This function applies the feature transformation technique selected by the user.<br />
+**9) feature_transformation(train_data,continuous_features,discrete_features,transformation,dependent_feature):**<br /> This function applies the feature transformation technique selected by the user.<br />
 >**Input:**<br />
 >train_data=Training dataset<br />
->test_data=Test dataset<br />
 >continuous_features=List of names of features containing continuous numerical values<br />
 >discrete_features=List of names of features containing discrete numerical values<br />
 >transformation=Type of transformation: none=No transformation, log=Log transformation, sqrt=Square root transformation, reciprocal=Reciprocal transformation, exp=Exponential transformation, boxcox=Box-Cox transformation<br />
 >dependent_feature=Name of the dependent feature, as a string<br />
 >**Output:**<br />
 >X_data=Training dataset<br />
->t_data=Test dataset<br />
 
 **10) categorical_transformation(train_data,categorical_encoding):**<br /> This function transforms the categorical features into numerical ones using encoding techniques.<br />
 >**Input:**<br />
```
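The `transformation` options listed for `feature_transformation` can be sketched in plain Python as below. This is an illustration under stated assumptions, not the project's implementation: the helpers `transform_column` and `fit_boxcox_lambda` are hypothetical, columns are plain lists rather than DataFrames, and the Box-Cox lambda is picked by a coarse grid search over the profile log-likelihood (a stand-in for whatever fitting the project actually uses).

```python
import math
from typing import Callable, Dict, List, Optional, Tuple


def boxcox(x: float, lam: float) -> float:
    """Box-Cox transform of one strictly positive value for a given lambda."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam


def fit_boxcox_lambda(values: List[float], grid: Optional[List[float]] = None) -> float:
    """Hypothetical helper: choose lambda by maximising the profile log-likelihood on a grid."""
    grid = grid or [i / 10 for i in range(-20, 21)]  # -2.0 to 2.0 in steps of 0.1
    n = len(values)
    log_sum = sum(math.log(v) for v in values)
    best_lam, best_ll = 0.0, -math.inf
    for lam in grid:
        y = [boxcox(v, lam) for v in values]
        mean = sum(y) / n
        var = sum((t - mean) ** 2 for t in y) / n
        if var == 0:
            continue  # degenerate column for this lambda; skip it
        ll = -n / 2 * math.log(var) + (lam - 1) * log_sum
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam


# Element-wise transforms keyed by the option names used in the README.
TRANSFORMS: Dict[str, Callable[[float], float]] = {
    "none": lambda x: x,
    "log": math.log,
    "sqrt": math.sqrt,
    "reciprocal": lambda x: 1 / x,
    "exp": math.exp,
}


def transform_column(values: List[float], transformation: str,
                     lam: Optional[float] = None) -> Tuple[List[float], Optional[float]]:
    """Hypothetical helper: apply one transformation; for 'boxcox', fit lambda only if none is given."""
    if transformation == "boxcox":
        if lam is None:
            lam = fit_boxcox_lambda(values)  # fit on the training column
        return [boxcox(v, lam) for v in values], lam
    func = TRANSFORMS[transformation]
    return [func(v) for v in values], lam
```

Fitting `lam` on the training column and passing it back in for the test column keeps both splits on the same scale, which is one way to read the note about using the same parameters for train and test.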
```diff
@@ -103,7 +101,7 @@ This project handles the task with minimal user interaction by analyzing your d
 >**Output:**<br />
 >X_data=Training dataset<br />
 
-**11) feature_selection(Xtrain,ytrain, threshold, data_type, filter_type):**<br />This function performs feature selection based on the dependent and independent features.<br />
+**11a) feature_selection(Xtrain,ytrain, threshold, data_type, filter_type):**<br />This function performs feature selection based on the dependent and independent features in the training dataset.<br />
 >**Input:**<br />
 >Xtrain=Training dataset<br />
 >ytrain=Dependent data in the training dataset<br />
```
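For the `'in_num_out_num'` filter type, the selection rule can be sketched as a correlation filter: Pearson correlation when `data_type` is linear, Spearman (Pearson on ranks) otherwise, keeping features whose absolute correlation reaches `threshold`. The function names below are hypothetical, columns are plain lists, and the sketch returns correlation scores rather than the p-values that `feature_df` holds.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 when either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def select_numeric_features(columns: Dict[str, List[float]], target: List[float],
                            threshold: float, data_type: str) -> Tuple[List[str], Dict[str, float]]:
    """Hypothetical filter: keep features whose |correlation| with the target reaches the threshold."""
    corr = pearson if data_type == "linear" else spearman
    scores = {name: corr(vals, target) for name, vals in columns.items()}
    selected = [name for name, r in scores.items() if abs(r) >= threshold]
    return selected, scores
```

A monotonic but curved feature (for example squares of the target's driver) scores just under 1 with Pearson and exactly 1 with Spearman, so the `data_type` choice can change which features survive a high threshold.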
```diff
@@ -117,10 +115,32 @@ This project handles the task with minimal user interaction by analyzing your d
 >**Output:**<br />
 >Xtrain=Training dataset<br />
 >feature_df=DataFrame containing the features with their p-values<br />
+**11b) feature_selection(Xtrain,ytrain,Xtest,ytest, threshold, data_type, filter_type):**<br />This function performs feature selection based on the dependent and independent features in the training dataset and returns both the training and test datasets.<br />
+>**Input:**<br />
+>Xtrain=Training dataset<br />
+>ytrain=Dependent data in the training dataset<br />
+>Xtest=Test dataset<br />
+>ytest=Dependent data in the test dataset<br />
+>threshold=Threshold for the correlation<br />
+>{'in_num_out_num':{'linear':['pearson'],'non-linear':['spearman']},<br />
+> 'in_num_out_cat':{'linear':['ANOVA'],'non-linear':['kendall']},<br />
+> 'in_cat_out_num':{'linear':['ANOVA'],'non-linear':['kendall']},<br />
+> 'in_cat_out_cat':{'chi_square_test':True,'mutual_info':True},}<br />
+>data_type=Whether the data depends linearly or non-linearly on the output label<br />
+>filter_type=Combination of input and output data types, as keyed in the dictionary above; e.g. 'in_num_out_num' when both input and output are numerical<br />
+>**Output:**<br />
+>Xtrain=Training dataset<br />
+>Xtest=Test dataset<br />
+>feature_df=DataFrame containing the features with their p-values<br />
+
 
 **12) convert_dtype(data,categorical_features):**<br /> This function converts categorical features that contain numeric values, but are stored as categorical, into int format.<br />
 >**Input:**<br />
 >data=Dataset<br />
 >categorical_features=List of names of features containing categorical values<br />
 >**Output:**<br />
 >df=Dataset<br />
+
+
+***Note***<br />
+**Use the same parameters for both the train and test datasets for better accuracy.**
```
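The `11b` variant, read together with the note above, amounts to choosing features on the training split and applying that same choice to the test split. Below is a minimal sketch for the `'in_cat_out_cat'` case, assuming mutual information (in nats) is compared against `threshold`; `select_train_apply_test` is hypothetical, `ytest` is omitted because this sketch does not need it, and columns are plain lists of category labels.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Columns = Dict[str, List[str]]  # feature name -> category labels


def mutual_info(x: Sequence[str], y: Sequence[str]) -> float:
    """Mutual information (in nats) between two categorical sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += count / n * math.log(count * n / (px[a] * py[b]))
    return mi


def select_train_apply_test(xtrain: Columns, ytrain: List[str], xtest: Columns,
                            threshold: float) -> Tuple[Columns, Columns, Dict[str, float]]:
    """Hypothetical helper: score features on the training split only, keep the same columns in both."""
    scores = {name: mutual_info(col, ytrain) for name, col in xtrain.items()}
    keep = [name for name, s in scores.items() if s >= threshold]
    train_sel = {name: xtrain[name] for name in keep}
    test_sel = {name: xtest[name] for name in keep}  # test labels never influence the choice
    return train_sel, test_sel, scores
```

Scoring only on the training split keeps test information out of the selection, and reusing the same `keep` list guarantees that both returned datasets have identical columns.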