Variable Types
- Categorical Variables: Nominal and Ordinal
 - Numerical Variables: Discrete and continuous
 - Mixed variables: strings and numbers
 - Datetime variables
 
| Variable Types | Code + Blog Link | Video Link | 
|---|---|---|
Variable Characteristics
- Missing Data
 - Cardinality
 - Category Frequency
 - Distributions
 - Outliers
 - Magnitude
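
The characteristics listed above can be inspected with a few pandas calls. The sketch below is illustrative only: the toy data, the column names and the 1.5 x IQR outlier rule are assumptions, not part of the original material.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "city": ["London", "Paris", "London", None, "Rome", "London"],
    "income": [30.0, 45.0, np.nan, 52.0, 38.0, 400.0],
})

# Missing data: fraction of missing values per variable.
missing_fraction = df.isnull().mean()

# Cardinality: number of distinct categories (missing values are not counted).
cardinality = df["city"].nunique()

# Category frequency: share of non-missing observations per category.
category_frequency = df["city"].value_counts(normalize=True)

# Outliers: values further than 1.5 * IQR from the quartiles (one common rule).
q1 = df["income"].quantile(0.25)
q3 = df["income"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)

# Magnitude: range of values taken by the variable.
magnitude = df["income"].max() - df["income"].min()
```

Distributions are usually checked visually, for example with a histogram via `df["income"].hist()`.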
 
| Variable Characteristics | Code + Blog Link | Video Link | 
|---|---|---|
Missing Data Imputation
- For Numerical Variables
  - Mean and Median Imputation
  - Arbitrary value imputation
  - End of Tail Imputation
- For Categorical Variables
  - Frequent category imputation
  - Adding a missing category
- Random Sample Imputation
- Adding a missing indicator
- Imputation with Scikit-learn
- Imputation with Feature-engine
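
A minimal sketch of some of the imputation techniques above, assuming a toy dataset with hypothetical `age` and `colour` columns. It uses scikit-learn's `SimpleImputer` for the numerical variable and plain pandas for the categorical one; other combinations are equally valid.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 58.0, np.nan],
    "colour": ["red", "blue", np.nan, "red", "red"],
})

# Median imputation plus a missing indicator for a numerical variable.
# Column 0 of the output holds the imputed values, column 1 the indicator.
median_imputer = SimpleImputer(strategy="median", add_indicator=True)
age_imputed = median_imputer.fit_transform(df[["age"]])

# Arbitrary value imputation: replace missing values with a fixed number.
age_arbitrary = df["age"].fillna(-1)

# Frequent category imputation: replace missing values with the mode.
colour_frequent = df["colour"].fillna(df["colour"].mode()[0])

# Adding a missing category: treat missingness as its own label.
colour_missing = df["colour"].fillna("Missing")
```

In practice the imputation values should be learned from the training set only and then applied to the test set.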
 
| Missing Data Imputation | Code + Blog Link | Video Link | 
|---|---|---|
Multivariate Imputation
- MICE
 - KNN imputation
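
As a rough illustration of KNN imputation, the sketch below uses scikit-learn's `KNNImputer` on an invented array; the choice of `n_neighbors=2` is an assumption made for the example. For a MICE-style approach, scikit-learn offers `IterativeImputer`, which is still marked experimental and is not shown here.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy data, invented for illustration: the second row misses its second feature.
X = np.array([
    [1.0, 2.0],
    [2.0, np.nan],
    [3.0, 6.0],
    [8.0, 16.0],
])

# Each missing value is replaced by the mean of that feature over the
# 2 nearest rows, with distances computed on the features that are present.
knn_imputer = KNNImputer(n_neighbors=2)
X_imputed = knn_imputer.fit_transform(X)
```

Here the two nearest neighbours of the incomplete row are the first and third rows, so the missing value is filled with the mean of 2.0 and 6.0.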
 
| Multivariate Imputation | Code + Blog Link | Video Link | 
|---|---|---|
Categorical Variable Encoding
- Traditional Techniques
  - One hot encoding: simple and of frequent categories
  - Ordinal / Label encoding: arbitrary and ordered
  - Count / Frequency encoding
- Monotonic Relationship
  - Target mean encoding
  - Weight of evidence
  - Ordered label encoding
- Alternative Techniques
  - Binary encoding
  - Feature hashing
  - Probability Ratio
- For Rare Labels
  - One hot encoding of frequent categories
  - Grouping of rare categories
  - Rare Label encoding
  - Encoding with Scikit-learn
  - Encoding with category encoders
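
Several of the encodings above can be sketched with pandas alone. The data, the column names and the 20% rare-label threshold below are assumptions chosen for illustration.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "colour": ["red", "blue", "red", "green", "red", "blue"],
    "target": [1, 0, 1, 0, 0, 1],
})

# One hot encoding: one binary column per category.
one_hot = pd.get_dummies(df["colour"], prefix="colour")

# Count and frequency encoding: replace each category by how often it occurs.
counts = df["colour"].map(df["colour"].value_counts())
frequency = df["colour"].map(df["colour"].value_counts(normalize=True))

# Target mean encoding: replace each category by the mean target within it.
target_mean = df["colour"].map(df.groupby("colour")["target"].mean())

# Grouping of rare categories: labels seen in under 20% of rows become "Rare".
category_share = df["colour"].value_counts(normalize=True)
rare_labels = category_share[category_share < 0.2].index
grouped = df["colour"].where(~df["colour"].isin(rare_labels), "Rare")
```

Target-based encodings should be learned on the training set only, otherwise information from the target leaks into the features.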
 
 
| Categorical Variable Encoding | Code + Blog Link | Video Link | 
|---|---|---|
Variable Transformation
- Mathematical Transformations
  - Logarithmic
  - Exponential / Power
  - Reciprocal
  - Box-Cox
  - Yeo-Johnson
- Discretisation
  - Unsupervised
    - Equal-width
    - Equal-frequency
    - K means
  - Supervised
    - Decision Tree
- Other
  - Transformation with Scikit-learn
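
The mathematical transformations above can be sketched with NumPy and scikit-learn's `PowerTransformer`. The input values are invented, and the square root is used as just one example of a power transformation.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Strictly positive, right-skewed toy values (Box-Cox requires positive data).
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0])

log_x = np.log(x)          # logarithmic
power_x = np.power(x, 0.5)  # power transformation (here: square root)
reciprocal_x = 1.0 / x     # reciprocal

# Box-Cox and Yeo-Johnson estimate their lambda parameter from the data.
# By default PowerTransformer also standardises the output.
X = x.reshape(-1, 1)
box_cox = PowerTransformer(method="box-cox").fit_transform(X)
yeo_johnson = PowerTransformer(method="yeo-johnson").fit_transform(X)
```

Yeo-Johnson also accepts zero and negative values, which makes it the safer default when the sign of the variable is not guaranteed.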
 
 
| Variable Transformation | Code + Blog Link | Video Link | 
|---|---|---|
Discretisation
- Arbitrary
 - Equal-frequency discretisation
 - Equal-width discretisation
 - K-means discretisation
 - Discretisation with trees
 - Discretisation with Scikit-learn
 - Discretisation with Feature-engine
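
A short sketch of arbitrary, equal-width and equal-frequency discretisation using `pd.cut` and `pd.qcut`. The ages, bin edges and labels are made up for the example. For K-means binning, scikit-learn's `KBinsDiscretizer` with `strategy="kmeans"` is one option, not shown here.

```python
import pandas as pd

# Toy data, invented for illustration only.
age = pd.Series([18, 22, 25, 31, 40, 47, 52, 60, 75, 90])

# Equal-width: 3 intervals of the same width across the value range.
equal_width = pd.cut(age, bins=3, labels=False)

# Equal-frequency: 5 intervals holding the same number of observations.
equal_frequency = pd.qcut(age, q=5, labels=False)

# Arbitrary: interval limits chosen by the user.
arbitrary = pd.cut(age, bins=[0, 30, 60, 120], labels=["young", "adult", "senior"])
```

With `labels=False` both functions return integer bin codes, which can be fed directly to a model or encoded further.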
 
| Discretisation | Code + Blog Link | Video Link | 
|---|---|---|
Outliers
- Discretisation
 - Capping / Censoring
 - Trimming / Truncation
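
Capping and trimming can be sketched as follows. The limits here come from the 1.5 x IQR rule, which is an assumption for the example; limits based on the mean and standard deviation or on percentiles work the same way.

```python
import pandas as pd

# Toy data with one extreme value, invented for illustration only.
income = pd.Series([30, 35, 38, 41, 45, 48, 52, 400])

# Limits from the inter-quartile range proximity rule.
q1 = income.quantile(0.25)
q3 = income.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Capping / censoring: values beyond the limits are set to the limits.
capped = income.clip(lower=lower, upper=upper)

# Trimming / truncation: observations beyond the limits are removed.
trimmed = income[income.between(lower, upper)]
```

Capping keeps every observation, whereas trimming reduces the size of the dataset.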
 
| Outliers | Code + Blog Link | Video Link | 
|---|---|---|
Feature Scaling
- Standardisation (commonly used)
 - MinMaxScaling (commonly used)
 - MaxAbsoluteScaling
 - RobustScaling
 - Scaling to absolute maxima
 - Scaling to median & quantiles
 - Scaling to unit norm
 
Models affected by feature magnitude
- Linear & Logistic Regression
 - SVM
 - KNN
 - K-means Clustering
 - LDA
 - PCA
 - Neural Networks
 
Models insensitive to feature magnitude - Tree Based Models
- Classification & Regression Trees
 - Random Forest
 - Gradient Boosted Trees
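
The scaling methods listed above map onto scikit-learn transformers roughly as sketched below. The input values are invented for the example.

```python
import numpy as np
from sklearn.preprocessing import (
    MaxAbsScaler,
    MinMaxScaler,
    Normalizer,
    RobustScaler,
    StandardScaler,
)

# One toy feature with a large value, invented for illustration only.
X = np.array([[1.0], [2.0], [3.0], [4.0], [10.0]])

standardised = StandardScaler().fit_transform(X)  # zero mean, unit variance
min_max = MinMaxScaler().fit_transform(X)         # rescaled to [0, 1]
max_abs = MaxAbsScaler().fit_transform(X)         # divided by the absolute maximum
robust = RobustScaler().fit_transform(X)          # (x - median) / IQR

# Scaling to unit norm works per observation (row), not per feature.
unit_norm = Normalizer().fit_transform(np.array([[3.0, 4.0]]))
```

As with imputation, the scalers should be fitted on the training set and then applied to the test set.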
 
| Feature Scaling | Code + Blog Link | Video Link | 
|---|---|---|
Mixed variables
- Creating new variables from strings and numbers
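
One way to split a mixed variable into a string part and a numeric part is with regular expressions. The `cabin` values below are hypothetical, in the style of a ticket or cabin code.

```python
import pandas as pd

# Hypothetical mixed variable: a letter followed by a number, with one missing.
cabin = pd.Series(["C85", "E46", "B28", None])

# New categorical variable: the leading letters.
cabin_letter = cabin.str.extract(r"([A-Za-z]+)", expand=False)

# New numerical variable: the digits, converted to numbers.
cabin_number = pd.to_numeric(cabin.str.extract(r"(\d+)", expand=False))
```

Missing values in the original variable stay missing in both new variables and can then be imputed separately.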
 
| Mixed variables | Code + Blog Link | Video Link | 
|---|---|---|
Datetime Variables
- Extracting day, month, week, semester, year, etc.
- Extracting hour, minute, second, etc.
- Capturing elapsed time
  - Time between transactions
  - Age
- Working with timezones
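
The datetime features above can be derived with the pandas `.dt` accessor. The column names and dates are invented, and the age calculation is a deliberately rough approximation (days divided by 365).

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "date_of_birth": pd.to_datetime(["1990-05-17", "1985-11-02"]),
    "transaction_time": pd.to_datetime(["2024-03-01 14:35:10", "2024-03-04 09:05:45"]),
})
t = df["transaction_time"]

# Date parts.
df["year"] = t.dt.year
df["month"] = t.dt.month
df["day"] = t.dt.day
df["week"] = t.dt.isocalendar().week
df["semester"] = (t.dt.quarter + 1) // 2  # quarters 1-2 -> 1, quarters 3-4 -> 2

# Time parts.
df["hour"] = t.dt.hour
df["minute"] = t.dt.minute
df["second"] = t.dt.second

# Elapsed time: whole days since the previous transaction, and approximate age.
df["days_since_previous"] = t.diff().dt.days
df["age_years"] = (t - df["date_of_birth"]).dt.days // 365

# Timezones: mark the naive timestamps as UTC; .dt.tz_convert can then
# translate them to another zone.
transaction_utc = t.dt.tz_localize("UTC")
```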
 
| Datetime | Code + Blog Link | Video Link | 
|---|---|---|
Text
- Characters, Words, Unique words
 - Lexical diversity
 - Sentences, Paragraphs
 - Bag of Words
 - TF-IDF
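
A plain-Python sketch of the simpler text features above. The tokenisation rules (lowercasing, splitting on letters and on sentence punctuation) are simplifying assumptions. For TF-IDF, scikit-learn's `TfidfVectorizer` is a common choice and is not shown here.

```python
import re
from collections import Counter

# Toy text, invented for illustration only.
text = "The cat sat on the mat. The dog sat too."

# Characters, words and unique words.
n_characters = len(text)
words = re.findall(r"[a-z']+", text.lower())
n_words = len(words)
n_unique_words = len(set(words))

# Lexical diversity: unique words divided by total words.
lexical_diversity = n_unique_words / n_words

# Sentences: split on ., ! or ? and drop empty pieces.
sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
n_sentences = len(sentences)

# Bag of words: how many times each word occurs.
bag_of_words = Counter(words)
```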
 
Transactions & Time Series
- Aggregate data
  - Number of payments in last 3, 6, 12 months
  - Time since last transaction
  - Total spending in last month
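
The aggregations above can be sketched with a pandas `groupby`. The transactions table, its column names and the `reference_date` snapshot are all assumptions for the example.

```python
import pandas as pd

# Toy transaction log, invented for illustration only.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2024-01-10", "2024-05-20", "2024-06-15", "2023-09-01", "2024-06-01"]
    ),
    "amount": [50.0, 20.0, 30.0, 100.0, 10.0],
})
reference_date = pd.Timestamp("2024-06-30")  # assumed snapshot date


def payments_in_last(months: int) -> pd.Series:
    """Count payments per customer in the given number of months before the snapshot."""
    cutoff = reference_date - pd.DateOffset(months=months)
    recent = transactions[transactions["date"] > cutoff]
    return recent.groupby("customer_id").size()


# Number of payments in the last 3, 6 and 12 months.
features = pd.DataFrame(
    {f"payments_last_{m}m": payments_in_last(m) for m in (3, 6, 12)}
).fillna(0).astype(int)

# Time since last transaction, in days.
last_transaction = transactions.groupby("customer_id")["date"].max()
features["days_since_last"] = (reference_date - last_transaction).dt.days

# Total spending in the last month.
last_month = transactions[transactions["date"] > reference_date - pd.DateOffset(months=1)]
features["spend_last_1m"] = (
    last_month.groupby("customer_id")["amount"].sum()
    .reindex(features.index, fill_value=0.0)
)
```

The result has one row per customer, which is the shape most models expect.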
 
 
Feature Combination
- Ratio: total debt divided by income --> debt-to-income ratio
- Sum: debt across different credit cards --> total debt
- Subtraction: income minus expenses --> disposable income
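
The three combinations above, sketched on an invented table with hypothetical column names:

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "income": [4000.0, 2500.0],
    "expenses": [3000.0, 2000.0],
    "debt_card_a": [500.0, 1000.0],
    "debt_card_b": [1500.0, 0.0],
})

# Sum: debt across credit cards -> total debt.
df["total_debt"] = df[["debt_card_a", "debt_card_b"]].sum(axis=1)

# Ratio: total debt divided by income -> debt-to-income ratio.
df["debt_to_income"] = df["total_debt"] / df["income"]

# Subtraction: income minus expenses -> disposable income.
df["disposable_income"] = df["income"] - df["expenses"]
```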
 
Pipelines
- Classification Pipeline
 - Regression Pipeline
 - Pipeline with cross-validation
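
A minimal sketch of a classification pipeline evaluated with cross-validation, using scikit-learn. The synthetic data, the injected missing values and the choice of steps (median imputation, standardisation, logistic regression) are assumptions for illustration; a regression pipeline follows the same pattern with a regressor as the final step.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with roughly 5% of the values set to missing.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Each step is fitted on the training folds only, which avoids data leakage.
pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
    ("model", LogisticRegression()),
])

# 5-fold cross-validation: returns one accuracy score per fold.
scores = cross_val_score(pipeline, X, y, cv=5)
```

Wrapping the feature engineering steps in the pipeline means the whole sequence is re-fitted inside every fold, so the scores reflect what would happen on unseen data.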
 
| Pipelines | Code + Blog Link | Video Link | 
|---|---|---|


