Feature Engineering
This section covers feature engineering examples collected from the following sources:
Kaggle Feature Engineering Course
Tips for Feature Engineering
Generating ratio features is often a good idea
First step in feature engineering: create a feature utility metric that measures the association between each feature and the target
One example of such a metric is mutual information (MI)
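As a rough illustration of the tips above, the sketch below builds a ratio feature and then scores every column against the target with scikit-learn's `mutual_info_regression`. The data is synthetic, and the column names (`total_area`, `n_rooms`, `area_per_room`, `noise`, `price`) are made up for this example; they do not come from the course.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 500

# Synthetic data with hypothetical column names, for illustration only.
df = pd.DataFrame({
    "total_area": rng.uniform(50, 300, n),
    "n_rooms": rng.integers(1, 9, n),
    "noise": rng.normal(size=n),
})

# Ratio feature: area per room.
df["area_per_room"] = df["total_area"] / df["n_rooms"]

# Assumed target: driven by the ratio, plus a little noise.
price = 1000 * df["area_per_room"] + rng.normal(scale=500, size=n)

# Boolean mask telling the estimator which columns are discrete.
discrete = df.columns == "n_rooms"

# Feature utility metric: MI between each feature and the target.
mi = mutual_info_regression(df, price, discrete_features=discrete, random_state=0)
mi_scores = pd.Series(mi, index=df.columns, name="MI").sort_values(ascending=False)
print(mi_scores)
```

Because the target here was constructed from the ratio, `area_per_room` should rank first and `noise` should score close to zero. On real data the ranking is only a starting point for deciding which features to develop further.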
Understanding Feature Significance using Mutual Information
Reference: Kaggle Feature Engineering Course
Mutual information (MI) measures the relationship between two quantities
Difference from correlation: MI can detect any kind of relationship, whereas correlation only detects linear relationships
MI is especially useful at the start of feature engineering:
Easy to use and interpret
Computationally efficient
Theoretically well founded
Resistant to overfitting
Able to detect any kind of relationship
MI measures a relationship in terms of uncertainty (or entropy)
It is the extent to which knowledge of one quantity reduces uncertainty about the other
If you knew the value of a feature, how much more confident would you be about the target?
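MI indicates the relative potential of a feature as a predictor of the target, considered by itself
It is a univariate metric, so it can't detect interactions between features
Theoretically, MI ranges from 0 to infinity; in practice, values above 2.0 are rare
The actual usefulness of a feature depends on the model you use it with. Sometimes you need to transform the data first to make the association evident to the model, so that it can learn it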
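The points above can be checked on tiny discrete examples. The sketch below is a minimal, from-scratch estimate of MI using the identity I(X;Y) = H(X) + H(Y) - H(X,Y), with entropies in nats computed from observed frequencies. The helper names (`entropy`, `mutual_information`, `pearson`) and the toy data are assumptions for illustration; this is not the estimator used in the course, which works with continuous features as well.

```python
from collections import Counter
from math import log


def entropy(values):
    """Shannon entropy in nats, estimated from observed frequencies."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# 1) Nonlinear relationship: y = x^2 on a symmetric range.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
corr_xy = pearson(x, y)            # zero: no linear relationship
mi_xy = mutual_information(x, y)   # positive: x fully determines y
print(corr_xy, mi_xy)

# 2) Interaction: the target is the XOR of two binary features.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
target = [0, 1, 1, 0]
mi_a = mutual_information(a, target)                     # about 0
mi_b = mutual_information(b, target)                     # about 0
mi_joint = mutual_information(list(zip(a, b)), target)   # log(2)
print(mi_a, mi_b, mi_joint)
```

In the first case correlation is zero while MI is about 1.05 nats, which shows MI picking up a relationship that correlation misses. In the second case each feature alone scores roughly zero even though the pair determines the target exactly, which is why a univariate MI ranking cannot reveal interactions.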
Feature creation tip: look for new feature ideas in the following places:
Dataset documentation
Domain knowledge: the Internet, books, and journal articles
Data visualization