Feature Engineering

This section covers feature engineering examples collected from several sources, including the Kaggle Feature Engineering material referenced below.

Tips for Feature Engineering

  • Generating ratio features is often a good idea

  • First step in feature engineering -> create a feature utility metric to evaluate the association between each feature and the target

  • Example of such a metric -> mutual information (MI)

  • Understanding Feature Significance using Mutual Information

    • Mutual Information ref: Kaggle Feature Engineering

      • Measures the relationship between two quantities

      • Difference from correlation: MI can detect any kind of relationship, whereas correlation only detects linear relationships

    • MI is especially useful at the start of feature engineering

      • Easy to use and interpret

      • Computationally efficient

      • Theoretically well founded

      • Resistant to overfitting

      • Can detect any kind of relationship

    • MI measures a relationship in terms of uncertainty (entropy)

      • Extent to which knowledge of one quantity reduces uncertainty about the other

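      • If you know the value of the feature, how much more confident would you be about the target?

      • Indicates the relative potential of a feature as a predictor of the target, considered by itself

      • Univariate metric -> can’t detect interactions between features

      • Theoretically ranges from 0 (the quantities are independent) to infinity; in practice values above 2.0 are uncommon

      • The actual usefulness of a feature depends on the model you use it with. A feature is only useful to the extent that the model can learn its relationship with the target, so sometimes you need to transform the data first to expose the association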

  • Feature creation tips (where to look for ideas):

    • Dataset documentation

    • Domain knowledge: internet, books, and journal articles

    • Solution write-ups from previous work

    • Data Visualization
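
The ratio-feature and mutual-information tips above can be sketched together as follows. This is a minimal illustration, not the method from the referenced material: the data is synthetic, the column names (total_cost, num_units, cost_per_unit), the target, and the helper make_mi_scores are all hypothetical, and scikit-learn's mutual_info_regression is assumed as the MI estimator. The first part ranks features (including a ratio feature) by MI with the target; the second part compares MI with Pearson correlation on a purely nonlinear relationship.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression


def make_mi_scores(X, y, random_state=0):
    """Return the estimated MI between each column of X and y, highest first.

    Hypothetical helper; assumes all columns of X are continuous.
    """
    scores = mutual_info_regression(X, y, random_state=random_state)
    return pd.Series(scores, index=X.columns, name="MI Scores").sort_values(ascending=False)


# Synthetic data, invented purely for illustration.
rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "total_cost": rng.uniform(10, 100, n),
    "num_units": rng.uniform(1, 10, n),
    "noise": rng.normal(size=n),  # unrelated to the target
})

# Ratio feature built from two raw columns.
df["cost_per_unit"] = df["total_cost"] / df["num_units"]

# Assumed target: driven by the ratio, plus a little noise.
y = df["cost_per_unit"] + rng.normal(scale=1.0, size=n)

# Feature utility metric: rank every feature by MI with the target.
mi_scores = make_mi_scores(df, y)
print(mi_scores)

# MI vs. correlation on a symmetric, nonlinear relationship (y = x^2).
x = pd.Series(rng.uniform(-1, 1, n), name="x")
y_sq = x ** 2 + rng.normal(scale=0.05, size=n)

pearson = x.corr(y_sq)  # linear association only
mi_nonlinear = mutual_info_regression(x.to_frame(), y_sq, random_state=0)[0]
print(f"Pearson correlation: {pearson:.3f}, MI: {mi_nonlinear:.3f}")
```

In this setup the ratio feature should rank above the two raw columns it was built from, and the y = x^2 relationship should show a Pearson correlation close to zero alongside a clearly positive MI score. Because MI is univariate, neither raw column alone reveals how informative their ratio is, which is one reason to try ratio features explicitly.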