It involves converting raw data from one or more sources into features that can be used to train ML models, using domain knowledge and statistics.
A feature engineering lifecycle has three steps –
- Feature Selection
- Feature Creation
- Feature Transformation
Feature Selection
Identify the features the model actually needs and filter out the redundant and unnecessary ones from the training dataset. This reduces feature dimensionality, which produces a smaller dataset and so speeds up ML model training.
One technique used for feature selection is the feature importance score. It tells how much a feature contributes to the final model relative to the other features. Features with an importance score of 0 can be filtered out right away, and the remaining ones can be kept or dropped based on their scores.
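As a minimal sketch of this idea, tree-based models in scikit-learn expose a `feature_importances_` attribute after training. The dataset and the 0.01 cut-off below are purely illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy dataset: 5 informative features plus 5 pure-noise features.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5,
    n_redundant=0, random_state=42,
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# Importance scores sum to 1; low-scoring features are candidates to drop.
scores = model.feature_importances_
keep = [i for i, s in enumerate(scores) if s >= 0.01]  # illustrative threshold
print(f"kept {len(keep)} of {len(scores)} features")
```

In practice you would inspect the scores (and ideally use a more robust method such as permutation importance) before dropping anything, since impurity-based importances can be biased toward high-cardinality features.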
Feature Creation
Combine existing features into new features, or combine raw attributes into new attributes. These new features help the model produce more accurate predictions.
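For example, with pandas, two attributes can be combined into a single more informative feature, or a timestamp can be decomposed into parts a model can use directly. The column names below are hypothetical:

```python
import pandas as pd

# Hypothetical raw attributes; column names are illustrative.
df = pd.DataFrame({
    "weight_kg": [70.0, 85.0, 60.0],
    "height_m": [1.75, 1.80, 1.65],
    "signup_date": pd.to_datetime(["2023-01-15", "2023-06-01", "2023-12-24"]),
})

# Combine two existing attributes into one new, more predictive feature.
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

# Decompose a timestamp into numeric features.
df["signup_month"] = df["signup_date"].dt.month
df["signup_dayofweek"] = df["signup_date"].dt.dayofweek
```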
Feature Transformation
Feature transformation fills in missing values using techniques like imputation. It also includes scaling numerical features using techniques like standardization and normalization, and converting non-numerical features into numerical ones using techniques like one-hot encoding, so that the model can make sense of them.
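A sketch of all three transformations using scikit-learn, assuming a small hypothetical table with a missing numeric value and a categorical column:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical dataset with missing numeric values and a categorical column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 33.0],
    "income": [40000.0, 52000.0, np.nan, 61000.0],
    "city": ["NYC", "SF", "NYC", "LA"],
})

transform = ColumnTransformer([
    # Numeric columns: impute missing values with the median, then standardize.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), ["age", "income"]),
    # Categorical column: one indicator column per category.
    ("cat", OneHotEncoder(), ["city"]),
])

X = transform.fit_transform(df)
print(X.shape)  # 2 scaled numeric columns + 3 one-hot city columns
```

The choice of median imputation and standardization here is illustrative; the right technique depends on the distribution of each feature and the model being trained.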
Feature Engineering pipeline
Once feature selection and creation are done, the dataset is split into a training dataset and a test dataset. The training dataset is further split into training and validation datasets; the validation dataset is used to evaluate the model and tune hyperparameters.
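The two-stage split above can be sketched with scikit-learn's `train_test_split`; the 60/20/20 ratio used here is just one common choice:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# First split off a held-out test set (20% here, an illustrative ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Then carve a validation set out of the training portion
# (25% of the remaining 80% = 20% of the full dataset).
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # prints: 600 200 200
```

The test set is only touched once, at the very end, to report the final model's performance; all tuning decisions are made against the validation set.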
Thanks for stopping by! I hope this gives you a brief overview of feature engineering. I'm eager to hear your thoughts, so please leave a comment below and we can discuss.