Questions: Feature Engineering and Selection

5 questions to test your understanding

Question 1 Multiple Choice

A data scientist computes feature importance scores using the entire dataset (training + test combined), selects the top 15 features, then trains and evaluates a model on the train/test split. The test accuracy looks excellent. What is the most likely problem with this workflow?

A. Using too many features always causes overfitting, regardless of how they were selected
B. Feature importance scores computed on the full dataset leak test set information into the selection step, inflating performance estimates that will not hold on truly unseen data
C. The feature selection step should always come after model evaluation, not before
D. Importance scores are only valid for tree-based models, not other algorithms
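
For reference, the workflow described in the question might look roughly like the sketch below. It only restates the scenario in code and is not a recommended procedure. The synthetic dataset, the random forest ranker, and the 75/25 split are all assumptions made for illustration; the question does not name a dataset, model, or split ratio.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; the question names no dataset).
X, y = make_classification(
    n_samples=300, n_features=50, n_informative=5, random_state=0
)

# Step 1: importance scores computed on ALL rows, before any split.
ranker = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
top15 = np.argsort(ranker.feature_importances_)[::-1][:15]

# Step 2: split only afterwards, keeping the 15 selected columns.
X_train, X_test, y_train, y_test = train_test_split(
    X[:, top15], y, test_size=0.25, random_state=0
)

# Step 3: train and evaluate on the split.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

When reading the sketch, note which rows each step has access to.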
Question 2 Multiple Choice

You are building a model with hundreds of candidate features and cannot afford to repeatedly train the full model for wrapper-based selection. Which selection method is most appropriate, and what is its main limitation?

A. Embedded methods like Lasso, but they require the target variable to be continuous
B. Wrapper methods like forward selection, but they are computationally cheap and always preferred
C. Filter methods using statistical tests (correlation, mutual information), but they evaluate features independently and miss interaction effects between features
D. Domain knowledge alone; algorithmic selection is only valid for large datasets
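
As a reminder of what the three algorithmic families named in the options look like in practice, here is a minimal scikit-learn sketch. The dataset, the choice of `k=5`, and the specific estimators (`LogisticRegression` for the wrapper, an L1-penalised `LinearSVC` for the embedded method) are assumptions chosen for illustration, and everything is fit on a single array for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import (
    SelectFromModel,
    SelectKBest,
    SequentialFeatureSelector,
    mutual_info_classif,
)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Synthetic stand-in data (an assumption, not from the quiz).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=4, random_state=0
)

# Filter: score each feature against the target on its own, keep the top k.
filt = SelectKBest(mutual_info_classif, k=5).fit(X, y)

# Wrapper: forward selection, refitting the estimator for every candidate
# feature at every step (with 3-fold cross-validation here).
wrap = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=5,
    direction="forward",
    cv=3,
).fit(X, y)

# Embedded: an L1-penalised model is fit once; features whose coefficients
# are driven to (near) zero are dropped.
emb = SelectFromModel(LinearSVC(C=0.05, penalty="l1", dual=False)).fit(X, y)

for name, selector in [("filter", filt), ("wrapper", wrap), ("embedded", emb)]:
    print(name, selector.get_support(indices=True))
```

Comparing how many model fits each selector needs is a useful way to think about the cost constraint in the question.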
Question 3 True / False

Adding more features to a model generally improves performance because the model can typically learn to ignore features that are irrelevant.

T. True
F. False
Question 4 True / False

Performing feature selection using only training data, then applying the same selection to the test set, is a valid and complete safeguard against data leakage in feature selection.

T. True
F. False
Question 5 Short Answer

Why does feature engineering often matter more than algorithm choice in applied machine learning, and what is the guiding question when deciding whether to create a new feature?
