Questions — Feature Engineering and Selection

Question 1 (Multiple Choice)

A data scientist computes feature importance scores using the entire dataset (training + test combined), selects the top 15 features, then trains and evaluates a model on the train/test split. The test accuracy looks excellent. What is the most likely problem with this workflow?

A. Using too many features always causes overfitting, regardless of how they were selected

B. Feature importance scores computed on the full dataset leak test set information into the selection step, inflating performance estimates that will not hold on truly unseen data

C. The feature selection step should always come after model evaluation, not before

D. Importance scores are only valid for tree-based models, not other algorithms
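
To make the leak-free ordering concrete, here is a minimal stdlib-only Python sketch. It assumes absolute Pearson correlation with the target as the importance score (the question does not say which scorer was used), and the `TopKSelector` class and toy data are hypothetical. The selector learns which columns to keep from the training rows only, then applies those same column indices to the test rows, so no test-set value influences the choice.

```python
from math import sqrt


def abs_pearson(xs, ys):
    """Absolute Pearson correlation; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / sqrt(vx * vy)


class TopKSelector:
    """Hypothetical selector: keep the k columns most correlated with y."""

    def __init__(self, k):
        self.k = k
        self.selected = None

    def fit(self, X, y):
        # Score each column using ONLY the rows passed in (the training set).
        n_cols = len(X[0])
        scores = [abs_pearson([row[j] for row in X], y) for j in range(n_cols)]
        ranked = sorted(range(n_cols), key=lambda j: scores[j], reverse=True)
        self.selected = sorted(ranked[: self.k])
        return self

    def transform(self, X):
        # Apply the column choice learned in fit(); nothing is re-scored here.
        return [[row[j] for j in self.selected] for row in X]


# Toy data: column 0 tracks the label; columns 1 and 2 are unrelated patterns.
y = [i % 2 for i in range(20)]
X = [[y[i] + 0.1 * (i % 3), i % 5, (7 * i) % 11] for i in range(20)]

# Split FIRST, then fit the selector on the training rows alone.
X_train, X_test = X[:15], X[15:]
y_train = y[:15]

selector = TopKSelector(k=1).fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)
print(selector.selected)  # [0]
```

This mirrors the fit/transform split used by scikit-learn selectors such as `SelectKBest`, though the class here is a hand-rolled stand-in.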

Question 2 (Multiple Choice)

You are building a model with hundreds of candidate features and cannot afford to repeatedly train the full model for wrapper-based selection. Which selection method is most appropriate, and what is its main limitation?

A. Embedded methods like Lasso, but they require the target variable to be continuous

B. Wrapper methods like forward selection, but they are computationally cheap and always preferred

C. Filter methods using statistical tests (correlation, mutual information), but they evaluate features independently and miss interaction effects between features

D. Domain knowledge alone; algorithmic selection is only valid for large datasets
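
The interaction limitation of per-feature filters shows up in a small stdlib-only Python sketch, assuming a discrete mutual-information scorer as the filter statistic (the toy XOR data is made up for illustration). The target is the XOR of two binary features: scored one at a time, each feature carries zero information about the target, yet together they determine it completely, so a filter ranking features independently would discard both.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


# The target is the XOR of two binary features: each matters only in combination.
x1 = [0, 0, 1, 1] * 25
x2 = [0, 1, 0, 1] * 25
y = [a ^ b for a, b in zip(x1, x2)]

# A filter method scores each feature on its own...
mi_x1 = mutual_information(x1, y)
mi_x2 = mutual_information(x2, y)
# ...but the pair, treated as one combined feature, fully determines y.
mi_pair = mutual_information(list(zip(x1, x2)), y)

print(f"I(x1; y) = {mi_x1:.3f} bits")          # I(x1; y) = 0.000 bits
print(f"I(x2; y) = {mi_x2:.3f} bits")          # I(x2; y) = 0.000 bits
print(f"I((x1, x2); y) = {mi_pair:.3f} bits")  # I((x1, x2); y) = 1.000 bits
```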

Question 3 (True / False)

Adding more features to a model generally improves performance because the model can typically learn to ignore features that are irrelevant.

True

False
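
One way to probe this claim is to look at how irrelevant features behave on a finite sample. The stdlib-only Python sketch below generates a target and candidate features that are all independent noise, then reports the strongest correlation any noise feature happens to have with the target; the sample size, feature counts, and seed are arbitrary choices for illustration. The more irrelevant features there are, the larger the best chance correlation tends to be, and on that sample alone such a feature looks just as predictive as a genuine one.

```python
import random
from math import sqrt


def abs_pearson(xs, ys):
    """Absolute Pearson correlation; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / sqrt(vx * vy)


rng = random.Random(0)
n_rows = 50
# The target and every candidate feature are independent random noise,
# so no feature carries real information about y.
y = [rng.gauss(0, 1) for _ in range(n_rows)]
noise_features = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(500)]

scores = [abs_pearson(f, y) for f in noise_features]
best_of_5 = max(scores[:5])  # strongest chance correlation among the first 5
best_of_500 = max(scores)    # strongest chance correlation among all 500
print(f"strongest spurious |r| among 5 noise features:   {best_of_5:.2f}")
print(f"strongest spurious |r| among 500 noise features: {best_of_500:.2f}")
```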

Question 4 (True / False)

Performing feature selection using only training data, then applying the same selection to the test set, is a valid and complete safeguard against data leakage in feature selection.

True

False
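
Selecting on the training data once is not the whole story if that same training data is then used for cross-validation or tuning: every validation fold has already influenced which features survived. A minimal stdlib-only Python sketch of the fix re-runs selection inside each fold using only that fold's training rows. It assumes a hypothetical top-k absolute-correlation filter (`select_top_k`), plain contiguous k-fold splits, and toy data.

```python
from math import sqrt


def abs_pearson(xs, ys):
    """Absolute Pearson correlation; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / sqrt(vx * vy)


def select_top_k(X, y, k):
    """Hypothetical filter: sorted indices of the k columns most correlated with y."""
    n_cols = len(X[0])
    scores = [abs_pearson([row[j] for row in X], y) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous, non-overlapping folds."""
    fold_size, remainder = divmod(n, k)
    start = 0
    for i in range(k):
        stop = start + fold_size + (1 if i < remainder else 0)
        val_idx = list(range(start, stop))
        train_idx = list(range(start)) + list(range(stop, n))
        yield train_idx, val_idx
        start = stop


y = [i % 2 for i in range(20)]
X = [[y[i] + 0.1 * (i % 3), i % 5, (7 * i) % 11] for i in range(20)]

per_fold_selection = []
fold_sizes = []
for train_idx, val_idx in k_fold_indices(len(X), k=4):
    X_tr = [X[i] for i in train_idx]
    y_tr = [y[i] for i in train_idx]
    # Selection sees only this fold's training rows.
    cols = select_top_k(X_tr, y_tr, k=1)
    per_fold_selection.append(cols)
    # The held-out rows are restricted to the columns chosen above; a model
    # trained on X_tr[:, cols] would be scored on these rows.
    X_val_sel = [[X[i][j] for j in cols] for i in val_idx]
    fold_sizes.append(len(X_val_sel))

print(per_fold_selection)  # [[0], [0], [0], [0]]
```

In scikit-learn the same effect is usually achieved by putting the selector and the model in one `Pipeline` and passing that pipeline to cross-validation, so the selector is refit on every training fold.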

Question 5 (Short Answer)

Why does feature engineering often matter more than algorithm choice in applied machine learning, and what is the guiding question when deciding whether to create a new feature?
