An SVR model is trained with ε = 0.5. A training point has a predicted value of 10.0 and an actual value of 10.3. How does this point affect the model's parameters?
A. It contributes a loss of 0.3 × C, penalized proportionally to how far it falls outside the tube
B. It contributes nothing to the loss: it falls within the epsilon tube and is completely ignored when determining model parameters
C. It becomes a support vector because its prediction is not exactly correct
D. It contributes a squared penalty of 0.3² as in ordinary least squares regression
Answer: B
The actual value of 10.3 is only 0.3 away from the prediction of 10.0, which is inside the ε = 0.5 tube. The epsilon-insensitive loss is exactly zero for any deviation within ε, so this point is not a support vector and contributes nothing to shaping the model; it is treated as 'close enough.' This is a fundamental difference from ordinary least squares regression, where even this 0.3-unit deviation would contribute a nonzero squared error of 0.09. Only points on or outside the tube boundary affect the model.
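To make the arithmetic concrete, here is a minimal sketch of the two per-point losses applied to the point from the question. The function names and the second, out-of-tube point (actual value 11.0) are illustrative assumptions, not part of the question.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the tube, linear in the excess deviation outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def squared_loss(y_true, y_pred):
    """Per-point loss used by ordinary least squares."""
    return (y_true - y_pred) ** 2


# The point from the question: prediction 10.0, actual 10.3, epsilon 0.5.
inside = epsilon_insensitive_loss(10.3, 10.0, epsilon=0.5)
ols = squared_loss(10.3, 10.0)

# A hypothetical point outside the tube, added only for contrast.
outside = epsilon_insensitive_loss(11.0, 10.0, epsilon=0.5)

print(inside, round(ols, 2), outside)  # 0.0 0.09 0.5
```

The in-tube point costs nothing under the epsilon-insensitive loss, while the same deviation still registers under squared loss.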
Question 2 Multiple Choice
In ordinary least squares regression, every training point — including those very close to the fitted line — contributes to the model parameters. How does SVR with ε = 1.0 handle a point that is 0.1 units from the prediction?
A. It contributes equally to SVR and linear regression since the numerical deviation is the same
B. It contributes more to SVR because support vector methods weight points near the boundary more heavily
C. It contributes nothing to SVR: it lies inside the epsilon tube and is ignored when determining model parameters
D. It contributes to SVR only if it happens to be geometrically closest to the regression hyperplane
Answer: C
With ε = 1.0, a deviation of 0.1 falls deep inside the insensitivity tube, so SVR assigns exactly zero loss to it. In contrast, ordinary least squares would assign a squared loss of 0.01, which still influences the fit. The epsilon tube in SVR creates a 'dead zone': points strictly inside it do not affect the fitted function, regardless of how many there are. Only the support vectors, the points that lie on or outside the tube boundary, determine the regression function. This is the defining structural difference between SVR and OLS.
Question 3 True / False
In SVR, increasing ε (the tube width) while holding all else constant generally results in fewer support vectors and a simpler, smoother model.
T. True
F. False
Answer: True
A wider epsilon tube means more training points fall inside it and incur zero loss — they become irrelevant to the model. Fewer points fall outside the tube, so fewer support vectors exist. Fewer support vectors means the model is defined by less data and is mathematically simpler, typically producing a smoother, less complex regression function. Conversely, a very narrow ε forces almost every point to contribute to the model, potentially overfitting to noise.
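The effect of ε on the number of support vectors can be checked empirically. The sketch below assumes scikit-learn and NumPy are available and uses synthetic data (a noisy line with noise standard deviation 0.3); the data, seed, and the three ε values are illustrative choices, not taken from the question.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic data: a straight line plus Gaussian noise (illustrative only).
rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 200).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=0.3, size=200)

# Fit the same model with three tube widths and count support vectors.
counts = {}
for eps in (0.01, 0.3, 1.0):
    model = SVR(kernel="linear", C=1.0, epsilon=eps).fit(X, y)
    counts[eps] = len(model.support_)  # indices of the support vectors

print(counts)
```

On data like this, the count should drop sharply as ε grows past the noise level, since most residuals then fit inside the tube.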
Question 4 True / False
Like ordinary least squares linear regression, SVR uses the entire training set to determine the final regression function.
T. True
F. False
Answer: False
SVR uses only the support vectors, the training points that fall outside or exactly on the boundary of the epsilon tube, to determine the model. Points strictly inside the tube contribute zero loss and have no influence on the model parameters. In contrast, OLS uses every single training point (the loss is nonzero for any deviation from the line). Because the fitted function depends only on the support vectors, a kernel SVR model needs to store just those points to make predictions, a sparsity property it shares with SVM classification.
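One way to see that only the support vectors matter is to rebuild a fitted model's predictions from them by hand. This is a minimal sketch assuming scikit-learn's `SVR` with an RBF kernel; the synthetic sine data, `gamma`, `C`, and `epsilon` values are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVR

# Synthetic data: a noisy sine curve (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 6.0, size=(80, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

gamma = 0.5
model = SVR(kernel="rbf", gamma=gamma, C=1.0, epsilon=0.2).fit(X, y)

X_new = np.array([[1.0], [2.5], [4.0]])

# Prediction = sum over support vectors of dual_coef * K(sv, x) + intercept.
# No other training point appears in this expression.
K = rbf_kernel(model.support_vectors_, X_new, gamma=gamma)  # (n_sv, n_new)
manual = (model.dual_coef_ @ K).ravel() + model.intercept_[0]

print(len(model.support_), "of", len(X), "training points are support vectors")
print(np.allclose(manual, model.predict(X_new)))
```

The manual reconstruction touches only `support_vectors_`, `dual_coef_`, and `intercept_`, which is everything the fitted model keeps from the training set.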
Question 5 Short Answer
Explain why SVR is described as 'robust to outliers' compared to ordinary least squares regression. What role does the epsilon-insensitive tube play in this robustness?
Model answer: In ordinary least squares, each point contributes an error equal to the square of its distance from the fit. Outliers, points far from the main trend, therefore contribute disproportionately large squared errors that strongly pull the fit toward them, distorting the model. In SVR, any point within the epsilon tube contributes zero loss, and points outside the tube contribute only a linear penalty (not squared). Even a significant outlier contributes only linearly to the loss function rather than quadratically, limiting its ability to distort the model. The tube also means that moderate noise near the prediction surface is entirely ignored.
The key contrast is squared vs. linear penalty. OLS's squared loss amplifies the influence of distant points: doubling an error quadruples its contribution. SVR's epsilon-insensitive loss (zero inside the tube, linear outside) grows only in proportion to the excess deviation, which limits the relative influence of any single point. This is the source of SVR's robustness, analogous to how robust regression methods using absolute loss (L1) are more outlier-resistant than squared loss (L2) methods.
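The squared vs. linear contrast can be checked with a short calculation. In this sketch the residuals 5.0 and 10.0 and ε = 0.5 are hypothetical values chosen so that one error is exactly double the other; the function names are illustrative.

```python
def tube_loss(residual, epsilon):
    """Epsilon-insensitive loss: zero inside the tube, linear outside."""
    return max(0.0, abs(residual) - epsilon)


def squared_loss(residual):
    """Per-point loss used by ordinary least squares."""
    return residual ** 2


epsilon = 0.5
small, large = 5.0, 10.0  # hypothetical outlier residuals; large is 2x small

# How much more does the larger outlier cost under each loss?
squared_ratio = squared_loss(large) / squared_loss(small)              # 100 / 25
tube_ratio = tube_loss(large, epsilon) / tube_loss(small, epsilon)     # 9.5 / 4.5

print(squared_ratio, round(tube_ratio, 2))  # 4.0 2.11
```

Doubling the residual quadruples the squared loss but only roughly doubles the epsilon-insensitive loss, so a distant point gains far less leverage over the fit.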