EXE 1
(a) Better: with an extremely large sample size and a small number of predictors, a more flexible approach can fit the data closely without overfitting.
(b) Worse: a flexible method would overfit the small number of observations.
(c) Better: with more degrees of freedom, a flexible model can capture a highly non-linear relationship.
(d) Worse: a flexible method would fit the noise in the error terms and increase variance (see the sketch below).
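A minimal sketch (my own illustration, not part of the exercise) of (b) and (d): with few observations and noisy responses, a very flexible fit chases the noise and does worse on test data than an inflexible one.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):                       # true relationship is simply linear
    return 1.0 + 2.0 * x

n_train, noise_sd = 15, 1.0     # few observations, high error variance
x_tr = rng.uniform(-1, 1, n_train)
y_tr = f(x_tr) + rng.normal(0, noise_sd, n_train)
x_te = rng.uniform(-1, 1, 2000)
y_te = f(x_te) + rng.normal(0, noise_sd, 2000)

for degree in (1, 10):          # inflexible vs very flexible polynomial fit
    coefs = np.polyfit(x_tr, y_tr, degree)
    test_mse = np.mean((np.polyval(coefs, x_te) - y_te) ** 2)
    print(f"degree {degree:2d}: test MSE = {test_mse:.2f}")
# The flexible (degree-10) fit chases the noise and typically shows a
# much larger test MSE than the simple linear fit.
```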
EXE 2
(a) Regression; inference.
(b) Classification; prediction.
(c) Regression; prediction.
EXE 3
bias - more flexible, smaller bias.
variance - more flexible, larger variance.
training error - more flexible, smaller training error (decreases monotonically).
test error - more flexible, U-shaped curve.
Bayes (irreducible) error - constant. It defines the lower limit: the test error is bounded below by the irreducible error due to the variance of the noise term epsilon in the output values (value >= 0). When the training error drops below the irreducible error, overfitting has taken place. For classification problems, the Bayes error rate is determined by the proportion of data points that lie on the 'wrong' side of the Bayes decision boundary (0 <= value < 1).
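A small simulation (my own sketch, using polynomial degree as the measure of flexibility) of the curves described above: training error falls monotonically, test error is roughly U-shaped, and the irreducible error sets the floor.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(np.pi * x)               # true non-linear f
noise_sd = 0.5                                # irreducible error = 0.25

x_tr = rng.uniform(-1, 1, 50)
y_tr = f(x_tr) + rng.normal(0, noise_sd, 50)
x_te = rng.uniform(-1, 1, 2000)
y_te = f(x_te) + rng.normal(0, noise_sd, 2000)

print("degree  train MSE  test MSE")
for deg in range(1, 11):                      # increasing flexibility
    c = np.polyfit(x_tr, y_tr, deg)
    tr = np.mean((np.polyval(c, x_tr) - y_tr) ** 2)
    te = np.mean((np.polyval(c, x_te) - y_te) ** 2)
    print(f"{deg:6d}  {tr:9.3f}  {te:8.3f}")
# Training MSE keeps falling; test MSE dips and then rises again, and
# stays near or above the irreducible error of noise_sd**2 = 0.25.
```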
EXE 4
...
EXE 5
Flexible models fit the data more closely, with smaller bias but larger variance, and obtain a better fit for non-linear data.
If the number of observations is small, flexible models easily overfit (fit the noise), so they need a larger amount of data.
We prefer a more flexible model when we are interested in prediction rather than interpretability.
We prefer a less flexible model when we are interested in interpretability and inference.
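A minimal sketch (assumed setup, using scikit-learn) of the trade-off: on non-linear data with a reasonably large sample, the more flexible KNN regression beats the less flexible linear model on test MSE, at the cost of interpretability.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)
f = lambda x: np.sin(3 * x)                   # non-linear truth

X_tr = rng.uniform(-2, 2, (500, 1))
y_tr = f(X_tr[:, 0]) + rng.normal(0, 0.3, 500)
X_te = rng.uniform(-2, 2, (2000, 1))
y_te = f(X_te[:, 0]) + rng.normal(0, 0.3, 2000)

for name, model in [("linear regression", LinearRegression()),
                    ("KNN (K=10)      ", KNeighborsRegressor(n_neighbors=10))]:
    model.fit(X_tr, y_tr)
    mse = np.mean((model.predict(X_te) - y_te) ** 2)
    print(f"{name} test MSE = {mse:.3f}")
# The flexible KNN fit comes much closer to the irreducible error of 0.09.
```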
EXE 6
A parametric approach reduces the problem of estimating f down to one of estimating a set of parameters because it assumes a form for f.
A non-parametric approach does not assume a functional form for f and so requires a very large number of observations to accurately estimate f.
The advantages of a parametric approach to regression or classification are that it simplifies estimating f to estimating a few parameters, and that fewer observations are required than with a non-parametric approach.
The disadvantages of a parametric approach are the potential to estimate f inaccurately if the assumed form of f is wrong, or to overfit the observations if a more flexible model is used.
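A minimal sketch (my own example) of the parametric idea: assuming f(X) = beta0 + beta1*X reduces estimating f to estimating two numbers, whereas a non-parametric method such as KNN keeps the entire training set rather than a small parameter set.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 200)
y = 4.0 + 1.5 * x + rng.normal(0, 2.0, 200)    # data from a linear f

X = np.column_stack([np.ones_like(x), x])      # design matrix [1, x]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares estimate
print("estimated (beta0, beta1):", np.round(beta, 2))  # approx (4, 1.5)
# If the assumed linear form is wrong, these two parameters cannot
# capture f no matter how much data we collect (the bias noted above).
```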
EXE 7
Small. When K is large, the decision boundary becomes less flexible (closer to linear), so a small K is needed to capture a highly non-linear Bayes decision boundary.
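A minimal sketch (assumed data, using scikit-learn's KNeighborsClassifier) of the effect of K: with a non-linear class boundary, a small K tracks the boundary, while a very large K smooths it out and loses accuracy.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)

def make_data(n):
    X = rng.uniform(-2, 2, (n, 2))
    boundary = np.sin(2 * X[:, 0])            # non-linear Bayes boundary
    y = (X[:, 1] > boundary).astype(int)
    flip = rng.random(n) < 0.1                # 10% label noise
    y[flip] = 1 - y[flip]
    return X, y

X_tr, y_tr = make_data(400)
X_te, y_te = make_data(2000)

for k in (1, 5, 200):                         # small to very large K
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"K = {k:3d}: test accuracy = {acc:.3f}")
# K = 200 averages over half the training set, flattening the boundary
# and typically giving the worst accuracy of the three.
```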
Refer to https://raw.githubusercontent.com/asadoughi/stat-learning/master/ch2/answers