Regularization

The Problem of Overfitting

As shown in the figure above, the first model fits the dataset with univariate linear regression, and the fit is poor; we call this underfitting, or high bias. The second fits the dataset with a quadratic polynomial, and the fit is just right. The third fits the dataset with a fourth-order polynomial; although it fits the training data very well, its curve swings up and down and cannot make sensible predictions for new data, so we call this overfitting, or high variance.

Logistic regression models exhibit the same behavior, as shown in the figure below:

Following the same analysis as for linear regression, the first is clearly underfitting, the second is just right, and the third is overfitting.

Now let us look at the definition of overfitting:

That is, if a dataset contains many features and we fit it with a high-order polynomial, the resulting hypothesis may appear to fit every example in the training set well, yet be unable to handle new data; in other words, it generalizes poorly (generalization refers to a hypothesis's ability to apply to new examples). We call this situation overfitting.

Question:
Consider the medical diagnosis problem of classifying tumors as malignant or benign. If a hypothesis hθ(x) has overfit the training set, it means that:
A. It makes accurate predictions for examples in the training set and generalizes well to make accurate predictions on new, previously unseen examples.
B. It does not make accurate predictions for examples in the training set, but it does generalize well to make accurate predictions on new, previously unseen examples.
C. It makes accurate predictions for examples in the training set, but it does not generalize well to make accurate predictions on new, previously unseen examples.
D. It does not make accurate predictions for examples in the training set and does not generalize well to make accurate predictions on new, previously unseen examples.

From the definition of overfitting, C is clearly the correct answer.

To address the problem of overfitting, we have the following options:

  1. Reduce the number of features:
    • Manually select which features to keep.
    • Use a model selection algorithm to select features automatically.
  2. Regularization: keep all the features, but reduce the magnitude of the parameters θj.

Supplementary Notes
The Problem of Overfitting

Consider the problem of predicting y from x ∈ R. The leftmost figure below shows the result of fitting y = θ0 + θ1x to a dataset. We see that the data doesn't really lie on a straight line, and so the fit is not very good.

Underfitting, or high bias, is when the form of our hypothesis function h maps poorly to the trend of the data. It is usually caused by a function that is too simple or uses too few features. At the other extreme, overfitting, or high variance, is caused by a hypothesis function that fits the available data but does not generalize well to predict new data. It is usually caused by a complicated function that creates a lot of unnecessary curves and angles unrelated to the data.

This terminology is applied to both linear and logistic regression. There are two main options to address the issue of overfitting:

  1. Reduce the number of features:
    • Manually select which features to keep.
    • Use a model selection algorithm (studied later in the course).
  2. Regularization
    • Keep all the features, but reduce the magnitude of parameters θj.
    • Regularization works well when we have a lot of slightly useful features.

Cost Function

If the hypothesis is hθ(x) = θ0 + θ1x + θ2x² + θ3x³ + θ4x⁴, it will overfit the dataset shown in the figure below.

Now suppose every feature x is important, so we cannot discard any of them. To solve this problem, we use regularization to shrink the values of the parameters θj.

To do this, we modify the cost function J(θ) as shown below:
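A reconstruction of the modified cost function (the multipliers 1000 are simply arbitrarily large penalty weights):

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2$$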

When we use gradient descent or some other advanced algorithm to find the values of θ that minimize this J(θ), the values of θ3 and θ4 will have much less influence on predictions for new data than before. Why?

This is because, with regularization, the values of θ3 and θ4 obtained when minimizing J(θ) are very close to 0. The hypothesis can therefore effectively be rewritten as hθ(x) = θ0 + θ1x + θ2x².

If a dataset has very many features x, each of them important, then to avoid overfitting we can modify the cost function J(θ) to:
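In its standard form (note that the penalty sum starts at j = 1, skipping θ0):

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$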

where λ is called the regularization parameter. This method is accordingly known as regularization.

Note: θ0 is not included in the penalty here.

The regularization parameter λ must also be chosen with care: if it is too large, θ1, θ2, θ3, and θ4 will all be driven towards 0, and the hypothesis effectively reduces to hθ(x) = θ0.

The result is the red line shown in the figure: an underfitting problem.

Supplementary Notes
Cost Function

If we have overfitting from our hypothesis function, we can reduce the weight that some of the terms in our function carry by increasing their cost.

Say we wanted to make the following function more quadratic:
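Presumably the quartic hypothesis used above:

$$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$$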

We'll want to eliminate the influence of θ3x³ and θ4x⁴. Without actually getting rid of these features or changing the form of our hypothesis, we can instead modify our cost function:
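A reconstruction of that modification (1000 is just an arbitrarily large constant):

$$\min_\theta\ \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2$$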

We've added two extra terms at the end to inflate the cost of θ3 and θ4. Now, in order for the cost function to get close to zero, we will have to reduce the values of θ3 and θ4 to near zero. This will in turn greatly reduce the values of θ3x³ and θ4x⁴ in our hypothesis function. As a result, we see that the new hypothesis (depicted by the pink curve) looks like a quadratic function but fits the data better due to the extra small terms θ3x³ and θ4x⁴.

We could also regularize all of our theta parameters in a single summation as:
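That is, with the penalty summed over j = 1, …, n:

$$\min_\theta\ \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$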

The λ, or lambda, is the regularization parameter. It determines how much the costs of our theta parameters are inflated.

Using the above cost function with the extra summation, we can smooth the output of our hypothesis function to reduce overfitting. If lambda is chosen to be too large, it may smooth out the function too much and cause underfitting.

Regularized Linear Regression

The regularized cost function J(θ) is:
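In the standard form reconstructed earlier:

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$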

We now use the gradient descent algorithm and the normal equation, both covered earlier, to find the value of θ that minimizes the cost function J(θ).

Gradient Descent

Since regularization leaves θ0 untouched, the gradient descent update expressions are:
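A reconstruction of the updates (θ0 has no penalty term):

Repeat {
$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$
$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad j = 1, 2, \ldots, n$$
}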

For j = 1, 2, 3, …, the update expression can be rewritten as:
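Grouping the two θj terms:

$$\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$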

where 1 − αλ/m < 1 is guaranteed, since α, λ, and m are all positive.
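A minimal Octave sketch of this update on made-up data (all names here are illustrative; note how the θj for j ≥ 1 are first shrunk by the factor 1 − αλ/m):

% Regularized gradient descent for linear regression on random data.
m = 50; n = 2;
X = [ones(m, 1), rand(m, n)];   % design matrix; first column is x0 = 1
y = rand(m, 1);                 % hypothetical targets
theta = zeros(n + 1, 1);
alpha = 0.1; lambda = 1;

for iter = 1:400
  grad = (1/m) * (X' * (X*theta - y));   % (1/m)*sum((h - y).*xj) for every j
  theta(1) = theta(1) - alpha * grad(1); % theta_0 is not penalized
  theta(2:end) = (1 - alpha*lambda/m) * theta(2:end) - alpha * grad(2:end);
end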

Normal Equation

The formula for the regularized normal equation is:
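A reconstruction of the formula:

$$\theta = \left(X^{T}X + \lambda L\right)^{-1}X^{T}y$$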

where L is an (n+1)×(n+1) matrix.

When the number of examples m is less than the number of features n, XᵀX is a non-invertible (singular) matrix. In Octave, pinv() can still compute its pseudo-inverse, but inv() cannot compute an inverse.

Note: when the number of examples m equals the number of features n, XᵀX may also be non-invertible (singular).

With a regularization parameter λ > 0, even when m ≤ n and XᵀX itself is non-invertible, the matrix XᵀX + λL is invertible, so inv() can be used.
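A small Octave sketch with made-up data illustrating both claims (the variable names are illustrative):

% With m < n, X'*X is singular: pinv() still works, inv() does not.
m = 3; n = 5;                    % fewer examples than features
X = [ones(m, 1), rand(m, n)];
y = rand(m, 1);

A = X' * X;                      % (n+1)x(n+1), singular since rank(A) <= m
theta_pinv = pinv(A) * (X' * y); % pseudo-inverse still gives an answer

lambda = 1;
L = eye(n + 1);  L(1, 1) = 0;    % identity with the top-left entry zeroed
theta_reg = inv(A + lambda * L) * (X' * y);  % invertible now, inv() is safe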

Supplementary Notes
Regularized Linear Regression

We can apply regularization to both linear regression and logistic regression. We will approach linear regression first.

Gradient Descent

We will modify our gradient descent function to separate out θ0 from the rest of the parameters because we do not want to penalize θ0.

Normal Equation

Now let's approach regularization using the alternate method of the non-iterative normal equation.

To add in regularization, the equation is the same as our original, except that we add another term inside the parentheses:
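The regularized normal equation then reads:

$$\theta = \left(X^{T}X + \lambda \cdot L\right)^{-1}X^{T}y$$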

L is a matrix with 0 at the top left and 1's down the diagonal, with 0's everywhere else. It should have dimension (n+1)×(n+1). Intuitively, this is the identity matrix (though we are not including x0), multiplied with a single real number λ.

Recall that if m < n, then XᵀX is non-invertible (and if m = n it may be non-invertible). However, when we add the term λ⋅L, then XᵀX + λ⋅L becomes invertible.

Regularized Logistic Regression

The cost function J(θ) of the regularized logistic regression model is:
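A reconstruction of the standard form:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$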

Gradient Descent
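The updates look identical in form to those for regularized linear regression (a reconstruction):

Repeat {
$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$
$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad j = 1, 2, \ldots, n$$
}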

where hθ(x) = g(θᵀx).

Advanced Optimization

First, create a file named costFunction.m and write the function code as shown below:
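A sketch of what the file presumably contains, namely the regularized logistic regression cost and gradient (variable names are illustrative):

function [jVal, gradient] = costFunction(theta, X, y, lambda)
  % Regularized logistic regression cost and gradient, for use with fminunc().
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                 % sigmoid hypothesis g(X*theta)
  jVal = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h)) ...
         + (lambda / (2*m)) * sum(theta(2:end) .^ 2);  % penalty skips theta_0
  gradient = (1/m) * (X' * (h - y));
  gradient(2:end) = gradient(2:end) + (lambda/m) * theta(2:end);
end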

Then, as explained in the earlier article 逻辑回归(二) (Logistic Regression, Part 2), call the fminunc() function in Octave; see that article for the detailed steps.
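For reference, a call might look like this (assuming X, y, and lambda are already in the workspace):

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(size(X, 2), 1);
[optTheta, functionVal] = fminunc(@(t) costFunction(t, X, y, lambda), initialTheta, options);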

Supplementary Notes
Regularized Logistic Regression

We can regularize logistic regression in a similar way that we regularize linear regression. As a result, we can avoid overfitting. The following image shows how the regularized function, displayed by the pink line, is less likely to overfit than the non-regularized function represented by the blue line:

Cost Function

Recall that our cost function for logistic regression was:
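That is:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$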

We can regularize this equation by adding a term to the end:
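With the added term:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

The second sum starts at j = 1, which explicitly excludes the bias term θ0.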
