Goal
Implement linear regression using basic code as far as possible, without machine-learning libraries.
Data preparation
The iris dataset.
Main workflow
Core formula:
y = wX + b
Read the file,
clean the data to extract what we need (X and y),
run the regression,
and plot the result to check the fit.
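Every snippet below assumes the same small set of imports (these are the only dependencies this post uses):

import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt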
Reading the file
# Read the file
data = pd.read_csv("iris.data", names=["x", "y", "c", "d", "itype"])
# Take the first 50 rows of iris
first_50 = data.loc[:49]
# Keep the first two columns
first_50xy = first_50[["x", "y"]]
# Extract X and y and convert them to numpy arrays
X = np.array(first_50xy["x"])
y = np.array(first_50xy["y"])
Regression: updating w and b with batch gradient descent (BGD)
The idea: to update w and b, compute the partial derivative of the cost with respect to each parameter, then step against it repeatedly:
w = w - α * ∂J/∂w
b = b - α * ∂J/∂b
For the cost J(w, b) = (1/m) Σ (w·xᵢ + b - yᵢ)², the partials are ∂J/∂w = (2/m) Σ xᵢ·(w·xᵢ + b - yᵢ) and ∂J/∂b = (2/m) Σ (w·xᵢ + b - yᵢ), which is exactly what the update lines in the code below compute.
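As a quick sanity check on these derivatives (my own addition, not part of the original post; the helper name grad_check is hypothetical), the analytic gradient can be compared against a central finite-difference estimate:

def grad_check(X, y, w, b, eps=1e-6):
    # Cost J(w, b) = (1/m) * sum((w*x + b - y)^2), as used by BGD below
    m = len(X)
    J = lambda w_, b_: np.sum((w_ * X + b_ - y) ** 2) / m
    # Analytic partial derivatives
    dw = np.sum(2 * X * (w * X + b - y)) / m
    db = np.sum(2 * (w * X + b - y)) / m
    # Central finite-difference estimates
    dw_num = (J(w + eps, b) - J(w - eps, b)) / (2 * eps)
    db_num = (J(w, b + eps) - J(w, b - eps)) / (2 * eps)
    print(dw - dw_num, db - db_num)  # both differences should be ~0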
# Minimize the cost with batch gradient descent
def BGD(X, y, w, b, alpha, n_iter=1000):
    m = len(X)
    for i in range(n_iter):
        # Compute the error once so that w and b are updated from the same
        # gradient (X.T is a no-op on a 1-D array, so plain X is used)
        err = w * X + b - y
        w = w - alpha * np.sum(2 * X * err) / m
        b = b - alpha * np.sum(2 * err) / m
    return w, b
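To verify that BGD converges to the right parameters, here is a quick test on synthetic data (my own check, not in the original; the variable names are mine): generate points from a known line and confirm the line is recovered.

# Synthetic check: points drawn from y = 2x + 1 plus small noise
rng = np.random.default_rng(0)
X_syn = rng.uniform(0, 5, 100)
y_syn = 2 * X_syn + 1 + rng.normal(0, 0.05, 100)
w_hat, b_hat = BGD(X_syn, y_syn, w=0.0, b=0.0, alpha=0.01, n_iter=50000)
print(w_hat, b_hat)  # should be close to 2 and 1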
Optimization 1
Add a loss function so the algorithm can terminate early.
def loss(x, y, w, b):
    total = 0
    length = len(x)
    for ind in range(length):
        total += (y[ind] - (w * x[ind] + b)) ** 2
    # Dividing by 2m is just a scaling convention; it does not change the minimizer
    return total / (2 * length)
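The loop above can also be written in one vectorized line with numpy; this equivalent version (my rewrite, with the hypothetical name loss_vec) returns the same value:

def loss_vec(x, y, w, b):
    # Same cost as loss(), computed without an explicit Python loop
    return np.sum((y - (w * x + b)) ** 2) / (2 * len(x))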
Modify BGD to check the loss and exit early
def BGD(X, y, w, b, alpha, n_iter=1000):
    m = len(X)
    for i in range(n_iter):
        err = w * X + b - y
        w = w - alpha * np.sum(2 * X * err) / m
        b = b - alpha * np.sum(2 * err) / m
        # Every 1000 iterations, compute the loss; if it is already small,
        # exit early (test the loop counter i, not the constant n_iter)
        if i % 1000 == 0 and loss(X, y, w, b) < 0.04:
            print(i)
            return w, b
    return w, b
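The 0.04 threshold is tuned to this particular dataset. A small variant (a sketch of my own; the parameter names tol and check_every are hypothetical) makes the stopping rule configurable:

def BGD_tol(X, y, w, b, alpha, n_iter=1000, tol=0.04, check_every=1000):
    m = len(X)
    for i in range(n_iter):
        err = w * X + b - y
        w = w - alpha * np.sum(2 * X * err) / m
        b = b - alpha * np.sum(2 * err) / m
        # Stop as soon as the periodic loss check falls below tol
        if i % check_every == 0 and loss(X, y, w, b) < tol:
            return w, b
    return w, b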
Plotting and the main function
if __name__ == "__main__":
    time_start = time.time()
    # Read the file
    data = pd.read_csv("iris.data", names=["x", "y", "c", "d", "itype"])
    # Take the first 50 rows of iris
    first_50 = data.loc[:49]
    # Keep the first two columns
    first_50xy = first_50[["x", "y"]]
    # Extract X and y and convert them to numpy arrays
    X = np.array(first_50xy["x"])
    y = np.array(first_50xy["y"])
    # Initial parameters for the model y = wx + b
    w = 1
    b = 5
    alpha = 0.01
    w, b = BGD(X, y, w, b, alpha, n_iter=100000)
    time_end = time.time()
    print("total time cost:", time_end - time_start)
    # Plot the data points
    plt.plot(X, y, "ro")
    # Plot the fitted line through its two endpoints
    x_min = min(X)
    x_max = max(X)
    y_min = x_min * w + b
    y_max = x_max * w + b
    xs = [x_min, x_max]
    ys = [y_min, y_max]
    plt.plot(xs, ys)
    plt.show()
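To double-check the fitted line (my own addition, not in the original), the result can be compared against numpy's closed-form least-squares fit, which solves the same problem directly:

# np.polyfit with degree 1 returns [slope, intercept]
w_ref, b_ref = np.polyfit(X, y, 1)
print("BGD:", w, b, "  polyfit:", w_ref, b_ref)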
GitHub:
https://github.com/Lin4856/ML-in-Python-485/blob/master/yk_basic/linear/linear_%CE%B1.py
Related link: https://blog.csdn.net/weixin_43944175/article/details/95899457