Chapter 1: Regression
1. Getting the dataset
Fashion-MNIST is a 10-class clothing classification dataset; it is fairly small, which is why we use it here.
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import time
import sys
sys.path.append("D:\anaconda\Lib") # 为了导⼊上层⽬录的d2lzh_pytorch
import d2lzh_pytorch as d2l
mnist_train = torchvision.datasets.FashionMNIST(root='D:/program/vs code/动手学/Datasets/FashionMNIST', train=True, download=True, transform=transforms.ToTensor())
mnist_test = torchvision.datasets.FashionMNIST(root='D:/program/vs code/动手学/Datasets/FashionMNIST', train=False, download=True, transform=transforms.ToTensor())
print(type(mnist_train))
print(len(mnist_train), len(mnist_test))
feature, label = mnist_train[0]
print(feature.shape, label)
d2lzh_pytorch collects the helper functions used throughout these experiments.
Output:
<class 'torchvision.datasets.mnist.FashionMNIST'>
60000 10000
torch.Size([1, 28, 28]) 9
.shape is Channel x Height x Width; transforms.ToTensor() converts each PIL image into a float tensor of shape C x H x W with values in [0.0, 1.0]. There are 10 classes, indexed from 0, so the label printed for this sample is 9.
Fashion-MNIST contains 10 categories in total. We can map the numeric labels to text labels instead, using this helper from d2lzh_pytorch:
def get_fashion_mnist_labels(labels):
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]
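A quick check of the mapping (a hypothetical call, not from the original notes):
print(get_fashion_mnist_labels([0, 9]))  # ['t-shirt', 'ankle boot']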
Next, define a function that can draw several images and their labels in a single row.
def show_fashion_mnist(images, labels):
    d2l.use_svg_display()
    # the underscore is a throwaway variable we do not use
    _, figs = plt.subplots(1, len(images), figsize=(12, 12))
    for f, img, lbl in zip(figs, images, labels):
        f.imshow(img.view((28, 28)).numpy())
        f.set_title(lbl)
        f.axes.get_xaxis().set_visible(False)
        f.axes.get_yaxis().set_visible(False)
    plt.show()
Plotting the first few samples:
X, y = [], []  # two new empty lists
for i in range(10):
    X.append(mnist_train[i][0])  # append the image tensor
    y.append(mnist_train[i][1])  # append the numeric label
show_fashion_mnist(X, get_fashion_mnist_labels(y))  # show the first ten training samples with their text labels
2. Reading mini-batches
batch_size = 256
if sys.platform.startswith('win'):
    num_workers = 0  # 0 means no extra worker processes to speed up loading
else:
    num_workers = 4
train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)
start = time.time()
for X, y in train_iter:
    continue
print('%.2f sec' % (time.time() - start))
This subsection is fairly simple: it shows how to load the data. When loading you can set the batch size, how many worker processes to use, whether returned tensors go into pinned memory for faster transfer to the GPU, and so on.
Parameter list of torch.utils.data.DataLoader (only a handful matter in practice; a short usage sketch follows the list):
class torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, collate_fn=<function default_collate>, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None)
Meanings:
1. dataset (Dataset): the dataset to load from; any object implementing PyTorch's Dataset interface (such as the FashionMNIST objects above) can be passed in.
2. batch_size (int): how many samples per batch, default 1. PyTorch does not feed the model one sample at a time (too inefficient) but in batches; this sets how many samples are fed to the network per step.
3. shuffle (bool): default False. Whether to reshuffle the data at every epoch. Shuffling makes batches more independent, but do not set it to True if the data has a meaningful sequential order.
4. collate_fn (callable): merges a list of individual samples into a mini-batch; the default, default_collate, stacks sample tensors along a new batch dimension. Usually left at the default.
5. batch_sampler (Sampler): default None. Yields the indices of one batch at a time (note: indices, not data). It is mutually exclusive with batch_size, shuffle, sampler, and drop_last.
6. sampler (Sampler): default None. Defines the strategy for drawing samples from the dataset. If a sampler is specified, shuffle must be False.
7. num_workers (int): default 0. How many subprocesses to use for data loading; 0 means data is loaded in the main process. Must be non-negative.
8. pin_memory (bool): default False. If True, tensors are copied into page-locked (pinned) host memory before being returned, which speeds up subsequent transfers to the GPU.
9. drop_last (bool): default False. If the dataset size is not divisible by batch_size, the last batch will be smaller; set this to True to drop that incomplete batch.
10. timeout (numeric): default 0. Timeout for collecting a batch from the workers; an error is raised if no data arrives within this time. Must be non-negative.
11. worker_init_fn (callable): default None. If set, it is called in each worker subprocess with the worker id after seeding and before data loading.
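A minimal sketch of a few of these arguments in action (toy tensors, values chosen purely for illustration):
import torch
from torch.utils.data import TensorDataset, DataLoader

# 10 toy samples; batch_size=4 with drop_last=True yields exactly two full batches
toy = TensorDataset(torch.arange(10).float().view(-1, 1), torch.arange(10))
loader = DataLoader(toy, batch_size=4, shuffle=True, drop_last=True)
for X, y in loader:
    print(X.shape, y)  # torch.Size([4, 1]) plus the 4 shuffled labels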
Data loading module
def load_data_fashion_mnist(batch_size, resize=None, root='~/Datasets/FashionMNIST'):
    """Download the fashion mnist dataset and then load into memory."""
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))
    trans.append(torchvision.transforms.ToTensor())
    transform = torchvision.transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)
    if sys.platform.startswith('win'):
        num_workers = 0  # 0 means no extra worker processes for data loading
    else:
        num_workers = 4
    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    return train_iter, test_iter
This module simply returns train_iter and test_iter; the function doing the real work is torch.utils.data.DataLoader, covered above.
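The resize argument is handy when a model expects larger inputs; a quick check (a hypothetical call, default root assumed):
train_iter, test_iter = load_data_fashion_mnist(256, resize=96)
for X, y in train_iter:
    print(X.shape)  # torch.Size([256, 1, 96, 96])
    break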
3. softmax
import torch
import torchvision
import numpy as np
import sys
sys.path.append("D:\anaconda\Lib") # 为了导入上层目录的d2lzh_pytorch
import d2lzh_pytorch as d2l
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size,root="D:/program/vs code/动手学/Datasets/FashionMNIST")
num_inputs = 784  # each input vector has length 28*28 = 784
num_outputs = 10  # there are 10 classes in total
W = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_outputs)), dtype=torch.float)  # normal distribution; arguments are loc, scale, size
b = torch.zeros(num_outputs, dtype=torch.float)  # initialize the bias to all zeros
W.requires_grad_(requires_grad=True)  # True means gradients will be computed for this tensor
b.requires_grad_(requires_grad=True)
def softmax(X):
    X_exp = X.exp()  # exponentiate element-wise
    partition = X_exp.sum(dim=1, keepdim=True)  # sum along each row (dim=1), keeping the dimension so broadcasting works
    return X_exp / partition  # broadcasting divides every row by its sum
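One caveat worth noting: exponentiating directly can overflow for large logits. A numerically stable variant (a sketch under the same row-wise convention, not from the original notes) subtracts each row's maximum first, which leaves the result unchanged:
def stable_softmax(X):
    X_shifted = X - X.max(dim=1, keepdim=True)[0]  # softmax is shift-invariant per row
    X_exp = X_shifted.exp()
    return X_exp / X_exp.sum(dim=1, keepdim=True)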
X = torch.rand((2, 5))
X_prob = softmax(X)
print(X_prob, X_prob.sum(dim=1))
![softmax](https://upload-images.jianshu.io/upload_images/23550723-c3c690c1e805d269.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
Output:
tensor([[0.1894, 0.2517, 0.1524, 0.2252, 0.1813],
[0.1592, 0.1721, 0.2165, 0.1756, 0.2765]]) tensor([1., 1.])
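With softmax verified (each row sums to 1), the softmax-regression model can be assembled from W and b along the lines of the book's from-scratch version (a sketch; the view assumes flattened 28x28 inputs):
def net(X):
    # flatten each image into a length-784 vector, apply the affine map, then softmax
    return softmax(torch.mm(X.view(-1, num_inputs), W) + b)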
A simple plot to observe the shape of the softmax output:
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 200)
# softmax expects a 2-D input and normalizes along dim=1, so feed x as a single row
y = softmax(torch.tensor(x).view(1, -1)).view(-1).numpy()
plt.plot(x, y)
plt.show()
4. Training
Data loading
import torch
from torch import nn
from torch.nn import init
import numpy as np
import torchvision
import torchvision.transforms as transforms
import sys
sys.path.append(r"D:\anaconda\Lib")
import d2lzh_pytorch as d2l
# data loading
batch_size = 256
num_workers = 0
mnist_train = torchvision.datasets.FashionMNIST(root='D:/program/vs code/动手学/Datasets/FashionMNIST', train=True, download=True, transform=transforms.ToTensor())
mnist_test = torchvision.datasets.FashionMNIST(root='D:/program/vs code/动手学/Datasets/FashionMNIST', train=False, download=True, transform=transforms.ToTensor())
train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)
num_inputs = 784
num_outputs = 10
Model definition
This is the simplest possible linear network:
class LinearNet(nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super(LinearNet, self).__init__()
        self.linear = nn.Linear(num_inputs, num_outputs)
    def forward(self, x):  # x shape: (batch, 1, 28, 28)
        y = self.linear(x.view(x.shape[0], -1))
        return y
net = LinearNet(num_inputs, num_outputs)
Parameter initialization
# randomly initialize the model weights from a normal distribution with mean 0 and std 0.01
init.normal_(net.linear.weight, mean=0, std=0.01)
init.constant_(net.linear.bias, val=0)
loss = nn.CrossEntropyLoss()  # combines LogSoftmax and NLLLoss, so the net outputs raw logits
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
num_epochs = 50  # the log below runs 50 epochs
Accuracy computation module
# evaluate classification accuracy over a data iterator
def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()  # argmax(dim=1) gives the column index of each row's maximum, i.e. the predicted label
        n += y.shape[0]
    return acc_sum / n
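As a sanity check (a hypothetical run, not in the original notes), the untrained model should score near random guessing over 10 classes:
print(evaluate_accuracy(test_iter, net))  # expect a value around 0.1 before training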
Training module
def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)  # forward pass
            l = loss(y_hat, y).sum()
            # zero the gradients
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()
            l.backward()  # backward pass: compute the gradients
            if optimizer is None:
                d2l.sgd(params, lr, batch_size)
            else:
                optimizer.step()
            # accumulate loss and accuracy statistics
            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))
train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, optimizer)
epoch 1, loss 0.0031, train acc 0.751, test acc 0.797
epoch 2, loss 0.0022, train acc 0.815, test acc 0.809
epoch 3, loss 0.0021, train acc 0.826, test acc 0.816
epoch 4, loss 0.0020, train acc 0.832, test acc 0.820
epoch 5, loss 0.0019, train acc 0.837, test acc 0.796
epoch 6, loss 0.0019, train acc 0.839, test acc 0.826
epoch 7, loss 0.0018, train acc 0.842, test acc 0.830
epoch 8, loss 0.0018, train acc 0.845, test acc 0.831
epoch 9, loss 0.0018, train acc 0.846, test acc 0.825
epoch 10, loss 0.0017, train acc 0.849, test acc 0.832
epoch 11, loss 0.0017, train acc 0.850, test acc 0.835
epoch 12, loss 0.0017, train acc 0.850, test acc 0.837
epoch 13, loss 0.0017, train acc 0.852, test acc 0.832
epoch 14, loss 0.0017, train acc 0.853, test acc 0.840
epoch 15, loss 0.0017, train acc 0.853, test acc 0.838
epoch 16, loss 0.0017, train acc 0.854, test acc 0.803
epoch 17, loss 0.0017, train acc 0.855, test acc 0.838
epoch 18, loss 0.0017, train acc 0.855, test acc 0.839
epoch 19, loss 0.0017, train acc 0.856, test acc 0.835
epoch 20, loss 0.0016, train acc 0.857, test acc 0.833
epoch 21, loss 0.0016, train acc 0.857, test acc 0.840
epoch 22, loss 0.0016, train acc 0.857, test acc 0.838
epoch 23, loss 0.0016, train acc 0.858, test acc 0.843
epoch 24, loss 0.0016, train acc 0.858, test acc 0.840
epoch 25, loss 0.0016, train acc 0.859, test acc 0.838
epoch 26, loss 0.0016, train acc 0.859, test acc 0.839
epoch 27, loss 0.0016, train acc 0.859, test acc 0.842
epoch 28, loss 0.0016, train acc 0.860, test acc 0.840
epoch 29, loss 0.0016, train acc 0.860, test acc 0.840
epoch 30, loss 0.0016, train acc 0.859, test acc 0.834
epoch 31, loss 0.0016, train acc 0.861, test acc 0.827
epoch 32, loss 0.0016, train acc 0.860, test acc 0.823
epoch 33, loss 0.0016, train acc 0.861, test acc 0.841
epoch 34, loss 0.0016, train acc 0.862, test acc 0.842
epoch 35, loss 0.0016, train acc 0.862, test acc 0.838
epoch 36, loss 0.0016, train acc 0.862, test acc 0.834
epoch 37, loss 0.0016, train acc 0.861, test acc 0.836
epoch 38, loss 0.0016, train acc 0.863, test acc 0.838
epoch 39, loss 0.0016, train acc 0.863, test acc 0.843
epoch 40, loss 0.0016, train acc 0.863, test acc 0.842
epoch 41, loss 0.0016, train acc 0.863, test acc 0.843
epoch 42, loss 0.0016, train acc 0.864, test acc 0.844
epoch 43, loss 0.0016, train acc 0.863, test acc 0.839
epoch 44, loss 0.0016, train acc 0.863, test acc 0.831
epoch 45, loss 0.0016, train acc 0.864, test acc 0.843
epoch 46, loss 0.0016, train acc 0.864, test acc 0.835
epoch 47, loss 0.0016, train acc 0.863, test acc 0.840
epoch 48, loss 0.0015, train acc 0.864, test acc 0.844
epoch 49, loss 0.0015, train acc 0.865, test acc 0.829
epoch 50, loss 0.0015, train acc 0.864, test acc 0.842
Training stabilizes around epoch 50, so later epochs are not shown. (The printed loss looks tiny because train_l_sum accumulates per-batch mean losses and is then divided by the sample count n.)
Prediction module
X, y = next(iter(test_iter))
true_labels = d2l.get_fashion_mnist_labels(y.numpy())
pred_labels = d2l.get_fashion_mnist_labels(net(X).argmax(dim=1).numpy())
titles = [true + '\n' + pred for true, pred in zip(true_labels, pred_labels)]
d2l.show_fashion_mnist(X[0:9], titles[0:9])  # show_fashion_mnist was defined earlier
Switching models
Using two linear layers with a ReLU in between boosts accuracy by about 6 points:
class MYNet(nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super(MYNet, self).__init__()
        self.linear1 = nn.Linear(num_inputs, hidden_layer)
        self.relu1 = torch.nn.ReLU()
        self.linear2 = nn.Linear(hidden_layer, num_outputs)
    def forward(self, x):  # x shape: (batch, 1, 28, 28)
        y = self.linear1(x.view(x.shape[0], -1))
        y = self.relu1(y)
        y = self.linear2(y)
        return y
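The notes do not define hidden_layer or show the training call for this model; a plausible completion (the width 256 and the re-initialization below are assumptions, mirroring the earlier setup):
hidden_layer = 256  # hidden width; not stated in the notes, 256 is a common choice
net = MYNet(num_inputs, num_outputs)
for param in net.parameters():
    init.normal_(param, mean=0, std=0.01)  # same init scheme as before
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, optimizer)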
epoch 1, loss 0.0036, train acc 0.689, test acc 0.751
epoch 2, loss 0.0022, train acc 0.799, test acc 0.807
epoch 3, loss 0.0020, train acc 0.824, test acc 0.779
epoch 4, loss 0.0018, train acc 0.838, test acc 0.823
epoch 5, loss 0.0017, train acc 0.845, test acc 0.835
epoch 6, loss 0.0016, train acc 0.852, test acc 0.838
epoch 7, loss 0.0016, train acc 0.856, test acc 0.846
epoch 8, loss 0.0015, train acc 0.860, test acc 0.809
epoch 9, loss 0.0015, train acc 0.865, test acc 0.851
epoch 10, loss 0.0015, train acc 0.868, test acc 0.857
epoch 11, loss 0.0014, train acc 0.871, test acc 0.859
epoch 12, loss 0.0014, train acc 0.873, test acc 0.840
epoch 13, loss 0.0014, train acc 0.876, test acc 0.857
epoch 14, loss 0.0013, train acc 0.878, test acc 0.866
epoch 15, loss 0.0013, train acc 0.880, test acc 0.865
epoch 16, loss 0.0013, train acc 0.880, test acc 0.849
epoch 17, loss 0.0013, train acc 0.884, test acc 0.861
epoch 18, loss 0.0013, train acc 0.885, test acc 0.869
epoch 19, loss 0.0012, train acc 0.887, test acc 0.866
epoch 20, loss 0.0012, train acc 0.889, test acc 0.859
epoch 21, loss 0.0012, train acc 0.891, test acc 0.866
epoch 22, loss 0.0012, train acc 0.891, test acc 0.864
epoch 23, loss 0.0012, train acc 0.893, test acc 0.868
epoch 24, loss 0.0011, train acc 0.894, test acc 0.874
epoch 25, loss 0.0011, train acc 0.895, test acc 0.872
epoch 26, loss 0.0011, train acc 0.897, test acc 0.872
epoch 27, loss 0.0011, train acc 0.898, test acc 0.866
epoch 28, loss 0.0011, train acc 0.899, test acc 0.880
epoch 29, loss 0.0011, train acc 0.900, test acc 0.861
epoch 30, loss 0.0011, train acc 0.902, test acc 0.880
epoch 31, loss 0.0011, train acc 0.903, test acc 0.813
epoch 32, loss 0.0010, train acc 0.905, test acc 0.878
epoch 33, loss 0.0010, train acc 0.904, test acc 0.875
epoch 34, loss 0.0010, train acc 0.907, test acc 0.878
epoch 35, loss 0.0010, train acc 0.906, test acc 0.881
epoch 36, loss 0.0010, train acc 0.908, test acc 0.878
epoch 37, loss 0.0010, train acc 0.910, test acc 0.867
epoch 38, loss 0.0010, train acc 0.910, test acc 0.884
epoch 39, loss 0.0010, train acc 0.912, test acc 0.877
epoch 40, loss 0.0010, train acc 0.913, test acc 0.881
epoch 41, loss 0.0009, train acc 0.914, test acc 0.874
epoch 42, loss 0.0009, train acc 0.915, test acc 0.880
epoch 43, loss 0.0009, train acc 0.914, test acc 0.857
epoch 44, loss 0.0009, train acc 0.916, test acc 0.876
epoch 45, loss 0.0009, train acc 0.916, test acc 0.884
epoch 46, loss 0.0009, train acc 0.918, test acc 0.874
epoch 47, loss 0.0009, train acc 0.918, test acc 0.885
epoch 48, loss 0.0009, train acc 0.919, test acc 0.888
epoch 49, loss 0.0009, train acc 0.920, test acc 0.876
epoch 50, loss 0.0009, train acc 0.920, test acc 0.882
Chapter 2: Experiencing overfitting and underfitting
import torch
import numpy as np
import sys
import matplotlib as mth
import matplotlib.pyplot as plt
import pylab
sys.path.append(r"D:\anaconda\Lib")
import d2lzh_pytorch as d2l
n_train, n_test, true_w, true_b = 100, 100, [1.2, -3.4, 5.6], 5
features = torch.randn((n_train + n_test, 1))  # 200 random inputs shared by the training and test sets
poly_features = torch.cat((features, torch.pow(features, 2), torch.pow(features, 3)), 1)  # torch.pow raises element-wise to a power
labels = (true_w[0] * poly_features[:, 0] + true_w[1] * poly_features[:, 1]
          + true_w[2] * poly_features[:, 2] + true_b)  # evaluate the polynomial at the 200 inputs; these are the labels
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float)  # add Gaussian noise with mean 0 and std 0.01 to each label
# plotting helper
def semilogy(x_vals, y_vals, x_label, y_label, x2_vals=None, y2_vals=None,
             legend=None, figsize=(3.5, 2.5)):
    d2l.set_figsize(figsize)
    d2l.plt.xlabel(x_label)
    d2l.plt.ylabel(y_label)
    d2l.plt.semilogy(x_vals, y_vals)
    if x2_vals and y2_vals:
        d2l.plt.semilogy(x2_vals, y2_vals, linestyle=':')
        d2l.plt.legend(legend)
num_epochs, loss = 100, torch.nn.MSELoss()
def fit_and_plot(train_features, test_features, train_labels, test_labels):
    net = torch.nn.Linear(train_features.shape[-1], 1)
    # a single linear layer suffices; the output is one scalar y per sample
    # per the Linear docs, PyTorch initializes the parameters itself, so no manual initialization here
    batch_size = min(10, train_labels.shape[0])
    dataset = torch.utils.data.TensorDataset(train_features, train_labels)
    # TensorDataset wraps tensors (samples and labels) into a dataset; inputs must be Tensors
    train_iter = torch.utils.data.DataLoader(dataset, batch_size, shuffle=True)  # load the dataset just defined
    optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
    train_ls, test_ls = [], []  # per-epoch losses
    for _ in range(num_epochs):
        for X, y in train_iter:
            l = loss(net(X), y.view(-1, 1))  # reshape the labels into a column vector
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
        train_labels = train_labels.view(-1, 1)
        test_labels = test_labels.view(-1, 1)
        train_ls.append(loss(net(train_features), train_labels).item())
        test_ls.append(loss(net(test_features), test_labels).item())
    print('final epoch: train loss', train_ls[-1], 'test loss', test_ls[-1])
    semilogy(range(1, num_epochs + 1), train_ls, 'epochs', 'loss',
             range(1, num_epochs + 1), test_ls, ['train', 'test'])
    print('weight:', net.weight.data,
          '\nbias:', net.bias.data)
fig = plt.figure(figsize=(16, 4))
ax1 = plt.subplot(1, 3, 1)
plt.sca(ax1)
# normal fit: third-order polynomial features, 100 training samples
fit_and_plot(poly_features[:n_train, :], poly_features[n_train:, :],
             labels[:n_train], labels[n_train:])
ax2 = plt.subplot(1, 3, 2)
plt.sca(ax2)
# underfitting: only the linear feature x
fit_and_plot(features[:n_train, :], features[n_train:, :], labels[:n_train],
             labels[n_train:])
ax3 = plt.subplot(1, 3, 3)
plt.sca(ax3)
# overfitting: just two training samples
fit_and_plot(poly_features[0:2, :], poly_features[n_train:, :], labels[0:2],
             labels[n_train:])
pylab.show()
final epoch: train loss 8.301918569486588e-05 test loss 0.00011247050133533776
weight: tensor([[ 1.1989, -3.3984, 5.5999]])
bias: tensor([4.9993])
final epoch: train loss 198.7726593017578 test loss 295.92022705078125
weight: tensor([[18.8718]])
bias: tensor([1.1912])
final epoch: train loss 0.8291055560112 test loss 380.7915344238281
weight: tensor([[2.0050, 1.7987, 1.8384]])
bias: tensor([3.1670])
The target function is y = 1.2x - 3.4x^2 + 5.6x^3 + 5 + ε, where ε is Gaussian noise with mean 0 and standard deviation 0.01; the first fit recovers its coefficients almost exactly.
Underfitting is induced by training a model that only sees the linear feature x on this third-order data.
Overfitting is induced by making training data scarce: only two training samples are used.
Chapter 3: Model construction
1. By subclassing Module
This is the most verbose but most flexible way:
class MLP(nn.Module):
    def __init__(self, num_inputs, num_hiddens, num_outputs):
        super(MLP, self).__init__()
        self.hidden = nn.Linear(num_inputs, num_hiddens)
        self.output = nn.Linear(num_hiddens, num_outputs)
    def forward(self, x):
        return self.output(torch.relu(self.hidden(x)))
2. Using the Module subclass Sequential:
net = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
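A quick shape check (hypothetical batch) confirms the stacked layers compose as expected:
X = torch.rand(2, 784)  # a batch of two flattened 28x28 images
print(net(X).shape)     # torch.Size([2, 10])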