Contents
- Should I manually set model mode to train() or eval()?
- Dropout behaves differently in train and test mode?
- BatchNorm behaves differently in train and test mode?
- Set accessible GPUs your code can run on
- Get one batch from DataLoader
- What does pytorch detach() do?
Should I manually set model mode to train() or eval()?
- By default, in PyTorch, all modules are initialized to train mode (`self.training = True`). You can set the model to train mode by manually calling `model.train()`, but it is an optional operation.
- Also be aware that some layers (like `BatchNorm` and `Dropout`) behave differently during training and evaluation, so setting the mode matters.
- As a rule of thumb for programming in general, try to explicitly state your intent and set `model.train()` and `model.eval()` when necessary, as in the sketch below.
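A toy example makes the intent explicit. This is only a minimal sketch: the tiny model, optimizer, and random data are placeholders to show where the mode switches go.

import torch
from torch import nn

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Dropout(0.5), nn.Linear(10, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

model.train()                              # enable Dropout; BatchNorm (if any) updates running stats
for _ in range(3):                         # dummy training steps on random data
    inputs = torch.randn(8, 10)
    targets = torch.randint(0, 2, (8,))
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()

model.eval()                               # switch Dropout/BatchNorm to inference behavior
with torch.no_grad():                      # also skip gradient tracking during evaluation
    outputs = model(torch.randn(8, 10))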
Dropout behaves differently in train and test mode?
The Dropout layer is defined in the `torch.nn` module and is used in the training phase to reduce the chance of overfitting. However, when we apply our trained model, we want to use its full power, i.e. use all neurons (no element is masked) to obtain a higher accuracy.
- During training, `Dropout` randomly zeroes some of the elements of the input tensor with a pre-defined probability `p`, using samples from a Bernoulli distribution. The elements to zero are re-sampled on every forward call.
- During training, the outputs are scaled by a factor of 1/(1-p), as the snippet after this list shows.
- During evaluation, the module simply computes an identity function.
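A quick way to see the difference is to feed the same tensor through a `Dropout` layer in both modes (a small sketch; the exact zero pattern is random on each call):

import torch

drop = torch.nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
print(drop(x))   # roughly half the elements are zeroed, the kept ones are scaled to 1/(1-p) = 2.0

drop.eval()
print(drop(x))   # identity: all ones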
BatchNorm behaves differently in train and test mode?
According to the `torch.nn.BatchNorm2d` description in the PyTorch docs:
- By default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default `momentum` of 0.1.
- If `track_running_stats` is set to `False`, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well.
Let's have a look at the `BatchNorm2d` module:
class torch.nn.BatchNorm2d(
num_features,
eps=1e-05,
momentum=0.1,
affine=True,
track_running_stats=True)
It is very clear that `track_running_stats` is set to `True` by default. So `BatchNorm2d` will keep a running estimate of its computed mean and variance and, moreover, the running mean/variance is used for normalization during evaluation.
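To see the two modes side by side, here is a small sketch (the exact numbers depend on the random input):

import torch

bn = torch.nn.BatchNorm2d(num_features=3)

bn.train()
x = torch.randn(4, 3, 8, 8)
y = bn(x)                 # normalized with the batch statistics of x
print(bn.running_mean)    # running estimates were updated with the default momentum=0.1

bn.eval()
y = bn(x)                 # normalized with the running estimates, not the batch statistics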
Set accessible GPUs your code can run on
It is common that one machine has 2 or more GPU cards installed and a group of people share the limited resource. For example, your machine has two 1080Ti cards and your colleague is running his code on the first GPU, indexed by `gpu:0`. He has almost used up its memory, so you cannot launch your code on the same device because it would throw an Out of memory error.
However, you will definitely come across the Out of memory error if you run your code without any specific setting, say with a plain `model.cuda()`. That is due to the default behavior: PyTorch always uses the first device (`index=0`).
So how can we get around this problem? Here is the solution:
Solution One: explicitly change the device.
import torch

x = torch.tensor([1, 2, 3]).cuda()                             # or
x = torch.tensor([1, 2, 3], device=torch.device("cuda"))       # or
x = torch.tensor([1, 2, 3]).cuda(torch.device("cuda"))         # or
x = torch.tensor([1, 2, 3]).to(device=torch.device("cuda"))
# x.device is device(type="cuda", index=0), the default one in the context

with torch.cuda.device(1):
    x = torch.tensor([1, 2, 3]).cuda()                         # or
    x = torch.tensor([1, 2, 3], device=torch.device("cuda"))   # or
    x = torch.tensor([1, 2, 3]).cuda(torch.device("cuda"))     # or
    x = torch.tensor([1, 2, 3]).to(device=torch.device("cuda"))
    # x.device is device(type="cuda", index=1), the default one in the context

    x = torch.tensor([1, 2, 3], device=torch.device("cuda:0")) # or
    x = torch.tensor([1, 2, 3]).cuda(torch.device("cuda:0"))   # or
    x = torch.tensor([1, 2, 3]).to(device=torch.device("cuda:0"))
    # x.device is device(type="cuda", index=0), regardless of the context
Note that the device context only indicates the default device to use; you can still go outside it by explicitly specifying another device, e.g. `cuda:0` in the snippet above.
Solution Two: use the CUDA_DEVICE_ORDER & CUDA_VISIBLE_DEVICES env variables.
See the CUDA_DEVICE_ORDER and CUDA_VISIBLE_DEVICES environment variable descriptions for more information.
import os
import torch

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

x = torch.tensor([1, 2, 3]).cuda()                           # or
x = torch.tensor([1, 2, 3], device=torch.device("cuda"))     # or
x = torch.tensor([1, 2, 3]).cuda(torch.device("cuda"))       # or
x = torch.tensor([1, 2, 3]).to(device=torch.device("cuda"))
# only the physical GPU 1 is visible to this process; PyTorch sees it as
# device(type="cuda", index=0), so x actually lives on the physical GPU 1
Why is CUDA_VISIBLE_DEVICES not working in PyTorch code?
Even when strictly following the instructions above, you might sometimes run into a situation in which the CUDA_VISIBLE_DEVICES env variable does not work as expected. Say we have 4 GPUs installed on a machine and we want to run our code on the 3rd GPU by setting CUDA_VISIBLE_DEVICES=2:
import os
...
...
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="2"
...
...
However, the code runs on the 1st GPU all the time. The strange thing is that everything works well when the CUDA_DEVICE_ORDER and CUDA_VISIBLE_DEVICES env variables are set ahead of time on the command line, e.g.
CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=2 python code.py
If this is your situation, check and make sure `os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"` and `os.environ["CUDA_VISIBLE_DEVICES"]="2"` are set before you call `torch.cuda.is_available()`, `torch.Tensor.cuda()`, or any other PyTorch built-in CUDA function.
Never call CUDA-related functions before CUDA_DEVICE_ORDER & CUDA_VISIBLE_DEVICES are set.
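In practice, put the two assignments at the very top of the entry script, before anything that might initialize CUDA. A sketch of the safe ordering (assuming a machine with at least 3 GPUs):

import os

# set these first, before any call that initializes CUDA
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

import torch

# safe: CUDA is initialized lazily, only after the env vars are already set
print(torch.cuda.device_count())    # prints 1: only the physical GPU 2 is visible
x = torch.tensor([1, 2, 3]).cuda()  # lands on physical GPU 2, seen as cuda:0 by this process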
Get one batch from DataLoader
We usually construct a data loader and then enumerate it to retrieve data one batch after another.
for step, item in enumerate(dataloader):
    ...  # consume the batch here
What if we want to get only one batch of data out of the data loader? `DataLoader` intrinsically does not support indexing, which means `dataloader[0]` fails to pull a batch of data. Instead, we can do it with the following code:
dataloaderI = iter(dataloader)   # wrap the DataLoader in an iterator
item = next(dataloaderI)         # pull exactly one batch
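For completeness, here is a self-contained sketch with dummy data built from `TensorDataset` (the shapes and batch size are arbitrary):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))
dataloader = DataLoader(dataset, batch_size=16, shuffle=True)

features, labels = next(iter(dataloader))   # a single batch: shapes (16, 3) and (16,)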
That's it.