Computing per-class F1 for multi-class classification
- API
sklearn.metrics.classification_report(y_true, y_pred, labels=None, target_names=None, sample_weight=None, digits=2, output_dict=False)
Example:
from sklearn.metrics import classification_report
y_true = [0, 0, 1, 2, 2, 2, 0]
y_pred = [0, 1, 0, 2, 2, 1, 0]
target_names = ['dog', 'pig', 'cat']
result = classification_report(y_true, y_pred, target_names=target_names, output_dict=True)
print(result)
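With output_dict=True the report is a nested dict keyed by class name, so the per-class F1 can be read out of it directly. The sketch below reuses the data from the example; the cross-check against f1_score(average=None) is added here for illustration and is not part of the original notes.

```python
from sklearn.metrics import classification_report, f1_score

y_true = [0, 0, 1, 2, 2, 2, 0]
y_pred = [0, 1, 0, 2, 2, 1, 0]
target_names = ['dog', 'pig', 'cat']

report = classification_report(
    y_true, y_pred, target_names=target_names, output_dict=True)

# Each class entry holds 'precision', 'recall', 'f1-score' and 'support'
per_class_f1 = {name: report[name]['f1-score'] for name in target_names}
print(per_class_f1)

# Same numbers without building the full report (ordered by label 0, 1, 2)
direct_f1 = f1_score(y_true, y_pred, average=None)
print(direct_f1)
```

Here 'pig' scores 0 because its only true sample was predicted as 'dog'.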
K-fold cross-validation in PyTorch
Core code
import torch
from sklearn.model_selection import KFold

# `dataset` and `k_folds` are assumed to be defined elsewhere
# Define the K-fold Cross Validator
kfold = KFold(n_splits=k_folds, shuffle=True)
# K-fold Cross Validation model evaluation
for fold, (train_ids, test_ids) in enumerate(kfold.split(dataset)):
    # Sample elements randomly from a given list of ids, no replacement.
    train_subsampler = torch.utils.data.SubsetRandomSampler(train_ids)
    test_subsampler = torch.utils.data.SubsetRandomSampler(test_ids)
    # Define data loaders for training and testing data in this fold
    trainloader = torch.utils.data.DataLoader(
        dataset,
        batch_size=10, sampler=train_subsampler)
    testloader = torch.utils.data.DataLoader(
        dataset,
        batch_size=10, sampler=test_subsampler)
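To show where training and evaluation fit inside the fold loop, here is a minimal end-to-end sketch. The toy TensorDataset, the nn.Linear model, the single training epoch and the learning rate are all assumptions made for illustration; swap in the real dataset and model.

```python
import numpy as np
import torch
from torch import nn
from sklearn.model_selection import KFold
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

torch.manual_seed(0)

# Toy data (assumption): 50 samples, 4 features, 3 classes
dataset = TensorDataset(torch.randn(50, 4), torch.randint(0, 3, (50,)))
k_folds = 5
kfold = KFold(n_splits=k_folds, shuffle=True, random_state=0)

fold_accuracy = []
seen_test_ids = []
# split() only needs the number of samples, so an index array is enough
for fold, (train_ids, test_ids) in enumerate(kfold.split(np.arange(len(dataset)))):
    train_loader = DataLoader(dataset, batch_size=10,
                              sampler=SubsetRandomSampler(train_ids.tolist()))
    test_loader = DataLoader(dataset, batch_size=10,
                             sampler=SubsetRandomSampler(test_ids.tolist()))

    # Fresh model per fold so no weights leak between folds
    model = nn.Linear(4, 3)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.CrossEntropyLoss()

    model.train()
    for inputs, targets in train_loader:  # one epoch, for brevity
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()

    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, targets in test_loader:
            predictions = model(inputs).argmax(dim=1)
            correct += (predictions == targets).sum().item()
            total += targets.size(0)

    fold_accuracy.append(correct / total)
    seen_test_ids.extend(test_ids.tolist())
    print(f"fold {fold}: accuracy {fold_accuracy[-1]:.2f}")

print(f"mean accuracy: {sum(fold_accuracy) / k_folds:.2f}")
```

Re-creating the model inside the loop matters: reusing one model across folds would let it see every sample during training, which defeats the purpose of holding a fold out.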
Using the weight argument of PyTorch's nn.CrossEntropyLoss()
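- The most common choice is 1 / (number of samples in the class); some people suggest (number of samples in the largest class) / (number of samples in this class) instead
Core code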
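import torch
weights = [1/1016, 1/12852, 1/12888, 1/3380, 1/296]  # [1 / number of instances for each class]
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is present
class_weights = torch.tensor(weights, dtype=torch.float32, device=device)
criterion = torch.nn.CrossEntropyLoss(weight=class_weights)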
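The two weighting schemes mentioned above differ only by a constant factor (the largest class count). A small sketch comparing them, using the class counts from the snippet; the random logits and targets are dummy data invented for the comparison.

```python
import torch

# Per-class sample counts taken from the snippet above
counts = torch.tensor([1016, 12852, 12888, 3380, 296], dtype=torch.float32)

inverse_weights = 1.0 / counts           # 1 / class count
ratio_weights = counts.max() / counts    # largest class count / class count

torch.manual_seed(0)
logits = torch.randn(8, 5)               # dummy batch: 8 samples, 5 classes
targets = torch.randint(0, 5, (8,))

loss_inverse = torch.nn.CrossEntropyLoss(weight=inverse_weights)(logits, targets)
loss_ratio = torch.nn.CrossEntropyLoss(weight=ratio_weights)(logits, targets)

print(ratio_weights)
print(loss_inverse.item(), loss_ratio.item())
```

With the default reduction='mean', the loss is divided by the sum of the weights of the targets in the batch, so a constant rescaling of the weights cancels out and both schemes should give the same value. The choice starts to matter with reduction='sum' or when this loss is added to other loss terms.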
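BERT cased vs. uncased models: a cased model is case-sensitive, so do not call lower() on the input text. An uncased model is case-insensitive: its vocabulary contains only lowercase tokens, so the input text needs lower().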
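The rule can be sketched as a small preprocessing step. prepare_text is a hypothetical helper written for these notes, not a library function, and it assumes the checkpoint name contains "cased" or "uncased" (as in bert-base-cased and bert-base-uncased). Some tokenizer implementations apply the lowercasing themselves, in which case this step is redundant.

```python
def prepare_text(text: str, model_name: str) -> str:
    """Lowercase only when the checkpoint name marks the model as uncased."""
    # "uncased" contains "cased", so test for the longer substring
    if "uncased" in model_name:
        return text.lower()
    return text


print(prepare_text("Hello World", "bert-base-uncased"))  # hello world
print(prepare_text("Hello World", "bert-base-cased"))    # Hello World
```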
Computing the Mahalanobis distance
import numpy as np
from scipy.spatial.distance import mahalanobis

def mahalanobis_distance(p, distr):
    # p: a point
    # distr: a distribution (one sample per row)
    # covariance matrix
    cov = np.cov(distr, rowvar=False)
    # scipy's mahalanobis expects the inverse of the covariance matrix
    inv_cov = np.linalg.inv(cov)
    # average of the points in distr
    avg_distri = np.average(distr, axis=0)
    dis = mahalanobis(p, avg_distri, inv_cov)
    return dis
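The distance being computed is d(p) = sqrt((p - mu)^T S^-1 (p - mu)), where mu is the sample mean and S the sample covariance. Note that scipy's mahalanobis takes the inverse covariance matrix as its third argument. A quick sanity check on random data (the sample size, dimension and test point are arbitrary choices for illustration) compares scipy's result with the formula written out by hand:

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(0)
distr = rng.normal(size=(200, 3))   # 200 sample points in 3 dimensions
p = np.array([1.0, -0.5, 2.0])

mean = distr.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(distr, rowvar=False))

# Library result: third argument is the INVERSE covariance matrix
d_scipy = mahalanobis(p, mean, inv_cov)

# Same quantity from the definition
diff = p - mean
d_manual = float(np.sqrt(diff @ inv_cov @ diff))

print(d_scipy, d_manual)
```

If the covariance matrix is singular (for example, fewer samples than dimensions), np.linalg.inv fails; np.linalg.pinv is a common fallback in that case.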