Accuracy, Precision, Recall, F1-Score, Sensitivity, and Specificity

https://blog.csdn.net/hfutdog/article/details/88085878

Accuracy

Accuracy = number of correct predictions / total number of samples

from sklearn.metrics import accuracy_score
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
print(accuracy_score(y_true, y_pred))  # 0.5
print(accuracy_score(y_true, y_pred, normalize=False))  # 2
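# normalize: bool, optional (default True).
# If False, return the number of correctly classified samples; otherwise, return the fraction of correctly classified samples.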

def accuracy(y_true, y_pred):
    from sklearn.metrics import accuracy_score
    # accuracy_score expects the ground truth first: (y_true, y_pred)
    return accuracy_score(y_true, y_pred)

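Precision

Precision = $\frac{\text{samples predicted positive that are actually positive}}{\text{samples predicted positive}}$
It measures how many of the samples predicted as positive are truly positive.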

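Macro Average
The macro average gives every class the same weight: the result is the arithmetic mean of the per-class metric.

Micro Average
The micro average gives every sample the same weight, regardless of class: all samples are pooled together and the metric is computed once over the pooled counts.

Weighted
Computes the metric for each label and averages the results weighted by support (the number of true instances of each label). This adjusts 'macro' averaging to account for label imbalance.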
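
To make the three averaging modes concrete, here is a minimal sketch that computes macro, micro, and weighted precision by hand in plain Python. The helper name `per_class_counts` is illustrative, not part of any library, and the labels are a small made-up three-class example.

```python
from collections import Counter

def per_class_counts(y_true, y_pred):
    """Return {label: (tp, n_predicted, support)} for every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)   # how often each label was predicted
    support = Counter(y_true)     # how often each label truly occurs
    return {c: (tp[c], predicted[c], support[c]) for c in labels}

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
counts = per_class_counts(y_true, y_pred)

# Per-class precision: TP / number of predictions of that class
# (defined as 0.0 here when a class is never predicted).
per_class = {c: (tp / n_pred if n_pred else 0.0)
             for c, (tp, n_pred, _) in counts.items()}

# Macro: plain mean over classes.
macro = sum(per_class.values()) / len(per_class)

# Micro: pool the counts over all classes, then divide once.
micro = (sum(tp for tp, _, _ in counts.values())
         / sum(n_pred for _, n_pred, _ in counts.values()))

# Weighted: mean over classes, weighted by support.
total_support = sum(s for _, _, s in counts.values())
weighted = sum(per_class[c] * s for c, (_, _, s) in counts.items()) / total_support

print(per_class)
print(macro, micro, weighted)
```

Because every class has the same support in this example (two samples each), the weighted average coincides with the macro average; the two differ only when class sizes differ.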

from sklearn.metrics import precision_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
print(precision_score(y_true, y_pred, average='macro'))  # 0.2222222222222222
print(precision_score(y_true, y_pred, average='micro'))  # 0.3333333333333333
print(precision_score(y_true, y_pred, average='weighted'))  # 0.2222222222222222
print(precision_score(y_true, y_pred, average=None))  # [0.66666667       0.         0.]

def precision(y_true, y_pred):
    from sklearn.metrics import precision_score
    return precision_score(y_true, y_pred, average='micro')

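Recall

Recall = $\frac{\text{positive samples that are predicted positive}}{\text{positive samples}}$
It measures how many of the actual positive samples are predicted correctly.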


from sklearn.metrics import recall_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
print(recall_score(y_true, y_pred, average='macro'))  # 0.3333333333333333
print(recall_score(y_true, y_pred, average='micro'))  # 0.3333333333333333
print(recall_score(y_true, y_pred, average='weighted'))  # 0.3333333333333333
print(recall_score(y_true, y_pred, average=None))  # [1. 0. 0.]

def recall(y_true, y_pred):
    from sklearn.metrics import recall_score
    return recall_score(y_true, y_pred, average='macro')

F1-Score

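F1 combines Precision (true positives / predicted positives: how accurate the positive predictions are) and Recall (true positives / actual positives: how complete they are) into a single score, the harmonic mean of the two.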

$F1=\frac{2*Precision*Recall}{Precision+Recall}$
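
As a quick check of the formula, the sketch below implements it as a small helper (the name `f1` is illustrative) and evaluates it on two cases. It assumes the common convention of returning 0 when both precision and recall are 0.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A class with precision 2/3 and recall 1 (two true positives, one false
# positive, no false negatives).
f1_balanced = f1(2 / 3, 1.0)

# The harmonic mean punishes a large gap between the two inputs:
# the arithmetic mean of 0.9 and 0.1 is 0.5, but F1 is much lower.
f1_lopsided = f1(0.9, 0.1)

print(f1_balanced, f1_lopsided)
```

The first case works out to 0.8 and the second to 0.18, which shows why F1 is preferred over a plain average when one of the two metrics is very low.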

from sklearn.metrics import f1_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
print(f1_score(y_true, y_pred, average='macro'))  # 0.26666666666666666
print(f1_score(y_true, y_pred, average='micro'))  # 0.3333333333333333
print(f1_score(y_true, y_pred, average='weighted'))  # 0.26666666666666666
print(f1_score(y_true, y_pred, average=None))  # [0.8 0.  0. ]

# Named f1 so the wrapper does not shadow sklearn's f1_score imported above.
def f1(y_true, y_pred):
    from sklearn.metrics import f1_score
    return f1_score(y_true, y_pred, average='micro')

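Weighted

Computes the metric for each label and averages the results weighted by support (the number of true instances of each label). This adjusts 'macro' averaging to account for label imbalance; it can produce an F-score that does not lie between precision and recall.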

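[Figure: confusion matrix. In each abbreviation the first letter (T/F) says whether the prediction matches the actual label, and the second letter (P/N) is the predicted class, positive or negative.]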

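TP (true positive): predicted positive (P) and the prediction is correct (T), i.e. a positive sample correctly identified as positive.

TN (true negative): predicted negative (N) and the prediction is correct (T), i.e. a negative sample correctly identified as negative.

FP (false positive): predicted positive (P) but the prediction is wrong (F), i.e. a false alarm: a negative sample judged as positive.

FN (false negative): predicted negative (N) but the prediction is wrong (F), i.e. a miss: a positive sample judged as negative.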
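
The four definitions above can be sketched as a simple counting loop for binary labels. The function name `confusion_counts` and the sample labels are made up for illustration; label 1 is assumed to be the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for binary labels, treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1   # predicted positive, correct
            else:
                fp += 1   # predicted positive, wrong (false alarm)
        else:
            if actual == positive:
                fn += 1   # predicted negative, wrong (miss)
            else:
                tn += 1   # predicted negative, correct
    return tp, tn, fp, fn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)
```

Note that all four values are counts, not rates; the rates (precision, recall, and so on) are ratios built from them.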


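If every class has roughly the same number of samples, the macro and micro averages differ little.

If class sizes differ greatly, use the micro average when the large classes matter more and the macro average when the small classes matter more.

If the micro average is much lower than the macro average, inspect the large classes to find out why the metric is poor.

If the macro average is much lower than the micro average, inspect the small classes to find out why the metric is poor.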
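
The last rule of thumb can be illustrated with a deliberately imbalanced, made-up dataset: 90 samples of class 0 that are all classified correctly, and 10 samples of class 1 of which only 2 are. The sketch computes macro and micro recall in plain Python; `recall_per_class` is an illustrative helper name.

```python
# 90 majority-class samples (all correct) and 10 minority-class samples (2 correct).
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 90 + [1] * 2 + [0] * 8

def recall_per_class(y_true, y_pred):
    """Recall for each label in y_true: correctly predicted / actual count."""
    result = {}
    for label in sorted(set(y_true)):
        support = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        result[label] = hits / support
    return result

per_class = recall_per_class(y_true, y_pred)

# Macro: every class counts equally, so the weak minority class drags it down.
macro_recall = sum(per_class.values()) / len(per_class)

# Micro: every sample counts equally, so the majority class dominates.
micro_recall = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

print(per_class, macro_recall, micro_recall)
```

Here the macro average (0.6) is far below the micro average (0.92), and the per-class values show that the small class, with a recall of 0.2, is the cause.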

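Formulas in terms of TP, TN, FN, and FP

Accuracy = (TP + TN) / (TP + FP + TN + FN), i.e. correct predictions over all samples.

Precision = TP / (TP + FP), i.e. of the samples predicted positive, the fraction that are actually positive.

Recall = TP / (TP + FN), i.e. of the samples that are actually positive, the fraction that are predicted positive.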
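
These three formulas translate directly into code. The sketch below is a minimal, hypothetical helper (`metrics_from_counts` is not a library function) that returns 0.0 whenever a denominator is zero, which is one possible convention; the counts are invented for the example.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # no positive predictions
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # no actual positives
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# Made-up counts: 100 samples, 50 predicted positive, 60 actually positive.
scores = metrics_from_counts(tp=40, tn=30, fp=10, fn=20)
print(scores)
```

With these counts, accuracy is 0.7, precision is 0.8, and recall is about 0.667, so the same classifier looks quite different depending on which metric is reported.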

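Sensitivity and Specificity

Sensitivity (no missed diagnoses) = true positives / (true positives + false negatives) * 100%

Specificity (no misdiagnoses) = true negatives / (true negatives + false positives) * 100%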
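
A minimal sketch of both formulas, using invented counts for a diagnostic test (50 people with the condition, 100 without). The function name `sensitivity_specificity` is illustrative. Sensitivity is the same quantity as recall for the positive class; specificity is the analogous ratio for the negative class.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) as fractions between 0 and 1."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # share of sick people detected
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # share of healthy people cleared
    return sensitivity, specificity

# Made-up screening result: 45 of 50 patients detected, 80 of 100 healthy people cleared.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=80, fp=20)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

A test can score high on one and low on the other, which is why the two are usually reported together.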