Facebook: BigGraph Chinese Documentation - Evaluation (PyTorch)


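Graph embedding is a method for producing unsupervised node features from a graph, and the resulting features can be used in many kinds of machine learning tasks. Modern graphs, especially in industrial applications, often contain billions of nodes and trillions of edges, which exceeds the capacity of known embedding systems. Facebook open-sourced an embedding system, PyTorch-BigGraph (PBG), which makes several modifications to traditional multi-relation embedding systems so that it can scale to graphs with billions of nodes and trillions of edges.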

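This series is a translation of the official PyTorch-BigGraph manual, intended to help readers get started quickly with GNNs and their use. It consists of fifteen parts; please get in touch if you spot any errors.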

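(1) Facebook's open-source graph neural network: Pytorch Biggraph

(2) Facebook: BigGraph Chinese Documentation - Data Model (PyTorch)

(3) Facebook: BigGraph Chinese Documentation - From Entity Embeddings to Edge Scores (PyTorch)

(4) Facebook: BigGraph Chinese Documentation - I/O Format (PyTorch)

(5) Facebook: BigGraph Chinese Documentation - Batch Preparation

(6) Facebook: BigGraph Chinese Documentation - Distributed Mode (PyTorch)

(7) Facebook: BigGraph Chinese Documentation - Loss Calculation (PyTorch)

(8) Facebook: BigGraph Chinese Documentation - Evaluation (PyTorch)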


Evaluation

During training, the average loss is reported for each edge bucket at each pass. Evaluation metrics can be computed on held-out data during or after training to measure the quality of the trained embeddings.

Offline evaluation

The torchbiggraph_eval command performs an offline evaluation of trained PBG embeddings on a validation dataset. This dataset should contain held-out edges that were not included in the training dataset. The command is invoked in the same way as the training command and takes the same arguments.

It is generally advisable to keep two versions of the config file, one for training and one for evaluation, with the same parameters except for the edge paths, so that evaluation runs on a separate (and often smaller) set of edges. (It's also possible to use a single config file and have it produce different output based on environment variables or other context.) Training-specific config parameters (e.g., the learning rate, loss function, …) are ignored during evaluation.
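The single-config-file option mentioned above can be sketched as a PBG-style Python config whose edge paths depend on an environment variable. This is a minimal sketch: the PBG_MODE variable, all paths, the entity/relation names and the numeric values are hypothetical, chosen only to show that the training and evaluation variants differ solely in edge_paths.

```python
import os


def get_torchbiggraph_config():
    # Hypothetical switch: PBG_MODE is our own convention for this sketch,
    # not an environment variable that PBG reads.
    mode = os.environ.get("PBG_MODE", "train")

    # Hypothetical paths; only the edge paths differ between the two modes.
    if mode == "eval":
        edge_paths = ["data/example/edges_eval_partitioned"]
    else:
        edge_paths = ["data/example/edges_train_partitioned"]

    return dict(
        entity_path="data/example",
        edge_paths=edge_paths,
        checkpoint_path="model/example",
        entities={"all": {"num_partitions": 1}},
        relations=[
            {"name": "all_edges", "lhs": "all", "rhs": "all",
             "operator": "complex_diagonal"},
        ],
        dimension=200,
        # Training-specific settings; evaluation ignores them.
        num_epochs=10,
        lr=0.1,
    )
```

With this layout, the same file is passed to both torchbiggraph_train and torchbiggraph_eval, and PBG_MODE=eval is set only for the evaluation run.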

Metrics are first reported for each bucket, and a global average is computed at the end. (If multiple edge paths are in use, metrics are computed separately for each of them but are still averaged in the end.)
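The per-bucket-then-global averaging can be illustrated with a small helper. This is a sketch under an assumption: each bucket's metrics are weighted by the number of edges it evaluated, which may not match exactly how PBG aggregates its stats. The function name aggregate and the input layout are hypothetical; buckets from several edge paths can simply be appended to the same list.

```python
def aggregate(bucket_stats):
    """Combine per-bucket metrics into one global average.

    bucket_stats: list of (num_edges, {metric_name: value}) pairs, one per
    bucket (buckets from every edge path can go into the same list).
    Each bucket is weighted by its edge count (an assumption of this sketch).
    """
    total = sum(n for n, _ in bucket_stats)
    if total == 0:
        return {}
    names = bucket_stats[0][1].keys()
    return {
        name: sum(n * metrics[name] for n, metrics in bucket_stats) / total
        for name in names
    }
```

Weighting by edge count keeps a tiny bucket from counting as much as a large one; an unweighted mean of bucket values would be the alternative design.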

Many metrics are statistics based on the “ranks” of the edges in the validation set. The rank of a positive edge is determined by comparing its score against the scores of a certain number of negative edges. A rank of 1 is the “best” outcome, as it means that the positive edge scored higher than all the negatives. Higher values are “worse”, as they indicate that the positive did not stand out.
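A minimal sketch of the rank definition above, for a single positive edge and its negatives. How ties are broken is an assumption here: a negative that scores exactly as high as the positive is counted against it, which is the pessimistic convention.

```python
def rank_of_positive(pos_score, neg_scores):
    # Rank 1 means the positive scored higher than every negative.
    # Ties count against the positive (pessimistic; an assumption).
    return 1 + sum(1 for s in neg_scores if s >= pos_score)
```

For example, a positive scoring 0.9 against negatives [0.1, 0.5, 0.95] gets rank 2, because one negative outscored it.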

Some of the negative samples used in the rank computation may in fact be other positive samples, which are expected to score highly and may therefore worsen the rank. This effect is especially visible on smaller graphs, in particular when all other entities are used to construct the negatives. To fix it, and to match what is typically done in the literature, a so-called “filtered” rank is used in the FB15k demo script (and only there): positive samples are filtered out when computing the rank of an edge. This technique is hard to scale to large graphs, so it is not enabled globally. However, filtering is less important on large graphs, since a training edge is less likely to appear among the sampled negatives.
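The difference between a raw and a filtered rank can be shown with a standalone sketch. It is not the FB15k demo script's code: the function name, the dict of candidate scores and the known_positives set are all hypothetical illustrations of the idea that candidates forming known true edges are skipped instead of being counted as negatives.

```python
def filtered_rank(true_entity, scores, known_positives):
    """Rank the true entity among all candidates, skipping candidates that
    form other known positive edges.

    scores: dict mapping candidate entity -> score of the edge (lhs, rel, candidate)
    known_positives: set of candidates known to form true edges with (lhs, rel)
    """
    pos_score = scores[true_entity]
    better = 0
    for entity, score in scores.items():
        if entity == true_entity or entity in known_positives:
            continue  # filtered out: not a genuine negative
        if score >= pos_score:
            better += 1
    return 1 + better
```

Passing only the true entity as known_positives gives the raw rank; adding the other true edges gives the filtered one. Building that set requires every known edge for each (lhs, rel) pair, which is why the technique scales poorly to large graphs.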

The metrics are:

Mean Rank: the average of the ranks of all positives (lower is better, best is 1).

Mean Reciprocal Rank (MRR): the average of the reciprocals of the ranks of all positives (higher is better, best is 1).

Hits@1: the fraction of positives that rank better than all their negatives, i.e., have a rank of 1 (higher is better, best is 1).

Hits@10: the fraction of positives that rank in the top 10 among their negatives (higher is better, best is 1).

Hits@50: the fraction of positives that rank in the top 50 among their negatives (higher is better, best is 1).

Area Under the Curve (AUC): an estimate of the probability that a randomly chosen positive scores higher than a randomly chosen negative (any negative, not only the negatives constructed by corrupting that positive).
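Given a list of ranks, the metrics above follow directly from their definitions. The sketch below computes them in plain Python. For AUC it compares every positive with every negative and counts ties as half a win. That exhaustive pairing is an illustrative choice: PBG reports an estimate, and its exact sampling scheme is not reproduced here.

```python
def ranking_metrics(ranks):
    n = len(ranks)
    return {
        "mean_rank": sum(ranks) / n,              # lower is better
        "mrr": sum(1.0 / r for r in ranks) / n,   # higher is better
        "hits@1": sum(r <= 1 for r in ranks) / n,
        "hits@10": sum(r <= 10 for r in ranks) / n,
        "hits@50": sum(r <= 50 for r in ranks) / n,
    }


def auc(pos_scores, neg_scores):
    # Fraction of (positive, negative) pairs in which the positive scores
    # higher, with ties counted as half a win.
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For ranks [1, 2, 20], Hits@1 is 1/3, Hits@10 is 2/3 and Hits@50 is 1, while MRR is (1 + 0.5 + 0.05) / 3.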


Evaluation during training

Offline evaluation is a slow process that is intended to run after training is complete, to evaluate the final model on a held-out set of edges constructed by the user. However, it's useful to be able to monitor overfitting as training progresses. PBG offers this functionality by calculating the same metrics as the offline evaluation, before and after each pass, on a small set of training edges. These stats are printed to the logs.

The metrics are computed on a set of edges that is automatically held out from the training set. To be more explicit: using this feature means that training happens on fewer edges, as some are excluded and reserved for this evaluation. The holdout fraction is controlled by the eval_fraction config parameter (setting it to zero disables this feature). The evaluations before and after each training iteration happen on the same set of edges, so they are comparable. Moreover, the evaluations for the same edge chunk, edge path and bucket at different epochs also use the same set of edges.
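One way to get a held-out set that is identical every time the same (edge path, chunk, bucket) is visited is to seed a random generator with that key. The sketch below is a hypothetical illustration of this property, not PBG's internal code. It assumes the bucket's edges are loaded in the same order on every visit, and eval_fraction plays the same role as the config parameter of that name.

```python
import random


def split_eval_edges(edges, eval_fraction, edge_path, chunk, bucket):
    """Hold out a fraction of one bucket's edges, deterministically.

    Seeding the generator with (edge_path, chunk, bucket) yields the same
    held-out subset on every visit, as long as the edges arrive in the
    same order.
    """
    if eval_fraction <= 0:
        return list(edges), []  # feature disabled: train on every edge
    rng = random.Random(f"{edge_path}|{chunk}|{bucket}")
    indices = list(range(len(edges)))
    rng.shuffle(indices)
    held_out_idx = set(indices[: int(len(edges) * eval_fraction)])
    train = [e for i, e in enumerate(edges) if i not in held_out_idx]
    held_out = [e for i, e in enumerate(edges) if i in held_out_idx]
    return train, held_out
```

Because the split depends only on the key and not on the epoch, "before" and "after" numbers for a bucket, and numbers for the same bucket in later epochs, are all measured on the same edges.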

Evaluation metrics are computed both before and after training each edge bucket because this provides insight into whether the partitioned training is working. If the partitioned training is converging, the gap between the “before” and “after” statistics should shrink toward zero over time. On the other hand, if the partitioned training is causing the model to overfit on each edge bucket (thus decreasing performance on other edge buckets), there will be a persistent gap between the “before” and “after” statistics.
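The reasoning about the “before”/“after” gap can be turned into a simple check over logged stats. In this sketch the history layout, the function names, and the window and tolerance values are hypothetical, and the stats are assumed to have already been parsed out of the training logs. The absolute value of the gap is used so that the same check also works for metrics where lower is better, such as mean rank.

```python
def before_after_gaps(history, metric="mrr"):
    """history: list of (before_stats, after_stats) dicts, one per bucket
    visit, in training order. Returns after-minus-before for the metric."""
    return [after[metric] - before[metric] for before, after in history]


def looks_converged(gaps, window=3, tol=0.01):
    # Hypothetical heuristic: the last `window` gaps are all close to zero.
    recent = gaps[-window:]
    return len(recent) == window and all(abs(g) <= tol for g in recent)
```

A run whose gaps shrink toward zero passes the check. A run where every bucket visit still improves its own bucket by the same margin keeps failing it, which is the persistent gap that signals per-bucket overfitting.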

It's possible to use different numbers of negatives for same-batch and uniform negative sampling during evaluation by tuning the eval_num_batch_negs and eval_num_uniform_negs config parameters.
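The evaluation-related parameters from this section sit in the config next to the training negatives. The fragment below is a sketch: the numeric values are illustrative and are not claimed to be PBG's defaults, and the other required keys (entity_path, edge_paths, entities, relations, and so on) are omitted, so it is not a complete config on its own.

```python
def get_torchbiggraph_config():
    return dict(
        # Required keys (entity_path, edge_paths, entities, relations, ...)
        # are omitted from this fragment.
        # Negatives used for the training loss.
        num_batch_negs=50,
        num_uniform_negs=50,
        # Fraction of edges held out for evaluation during training; 0 disables it.
        eval_fraction=0.05,
        # Negatives used only when computing evaluation metrics.
        eval_num_batch_negs=1000,
        eval_num_uniform_negs=1000,
    )
```

Using more negatives for evaluation than for training makes the ranks harder to achieve and less noisy, at the cost of slower evaluation.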
