Paper Summary
Objective
This paper proposes the dynamic chunk reader (DCR), an end-to-end neural reading comprehension (RC) model that extracts and ranks a set of answer candidates from a given document to answer questions.
Model Overview
Experiments on the Stanford Question Answering Dataset (SQuAD), which contains a variety of human-generated factoid and non-factoid questions, have shown the effectiveness of the paper's three contributions.
DCR encodes a document and an input question with recurrent neural networks, then applies a word-by-word attention mechanism to acquire question-aware representations for the document. It next generates chunk representations, and finally a ranking module proposes the top-ranked chunk as the answer.
Results
DCR achieves state-of-the-art exact match (EM) and F1 scores on the SQuAD dataset.
Background
**Reading comprehension-based question answering (RCQA)**
- The task of answering a question with a chunk of text taken from related document(s).
- In previous models, the answer boundary is either easy to determine or already given.
- In real-world QA scenarios, people ask questions both about entities (factoid) and about non-entities such as explanations and reasons (non-factoid).
Question types: factoid & non-factoid. In the paper's examples, Q1 and Q2 are factoid questions, while Q3 is a non-factoid question.
**Dynamic chunk reader**
- uses deep networks to learn better representations for candidate answer chunks, instead of using fixed feature representations;
- represents answer candidates as chunks, instead of word-level representations.
**Contributions** (three-fold)
- propose a novel neural network model for joint candidate answer chunking and ranking, in which the candidate answer chunks are dynamically constructed and ranked in an end-to-end manner;
- propose a new **question-attention mechanism** to enhance the passage word representations used to construct chunk representations;
- propose several simple but effective features to strengthen the attention mechanism, which fundamentally improves candidate ranking accuracy.
Key Points
Problem Definition
Given a passage P, answer a factoid or non-factoid question Q by selecting a word sequence A from the passage.
Q, P, and A are all word sequences over a shared vocabulary V.
The training set consists of triples (P, Q, A).
RC task types:
- Quiz-style (e.g., MovieQA): questions come with multiple answer choices.
- Cloze-style: answers are usually generated automatically by removing a word from a sentence.
- Answer selection: select a portion of the text as the answer.
- TREC-QA: extract factoid answers from multiple given passages.
- bAbI: requires inferring intent through reasoning.
- SQuAD: supports both factoid and non-factoid answer extraction, and is closer to real-world QA.
Baseline: Chunk-and-Rank Pipeline with Neural RC
The authors adapted a state-of-the-art model designed for cloze-style tasks to perform answer extraction in this work. It has two main components:
- Answer Chunking: a standalone answer chunker, trained to produce overlapping candidate chunks;
- Feature Extraction and Ranking: a neural RC model used to score each word in a given passage; the word scores are then used to generate chunk scores.
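To make the chunking component concrete, here is a minimal sketch of how a chunker might enumerate overlapping candidate spans from a passage. The length cap `max_len` and the exhaustive-window strategy are illustrative assumptions, not details from the paper (the baseline's chunker is a trained model):

```python
# Hedged sketch: enumerate overlapping candidate chunks up to a maximum
# length, as the baseline's standalone chunker conceptually does.
# max_len is an illustrative assumption, not a value from the paper.
def enumerate_chunks(passage_tokens, max_len=4):
    """Return all (start, end) spans with end - start <= max_len (end exclusive)."""
    n = len(passage_tokens)
    chunks = []
    for start in range(n):
        for end in range(start + 1, min(start + max_len, n) + 1):
            chunks.append((start, end))
    return chunks

spans = enumerate_chunks(["the", "cat", "sat"], max_len=2)
# spans holds every 1- or 2-token window: (0,1), (0,2), (1,2), (1,3), (2,3)
```

Each span would then be scored by the neural RC component, with chunk scores aggregated from the per-word scores it produces.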
DCR
DCR works in four steps:
- First, the encoder layer encodes the passage and the question separately using bidirectional recurrent neural networks (RNNs), producing a hidden state for each word.
- Second, the attention layer calculates the relevance of each passage word to the question, using word-by-word attention.
- Third, the chunk representation layer dynamically extracts candidate chunks from the given passage and creates a chunk representation that encodes the contextual information of each chunk: it first determines the chunk boundaries, then pools the word representations within each chunk.
- Fourth, the ranker layer scores the relevance between each chunk representation and the question (cosine similarity) and ranks all candidate chunks with a softmax layer.
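The last three steps can be sketched with NumPy. This is a hedged illustration, not the paper's exact formulation: random vectors stand in for the BiRNN hidden states of step 1, and the hidden size, candidate spans, and mean-pooling choice are all assumptions made for the example:

```python
import numpy as np

# Hedged sketch of DCR steps 2-4. Random vectors stand in for the BiRNN
# hidden states from step 1; dimensions and pooling are illustrative.
rng = np.random.default_rng(0)
d = 8                          # hidden size (assumed)
P = rng.normal(size=(5, d))    # passage word states (5 words)
Q = rng.normal(size=(3, d))    # question word states (3 words)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Step 2: word-by-word attention -- each passage word attends over the
# question words, yielding question-aware passage representations.
scores = P @ Q.T                 # (5, 3) passage-question relevance
alpha = softmax(scores, axis=1)  # attention weights per passage word
P_aware = alpha @ Q              # (5, d) question-aware word vectors

# Step 3: chunk representations -- pool the word vectors inside each span.
chunks = [(0, 2), (1, 4), (3, 5)]   # candidate (start, end) spans (assumed)
chunk_reps = np.stack([P_aware[s:e].mean(axis=0) for s, e in chunks])

# Step 4: rank chunks by cosine similarity to a pooled question vector,
# normalized over all candidates with a softmax.
q_vec = Q.mean(axis=0)
cos = chunk_reps @ q_vec / (
    np.linalg.norm(chunk_reps, axis=1) * np.linalg.norm(q_vec))
probs = softmax(cos)
best = chunks[int(np.argmax(probs))]  # top-ranked span
```

In the actual model, the attention scoring, pooling, and similarity functions are learned jointly end to end rather than fixed as above.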
Experiments
Dataset: Stanford Question Answering Dataset (SQuAD)
- Contains both factoid and non-factoid questions.
- About 100k question-passage pairs drawn from 536 Wikipedia articles.
The input word vector has 5 parts:
- a pre-trained 300-dimensional GloVe embedding;
- a one-hot encoding (46 dimensions) for the part-of-speech (POS) tag of w;
- a one-hot encoding (14 dimensions) for the named entity (NE) tag of w;
- a binary value indicating whether w's surface form is the same as any word in the question;
- a binary value indicating whether the lemma form of w is the same as any word in the question.
Training
The SQuAD dataset was pre-processed using the Stanford CoreNLP toolkit (Manning et al., 2014) with its default settings to tokenize the text and obtain the POS and NE annotations.
The model was trained with stochastic gradient descent using the ADAM optimizer.
Experimental Results
We also studied how each component in our model contributes to the overall performance.
Conclusion
Previous QA models targeted only factoid questions: they either predict a single named entity as the answer or select an answer from a predefined candidate list.
This paper proposes a new neural reading comprehension model for QA. Its contribution is a joint neural model, strengthened by a novel attention mechanism and five extra features, that handles both factoid and non-factoid questions.
Limitation: predicting long answers still needs improvement.