Reading Notes on "Recursive Neural Conditional Random Fields for Aspect-based Sentiment Analysis"

Abstract: The model combines recursive neural networks with a CRF. It learns high-level discriminative features and double-propagates information between aspect and opinion terms.

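Prior work: One approach accumulates aspect and opinion terms from a set of unlabeled data by using syntactic rules or the modification relations between them. However, it relies heavily on hand-coded rules and is restricted to certain part-of-speech (POS) tags; for example, opinion words are assumed to be adjectives. The other approach trains a labeling classifier to extract aspect and opinion terms, based on predefined lexicons and syntactic analysis. It requires considerable effort to design hand-crafted features, and the classifier only combines features linearly, ignoring higher-order interactions.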

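This paper's approach: a recursive neural network built on the dependency tree of each sentence. The goal is to learn a high-level feature representation for each word in its sentence context, and to learn the associations between aspect and opinion terms along the dependency structure. The RNN output is fed into a CRF, which learns a discriminative mapping from the high-level features to labels such as aspect, opinion, or other.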

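Main contributions: encoding aspect-opinion relations in high-level representation learning, and a joint optimization approach, based on maximum likelihood estimation and backpropagation, that trains the RNN and the CRF together.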

Related Work:

1. Aspect and Opinion Co-Extraction

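Hu et al. (2004a) extract product aspects through association mining. They obtain opinion terms by expanding a set of seed opinion words with synonyms and antonyms from WordNet. Later work exploited syntactic relations. Qiu et al. (2011) used syntactic relations to double-propagate and expand the sets of aspects and opinions. These methods are unsupervised and depend heavily on predefined extraction rules and on POS tags.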

Jin et al. (2009), Jakob et al. (2010), and Ma et al. (2010) treat extraction as a sequence tagging problem and solve it with HMMs or CRFs. These methods rely on hand-crafted features and do not model the relations between aspect and opinion terms.

Liu et al. (2012) use a word alignment model to capture opinion relations within a sentence. This method requires sufficient data to model the desired relations.

2. Deep Learning for Sentiment Analysis

Deep learning models can learn inherent semantic and syntactic information from data and achieve better performance in sentiment analysis. Most of these methods make sentiment polarity predictions at the sentence level or at the phrase/word level. Irsoy et al. (2014) applied deep recurrent neural networks to opinion expression extraction. Dong et al. (2014) proposed an adaptive recursive neural network for target-dependent sentiment classification, where the target (aspect) is given as input. Tang et al. (2015) used LSTMs for the same task.

However, few studies have applied deep learning models to aspect and opinion co-extraction.

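The closest work is Liu et al. (2015), who combine recurrent neural networks with word embeddings to extract explicit aspects. However, their model applies the recurrent neural network only on top of word embeddings, so its performance depends heavily on the quality of the embeddings. In addition, it does not exploit the syntactic structure of the sentence.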

More recently, Yin et al. (2016) proposed an unsupervised model that uses dependency path embeddings to improve word embeddings. These embeddings are then used to train a CRF separately.

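This paper: unlike Yin et al., it does not focus on building an unsupervised word embedding method. Instead, it encodes the information of dependency paths into an RNN and uses the labels to learn syntactically meaningful, discriminative hidden representations. It also optimizes both parts jointly, instead of training the word embeddings and the CRF separately as Yin et al. do. Weiss et al. (2015) combined deep learning with structured learning for language parsing, where the structured part is learned with a structured perceptron. However, they also separate neural network training from structured prediction.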

Among deep learning methods, recursive neural networks (RNNs) have been applied to many NLP tasks. Examples include learning phrase representations, sentence-level sentiment analysis, language parsing, and question answering.

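The tree structures used by RNNs are constituency trees and dependency trees. In a constituency tree, all words are leaf nodes, and each internal node represents a phrase or constituent of the sentence. The root node represents the whole sentence. In a dependency tree, every node, terminal or non-terminal, represents a word and has dependency connections to other nodes.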

The advantage of dependency trees is that they allow extracting word-level representations that account for syntactic relations and offer semantic robustness.

The paper therefore uses a DT-RNN (dependency-tree RNN).

Problem Statement:

1. Dependency-Tree RNNs

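Each word w is associated with a feature vector $x_w \in \mathbb{R}^d$, which corresponds to a column of the word embedding matrix $W_e \in \mathbb{R}^{d \times v}$ (Figure 1), where v is the vocabulary size.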

For each sentence, a DT-RNN is built from the sentence's dependency parse tree and the word embeddings.

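In the DT-RNN, each node n (leaf, internal, or root) is associated with a specific word w in the sentence. It also has an input feature vector $x_w$ and a hidden vector $h_n \in \mathbb{R}^d$ of the same dimension as $x_w$. Each dependency relation r is associated with its own matrix $W_r \in \mathbb{R}^{d \times d}$.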

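A common transformation matrix $W_v \in \mathbb{R}^{d \times d}$ is used to compute the hidden state of a leaf node:

$$h_n = f(W_v \cdot x_w + b)$$

where f is a nonlinear activation function and b is a bias vector.

The hidden state of an internal node is computed from the corresponding relation matrices $W_r$ together with the common transformation matrix $W_v$. In general, the hidden vector of any node n associated with word vector $x_w$ is:

$$h_n = f\Big(W_v \cdot x_w + b + \sum_{k \in K_n} W_{r_{nk}} \cdot h_k\Big)$$

where $K_n$ is the set of children of node n, $r_{nk}$ is the dependency relation between node n and its child k, and $h_k$ is the hidden vector of child k. For a leaf, $K_n$ is empty, and the formula reduces to the leaf case.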
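The recursion $h_n = f(W_v x_w + b + \sum_{k \in K_n} W_{r_{nk}} h_k)$ can be sketched in plain Python. Several parts are assumptions, not from the notes: f = tanh, a toy dimension d = 2, and made-up weights. The parse of "I like the food" and the helper name `dt_rnn_hidden` are hypothetical illustrations. The sketch is not the authors' implementation.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a d x d matrix by a length-d vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def column(m: Matrix, j: int) -> Vector:
    """Column j of W_e (d x v), i.e. the embedding x_w of word id j."""
    return [row[j] for row in m]


def dt_rnn_hidden(node: int,
                  children: Dict[int, List[Tuple[int, str]]],
                  word_ids: List[int],
                  W_e: Matrix, W_v: Matrix, W_r: Dict[str, Matrix], b: Vector,
                  cache: Dict[int, Vector]) -> Vector:
    """h_n = tanh(W_v x_w + b + sum_{k in K_n} W_{r_nk} h_k), computed children-first.

    A leaf has no children, so this reduces to h_n = tanh(W_v x_w + b).
    """
    pre = [a + c for a, c in zip(matvec(W_v, column(W_e, word_ids[node])), b)]
    for child, rel in children.get(node, []):
        h_k = dt_rnn_hidden(child, children, word_ids, W_e, W_v, W_r, b, cache)
        pre = [p + q for p, q in zip(pre, matvec(W_r[rel], h_k))]
    cache[node] = [math.tanh(p) for p in pre]
    return cache[node]


# Hypothetical toy parse of "I like the food" (token indices 0..3), root = "like".
children = {1: [(0, "nsubj"), (3, "dobj")], 3: [(2, "det")]}
word_ids = [0, 1, 2, 3]
W_e = [[0.1, 0.2, 0.0, 0.5],
       [0.3, -0.1, 0.1, 0.4]]          # d = 2, v = 4
W_v = [[1.0, 0.0], [0.0, 1.0]]
W_r = {"nsubj": [[0.5, 0.0], [0.0, 0.5]],
       "dobj": [[1.0, 0.0], [0.0, 1.0]],
       "det": [[0.1, 0.0], [0.0, 0.1]]}
b = [0.0, 0.0]
cache: Dict[int, Vector] = {}
h_like = dt_rnn_hidden(1, children, word_ids, W_e, W_v, W_r, b, cache)
```

After the call, `cache` holds one hidden vector per token. Because `h_like` is built from `h_food` through `W_r["dobj"]`, the representation of "like" already depends on "food".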

2. Integration with CRFs

The DT-RNN outputs a hidden representation for each word in the sentence, and these representations are fed into a CRF.

A linear-chain CRF is used, with two kinds of cliques: unary cliques (U), which represent input-output connections, and pairwise cliques (P), which represent connections between adjacent outputs.

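During inference, the model outputs the label sequence $\hat{y}$ that maximizes the conditional probability $p(y \mid h)$. The probability is computed from the potential outputs of the cliques:

$$p(y \mid h) = \frac{1}{Z(h)} \prod_{c \in U} \psi_c(h, y_c) \prod_{c \in P} \psi_c(h, y_c)$$

where $Z(h) = \sum_{y'} \prod_{c \in U} \psi_c(h, y'_c) \prod_{c \in P} \psi_c(h, y'_c)$ is the normalization term, summed over all possible label sequences $y'$.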
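The sketch below computes this probability. It assumes the clique potentials are given as log-scores: `unary[k][l]` for label l at word k, and `V[i][j]` for the transition from label i to label j. These names and values are hypothetical. The code computes Z(h) two ways: by brute-force enumeration, and by the standard forward algorithm, which is how a linear-chain CRF normally avoids the exponential sum.

```python
import itertools
import math
from typing import List, Sequence

Scores = List[List[float]]   # unary[k][l]: log unary potential of label l at word k


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sequence_score(unary: Scores, V: Scores, y: Sequence[int]) -> float:
    """Log of the unnormalized clique product: unary terms plus V[y_{k-1}][y_k]."""
    return (sum(unary[k][y[k]] for k in range(len(y)))
            + sum(V[y[k - 1]][y[k]] for k in range(1, len(y))))


def log_z_bruteforce(unary: Scores, V: Scores) -> float:
    """log Z(h) by enumerating all L**n label sequences (only for tiny checks)."""
    L = len(unary[0])
    return logsumexp([sequence_score(unary, V, y)
                      for y in itertools.product(range(L), repeat=len(unary))])


def log_z_forward(unary: Scores, V: Scores) -> float:
    """log Z(h) with the forward algorithm in O(n * L^2)."""
    L = len(unary[0])
    alpha = list(unary[0])
    for k in range(1, len(unary)):
        alpha = [unary[k][j] + logsumexp([alpha[i] + V[i][j] for i in range(L)])
                 for j in range(L)]
    return logsumexp(alpha)


def prob(unary: Scores, V: Scores, y: Sequence[int]) -> float:
    """p(y | h) = exp(score(y)) / Z(h)."""
    return math.exp(sequence_score(unary, V, y) - log_z_forward(unary, V))


# Hypothetical scores: 3 words, 3 labels.
unary = [[1.0, 0.2, -0.5], [0.3, 0.8, 0.1], [0.0, -0.2, 0.6]]
V = [[0.5, -0.1, 0.0], [0.2, 0.4, -0.3], [0.0, 0.1, 0.3]]
total = sum(prob(unary, V, y) for y in itertools.product(range(3), repeat=3))
```

`total` sums p(y | h) over every possible sequence, so it should come out as 1 up to rounding.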

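When computing the unary potential, a context window of size 2T+1 is used. The potential of the unary clique at node k can be written as:

$$\psi_U(h, y_k) = \exp\Big( (W_0)_{y_k} \cdot h_k + \sum_{t=1}^{T} (W_{+t})_{y_k} \cdot h_{k+t} + \sum_{t=1}^{T} (W_{-t})_{y_k} \cdot h_{k-t} \Big)$$

In this formula:

- $W_0$ is the CRF weight matrix for the current position.
- $W_{+t}$ is the weight matrix for the position t steps to the right.
- $W_{-t}$ is the weight matrix for the position t steps to the left.
- The subscript $y_k$ selects the row of each weight matrix that corresponds to label $y_k$.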
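The windowed unary log-potential, the exponent of $\psi_U$, can be sketched as follows. `W` maps an offset t to a matrix with one row per label: `W[0]` stands for $W_0$, `W[t]` for $W_{+t}$, and `W[-t]` for $W_{-t}$. How positions outside the sentence are handled is an assumption here: they contribute nothing, which is equivalent to zero-padding. The toy numbers are made up.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]   # one row per label


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def unary_log_potential(h: List[Vector], k: int, label: int,
                        W: Dict[int, Matrix], T: int) -> float:
    """log psi_U(h, y_k = label) = sum over offsets t in [-T, T] of (W_t)_label . h_{k+t}.

    Assumption: offsets that fall outside the sentence are skipped
    (equivalent to zero-padding the hidden vectors).
    """
    total = 0.0
    for t in range(-T, T + 1):
        if 0 <= k + t < len(h):
            total += dot(W[t][label], h[k + t])
    return total


# Hypothetical toy values: 3 words, d = 2, 2 labels, window size 3 (T = 1).
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = {-1: [[1.0, 0.0], [0.0, 0.0]],
     0: [[0.0, 1.0], [1.0, 0.0]],
     1: [[0.5, 0.5], [0.0, 1.0]]}
mid = unary_log_potential(h, 1, 0, W, T=1)    # all three offsets are in range
first = unary_log_potential(h, 0, 1, W, T=1)  # offset -1 is skipped at the left edge
```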

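An example that also includes the pairwise clique, with a window size of 3 (T = 1):

$$\psi(h, y_{k-1}, y_k) = \exp\Big( (W_0)_{y_k} \cdot h_k + (W_{+1})_{y_k} \cdot h_{k+1} + (W_{-1})_{y_k} \cdot h_{k-1} + V_{y_{k-1}, y_k} \Big)$$

The first three terms in the exponential on the right-hand side account for the unary clique. The last term accounts for the pairwise clique, where matrix V holds the pairwise state-transition scores.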
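Z(h) does not depend on y. Finding $\hat{y} = \arg\max_y p(y \mid h)$ therefore only requires maximizing the sum of unary and pairwise log-scores. The Viterbi algorithm does this in O(nL²) time. The sketch uses the convention that `V[i][j]` is the score for moving from label i to label j, and the values are hypothetical. The authors used CRFSuite, and this is not its code.

```python
import itertools
from typing import List, Tuple

Scores = List[List[float]]


def viterbi(unary: Scores, V: Scores) -> Tuple[List[int], float]:
    """Highest-scoring label sequence and its score, where
    score(y) = sum_k unary[k][y_k] + sum_k V[y_{k-1}][y_k]."""
    n, L = len(unary), len(unary[0])
    best = list(unary[0])            # best prefix score ending in each label
    back: List[List[int]] = []       # back-pointers for positions 1..n-1
    for k in range(1, n):
        ptrs, new = [], []
        for j in range(L):
            i = max(range(L), key=lambda i: best[i] + V[i][j])
            ptrs.append(i)
            new.append(best[i] + V[i][j] + unary[k][j])
        back.append(ptrs)
        best = new
    last = max(range(L), key=lambda j: best[j])
    path = [last]
    for ptrs in reversed(back):      # follow back-pointers from the end
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, best[last]


def brute_force_best(unary: Scores, V: Scores) -> Tuple[List[int], float]:
    """Exhaustive argmax over all label sequences, for checking viterbi on tiny inputs."""
    L = len(unary[0])

    def score(y):
        return (sum(unary[k][y[k]] for k in range(len(y)))
                + sum(V[y[k - 1]][y[k]] for k in range(1, len(y))))

    best_y = max(itertools.product(range(L), repeat=len(unary)), key=score)
    return list(best_y), score(best_y)


unary = [[1.0, 0.2, -0.5], [0.3, 0.8, 0.1], [0.0, -0.2, 0.6]]
V = [[0.5, -0.1, 0.0], [0.2, 0.4, -0.3], [0.0, 0.1, 0.3]]
path, best_score = viterbi(unary, V)   # path == [0, 0, 2]
```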

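Training RNCRF:

The CRF parameters are the unary-clique weight matrices $W_0$, $W_{+t}$, $W_{-t}$ and the pairwise-clique matrix V. Their updates are derived by applying the chain rule to the log-potentials.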
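The sketch below shows the gradient that the chain rule starts from. For a linear-chain CRF, the derivative of log p(y | h) with respect to a unary log-potential is the gold-label indicator minus the model's marginal. $W_0$ only enters through $(W_0)_l \cdot h_k$, so multiplying that term by $h_k$ gives the gradient of row l of $W_0$; $W_{\pm t}$ follow the same pattern with $h_{k \pm t}$. For simplicity, marginals are computed here by brute-force enumeration instead of forward-backward. All names and numbers are hypothetical.

```python
import itertools
import math
from typing import List, Sequence

Scores = List[List[float]]


def score(unary: Scores, V: Scores, y: Sequence[int]) -> float:
    return (sum(unary[k][y[k]] for k in range(len(y)))
            + sum(V[y[k - 1]][y[k]] for k in range(1, len(y))))


def log_likelihood(unary: Scores, V: Scores, y: Sequence[int]) -> float:
    """log p(y | h) = score(y) - log Z(h), with Z(h) by enumeration (toy sizes only)."""
    L = len(unary[0])
    z = sum(math.exp(score(unary, V, s))
            for s in itertools.product(range(L), repeat=len(unary)))
    return score(unary, V, y) - math.log(z)


def grad_unary(unary: Scores, V: Scores, y: Sequence[int]) -> Scores:
    """d log p(y | h) / d unary[k][l] = 1[y_k == l] - p(y_k = l | h)."""
    n, L = len(unary), len(unary[0])
    seqs = list(itertools.product(range(L), repeat=n))
    weights = [math.exp(score(unary, V, s)) for s in seqs]
    z = sum(weights)
    g = [[1.0 if y[k] == l else 0.0 for l in range(L)] for k in range(n)]
    for s, w in zip(seqs, weights):
        for k, l in enumerate(s):
            g[k][l] -= w / z          # subtract the marginal p(y_k = l | h)
    return g


def grad_W0(g: Scores, h: List[List[float]]) -> Scores:
    """Chain rule through the (W_0)_l . h_k term: d/d(W_0)_l = sum_k g[k][l] * h_k."""
    L, d = len(g[0]), len(h[0])
    return [[sum(g[k][l] * h[k][i] for k in range(len(h))) for i in range(d)]
            for l in range(L)]


unary = [[1.0, 0.2, -0.5], [0.3, 0.8, 0.1], [0.0, -0.2, 0.6]]
V = [[0.5, -0.1, 0.0], [0.2, 0.4, -0.3], [0.0, 0.1, 0.3]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gold = (0, 1, 2)
g = grad_unary(unary, V, gold)
gW0 = grad_W0(g, h)
```

In a joint model, the same quantities g[k][l] also flow into $h_k$ and from there back into the DT-RNN.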

Discussion:

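In SemEval 2014 Task 4 (subtask 1), Toh and Wang (2014) used CRFs with extensive hand-crafted features, including features induced from dependency trees. However, their experiments showed that the dependency-based features did not improve performance. This suggests that using dependency structure explicitly as input features is infeasible or at least difficult. The authors therefore use a DT-RNN to encode the dependencies between words for feature learning.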

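The key advantage of RNCRF is that it learns the underlying dual propagation between aspect and opinion terms. For example, in the dependency tree, food depends on like through the relation DOBJ. During training, RNCRF computes the hidden state $h_{like}$ of like partly from $h_{food}$, so the prediction for like is influenced by $h_{food}$. During backpropagation, the error for like is propagated top-down to revise the representation $h_{food}$.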
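The top-down error flow can be illustrated with a two-node tree: like -> food (DOBJ). This sketch assumes f = tanh and made-up weights. E = c · h_like is a hypothetical stand-in for whatever error the CRF assigns to like. `error_to_child` is the backpropagation step that carries that error into h_food.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matvec_transpose(m: Matrix, v: Vector) -> Vector:
    """Compute m^T v."""
    return [sum(m[i][j] * v[i] for i in range(len(m))) for j in range(len(m[0]))]


def h_like_from(h_food: Vector, x_like: Vector, W_v: Matrix, W_dobj: Matrix, b: Vector) -> Vector:
    """Bottom-up: h_like = tanh(W_v x_like + W_dobj h_food + b) reads the child state h_food."""
    pre = [p + q + c for p, q, c in zip(matvec(W_v, x_like), matvec(W_dobj, h_food), b)]
    return [math.tanh(p) for p in pre]


def error_to_child(delta_like: Vector, h_like: Vector, W_dobj: Matrix) -> Vector:
    """Top-down: dE/dh_food = W_dobj^T (delta_like * (1 - h_like**2)),
    using tanh'(a) = 1 - tanh(a)**2 and delta_like = dE/dh_like."""
    local = [d * (1.0 - hl * hl) for d, hl in zip(delta_like, h_like)]
    return matvec_transpose(W_dobj, local)


# Hypothetical toy values; E = c . h_like stands in for the error at "like".
x_like = [0.2, -0.1]
h_food = [0.4, 0.3]
W_v = [[1.0, 0.0], [0.0, 1.0]]
W_dobj = [[0.8, -0.2], [0.1, 0.5]]
b = [0.0, 0.0]
c = [1.0, -0.5]
h_like = h_like_from(h_food, x_like, W_v, W_dobj, b)
delta_food = error_to_child(c, h_like, W_dobj)   # nonzero: the error at "like" reaches "food"
```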

Adding linguistic/lexicon features:

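Adding light hand-crafted features to RNCRF, such as POS tags, name lists, or a sentiment lexicon, can further improve its performance. These features are appended to each word's hidden state. They are kept fixed during training, unlike the learnable neural inputs and the CRF weights.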
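The sketch below shows how fixed features could be appended to a hidden vector. It uses a one-hot POS tag and a 2-dimensional binary lexicon vector. The choice of the two lexicon dimensions (name-list membership and sentiment-lexicon membership), the tiny tagset, and all helper names are hypothetical. Only the CRF weights over the appended entries are learned. The feature values themselves are constants, so only the first d gradient entries flow back into the DT-RNN.

```python
from typing import List, Set


def pos_one_hot(tag: str, tagset: List[str]) -> List[float]:
    """One-hot encoding of a POS tag (e.g. as produced by the Stanford POS tagger)."""
    return [1.0 if tag == t else 0.0 for t in tagset]


def lexicon_bits(word: str, name_list: Set[str], sentiment_lexicon: Set[str]) -> List[float]:
    """Hypothetical 2D binary lexicon feature: [in name list, in sentiment lexicon]."""
    return [float(word in name_list), float(word in sentiment_lexicon)]


def augment(h: List[float], fixed: List[float]) -> List[float]:
    """Append fixed (non-learnable) features to the learned hidden vector before the CRF."""
    return h + fixed


def hidden_part(grad: List[float], d: int) -> List[float]:
    """Only the first d gradient entries flow back into the DT-RNN; the rest belong to constants."""
    return grad[:d]


tagset = ["NN", "JJ", "VB", "OTHER"]     # hypothetical, much smaller than a real tagset
h_food = [0.12, -0.40]
feats = pos_one_hot("NN", tagset) + lexicon_bits("food", {"pizza hut"}, {"great", "bad"})
x_crf = augment(h_food, feats)            # length 2 + 4 + 2 = 8
```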

RNCRF without any hand-crafted features outperforms the best-performing systems that rely on heavy feature engineering, and RNCRF with light features performs better still.

Experiments:

The original datasets only include manually annotated labels for aspect terms, not for opinion terms, so the opinion terms were annotated manually.

Word embeddings are trained with word2vec on the Yelp Challenge dataset for the restaurant domain and on Amazon reviews.

Dependency trees are obtained with the Stanford Dependency Parser.

The linear-chain CRF is implemented with CRFSuite.

Because the datasets are small and the model has many parameters, the DT-RNN is pretrained with a cross-entropy error, which is a common strategy in deep learning.
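The notes only say that the DT-RNN is pretrained with a cross-entropy error. This sketch assumes a common way to do that: a softmax classifier on each word's hidden vector, trained with cross-entropy against the word's label. The softmax weights `U`, the label index `gold`, and the toy values are hypothetical. The returned gradient with respect to h is what would then be backpropagated through the DT-RNN.

```python
import math
from typing import List, Tuple


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cross_entropy_and_grad(U: List[List[float]], h: List[float], gold: int) -> Tuple[float, List[float]]:
    """Per-word loss -log softmax(U h)[gold] and its gradient w.r.t. h: U^T (p - onehot(gold))."""
    p = softmax([sum(u * x for u, x in zip(row, h)) for row in U])
    loss = -math.log(p[gold])
    diff = [pi - (1.0 if i == gold else 0.0) for i, pi in enumerate(p)]
    grad_h = [sum(U[i][j] * diff[i] for i in range(len(U))) for j in range(len(h))]
    return loss, grad_h


# Hypothetical softmax layer for 3 labels on a 2-dim hidden vector.
U = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
h = [0.6, -0.1]
loss, grad_h = cross_entropy_and_grad(U, h, gold=1)
```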

Pretraining uses mini-batch stochastic gradient descent (SGD) with a batch size of 25 and an adaptive learning rate (AdaGrad) initialized at 0.02.
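This is a minimal sketch of per-parameter AdaGrad with mini-batches of 25, using the 0.02 learning rate from the notes. The epsilon term and the helper names are assumptions. In practice, each step would use the gradient averaged over one mini-batch. The step shrinks as squared gradients accumulate, which is why AdaGrad counts as an adaptive learning rate.

```python
import math
from typing import Iterator, List, Sequence


class AdaGrad:
    """Per-parameter AdaGrad: theta_i -= lr * g_i / (sqrt(sum of past g_i**2) + eps)."""

    def __init__(self, n_params: int, lr: float = 0.02, eps: float = 1e-8):
        self.lr = lr
        self.eps = eps
        self.sq_sum = [0.0] * n_params

    def step(self, theta: List[float], grad: Sequence[float]) -> None:
        for i, g in enumerate(grad):
            self.sq_sum[i] += g * g
            theta[i] -= self.lr * g / (math.sqrt(self.sq_sum[i]) + self.eps)


def minibatches(data: Sequence, batch_size: int = 25) -> Iterator[Sequence]:
    """Consecutive mini-batches of at most batch_size items."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


theta = [1.0, -1.0]
opt = AdaGrad(len(theta))                 # lr = 0.02
opt.step(theta, [0.5, -2.0])              # first step moves each parameter by about lr
first = theta[:]
opt.step(theta, [0.5, -2.0])              # same gradient, smaller step
sizes = [len(batch) for batch in minibatches(list(range(60)))]
```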

The joint RNCRF model is trained with SGD using a decaying learning rate initialized at 0.02. Several context window sizes were tried, and all parameters were chosen by cross-validation.

Lexicon features: encoded as a 2D binary vector.

POS tags: obtained with the Stanford POS tagger.

Last edited: 2017.12.10 17:16:19
