Notes: Sentiment analysis on Twitter data using word2vec and Keras

Summary:

This post uses a very simple embedding and network structure, which makes it a good introduction to text classification with word2vec plus a neural network. Note, however, that the training set here is quite large; if your own dataset is much smaller, this approach may not work as well.

Data

1.5 million tweets where each tweet is labeled 1 when it's positive and 0 when it's negative

Method

1. Tokenize each tweet and filter out useless tokens, then run word2vec and simply average the word vectors (a tf-idf weighted average can be used as a refinement) to obtain a vector representation of the whole tweet.

2. Network structure: one 32-node layer with ReLU, followed by one sigmoid output layer.

Results

Accuracy: 79%


Original article:

Sentiment analysis on Twitter using word2vec and keras

Posted on Thursday, 20 April 2017 in NLP

1 - Introduction

In this post I am exploring a new way of doing sentiment analysis. I'm going to use word2vec.

word2vec is a group of Deep Learning models developed by Google with the aim of capturing the context of words while at the same time proposing a very efficient way of preprocessing raw text data. This model takes as input a large corpus of documents like tweets or news articles and generates a vector space of typically several hundred dimensions. Each word in the corpus is assigned a unique vector in the vector space.

The powerful concept behind word2vec is that word vectors that are close to each other in the vector space represent words that are not only of the same meaning but of the same context as well.

What I find interesting about the vector representation of words is that it automatically embeds several features that we would normally have to handcraft ourselves. Since word2vec relies on Deep Neural Nets to detect patterns, we can rely on it to detect multiple features on different levels of abstractions.

Let's look at these two charts I found in this blog. They visualize some word vectors projected onto 2D space after dimensionality reduction.

[figure: two 2D projections of word vectors, showing semantic groupings on the right and syntactic relationships on the left]

A couple of things to notice:

  • On the right chart, words of similar meaning, concept and context are grouped together. For example, niece, aunt and sister are close to each other since they describe females and family relationships. Similarly, countess, duchess and empress are grouped together because they represent female royalty. The second thing to see from this chart is that the geometric distance between words reflects a semantic relationship. For example, the vector woman - man is roughly collinear with the vector queen - king, something we would translate to "woman is to man as queen is to king". This means that word2vec is able to infer different relationships between words, something we humans do naturally.
  • The chart on the left is quite similar to the one on the right, except that it captures the syntactic relationships between words. slow - slowest = short - shortest is one such example.

On a more general level, word2vec embeds non-trivial semantic and syntactic relationships between words. This results in preserving a rich context.
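To make this concrete, here is a minimal sketch of how one could query such analogies with gensim. It assumes a word2vec model trained on a large general corpus (for instance the pre-trained Google News vectors); the file name below is a placeholder and the exact loading function depends on your gensim version.

import gensim

# Load a pre-trained word2vec model (placeholder file name).
w2v = gensim.models.KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)

# "man is to king as woman is to ?" -> we expect something like "queen".
print w2v.most_similar(positive=['king', 'woman'], negative=['man'], topn=3)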

In this post we'll be applying the power of word2vec to build a sentiment classifier. We'll use a large dataset of 1.5 million tweets where each tweet is labeled 1 when it's positive and 0 when it's negative. The word2vec model will learn a representation for every word in this corpus, a representation that we'll use to transform tweets, i.e. sentences, into vectors as well. Then we'll use this new representation of tweets to train a Neural Network classifier with Keras (since we already have the labels.)

Do you see how useful word2vec is for this text classification problem? It provides enhanced feature engineering for raw text data (not the easiest form of data to process when building classifiers.)

Ok now let's put some word2vec in action on this dataset.

2 - Environment set-up and data preparation

Let's start by setting up the environment.

To have a clean installation that would not mess up my current python packages, I created a conda virtual environment named nlp on an Ubuntu 16.04 LTS machine. The python version is 2.7.

conda create -n nlp python=2.7 anaconda

Now activate the environment.

source activate nlp

Inside this virtual environment, we'll need to install these libraries:

  • gensim is a natural language processing python library. It makes text mining, cleaning and modeling very easy. Besides, it provides an implementation of the word2vec model.
  • Keras is a high-level neural networks API, written in Python and capable of running on top of either TensorFlow or Theano. We'll be using it to train our sentiment classifier. In this tutorial, it will run on top of TensorFlow.
  • TensorFlow is an open source software library for machine learning. It's been developed by Google to meet their needs for systems capable of building and training neural networks.
  • Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, and to extend this capability with high-performance interactivity over very large or streaming datasets.
  • tqdm is a cool progress bar utility package I use to monitor dataframe creation (yes, it integrates with pandas) and loops. Demo:
[animated GIF: tqdm progress bar demo]

Cool, huh?

pip install --upgrade gensim
pip install nltk
pip install tqdm
pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.1-cp27-none-linux_x86_64.whl
pip install keras
pip install bokeh

The environment should now be ready.
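As a quick sanity check (not in the original post), you can verify inside the nlp environment that everything imports and report the installed versions:

# Verify that the main libraries are importable and print their versions.
import gensim, tensorflow, keras, bokeh, tqdm, nltk

for lib in (gensim, tensorflow, keras, bokeh, tqdm, nltk):
    print lib.__name__, lib.__version__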

The dataset can be downloaded from this link. It's a csv file that contains 1.6 million rows. Each row has amongst other things the text of the tweet and the corresponding sentiment.

Let's load the python libraries and have a look at the dataset.

import pandas as pd # provide sql-like data manipulation tools. very handy.
pd.options.mode.chained_assignment = None
import numpy as np # high dimensional vector computing library.
from copy import deepcopy
from string import punctuation
from random import shuffle

import gensim
from gensim.models.word2vec import Word2Vec # the word2vec model gensim class
LabeledSentence = gensim.models.doc2vec.LabeledSentence # we'll talk about this down below

from tqdm import tqdm
tqdm.pandas(desc="progress-bar")

from nltk.tokenize import TweetTokenizer # a tweet tokenizer from nltk.
tokenizer = TweetTokenizer()

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer

Let's define a function that loads the dataset and extracts the two columns we need:

  • The sentiment: a binary (0/1) variable
  • The text of the tweet: string
def ingest():
    data = pd.read_csv('./tweets.csv')
    data.drop(['ItemID', 'SentimentSource'], axis=1, inplace=True)
    data = data[data.Sentiment.isnull() == False]
    data['Sentiment'] = data['Sentiment'].map(int)
    data = data[data['SentimentText'].isnull() == False]
    data.reset_index(inplace=True)
    data.drop('index', axis=1, inplace=True)
    print 'dataset loaded with shape', data.shape    
    return data

data = ingest()
data.head(5)
Sentiment SentimentText
1 @jonah_bailey Sorry about the loss. I have been there and it sucks. Have a great day!
0 I think I pulled a pectoral muscle. And no, I'm not kidding.
1 My room is TRASHED
1 Raining.
0 at work the stupidst job in the world LoL I can't wait until my last day YAY!

The format of the SentimenText is not useful. It needs to be tokenized and cleaned.

We will limit the dataset to 1 million tweets.

Here's my tokenizing function that splits each tweet into tokens and removes user mentions, hashtags and urls. These elements are very common in tweets but unfortunately they do not provide enough semantic information for the task. If you manage to successfully integrate them in the final classification, please tell me your secret.

def tokenize(tweet):
    try:
        tweet = unicode(tweet.decode('utf-8').lower())
        tokens = tokenizer.tokenize(tweet)
        tokens = filter(lambda t: not t.startswith('@'), tokens)
        tokens = filter(lambda t: not t.startswith('#'), tokens)
        tokens = filter(lambda t: not t.startswith('http'), tokens)
        return tokens
    except:
        return 'NC'
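A quick check on a made-up tweet (the example text is invented for illustration) shows what the function keeps and drops:

# Mentions, hashtags and urls are dropped; regular words, punctuation and emoticons survive.
print tokenize("@bob loving the new #keras release! http://example.com :)")
# -> something like [u'loving', u'the', u'new', u'release', u'!', u':)']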

The results of the tokenization should now be cleaned to remove lines with 'NC', resulting from a tokenization error (usually due to weird encoding.)

def postprocess(data, n=1000000):
    data = data.head(n)
    data['tokens'] = data['SentimentText'].progress_map(tokenize)  ## progress_map is a variant of the map function plus a progress bar. Handy to monitor DataFrame creations.
    data = data[data.tokens != 'NC']
    data.reset_index(inplace=True)
    data.drop('index', inplace=True, axis=1)
    return data

data = postprocess(data)

The data is now tokenized and cleaned. We are ready to feed it in the word2vec model.

3 - Building the word2vec model

First, let's define a training set and a test set.

x_train, x_test, y_train, y_test = train_test_split(np.array(data.tokens),
                                                    np.array(data.Sentiment), test_size=0.2)

Before feeding lists of tokens into the word2vec model, we must first turn them into LabeledSentence objects. Here's how to do it:

def labelizeTweets(tweets, label_type):
    labelized = []
    for i,v in tqdm(enumerate(tweets)):
        label = '%s_%s'%(label_type,i)
        labelized.append(LabeledSentence(v, [label]))
    return labelized

x_train = labelizeTweets(x_train, 'TRAIN')
x_test = labelizeTweets(x_test, 'TEST')

Let's check the first element from x_train.

x_train[0]

Out[13]:
TaggedDocument(words=[u'thank', u'you', u'!', u'im', u'just', u'a', u'tad', u'sad', u'u', u'r', u'off', u'the', u'market', u'tho', u'...'], tags=['TRAIN_0'])

Ok so each element is basically some object with two attributes: a list (of tokens) and a label.

Now we are ready to build the word2vec model from x_train i.e. the corpus.

n_dim = 200
tweet_w2v = Word2Vec(size=n_dim, min_count=10)
tweet_w2v.build_vocab([x.words for x in tqdm(x_train)])
tweet_w2v.train([x.words for x in tqdm(x_train)])

The code is self-explanatory.

  • On the first line the model is initialized with the dimension of the vector space (we set n_dim to 200) and min_count (a threshold that filters out words appearing fewer than 10 times).
  • On the second line the vocabulary is created.
  • On the third line the model is trained i.e. its weights are updated.
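Training on hundreds of thousands of tweets can take a while, so it may be worth persisting the model once it is trained. Here is a minimal sketch (the file name is a placeholder):

# Save the trained word2vec model to disk so it can be reused later ...
tweet_w2v.save('tweet_w2v.model')

# ... and reload it without retraining.
tweet_w2v = Word2Vec.load('tweet_w2v.model')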

Once the model is built and trained on the corpus of tweets, we can use it to convert words to vectors. Here's an example:

tweet_w2v['good']

Out[49]:
array([-1.04697919,  0.79274398,  0.23612066,  0.05131698,  0.0884205 ,
       -0.08569263,  1.45526719,  0.42579028, -0.34688851, -0.08249081,
        0.45873582,  2.36241221,  1.05570769, -1.99480379, -1.80352235,
       -0.15522274, -0.20937157, -0.07138395,  0.62345725,  1.50070465,
       -0.02835625, -0.08164574,  1.28233111,  1.75218308, -1.38450599,
        2.12812686,  0.8680591 ,  0.32937863, -0.72903335, -0.57105792,
        0.53118968, -0.39874691, -1.13426244, -1.43289971, -0.24573426,
        0.33088401, -0.88485849, -1.01581001, -0.62239277, -0.11345106,
        1.33353424, -0.49697381, -0.36537576,  0.76834393,  1.68711364,
       -1.03052866,  0.28044644,  0.41823646, -3.47753811,  0.13425322,
       -0.38362527,  2.05668998, -0.57867765,  1.93026328,  0.03931952,
        0.82015723,  0.11956126,  1.37940645, -0.47558281, -0.34119719,
        0.57716691, -1.48764825, -0.5627619 ,  0.55062455,  1.50399065,
        1.92141354,  0.68401223,  0.65160483, -0.0780897 ,  0.30544546,
        1.10016072, -0.78553891,  0.56758875, -0.1569887 , -0.65370613,
        1.4484303 ,  0.83104122,  1.25601828, -0.69895804,  1.50273371,
       -1.18243968, -0.11410122, -0.78283215,  0.49858055, -1.18107879,
       -0.27116439, -0.08542393, -1.55652511,  0.58338171,  1.57096207,
       -1.8447808 , -0.46724516, -0.38275295, -1.59960091,  0.84984666,
        0.04224969,  1.69833565,  1.54516482,  1.16857958, -1.36557281,
        0.71473658, -0.29552877, -1.55104351,  1.5771817 ,  0.29726478,
        1.40087354,  0.57571775,  0.16996922, -0.00522773,  0.06951592,
        1.81895649,  2.49026823,  0.93306369, -1.06985188, -2.24981689,
        1.37557173, -0.67554522,  1.10980988, -1.41456699,  2.3959322 ,
       -0.50968719,  2.00125957,  2.67398214,  0.1460651 , -1.47851145,
        0.87178862, -1.80060601, -0.53303391, -0.58677369,  0.53175789,
        1.96556151, -0.61592281, -1.42631042, -1.48672009,  1.13366914,
       -0.56834996,  1.63328636, -2.1681726 ,  2.34943199, -2.60359526,
        0.79754263,  0.35759249, -1.43931878,  1.48475873, -0.58964026,
       -0.01640507,  1.23911965,  0.7186386 ,  1.19023228,  0.25102213,
        0.30787438,  0.89110869, -0.21623117, -0.38933367, -0.36786464,
        0.25948352, -0.06296721, -0.46233487, -0.32560897, -1.05537581,
       -1.02237189,  1.73827136,  1.31880462,  0.82397497, -0.98476279,
       -0.67370725,  0.55092269,  2.07977676,  0.06780072,  2.09787917,
       -0.87865865, -0.09497593, -0.51874167,  2.30745101,  0.5561167 ,
        1.92726147, -0.27955261, -1.48783088,  0.59695238, -0.2615312 ,
        0.50918317, -0.53439486,  1.48705184, -1.00359023,  2.24436331,
        0.8697992 ,  1.04027569, -0.17122032,  0.26632035, -0.34332612,
       -0.40858006,  1.0153631 , -2.03206563, -0.18333849,  1.24701536,
       -1.37875533,  1.95987856,  0.2176083 ,  1.66856909,  1.58423829], dtype=float32)

You can check: this is a 200-dimension vector. Of course, we can only get the vectors of the words of the corpus.
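You can verify both points directly; asking for a word outside the corpus vocabulary raises a KeyError:

print tweet_w2v['good'].shape        # -> (200,)
# tweet_w2v['some_unseen_token']     # raises KeyError for out-of-vocabulary words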

Let's try something else. We spoke earlier about semantic relationships. Well, the Word2Vec gensim implementation provides a cool method named most_similar. Given a word, this method returns the top n similar ones. This is an interesting feature. Let's try it on some words:

tweet_w2v.most_similar('good')
Out[52]:
[(u'goood', 0.7355118989944458),
 (u'great', 0.7164269685745239),
 (u'rough', 0.656904935836792),
 (u'gd', 0.6395257711410522),
 (u'goooood', 0.6351571083068848),
 (u'tough', 0.6336284875869751),
 (u'fantastic', 0.6223267316818237),
 (u'terrible', 0.6179217100143433),
 (u'gooood', 0.6099461317062378),
 (u'gud', 0.6096700429916382)]

tweet_w2v.most_similar('bar')
Out[53]:
[(u'pub', 0.7254607677459717),
 (u'restaurant', 0.7147054076194763),
 (u'cafe', 0.7105239629745483),
 (u'table', 0.6781781911849976),
 (u'ranch', 0.6559066772460938),
 (u'club', 0.6470779180526733),
 (u'panera', 0.6464691162109375),
 (u'bakery', 0.6429882049560547),
 (u'grill', 0.6425997018814087),
 (u'gate', 0.6346235275268555)]

tweet_w2v.most_similar('Facebook')
Out[54]:
[(u'fb', 0.8862842321395874),
 (u'myspace', 0.8414138555526733),
 (u'bebo', 0.7763116359710693),
 (u'yahoo', 0.7672140598297119),
 (u'msn', 0.7638905048370361),
 (u'twitter', 0.7276350259780884),
 (u'tumblr', 0.7209618091583252),
 (u'flickr', 0.712773323059082),
 (u'skype', 0.7116719484329224),
 (u'aim', 0.7065393924713135)]

tweet_w2v.most_similar('iPhone')
Out[55]:
[(u'itouch', 0.7907721996307373),
 (u'blackberry', 0.7342787981033325),
 (u'firmware', 0.7048080563545227),
 (u'jailbreak', 0.7042940855026245),
 (u'mac', 0.7014051675796509),
 (u'3gs', 0.697465717792511),
 (u'pc', 0.6917887330055237),
 (u'upgrade', 0.6857078075408936),
 (u'mms', 0.6838993430137634),
 (u'3.0', 0.6824861764907837)]

How awesome is that?

For a given word, we get similar words from the same context. Basically, these are the words most likely to appear near the given word across the tweets.

It's interesting to see that our model gets facebook, twitter, skype together and bar, restaurant and cafe together as well. This could be useful for building a knowledge graph. Any thoughts about that?

How about visualizing these word vectors? We first have to reduce their dimension to 2 using t-SNE. Then, using an interactive visualization tool such as Bokeh, we can map them directly onto a 2D plane and interact with them.

Here's the script, and the bokeh chart below.

# importing bokeh library for interactive dataviz
import bokeh.plotting as bp
from bokeh.models import HoverTool, BoxSelectTool
from bokeh.plotting import figure, show, output_notebook

# defining the chart
output_notebook()
plot_tfidf = bp.figure(plot_width=700, plot_height=600, title="A map of 5000 word vectors",
    tools="pan,wheel_zoom,box_zoom,reset,hover,save",
    x_axis_type=None, y_axis_type=None, min_border=1)

# getting a list of word vectors. limit to 5000. each is of 200 dimensions
word_vectors = [tweet_w2v[w] for w in tweet_w2v.wv.vocab.keys()[:5000]]

# dimensionality reduction. converting the vectors to 2d vectors
from sklearn.manifold import TSNE
tsne_model = TSNE(n_components=2, verbose=1, random_state=0)
tsne_w2v = tsne_model.fit_transform(word_vectors)

# putting everything in a data frame
tsne_df = pd.DataFrame(tsne_w2v, columns=['x', 'y'])
tsne_df['words'] = tweet_w2v.wv.vocab.keys()[:5000]

# plotting. the corresponding word appears when you hover on the data point.
plot_tfidf.scatter(x='x', y='y', source=tsne_df)
hover = plot_tfidf.select(dict(type=HoverTool))
hover.tooltips={"word": "@words"}
show(plot_tfidf)

Zoom in, zoom out, place the cursor wherever you want and navigate in the graph. When hovering over a point, you can see the corresponding word. Convince yourself that grouped data points correspond to words of similar context.

4 - Building a sentiment classifier

Let's now get to the sentiment classification part. As for now, we have a word2vec model that converts each word from the corpus into a high dimensional vector. This seems to work fine according to the similarity tests and the bokeh chart above.

In order to classify tweets, we have to turn them into vectors as well. How could we do this? Well, this task is almost done. Since we know the vector representation of each word composing a tweet, we have to "combine" these vectors together and get a new one that represents the tweet as a whole.

A first approach consists in averaging the word vectors together. But a slightly better solution I found was to compute a weighted average where each weight gives the importance of the word with respect to the corpus. Such a weight could be the tf-idf score. To learn more about tf-idf, you can look at my previous article.

Let's start by building a tf-idf matrix.

print 'building tf-idf matrix ...'
vectorizer = TfidfVectorizer(analyzer=lambda x: x, min_df=10)
matrix = vectorizer.fit_transform([x.words for x in x_train])
tfidf = dict(zip(vectorizer.get_feature_names(), vectorizer.idf_))
print 'vocab size :', len(tfidf)

Now let's define a function that, given a list of tweet tokens, creates an averaged tweet vector.

def buildWordVector(tokens, size):
    vec = np.zeros(size).reshape((1, size))
    count = 0.
    for word in tokens:
        try:
            vec += tweet_w2v[word].reshape((1, size)) * tfidf[word]
            count += 1.
        except KeyError: # handling the case where the token is not
                         # in the corpus. useful for testing.
            continue
    if count != 0:
        vec /= count
    return vec

Now we convert x_train and x_test into lists of vectors using this function. We also scale each column to have zero mean and unit standard deviation.

from sklearn.preprocessing import scale
train_vecs_w2v = np.concatenate([buildWordVector(z, n_dim) for z in tqdm(map(lambda x: x.words, x_train))])
train_vecs_w2v = scale(train_vecs_w2v)

test_vecs_w2v = np.concatenate([buildWordVector(z, n_dim) for z in tqdm(map(lambda x: x.words, x_test))])
test_vecs_w2v = scale(test_vecs_w2v)

We should now be ready to feed these vectors into a neural network classifier. In fact, with Keras it is very easy to define layers and activation functions.

Here is a basic 2-layer architecture.

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=200))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(train_vecs_w2v, y_train, epochs=9, batch_size=32, verbose=2)

Now that the model is trained, let's evaluate it on the test set:

score = model.evaluate(test_vecs_w2v, y_test, batch_size=128, verbose=2)
print score[1]

Out[91]: 0.78984528240986307

Almost 80% accuracy. This is not bad. We could eventually tune more parameters in the word2vec model and the neural network classifier to reach a higher accuracy score. Please tell me if you managed to do so.
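If you want to experiment, one direction (not from the original post) is a slightly deeper classifier with dropout; the layer sizes and dropout rate below are arbitrary starting points, not tuned values:

from keras.models import Sequential
from keras.layers import Dense, Dropout

# A deeper variant to try; all hyperparameters here are guesses to tune.
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=200))
model.add(Dropout(0.5))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])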

5 - Conclusion

In this post we explored different tools to perform sentiment analysis: We built a tweet sentiment classifier using word2vec and Keras.

The combination of these two tools resulted in a 79% classification model accuracy.

This Keras model can be saved and used on other tweet data, like streaming data extracted through the tweepy API. It could be interesting to wrap this model around a web app with some D3.js visualization dashboard too.
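Here is a minimal sketch of that workflow, assuming the tokenize and buildWordVector helpers and the tweet_w2v model defined above are available; the file name and sample tweets are made up, and in a real deployment you would reuse the scaling statistics fitted on the training set instead of calling scale() on each new batch:

from keras.models import load_model
from sklearn.preprocessing import scale

# Persist the trained classifier (requires the h5py package) ...
model.save('tweet_sentiment.h5')

# ... and reload it later, e.g. inside a web app or a tweepy stream handler.
clf = load_model('tweet_sentiment.h5')

# Score a couple of new (made-up) tweets.
new_tweets = ["I love this!", "worst day ever"]
new_vecs = scale(np.concatenate([buildWordVector(tokenize(t), 200) for t in new_tweets]))
print clf.predict(new_vecs)   # values close to 1 mean positive sentiment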

Regarding the improvement of this classifier, we can investigate the doc2vec model that extracts vectors out of sentences and paragraphs. I first tried this model but got a lower accuracy score of 69%. So please tell me if you manage to do better.
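For reference, here is a minimal sketch of that doc2vec experiment with gensim, reusing the labelized tweets built earlier; the hyperparameters simply mirror the word2vec settings, and the exact train() signature depends on your gensim version:

from gensim.models import Doc2Vec

# Train a doc2vec model directly on the LabeledSentence objects.
tweet_d2v = Doc2Vec(size=n_dim, min_count=10)
tweet_d2v.build_vocab(x_train)
tweet_d2v.train(x_train)

# Tweet vectors come from the document tags for the training set ...
train_vecs_d2v = np.array([tweet_d2v.docvecs[z.tags[0]] for z in x_train])
# ... and from infer_vector() for unseen tweets.
test_vecs_d2v = np.array([tweet_d2v.infer_vector(z.words) for z in x_test])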

I hope this tutorial was a good introductory start to word embedding. Since I'm still learning my way through this awesome topic, I'm open to suggestions and recommendations.
