As of this writing, the post about the Li Xiaolu (李小璐) "doing hair" incident has drawn 780,000 reposts, 1,000,000 comments, and 66,533 likes.
Let's explore how netizens feel about it from a data-analysis angle.
Materials and tools
- 73,083 comments scraped from Li Xiaolu's Weibo post
- Python 3.6
- WordCloud, for generating the word cloud
Steps
1. Scrape the comment data
2. Clean and process the text
3. Generate the word cloud
1. Scraping the comment data
First we need to obtain the comments themselves. The code is as follows:
```python
import json

import urllib3
from pyquery import PyQuery

# Request headers captured from a logged-in browser session. The Cookie
# value is session-specific and will expire; replace it with your own.
headers = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,zh-TW;q=0.7',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Cookie': 'SINAGLOBAL=3218402220875.7734.1513508172096; YF-Page-G0=19f6802eb103b391998cb31325aed3bc; _s_tentry=passport.weibo.com; Apache=8374388974272.269.1516624844916; ULV=1516624844942:9:5:1:8374388974272.269.1516624844916:1516364804513; YF-V5-G0=9717632f62066ddd544bf04f733ad50a; login_sid_t=f1251206b11e40e767e3d75ad41ed0da; cross_origin_proto=SSL; YF-Ugrow-G0=ea90f703b7694b74b62d38420b5273df; UOR=,,www.baidu.com; WBtopGlobal_register_version=49306022eb5a5f0b; SCF=Al4NxlKT01wukinDewkd_1IJg1ka4Y5rTQudGjOM-wkngo65UAZrDbGeQsychIVOFn90bBDSbfUlW0yNgnbm1-0.; SUB=_2A253YawqDeThGeVM61UV8S_OyjuIHXVUFprirDV8PUNbmtBeLWzgkW9NTT7Ndhfp_PpH_6-dctyomiTWAScQaWJM; SUBP=0033WrSXqPxfM725Ws9jqgMF55529P9D9WW87Emdazw_fpCjAs.anFAM5JpX5K2hUgL.FoeEehMXeK2EeKM2dJLoIpf9UCH8SEHFeCHFeEH8SEHFeb-4ebH8SC-RSFHFxntt; SUHB=0sqRIU5kGNIOvf; ALF=1517229815; SSOLoginState=1516625018; un=769sy@sina.cn; wvr=6; wb_cmtLike_3207411217=1; wb_cusLike_3207411217=N',
    'Host': 'weibo.com',
    'Referer': 'https://weibo.com/1537790411/Frishwdoh?filter=hot&root_comment_id=0&type=comment',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest'
}

http = urllib3.PoolManager()  # one connection pool reused across all requests
output = open('comment', 'w', encoding='utf8')
for i in range(200):
    # Weibo's AJAX endpoint for the comment list; "page" pages through it.
    url = ("https://weibo.com/aj/v6/comment/big?ajwvr=6&id=4165058017677973"
           "&root_comment_max_id=4197304577625422&root_comment_max_id_type=0"
           "&root_comment_ext_param=&page=" + str(i + 1) + "&filter=all")
    print(url)
    res = http.request("GET", url, headers=headers)
    result = json.loads(res.data)
    # The JSON response embeds the rendered comment list as an HTML fragment.
    p = PyQuery(result['data']['html'])
    for item in p('.WB_text').items():
        # Each node reads "username:comment"; keep everything after the
        # first colon, and skip nodes that do not match that pattern.
        parts = item.text().split(":", 1)
        if len(parts) < 2:
            continue
        text = parts[1] + "\n"
        output.write(text)
        print(text)
output.close()
```
Scraping the 73,083 comments took a little over an hour.
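Two hundred back-to-back requests worked here, but Weibo can throttle bursts and return non-JSON error pages. If that happens, wrapping the request in a small retry-with-backoff helper is a common fix. This is only a sketch: the `fetch_page` helper and its `retries` and `delay` values are illustrative, not part of the original run.

```python
import json
import time

import urllib3

http = urllib3.PoolManager()

def fetch_page(url, headers, retries=3, delay=1.0):
    """Fetch one comment page, backing off between attempts.

    `retries` and `delay` are illustrative defaults, not values
    from the original scrape."""
    for attempt in range(retries):
        res = http.request("GET", url, headers=headers)
        try:
            return json.loads(res.data)  # success: parsed JSON payload
        except ValueError:
            # A non-JSON body usually means throttling or an expired
            # cookie; wait a little longer before each retry.
            time.sleep(delay * (attempt + 1))
    return None  # caller should skip this page
```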
2. Tokenizing and cleaning the text, and gauging netizen sentiment
The code is as follows:
```python
from os import path

import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

def make_wordcloud(file_path):
    # Read the scraped comments and tokenize them with jieba
    # (accurate mode, not full mode).
    text_from_file_with_apath = open(file_path, 'r', encoding='UTF-8').read()
    wordlist_after_jieba = jieba.cut(text_from_file_with_apath, cut_all=False)
    # WordCloud expects space-separated tokens.
    wl_space_split = " ".join(wordlist_after_jieba)

    # Heart-shaped picture used as the mask for the cloud.
    background_image = plt.imread('心1.jpg')
    print('Mask image loaded.')

    # Filter out noise: laughter, the "reply" marker, and the subject's
    # own name, which would otherwise dominate the cloud.
    stopwords = STOPWORDS.copy()
    stopwords.add('哈哈')
    stopwords.add('回复')
    stopwords.add('李小璐')

    wc = WordCloud(
        width=1024,
        height=768,
        background_color='white',
        mask=background_image,
        font_path='simsun.ttf',  # a CJK font is required to render Chinese
        max_words=600,
        stopwords=stopwords,
        max_font_size=400,
        random_state=50,
    )
    wc.generate_from_text(wl_space_split)

    # Recolor the words to match the colors of the mask image.
    image_colors = ImageColorGenerator(background_image)
    wc.recolor(color_func=image_colors)

    plt.imshow(wc)
    plt.axis('off')  # hide the axes
    plt.show()

    # Save under a new name so the mask image is not overwritten.
    d = path.dirname(__file__)
    wc.to_file(path.join(d, 'wordcloud.jpg'))
    print('Word cloud generated.')

make_wordcloud('微博评论/李小璐')
```
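Before rendering, it can be worth confirming numerically which tokens dominate, since font size in a masked cloud is only a rough proxy for frequency. Here is a quick count over the same jieba tokens and stopword list; the `top_words` helper is a sketch, not part of the original script.

```python
from collections import Counter

import jieba
from wordcloud import STOPWORDS

def top_words(file_path, n=10):
    """Return the n most frequent tokens, skipping stopwords and
    single-character tokens (mostly punctuation and particles)."""
    stopwords = STOPWORDS.copy()
    stopwords.update({'哈哈', '回复', '李小璐'})
    text = open(file_path, 'r', encoding='UTF-8').read()
    words = [w for w in jieba.cut(text, cut_all=False)
             if len(w) > 1 and w not in stopwords]
    return Counter(words).most_common(n)

print(top_words('微博评论/李小璐'))
```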
Now let's look at the word-cloud distribution of these 73,083 comments.
As the result shows, the word cloud broadly reflects how netizens feel about this incident: the most frequent terms are 出轨 (cheating), 恶心 (disgusting), and 贾乃亮 (Jia Nailiang).
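The top words are plainly negative, but a word cloud never assigns an explicit polarity. To put a number on the overall mood, one option is to score each comment with a Chinese sentiment library such as SnowNLP. This is an optional extension under that assumption, not part of the original pipeline.

```python
from snownlp import SnowNLP

def average_sentiment(file_path):
    """Average SnowNLP sentiment over all comments: scores run from
    0 (negative) to 1 (positive). Results on short Weibo comments are
    noisy, so treat the mean as a directional signal only."""
    scores = []
    with open(file_path, 'r', encoding='UTF-8') as f:
        for line in f:
            line = line.strip()
            if line:
                scores.append(SnowNLP(line).sentiments)
    return sum(scores) / len(scores) if scores else None

print(average_sentiment('微博评论/李小璐'))
```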
That wraps up this rough analysis of the public-opinion event. It is a personal hobby project, and no mockery is intended.