Natural Language Processing with Python
Python 自然语言处理
1.8 Exercises
5. Compare the lexical diversity scores for humor and romance fiction in Table 1-1. Which genre is more lexically diverse?
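- [x] romance fiction: 8.3
- [√] humor: 4.3
In the print edition's Table 1-1 the score is the number of tokens divided by the number of word types, i.e. the average number of times each word is used. A lower score therefore means more varied vocabulary, so humor is the more lexically diverse genre.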
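To make the direction of the score concrete, here is a minimal sketch, assuming the score is tokens divided by distinct word types. The helper name tokens_per_type and the two toy token lists are made up for illustration and are not taken from Table 1-1.

```python
def tokens_per_type(tokens):
    """Average number of times each distinct token is used."""
    return len(tokens) / len(set(tokens))

# Toy data (invented): both lists have 6 tokens
repetitive = ["la", "la", "la", "la", "de", "da"]            # 3 distinct types
varied = ["the", "quick", "brown", "fox", "jumps", "high"]   # 6 distinct types

print(tokens_per_type(repetitive))  # 2.0
print(tokens_per_type(varied))      # 1.0
```

The list that repeats itself gets the higher score, which is why the genre with the lower number is the more diverse one under this definition.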
6. Produce a dispersion plot of the four main protagonists in Sense and Sensibility: Elinor, Marianne, Edward, and Willoughby. What can you observe about the different roles played by the males and females in this novel? Can you identify the couples?
text2.dispersion_plot(["Elinor","Marianne","Edward","Willoughby"])
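What the plot draws can be sketched without NLTK: each stripe marks the token offset at which a name occurs. This is a minimal sketch, assuming a toy token list invented for illustration (it is not text from the novel). word_offsets is a hypothetical helper, not an NLTK function.

```python
def word_offsets(tokens, targets):
    """Map each target word to the token positions where it occurs."""
    offsets = {t: [] for t in targets}
    for i, tok in enumerate(tokens):
        if tok in offsets:
            offsets[tok].append(i)
    return offsets

# Toy tokens (invented), standing in for text2
toy_tokens = ["Elinor", "met", "Edward", ";", "Marianne", "met",
              "Willoughby", ";", "Elinor", "wrote"]

positions = word_offsets(toy_tokens, ["Elinor", "Marianne", "Edward", "Willoughby"])
print(positions["Elinor"])      # [0, 8]
print(positions["Willoughby"])  # [6]
```

Names whose offsets cluster in the same stretches of the text tend to appear in the same scenes, which is the kind of pattern the question asks you to look for in the real plot.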
7. Find the collocations in text5.
text5.collocations()
wanna chat; PART JOIN; MODE #14-19teens; JOIN PART; PART PART;
cute.-ass MP3; MP3 player; JOIN JOIN; times .. .; ACTION watches; guys
wanna; song lasts; last night; ACTION sits; -...)...- S.M.R.; Lime
Player; Player 12%; dont know; lez gurls; long time
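The idea behind a collocation list, word pairs that occur together unusually often, can be approximated by counting adjacent pairs. This is a deliberately simplified sketch: NLTK ranks candidate pairs with its own scoring, which is not reproduced here. frequent_bigrams and the toy tokens are hypothetical and exist only for illustration.

```python
from collections import Counter

def frequent_bigrams(tokens, min_count=2):
    """Return adjacent word pairs seen at least min_count times, with counts."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return [(pair, n) for pair, n in pairs.most_common() if n >= min_count]

# Toy chat-like tokens (invented)
toy = ["wanna", "chat", "?", "wanna", "chat", "now",
       "last", "night", "was", "last", "night"]

result = frequent_bigrams(toy)
print(dict(result))  # {('wanna', 'chat'): 2, ('last', 'night'): 2}
```

Raw counts favour pairs of very common words, which is one reason a real collocation finder uses a statistical score instead of frequency alone.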
8. Consider the following Python expression: len(set(text4)). State the purpose of this expression. Describe the two steps involved in performing this computation.
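It gives the number of word types (distinct tokens) in text4.
Step 1, set(text4):
builds the vocabulary of text4, i.e. the set of its word types.
Step 2, len():
computes the size of that vocabulary (the number of word types).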
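The two steps can be tried on a small list without loading NLTK. This is a minimal sketch: the token list is a toy example standing in for text4, and the variable names are illustrative.

```python
# Toy token list (invented), standing in for text4
tokens = ["the", "cat", "saw", "the", "other", "cat", "."]

vocabulary = set(tokens)      # step 1: collapse duplicates into a set of types
vocab_size = len(vocabulary)  # step 2: count the distinct types

print(len(tokens))  # 7
print(vocab_size)   # 5
```

Note that punctuation such as "." counts as a type too, just as it does in the NLTK texts.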
9.
25. ◑ Define sent to be the list of words ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']. Now write code to perform the following tasks:
a. Print all words beginning with sh.
[w for w in sent if w.startswith('sh')]
b. Print all words longer than four characters.
- 1st Solution
[w for w in sent if len(w) > 4]
- 2nd Solution
for w in sent:
    if len(w) > 4:  # strictly longer than four characters
        print(w, end=' ')
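Both parts can be checked in one self-contained run. This is a sketch in Python 3, and the variable names are illustrative. It also compares the strict test len(w) > 4 with the looser len(w) >= 4, which would admit four-letter words if the list contained any.

```python
sent = ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']

# a. words beginning with "sh"
starts_with_sh = [w for w in sent if w.startswith('sh')]

# b. words strictly longer than four characters
longer_than_four = [w for w in sent if len(w) > 4]

# The looser test also accepts words of exactly four characters
at_least_four = [w for w in sent if len(w) >= 4]

print(starts_with_sh)    # ['she', 'shells', 'shore']
print(longer_than_four)  # ['sells', 'shells', 'shore']
print(longer_than_four == at_least_four)  # True: sent has no four-letter word
```

The two comparisons agree here only because of the data. On a list containing a word like 'this', only the strict version matches the wording of the exercise.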
26. ◑ What does the following Python code do? sum([len(w) for w in text1]) Can you use it to work out the average word length of a text?
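It adds up the lengths of all tokens in text1: 999044 characters in total (punctuation tokens plus the length of every word).
Yes. Dividing that total by the number of tokens, len(text1), gives the average token length:
>>> round(sum(len(w) for w in text1) / len(text1), 2)
3.83
Under Python 2 the unrounded expression prints 3, because / on two integers truncates. Use float(len(text1)) or from __future__ import division there.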
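The same calculation can be wrapped in a small function and tried on a short list. This is a minimal sketch in Python 3: average_token_length is a hypothetical helper name, and the four tokens are just the opening words of Moby Dick, not the full text1.

```python
def average_token_length(tokens):
    """Total characters across all tokens divided by the number of tokens."""
    return sum(len(w) for w in tokens) / len(tokens)

opening = ["Call", "me", "Ishmael", "."]  # lengths 4, 2, 7, 1

print(average_token_length(opening))                 # 3.5
print(sum(len(w) for w in opening) // len(opening))  # 3 (floor division drops the fraction)
```

Because punctuation tokens are included, the figure is an average over tokens rather than over words in the everyday sense. Filtering with w.isalpha() first would give a words-only average.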