A Review of Python Dictionary Basics

Keywords: python, dict, data structure, Python dictionary, Python collections, defaultdict, Counter

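In Python, the dictionary is a basic data structure that associates keys with values as <Key - Value> pairs, so that a value can be looked up quickly by its key.
This article covers the following: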

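Basic usage of Python dictionaries
- Creating a dictionary
- Assigning to a dictionary
- Looking up values in a dictionary
- Using a dictionary as a simple data structure
Two tools from the collections module
- defaultdict
- Counter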

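Basic usage of Python dictionaries

This section covers the basic usage of Python dictionaries in three parts: creation, assignment, and lookup. It closes by using a dictionary to build a simple composite data structure.

Creating a Python dictionary

Creating a dictionary in Python is simple: { } creates an empty dictionary, : joins a key to its value, and , separates multiple key-value pairs.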

# Create dictionaries
empty_dict = {}
member = {"Lilei": 16, "Hanmeimei": 17}
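
Besides the literal syntax, the same dictionary can be built in a few other ways. The sketch below is only illustrative; the helper lists `names` and `ages` are made up for this example and do not appear in the original code.

```python
# Alternative ways to build the same dictionary (illustrative names).
names = ["Lilei", "Hanmeimei"]
ages = [16, 17]

# 1. dict() with keyword arguments (keys must be valid identifiers)
member_a = dict(Lilei=16, Hanmeimei=17)

# 2. dict() over an iterable of (key, value) pairs, here produced by zip
member_b = dict(zip(names, ages))

# 3. A dict comprehension
member_c = {name: age for name, age in zip(names, ages)}

print(member_a == member_b == member_c)  # True
```

The literal form is usually the most readable when the keys are known up front; `zip` and comprehensions are more useful when keys and values come from existing sequences.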

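Querying a Python dictionary

There are two main ways to query a dictionary in Python: indexing directly by key with [ ], or calling .get() to retrieve the value stored under a key.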

# Query
# Issue 1: look up directly by key
print("Issue 1 : ", member["Lilei"])
# Issue 2: look up with get
print("Issue 2 : ", member.get("Lilei"))
# Issue 3: if the key passed to get does not exist, None is returned
print("Issue 3 : ", member.get("Mike"))
# Issue 4: get accepts a default value, which is returned when the key does not exist
print("Issue 4 : ", member.get("Mike", 18))

>>>>> Program output >>>>>
Issue 1 :  16
Issue 2 :  16
Issue 3 :  None
Issue 4 :  18
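
One detail worth noting about the query above: `get` only reads, so the default it returns is not stored in the dictionary. If the default should also be stored, the built-in `setdefault` method does that. A minimal sketch, reusing the `member` example:

```python
member = {"Lilei": 16, "Hanmeimei": 17}

# get() only reads: the missing key is NOT added to the dictionary
age = member.get("Mike", 18)
print(age, "Mike" in member)   # 18 False

# setdefault() reads and, if the key is missing, stores the default
age = member.setdefault("Mike", 18)
print(age, "Mike" in member)   # 18 True
```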

Assigning to a Python dictionary

Assignment works much like querying: use [ ] to assign a value to a given key. If the key does not exist yet, Python creates a new key-value pair.

# Assignment
# Issue 1: assign directly with square brackets
member["Lilei"] = 18
print("Issue 1 : ", member["Lilei"])
# Issue 2: assigning to a key that does not exist creates a new entry
member["Tony"] = 20
print("Issue 2 : ", member["Tony"])

>>>>> Program output >>>>>
Issue 1 :  18
Issue 2 :  20
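
Assignment with [ ] handles one key at a time. As a complement, the sketch below shows the built-in `update`, `pop`, and `del` for changing several keys at once and for removing keys. The extra names ("Mike", "Nobody") are made up for illustration.

```python
member = {"Lilei": 18, "Hanmeimei": 17, "Tony": 20}

# update() sets several keys at once (existing keys are overwritten)
member.update({"Tony": 21, "Mike": 19})

# pop() removes a key and returns its value; a default avoids KeyError
removed = member.pop("Lilei")
missing = member.pop("Nobody", None)

# del removes a key without returning anything
del member["Mike"]

print(member)            # {'Hanmeimei': 17, 'Tony': 21}
print(removed, missing)  # 18 None
```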

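A closer look at lookup

In practice we may try to read a key that does not exist, in which case Python raises a KeyError. We can handle it by catching the exception with try - except, or we can check beforehand whether the key exists using in.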

# Lookup
# Issue 1: a key that is not in the dictionary raises KeyError
try:
    mike_member = member["Mike"]
except KeyError:
    print("Issue 1 : Cannot find member named: Mike")
# Issue 2: use in to check whether a key exists
print("Issue 2 : Mike in member: ", "Mike" in member)
print("Issue 2 : Lilei in member: ", "Lilei" in member)

>>>>> Program output >>>>>
Issue 1 : Cannot find member named: Mike
Issue 2 : Mike in member:  False
Issue 2 : Lilei in member:  True

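Using a dictionary as a simple composite data structure

An entity is usually represented with a class and its objects, but in some cases a dictionary is a more convenient representation. The code below uses a dictionary to implement a message entity for a simple SNS application, containing the user name, the message text, and the user's hashtags.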

weixin = {
    "user": "Tony",
    "text": "Python is the best language",
    "hashtags": ["#python", "#java", "#data"]
}
# Get the keys
print("Key: ", weixin.keys())
# Get the values
print("Value: ", weixin.values())
# Get the (key, value) tuples
print("K-V tuple: ", weixin.items())

>>>>> Program output >>>>>
Key:  dict_keys(['user', 'text', 'hashtags'])
Value:  dict_values(['Tony', 'Python is the best language', ['#python', '#java', '#data']])
K-V tuple:  dict_items([('user', 'Tony'), ('text', 'Python is the best language'), ('hashtags', ['#python', '#java', '#data'])])
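
The objects returned by keys(), values(), and items() are views rather than lists. The sketch below, which reuses the `weixin` example, shows the usual way to iterate over them and that a view reflects later changes to the dictionary. The "likes" field is a hypothetical addition for illustration only.

```python
weixin = {
    "user": "Tony",
    "text": "Python is the best language",
    "hashtags": ["#python", "#java", "#data"],
}

# Iterate over key-value pairs directly
for key, value in weixin.items():
    print(key, "->", value)

# Views are live: they reflect later changes to the dictionary
keys_view = weixin.keys()
weixin["likes"] = 3            # hypothetical extra field
print(list(keys_view))         # ['user', 'text', 'hashtags', 'likes']

# Nested values are reached by chaining lookups
print(weixin["hashtags"][0])   # #python
```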

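Two tools from the collections module

In practice, many tasks come down to counting, for example counting how many times each word occurs in a piece of text. This can be done with the built-in dictionary alone, but the defaultdict and Counter tools from the collections module are usually more convenient.
The following passage is taken from Wikipedia's introduction to Python: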

Python is a widely used high-level programming language for general-purpose programming, created by Guido van Rossum and first released in 1991. An interpreted language, Python has a design philosophy that emphasizes code readability notably using whitespace indentation to delimit code blocks rather than curly brackets or keywords, and a syntax that allows programmers to express concepts in fewer lines of code than might be used in languages such as C++ or Java.The language provides constructs intended to enable writing clear programs on both a small and large scale.

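Our goal is to count how many times each distinct word occurs in this passage. First the text needs some preprocessing: strip the punctuation, split on spaces, and store the words in a list.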

# Word counting with dictionaries
raw_document = "Python is a widely used high-level programming language for general-purpose programming, created by Guido van Rossum and first released in 1991. An interpreted language, Python has a design philosophy that emphasizes code readability notably using whitespace indentation to delimit code blocks rather than curly brackets or keywords, and a syntax that allows programmers to express concepts in fewer lines of code than might be used in languages such as C++ or Java.The language provides constructs intended to enable writing clear programs on both a small and large scale."
# Remove commas and periods, then split on spaces into a list of words
non_punctuation_document = raw_document.replace(",", "").replace(".", "")
document = non_punctuation_document.split(" ")
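
The preprocessing above deletes periods outright, so a sentence boundary with no following space ("Java.The") is glued into a single token. One possible alternative, shown here only as a sketch and not as the method used in the rest of this article, is to replace punctuation with a space and split on any whitespace. The short `text` sample is an excerpt chosen for illustration; applying this to the full passage would change the counts shown later (for instance, 'JavaThe' would become 'Java' and 'The').

```python
import re

text = "languages such as C++ or Java.The language provides constructs."

# Naive approach from above: deleting "." glues "Java" and "The" together
naive = text.replace(",", "").replace(".", "").split(" ")

# Alternative: replace "," and "." with a space, then split on any whitespace
tokens = re.sub(r"[,.]", " ", text).split()

print("JavaThe" in naive)   # True
print(tokens)
```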

Next, we count the words using only the built-in dictionary methods.

# Issue 1: count with the built-in dictionary
word_counts = {}
for word in document:
    previous_count = word_counts.get(word, 0)
    word_counts[word] = previous_count + 1
print("Issue 1, count the words in document: ", word_counts)

>>>>> Program output >>>>>
Issue 1, count the words in document:  {'Python': 2, 'is': 1, 'a': 4, 'widely': 1, 'used': 2, 'high-level': 1, 'programming': 2, 'language': 3, 'for': 1, 'general-purpose': 1, 'created': 1, 'by': 1, 'Guido': 1, 'van': 1, 'Rossum': 1, 'and': 3, 'first': 1, 'released': 1, 'in': 3, '1991': 1, 'An': 1, 'interpreted': 1, 'has': 1, 'design': 1, 'philosophy': 1, 'that': 2, 'emphasizes': 1, 'code': 3, 'readability': 1, 'notably': 1, 'using': 1, 'whitespace': 1, 'indentation': 1, 'to': 3, 'delimit': 1, 'blocks': 1, 'rather': 1, 'than': 2, 'curly': 1, 'brackets': 1, 'or': 2, 'keywords': 1, 'syntax': 1, 'allows': 1, 'programmers': 1, 'express': 1, 'concepts': 1, 'fewer': 1, 'lines': 1, 'of': 1, 'might': 1, 'be': 1, 'languages': 1, 'such': 1, 'as': 1, 'C++': 1, 'JavaThe': 1, 'provides': 1, 'constructs': 1, 'intended': 1, 'enable': 1, 'writing': 1, 'clear': 1, 'programs': 1, 'on': 1, 'both': 1, 'small': 1, 'large': 1, 'scale': 1}

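Counting word occurrences with defaultdict from collections

A defaultdict behaves like a standard dictionary, except that when you look up a key it does not contain, it calls the zero-argument function you supplied and stores the result as the value of the new key. With int as that function, each new key starts at 0, so it can be incremented by 1 right away. defaultdict is used as follows:

# Issue 2: count words with defaultdict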
from collections import defaultdict
word_counts = defaultdict(int)
for word in document:
    word_counts[word] += 1
print("Issue 2, count the words in document by defaultdict: ", word_counts)

>>>>> Program output >>>>>
Issue 2, count the words in document by defaultdict:  defaultdict(<class 'int'>, {'Python': 2, 'is': 1, 'a': 4, 'widely': 1, 'used': 2, 'high-level': 1, 'programming': 2, 'language': 3, 'for': 1, 'general-purpose': 1, 'created': 1, 'by': 1, 'Guido': 1, 'van': 1, 'Rossum': 1, 'and': 3, 'first': 1, 'released': 1, 'in': 3, '1991': 1, 'An': 1, 'interpreted': 1, 'has': 1, 'design': 1, 'philosophy': 1, 'that': 2, 'emphasizes': 1, 'code': 3, 'readability': 1, 'notably': 1, 'using': 1, 'whitespace': 1, 'indentation': 1, 'to': 3, 'delimit': 1, 'blocks': 1, 'rather': 1, 'than': 2, 'curly': 1, 'brackets': 1, 'or': 2, 'keywords': 1, 'syntax': 1, 'allows': 1, 'programmers': 1, 'express': 1, 'concepts': 1, 'fewer': 1, 'lines': 1, 'of': 1, 'might': 1, 'be': 1, 'languages': 1, 'such': 1, 'as': 1, 'C++': 1, 'JavaThe': 1, 'provides': 1, 'constructs': 1, 'intended': 1, 'enable': 1, 'writing': 1, 'clear': 1, 'programs': 1, 'on': 1, 'both': 1, 'small': 1, 'large': 1, 'scale': 1})

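As we can see, the defaultdict version needs less code than the plain dictionary and produces the same counts. The token 'JavaThe' appears in both results because the source text has no space after "Java.", so removing the period joins the two words.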
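
The factory passed to defaultdict does not have to be int. As a small sketch under assumed inputs (the `words` list is made up for illustration), using list as the factory groups items under a key, and the last lines show a caveat: merely reading a missing key also creates it.

```python
from collections import defaultdict

words = ["code", "blocks", "curly", "brackets", "concepts", "clear"]

# list() returns an empty list, so every new key starts as []
by_first_letter = defaultdict(list)
for word in words:
    by_first_letter[word[0]].append(word)

print(dict(by_first_letter))
# {'c': ['code', 'curly', 'concepts', 'clear'], 'b': ['blocks', 'brackets']}

# Caveat: merely reading a missing key also creates it
size_before = len(by_first_letter)   # 2
by_first_letter["z"]
size_after = len(by_first_letter)    # 3
print(size_before, size_after)
```

If silent key creation is not wanted when reading, `"z" in by_first_letter` or `by_first_letter.get("z")` can be used instead, since neither triggers the factory.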

Counting words with Counter from collections

Beyond raw counts, in practice we often want a filtered view of the result. Here Counter lists the ten most frequent words together with their counts:

# Issue 3: count words with Counter
from collections import Counter
word_counts = Counter(document)
for word, count in word_counts.most_common(10):
    print("Issue 3, most common word in documents: ", word, count)

>>>>> Program output >>>>>
Issue 3, most common word in documents:  a 4
Issue 3, most common word in documents:  language 3
Issue 3, most common word in documents:  and 3
Issue 3, most common word in documents:  in 3
Issue 3, most common word in documents:  code 3
Issue 3, most common word in documents:  to 3
Issue 3, most common word in documents:  Python 2
Issue 3, most common word in documents:  used 2
Issue 3, most common word in documents:  programming 2
Issue 3, most common word in documents:  that 2
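
Counter offers a few more conveniences beyond most_common. The sketch below uses two short made-up phrases rather than the full passage, and shows that missing keys count as 0, that counters can be added together, and that normalising case before counting merges variants such as "Python" and "python".

```python
from collections import Counter

first = Counter("a language and a syntax".split())
second = Counter("a small and large scale".split())

# Missing keys count as 0 instead of raising KeyError
print(first["python"])           # 0

# Counters can be added; counts for shared keys are summed
total = first + second
print(total["a"], total["and"])  # 3 2

# update() adds counts in place from another iterable
first.update(["syntax", "syntax"])
print(first["syntax"])           # 3

# Case-insensitive counting: normalise before counting
lowered = Counter(w.lower() for w in ["Python", "python", "PYTHON"])
print(lowered)                   # Counter({'python': 3})
```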

Summary

This article reviewed the basic usage of Python dictionaries and then, through a simple example, used defaultdict and Counter from collections to see how dictionaries can be used for counting.

References

[1] Joel Grus. 数据科学入门 (Chapter 2: Python crash course). ISBN 978-7-115-41741-1. 人民邮电出版社.

Last edited: 2017.12.10 17:18:35
