1. Introducing Thanksgiving Dinner Data
Instructions
Import the pandas package.
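Use the pandas.read_csv() function to read the thanksgiving.csv file. Make sure to specify the keyword argument encoding="Latin-1", since the CSV file is not in the usual UTF-8 encoding.
Assign the result to the variable data.
Display the first few rows of data to see what the rows and columns look like.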
In a separate notebook cell, display all of the column names to get a sense of what the data consists of.
- You can use the pandas.DataFrame.columns attribute to display the column names.
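import pandas as pd
# thanksgiving.csv is not UTF-8 encoded, so the encoding is named explicitly
data = pd.read_csv("thanksgiving.csv", encoding="Latin-1")
data.head()  # display the first five rows
# In a separate cell: columns is an attribute (an Index), not a method, so no parentheses
data.columns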
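Why the encoding argument matters can be seen on a tiny in-memory file. This is a minimal sketch with made-up data (the `raw` bytes and the "Café visits" column are hypothetical, not taken from thanksgiving.csv): text written as Latin-1 is not valid UTF-8, which is what read_csv assumes by default, so it has to be read with `encoding="Latin-1"`.

```python
import io

import pandas as pd

# Made-up CSV whose header contains "é"; Latin-1 stores it as the single byte 0xE9.
raw = "RespondentID,Café visits\n1,3\n2,5\n".encode("latin-1")

# The same bytes are not valid UTF-8.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print("valid UTF-8:", utf8_ok)

# Naming the encoding lets pandas decode each byte as one Latin-1 character.
df = pd.read_csv(io.BytesIO(raw), encoding="Latin-1")
print(df.head())

# columns is an attribute, so it is accessed without parentheses.
print(df.columns.tolist())
```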
3. Using value_counts To Explore Main Dishes
input
print(data['What is typically the main dish at your Thanksgiving dinner?'].value_counts())
output
Turkey 859
Other (please specify) 35
Ham/Pork 29
Tofurkey 20
Chicken 12
Roast beef 11
I don't know 5
Turducken 3
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64
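value_counts() sorts by frequency and skips missing answers by default. A short sketch on made-up answers (not the survey data) showing two standard options: `normalize=True` returns shares instead of counts, and `dropna=False` keeps missing answers in the tally.

```python
import pandas as pd

# Made-up answers standing in for the main-dish column.
dishes = pd.Series(["Turkey", "Turkey", "Ham/Pork", "Turkey", None, "Tofurkey"])

# Default: counts sorted from most to least frequent, missing values dropped.
counts = dishes.value_counts()
print(counts)

# Shares of the non-missing answers instead of raw counts.
shares = dishes.value_counts(normalize=True)
print(shares)

# Keep a separate entry for missing answers.
with_missing = dishes.value_counts(dropna=False)
print(with_missing)
```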
4. Figuring Out What Pies People Eat
input
# Each pie column is NaN when the respondent did not select that pie
apple_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple'])
pumpkin_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin'])
pecan_isnull = pd.isnull(data['Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan'])
# Despite the name, ate_pies is True when NONE of the three pies was selected
ate_pies = apple_isnull & pumpkin_isnull & pecan_isnull
print(ate_pies.value_counts())
output
False 876
True 182
dtype: int64
# This means 182 respondents did not select any of the three pies (apple, pumpkin, pecan).
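Chaining `&` over three masks works, but the same check can be written over a list of columns with `DataFrame.isnull().all(axis=1)`, which scales to any number of pie columns. A minimal sketch with three made-up rows, assuming (as the code above does) that an unselected pie shows up as a missing value; the name `no_listed_pie` is hypothetical and chosen to say what `True` means.

```python
import pandas as pd

prefix = "Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - "
pie_cols = [prefix + pie for pie in ["Apple", "Pumpkin", "Pecan"]]

# Made-up rows: a selected pie holds its name, an unselected pie is missing.
data = pd.DataFrame(
    {
        pie_cols[0]: ["Apple", None, None],
        pie_cols[1]: ["Pumpkin", None, "Pumpkin"],
        pie_cols[2]: [None, None, None],
    }
)

# True where all three pie columns are missing, i.e. none of the three was selected.
no_listed_pie = data[pie_cols].isnull().all(axis=1)
print(no_listed_pie.value_counts())
```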
5. Converting Age To Numeric
input
print(data['Age'].value_counts())
output
45 - 59 286
60+ 264
30 - 44 259
18 - 29 216
Name: Age, dtype: int64
input
def str_to_int(age_str):
    if pd.isnull(age_str):  # Use the isnull() function to check if the value is null. If it is, return None.
        return None
    age_str = age_str.split(' ')[0]  # Split the string on the space character (' '), and extract the first item of the resulting list.
    age_str = age_str.replace('+', '')  # Replace the + character in the result with an empty string to remove it.
    return int(age_str)  # Use int() to convert the result to an integer.

data['int_age'] = data['Age'].apply(str_to_int)  # Use the pandas.Series.apply() method to apply the function to each value in the Age column of data.
data['int_age'].describe()  # Call the pandas.Series.describe() method on the int_age column of data, and display the result.
output
count 1025.000000
mean 39.383415
std 15.398493
min 18.000000
25% 30.000000
50% 45.000000
75% 60.000000
max 60.000000
Name: int_age, dtype: float64
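`str_to_int` keeps only the first number of each bracket, so every respondent is assigned the lower bound of their age range; that is why the maximum above is 60 and the median is 45. The same conversion can be sketched without a per-row Python function by using `Series.str.extract` and `pd.to_numeric`; this is an alternative technique, not the one the instructions ask for, and the sample Series is made up.

```python
import pandas as pd

# Made-up bracket labels in the same format as the Age column.
ages = pd.Series(["18 - 29", "30 - 44", "45 - 59", "60+", None])

# Capture the leading run of digits (the bracket's lower bound); missing values stay NaN.
lower_bounds = ages.str.extract(r"^(\d+)", expand=False)

# Convert the captured strings to numbers (float64, because of the NaN).
int_age = pd.to_numeric(lower_bounds)
print(int_age.tolist())
```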
6. Converting Income To Numeric
input
print(data['How much total combined money did all members of your HOUSEHOLD earn last year?'].value_counts())
output
$25,000 to $49,999 180
Prefer not to answer 136
$50,000 to $74,999 135
$75,000 to $99,999 133
$100,000 to $124,999 111
$200,000 and up 80
$10,000 to $24,999 68
$0 to $9,999 66
$125,000 to $149,999 49
$150,000 to $174,999 40
$175,000 to $199,999 27
Name: How much total combined money did all members of your HOUSEHOLD earn last year?, dtype: int64
input
def income_to_int(income_str):
    if pd.isnull(income_str):  # Use the isnull() function to check if the value is null. If it is, return None.
        return None
    income_str = income_str.split(' ')[0]  # Split the string on the space character (' '), and extract the first item of the resulting list.
    if income_str == 'Prefer':  # "Prefer not to answer" carries no numeric value
        return None
    income_str = income_str.replace('$', '')  # "$25,000" becomes "25,000"
    income_str = income_str.replace(',', '')  # "25,000" becomes "25000"
    return int(income_str)

data['int_income'] = data['How much total combined money did all members of your HOUSEHOLD earn last year?'].apply(income_to_int)
print(data['int_income'].describe())
output
count 889.000000
mean 74077.615298
std 59360.742902
min 0.000000
25% 25000.000000
50% 50000.000000
75% 100000.000000
max 200000.000000
Name: int_income, dtype: float64
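As with age, `income_to_int` maps each bracket to its lower bound ("$200,000 and up" becomes 200000), so the describe() output summarizes lower bounds rather than actual incomes. A vectorized sketch of the same conversion follows; the regular expression is an assumption tuned to the bracket labels in the value_counts() output above, and the sample Series is made up.

```python
import pandas as pd

# Made-up labels in the same format as the income column.
incomes = pd.Series(
    ["$25,000 to $49,999", "Prefer not to answer", "$200,000 and up", "$0 to $9,999", None]
)

# Capture the first dollar amount; "Prefer not to answer" has no match and becomes NaN.
lower = incomes.str.extract(r"^\$([\d,]+)", expand=False)

# Drop the thousands separators, then convert to numbers (NaN stays NaN).
int_income = pd.to_numeric(lower.str.replace(",", "", regex=False))
print(int_income.tolist())
```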
7. Correlating Travel Distance And Income
input
print(data[data['int_income'] < 150000]['How far will you travel for Thanksgiving?'].value_counts())
print('--------------------------------------------------')
print(data[data['int_income'] > 150000]['How far will you travel for Thanksgiving?'].value_counts())
output
Thanksgiving is happening at my home--I won't travel at all 281
Thanksgiving is local--it will take place in the town I live in 203
Thanksgiving is out of town but not too far--it's a drive of a few hours or less 150
Thanksgiving is out of town and far away--I have to drive several hours or fly 55
Name: How far will you travel for Thanksgiving?, dtype: int64
--------------------------------------------------
Thanksgiving is happening at my home--I won't travel at all 49
Thanksgiving is local--it will take place in the town I live in 25
Thanksgiving is out of town but not too far--it's a drive of a few hours or less 16
Thanksgiving is out of town and far away--I have to drive several hours or fly 12
Name: How far will you travel for Thanksgiving?, dtype: int64
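Two details make the raw counts above hard to compare. The groups differ in size (the four answers sum to 689 below 150000 and 102 above it). Also, `income_to_int` maps the "$150,000 to $174,999" bracket to exactly 150000, so the `< 150000` and `> 150000` filters both leave those respondents out; rows with a missing `int_income` fall out of both groups too, since comparisons with NaN are False. A minimal sketch on made-up rows that uses `>=` for the upper group and compares within-group shares with `value_counts(normalize=True)`; the short column name `travel` is a hypothetical stand-in for the survey question.

```python
import pandas as pd

# Made-up rows; "travel" stands in for 'How far will you travel for Thanksgiving?'.
data = pd.DataFrame(
    {
        "int_income": [25000, 50000, 150000, 175000, 200000, 0],
        "travel": ["home", "local", "home", "far", "home", "local"],
    }
)

# Split with < and >= so that int_income == 150000 lands in the upper group.
under = data.loc[data["int_income"] < 150000, "travel"]
over = data.loc[data["int_income"] >= 150000, "travel"]

# normalize=True gives within-group shares, comparable across groups of different
# sizes; fillna(0) covers answers that appear in only one group.
comparison = pd.DataFrame(
    {
        "under_150k": under.value_counts(normalize=True),
        "150k_and_up": over.value_counts(normalize=True),
    }
).fillna(0)
print(comparison)
```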
8. Linking Friendship And Age
input
data.pivot_table(
index = "Have you ever tried to meet up with hometown friends on Thanksgiving night?",
columns = 'Have you ever attended a "Friendsgiving?"',
values = 'int_age'
)
input
data.pivot_table(
index = 'Have you ever tried to meet up with hometown friends on Thanksgiving night?',
columns = 'Have you ever attended a "Friendsgiving?"',
values = 'int_income'
)
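`pivot_table` averages `values` for every combination of the `index` and `columns` answers, because its default `aggfunc` is the mean. A small sketch on made-up rows (the column names `meet_up` and `friendsgiving` are hypothetical stand-ins for the two survey questions) showing the equivalent `groupby(...).mean().unstack()` and how `aggfunc="count"` reveals how many respondents sit behind each mean.

```python
import pandas as pd

# Made-up answers; meet_up and friendsgiving stand in for the two long question columns.
data = pd.DataFrame(
    {
        "meet_up": ["Yes", "Yes", "No", "No", "Yes"],
        "friendsgiving": ["Yes", "No", "No", "No", "Yes"],
        "int_age": [18.0, 30.0, 45.0, 60.0, 30.0],
    }
)

# Mean int_age per (meet_up, friendsgiving) pair; combinations with no rows become NaN.
table = data.pivot_table(index="meet_up", columns="friendsgiving", values="int_age")

# The same means via groupby, with the second key moved into the columns.
same = data.groupby(["meet_up", "friendsgiving"])["int_age"].mean().unstack()

# Group sizes behind each mean.
sizes = data.pivot_table(
    index="meet_up", columns="friendsgiving", values="int_age", aggfunc="count"
)
print(table)
print(sizes)
```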