Pandas

Introduction to Pandas

The main data structures in Pandas

·Series
·DataFrame
·Panel
·All of these structures are built on NumPy, so operations on them run fast

Dimensions and descriptions

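·The best way to think about these data structures is that each higher-dimensional structure is a container for the lower-dimensional one
·For example, a DataFrame is a container of Series, and a Panel is a container of DataFrames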

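Data structure	Dimensions	Description
Series	1	1D labeled homogeneous array, size-immutable
DataFrame	2	General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns
Panel	3	General 3D labeled, size-mutable array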

Mutability

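·All Pandas data structures are value-mutable (their values can be changed), and all of them except Series are size-mutable.
　　·DataFrame is widely used and is one of the most important data structures.
　　·Panel is used much less often; it was deprecated in pandas 0.20 and removed in pandas 0.25.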
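The following is a minimal sketch of the mutability rules above. It assumes a recent pandas version, and the names s, df_demo, x and y are illustrative only. The code changes a value in place and then grows a DataFrame by one column.

```python
import pandas as pd

# Value mutability: change an existing element of a Series in place
s = pd.Series([10, 23, 56])
s.iloc[0] = 99

# Size mutability: assigning to a new column label adds a column to a DataFrame
df_demo = pd.DataFrame({'x': [1, 2, 3]})
df_demo['y'] = df_demo['x'] * 2

print(s.tolist())      # [99, 23, 56]
print(df_demo.shape)   # (3, 2)
```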

Series

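A Series is a one-dimensional array structure that holds homogeneous data
　　·For example, the Series below is a collection of the integers 10, 23, 56, ...
·Key points
　　·Homogeneous data
　　·Size is immutable
　　·Values are mutable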

10	23	56	17	52	61	73	90	26	72
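The integer example above can be built as a Series as shown below. This is a sketch, and the variable name is illustrative. Pandas supplies the default index 0..9, and every element shares a single integer dtype.

```python
import pandas as pd

# The ten integers from the example above; pandas adds the default index 0..9
s = pd.Series([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])

print(s.dtype)   # one integer dtype for all elements (homogeneous data)
print(len(s))    # 10
print(s[2])      # 56 (label 2 of the default index)
```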

DataFrame

·A DataFrame is a two-dimensional array with heterogeneous data: different columns can hold different types.
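The sketch below makes "heterogeneous" concrete. The column names name, count and ratio are hypothetical. Each column keeps its own dtype, while the values within a single column stay homogeneous.

```python
import pandas as pd

# Hypothetical columns of three different types in one DataFrame
frame = pd.DataFrame({
    'name': ['a', 'b', 'c'],     # text
    'count': [1, 2, 3],          # integers
    'ratio': [0.5, 1.5, 2.5],    # floats
})

# Each column keeps its own dtype, so the frame as a whole is heterogeneous
print(frame.dtypes)
```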

Getting started with Pandas

Creating objects

·Create a Series by passing a list of values, letting Pandas create a default integer index:

import pandas as pd
import numpy as np
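s1 = pd.Series(np.arange(5))          # values 0..4; dtype is NumPy's default integer (int32 in this run, int64 on many platforms)
s2 = pd.Series([1,3,5,np.nan,6,8])    # np.nan makes the whole Series float64
print(s1)
print(s2)

0    0
1    1
2    2
3    3
4    4
dtype: int32
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64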

·Create a DataFrame from a NumPy array, with a datetime index and labeled columns:

dates = pd.date_range('20190301',periods=7)
print(dates)
print('--'*25)
df = pd.DataFrame(np.random.randn(7,4),index = dates,columns = list('ABCD'))
print(df)

DatetimeIndex(['2019-03-01', '2019-03-02', '2019-03-03', '2019-03-04',
               '2019-03-05', '2019-03-06', '2019-03-07'],
              dtype='datetime64[ns]', freq='D')
--------------------------------------------------
                   A         B         C         D
2019-03-01 -2.020410 -0.924757 -1.888649 -0.051893
2019-03-02 -0.976324 -0.684672 -0.837020 -0.772484
2019-03-03  0.353127 -0.652408  0.557873 -0.678637
2019-03-04  0.135563  0.092274 -0.148957 -2.058148
2019-03-05 -0.117025 -0.202763  0.566309 -1.775363
2019-03-06  0.253763 -0.209275 -0.503625 -0.399764
2019-03-07 -0.307063  0.897490  1.056798 -0.901982

Create a DataFrame by passing a dict of objects that can be converted to Series:

df2 = pd.DataFrame({'A':1.,
                   'B':pd.Timestamp('20190302'),
                   'C':pd.Series(1,index=list(range(4)),dtype='float32'),
                   'D':np.array([3]*4,dtype='int32'),
                   'E':pd.Categorical(['test','train','test','train']),
                   'F':'foo'})
print(df2)

     A          B    C  D      E    F
0  1.0 2019-03-02  1.0  3   test  foo
1  1.0 2019-03-02  1.0  3  train  foo
2  1.0 2019-03-02  1.0  3   test  foo
3  1.0 2019-03-02  1.0  3  train  foo
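Each column of df2 was built from a different kind of object, so the columns end up with different dtypes. The short sketch below rebuilds df2 and checks a few of them. The dtype of column B may differ between pandas versions, so the sketch does not rely on it.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20190302'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(['test', 'train', 'test', 'train']),
                    'F': 'foo'})

# Scalars ('A', 'B', 'F') are broadcast to the 4-row index taken from column 'C'
print(df2.dtypes)
```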

Viewing data

head and tail show the rows at the top and bottom of the frame (5 rows by default)

print('head: \n',df.head())
print('-'*50)
print('Tail: \n',df.tail(3))

head: 
                    A         B         C         D
2019-03-01 -2.020410 -0.924757 -1.888649 -0.051893
2019-03-02 -0.976324 -0.684672 -0.837020 -0.772484
2019-03-03  0.353127 -0.652408  0.557873 -0.678637
2019-03-04  0.135563  0.092274 -0.148957 -2.058148
2019-03-05 -0.117025 -0.202763  0.566309 -1.775363
--------------------------------------------------
Tail: 
                    A         B         C         D
2019-03-05 -0.117025 -0.202763  0.566309 -1.775363
2019-03-06  0.253763 -0.209275 -0.503625 -0.399764
2019-03-07 -0.307063  0.897490  1.056798 -0.901982

index, columns and values show the index, the column labels and the underlying NumPy data:

print('index is: ')
print(df.index)
print('columns is: ')
print(df.columns)
print('value is: ')
print(df.values)

index is: 
DatetimeIndex(['2019-03-01', '2019-03-02', '2019-03-03', '2019-03-04',
               '2019-03-05', '2019-03-06', '2019-03-07'],
              dtype='datetime64[ns]', freq='D')
columns is: 
Index(['A', 'B', 'C', 'D'], dtype='object')
value is: 
[[-2.02041037 -0.92475674 -1.88864928 -0.05189268]
 [-0.97632394 -0.68467222 -0.83701968 -0.77248437]
 [ 0.3531265  -0.6524075   0.55787276 -0.67863676]
 [ 0.13556278  0.09227419 -0.14895721 -2.05814846]
 [-0.11702479 -0.20276259  0.56630908 -1.77536338]
 [ 0.2537632  -0.20927471 -0.50362523 -0.3997636 ]
 [-0.30706349  0.89749039  1.05679755 -0.90198211]]
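The pandas documentation for DataFrame.values recommends DataFrame.to_numpy() (available since pandas 0.24) instead. The sketch below rebuilds a random frame like df above and shows that both return the same ndarray here.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20190301', periods=7)
df = pd.DataFrame(np.random.randn(7, 4), index=dates, columns=list('ABCD'))

# to_numpy() returns the underlying data as a NumPy ndarray
arr = df.to_numpy()
print(type(arr), arr.shape)   # <class 'numpy.ndarray'> (7, 4)
```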

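info() summarizes the structure (index, column dtypes, non-null counts, memory usage), and describe() shows a quick statistical summary of the data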

df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 7 entries, 2019-03-01 to 2019-03-07
Freq: D
Data columns (total 4 columns):
A    7 non-null float64
B    7 non-null float64
C    7 non-null float64
D    7 non-null float64
dtypes: float64(4)
memory usage: 280.0 bytes

print(df.describe())

              A         B         C         D
count  7.000000  7.000000  7.000000  7.000000
mean  -0.382624 -0.240587 -0.171039 -0.948324
std    0.849108  0.611463  1.007256  0.721805
min   -2.020410 -0.924757 -1.888649 -2.058148
25%   -0.641694 -0.668540 -0.670322 -1.338673
50%   -0.117025 -0.209275 -0.148957 -0.772484
75%    0.194663 -0.055244  0.562091 -0.539200
max    0.353127  0.897490  1.056798 -0.051893
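The describe() call above summarizes numeric columns only. For text data it reports count, unique, top (the most frequent value) and freq instead. Here is a small sketch on hypothetical labels:

```python
import pandas as pd

labels = pd.Series(['test', 'train', 'test', 'train', 'test'])

# Non-numeric describe(): count, number of distinct values, most frequent value and its frequency
summary = labels.describe()
print(summary)
```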

df.T transposes the data (rows and columns are swapped)

df.T

   2019-03-01 00:00:00  2019-03-02 00:00:00  2019-03-03 00:00:00  2019-03-04 00:00:00  2019-03-05 00:00:00  2019-03-06 00:00:00  2019-03-07 00:00:00
A            -2.020410            -0.976324             0.353127             0.135563            -0.117025             0.253763            -0.307063
B            -0.924757            -0.684672            -0.652408             0.092274            -0.202763            -0.209275             0.897490
C            -1.888649            -0.837020             0.557873            -0.148957             0.566309            -0.503625             1.056798
D            -0.051893            -0.772484            -0.678637            -2.058148            -1.775363            -0.399764            -0.901982
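A quick sketch of what transposing does to the shape and labels, using a random frame like df above. df.T is shorthand for df.transpose(). Transposing twice gives back the original values.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20190301', periods=7)
df = pd.DataFrame(np.random.randn(7, 4), index=dates, columns=list('ABCD'))

t = df.T                     # same as df.transpose()
print(df.shape, t.shape)     # (7, 4) (4, 7)
# The old column labels become the new index
print(list(t.index))         # ['A', 'B', 'C', 'D']
```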

df.sort_index() sorts along an axis (axis=1 sorts the column labels, axis=0 sorts the row index)

print(df.sort_index(axis=1,ascending=False))  # column labels in descending order

                   D         C         B         A
2019-03-01 -0.051893 -1.888649 -0.924757 -2.020410
2019-03-02 -0.772484 -0.837020 -0.684672 -0.976324
2019-03-03 -0.678637  0.557873 -0.652408  0.353127
2019-03-04 -2.058148 -0.148957  0.092274  0.135563
2019-03-05 -1.775363  0.566309 -0.202763 -0.117025
2019-03-06 -0.399764 -0.503625 -0.209275  0.253763
2019-03-07 -0.901982  1.056798  0.897490 -0.307063

print(df.sort_index(axis=0,ascending=False))

                   A         B         C         D
2019-03-07 -0.307063  0.897490  1.056798 -0.901982
2019-03-06  0.253763 -0.209275 -0.503625 -0.399764
2019-03-05 -0.117025 -0.202763  0.566309 -1.775363
2019-03-04  0.135563  0.092274 -0.148957 -2.058148
2019-03-03  0.353127 -0.652408  0.557873 -0.678637
2019-03-02 -0.976324 -0.684672 -0.837020 -0.772484
2019-03-01 -2.020410 -0.924757 -1.888649 -0.051893

print(df)  # sorting returns a new DataFrame; df itself is unchanged

                   A         B         C         D
2019-03-01 -2.020410 -0.924757 -1.888649 -0.051893
2019-03-02 -0.976324 -0.684672 -0.837020 -0.772484
2019-03-03  0.353127 -0.652408  0.557873 -0.678637
2019-03-04  0.135563  0.092274 -0.148957 -2.058148
2019-03-05 -0.117025 -0.202763  0.566309 -1.775363
2019-03-06  0.253763 -0.209275 -0.503625 -0.399764
2019-03-07 -0.307063  0.897490  1.056798 -0.901982

df.sort_values() sorts by values

print(df.sort_values('C'))  # ascending by default; pass ascending=False for descending order; equivalent to df.sort_values(by='C')

                   A         B         C         D
2019-03-01 -2.020410 -0.924757 -1.888649 -0.051893
2019-03-02 -0.976324 -0.684672 -0.837020 -0.772484
2019-03-06  0.253763 -0.209275 -0.503625 -0.399764
2019-03-04  0.135563  0.092274 -0.148957 -2.058148
2019-03-03  0.353127 -0.652408  0.557873 -0.678637
2019-03-05 -0.117025 -0.202763  0.566309 -1.775363
2019-03-07 -0.307063  0.897490  1.056798 -0.901982
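sort_values() also accepts a list of columns together with a matching list of sort directions. A sketch on hypothetical data, sorting by team ascending and then by points descending within each team:

```python
import pandas as pd

scores = pd.DataFrame({'team': ['x', 'y', 'x', 'y'],
                       'points': [3, 1, 2, 5]})

# First key 'team' ascending, second key 'points' descending
ordered = scores.sort_values(by=['team', 'points'], ascending=[True, False])
print(ordered)
```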

Simple data selection

Selecting a single column returns a new Series

a = df['A']
print(a)
print(type(a))

2019-03-01   -2.020410
2019-03-02   -0.976324
2019-03-03    0.353127
2019-03-04    0.135563
2019-03-05   -0.117025
2019-03-06    0.253763
2019-03-07   -0.307063
Freq: D, Name: A, dtype: float64
<class 'pandas.core.series.Series'>
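The type of the result depends on what is passed to []. A single label returns a Series, while a list of labels (even a list with one element) returns a DataFrame. A sketch with a random frame like df above:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20190301', periods=7)
df = pd.DataFrame(np.random.randn(7, 4), index=dates, columns=list('ABCD'))

col_series = df['A']     # single label -> Series
col_frame = df[['A']]    # list of labels -> one-column DataFrame
print(type(col_series).__name__, type(col_frame).__name__)  # Series DataFrame
```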

Selecting rows by slicing with the [] operator

print(df[0:3])# positions 0 to 2 (the end position is excluded)
print(df['2019-03-02':'2019-03-04'])# a range of index labels (both endpoints are included)

                   A         B         C         D
2019-03-01 -2.020410 -0.924757 -1.888649 -0.051893
2019-03-02 -0.976324 -0.684672 -0.837020 -0.772484
2019-03-03  0.353127 -0.652408  0.557873 -0.678637
                   A         B         C         D
2019-03-02 -0.976324 -0.684672 -0.837020 -0.772484
2019-03-03  0.353127 -0.652408  0.557873 -0.678637
2019-03-04  0.135563  0.092274 -0.148957 -2.058148
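The two slices above follow different endpoint rules. The sketch below checks that each one matches its explicit counterpart: df.iloc for positions, where the end is excluded, and df.loc for labels, where the end is included. Using iloc/loc directly is a common way to make the intent unambiguous.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20190301', periods=7)
df = pd.DataFrame(np.random.randn(7, 4), index=dates, columns=list('ABCD'))

by_position = df[0:3]                        # rows 0, 1, 2
by_label = df['2019-03-02':'2019-03-04']     # 2019-03-02 through 2019-03-04
print(len(by_position), len(by_label))       # 3 3
```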

df.loc[] gets a cross section (one row) by label

print(df.loc[dates[0]])

A   -2.020410
B   -0.924757
C   -1.888649
D   -0.051893
Name: 2019-03-01 00:00:00, dtype: float64

df.loc[] selects on multiple axes by label

print(df.loc[:,['A','B']])
print(df.A)  # attribute access, equivalent to df['A']

                   A         B
2019-03-01 -2.020410 -0.924757
2019-03-02 -0.976324 -0.684672
2019-03-03  0.353127 -0.652408
2019-03-04  0.135563  0.092274
2019-03-05 -0.117025 -0.202763
2019-03-06  0.253763 -0.209275
2019-03-07 -0.307063  0.897490
2019-03-01   -2.020410
2019-03-02   -0.976324
2019-03-03    0.353127
2019-03-04    0.135563
2019-03-05   -0.117025
2019-03-06    0.253763
2019-03-07   -0.307063
Freq: D, Name: A, dtype: float64

Label slicing with df.loc[] includes both endpoints

print(df.loc['20190301':'20190305','A':'C'])
print(df.loc['20190301':'20190305',['A','B','C']]) # both forms give the same result

                   A         B         C
2019-03-01 -2.020410 -0.924757 -1.888649
2019-03-02 -0.976324 -0.684672 -0.837020
2019-03-03  0.353127 -0.652408  0.557873
2019-03-04  0.135563  0.092274 -0.148957
2019-03-05 -0.117025 -0.202763  0.566309
                   A         B         C
2019-03-01 -2.020410 -0.924757 -1.888649
2019-03-02 -0.976324 -0.684672 -0.837020
2019-03-03  0.353127 -0.652408  0.557873
2019-03-04  0.135563  0.092274 -0.148957
2019-03-05 -0.117025 -0.202763  0.566309
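Besides label slices, df.loc[] also accepts a boolean Series for the rows together with a list of column labels. Here is a small sketch on made-up values (not the random df above):

```python
import pandas as pd

dates = pd.date_range('20190301', periods=4)
df = pd.DataFrame({'A': [1.0, -2.0, 3.0, -4.0],
                   'B': [5.0, 6.0, 7.0, 8.0],
                   'C': [0.0, 0.0, 0.0, 0.0]}, index=dates)

# Rows chosen by a boolean mask, columns chosen by a list of labels
selected = df.loc[df['A'] > 0, ['A', 'B']]
print(selected)
```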

df.loc[] / df.at[] get a single scalar value (at is the fast accessor for scalars)

print(df.loc[dates[0],'A'])
print(df.at[dates[0],'A'])

-2.0204103737371066
-2.0204103737371066
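The scalar accessors can also set a single value in place: at by label, iat by position. A sketch on a random frame like df above:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20190301', periods=7)
df = pd.DataFrame(np.random.randn(7, 4), index=dates, columns=list('ABCD'))

df.at[dates[0], 'A'] = 0.0   # set by label
df.iat[0, 1] = 1.0           # set by position: row 0, column 1 ('B')
print(df.iloc[0])
```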

df.iloc[] / df.iat[] select by integer position

print(df)
print('-'*50)
print('df.iloc[0,0]: \n',df.iloc[0,0])
print('-'*50)
print('df.iat[0,0]: \n',df.iat[0,0])# same value as df.iloc[0,0]; iat is the fast scalar accessor
print('-'*50)
print('df.iloc[0]: \n',df.iloc[0])
print('-'*50)
print('df.iloc[:,0]: \n',df.iloc[:,0])
print('-'*50)
print('df.iloc[0:2,3:5]: \n',df.iloc[0:2,3:5])# the column slice runs past the end, so only column D (position 3) is returned
print('-'*50)
print('df.iloc[[1,3,6],[0,2,3]]: \n',df.iloc[[1,3,6],[0,2,3]])# rows at positions 1, 3, 6; columns at positions 0, 2, 3

                   A         B         C         D
2019-03-01 -2.020410 -0.924757 -1.888649 -0.051893
2019-03-02 -0.976324 -0.684672 -0.837020 -0.772484
2019-03-03  0.353127 -0.652408  0.557873 -0.678637
2019-03-04  0.135563  0.092274 -0.148957 -2.058148
2019-03-05 -0.117025 -0.202763  0.566309 -1.775363
2019-03-06  0.253763 -0.209275 -0.503625 -0.399764
2019-03-07 -0.307063  0.897490  1.056798 -0.901982
--------------------------------------------------
df.iloc[0,0]: 
 -2.0204103737371066
--------------------------------------------------
df.iat[0,0]: 
 -2.0204103737371066
--------------------------------------------------
df.iloc[0]: 
 A   -2.020410
B   -0.924757
C   -1.888649
D   -0.051893
Name: 2019-03-01 00:00:00, dtype: float64
--------------------------------------------------
df.iloc[:,0]: 
 2019-03-01   -2.020410
2019-03-02   -0.976324
2019-03-03    0.353127
2019-03-04    0.135563
2019-03-05   -0.117025
2019-03-06    0.253763
2019-03-07   -0.307063
Freq: D, Name: A, dtype: float64
--------------------------------------------------
df.iloc[0:2,3:5]: 
                    D
2019-03-01 -0.051893
2019-03-02 -0.772484
--------------------------------------------------
df.iloc[[1,3,6],[0,2,3]]: 
                    A         C         D
2019-03-02 -0.976324 -0.837020 -0.772484
2019-03-04  0.135563 -0.148957 -2.058148
2019-03-07 -0.307063  1.056798 -0.901982
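A short sketch of two edge cases of positional selection. Negative positions count from the end, and an out-of-range slice is silently truncated. A single out-of-range position, however, raises IndexError.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20190301', periods=7)
df = pd.DataFrame(np.random.randn(7, 4), index=dates, columns=list('ABCD'))

last_row = df.iloc[-1]          # negative positions count from the end
partial = df.iloc[0:2, 3:5]     # slice past the last column is truncated, leaving only 'D'

# A single out-of-range position is an error instead of being truncated
try:
    df.iloc[0, 5]
    out_of_range_raised = False
except IndexError:
    out_of_range_raised = True

print(last_row.name, list(partial.columns), out_of_range_raised)
```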

Boolean indexing: select rows using a condition on the values of a single column

print(df[df.B>0])

                   A         B         C         D
2019-03-04  0.135563  0.092274 -0.148957 -2.058148
2019-03-07 -0.307063  0.897490  1.056798 -0.901982
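To combine several conditions, use & (and), | (or) and ~ (not), and wrap each comparison in parentheses. Python's own and/or raise a ValueError on a Series because its truth value is ambiguous. A sketch on made-up values:

```python
import pandas as pd

demo = pd.DataFrame({'A': [1.0, -1.0, 2.0],
                     'B': [-1.0, 1.0, 3.0]})

both = demo[(demo['A'] > 0) & (demo['B'] < 0)]     # only row 0
either = demo[(demo['A'] > 0) | (demo['B'] > 0)]   # rows 0, 1 and 2
print(both)
print(either)
```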

Boolean indexing: select the values of a DataFrame where a condition holds (all other cells become NaN):

print(df[df>0])

                   A         B         C   D
2019-03-01       NaN       NaN       NaN NaN
2019-03-02       NaN       NaN       NaN NaN
2019-03-03  0.353127       NaN  0.557873 NaN
2019-03-04  0.135563  0.092274       NaN NaN
2019-03-05       NaN       NaN  0.566309 NaN
2019-03-06  0.253763       NaN       NaN NaN
2019-03-07       NaN  0.897490  1.056798 NaN
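df[df > 0] gives the same result as df.where(df > 0). A boolean mask can also be used on the left of an assignment to overwrite matching cells, for example replacing negatives with 0. A sketch on made-up values:

```python
import pandas as pd

demo = pd.DataFrame({'A': [1.0, -2.0], 'B': [-3.0, 4.0]})

kept = demo.where(demo > 0)      # same as demo[demo > 0]: non-matching cells become NaN

clipped = demo.copy()
clipped[clipped < 0] = 0         # overwrite only the cells where the mask is True
print(kept)
print(clipped)
```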

isin() filters data (it takes one argument: an iterable of values such as a list or tuple)

df2 = df.copy()
df2['E'] = ['one','two','three','four','five','six','seven']
print(df2)
print(df2[df2['E'].isin(('one','two'))])

                   A         B         C         D      E
2019-03-01 -2.020410 -0.924757 -1.888649 -0.051893    one
2019-03-02 -0.976324 -0.684672 -0.837020 -0.772484    two
2019-03-03  0.353127 -0.652408  0.557873 -0.678637  three
2019-03-04  0.135563  0.092274 -0.148957 -2.058148   four
2019-03-05 -0.117025 -0.202763  0.566309 -1.775363   five
2019-03-06  0.253763 -0.209275 -0.503625 -0.399764    six
2019-03-07 -0.307063  0.897490  1.056798 -0.901982  seven
                   A         B         C         D    E
2019-03-01 -2.020410 -0.924757 -1.888649 -0.051893  one
2019-03-02 -0.976324 -0.684672 -0.837020 -0.772484  two
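To keep the rows whose value is NOT in the given set, negate the mask with ~. A sketch with a hypothetical column E:

```python
import pandas as pd

df2 = pd.DataFrame({'E': ['one', 'two', 'three', 'four']})

# ~ inverts the boolean mask returned by isin()
not_in = df2[~df2['E'].isin(['one', 'two'])]
print(not_in)
```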

Modifying a DataFrame

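Adding columns: from a dict, by direct assignment, or as the sum of two Series (values are aligned on the index, and missing labels give NaN)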

d = {'one':pd.Series([1,2,3],index = ['a','b','c']),
    'two':pd.Series([1,2,3,4],index = ['a','b','c','d'])}
df3 = pd.DataFrame(d)
print(df3)
df3['three'] = pd.Series([20,30,40],index = ['a','b','d'])
print(df3)
df3['four']=df3['one']+df3['three']
print(df3)

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
   one  two  three
a  1.0    1   20.0
b  2.0    2   30.0
c  3.0    3    NaN
d  NaN    4   40.0
   one  two  three  four
a  1.0    1   20.0  21.0
b  2.0    2   30.0  32.0
c  3.0    3    NaN   NaN
d  NaN    4   40.0   NaN
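DataFrame.assign() offers another way to add a column. It returns a new DataFrame and leaves the original unchanged. A sketch that rebuilds the first df3 above; the column name five is hypothetical:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df3 = pd.DataFrame(d)

# assign() returns a copy with the extra column; df3 itself is not modified
df3_new = df3.assign(five=df3['two'] * 10)
print(df3_new)
```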

Deleting columns: del, pop

pop returns the removed column; for example, a = df.pop('three') deletes the column and assigns it to a

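df['three'] = np.nan  # assumed setup, not shown in the original: an all-NaN 'three' column to delete
print(df)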
del df['three']
print(df)

                   A         B         C         D  three
2019-03-01 -2.020410 -0.924757 -1.888649 -0.051893    NaN
2019-03-02 -0.976324 -0.684672 -0.837020 -0.772484    NaN
2019-03-03  0.353127 -0.652408  0.557873 -0.678637    NaN
2019-03-04  0.135563  0.092274 -0.148957 -2.058148    NaN
2019-03-05 -0.117025 -0.202763  0.566309 -1.775363    NaN
2019-03-06  0.253763 -0.209275 -0.503625 -0.399764    NaN
2019-03-07 -0.307063  0.897490  1.056798 -0.901982    NaN
                   A         B         C         D
2019-03-01 -2.020410 -0.924757 -1.888649 -0.051893
2019-03-02 -0.976324 -0.684672 -0.837020 -0.772484
2019-03-03  0.353127 -0.652408  0.557873 -0.678637
2019-03-04  0.135563  0.092274 -0.148957 -2.058148
2019-03-05 -0.117025 -0.202763  0.566309 -1.775363
2019-03-06  0.253763 -0.209275 -0.503625 -0.399764
2019-03-07 -0.307063  0.897490  1.056798 -0.901982
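The sketch below shows the pop variant described above. It rebuilds a df3 with a partially filled three column, then removes that column and keeps it as a Series.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df3 = pd.DataFrame(d)
df3['three'] = pd.Series([20, 30, 40], index=['a', 'b', 'd'])

removed = df3.pop('three')   # deletes the column from df3 and returns it
print(removed)
print(df3.columns.tolist())  # ['one', 'two']
```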

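Adding rows: append (DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat replaces it)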

df3

   one  two  three  four
a  1.0    1   20.0  21.0
b  2.0    2   30.0  32.0
c  3.0    3    NaN   NaN
d  NaN    4   40.0   NaN

df4 = pd.DataFrame([[5,6,7,8,9,10]],columns = ['one','two','three','four','five','six'],index=['d'])

print(df3.append(df4))# does not change df3; returns a new DataFrame (works only in pandas < 2.0)
print('-'*50)
print(df3)

   five  four  one   six  three  two
a   NaN  21.0  1.0   NaN   20.0    1
b   NaN  32.0  2.0   NaN   30.0    2
c   NaN   NaN  3.0   NaN    NaN    3
d   NaN   NaN  NaN   NaN   40.0    4
d   9.0   8.0  5.0  10.0    7.0    6
--------------------------------------------------
   one  two  three  four
a  1.0    1   20.0  21.0
b  2.0    2   30.0  32.0
c  3.0    3    NaN   NaN
d  NaN    4   40.0   NaN
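In pandas 2.0 and later, the same row append can be written with pd.concat, which also returns a new DataFrame. The sketch below rebuilds df3 with the values shown above. Unlike the old append output, pd.concat (with its default sort=False) keeps the columns in order of first appearance instead of sorting them.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame({'one': [1.0, 2.0, 3.0, np.nan],
                    'two': [1, 2, 3, 4],
                    'three': [20.0, 30.0, np.nan, 40.0],
                    'four': [21.0, 32.0, np.nan, np.nan]},
                   index=['a', 'b', 'c', 'd'])
df4 = pd.DataFrame([[5, 6, 7, 8, 9, 10]],
                   columns=['one', 'two', 'three', 'four', 'five', 'six'], index=['d'])

# Stack the rows of df4 under df3; columns missing from df3 are filled with NaN
combined = pd.concat([df3, df4])
print(combined)
```

Passing ignore_index=True to pd.concat renumbers the rows, which avoids the duplicated 'd' label.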

Deleting rows: df.drop() removes rows from a DataFrame by index label

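If an index label is duplicated, every row with that label is removed.
drop() does not modify the original DataFrame; reassign the result to keep the change.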

print(df3)
print('-'*50)
print(df3.drop('d'))
print('-'*50)
print(df3)

   one  two  three  four
a  1.0    1   20.0  21.0
b  2.0    2   30.0  32.0
c  3.0    3    NaN   NaN
d  NaN    4   40.0   NaN
--------------------------------------------------
   one  two  three  four
a  1.0    1   20.0  21.0
b  2.0    2   30.0  32.0
c  3.0    3    NaN   NaN
--------------------------------------------------
   one  two  three  four
a  1.0    1   20.0  21.0
b  2.0    2   30.0  32.0
c  3.0    3    NaN   NaN
d  NaN    4   40.0   NaN
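A sketch of a few more drop() calls. It drops a row, drops columns with the columns= keyword, shows that a duplicated label removes several rows, and reassigns the result to keep the change. The frame dup and its labels are hypothetical.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame({'one': [1.0, 2.0, 3.0, np.nan],
                    'two': [1, 2, 3, 4],
                    'three': [20.0, 30.0, np.nan, 40.0],
                    'four': [21.0, 32.0, np.nan, np.nan]},
                   index=['a', 'b', 'c', 'd'])

without_d = df3.drop('d')                            # rows by label (axis=0 is the default)
without_cols = df3.drop(columns=['three', 'four'])   # columns by label

dup = pd.DataFrame({'v': [1, 2, 3]}, index=['x', 'y', 'x'])
only_y = dup.drop('x')                               # both rows labelled 'x' are removed

df3 = df3.drop(index=['a', 'b'])                     # reassign to keep the change
print(df3)
```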

Summary of basic Python Pandas operations

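No.	Attribute or method	Description
1	T	Transposes rows and columns
2	axes	Returns a list with the row axis labels and the column axis labels as its only members
3	dtypes	Returns the dtypes of the data in this object
4	empty	True if the NDFrame is entirely empty (has no items)
5	ndim	Number of axes / array dimensions
6	shape	Returns a tuple representing the dimensionality of the DataFrame
7	size	Number of elements in the NDFrame
8	values	NumPy representation of the NDFrame
9	head()	Returns the first n rows (5 by default)
10	tail()	Returns the last n rows (5 by default)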

print(df)
print('-'*50)
print(df.axes)
print('-'*50)
print(df.dtypes)
print('-'*50)
print(df.empty)
print('-'*50)
print(df.ndim)
print('-'*50)
print(df.shape)
print('-'*50)
print(df.values)
print('-'*50)

                   A         B         C         D
2019-03-01 -2.020410 -0.924757 -1.888649 -0.051893
2019-03-02 -0.976324 -0.684672 -0.837020 -0.772484
2019-03-03  0.353127 -0.652408  0.557873 -0.678637
2019-03-04  0.135563  0.092274 -0.148957 -2.058148
2019-03-05 -0.117025 -0.202763  0.566309 -1.775363
2019-03-06  0.253763 -0.209275 -0.503625 -0.399764
2019-03-07 -0.307063  0.897490  1.056798 -0.901982
--------------------------------------------------
[DatetimeIndex(['2019-03-01', '2019-03-02', '2019-03-03', '2019-03-04',
               '2019-03-05', '2019-03-06', '2019-03-07'],
              dtype='datetime64[ns]', freq='D'), Index(['A', 'B', 'C', 'D'], dtype='object')]
--------------------------------------------------
A    float64
B    float64
C    float64
D    float64
dtype: object
--------------------------------------------------
False
--------------------------------------------------
2
--------------------------------------------------
(7, 4)
--------------------------------------------------
[[-2.02041037 -0.92475674 -1.88864928 -0.05189268]
 [-0.97632394 -0.68467222 -0.83701968 -0.77248437]
 [ 0.3531265  -0.6524075   0.55787276 -0.67863676]
 [ 0.13556278  0.09227419 -0.14895721 -2.05814846]
 [-0.11702479 -0.20276259  0.56630908 -1.77536338]
 [ 0.2537632  -0.20927471 -0.50362523 -0.3997636 ]
 [-0.30706349  0.89749039  1.05679755 -0.90198211]]
--------------------------------------------------
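The printout above leaves out size, head() and tail() from the table. A sketch covering them, plus empty on an empty DataFrame, using a random frame like df above:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20190301', periods=7)
df = pd.DataFrame(np.random.randn(7, 4), index=dates, columns=list('ABCD'))

print(df.size)                 # 28 = 7 rows x 4 columns
print(len(df.head(2)))         # 2 (first n rows)
print(len(df.tail()))          # 5 (default n)
print(pd.DataFrame().empty)    # True
```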

Pandas-2019-03-14
