Pandas Cheatsheet
First, refer to pandas DataFrame cheatsheet.pdf
Series
DataFrame
Initialization
Load and write data from other sources
- csv
- MySQL
- Hadoop (impyla(as_pandas), happybase)
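A minimal sketch of the csv case, assuming an in-memory buffer stands in for a real file (the column names and values are made up for illustration). pd.read_csv and DataFrame.to_csv accept a file path in the same way.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a path such as "data.csv".
csv_text = "id,name\n1,alice\n2,bob\n"

# Load: read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Write: index=False keeps the row index out of the output.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Reading the written text back gives an equal DataFrame.
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
```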
Working with row and column index
df.index
df.columns
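A short sketch of reading and changing the two indexes, assuming a tiny made-up DataFrame. rename and reset_index both return new objects here and leave df untouched.

```python
import pandas as pd

# Hypothetical data with explicit row labels.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

row_labels = list(df.index)    # row index labels
col_labels = list(df.columns)  # column index labels

# Rename individual labels on either axis.
renamed = df.rename(index={"x": "first"}, columns={"a": "alpha"})

# Move the row index into an ordinary column (named "index" when the index has no name).
reset = df.reset_index()
```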
Work with columns of data (axis=1)
Work with rows of data (axis=0)
Work with cells
- do a comprehensive summary of pandas indexing !!!
Join/combine DataFrame
Split DataFrame
- use list comprehension
target = [x[11] for x in dataset]
train = [x[0:11] for x in dataset]
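The list comprehensions above work when dataset is a list of rows. If dataset is already a DataFrame, the same split can be sketched with positional indexing; the 3x12 frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical 3-row, 12-column dataset; the last column (position 11) is the target.
dataset = pd.DataFrame([[i + j for j in range(12)] for i in range(3)])

# iloc selects by integer position: all rows, first 11 columns.
train = dataset.iloc[:, 0:11]

# All rows, column at position 11, returned as a Series.
target = dataset.iloc[:, 11]
```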
Work with whole DataFrame
Work with dates, times and their indexes
Work with strings
Work with missing and non-finite values
Basic Statistics
Work with Categorical data
Annoying Part:
Copy vs View
Chained direct indexing (df[...][...]) may return a new copy of the data rather than a view, so it is not recommended for modifying values.
http://stackoverflow.com/questions/20625582/how-to-deal-with-this-pandas-warning
From what I gather, SettingWithCopyWarning was created to flag potentially confusing "chained" assignments, such as the following, which don't always work as expected, particularly when the first selection returns a copy. [see GH5390 and GH5597 for background discussion.]
df[df['A'] > 2]['B'] = new_val # new_val not set in df
The warning offers a suggestion to rewrite as follows:
df.loc[df['A'] > 2, 'B'] = new_val
However, this doesn't fit your usage, which is equivalent to:
df = df[df['A'] > 2]
df['B'] = new_val
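A small sketch of the two safe patterns from the quoted answer, assuming a made-up DataFrame and new_val = 0. The explicit .copy() in the second pattern is one common way to make the filter-then-modify case unambiguous; it is an addition for illustration, not part of the quoted answer.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})
new_val = 0  # hypothetical replacement value

# Pattern 1: a single .loc call selects rows and column together
# and writes into df itself.
df.loc[df["A"] > 2, "B"] = new_val

# Pattern 2: filter first, then modify. Taking an explicit copy makes it
# clear that the assignment targets the subset, not the original df.
subset = df[df["A"] > 2].copy()
subset["B"] = 99
```

After this runs, df holds the values written by .loc, and the later change to subset does not affect it.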
Modify in place vs return a new value
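A sketch of the difference using sort_values as the example method, assuming a made-up two-row DataFrame. By default the method returns a new DataFrame; with inplace=True it changes df and returns None.

```python
import pandas as pd

df = pd.DataFrame({"b": [2, 1], "a": [4, 3]})

# Default: a new, sorted DataFrame is returned and df keeps its order.
sorted_df = df.sort_values("b")
order_before = df["b"].tolist()

# inplace=True: df itself is reordered and the call returns None,
# so assigning the result back to df would lose the data.
result = df.sort_values("b", inplace=True)
order_after = df["b"].tolist()
```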
Index of row and column
Selecting from the row index and from the column index by direct indexing looks very similar, with subtle differences: