Data Manipulation and Exploratory Analysis with Pandas
Contact: Lachlan Deer, [econgit] @ldeer, [github/twitter] @lachlandeer
Motivation
In yesterday’s sessions on NumPy and SciPy we learned how to use Python for scientific computing when the object of interest is a matrix. This is the way that someone who comes from an (outdated) MATLAB training will think of pretty much everything, and it in fact proves helpful for a lot of the analysis we do as economists.
However, if we are working with data sets, as we would in Stata or R, it would be nice to work with a similar object in Python. The package pandas gives us that option: it brings with it objects called Series, which store an individual column of data, and DataFrames, which store multiple columns. These objects build on NumPy’s array structure and work well for the typical ‘data wrangling’ tasks that empirical work entails.
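As a minimal sketch of the two objects just described (the column names and numbers below are made up purely for illustration), a Series and a DataFrame can be built directly from Python lists and dictionaries:

```python
import numpy as np
import pandas as pd

# A Series holds a single labelled column of data.
# (Illustrative values only, not real data.)
gdp_growth = pd.Series([2.1, 1.8, -0.5], name="gdp_growth")

# A DataFrame holds several columns that share one index.
df = pd.DataFrame(
    {
        "country": ["A", "B", "C"],
        "gdp_growth": [2.1, 1.8, -0.5],
        "population": [10, 25, 4],
    }
)

# Selecting one column of a DataFrame gives back a Series ...
column = df["gdp_growth"]
print(type(column))

# ... and its values can be extracted as a NumPy array.
values = column.to_numpy()
print(type(values))
```

Because the values are stored as NumPy arrays, much of what we learned about array operations carries over to the columns of a DataFrame.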
Pandas also brings with it many important features for working with data. For example, it handles missing data gracefully and has built-in support for pivot tables and aggregation functions.
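To give a flavour of these features, here is a small sketch on a made-up data set (the column names region, year and wage are hypothetical). It counts missing values, aggregates by group, and builds a pivot table:

```python
import numpy as np
import pandas as pd

# Toy data with one missing wage (illustrative values only).
df = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south"],
        "year": [2020, 2021, 2020, 2021],
        "wage": [10.0, np.nan, 8.0, 9.0],
    }
)

# Missing data: count the missing entries in a column.
n_missing = df["wage"].isna().sum()

# By default, aggregation functions skip missing values.
mean_wage = df["wage"].mean()

# Aggregation by group: average wage within each region.
by_region = df.groupby("region")["wage"].mean()

# Pivot table: regions in rows, years in columns, mean wage in cells.
table = df.pivot_table(
    index="region", columns="year", values="wage", aggfunc="mean"
)

print(n_missing)
print(mean_wage)
print(by_region)
print(table)
```

The cell for the north region in 2021 stays missing in the pivot table, since there is no observed wage to average.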
Let’s begin our adventures with pandas…
Importing Pandas
import pandas
or, in the Python world, more typically
import pandas as pd
pandas.__version__
'1.3.3'
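Pandas Documentation inside Jupyter notebooks
To display pandas’ built-in documentation, append ? to the name of a function. Here functionName is a placeholder, so the lookup below finds nothing: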
pd.functionName?
Object `pd.functionName` not found.
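The ? syntax is a feature of IPython and Jupyter. As a rough sketch of how to reach the same documentation from plain Python, the example below uses pd.read_csv as a stand-in for functionName (this choice is ours, for illustration only):

```python
import inspect

import pandas as pd

# In a notebook we would type:  pd.read_csv?
# In plain Python, help(pd.read_csv) prints the full documentation.
# Here we fetch the docstring as text and show only its first line.
doc = inspect.getdoc(pd.read_csv)
first_line = doc.splitlines()[0]
print(first_line)

# The call signature is available as well.
signature = inspect.signature(pd.read_csv)
print("filepath_or_buffer" in signature.parameters)
```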
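We also get tab completion for the contents of the pandas namespace: type pd. and press the Tab key. (Below, <TAB> stands for that keystroke, not for code, so running the cell as written raises a SyntaxError.)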
pd.<TAB>
File "/tmp/ipykernel_2759/2747507604.py", line 1
pd.<TAB>
^
SyntaxError: invalid syntax
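Tab completion only works interactively. As a non-interactive sketch of the same idea, the built-in dir function lists the names in the pandas namespace, which we can then filter (the read_ prefix is just an example filter):

```python
import pandas as pd

# All public names that tab completion on `pd.` would offer.
public_names = [name for name in dir(pd) if not name.startswith("_")]

# Narrow the list down, e.g. to the functions that read data from files.
read_functions = [name for name in public_names if name.startswith("read_")]
print(read_functions)
```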