{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Aggregate Functions and Group Statistics\n",
"* Contact: Lachlan Deer, [econgit] @ldeer, [github/twitter] @lachlandeer\n",
"\n",
"So far our work with Pandas as been purely about simple transformations of each row or data; whether selecting them or creating new columns out of old ones. This is rarely enough - we typically want some degree of summarization of the data; either of an entire column; or on parts of (groups) of the column in question.\n",
"\n",
"This is what we will go after here:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"data = pd.read_csv('out_data/state_labour_statistics.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Simple Aggregation\n",
"\n",
"Like NumPy, Pandas has some of the standard data aggregations:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 0.618795\n",
"1 0.591624\n",
"2 0.888684\n",
"3 0.891655\n",
"4 0.457567\n",
"dtype: float64"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rng = np.random.RandomState(1234567890)\n",
"series = pd.Series(rng.rand(5))\n",
"series\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"3.4483242705353976"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"series.sum()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.6896648541070796, 0.61879477158568, 0.89165480011560816)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"series.mean(), series.median(), series.max()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a data-frame; these aggregations return results for each column:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
\n",
" \n",
" \n",
" | \n",
" A | \n",
" B | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0.778188 | \n",
" 0.400999 | \n",
"
\n",
" \n",
" 1 | \n",
" 0.267064 | \n",
" 0.705406 | \n",
"
\n",
" \n",
" 2 | \n",
" 0.996106 | \n",
" 0.405186 | \n",
"
\n",
" \n",
" 3 | \n",
" 0.540095 | \n",
" 0.949991 | \n",
"
\n",
" \n",
" 4 | \n",
" 0.537522 | \n",
" 0.030754 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B\n",
"0 0.778188 0.400999\n",
"1 0.267064 0.705406\n",
"2 0.996106 0.405186\n",
"3 0.540095 0.949991\n",
"4 0.537522 0.030754"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame({'A': rng.rand(5),\n",
" 'B': rng.rand(5)})\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"A 0.623795\n",
"B 0.498467\n",
"dtype: float64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"if you want the aggregation across a row, you can specify thats what you want:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 0.589594\n",
"1 0.486235\n",
"2 0.700646\n",
"3 0.745043\n",
"4 0.284138\n",
"dtype: float64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.mean(axis='columns')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Often times, we want to see all of the common aggregates at once. Pandas provides a function `.describe()` to do this for us:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" year | \n",
" unemployment_rate | \n",
" qty_unemployed | \n",
" qty_employed | \n",
" labour_force | \n",
"
\n",
" \n",
" \n",
" \n",
" count | \n",
" 10608.00000 | \n",
" 10608.000000 | \n",
" 1.060800e+04 | \n",
" 1.060800e+04 | \n",
" 1.060800e+04 | \n",
"
\n",
" \n",
" mean | \n",
" 2008.00000 | \n",
" 5.943194 | \n",
" 1.849809e+05 | \n",
" 2.751157e+06 | \n",
" 2.936138e+06 | \n",
"
\n",
" \n",
" std | \n",
" 4.89921 | \n",
" 2.261388 | \n",
" 2.433025e+05 | \n",
" 3.021023e+06 | \n",
" 3.242657e+06 | \n",
"
\n",
" \n",
" min | \n",
" 2000.00000 | \n",
" 1.700000 | \n",
" 6.697000e+03 | \n",
" 2.486420e+05 | \n",
" 2.619910e+05 | \n",
"
\n",
" \n",
" 25% | \n",
" 2004.00000 | \n",
" 4.400000 | \n",
" 4.088675e+04 | \n",
" 7.359608e+05 | \n",
" 7.836765e+05 | \n",
"
\n",
" \n",
" 50% | \n",
" 2008.00000 | \n",
" 5.500000 | \n",
" 1.150930e+05 | \n",
" 1.837300e+06 | \n",
" 1.961558e+06 | \n",
"
\n",
" \n",
" 75% | \n",
" 2012.00000 | \n",
" 7.000000 | \n",
" 2.212182e+05 | \n",
" 3.226836e+06 | \n",
" 3.451084e+06 | \n",
"
\n",
" \n",
" max | \n",
" 2016.00000 | \n",
" 17.300000 | \n",
" 2.316525e+06 | \n",
" 1.819380e+07 | \n",
" 1.920923e+07 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" year unemployment_rate qty_unemployed qty_employed \\\n",
"count 10608.00000 10608.000000 1.060800e+04 1.060800e+04 \n",
"mean 2008.00000 5.943194 1.849809e+05 2.751157e+06 \n",
"std 4.89921 2.261388 2.433025e+05 3.021023e+06 \n",
"min 2000.00000 1.700000 6.697000e+03 2.486420e+05 \n",
"25% 2004.00000 4.400000 4.088675e+04 7.359608e+05 \n",
"50% 2008.00000 5.500000 1.150930e+05 1.837300e+06 \n",
"75% 2012.00000 7.000000 2.212182e+05 3.226836e+06 \n",
"max 2016.00000 17.300000 2.316525e+06 1.819380e+07 \n",
"\n",
" labour_force \n",
"count 1.060800e+04 \n",
"mean 2.936138e+06 \n",
"std 3.242657e+06 \n",
"min 2.619910e+05 \n",
"25% 7.836765e+05 \n",
"50% 1.961558e+06 \n",
"75% 3.451084e+06 \n",
"max 1.920923e+07 "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Some useful aggregation functions in pandas are in the table below:\n",
"\n",
"[insert table]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whilst these aggregations can provide some notions of patterns in the data - often we want to go deeper - and compute these summaries at the level of a group (like a state of a year). Pandas has the `groupby` function that will allow us to efficiently looks at aggregate statistics on subsets of the data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## GroupBy: The Split, Apply, Combine Paradigm\n",
"The canonical example of the groupby paradigm is best represented in a figure:\n",
"\n",
"[insert figure]\n",
"\n",
"In words, GroupBy does the following:\n",
"* The *split* step breaks up a DataFrame into subsets depending on the value of a specified key\n",
"* the *apply* step computes some aggregate functions within an individual group\n",
"* The *combine* step merges the results of the computations within groups back to a single dataset\n",
"\n",
"We could certainly perform all these operations by ourselves, manually - but the tedium of the task combined with a high probability of error makes us immediately want to use a built in function that abstracts away from individual steps and asks the user to rather think of the operation as a whole rather than the individual steps."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a start - let's look at the average unemployment rate by state. As a first step we must specify the name of the column we are looking to split on:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.groupby('state')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Think of this object as an under the hood view of a `DataFrame`. Note that it does not compute anything until an aggregation is applied to it; i.e. it uses 'lazy evaluation'\n",
"\n",
"Now lets get those mena unemployment rates:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"state\n",
"Alabama 6.514706\n",
"Alaska 7.005392\n",
"Arizona 6.381373\n",
"Arkansas 5.997059\n",
"California 7.443627\n",
"Colorado 5.453922\n",
"Connecticut 5.821078\n",
"Delaware 5.214216\n",
"District of Columbia 7.306373\n",
"Florida 6.158824\n",
"Georgia 6.446078\n",
"Hawaii 4.456373\n",
"Idaho 5.525000\n",
"Illinois 6.936275\n",
"Indiana 6.184804\n",
"Iowa 4.373039\n",
"Kansas 5.084314\n",
"Kentucky 6.639216\n",
"Louisiana 6.197549\n",
"Maine 5.470098\n",
"Maryland 5.121078\n",
"Massachusetts 5.521078\n",
"Michigan 7.662255\n",
"Minnesota 4.886765\n",
"Mississippi 7.285784\n",
"Missouri 6.069118\n",
"Montana 5.023039\n",
"Nebraska 3.629412\n",
"Nevada 7.220098\n",
"New Hampshire 4.205882\n",
"New Jersey 6.248039\n",
"New Mexico 5.972059\n",
"New York 6.182353\n",
"North Carolina 6.741176\n",
"North Dakota 3.279902\n",
"Ohio 6.455392\n",
"Oklahoma 4.784804\n",
"Oregon 7.242647\n",
"Pennsylvania 5.946569\n",
"Puerto Rico 12.653922\n",
"Rhode Island 7.000490\n",
"South Carolina 7.130882\n",
"South Dakota 3.579412\n",
"Tennessee 6.369118\n",
"Texas 5.775000\n",
"Utah 4.716667\n",
"Vermont 4.247549\n",
"Virginia 4.543137\n",
"Washington 6.706863\n",
"West Virginia 6.168137\n",
"Wisconsin 5.668627\n",
"Wyoming 4.399510\n",
"Name: unemployment_rate, dtype: float64"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.groupby('state').unemployment_rate.mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice if we wanted the aggregation function to be applied to all of the columns, we can skip the part where we specify the column:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" year | \n",
" unemployment_rate | \n",
" qty_unemployed | \n",
" qty_employed | \n",
" labour_force | \n",
"
\n",
" \n",
" state | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" Alabama | \n",
" 2008.0 | \n",
" 6.514706 | \n",
" 1.407627e+05 | \n",
" 2.016128e+06 | \n",
" 2.156891e+06 | \n",
"
\n",
" \n",
" Alaska | \n",
" 2008.0 | \n",
" 7.005392 | \n",
" 2.449233e+04 | \n",
" 3.251195e+05 | \n",
" 3.496119e+05 | \n",
"
\n",
" \n",
" Arizona | \n",
" 2008.0 | \n",
" 6.381373 | \n",
" 1.901428e+05 | \n",
" 2.759190e+06 | \n",
" 2.949333e+06 | \n",
"
\n",
" \n",
" Arkansas | \n",
" 2008.0 | \n",
" 5.997059 | \n",
" 7.955364e+04 | \n",
" 1.244863e+06 | \n",
" 1.324416e+06 | \n",
"
\n",
" \n",
" California | \n",
" 2008.0 | \n",
" 7.443627 | \n",
" 1.347057e+06 | \n",
" 1.665503e+07 | \n",
" 1.800209e+07 | \n",
"
\n",
" \n",
" Colorado | \n",
" 2008.0 | \n",
" 5.453922 | \n",
" 1.452646e+05 | \n",
" 2.503614e+06 | \n",
" 2.648878e+06 | \n",
"
\n",
" \n",
" Connecticut | \n",
" 2008.0 | \n",
" 5.821078 | \n",
" 1.082514e+05 | \n",
" 1.736436e+06 | \n",
" 1.844688e+06 | \n",
"
\n",
" \n",
" Delaware | \n",
" 2008.0 | \n",
" 5.214216 | \n",
" 2.292001e+04 | \n",
" 4.152155e+05 | \n",
" 4.381355e+05 | \n",
"
\n",
" \n",
" District of Columbia | \n",
" 2008.0 | \n",
" 7.306373 | \n",
" 2.490122e+04 | \n",
" 3.137833e+05 | \n",
" 3.386846e+05 | \n",
"
\n",
" \n",
" Florida | \n",
" 2008.0 | \n",
" 6.158824 | \n",
" 5.573326e+05 | \n",
" 8.397621e+06 | \n",
" 8.954954e+06 | \n",
"
\n",
" \n",
" Georgia | \n",
" 2008.0 | \n",
" 6.446078 | \n",
" 3.016779e+05 | \n",
" 4.339163e+06 | \n",
" 4.640840e+06 | \n",
"
\n",
" \n",
" Hawaii | \n",
" 2008.0 | \n",
" 4.456373 | \n",
" 2.844513e+04 | \n",
" 6.090308e+05 | \n",
" 6.374759e+05 | \n",
"
\n",
" \n",
" Idaho | \n",
" 2008.0 | \n",
" 5.525000 | \n",
" 4.104917e+04 | \n",
" 7.013501e+05 | \n",
" 7.423993e+05 | \n",
"
\n",
" \n",
" Illinois | \n",
" 2008.0 | \n",
" 6.936275 | \n",
" 4.523743e+05 | \n",
" 6.059067e+06 | \n",
" 6.511441e+06 | \n",
"
\n",
" \n",
" Indiana | \n",
" 2008.0 | \n",
" 6.184804 | \n",
" 1.977921e+05 | \n",
" 3.002629e+06 | \n",
" 3.200421e+06 | \n",
"
\n",
" \n",
" Iowa | \n",
" 2008.0 | \n",
" 4.373039 | \n",
" 7.248086e+04 | \n",
" 1.583003e+06 | \n",
" 1.655483e+06 | \n",
"
\n",
" \n",
" Kansas | \n",
" 2008.0 | \n",
" 5.084314 | \n",
" 7.486110e+04 | \n",
" 1.394324e+06 | \n",
" 1.469186e+06 | \n",
"
\n",
" \n",
" Kentucky | \n",
" 2008.0 | \n",
" 6.639216 | \n",
" 1.339159e+05 | \n",
" 1.874098e+06 | \n",
" 2.008013e+06 | \n",
"
\n",
" \n",
" Louisiana | \n",
" 2008.0 | \n",
" 6.197549 | \n",
" 1.283975e+05 | \n",
" 1.938786e+06 | \n",
" 2.067184e+06 | \n",
"
\n",
" \n",
" Maine | \n",
" 2008.0 | \n",
" 5.470098 | \n",
" 3.788920e+04 | \n",
" 6.543850e+05 | \n",
" 6.922742e+05 | \n",
"
\n",
" \n",
" Maryland | \n",
" 2008.0 | \n",
" 5.121078 | \n",
" 1.548394e+05 | \n",
" 2.847295e+06 | \n",
" 3.002135e+06 | \n",
"
\n",
" \n",
" Massachusetts | \n",
" 2008.0 | \n",
" 5.521078 | \n",
" 1.911474e+05 | \n",
" 3.266176e+06 | \n",
" 3.457324e+06 | \n",
"
\n",
" \n",
" Michigan | \n",
" 2008.0 | \n",
" 7.662255 | \n",
" 3.749323e+05 | \n",
" 4.543822e+06 | \n",
" 4.918755e+06 | \n",
"
\n",
" \n",
" Minnesota | \n",
" 2008.0 | \n",
" 4.886765 | \n",
" 1.426212e+05 | \n",
" 2.771413e+06 | \n",
" 2.914034e+06 | \n",
"
\n",
" \n",
" Mississippi | \n",
" 2008.0 | \n",
" 7.285784 | \n",
" 9.449842e+04 | \n",
" 1.201615e+06 | \n",
" 1.296114e+06 | \n",
"
\n",
" \n",
" Missouri | \n",
" 2008.0 | \n",
" 6.069118 | \n",
" 1.842034e+05 | \n",
" 2.849336e+06 | \n",
" 3.033539e+06 | \n",
"
\n",
" \n",
" Montana | \n",
" 2008.0 | \n",
" 5.023039 | \n",
" 2.489118e+04 | \n",
" 4.706648e+05 | \n",
" 4.955559e+05 | \n",
"
\n",
" \n",
" Nebraska | \n",
" 2008.0 | \n",
" 3.629412 | \n",
" 3.576543e+04 | \n",
" 9.494372e+05 | \n",
" 9.852027e+05 | \n",
"
\n",
" \n",
" Nevada | \n",
" 2008.0 | \n",
" 7.220098 | \n",
" 9.513693e+04 | \n",
" 1.193879e+06 | \n",
" 1.289016e+06 | \n",
"
\n",
" \n",
" New Hampshire | \n",
" 2008.0 | \n",
" 4.205882 | \n",
" 3.072605e+04 | \n",
" 6.977644e+05 | \n",
" 7.284904e+05 | \n",
"
\n",
" \n",
" New Jersey | \n",
" 2008.0 | \n",
" 6.248039 | \n",
" 2.798071e+05 | \n",
" 4.176598e+06 | \n",
" 4.456405e+06 | \n",
"
\n",
" \n",
" New Mexico | \n",
" 2008.0 | \n",
" 5.972059 | \n",
" 5.465965e+04 | \n",
" 8.586955e+05 | \n",
" 9.133552e+05 | \n",
"
\n",
" \n",
" New York | \n",
" 2008.0 | \n",
" 6.182353 | \n",
" 5.869909e+05 | \n",
" 8.888198e+06 | \n",
" 9.475189e+06 | \n",
"
\n",
" \n",
" North Carolina | \n",
" 2008.0 | \n",
" 6.741176 | \n",
" 3.046283e+05 | \n",
" 4.188285e+06 | \n",
" 4.492913e+06 | \n",
"
\n",
" \n",
" North Dakota | \n",
" 2008.0 | \n",
" 3.279902 | \n",
" 1.220031e+04 | \n",
" 3.621385e+05 | \n",
" 3.743388e+05 | \n",
"
\n",
" \n",
" Ohio | \n",
" 2008.0 | \n",
" 6.455392 | \n",
" 3.763495e+05 | \n",
" 5.449497e+06 | \n",
" 5.825846e+06 | \n",
"
\n",
" \n",
" Oklahoma | \n",
" 2008.0 | \n",
" 4.784804 | \n",
" 8.368540e+04 | \n",
" 1.661057e+06 | \n",
" 1.744743e+06 | \n",
"
\n",
" \n",
" Oregon | \n",
" 2008.0 | \n",
" 7.242647 | \n",
" 1.389804e+05 | \n",
" 1.776049e+06 | \n",
" 1.915029e+06 | \n",
"
\n",
" \n",
" Pennsylvania | \n",
" 2008.0 | \n",
" 5.946569 | \n",
" 3.771446e+05 | \n",
" 5.952701e+06 | \n",
" 6.329845e+06 | \n",
"
\n",
" \n",
" Puerto Rico | \n",
" 2008.0 | \n",
" 12.653922 | \n",
" 1.603699e+05 | \n",
" 1.112528e+06 | \n",
" 1.272898e+06 | \n",
"
\n",
" \n",
" Rhode Island | \n",
" 2008.0 | \n",
" 7.000490 | \n",
" 3.921885e+04 | \n",
" 5.201720e+05 | \n",
" 5.593909e+05 | \n",
"
\n",
" \n",
" South Carolina | \n",
" 2008.0 | \n",
" 7.130882 | \n",
" 1.517508e+05 | \n",
" 1.967512e+06 | \n",
" 2.119263e+06 | \n",
"
\n",
" \n",
" South Dakota | \n",
" 2008.0 | \n",
" 3.579412 | \n",
" 1.562588e+04 | \n",
" 4.203784e+05 | \n",
" 4.360043e+05 | \n",
"
\n",
" \n",
" Tennessee | \n",
" 2008.0 | \n",
" 6.369118 | \n",
" 1.926002e+05 | \n",
" 2.816181e+06 | \n",
" 3.008781e+06 | \n",
"
\n",
" \n",
" Texas | \n",
" 2008.0 | \n",
" 5.775000 | \n",
" 6.833513e+05 | \n",
" 1.111923e+07 | \n",
" 1.180258e+07 | \n",
"
\n",
" \n",
" Utah | \n",
" 2008.0 | \n",
" 4.716667 | \n",
" 6.224774e+04 | \n",
" 1.261578e+06 | \n",
" 1.323826e+06 | \n",
"
\n",
" \n",
" Vermont | \n",
" 2008.0 | \n",
" 4.247549 | \n",
" 1.492144e+04 | \n",
" 3.346937e+05 | \n",
" 3.496151e+05 | \n",
"
\n",
" \n",
" Virginia | \n",
" 2008.0 | \n",
" 4.543137 | \n",
" 1.844006e+05 | \n",
" 3.830106e+06 | \n",
" 4.014506e+06 | \n",
"
\n",
" \n",
" Washington | \n",
" 2008.0 | \n",
" 6.706863 | \n",
" 2.257521e+05 | \n",
" 3.132887e+06 | \n",
" 3.358639e+06 | \n",
"
\n",
" \n",
" West Virginia | \n",
" 2008.0 | \n",
" 6.168137 | \n",
" 4.937335e+04 | \n",
" 7.508340e+05 | \n",
" 8.002073e+05 | \n",
"
\n",
" \n",
" Wisconsin | \n",
" 2008.0 | \n",
" 5.668627 | \n",
" 1.737884e+05 | \n",
" 2.889477e+06 | \n",
" 3.063265e+06 | \n",
"
\n",
" \n",
" Wyoming | \n",
" 2008.0 | \n",
" 4.399510 | \n",
" 1.283748e+04 | \n",
" 2.771341e+05 | \n",
" 2.899716e+05 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" year unemployment_rate qty_unemployed qty_employed \\\n",
"state \n",
"Alabama 2008.0 6.514706 1.407627e+05 2.016128e+06 \n",
"Alaska 2008.0 7.005392 2.449233e+04 3.251195e+05 \n",
"Arizona 2008.0 6.381373 1.901428e+05 2.759190e+06 \n",
"Arkansas 2008.0 5.997059 7.955364e+04 1.244863e+06 \n",
"California 2008.0 7.443627 1.347057e+06 1.665503e+07 \n",
"Colorado 2008.0 5.453922 1.452646e+05 2.503614e+06 \n",
"Connecticut 2008.0 5.821078 1.082514e+05 1.736436e+06 \n",
"Delaware 2008.0 5.214216 2.292001e+04 4.152155e+05 \n",
"District of Columbia 2008.0 7.306373 2.490122e+04 3.137833e+05 \n",
"Florida 2008.0 6.158824 5.573326e+05 8.397621e+06 \n",
"Georgia 2008.0 6.446078 3.016779e+05 4.339163e+06 \n",
"Hawaii 2008.0 4.456373 2.844513e+04 6.090308e+05 \n",
"Idaho 2008.0 5.525000 4.104917e+04 7.013501e+05 \n",
"Illinois 2008.0 6.936275 4.523743e+05 6.059067e+06 \n",
"Indiana 2008.0 6.184804 1.977921e+05 3.002629e+06 \n",
"Iowa 2008.0 4.373039 7.248086e+04 1.583003e+06 \n",
"Kansas 2008.0 5.084314 7.486110e+04 1.394324e+06 \n",
"Kentucky 2008.0 6.639216 1.339159e+05 1.874098e+06 \n",
"Louisiana 2008.0 6.197549 1.283975e+05 1.938786e+06 \n",
"Maine 2008.0 5.470098 3.788920e+04 6.543850e+05 \n",
"Maryland 2008.0 5.121078 1.548394e+05 2.847295e+06 \n",
"Massachusetts 2008.0 5.521078 1.911474e+05 3.266176e+06 \n",
"Michigan 2008.0 7.662255 3.749323e+05 4.543822e+06 \n",
"Minnesota 2008.0 4.886765 1.426212e+05 2.771413e+06 \n",
"Mississippi 2008.0 7.285784 9.449842e+04 1.201615e+06 \n",
"Missouri 2008.0 6.069118 1.842034e+05 2.849336e+06 \n",
"Montana 2008.0 5.023039 2.489118e+04 4.706648e+05 \n",
"Nebraska 2008.0 3.629412 3.576543e+04 9.494372e+05 \n",
"Nevada 2008.0 7.220098 9.513693e+04 1.193879e+06 \n",
"New Hampshire 2008.0 4.205882 3.072605e+04 6.977644e+05 \n",
"New Jersey 2008.0 6.248039 2.798071e+05 4.176598e+06 \n",
"New Mexico 2008.0 5.972059 5.465965e+04 8.586955e+05 \n",
"New York 2008.0 6.182353 5.869909e+05 8.888198e+06 \n",
"North Carolina 2008.0 6.741176 3.046283e+05 4.188285e+06 \n",
"North Dakota 2008.0 3.279902 1.220031e+04 3.621385e+05 \n",
"Ohio 2008.0 6.455392 3.763495e+05 5.449497e+06 \n",
"Oklahoma 2008.0 4.784804 8.368540e+04 1.661057e+06 \n",
"Oregon 2008.0 7.242647 1.389804e+05 1.776049e+06 \n",
"Pennsylvania 2008.0 5.946569 3.771446e+05 5.952701e+06 \n",
"Puerto Rico 2008.0 12.653922 1.603699e+05 1.112528e+06 \n",
"Rhode Island 2008.0 7.000490 3.921885e+04 5.201720e+05 \n",
"South Carolina 2008.0 7.130882 1.517508e+05 1.967512e+06 \n",
"South Dakota 2008.0 3.579412 1.562588e+04 4.203784e+05 \n",
"Tennessee 2008.0 6.369118 1.926002e+05 2.816181e+06 \n",
"Texas 2008.0 5.775000 6.833513e+05 1.111923e+07 \n",
"Utah 2008.0 4.716667 6.224774e+04 1.261578e+06 \n",
"Vermont 2008.0 4.247549 1.492144e+04 3.346937e+05 \n",
"Virginia 2008.0 4.543137 1.844006e+05 3.830106e+06 \n",
"Washington 2008.0 6.706863 2.257521e+05 3.132887e+06 \n",
"West Virginia 2008.0 6.168137 4.937335e+04 7.508340e+05 \n",
"Wisconsin 2008.0 5.668627 1.737884e+05 2.889477e+06 \n",
"Wyoming 2008.0 4.399510 1.283748e+04 2.771341e+05 \n",
"\n",
" labour_force \n",
"state \n",
"Alabama 2.156891e+06 \n",
"Alaska 3.496119e+05 \n",
"Arizona 2.949333e+06 \n",
"Arkansas 1.324416e+06 \n",
"California 1.800209e+07 \n",
"Colorado 2.648878e+06 \n",
"Connecticut 1.844688e+06 \n",
"Delaware 4.381355e+05 \n",
"District of Columbia 3.386846e+05 \n",
"Florida 8.954954e+06 \n",
"Georgia 4.640840e+06 \n",
"Hawaii 6.374759e+05 \n",
"Idaho 7.423993e+05 \n",
"Illinois 6.511441e+06 \n",
"Indiana 3.200421e+06 \n",
"Iowa 1.655483e+06 \n",
"Kansas 1.469186e+06 \n",
"Kentucky 2.008013e+06 \n",
"Louisiana 2.067184e+06 \n",
"Maine 6.922742e+05 \n",
"Maryland 3.002135e+06 \n",
"Massachusetts 3.457324e+06 \n",
"Michigan 4.918755e+06 \n",
"Minnesota 2.914034e+06 \n",
"Mississippi 1.296114e+06 \n",
"Missouri 3.033539e+06 \n",
"Montana 4.955559e+05 \n",
"Nebraska 9.852027e+05 \n",
"Nevada 1.289016e+06 \n",
"New Hampshire 7.284904e+05 \n",
"New Jersey 4.456405e+06 \n",
"New Mexico 9.133552e+05 \n",
"New York 9.475189e+06 \n",
"North Carolina 4.492913e+06 \n",
"North Dakota 3.743388e+05 \n",
"Ohio 5.825846e+06 \n",
"Oklahoma 1.744743e+06 \n",
"Oregon 1.915029e+06 \n",
"Pennsylvania 6.329845e+06 \n",
"Puerto Rico 1.272898e+06 \n",
"Rhode Island 5.593909e+05 \n",
"South Carolina 2.119263e+06 \n",
"South Dakota 4.360043e+05 \n",
"Tennessee 3.008781e+06 \n",
"Texas 1.180258e+07 \n",
"Utah 1.323826e+06 \n",
"Vermont 3.496151e+05 \n",
"Virginia 4.014506e+06 \n",
"Washington 3.358639e+06 \n",
"West Virginia 8.002073e+05 \n",
"Wisconsin 3.063265e+06 \n",
"Wyoming 2.899716e+05 "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.groupby('state').mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"and if we want multiple, but not all columns:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" unemployment_rate | \n",
" qty_unemployed | \n",
"
\n",
" \n",
" state | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" Alabama | \n",
" 6.514706 | \n",
" 1.407627e+05 | \n",
"
\n",
" \n",
" Alaska | \n",
" 7.005392 | \n",
" 2.449233e+04 | \n",
"
\n",
" \n",
" Arizona | \n",
" 6.381373 | \n",
" 1.901428e+05 | \n",
"
\n",
" \n",
" Arkansas | \n",
" 5.997059 | \n",
" 7.955364e+04 | \n",
"
\n",
" \n",
" California | \n",
" 7.443627 | \n",
" 1.347057e+06 | \n",
"
\n",
" \n",
" Colorado | \n",
" 5.453922 | \n",
" 1.452646e+05 | \n",
"
\n",
" \n",
" Connecticut | \n",
" 5.821078 | \n",
" 1.082514e+05 | \n",
"
\n",
" \n",
" Delaware | \n",
" 5.214216 | \n",
" 2.292001e+04 | \n",
"
\n",
" \n",
" District of Columbia | \n",
" 7.306373 | \n",
" 2.490122e+04 | \n",
"
\n",
" \n",
" Florida | \n",
" 6.158824 | \n",
" 5.573326e+05 | \n",
"
\n",
" \n",
" Georgia | \n",
" 6.446078 | \n",
" 3.016779e+05 | \n",
"
\n",
" \n",
" Hawaii | \n",
" 4.456373 | \n",
" 2.844513e+04 | \n",
"
\n",
" \n",
" Idaho | \n",
" 5.525000 | \n",
" 4.104917e+04 | \n",
"
\n",
" \n",
" Illinois | \n",
" 6.936275 | \n",
" 4.523743e+05 | \n",
"
\n",
" \n",
" Indiana | \n",
" 6.184804 | \n",
" 1.977921e+05 | \n",
"
\n",
" \n",
" Iowa | \n",
" 4.373039 | \n",
" 7.248086e+04 | \n",
"
\n",
" \n",
" Kansas | \n",
" 5.084314 | \n",
" 7.486110e+04 | \n",
"
\n",
" \n",
" Kentucky | \n",
" 6.639216 | \n",
" 1.339159e+05 | \n",
"
\n",
" \n",
" Louisiana | \n",
" 6.197549 | \n",
" 1.283975e+05 | \n",
"
\n",
" \n",
" Maine | \n",
" 5.470098 | \n",
" 3.788920e+04 | \n",
"
\n",
" \n",
" Maryland | \n",
" 5.121078 | \n",
" 1.548394e+05 | \n",
"
\n",
" \n",
" Massachusetts | \n",
" 5.521078 | \n",
" 1.911474e+05 | \n",
"
\n",
" \n",
" Michigan | \n",
" 7.662255 | \n",
" 3.749323e+05 | \n",
"
\n",
" \n",
" Minnesota | \n",
" 4.886765 | \n",
" 1.426212e+05 | \n",
"
\n",
" \n",
" Mississippi | \n",
" 7.285784 | \n",
" 9.449842e+04 | \n",
"
\n",
" \n",
" Missouri | \n",
" 6.069118 | \n",
" 1.842034e+05 | \n",
"
\n",
" \n",
" Montana | \n",
" 5.023039 | \n",
" 2.489118e+04 | \n",
"
\n",
" \n",
" Nebraska | \n",
" 3.629412 | \n",
" 3.576543e+04 | \n",
"
\n",
" \n",
" Nevada | \n",
" 7.220098 | \n",
" 9.513693e+04 | \n",
"
\n",
" \n",
" New Hampshire | \n",
" 4.205882 | \n",
" 3.072605e+04 | \n",
"
\n",
" \n",
" New Jersey | \n",
" 6.248039 | \n",
" 2.798071e+05 | \n",
"
\n",
" \n",
" New Mexico | \n",
" 5.972059 | \n",
" 5.465965e+04 | \n",
"
\n",
" \n",
" New York | \n",
" 6.182353 | \n",
" 5.869909e+05 | \n",
"
\n",
" \n",
" North Carolina | \n",
" 6.741176 | \n",
" 3.046283e+05 | \n",
"
\n",
" \n",
" North Dakota | \n",
" 3.279902 | \n",
" 1.220031e+04 | \n",
"
\n",
" \n",
" Ohio | \n",
" 6.455392 | \n",
" 3.763495e+05 | \n",
"
\n",
" \n",
" Oklahoma | \n",
" 4.784804 | \n",
" 8.368540e+04 | \n",
"
\n",
" \n",
" Oregon | \n",
" 7.242647 | \n",
" 1.389804e+05 | \n",
"
\n",
" \n",
" Pennsylvania | \n",
" 5.946569 | \n",
" 3.771446e+05 | \n",
"
\n",
" \n",
" Puerto Rico | \n",
" 12.653922 | \n",
" 1.603699e+05 | \n",
"
\n",
" \n",
" Rhode Island | \n",
" 7.000490 | \n",
" 3.921885e+04 | \n",
"
\n",
" \n",
" South Carolina | \n",
" 7.130882 | \n",
" 1.517508e+05 | \n",
"
\n",
" \n",
" South Dakota | \n",
" 3.579412 | \n",
" 1.562588e+04 | \n",
"
\n",
" \n",
" Tennessee | \n",
" 6.369118 | \n",
" 1.926002e+05 | \n",
"
\n",
" \n",
" Texas | \n",
" 5.775000 | \n",
" 6.833513e+05 | \n",
"
\n",
" \n",
" Utah | \n",
" 4.716667 | \n",
" 6.224774e+04 | \n",
"
\n",
" \n",
" Vermont | \n",
" 4.247549 | \n",
" 1.492144e+04 | \n",
"
\n",
" \n",
" Virginia | \n",
" 4.543137 | \n",
" 1.844006e+05 | \n",
"
\n",
" \n",
" Washington | \n",
" 6.706863 | \n",
" 2.257521e+05 | \n",
"
\n",
" \n",
" West Virginia | \n",
" 6.168137 | \n",
" 4.937335e+04 | \n",
"
\n",
" \n",
" Wisconsin | \n",
" 5.668627 | \n",
" 1.737884e+05 | \n",
"
\n",
" \n",
" Wyoming | \n",
" 4.399510 | \n",
" 1.283748e+04 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" unemployment_rate qty_unemployed\n",
"state \n",
"Alabama 6.514706 1.407627e+05\n",
"Alaska 7.005392 2.449233e+04\n",
"Arizona 6.381373 1.901428e+05\n",
"Arkansas 5.997059 7.955364e+04\n",
"California 7.443627 1.347057e+06\n",
"Colorado 5.453922 1.452646e+05\n",
"Connecticut 5.821078 1.082514e+05\n",
"Delaware 5.214216 2.292001e+04\n",
"District of Columbia 7.306373 2.490122e+04\n",
"Florida 6.158824 5.573326e+05\n",
"Georgia 6.446078 3.016779e+05\n",
"Hawaii 4.456373 2.844513e+04\n",
"Idaho 5.525000 4.104917e+04\n",
"Illinois 6.936275 4.523743e+05\n",
"Indiana 6.184804 1.977921e+05\n",
"Iowa 4.373039 7.248086e+04\n",
"Kansas 5.084314 7.486110e+04\n",
"Kentucky 6.639216 1.339159e+05\n",
"Louisiana 6.197549 1.283975e+05\n",
"Maine 5.470098 3.788920e+04\n",
"Maryland 5.121078 1.548394e+05\n",
"Massachusetts 5.521078 1.911474e+05\n",
"Michigan 7.662255 3.749323e+05\n",
"Minnesota 4.886765 1.426212e+05\n",
"Mississippi 7.285784 9.449842e+04\n",
"Missouri 6.069118 1.842034e+05\n",
"Montana 5.023039 2.489118e+04\n",
"Nebraska 3.629412 3.576543e+04\n",
"Nevada 7.220098 9.513693e+04\n",
"New Hampshire 4.205882 3.072605e+04\n",
"New Jersey 6.248039 2.798071e+05\n",
"New Mexico 5.972059 5.465965e+04\n",
"New York 6.182353 5.869909e+05\n",
"North Carolina 6.741176 3.046283e+05\n",
"North Dakota 3.279902 1.220031e+04\n",
"Ohio 6.455392 3.763495e+05\n",
"Oklahoma 4.784804 8.368540e+04\n",
"Oregon 7.242647 1.389804e+05\n",
"Pennsylvania 5.946569 3.771446e+05\n",
"Puerto Rico 12.653922 1.603699e+05\n",
"Rhode Island 7.000490 3.921885e+04\n",
"South Carolina 7.130882 1.517508e+05\n",
"South Dakota 3.579412 1.562588e+04\n",
"Tennessee 6.369118 1.926002e+05\n",
"Texas 5.775000 6.833513e+05\n",
"Utah 4.716667 6.224774e+04\n",
"Vermont 4.247549 1.492144e+04\n",
"Virginia 4.543137 1.844006e+05\n",
"Washington 6.706863 2.257521e+05\n",
"West Virginia 6.168137 4.937335e+04\n",
"Wisconsin 5.668627 1.737884e+05\n",
"Wyoming 4.399510 1.283748e+04"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.groupby('state')['unemployment_rate', 'qty_unemployed'].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Very Cool!\n",
"\n",
"## Challenge\n",
"\n",
"Try and compute statistics where the grouping is described by multiple variables:\n",
"1. Compute the average unemployment rate by state-year\n",
"2. Compute the average unemployment rate by state-year-month\n",
"3. Compute the average unemployment rate by state-year for the Carolinas (Hint: use the .filter() function after the computation)\n",
"4. Compute the average unemployment rate by state-year and use `unstack` to view the data in a way that is different from the default output\n",
"\n",
"#### Partial Solution"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"state year\n",
"North Carolina 2000 3.666667\n",
" 2001 5.516667\n",
" 2002 6.658333\n",
" 2003 6.375000\n",
" 2004 5.491667\n",
" 2005 5.233333\n",
" 2006 4.750000\n",
" 2007 4.725000\n",
" 2008 6.100000\n",
" 2009 10.575000\n",
" 2010 10.850000\n",
" 2011 10.275000\n",
" 2012 9.291667\n",
" 2013 7.983333\n",
" 2014 6.300000\n",
" 2015 5.750000\n",
" 2016 5.058333\n",
"South Carolina 2000 3.758333\n",
" 2001 5.216667\n",
" 2002 5.808333\n",
" 2003 6.908333\n",
" 2004 6.825000\n",
" 2005 6.725000\n",
" 2006 6.433333\n",
" 2007 5.666667\n",
" 2008 6.791667\n",
" 2009 11.258333\n",
" 2010 11.158333\n",
" 2011 10.558333\n",
" 2012 9.216667\n",
" 2013 7.641667\n",
" 2014 6.441667\n",
" 2015 5.975000\n",
" 2016 4.841667\n",
"Name: unemployment_rate, dtype: float64"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.groupby(['state', 'year'])['unemployment_rate'].mean().filter(like='Carolina')"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" state | \n",
" Alabama | \n",
" Alaska | \n",
" Arizona | \n",
" Arkansas | \n",
" California | \n",
" Colorado | \n",
" Connecticut | \n",
" Delaware | \n",
" District of Columbia | \n",
" Florida | \n",
" ... | \n",
" South Dakota | \n",
" Tennessee | \n",
" Texas | \n",
" Utah | \n",
" Vermont | \n",
" Virginia | \n",
" Washington | \n",
" West Virginia | \n",
" Wisconsin | \n",
" Wyoming | \n",
"
\n",
" \n",
" year | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2000 | \n",
" 4.583333 | \n",
" 6.375000 | \n",
" 3.950000 | \n",
" 4.258333 | \n",
" 4.941667 | \n",
" 2.758333 | \n",
" 2.391667 | \n",
" 3.733333 | \n",
" 5.608333 | \n",
" 3.683333 | \n",
" ... | \n",
" 2.458333 | \n",
" 3.866667 | \n",
" 4.291667 | \n",
" 3.341667 | \n",
" 2.808333 | \n",
" 2.300000 | \n",
" 5.166667 | \n",
" 5.483333 | \n",
" 3.525000 | \n",
" 3.908333 | \n",
"
\n",
" \n",
" 2001 | \n",
" 5.125000 | \n",
" 6.433333 | \n",
" 4.791667 | \n",
" 4.958333 | \n",
" 5.433333 | \n",
" 3.800000 | \n",
" 3.133333 | \n",
" 3.500000 | \n",
" 6.333333 | \n",
" 4.641667 | \n",
" ... | \n",
" 3.091667 | \n",
" 4.550000 | \n",
" 4.941667 | \n",
" 4.416667 | \n",
" 3.333333 | \n",
" 3.233333 | \n",
" 6.300000 | \n",
" 5.008333 | \n",
" 4.525000 | \n",
" 3.825000 | \n",
"
\n",
" \n",
" 2002 | \n",
" 5.900000 | \n",
" 7.291667 | \n",
" 6.083333 | \n",
" 5.483333 | \n",
" 6.666667 | \n",
" 5.533333 | \n",
" 4.333333 | \n",
" 4.025000 | \n",
" 6.408333 | \n",
" 5.633333 | \n",
" ... | \n",
" 3.183333 | \n",
" 5.183333 | \n",
" 6.341667 | \n",
" 5.800000 | \n",
" 3.991667 | \n",
" 4.216667 | \n",
" 7.408333 | \n",
" 5.916667 | \n",
" 5.375000 | \n",
" 4.041667 | \n",
"
\n",
" \n",
" 2003 | \n",
" 6.016667 | \n",
" 7.825000 | \n",
" 5.741667 | \n",
" 5.933333 | \n",
" 6.800000 | \n",
" 5.958333 | \n",
" 5.391667 | \n",
" 4.266667 | \n",
" 6.833333 | \n",
" 5.208333 | \n",
" ... | \n",
" 3.533333 | \n",
" 5.633333 | \n",
" 6.683333 | \n",
" 5.633333 | \n",
" 4.266667 | \n",
" 4.108333 | \n",
" 7.408333 | \n",
" 6.050000 | \n",
" 5.700000 | \n",
" 4.275000 | \n",
"
\n",
" \n",
" 2004 | \n",
" 5.683333 | \n",
" 7.458333 | \n",
" 5.033333 | \n",
" 5.708333 | \n",
" 6.208333 | \n",
" 5.483333 | \n",
" 4.966667 | \n",
" 3.991667 | \n",
" 7.783333 | \n",
" 4.633333 | \n",
" ... | \n",
" 3.725000 | \n",
" 5.333333 | \n",
" 5.941667 | \n",
" 4.991667 | \n",
" 3.675000 | \n",
" 3.825000 | \n",
" 6.266667 | \n",
" 5.325000 | \n",
" 5.025000 | \n",
" 3.808333 | \n",
"
\n",
" \n",
" 2005 | \n",
" 4.491667 | \n",
" 6.883333 | \n",
" 4.683333 | \n",
" 5.250000 | \n",
" 5.400000 | \n",
" 5.025000 | \n",
" 4.883333 | \n",
" 4.141667 | \n",
" 6.375000 | \n",
" 3.691667 | \n",
" ... | \n",
" 3.816667 | \n",
" 5.550000 | \n",
" 5.408333 | \n",
" 4.091667 | \n",
" 3.475000 | \n",
" 3.566667 | \n",
" 5.550000 | \n",
" 5.108333 | \n",
" 4.741667 | \n",
" 3.583333 | \n",
"
\n",
" \n",
" 2006 | \n",
" 4.033333 | \n",
" 6.616667 | \n",
" 4.208333 | \n",
" 5.158333 | \n",
" 4.900000 | \n",
" 4.308333 | \n",
" 4.333333 | \n",
" 3.575000 | \n",
" 5.791667 | \n",
" 3.225000 | \n",
" ... | \n",
" 3.108333 | \n",
" 5.200000 | \n",
" 4.891667 | \n",
" 2.975000 | \n",
" 3.683333 | \n",
" 3.100000 | \n",
" 5.041667 | \n",
" 4.941667 | \n",
" 4.741667 | \n",
" 3.175000 | \n",
"
\n",
" \n",
" 2007 | \n",
" 3.983333 | \n",
" 6.333333 | \n",
" 3.858333 | \n",
" 5.308333 | \n",
" 5.375000 | \n",
" 3.733333 | \n",
" 4.475000 | \n",
" 3.441667 | \n",
" 5.525000 | \n",
" 4.008333 | \n",
" ... | \n",
" 2.841667 | \n",
" 4.675000 | \n",
" 4.291667 | \n",
" 2.566667 | \n",
" 4.008333 | \n",
" 3.033333 | \n",
" 4.708333 | \n",
" 4.583333 | \n",
" 4.900000 | \n",
" 2.816667 | \n",
"
\n",
" \n",
" 2008 | \n",
" 5.658333 | \n",
" 6.691667 | \n",
" 6.125000 | \n",
" 5.466667 | \n",
" 7.275000 | \n",
" 4.825000 | \n",
" 5.683333 | \n",
" 4.941667 | \n",
" 6.458333 | \n",
" 6.283333 | \n",
" ... | \n",
" 3.050000 | \n",
" 6.575000 | \n",
" 4.808333 | \n",
" 3.591667 | \n",
" 4.683333 | \n",
" 3.941667 | \n",
" 5.383333 | \n",
" 4.358333 | \n",
" 4.908333 | \n",
" 3.050000 | \n",
"
\n",
" \n",
" 2009 | \n",
" 11.033333 | \n",
" 7.741667 | \n",
" 9.933333 | \n",
" 7.850000 | \n",
" 11.141667 | \n",
" 7.283333 | \n",
" 7.933333 | \n",
" 8.266667 | \n",
" 9.275000 | \n",
" 10.408333 | \n",
" ... | \n",
" 4.925000 | \n",
" 10.483333 | \n",
" 7.558333 | \n",
" 7.325000 | \n",
" 6.591667 | \n",
" 6.700000 | \n",
" 9.141667 | \n",
" 7.725000 | \n",
" 8.575000 | \n",
" 6.325000 | \n",
"
\n",
" \n",
" 2010 | \n",
" 10.541667 | \n",
" 7.883333 | \n",
" 10.366667 | \n",
" 8.191667 | \n",
" 12.225000 | \n",
" 8.725000 | \n",
" 9.108333 | \n",
" 8.416667 | \n",
" 9.416667 | \n",
" 11.041667 | \n",
" ... | \n",
" 4.983333 | \n",
" 9.675000 | \n",
" 8.150000 | \n",
" 7.833333 | \n",
" 6.091667 | \n",
" 7.150000 | \n",
" 10.000000 | \n",
" 8.641667 | \n",
" 8.666667 | \n",
" 6.458333 | \n",
"
\n",
" \n",
" 2011 | \n",
" 9.641667 | \n",
" 7.591667 | \n",
" 9.500000 | \n",
" 8.308333 | \n",
" 11.716667 | \n",
" 8.358333 | \n",
" 8.808333 | \n",
" 7.500000 | \n",
" 10.158333 | \n",
" 9.991667 | \n",
" ... | \n",
" 4.741667 | \n",
" 8.991667 | \n",
" 7.758333 | \n",
" 6.733333 | \n",
" 5.475000 | \n",
" 6.591667 | \n",
" 9.291667 | \n",
" 8.058333 | \n",
" 7.758333 | \n",
" 5.808333 | \n",
"
\n",
" \n",
" 2012 | \n",
" 7.941667 | \n",
" 7.141667 | \n",
" 8.350000 | \n",
" 7.575000 | \n",
" 10.375000 | \n",
" 7.875000 | \n",
" 8.341667 | \n",
" 7.200000 | \n",
" 9.050000 | \n",
" 8.458333 | \n",
" ... | \n",
" 4.275000 | \n",
" 7.841667 | \n",
" 6.725000 | \n",
" 5.408333 | \n",
" 4.950000 | \n",
" 6.033333 | \n",
" 8.133333 | \n",
" 7.508333 | \n",
" 7.041667 | \n",
" 5.316667 | \n",
"
\n",
" \n",
" 2013 | \n",
" 7.225000 | \n",
" 6.925000 | \n",
" 7.741667 | \n",
" 7.341667 | \n",
" 8.933333 | \n",
" 6.825000 | \n",
" 7.750000 | \n",
" 6.716667 | \n",
" 8.500000 | \n",
" 7.233333 | \n",
" ... | \n",
" 3.783333 | \n",
" 7.791667 | \n",
" 6.225000 | \n",
" 4.600000 | \n",
" 4.416667 | \n",
" 5.716667 | \n",
" 7.033333 | \n",
" 6.766667 | \n",
" 6.741667 | \n",
" 4.708333 | \n",
"
\n",
" \n",
" 2014 | \n",
" 6.791667 | \n",
" 6.841667 | \n",
" 6.775000 | \n",
" 6.091667 | \n",
" 7.500000 | \n",
" 5.000000 | \n",
" 6.608333 | \n",
" 5.708333 | \n",
" 7.775000 | \n",
" 6.266667 | \n",
" ... | \n",
" 3.433333 | \n",
" 6.516667 | \n",
" 5.100000 | \n",
" 3.816667 | \n",
" 3.950000 | \n",
" 5.241667 | \n",
" 6.125000 | \n",
" 6.633333 | \n",
" 5.416667 | \n",
" 4.158333 | \n",
"
\n",
" \n",
" 2015 | \n",
" 6.108333 | \n",
" 6.441667 | \n",
" 6.033333 | \n",
" 5.075000 | \n",
" 6.216667 | \n",
" 3.908333 | \n",
" 5.733333 | \n",
" 4.833333 | \n",
" 6.900000 | \n",
" 5.416667 | \n",
" ... | \n",
" 3.075000 | \n",
" 5.600000 | \n",
" 4.441667 | \n",
" 3.633333 | \n",
" 3.566667 | \n",
" 4.466667 | \n",
" 5.625000 | \n",
" 6.750000 | \n",
" 4.575000 | \n",
" 4.250000 | \n",
"
\n",
" \n",
" 2016 | \n",
" 5.991667 | \n",
" 6.616667 | \n",
" 5.308333 | \n",
" 3.991667 | \n",
" 5.433333 | \n",
" 3.316667 | \n",
" 5.083333 | \n",
" 4.383333 | \n",
" 6.016667 | \n",
" 4.875000 | \n",
" ... | \n",
" 2.825000 | \n",
" 4.808333 | \n",
" 4.616667 | \n",
" 3.425000 | \n",
" 3.241667 | \n",
" 4.008333 | \n",
" 5.433333 | \n",
" 6.000000 | \n",
" 4.150000 | \n",
" 5.283333 | \n",
"
\n",
" \n",
"
\n",
"
17 rows × 52 columns
\n",
"
"
],
"text/plain": [
"state Alabama Alaska Arizona Arkansas California Colorado \\\n",
"year \n",
"2000 4.583333 6.375000 3.950000 4.258333 4.941667 2.758333 \n",
"2001 5.125000 6.433333 4.791667 4.958333 5.433333 3.800000 \n",
"2002 5.900000 7.291667 6.083333 5.483333 6.666667 5.533333 \n",
"2003 6.016667 7.825000 5.741667 5.933333 6.800000 5.958333 \n",
"2004 5.683333 7.458333 5.033333 5.708333 6.208333 5.483333 \n",
"2005 4.491667 6.883333 4.683333 5.250000 5.400000 5.025000 \n",
"2006 4.033333 6.616667 4.208333 5.158333 4.900000 4.308333 \n",
"2007 3.983333 6.333333 3.858333 5.308333 5.375000 3.733333 \n",
"2008 5.658333 6.691667 6.125000 5.466667 7.275000 4.825000 \n",
"2009 11.033333 7.741667 9.933333 7.850000 11.141667 7.283333 \n",
"2010 10.541667 7.883333 10.366667 8.191667 12.225000 8.725000 \n",
"2011 9.641667 7.591667 9.500000 8.308333 11.716667 8.358333 \n",
"2012 7.941667 7.141667 8.350000 7.575000 10.375000 7.875000 \n",
"2013 7.225000 6.925000 7.741667 7.341667 8.933333 6.825000 \n",
"2014 6.791667 6.841667 6.775000 6.091667 7.500000 5.000000 \n",
"2015 6.108333 6.441667 6.033333 5.075000 6.216667 3.908333 \n",
"2016 5.991667 6.616667 5.308333 3.991667 5.433333 3.316667 \n",
"\n",
"state Connecticut Delaware District of Columbia Florida ... \\\n",
"year ... \n",
"2000 2.391667 3.733333 5.608333 3.683333 ... \n",
"2001 3.133333 3.500000 6.333333 4.641667 ... \n",
"2002 4.333333 4.025000 6.408333 5.633333 ... \n",
"2003 5.391667 4.266667 6.833333 5.208333 ... \n",
"2004 4.966667 3.991667 7.783333 4.633333 ... \n",
"2005 4.883333 4.141667 6.375000 3.691667 ... \n",
"2006 4.333333 3.575000 5.791667 3.225000 ... \n",
"2007 4.475000 3.441667 5.525000 4.008333 ... \n",
"2008 5.683333 4.941667 6.458333 6.283333 ... \n",
"2009 7.933333 8.266667 9.275000 10.408333 ... \n",
"2010 9.108333 8.416667 9.416667 11.041667 ... \n",
"2011 8.808333 7.500000 10.158333 9.991667 ... \n",
"2012 8.341667 7.200000 9.050000 8.458333 ... \n",
"2013 7.750000 6.716667 8.500000 7.233333 ... \n",
"2014 6.608333 5.708333 7.775000 6.266667 ... \n",
"2015 5.733333 4.833333 6.900000 5.416667 ... \n",
"2016 5.083333 4.383333 6.016667 4.875000 ... \n",
"\n",
"state South Dakota Tennessee Texas Utah Vermont Virginia \\\n",
"year \n",
"2000 2.458333 3.866667 4.291667 3.341667 2.808333 2.300000 \n",
"2001 3.091667 4.550000 4.941667 4.416667 3.333333 3.233333 \n",
"2002 3.183333 5.183333 6.341667 5.800000 3.991667 4.216667 \n",
"2003 3.533333 5.633333 6.683333 5.633333 4.266667 4.108333 \n",
"2004 3.725000 5.333333 5.941667 4.991667 3.675000 3.825000 \n",
"2005 3.816667 5.550000 5.408333 4.091667 3.475000 3.566667 \n",
"2006 3.108333 5.200000 4.891667 2.975000 3.683333 3.100000 \n",
"2007 2.841667 4.675000 4.291667 2.566667 4.008333 3.033333 \n",
"2008 3.050000 6.575000 4.808333 3.591667 4.683333 3.941667 \n",
"2009 4.925000 10.483333 7.558333 7.325000 6.591667 6.700000 \n",
"2010 4.983333 9.675000 8.150000 7.833333 6.091667 7.150000 \n",
"2011 4.741667 8.991667 7.758333 6.733333 5.475000 6.591667 \n",
"2012 4.275000 7.841667 6.725000 5.408333 4.950000 6.033333 \n",
"2013 3.783333 7.791667 6.225000 4.600000 4.416667 5.716667 \n",
"2014 3.433333 6.516667 5.100000 3.816667 3.950000 5.241667 \n",
"2015 3.075000 5.600000 4.441667 3.633333 3.566667 4.466667 \n",
"2016 2.825000 4.808333 4.616667 3.425000 3.241667 4.008333 \n",
"\n",
"state Washington West Virginia Wisconsin Wyoming \n",
"year \n",
"2000 5.166667 5.483333 3.525000 3.908333 \n",
"2001 6.300000 5.008333 4.525000 3.825000 \n",
"2002 7.408333 5.916667 5.375000 4.041667 \n",
"2003 7.408333 6.050000 5.700000 4.275000 \n",
"2004 6.266667 5.325000 5.025000 3.808333 \n",
"2005 5.550000 5.108333 4.741667 3.583333 \n",
"2006 5.041667 4.941667 4.741667 3.175000 \n",
"2007 4.708333 4.583333 4.900000 2.816667 \n",
"2008 5.383333 4.358333 4.908333 3.050000 \n",
"2009 9.141667 7.725000 8.575000 6.325000 \n",
"2010 10.000000 8.641667 8.666667 6.458333 \n",
"2011 9.291667 8.058333 7.758333 5.808333 \n",
"2012 8.133333 7.508333 7.041667 5.316667 \n",
"2013 7.033333 6.766667 6.741667 4.708333 \n",
"2014 6.125000 6.633333 5.416667 4.158333 \n",
"2015 5.625000 6.750000 4.575000 4.250000 \n",
"2016 5.433333 6.000000 4.150000 5.283333 \n",
"\n",
"[17 rows x 52 columns]"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.groupby(['state', 'year'])['unemployment_rate'].mean().unstack(level=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## More useful GroupBy functionality\n",
"\n",
"Often we want more than one aggregate statistic, for example we may want the min, mean and max by State. We can use the `.aggregate()` function to get these:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" amin | \n",
" mean | \n",
" amax | \n",
"
\n",
" \n",
" state | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" Alabama | \n",
" 3.3 | \n",
" 6.514706 | \n",
" 12.3 | \n",
"
\n",
" \n",
" Alaska | \n",
" 5.2 | \n",
" 7.005392 | \n",
" 9.1 | \n",
"
\n",
" \n",
" Arizona | \n",
" 3.4 | \n",
" 6.381373 | \n",
" 11.1 | \n",
"
\n",
" \n",
" Arkansas | \n",
" 3.4 | \n",
" 5.997059 | \n",
" 9.5 | \n",
"
\n",
" \n",
" California | \n",
" 4.4 | \n",
" 7.443627 | \n",
" 12.7 | \n",
"
\n",
" \n",
" Colorado | \n",
" 2.4 | \n",
" 5.453922 | \n",
" 9.2 | \n",
"
\n",
" \n",
" Connecticut | \n",
" 1.7 | \n",
" 5.821078 | \n",
" 9.8 | \n",
"
\n",
" \n",
" Delaware | \n",
" 2.9 | \n",
" 5.214216 | \n",
" 9.4 | \n",
"
\n",
" \n",
" District of Columbia | \n",
" 4.7 | \n",
" 7.306373 | \n",
" 10.9 | \n",
"
\n",
" \n",
" Florida | \n",
" 2.8 | \n",
" 6.158824 | \n",
" 11.5 | \n",
"
\n",
" \n",
" Georgia | \n",
" 3.0 | \n",
" 6.446078 | \n",
" 10.9 | \n",
"
\n",
" \n",
" Hawaii | \n",
" 1.9 | \n",
" 4.456373 | \n",
" 7.9 | \n",
"
\n",
" \n",
" Idaho | \n",
" 2.2 | \n",
" 5.525000 | \n",
" 10.5 | \n",
"
\n",
" \n",
" Illinois | \n",
" 3.6 | \n",
" 6.936275 | \n",
" 12.2 | \n",
"
\n",
" \n",
" Indiana | \n",
" 2.2 | \n",
" 6.184804 | \n",
" 11.8 | \n",
"
\n",
" \n",
" Iowa | \n",
" 2.3 | \n",
" 4.373039 | \n",
" 7.3 | \n",
"
\n",
" \n",
" Kansas | \n",
" 3.2 | \n",
" 5.084314 | \n",
" 7.9 | \n",
"
\n",
" \n",
" Kentucky | \n",
" 3.7 | \n",
" 6.639216 | \n",
" 11.9 | \n",
"
\n",
" \n",
" Louisiana | \n",
" 3.6 | \n",
" 6.197549 | \n",
" 10.9 | \n",
"
\n",
" \n",
" Maine | \n",
" 2.6 | \n",
" 5.470098 | \n",
" 9.5 | \n",
"
\n",
" \n",
" Maryland | \n",
" 3.1 | \n",
" 5.121078 | \n",
" 8.3 | \n",
"
\n",
" \n",
" Massachusetts | \n",
" 2.1 | \n",
" 5.521078 | \n",
" 9.6 | \n",
"
\n",
" \n",
" Michigan | \n",
" 2.9 | \n",
" 7.662255 | \n",
" 15.4 | \n",
"
\n",
" \n",
" Minnesota | \n",
" 2.7 | \n",
" 4.886765 | \n",
" 8.6 | \n",
"
\n",
" \n",
" Mississippi | \n",
" 4.6 | \n",
" 7.285784 | \n",
" 11.7 | \n",
"
\n",
" \n",
" Missouri | \n",
" 3.0 | \n",
" 6.069118 | \n",
" 10.5 | \n",
"
\n",
" \n",
" Montana | \n",
" 2.7 | \n",
" 5.023039 | \n",
" 8.6 | \n",
"
\n",
" \n",
" Nebraska | \n",
" 2.6 | \n",
" 3.629412 | \n",
" 5.4 | \n",
"
\n",
" \n",
" Nevada | \n",
" 3.7 | \n",
" 7.220098 | \n",
" 13.9 | \n",
"
\n",
" \n",
" New Hampshire | \n",
" 2.3 | \n",
" 4.205882 | \n",
" 7.0 | \n",
"
\n",
" \n",
" New Jersey | \n",
" 3.3 | \n",
" 6.248039 | \n",
" 10.3 | \n",
"
\n",
" \n",
" New Mexico | \n",
" 3.4 | \n",
" 5.972059 | \n",
" 9.0 | \n",
"
\n",
" \n",
" New York | \n",
" 3.9 | \n",
" 6.182353 | \n",
" 9.6 | \n",
"
\n",
" \n",
" North Carolina | \n",
" 3.0 | \n",
" 6.741176 | \n",
" 12.0 | \n",
"
\n",
" \n",
" North Dakota | \n",
" 2.0 | \n",
" 3.279902 | \n",
" 5.1 | \n",
"
\n",
" \n",
" Ohio | \n",
" 3.4 | \n",
" 6.455392 | \n",
" 12.0 | \n",
"
\n",
" \n",
" Oklahoma | \n",
" 2.6 | \n",
" 4.784804 | \n",
" 7.6 | \n",
"
\n",
" \n",
" Oregon | \n",
" 4.2 | \n",
" 7.242647 | \n",
" 12.6 | \n",
"
\n",
" \n",
" Pennsylvania | \n",
" 3.6 | \n",
" 5.946569 | \n",
" 9.5 | \n",
"
\n",
" \n",
" Puerto Rico | \n",
" 8.9 | \n",
" 12.653922 | \n",
" 17.3 | \n",
"
\n",
" \n",
" Rhode Island | \n",
" 3.6 | \n",
" 7.000490 | \n",
" 12.2 | \n",
"
\n",
" \n",
" South Carolina | \n",
" 3.2 | \n",
" 7.130882 | \n",
" 12.4 | \n",
"
\n",
" \n",
" South Dakota | \n",
" 2.0 | \n",
" 3.579412 | \n",
" 5.9 | \n",
"
\n",
" \n",
" Tennessee | \n",
" 3.3 | \n",
" 6.369118 | \n",
" 11.3 | \n",
"
\n",
" \n",
" Texas | \n",
" 3.5 | \n",
" 5.775000 | \n",
" 8.6 | \n",
"
\n",
" \n",
" Utah | \n",
" 2.1 | \n",
" 4.716667 | \n",
" 8.4 | \n",
"
\n",
" \n",
" Vermont | \n",
" 2.3 | \n",
" 4.247549 | \n",
" 7.6 | \n",
"
\n",
" \n",
" Virginia | \n",
" 1.8 | \n",
" 4.543137 | \n",
" 7.9 | \n",
"
\n",
" \n",
" Washington | \n",
" 4.2 | \n",
" 6.706863 | \n",
" 11.3 | \n",
"
\n",
" \n",
" West Virginia | \n",
" 3.6 | \n",
" 6.168137 | \n",
" 10.0 | \n",
"
\n",
" \n",
" Wisconsin | \n",
" 2.8 | \n",
" 5.668627 | \n",
" 10.4 | \n",
"
\n",
" \n",
" Wyoming | \n",
" 2.4 | \n",
" 4.399510 | \n",
" 8.1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" amin mean amax\n",
"state \n",
"Alabama 3.3 6.514706 12.3\n",
"Alaska 5.2 7.005392 9.1\n",
"Arizona 3.4 6.381373 11.1\n",
"Arkansas 3.4 5.997059 9.5\n",
"California 4.4 7.443627 12.7\n",
"Colorado 2.4 5.453922 9.2\n",
"Connecticut 1.7 5.821078 9.8\n",
"Delaware 2.9 5.214216 9.4\n",
"District of Columbia 4.7 7.306373 10.9\n",
"Florida 2.8 6.158824 11.5\n",
"Georgia 3.0 6.446078 10.9\n",
"Hawaii 1.9 4.456373 7.9\n",
"Idaho 2.2 5.525000 10.5\n",
"Illinois 3.6 6.936275 12.2\n",
"Indiana 2.2 6.184804 11.8\n",
"Iowa 2.3 4.373039 7.3\n",
"Kansas 3.2 5.084314 7.9\n",
"Kentucky 3.7 6.639216 11.9\n",
"Louisiana 3.6 6.197549 10.9\n",
"Maine 2.6 5.470098 9.5\n",
"Maryland 3.1 5.121078 8.3\n",
"Massachusetts 2.1 5.521078 9.6\n",
"Michigan 2.9 7.662255 15.4\n",
"Minnesota 2.7 4.886765 8.6\n",
"Mississippi 4.6 7.285784 11.7\n",
"Missouri 3.0 6.069118 10.5\n",
"Montana 2.7 5.023039 8.6\n",
"Nebraska 2.6 3.629412 5.4\n",
"Nevada 3.7 7.220098 13.9\n",
"New Hampshire 2.3 4.205882 7.0\n",
"New Jersey 3.3 6.248039 10.3\n",
"New Mexico 3.4 5.972059 9.0\n",
"New York 3.9 6.182353 9.6\n",
"North Carolina 3.0 6.741176 12.0\n",
"North Dakota 2.0 3.279902 5.1\n",
"Ohio 3.4 6.455392 12.0\n",
"Oklahoma 2.6 4.784804 7.6\n",
"Oregon 4.2 7.242647 12.6\n",
"Pennsylvania 3.6 5.946569 9.5\n",
"Puerto Rico 8.9 12.653922 17.3\n",
"Rhode Island 3.6 7.000490 12.2\n",
"South Carolina 3.2 7.130882 12.4\n",
"South Dakota 2.0 3.579412 5.9\n",
"Tennessee 3.3 6.369118 11.3\n",
"Texas 3.5 5.775000 8.6\n",
"Utah 2.1 4.716667 8.4\n",
"Vermont 2.3 4.247549 7.6\n",
"Virginia 1.8 4.543137 7.9\n",
"Washington 4.2 6.706863 11.3\n",
"West Virginia 3.6 6.168137 10.0\n",
"Wisconsin 2.8 5.668627 10.4\n",
"Wyoming 2.4 4.399510 8.1"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.groupby('state')['unemployment_rate'].aggregate([np.min, np.mean, np.max])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can dress it up a little too if we like:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Mean Unemploy. | \n",
" Min. Unemploy. | \n",
" Max. Unemploy. | \n",
"
\n",
" \n",
" state | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" Alabama | \n",
" 6.514706 | \n",
" 3.3 | \n",
" 12.3 | \n",
"
\n",
" \n",
" Alaska | \n",
" 7.005392 | \n",
" 5.2 | \n",
" 9.1 | \n",
"
\n",
" \n",
" Arizona | \n",
" 6.381373 | \n",
" 3.4 | \n",
" 11.1 | \n",
"
\n",
" \n",
" Arkansas | \n",
" 5.997059 | \n",
" 3.4 | \n",
" 9.5 | \n",
"
\n",
" \n",
" California | \n",
" 7.443627 | \n",
" 4.4 | \n",
" 12.7 | \n",
"
\n",
" \n",
" Colorado | \n",
" 5.453922 | \n",
" 2.4 | \n",
" 9.2 | \n",
"
\n",
" \n",
" Connecticut | \n",
" 5.821078 | \n",
" 1.7 | \n",
" 9.8 | \n",
"
\n",
" \n",
" Delaware | \n",
" 5.214216 | \n",
" 2.9 | \n",
" 9.4 | \n",
"
\n",
" \n",
" District of Columbia | \n",
" 7.306373 | \n",
" 4.7 | \n",
" 10.9 | \n",
"
\n",
" \n",
" Florida | \n",
" 6.158824 | \n",
" 2.8 | \n",
" 11.5 | \n",
"
\n",
" \n",
" Georgia | \n",
" 6.446078 | \n",
" 3.0 | \n",
" 10.9 | \n",
"
\n",
" \n",
" Hawaii | \n",
" 4.456373 | \n",
" 1.9 | \n",
" 7.9 | \n",
"
\n",
" \n",
" Idaho | \n",
" 5.525000 | \n",
" 2.2 | \n",
" 10.5 | \n",
"
\n",
" \n",
" Illinois | \n",
" 6.936275 | \n",
" 3.6 | \n",
" 12.2 | \n",
"
\n",
" \n",
" Indiana | \n",
" 6.184804 | \n",
" 2.2 | \n",
" 11.8 | \n",
"
\n",
" \n",
" Iowa | \n",
" 4.373039 | \n",
" 2.3 | \n",
" 7.3 | \n",
"
\n",
" \n",
" Kansas | \n",
" 5.084314 | \n",
" 3.2 | \n",
" 7.9 | \n",
"
\n",
" \n",
" Kentucky | \n",
" 6.639216 | \n",
" 3.7 | \n",
" 11.9 | \n",
"
\n",
" \n",
" Louisiana | \n",
" 6.197549 | \n",
" 3.6 | \n",
" 10.9 | \n",
"
\n",
" \n",
" Maine | \n",
" 5.470098 | \n",
" 2.6 | \n",
" 9.5 | \n",
"
\n",
" \n",
" Maryland | \n",
" 5.121078 | \n",
" 3.1 | \n",
" 8.3 | \n",
"
\n",
" \n",
" Massachusetts | \n",
" 5.521078 | \n",
" 2.1 | \n",
" 9.6 | \n",
"
\n",
" \n",
" Michigan | \n",
" 7.662255 | \n",
" 2.9 | \n",
" 15.4 | \n",
"
\n",
" \n",
" Minnesota | \n",
" 4.886765 | \n",
" 2.7 | \n",
" 8.6 | \n",
"
\n",
" \n",
" Mississippi | \n",
" 7.285784 | \n",
" 4.6 | \n",
" 11.7 | \n",
"
\n",
" \n",
" Missouri | \n",
" 6.069118 | \n",
" 3.0 | \n",
" 10.5 | \n",
"
\n",
" \n",
" Montana | \n",
" 5.023039 | \n",
" 2.7 | \n",
" 8.6 | \n",
"
\n",
" \n",
" Nebraska | \n",
" 3.629412 | \n",
" 2.6 | \n",
" 5.4 | \n",
"
\n",
" \n",
" Nevada | \n",
" 7.220098 | \n",
" 3.7 | \n",
" 13.9 | \n",
"
\n",
" \n",
" New Hampshire | \n",
" 4.205882 | \n",
" 2.3 | \n",
" 7.0 | \n",
"
\n",
" \n",
" New Jersey | \n",
" 6.248039 | \n",
" 3.3 | \n",
" 10.3 | \n",
"
\n",
" \n",
" New Mexico | \n",
" 5.972059 | \n",
" 3.4 | \n",
" 9.0 | \n",
"
\n",
" \n",
" New York | \n",
" 6.182353 | \n",
" 3.9 | \n",
" 9.6 | \n",
"
\n",
" \n",
" North Carolina | \n",
" 6.741176 | \n",
" 3.0 | \n",
" 12.0 | \n",
"
\n",
" \n",
" North Dakota | \n",
" 3.279902 | \n",
" 2.0 | \n",
" 5.1 | \n",
"
\n",
" \n",
" Ohio | \n",
" 6.455392 | \n",
" 3.4 | \n",
" 12.0 | \n",
"
\n",
" \n",
" Oklahoma | \n",
" 4.784804 | \n",
" 2.6 | \n",
" 7.6 | \n",
"
\n",
" \n",
" Oregon | \n",
" 7.242647 | \n",
" 4.2 | \n",
" 12.6 | \n",
"
\n",
" \n",
" Pennsylvania | \n",
" 5.946569 | \n",
" 3.6 | \n",
" 9.5 | \n",
"
\n",
" \n",
" Puerto Rico | \n",
" 12.653922 | \n",
" 8.9 | \n",
" 17.3 | \n",
"
\n",
" \n",
" Rhode Island | \n",
" 7.000490 | \n",
" 3.6 | \n",
" 12.2 | \n",
"
\n",
" \n",
" South Carolina | \n",
" 7.130882 | \n",
" 3.2 | \n",
" 12.4 | \n",
"
\n",
" \n",
" South Dakota | \n",
" 3.579412 | \n",
" 2.0 | \n",
" 5.9 | \n",
"
\n",
" \n",
" Tennessee | \n",
" 6.369118 | \n",
" 3.3 | \n",
" 11.3 | \n",
"
\n",
" \n",
" Texas | \n",
" 5.775000 | \n",
" 3.5 | \n",
" 8.6 | \n",
"
\n",
" \n",
" Utah | \n",
" 4.716667 | \n",
" 2.1 | \n",
" 8.4 | \n",
"
\n",
" \n",
" Vermont | \n",
" 4.247549 | \n",
" 2.3 | \n",
" 7.6 | \n",
"
\n",
" \n",
" Virginia | \n",
" 4.543137 | \n",
" 1.8 | \n",
" 7.9 | \n",
"
\n",
" \n",
" Washington | \n",
" 6.706863 | \n",
" 4.2 | \n",
" 11.3 | \n",
"
\n",
" \n",
" West Virginia | \n",
" 6.168137 | \n",
" 3.6 | \n",
" 10.0 | \n",
"
\n",
" \n",
" Wisconsin | \n",
" 5.668627 | \n",
" 2.8 | \n",
" 10.4 | \n",
"
\n",
" \n",
" Wyoming | \n",
" 4.399510 | \n",
" 2.4 | \n",
" 8.1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Mean Unemploy. Min. Unemploy. Max. Unemploy.\n",
"state \n",
"Alabama 6.514706 3.3 12.3\n",
"Alaska 7.005392 5.2 9.1\n",
"Arizona 6.381373 3.4 11.1\n",
"Arkansas 5.997059 3.4 9.5\n",
"California 7.443627 4.4 12.7\n",
"Colorado 5.453922 2.4 9.2\n",
"Connecticut 5.821078 1.7 9.8\n",
"Delaware 5.214216 2.9 9.4\n",
"District of Columbia 7.306373 4.7 10.9\n",
"Florida 6.158824 2.8 11.5\n",
"Georgia 6.446078 3.0 10.9\n",
"Hawaii 4.456373 1.9 7.9\n",
"Idaho 5.525000 2.2 10.5\n",
"Illinois 6.936275 3.6 12.2\n",
"Indiana 6.184804 2.2 11.8\n",
"Iowa 4.373039 2.3 7.3\n",
"Kansas 5.084314 3.2 7.9\n",
"Kentucky 6.639216 3.7 11.9\n",
"Louisiana 6.197549 3.6 10.9\n",
"Maine 5.470098 2.6 9.5\n",
"Maryland 5.121078 3.1 8.3\n",
"Massachusetts 5.521078 2.1 9.6\n",
"Michigan 7.662255 2.9 15.4\n",
"Minnesota 4.886765 2.7 8.6\n",
"Mississippi 7.285784 4.6 11.7\n",
"Missouri 6.069118 3.0 10.5\n",
"Montana 5.023039 2.7 8.6\n",
"Nebraska 3.629412 2.6 5.4\n",
"Nevada 7.220098 3.7 13.9\n",
"New Hampshire 4.205882 2.3 7.0\n",
"New Jersey 6.248039 3.3 10.3\n",
"New Mexico 5.972059 3.4 9.0\n",
"New York 6.182353 3.9 9.6\n",
"North Carolina 6.741176 3.0 12.0\n",
"North Dakota 3.279902 2.0 5.1\n",
"Ohio 6.455392 3.4 12.0\n",
"Oklahoma 4.784804 2.6 7.6\n",
"Oregon 7.242647 4.2 12.6\n",
"Pennsylvania 5.946569 3.6 9.5\n",
"Puerto Rico 12.653922 8.9 17.3\n",
"Rhode Island 7.000490 3.6 12.2\n",
"South Carolina 7.130882 3.2 12.4\n",
"South Dakota 3.579412 2.0 5.9\n",
"Tennessee 6.369118 3.3 11.3\n",
"Texas 5.775000 3.5 8.6\n",
"Utah 4.716667 2.1 8.4\n",
"Vermont 4.247549 2.3 7.6\n",
"Virginia 4.543137 1.8 7.9\n",
"Washington 6.706863 4.2 11.3\n",
"West Virginia 6.168137 3.6 10.0\n",
"Wisconsin 5.668627 2.8 10.4\n",
"Wyoming 4.399510 2.4 8.1"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.groupby('state')['unemployment_rate'] \\\n",
" .aggregate({'Min. Unemploy.' : np.min,\n",
" 'Mean Unemploy.' : np.mean,\n",
" 'Max. Unemploy.' : np.max})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the challenge above, we used filter to select out states based on a string, but there are more uses of filter than this simple one. For example, we can filter based on the group properties:"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def filter_func(x):\n",
" return x['unemployment_rate'].min() > 5\n",
"\n",
"problem_states = data.groupby('state').filter(filter_func)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array(['Alaska', 'Puerto Rico'], dtype=object)"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"problem_states.state.unique()"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" state | \n",
" Alaska | \n",
" Puerto Rico | \n",
"
\n",
" \n",
" year | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2000 | \n",
" 6.375000 | \n",
" 10.125000 | \n",
"
\n",
" \n",
" 2001 | \n",
" 6.433333 | \n",
" 11.350000 | \n",
"
\n",
" \n",
" 2002 | \n",
" 7.291667 | \n",
" 12.250000 | \n",
"
\n",
" \n",
" 2003 | \n",
" 7.825000 | \n",
" 11.966667 | \n",
"
\n",
" \n",
" 2004 | \n",
" 7.458333 | \n",
" 10.575000 | \n",
"
\n",
" \n",
" 2005 | \n",
" 6.883333 | \n",
" 11.325000 | \n",
"
\n",
" \n",
" 2006 | \n",
" 6.616667 | \n",
" 10.500000 | \n",
"
\n",
" \n",
" 2007 | \n",
" 6.333333 | \n",
" 11.183333 | \n",
"
\n",
" \n",
" 2008 | \n",
" 6.691667 | \n",
" 11.766667 | \n",
"
\n",
" \n",
" 2009 | \n",
" 7.741667 | \n",
" 15.333333 | \n",
"
\n",
" \n",
" 2010 | \n",
" 7.883333 | \n",
" 16.358333 | \n",
"
\n",
" \n",
" 2011 | \n",
" 7.591667 | \n",
" 15.941667 | \n",
"
\n",
" \n",
" 2012 | \n",
" 7.141667 | \n",
" 14.466667 | \n",
"
\n",
" \n",
" 2013 | \n",
" 6.925000 | \n",
" 14.291667 | \n",
"
\n",
" \n",
" 2014 | \n",
" 6.841667 | \n",
" 13.900000 | \n",
"
\n",
" \n",
" 2015 | \n",
" 6.441667 | \n",
" 11.991667 | \n",
"
\n",
" \n",
" 2016 | \n",
" 6.616667 | \n",
" 11.791667 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"state Alaska Puerto Rico\n",
"year \n",
"2000 6.375000 10.125000\n",
"2001 6.433333 11.350000\n",
"2002 7.291667 12.250000\n",
"2003 7.825000 11.966667\n",
"2004 7.458333 10.575000\n",
"2005 6.883333 11.325000\n",
"2006 6.616667 10.500000\n",
"2007 6.333333 11.183333\n",
"2008 6.691667 11.766667\n",
"2009 7.741667 15.333333\n",
"2010 7.883333 16.358333\n",
"2011 7.591667 15.941667\n",
"2012 7.141667 14.466667\n",
"2013 6.925000 14.291667\n",
"2014 6.841667 13.900000\n",
"2015 6.441667 11.991667\n",
"2016 6.616667 11.791667"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"problem_states.groupby(['state', 'year']).unemployment_rate.mean().unstack(level=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Other useful functions:\n",
"\n",
"GroupBy also works alongside the following functions:\n",
"* `transform()`\n",
"* `apply()`\n",
"\n",
"For lack of time we leave these for you to explore."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}