{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Operations on Data\n",
    "* Contact: Lachlan Deer, [econgit] @ldeer, [github/twitter] @lachlandeer\n",
    "\n",
    "When working with Numpy, and essential part of our toolkit was the ability to perform element wise operations such as adding and multiply or operations like exponentiating or taking the log. We also want this functionality for DataFrames and pandas provides it - borrowing functionality from NumPy.\n",
    "\n",
    "In this notebook we will look at how Pandas allows us to perform operations like those mentioned above - using universal functions.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## UFuncs: Preserving Indices\n",
    "\n",
    "Pandas is designed to work with Numpy. This means any numpy ufunc applied to a pandas object will return another pandas object of the same type.\n",
    "\n",
    "To see this, let's set up some example pandas objects:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    2\n",
       "1    6\n",
       "2    3\n",
       "3    1\n",
       "4    9\n",
       "5    9\n",
       "6    4\n",
       "7    0\n",
       "8    9\n",
       "9    6\n",
       "dtype: int64"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rng = np.random.RandomState(1234567890)\n",
    "series = pd.Series(rng.randint(0, 10, 10))\n",
    "series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(series)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>6</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>8</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>8</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>8</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>5</td>\n",
       "      <td>7</td>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>6</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>9</td>\n",
       "      <td>4</td>\n",
       "      <td>8</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   A  B  C  D\n",
       "0  4  0  7  3\n",
       "1  5  1  6  8\n",
       "2  3  5  1  7\n",
       "3  1  8  1  1\n",
       "4  2  5  8  7\n",
       "5  8  9  8  2\n",
       "6  1  1  8  4\n",
       "7  5  7  2  5\n",
       "8  6  3  4  3\n",
       "9  9  4  8  5"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame(rng.randint(0, 10, (10, 4)),\n",
    "                  columns=['A', 'B', 'C', 'D'])\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we apply a NumPy ufunc to an object, let's see what happens"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "series2 = np.log(series)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(series2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "df2 = np.exp(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(df2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(df['B'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(np.log10(df['B']))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also apply a numpy ufunc to an individual row of a DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>54.598150</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1096.633158</td>\n",
       "      <td>20.085537</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>148.413159</td>\n",
       "      <td>1.0</td>\n",
       "      <td>403.428793</td>\n",
       "      <td>2980.957987</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>20.085537</td>\n",
       "      <td>5.0</td>\n",
       "      <td>2.718282</td>\n",
       "      <td>1096.633158</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2.718282</td>\n",
       "      <td>8.0</td>\n",
       "      <td>2.718282</td>\n",
       "      <td>2.718282</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>7.389056</td>\n",
       "      <td>5.0</td>\n",
       "      <td>2980.957987</td>\n",
       "      <td>1096.633158</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2980.957987</td>\n",
       "      <td>9.0</td>\n",
       "      <td>2980.957987</td>\n",
       "      <td>7.389056</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>2.718282</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2980.957987</td>\n",
       "      <td>54.598150</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>148.413159</td>\n",
       "      <td>7.0</td>\n",
       "      <td>7.389056</td>\n",
       "      <td>148.413159</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>403.428793</td>\n",
       "      <td>3.0</td>\n",
       "      <td>54.598150</td>\n",
       "      <td>20.085537</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>8103.083928</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2980.957987</td>\n",
       "      <td>148.413159</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             A    B            C            D\n",
       "0    54.598150  0.0  1096.633158    20.085537\n",
       "1   148.413159  1.0   403.428793  2980.957987\n",
       "2    20.085537  5.0     2.718282  1096.633158\n",
       "3     2.718282  8.0     2.718282     2.718282\n",
       "4     7.389056  5.0  2980.957987  1096.633158\n",
       "5  2980.957987  9.0  2980.957987     7.389056\n",
       "6     2.718282  1.0  2980.957987    54.598150\n",
       "7   148.413159  7.0     7.389056   148.413159\n",
       "8   403.428793  3.0    54.598150    20.085537\n",
       "9  8103.083928  4.0  2980.957987   148.413159"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3 = df2.copy()\n",
    "df3['B'] = np.log(df3['B'])\n",
    "df3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition to NumPy functions, pandas provides it's own operators using operator overloading. These are summarized here:\n",
    "\n",
    "[insert table]\n",
    "\n",
    "We can use these operators to combine a pandas object with a scalar:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      545.981500\n",
       "1     1484.131591\n",
       "2      200.855369\n",
       "3       27.182818\n",
       "4       73.890561\n",
       "5    29809.579870\n",
       "6       27.182818\n",
       "7     1484.131591\n",
       "8     4034.287935\n",
       "9    81030.839276\n",
       "Name: A, dtype: float64"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3['A'] * 10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and can also create new columns using ufuncs (either via NumPy or the pandas operators):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "      <th>E</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>54.598150</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1096.633158</td>\n",
       "      <td>20.085537</td>\n",
       "      <td>0.018316</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>148.413159</td>\n",
       "      <td>1.0</td>\n",
       "      <td>403.428793</td>\n",
       "      <td>2980.957987</td>\n",
       "      <td>7.389056</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>20.085537</td>\n",
       "      <td>5.0</td>\n",
       "      <td>2.718282</td>\n",
       "      <td>1096.633158</td>\n",
       "      <td>403.428793</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2.718282</td>\n",
       "      <td>8.0</td>\n",
       "      <td>2.718282</td>\n",
       "      <td>2.718282</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>7.389056</td>\n",
       "      <td>5.0</td>\n",
       "      <td>2980.957987</td>\n",
       "      <td>1096.633158</td>\n",
       "      <td>0.367879</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2980.957987</td>\n",
       "      <td>9.0</td>\n",
       "      <td>2980.957987</td>\n",
       "      <td>7.389056</td>\n",
       "      <td>0.002479</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>2.718282</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2980.957987</td>\n",
       "      <td>54.598150</td>\n",
       "      <td>0.018316</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>148.413159</td>\n",
       "      <td>7.0</td>\n",
       "      <td>7.389056</td>\n",
       "      <td>148.413159</td>\n",
       "      <td>20.085537</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>403.428793</td>\n",
       "      <td>3.0</td>\n",
       "      <td>54.598150</td>\n",
       "      <td>20.085537</td>\n",
       "      <td>0.367879</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>8103.083928</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2980.957987</td>\n",
       "      <td>148.413159</td>\n",
       "      <td>0.049787</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             A    B            C            D           E\n",
       "0    54.598150  0.0  1096.633158    20.085537    0.018316\n",
       "1   148.413159  1.0   403.428793  2980.957987    7.389056\n",
       "2    20.085537  5.0     2.718282  1096.633158  403.428793\n",
       "3     2.718282  8.0     2.718282     2.718282    1.000000\n",
       "4     7.389056  5.0  2980.957987  1096.633158    0.367879\n",
       "5  2980.957987  9.0  2980.957987     7.389056    0.002479\n",
       "6     2.718282  1.0  2980.957987    54.598150    0.018316\n",
       "7   148.413159  7.0     7.389056   148.413159   20.085537\n",
       "8   403.428793  3.0    54.598150    20.085537    0.367879\n",
       "9  8103.083928  4.0  2980.957987   148.413159    0.049787"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3['E'] = df3['D'] / df3['C']\n",
    "df3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Challenge\n",
    "\n",
    "Load in our labor market statistics data and perform the following operations:\n",
    "\n",
    "1. Verify the `unemployment_rate` variable is correctly computed. (Hint, round your computation to 1 decimal place using .round(decimels=1), test equality using df[''].equals(df['']))\n",
    "2. Does the Decomposition `labour_force = qty_employed + qty_employed` hold true?\n",
    "3. Assume that each worker was employed for 38 hours per week. Create a new column that estimates the number of hours worked in each state-year-month\n",
    "4. Assume that 60 percent of workers work 45 hours per week, whilst 40 percent work 20 hours per week, estimate the number of labour hours in each state-year-month\n",
    "5. Calculate the difference in your estimates from 3 and 4.\n",
    "\n",
    "#### Solutions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>state</th>\n",
       "      <th>period</th>\n",
       "      <th>year</th>\n",
       "      <th>unemployment_rate</th>\n",
       "      <th>qty_unemployed</th>\n",
       "      <th>qty_employed</th>\n",
       "      <th>labour_force</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alabama</td>\n",
       "      <td>M01</td>\n",
       "      <td>2000</td>\n",
       "      <td>5.1</td>\n",
       "      <td>108551</td>\n",
       "      <td>2024262</td>\n",
       "      <td>2132813</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Alabama</td>\n",
       "      <td>M01</td>\n",
       "      <td>2001</td>\n",
       "      <td>5.2</td>\n",
       "      <td>110035</td>\n",
       "      <td>2008876</td>\n",
       "      <td>2118911</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Alabama</td>\n",
       "      <td>M01</td>\n",
       "      <td>2002</td>\n",
       "      <td>6.5</td>\n",
       "      <td>134750</td>\n",
       "      <td>1953814</td>\n",
       "      <td>2088564</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Alabama</td>\n",
       "      <td>M01</td>\n",
       "      <td>2003</td>\n",
       "      <td>6.2</td>\n",
       "      <td>129966</td>\n",
       "      <td>1974374</td>\n",
       "      <td>2104340</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Alabama</td>\n",
       "      <td>M01</td>\n",
       "      <td>2004</td>\n",
       "      <td>6.4</td>\n",
       "      <td>134673</td>\n",
       "      <td>1985886</td>\n",
       "      <td>2120559</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     state period  year  unemployment_rate  qty_unemployed  qty_employed  \\\n",
       "0  Alabama    M01  2000                5.1          108551       2024262   \n",
       "1  Alabama    M01  2001                5.2          110035       2008876   \n",
       "2  Alabama    M01  2002                6.5          134750       1953814   \n",
       "3  Alabama    M01  2003                6.2          129966       1974374   \n",
       "4  Alabama    M01  2004                6.4          134673       1985886   \n",
       "\n",
       "   labour_force  \n",
       "0       2132813  \n",
       "1       2118911  \n",
       "2       2088564  \n",
       "3       2104340  \n",
       "4       2120559  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.read_csv('out_data/state_labour_statistics.csv')\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>state</th>\n",
       "      <th>period</th>\n",
       "      <th>year</th>\n",
       "      <th>unemployment_rate</th>\n",
       "      <th>qty_unemployed</th>\n",
       "      <th>qty_employed</th>\n",
       "      <th>labour_force</th>\n",
       "      <th>ue_rate2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alabama</td>\n",
       "      <td>M01</td>\n",
       "      <td>2000</td>\n",
       "      <td>5.1</td>\n",
       "      <td>108551</td>\n",
       "      <td>2024262</td>\n",
       "      <td>2132813</td>\n",
       "      <td>5.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Alabama</td>\n",
       "      <td>M01</td>\n",
       "      <td>2001</td>\n",
       "      <td>5.2</td>\n",
       "      <td>110035</td>\n",
       "      <td>2008876</td>\n",
       "      <td>2118911</td>\n",
       "      <td>5.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Alabama</td>\n",
       "      <td>M01</td>\n",
       "      <td>2002</td>\n",
       "      <td>6.5</td>\n",
       "      <td>134750</td>\n",
       "      <td>1953814</td>\n",
       "      <td>2088564</td>\n",
       "      <td>6.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Alabama</td>\n",
       "      <td>M01</td>\n",
       "      <td>2003</td>\n",
       "      <td>6.2</td>\n",
       "      <td>129966</td>\n",
       "      <td>1974374</td>\n",
       "      <td>2104340</td>\n",
       "      <td>6.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Alabama</td>\n",
       "      <td>M01</td>\n",
       "      <td>2004</td>\n",
       "      <td>6.4</td>\n",
       "      <td>134673</td>\n",
       "      <td>1985886</td>\n",
       "      <td>2120559</td>\n",
       "      <td>6.4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     state period  year  unemployment_rate  qty_unemployed  qty_employed  \\\n",
       "0  Alabama    M01  2000                5.1          108551       2024262   \n",
       "1  Alabama    M01  2001                5.2          110035       2008876   \n",
       "2  Alabama    M01  2002                6.5          134750       1953814   \n",
       "3  Alabama    M01  2003                6.2          129966       1974374   \n",
       "4  Alabama    M01  2004                6.4          134673       1985886   \n",
       "\n",
       "   labour_force  ue_rate2  \n",
       "0       2132813       5.1  \n",
       "1       2118911       5.2  \n",
       "2       2088564       6.5  \n",
       "3       2104340       6.2  \n",
       "4       2120559       6.4  "
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['ue_rate2'] = (data['qty_unemployed'] / data['labour_force'] * 100).round(1)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['ue_rate2'].equals(data['unemployment_rate'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['labour_force'].equals(data['qty_employed'] + data['qty_unemployed'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Index Alignment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When pandas is performing binary operations on between different series, it will align indices when performing the operation. This is useful if we have incomplete data in one of both data frames when we are trying to combine them:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "A = pd.Series([2, 4, 6], index=[0, 1, 2])\n",
    "B = pd.Series([1, 3, 5], index=[1, 2, 3])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    NaN\n",
       "1    5.0\n",
       "2    9.0\n",
       "3    NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A + B"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "by default, pandas returns NaNs - but this may not always be the behaviour we want. The option `fill_value` allows us to specify a value for the missing value to take:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    2.0\n",
       "1    5.0\n",
       "2    9.0\n",
       "3    5.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A.add(B, fill_value = 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "although this may not be desirable.\n",
    "\n",
    "A similar idea holds for DataFrames:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>8.0</td>\n",
       "      <td>18.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>21.0</td>\n",
       "      <td>18.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      A     B   C\n",
       "0   8.0  18.0 NaN\n",
       "1  21.0  18.0 NaN\n",
       "2   NaN   NaN NaN"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A = pd.DataFrame(rng.randint(0, 20, (2, 2)),\n",
    "                 columns=list('AB'))\n",
    "B = pd.DataFrame(rng.randint(0, 10, (3, 3)),\n",
    "                 columns=list('BAC'))\n",
    "A+B"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally you can use ufuncs to combine info from a DataFrame and a Series:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>B</th>\n",
       "      <th>A</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>9</td>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   B  A  C\n",
       "0  9  7  0\n",
       "1  3  2  4\n",
       "2  7  1  1"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "B"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>B</th>\n",
       "      <th>A</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>-2</td>\n",
       "      <td>-9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>-1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>-6</td>\n",
       "      <td>-6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   B  A  C\n",
       "0  0 -2 -9\n",
       "1  0 -1  1\n",
       "2  0 -6 -6"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "B.subtract(B['B'], axis=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}