Indexing and Selecting Data

  • Contact: Lachlan Deer, [econgit] @ldeer, [github/Twitter] @lachlandeer

In yesterday’s class on NumPy we thought about indexing, slicing, masking and modifying values in Numpy arrays. There are analogues to these operations in the pandas library that we will look into now.

Our focus will be on the pandas DataFrame, rather than the pandas Series Object - because we will be dealing with DataFrames more often than not.

import pandas as pd

First, let’s load our example DataFrame that we assembled in the previous lesson:

df = pd.read_csv('out_data/state_labour_statistics.csv')
df.head()
state period year unemployment_rate qty_unemployed qty_employed labour_force
0 New Hampshire M12 2016 2.5 18907 728556 747463
1 New Hampshire M11 2016 2.6 19573 729720 749293
2 New Hampshire M10 2016 2.5 18430 729792 748222
3 New Hampshire M09 2016 2.6 19415 727548 746963
4 New Hampshire M08 2016 2.8 20933 737236 758169

Notice that although when we were working with the data in our previous notebook the index was set as state-year-period, once we saved to csv we lost this structure. This is not a big deal, in a future notebook we will go back to setting the index as we so desire.

We can look at some of the features of our DataFrame before continuing (as a review)

df.index
RangeIndex(start=0, stop=10608, step=1)
df.columns
Index(['state', 'period', 'year', 'unemployment_rate', 'qty_unemployed',
       'qty_employed', 'labour_force'],
      dtype='object')
df.shape
(10608, 7)
df.values
array([['New Hampshire', 'M12', 2016, ..., 18907, 728556, 747463],
       ['New Hampshire', 'M11', 2016, ..., 19573, 729720, 749293],
       ['New Hampshire', 'M10', 2016, ..., 18430, 729792, 748222],
       ...,
       ['Georgia', 'M03', 2000, ..., 158596, 4064879, 4223475],
       ['Georgia', 'M02', 2000, ..., 157572, 4046970, 4204542],
       ['Georgia', 'M01', 2000, ..., 157066, 4028619, 4185685]],
      dtype=object)

Selecting Columns

We can select individual columns using a square bracket notation:

df['unemployment_rate']
0        2.5
1        2.6
2        2.5
3        2.6
4        2.8
        ... 
10603    3.7
10604    3.3
10605    3.8
10606    3.7
10607    3.8
Name: unemployment_rate, Length: 10608, dtype: float64

or using a ‘dot’ notation, which is an attribute-style way of accessing

df.unemployment_rate
0        2.5
1        2.6
2        2.5
3        2.6
4        2.8
        ... 
10603    3.7
10604    3.3
10605    3.8
10606    3.7
10607    3.8
Name: unemployment_rate, Length: 10608, dtype: float64

we can verify that it returns the same thing:

df.unemployment_rate is df['unemployment_rate']
True

The attribute style indexing will not work if the column names conflict with a method of a DataFrame.

We can select multiple columns too by passing a list of columns to select:

df[['unemployment_rate' , 'qty_unemployed']]
unemployment_rate qty_unemployed
0 2.5 18907
1 2.6 19573
2 2.5 18430
3 2.6 19415
4 2.8 20933
... ... ...
10603 3.7 155898
10604 3.3 138511
10605 3.8 158596
10606 3.7 157572
10607 3.8 157066

10608 rows × 2 columns

If we feel the need, we can also transpose a DataFrame:

df.T
0 1 2 3 4 5 6 7 8 9 ... 10598 10599 10600 10601 10602 10603 10604 10605 10606 10607
state New Hampshire New Hampshire New Hampshire New Hampshire New Hampshire New Hampshire New Hampshire New Hampshire New Hampshire New Hampshire ... Georgia Georgia Georgia Georgia Georgia Georgia Georgia Georgia Georgia Georgia
period M12 M11 M10 M09 M08 M07 M06 M05 M04 M03 ... M10 M09 M08 M07 M06 M05 M04 M03 M02 M01
year 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ... 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000
unemployment_rate 2.5 2.6 2.5 2.6 2.8 2.8 2.9 2.7 2.8 3.2 ... 3.3 3.5 3.9 3.9 4.0 3.7 3.3 3.8 3.7 3.8
qty_unemployed 18907 19573 18430 19415 20933 21521 21559 20206 20652 23501 ... 137555 146555 164397 165992 168502 155898 138511 158596 157572 157066
qty_employed 728556 729720 729792 727548 737236 740418 733329 722805 720696 721319 ... 4081125 4063109 4060662 4075181 4067365 4063847 4071886 4064879 4046970 4028619
labour_force 747463 749293 748222 746963 758169 761939 754888 743011 741348 744820 ... 4218680 4209664 4225059 4241173 4235867 4219745 4210397 4223475 4204542 4185685

7 rows × 10608 columns

Selecting Rows of data

Since the ‘square-bracket’ indexing is reserved for selecting columns of data, we need another way to access individual rows of data. Pandas offers us three alternatives here:

  • loc

  • iloc

  • ix

Let’s see how each of these work:

iloc works just like the NumPy indexing:

df.iloc[0:3,:]
state period year unemployment_rate qty_unemployed qty_employed labour_force
0 New Hampshire M12 2016 2.5 18907 728556 747463
1 New Hampshire M11 2016 2.6 19573 729720 749293
2 New Hampshire M10 2016 2.5 18430 729792 748222
df.iloc[0:10:2, 0:4]
state period year unemployment_rate
0 New Hampshire M12 2016 2.5
2 New Hampshire M10 2016 2.5
4 New Hampshire M08 2016 2.8
6 New Hampshire M06 2016 2.9
8 New Hampshire M04 2016 2.8

The loc syntax allows us to index the data using explicit index and column names:

df.loc[0:10, 'state':'unemployment_rate']
state period year unemployment_rate
0 New Hampshire M12 2016 2.5
1 New Hampshire M11 2016 2.6
2 New Hampshire M10 2016 2.5
3 New Hampshire M09 2016 2.6
4 New Hampshire M08 2016 2.8
5 New Hampshire M07 2016 2.8
6 New Hampshire M06 2016 2.9
7 New Hampshire M05 2016 2.7
8 New Hampshire M04 2016 2.8
9 New Hampshire M03 2016 3.2
10 New Hampshire M02 2016 3.3

The ix indexer is useful if the index is set as text but we want to select by row number, whilst refer to columns by name. Note for our example, nothing changes:

df.ix[:3, 'state':'unemployment_rate']
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_2843/3867642031.py in <module>
----> 1 df.ix[:3, 'state':'unemployment_rate']

/opt/hostedtoolcache/Python/3.8.11/x64/lib/python3.8/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5485         ):
   5486             return self[name]
-> 5487         return object.__getattribute__(self, name)
   5488 
   5489     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'ix'

but if instead we have

unemployed = pd.Series({'California': 423967, 'Texas': 695662,
                  'New York': 141297, 'Florida': 170312,
                  'Illinois': 149995})
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                 'New York': 19651127, 'Florida': 19552860,
                 'Illinois': 12882135})
data = pd.DataFrame({'unemployed':unemployed, 'population':population})
data
population unemployed
California 38332521 423967
Florida 19552860 170312
Illinois 12882135 149995
New York 19651127 141297
Texas 26448193 695662
data.ix[1:3, 'population']
Florida     19552860
Illinois    12882135
Name: population, dtype: int64

gives delivers something different than .loc

Any of our indexing conventions allow us to modify values if we so wish:

data.loc['Florida', 'population'] = 20000000
data
population unemployed
California 38332521 423967
Florida 20000000 170312
Illinois 12882135 149995
New York 19651127 141297
Texas 26448193 695662

Boolean Masking and Fancy Indexing

we can use NumPy-style access patterns within our indexers to subsets of data:

df.loc[df.state =='Alabama']
state period year unemployment_rate qty_unemployed qty_employed labour_force
0 Alabama M01 2000 5.1 108551 2024262 2132813
1 Alabama M01 2001 5.2 110035 2008876 2118911
2 Alabama M01 2002 6.5 134750 1953814 2088564
3 Alabama M01 2003 6.2 129966 1974374 2104340
4 Alabama M01 2004 6.4 134673 1985886 2120559
5 Alabama M01 2005 5.4 114319 2011219 2125538
6 Alabama M01 2006 4.4 93680 2048878 2142558
7 Alabama M01 2007 4.3 93046 2076252 2169298
8 Alabama M01 2008 4.8 104650 2065583 2170233
9 Alabama M01 2009 9.7 208908 1955196 2164104
10 Alabama M01 2010 12.3 263375 1871780 2135155
11 Alabama M01 2011 11.0 243001 1967127 2210128
12 Alabama M01 2012 8.3 177986 1973652 2151638
13 Alabama M01 2013 8.2 177887 1988180 2166067
14 Alabama M01 2014 7.5 162341 1989459 2151800
15 Alabama M01 2015 6.4 135842 1998388 2134230
16 Alabama M01 2016 6.3 134592 2007036 2141628
17 Alabama M02 2000 5.1 109464 2027226 2136690
18 Alabama M02 2001 5.0 105469 2009904 2115373
19 Alabama M02 2002 6.3 132979 1966562 2099541
20 Alabama M02 2003 6.0 127265 1980206 2107471
21 Alabama M02 2004 6.1 130294 1990082 2120376
22 Alabama M02 2005 5.4 115837 2014928 2130765
23 Alabama M02 2006 4.5 96643 2053723 2150366
24 Alabama M02 2007 4.2 91769 2078300 2170069
25 Alabama M02 2008 4.9 106990 2060564 2167554
26 Alabama M02 2009 10.5 227822 1943707 2171529
27 Alabama M02 2010 12.1 258825 1876758 2135583
28 Alabama M02 2011 10.5 232117 1972720 2204837
29 Alabama M02 2012 8.4 182555 1982828 2165383
... ... ... ... ... ... ... ...
174 Alabama M11 2004 5.2 111612 2043152 2154764
175 Alabama M11 2005 4.0 86786 2078869 2165655
176 Alabama M11 2006 3.7 80347 2105333 2185680
177 Alabama M11 2007 3.9 85214 2107374 2192588
178 Alabama M11 2008 6.7 145764 2030180 2175944
179 Alabama M11 2009 11.1 238550 1910326 2148876
180 Alabama M11 2010 10.0 222536 2001481 2224017
181 Alabama M11 2011 8.0 175369 2008604 2183973
182 Alabama M11 2012 6.9 149468 2023318 2172786
183 Alabama M11 2013 6.7 144052 2020314 2164366
184 Alabama M11 2014 5.8 124177 2033698 2157875
185 Alabama M11 2015 5.7 122954 2035229 2158183
186 Alabama M11 2016 5.7 123834 2060807 2184641
187 Alabama M12 2000 4.2 90116 2045059 2135175
188 Alabama M12 2001 5.8 124127 1998822 2122949
189 Alabama M12 2002 5.7 121340 2001747 2123087
190 Alabama M12 2003 5.7 121278 2012259 2133537
191 Alabama M12 2004 4.9 105756 2041339 2147095
192 Alabama M12 2005 3.9 83794 2079724 2163518
193 Alabama M12 2006 3.7 82045 2107748 2189793
194 Alabama M12 2007 4.3 93382 2096601 2189983
195 Alabama M12 2008 7.6 164258 2008873 2173131
196 Alabama M12 2009 11.4 242992 1894973 2137965
197 Alabama M12 2010 9.8 217272 1996538 2213810
198 Alabama M12 2011 7.9 172004 2004151 2176155
199 Alabama M12 2012 7.2 156518 2016210 2172728
200 Alabama M12 2013 6.7 143618 2013519 2157137
201 Alabama M12 2014 5.7 121281 2023427 2144708
202 Alabama M12 2015 5.9 127389 2030765 2158154
203 Alabama M12 2016 5.9 129295 2050965 2180260

204 rows × 7 columns

df.loc[df.state =='Alabama', ['unemployment_rate']]
unemployment_rate
0 5.1
1 5.2
2 6.5
3 6.2
4 6.4
5 5.4
6 4.4
7 4.3
8 4.8
9 9.7
10 12.3
11 11.0
12 8.3
13 8.2
14 7.5
15 6.4
16 6.3
17 5.1
18 5.0
19 6.3
20 6.0
21 6.1
22 5.4
23 4.5
24 4.2
25 4.9
26 10.5
27 12.1
28 10.5
29 8.4
... ...
174 5.2
175 4.0
176 3.7
177 3.9
178 6.7
179 11.1
180 10.0
181 8.0
182 6.9
183 6.7
184 5.8
185 5.7
186 5.7
187 4.2
188 5.8
189 5.7
190 5.7
191 4.9
192 3.9
193 3.7
194 4.3
195 7.6
196 11.4
197 9.8
198 7.9
199 7.2
200 6.7
201 5.7
202 5.9
203 5.9

204 rows × 1 columns

df.loc[df.unemployment_rate > 10.0, ['state', 'period', 'year', 'unemployment_rate']]
state period year unemployment_rate
10 Alabama M01 2010 12.3
11 Alabama M01 2011 11.0
26 Alabama M02 2009 10.5
27 Alabama M02 2010 12.1
28 Alabama M02 2011 10.5
43 Alabama M03 2009 10.4
44 Alabama M03 2010 11.5
61 Alabama M04 2010 10.4
77 Alabama M05 2009 10.6
94 Alabama M06 2009 11.9
95 Alabama M06 2010 10.5
96 Alabama M06 2011 10.6
111 Alabama M07 2009 11.8
112 Alabama M07 2010 10.2
113 Alabama M07 2011 10.4
128 Alabama M08 2009 11.9
129 Alabama M08 2010 10.1
145 Alabama M09 2009 11.5
162 Alabama M10 2009 11.7
179 Alabama M11 2009 11.1
196 Alabama M12 2009 11.4
418 Arizona M01 2010 11.1
435 Arizona M02 2010 10.7
452 Arizona M03 2010 10.4
469 Arizona M04 2010 10.1
502 Arizona M06 2009 10.1
503 Arizona M06 2010 10.5
504 Arizona M06 2011 10.5
519 Arizona M07 2009 10.6
520 Arizona M07 2010 10.8
... ... ... ... ...
8526 South Carolina M10 2009 11.5
8527 South Carolina M10 2010 10.5
8528 South Carolina M10 2011 10.3
8543 South Carolina M11 2009 11.3
8544 South Carolina M11 2010 10.9
8560 South Carolina M12 2009 11.5
8561 South Carolina M12 2010 10.8
8782 Tennessee M01 2010 11.2
8783 Tennessee M01 2011 10.1
8798 Tennessee M02 2009 10.5
8799 Tennessee M02 2010 10.8
8815 Tennessee M03 2009 10.9
8816 Tennessee M03 2010 10.6
8832 Tennessee M04 2009 10.2
8849 Tennessee M05 2009 10.6
8866 Tennessee M06 2009 11.3
8883 Tennessee M07 2009 11.0
8900 Tennessee M08 2009 10.7
8917 Tennessee M09 2009 10.4
8934 Tennessee M10 2009 10.2
8968 Tennessee M12 2009 10.3
9802 Washington M01 2010 11.3
9803 Washington M01 2011 10.5
9819 Washington M02 2010 11.2
9820 Washington M02 2011 10.4
9836 Washington M03 2010 10.9
9988 Washington M12 2009 10.5
10210 Wisconsin M01 2010 10.1
10227 Wisconsin M02 2010 10.4
10244 Wisconsin M03 2010 10.3

663 rows × 4 columns

df.loc[(df.unemployment_rate > 10.0) & (df.state == 'California'), 
           ['state', 'period', 'year', 'unemployment_rate']]
state period year unemployment_rate
825 California M01 2009 10.2
826 California M01 2010 12.7
827 California M01 2011 12.6
828 California M01 2012 11.4
829 California M01 2013 10.2
842 California M02 2009 10.6
843 California M02 2010 12.6
844 California M02 2011 12.2
845 California M02 2012 11.3
859 California M03 2009 10.9
860 California M03 2010 12.6
861 California M03 2011 12.1
862 California M03 2012 11.0
876 California M04 2009 10.4
877 California M04 2010 12.0
878 California M04 2011 11.5
879 California M04 2012 10.3
893 California M05 2009 10.6
894 California M05 2010 11.7
895 California M05 2011 11.3
896 California M05 2012 10.2
910 California M06 2009 11.3
911 California M06 2010 12.0
912 California M06 2011 12.0
913 California M06 2012 10.6
927 California M07 2009 11.7
928 California M07 2010 12.6
929 California M07 2011 12.2
930 California M07 2012 10.9
944 California M08 2009 11.5
945 California M08 2010 12.3
946 California M08 2011 11.9
947 California M08 2012 10.4
961 California M09 2009 11.4
962 California M09 2010 11.9
963 California M09 2011 11.5
978 California M10 2009 11.6
979 California M10 2010 11.9
980 California M10 2011 11.3
995 California M11 2009 11.7
996 California M11 2010 12.3
997 California M11 2011 11.0
1012 California M12 2009 11.8
1013 California M12 2010 12.1
1014 California M12 2011 11.0

Challenge

  1. Select all data employment data for Nebraska when unemployment is greater than 5 percent

  2. Select all unemployment data in December for the Carolinas

  3. Select all unemployment data for the years 2007-2010 in California

Solutions

df.loc[(df.unemployment_rate > 5) & (df.state == 'Nebraska'), 
           ['state', 'period', 'year', 'qty_employed']]
state period year qty_employed
5518 Nebraska M01 2010 927304
5519 Nebraska M01 2011 937296
5535 Nebraska M02 2010 933213
5552 Nebraska M03 2010 937821
df.loc[(df.state == 'South Carolina') |(df.state == 'North Carolina'),
          ['state', 'period', 'year', 'unemployment_rate', 'qty_unemployed']]
state period year unemployment_rate qty_unemployed
6732 North Carolina M01 2000 3.7 150873
6733 North Carolina M01 2001 4.9 203732
6734 North Carolina M01 2002 7.4 305801
6735 North Carolina M01 2003 6.9 288078
6736 North Carolina M01 2004 6.3 266128
6737 North Carolina M01 2005 5.7 241881
6738 North Carolina M01 2006 4.8 205540
6739 North Carolina M01 2007 5.1 226854
6740 North Carolina M01 2008 5.3 239310
6741 North Carolina M01 2009 10.2 466462
6742 North Carolina M01 2010 12.0 549257
6743 North Carolina M01 2011 10.9 500556
6744 North Carolina M01 2012 10.0 461382
6745 North Carolina M01 2013 9.7 451631
6746 North Carolina M01 2014 6.9 320157
6747 North Carolina M01 2015 6.1 285454
6748 North Carolina M01 2016 5.4 260255
6749 North Carolina M02 2000 3.8 155617
6750 North Carolina M02 2001 5.0 209858
6751 North Carolina M02 2002 7.1 296844
6752 North Carolina M02 2003 6.7 282810
6753 North Carolina M02 2004 6.0 255398
6754 North Carolina M02 2005 6.0 253712
6755 North Carolina M02 2006 5.0 218569
6756 North Carolina M02 2007 4.9 219842
6757 North Carolina M02 2008 5.3 236310
6758 North Carolina M02 2009 10.7 491633
6759 North Carolina M02 2010 12.0 549878
6760 North Carolina M02 2011 10.6 486533
6761 North Carolina M02 2012 9.9 458754
... ... ... ... ... ...
8538 South Carolina M11 2004 6.8 139685
8539 South Carolina M11 2005 6.9 143139
8540 South Carolina M11 2006 6.2 131898
8541 South Carolina M11 2007 5.4 114045
8542 South Carolina M11 2008 8.0 172247
8543 South Carolina M11 2009 11.3 239246
8544 South Carolina M11 2010 10.9 233542
8545 South Carolina M11 2011 10.0 215727
8546 South Carolina M11 2012 8.2 177723
8547 South Carolina M11 2013 6.5 142009
8548 South Carolina M11 2014 6.5 146273
8549 South Carolina M11 2015 5.3 120874
8550 South Carolina M11 2016 4.0 91905
8551 South Carolina M12 2000 3.7 72211
8552 South Carolina M12 2001 5.8 110516
8553 South Carolina M12 2002 5.9 114920
8554 South Carolina M12 2003 6.7 133811
8555 South Carolina M12 2004 6.8 138995
8556 South Carolina M12 2005 6.7 138832
8557 South Carolina M12 2006 6.0 127512
8558 South Carolina M12 2007 5.8 121614
8559 South Carolina M12 2008 9.1 194839
8560 South Carolina M12 2009 11.5 244006
8561 South Carolina M12 2010 10.8 231105
8562 South Carolina M12 2011 9.9 213746
8563 South Carolina M12 2012 8.6 186783
8564 South Carolina M12 2013 6.2 134045
8565 South Carolina M12 2014 6.4 143637
8566 South Carolina M12 2015 5.4 122137
8567 South Carolina M12 2016 4.2 95277

408 rows × 5 columns

# alt
df.loc[df.state.str.contains("Carolina"),
          ['state', 'period', 'year', 'unemployment_rate', 'qty_unemployed']]
state period year unemployment_rate qty_unemployed
6732 North Carolina M01 2000 3.7 150873
6733 North Carolina M01 2001 4.9 203732
6734 North Carolina M01 2002 7.4 305801
6735 North Carolina M01 2003 6.9 288078
6736 North Carolina M01 2004 6.3 266128
6737 North Carolina M01 2005 5.7 241881
6738 North Carolina M01 2006 4.8 205540
6739 North Carolina M01 2007 5.1 226854
6740 North Carolina M01 2008 5.3 239310
6741 North Carolina M01 2009 10.2 466462
6742 North Carolina M01 2010 12.0 549257
6743 North Carolina M01 2011 10.9 500556
6744 North Carolina M01 2012 10.0 461382
6745 North Carolina M01 2013 9.7 451631
6746 North Carolina M01 2014 6.9 320157
6747 North Carolina M01 2015 6.1 285454
6748 North Carolina M01 2016 5.4 260255
6749 North Carolina M02 2000 3.8 155617
6750 North Carolina M02 2001 5.0 209858
6751 North Carolina M02 2002 7.1 296844
6752 North Carolina M02 2003 6.7 282810
6753 North Carolina M02 2004 6.0 255398
6754 North Carolina M02 2005 6.0 253712
6755 North Carolina M02 2006 5.0 218569
6756 North Carolina M02 2007 4.9 219842
6757 North Carolina M02 2008 5.3 236310
6758 North Carolina M02 2009 10.7 491633
6759 North Carolina M02 2010 12.0 549878
6760 North Carolina M02 2011 10.6 486533
6761 North Carolina M02 2012 9.9 458754
... ... ... ... ... ...
8538 South Carolina M11 2004 6.8 139685
8539 South Carolina M11 2005 6.9 143139
8540 South Carolina M11 2006 6.2 131898
8541 South Carolina M11 2007 5.4 114045
8542 South Carolina M11 2008 8.0 172247
8543 South Carolina M11 2009 11.3 239246
8544 South Carolina M11 2010 10.9 233542
8545 South Carolina M11 2011 10.0 215727
8546 South Carolina M11 2012 8.2 177723
8547 South Carolina M11 2013 6.5 142009
8548 South Carolina M11 2014 6.5 146273
8549 South Carolina M11 2015 5.3 120874
8550 South Carolina M11 2016 4.0 91905
8551 South Carolina M12 2000 3.7 72211
8552 South Carolina M12 2001 5.8 110516
8553 South Carolina M12 2002 5.9 114920
8554 South Carolina M12 2003 6.7 133811
8555 South Carolina M12 2004 6.8 138995
8556 South Carolina M12 2005 6.7 138832
8557 South Carolina M12 2006 6.0 127512
8558 South Carolina M12 2007 5.8 121614
8559 South Carolina M12 2008 9.1 194839
8560 South Carolina M12 2009 11.5 244006
8561 South Carolina M12 2010 10.8 231105
8562 South Carolina M12 2011 9.9 213746
8563 South Carolina M12 2012 8.6 186783
8564 South Carolina M12 2013 6.2 134045
8565 South Carolina M12 2014 6.4 143637
8566 South Carolina M12 2015 5.4 122137
8567 South Carolina M12 2016 4.2 95277

408 rows × 5 columns

df.loc[(df.state == 'California') & (df.year >= 2007) & (df.year <= 2010),
          ['state', 'period', 'year', 'unemployment_rate', 'qty_unemployed']]
state period year unemployment_rate qty_unemployed
823 California M01 2007 5.5 976429
824 California M01 2008 6.4 1158755
825 California M01 2009 10.2 1868614
826 California M01 2010 12.7 2315237
840 California M02 2007 5.4 956853
841 California M02 2008 6.3 1138022
842 California M02 2009 10.6 1935243
843 California M02 2010 12.6 2300364
857 California M03 2007 5.2 932960
858 California M03 2008 6.6 1185778
859 California M03 2009 10.9 1991432
860 California M03 2010 12.6 2312390
874 California M04 2007 5.0 895083
875 California M04 2008 6.1 1103750
876 California M04 2009 10.4 1900577
877 California M04 2010 12.0 2201236
891 California M05 2007 4.8 846097
892 California M05 2008 6.4 1163777
893 California M05 2009 10.6 1931991
894 California M05 2010 11.7 2140341
908 California M06 2007 5.3 940705
909 California M06 2008 7.0 1273683
910 California M06 2009 11.3 2060228
911 California M06 2010 12.0 2206906
925 California M07 2007 5.7 1023914
926 California M07 2008 7.6 1397333
927 California M07 2009 11.7 2142067
928 California M07 2010 12.6 2316525
942 California M08 2007 5.4 971032
943 California M08 2008 7.8 1419146
944 California M08 2009 11.5 2105749
945 California M08 2010 12.3 2271603
959 California M09 2007 5.3 958487
960 California M09 2008 7.7 1397892
961 California M09 2009 11.4 2079035
962 California M09 2010 11.9 2197318
976 California M10 2007 5.4 968525
977 California M10 2008 8.0 1461880
978 California M10 2009 11.6 2114293
979 California M10 2010 11.9 2188602
993 California M11 2007 5.6 1011518
994 California M11 2008 8.4 1541436
995 California M11 2009 11.7 2127225
996 California M11 2010 12.3 2268418
1010 California M12 2007 5.9 1056279
1011 California M12 2008 9.0 1642244
1012 California M12 2009 11.8 2134358
1013 California M12 2010 12.1 2212970
#alternatively
df.loc[(df.state == 'California') & df.year.between(2007, 2010, inclusive=True),
          ['state', 'period', 'year', 'unemployment_rate', 'qty_unemployed']]
state period year unemployment_rate qty_unemployed
823 California M01 2007 5.5 976429
824 California M01 2008 6.4 1158755
825 California M01 2009 10.2 1868614
826 California M01 2010 12.7 2315237
840 California M02 2007 5.4 956853
841 California M02 2008 6.3 1138022
842 California M02 2009 10.6 1935243
843 California M02 2010 12.6 2300364
857 California M03 2007 5.2 932960
858 California M03 2008 6.6 1185778
859 California M03 2009 10.9 1991432
860 California M03 2010 12.6 2312390
874 California M04 2007 5.0 895083
875 California M04 2008 6.1 1103750
876 California M04 2009 10.4 1900577
877 California M04 2010 12.0 2201236
891 California M05 2007 4.8 846097
892 California M05 2008 6.4 1163777
893 California M05 2009 10.6 1931991
894 California M05 2010 11.7 2140341
908 California M06 2007 5.3 940705
909 California M06 2008 7.0 1273683
910 California M06 2009 11.3 2060228
911 California M06 2010 12.0 2206906
925 California M07 2007 5.7 1023914
926 California M07 2008 7.6 1397333
927 California M07 2009 11.7 2142067
928 California M07 2010 12.6 2316525
942 California M08 2007 5.4 971032
943 California M08 2008 7.8 1419146
944 California M08 2009 11.5 2105749
945 California M08 2010 12.3 2271603
959 California M09 2007 5.3 958487
960 California M09 2008 7.7 1397892
961 California M09 2009 11.4 2079035
962 California M09 2010 11.9 2197318
976 California M10 2007 5.4 968525
977 California M10 2008 8.0 1461880
978 California M10 2009 11.6 2114293
979 California M10 2010 11.9 2188602
993 California M11 2007 5.6 1011518
994 California M11 2008 8.4 1541436
995 California M11 2009 11.7 2127225
996 California M11 2010 12.3 2268418
1010 California M12 2007 5.9 1056279
1011 California M12 2008 9.0 1642244
1012 California M12 2009 11.8 2134358
1013 California M12 2010 12.1 2212970

The Query Syntax

Instead of indexing with loc and iloc pandas also has a query method. This is particularly handy when we have multi-index data, since the methods above don’t carry over.

An example useage is:

state = 'California'
df.query('2010 >= year >= 2007 & state==@state')
state period year unemployment_rate qty_unemployed qty_employed labour_force
823 California M01 2007 5.5 976429 16797415 17773844
824 California M01 2008 6.4 1158755 16844050 18002805
825 California M01 2009 10.2 1868614 16372621 18241235
826 California M01 2010 12.7 2315237 15863115 18178352
840 California M02 2007 5.4 956853 16839213 17796066
841 California M02 2008 6.3 1138022 16854691 17992713
842 California M02 2009 10.6 1935243 16339886 18275129
843 California M02 2010 12.6 2300364 15931087 18231451
857 California M03 2007 5.2 932960 16884325 17817285
858 California M03 2008 6.6 1185778 16865764 18051542
859 California M03 2009 10.9 1991432 16257780 18249212
860 California M03 2010 12.6 2312390 15970655 18283045
874 California M04 2007 5.0 895083 16863748 17758831
875 California M04 2008 6.1 1103750 16936957 18040707
876 California M04 2009 10.4 1900577 16300947 18201524
877 California M04 2010 12.0 2201236 16156408 18357644
891 California M05 2007 4.8 846097 16922244 17768341
892 California M05 2008 6.4 1163777 16932233 18096010
893 California M05 2009 10.6 1931991 16260585 18192576
894 California M05 2010 11.7 2140341 16178497 18318838
908 California M06 2007 5.3 940705 16958054 17898759
909 California M06 2008 7.0 1273683 16922825 18196508
910 California M06 2009 11.3 2060228 16206959 18267187
911 California M06 2010 12.0 2206906 16123264 18330170
925 California M07 2007 5.7 1023914 16981299 18005213
926 California M07 2008 7.6 1397333 16908797 18306130
927 California M07 2009 11.7 2142067 16185170 18327237
928 California M07 2010 12.6 2316525 16115939 18432464
942 California M08 2007 5.4 971032 16958142 17929174
943 California M08 2008 7.8 1419146 16873372 18292518
944 California M08 2009 11.5 2105749 16149025 18254774
945 California M08 2010 12.3 2271603 16171990 18443593
959 California M09 2007 5.3 958487 17003870 17962357
960 California M09 2008 7.7 1397892 16873297 18271189
961 California M09 2009 11.4 2079035 16080505 18159540
962 California M09 2010 11.9 2197318 16196643 18393961
976 California M10 2007 5.4 968525 16993192 17961717
977 California M10 2008 8.0 1461880 16863303 18325183
978 California M10 2009 11.6 2114293 16049503 18163796
979 California M10 2010 11.9 2188602 16160340 18348942
993 California M11 2007 5.6 1011518 17020073 18031591
994 California M11 2008 8.4 1541436 16748815 18290251
995 California M11 2009 11.7 2127225 16047439 18174664
996 California M11 2010 12.3 2268418 16118593 18387011
1010 California M12 2007 5.9 1056279 16957504 18013783
1011 California M12 2008 9.0 1642244 16629680 18271924
1012 California M12 2009 11.8 2134358 15940443 18074801
1013 California M12 2010 12.1 2212970 16116806 18329776