Indexing and Selecting Data¶
Contact: Lachlan Deer, [econgit] @ldeer, [github/Twitter] @lachlandeer
In yesterday’s class on NumPy we thought about indexing, slicing, masking and modifying values in Numpy arrays. There are analogues to these operations in the pandas library that we will look into now.
Our focus will be on the pandas DataFrame, rather than the pandas Series Object - because we will be dealing with DataFrames more often than not.
import pandas as pd
First, let’s load our example DataFrame that we assembled in the previous lesson:
df = pd.read_csv('out_data/state_labour_statistics.csv')
df.head()
state | period | year | unemployment_rate | qty_unemployed | qty_employed | labour_force | |
---|---|---|---|---|---|---|---|
0 | New Hampshire | M12 | 2016 | 2.5 | 18907 | 728556 | 747463 |
1 | New Hampshire | M11 | 2016 | 2.6 | 19573 | 729720 | 749293 |
2 | New Hampshire | M10 | 2016 | 2.5 | 18430 | 729792 | 748222 |
3 | New Hampshire | M09 | 2016 | 2.6 | 19415 | 727548 | 746963 |
4 | New Hampshire | M08 | 2016 | 2.8 | 20933 | 737236 | 758169 |
Notice that although when we were working with the data in our previous notebook the index was set as state-year-period, once we saved to csv we lost this structure. This is not a big deal, in a future notebook we will go back to setting the index as we so desire.
We can look at some of the features of our DataFrame before continuing (as a review)
df.index
RangeIndex(start=0, stop=10608, step=1)
df.columns
Index(['state', 'period', 'year', 'unemployment_rate', 'qty_unemployed',
'qty_employed', 'labour_force'],
dtype='object')
df.shape
(10608, 7)
df.values
array([['New Hampshire', 'M12', 2016, ..., 18907, 728556, 747463],
['New Hampshire', 'M11', 2016, ..., 19573, 729720, 749293],
['New Hampshire', 'M10', 2016, ..., 18430, 729792, 748222],
...,
['Georgia', 'M03', 2000, ..., 158596, 4064879, 4223475],
['Georgia', 'M02', 2000, ..., 157572, 4046970, 4204542],
['Georgia', 'M01', 2000, ..., 157066, 4028619, 4185685]],
dtype=object)
Selecting Columns¶
We can select individual columns using a square bracket notation:
df['unemployment_rate']
0 2.5
1 2.6
2 2.5
3 2.6
4 2.8
...
10603 3.7
10604 3.3
10605 3.8
10606 3.7
10607 3.8
Name: unemployment_rate, Length: 10608, dtype: float64
or using a ‘dot’ notation, which is an attribute-style way of accessing
df.unemployment_rate
0 2.5
1 2.6
2 2.5
3 2.6
4 2.8
...
10603 3.7
10604 3.3
10605 3.8
10606 3.7
10607 3.8
Name: unemployment_rate, Length: 10608, dtype: float64
we can verify that it returns the same thing:
df.unemployment_rate is df['unemployment_rate']
True
The attribute style indexing will not work if the column names conflict with a method of a DataFrame.
We can select multiple columns too by passing a list of columns to select:
df[['unemployment_rate' , 'qty_unemployed']]
unemployment_rate | qty_unemployed | |
---|---|---|
0 | 2.5 | 18907 |
1 | 2.6 | 19573 |
2 | 2.5 | 18430 |
3 | 2.6 | 19415 |
4 | 2.8 | 20933 |
... | ... | ... |
10603 | 3.7 | 155898 |
10604 | 3.3 | 138511 |
10605 | 3.8 | 158596 |
10606 | 3.7 | 157572 |
10607 | 3.8 | 157066 |
10608 rows × 2 columns
If we feel the need, we can also transpose a DataFrame:
df.T
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 10598 | 10599 | 10600 | 10601 | 10602 | 10603 | 10604 | 10605 | 10606 | 10607 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
state | New Hampshire | New Hampshire | New Hampshire | New Hampshire | New Hampshire | New Hampshire | New Hampshire | New Hampshire | New Hampshire | New Hampshire | ... | Georgia | Georgia | Georgia | Georgia | Georgia | Georgia | Georgia | Georgia | Georgia | Georgia |
period | M12 | M11 | M10 | M09 | M08 | M07 | M06 | M05 | M04 | M03 | ... | M10 | M09 | M08 | M07 | M06 | M05 | M04 | M03 | M02 | M01 |
year | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | 2016 | ... | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 |
unemployment_rate | 2.5 | 2.6 | 2.5 | 2.6 | 2.8 | 2.8 | 2.9 | 2.7 | 2.8 | 3.2 | ... | 3.3 | 3.5 | 3.9 | 3.9 | 4.0 | 3.7 | 3.3 | 3.8 | 3.7 | 3.8 |
qty_unemployed | 18907 | 19573 | 18430 | 19415 | 20933 | 21521 | 21559 | 20206 | 20652 | 23501 | ... | 137555 | 146555 | 164397 | 165992 | 168502 | 155898 | 138511 | 158596 | 157572 | 157066 |
qty_employed | 728556 | 729720 | 729792 | 727548 | 737236 | 740418 | 733329 | 722805 | 720696 | 721319 | ... | 4081125 | 4063109 | 4060662 | 4075181 | 4067365 | 4063847 | 4071886 | 4064879 | 4046970 | 4028619 |
labour_force | 747463 | 749293 | 748222 | 746963 | 758169 | 761939 | 754888 | 743011 | 741348 | 744820 | ... | 4218680 | 4209664 | 4225059 | 4241173 | 4235867 | 4219745 | 4210397 | 4223475 | 4204542 | 4185685 |
7 rows × 10608 columns
Selecting Rows of data¶
Since the ‘square-bracket’ indexing is reserved for selecting columns of data, we need another way to access individual rows of data. Pandas offers us three alternatives here:
loc
iloc
ix
Let’s see how each of these work:
iloc
works just like the NumPy indexing:
df.iloc[0:3,:]
state | period | year | unemployment_rate | qty_unemployed | qty_employed | labour_force | |
---|---|---|---|---|---|---|---|
0 | New Hampshire | M12 | 2016 | 2.5 | 18907 | 728556 | 747463 |
1 | New Hampshire | M11 | 2016 | 2.6 | 19573 | 729720 | 749293 |
2 | New Hampshire | M10 | 2016 | 2.5 | 18430 | 729792 | 748222 |
df.iloc[0:10:2, 0:4]
state | period | year | unemployment_rate | |
---|---|---|---|---|
0 | New Hampshire | M12 | 2016 | 2.5 |
2 | New Hampshire | M10 | 2016 | 2.5 |
4 | New Hampshire | M08 | 2016 | 2.8 |
6 | New Hampshire | M06 | 2016 | 2.9 |
8 | New Hampshire | M04 | 2016 | 2.8 |
The loc
syntax allows us to index the data using explicit index and column names:
df.loc[0:10, 'state':'unemployment_rate']
state | period | year | unemployment_rate | |
---|---|---|---|---|
0 | New Hampshire | M12 | 2016 | 2.5 |
1 | New Hampshire | M11 | 2016 | 2.6 |
2 | New Hampshire | M10 | 2016 | 2.5 |
3 | New Hampshire | M09 | 2016 | 2.6 |
4 | New Hampshire | M08 | 2016 | 2.8 |
5 | New Hampshire | M07 | 2016 | 2.8 |
6 | New Hampshire | M06 | 2016 | 2.9 |
7 | New Hampshire | M05 | 2016 | 2.7 |
8 | New Hampshire | M04 | 2016 | 2.8 |
9 | New Hampshire | M03 | 2016 | 3.2 |
10 | New Hampshire | M02 | 2016 | 3.3 |
The ix
indexer is useful if the index is set as text but we want to select by row number, whilst refer to columns by name. Note for our example, nothing changes:
df.ix[:3, 'state':'unemployment_rate']
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/tmp/ipykernel_2843/3867642031.py in <module>
----> 1 df.ix[:3, 'state':'unemployment_rate']
/opt/hostedtoolcache/Python/3.8.11/x64/lib/python3.8/site-packages/pandas/core/generic.py in __getattr__(self, name)
5485 ):
5486 return self[name]
-> 5487 return object.__getattribute__(self, name)
5488
5489 def __setattr__(self, name: str, value) -> None:
AttributeError: 'DataFrame' object has no attribute 'ix'
but if instead we have
unemployed = pd.Series({'California': 423967, 'Texas': 695662,
'New York': 141297, 'Florida': 170312,
'Illinois': 149995})
population = pd.Series({'California': 38332521, 'Texas': 26448193,
'New York': 19651127, 'Florida': 19552860,
'Illinois': 12882135})
data = pd.DataFrame({'unemployed':unemployed, 'population':population})
data
population | unemployed | |
---|---|---|
California | 38332521 | 423967 |
Florida | 19552860 | 170312 |
Illinois | 12882135 | 149995 |
New York | 19651127 | 141297 |
Texas | 26448193 | 695662 |
data.ix[1:3, 'population']
Florida 19552860
Illinois 12882135
Name: population, dtype: int64
gives delivers something different than .loc
Any of our indexing conventions allow us to modify values if we so wish:
data.loc['Florida', 'population'] = 20000000
data
population | unemployed | |
---|---|---|
California | 38332521 | 423967 |
Florida | 20000000 | 170312 |
Illinois | 12882135 | 149995 |
New York | 19651127 | 141297 |
Texas | 26448193 | 695662 |
Boolean Masking and Fancy Indexing¶
we can use NumPy-style access patterns within our indexers to subsets of data:
df.loc[df.state =='Alabama']
state | period | year | unemployment_rate | qty_unemployed | qty_employed | labour_force | |
---|---|---|---|---|---|---|---|
0 | Alabama | M01 | 2000 | 5.1 | 108551 | 2024262 | 2132813 |
1 | Alabama | M01 | 2001 | 5.2 | 110035 | 2008876 | 2118911 |
2 | Alabama | M01 | 2002 | 6.5 | 134750 | 1953814 | 2088564 |
3 | Alabama | M01 | 2003 | 6.2 | 129966 | 1974374 | 2104340 |
4 | Alabama | M01 | 2004 | 6.4 | 134673 | 1985886 | 2120559 |
5 | Alabama | M01 | 2005 | 5.4 | 114319 | 2011219 | 2125538 |
6 | Alabama | M01 | 2006 | 4.4 | 93680 | 2048878 | 2142558 |
7 | Alabama | M01 | 2007 | 4.3 | 93046 | 2076252 | 2169298 |
8 | Alabama | M01 | 2008 | 4.8 | 104650 | 2065583 | 2170233 |
9 | Alabama | M01 | 2009 | 9.7 | 208908 | 1955196 | 2164104 |
10 | Alabama | M01 | 2010 | 12.3 | 263375 | 1871780 | 2135155 |
11 | Alabama | M01 | 2011 | 11.0 | 243001 | 1967127 | 2210128 |
12 | Alabama | M01 | 2012 | 8.3 | 177986 | 1973652 | 2151638 |
13 | Alabama | M01 | 2013 | 8.2 | 177887 | 1988180 | 2166067 |
14 | Alabama | M01 | 2014 | 7.5 | 162341 | 1989459 | 2151800 |
15 | Alabama | M01 | 2015 | 6.4 | 135842 | 1998388 | 2134230 |
16 | Alabama | M01 | 2016 | 6.3 | 134592 | 2007036 | 2141628 |
17 | Alabama | M02 | 2000 | 5.1 | 109464 | 2027226 | 2136690 |
18 | Alabama | M02 | 2001 | 5.0 | 105469 | 2009904 | 2115373 |
19 | Alabama | M02 | 2002 | 6.3 | 132979 | 1966562 | 2099541 |
20 | Alabama | M02 | 2003 | 6.0 | 127265 | 1980206 | 2107471 |
21 | Alabama | M02 | 2004 | 6.1 | 130294 | 1990082 | 2120376 |
22 | Alabama | M02 | 2005 | 5.4 | 115837 | 2014928 | 2130765 |
23 | Alabama | M02 | 2006 | 4.5 | 96643 | 2053723 | 2150366 |
24 | Alabama | M02 | 2007 | 4.2 | 91769 | 2078300 | 2170069 |
25 | Alabama | M02 | 2008 | 4.9 | 106990 | 2060564 | 2167554 |
26 | Alabama | M02 | 2009 | 10.5 | 227822 | 1943707 | 2171529 |
27 | Alabama | M02 | 2010 | 12.1 | 258825 | 1876758 | 2135583 |
28 | Alabama | M02 | 2011 | 10.5 | 232117 | 1972720 | 2204837 |
29 | Alabama | M02 | 2012 | 8.4 | 182555 | 1982828 | 2165383 |
... | ... | ... | ... | ... | ... | ... | ... |
174 | Alabama | M11 | 2004 | 5.2 | 111612 | 2043152 | 2154764 |
175 | Alabama | M11 | 2005 | 4.0 | 86786 | 2078869 | 2165655 |
176 | Alabama | M11 | 2006 | 3.7 | 80347 | 2105333 | 2185680 |
177 | Alabama | M11 | 2007 | 3.9 | 85214 | 2107374 | 2192588 |
178 | Alabama | M11 | 2008 | 6.7 | 145764 | 2030180 | 2175944 |
179 | Alabama | M11 | 2009 | 11.1 | 238550 | 1910326 | 2148876 |
180 | Alabama | M11 | 2010 | 10.0 | 222536 | 2001481 | 2224017 |
181 | Alabama | M11 | 2011 | 8.0 | 175369 | 2008604 | 2183973 |
182 | Alabama | M11 | 2012 | 6.9 | 149468 | 2023318 | 2172786 |
183 | Alabama | M11 | 2013 | 6.7 | 144052 | 2020314 | 2164366 |
184 | Alabama | M11 | 2014 | 5.8 | 124177 | 2033698 | 2157875 |
185 | Alabama | M11 | 2015 | 5.7 | 122954 | 2035229 | 2158183 |
186 | Alabama | M11 | 2016 | 5.7 | 123834 | 2060807 | 2184641 |
187 | Alabama | M12 | 2000 | 4.2 | 90116 | 2045059 | 2135175 |
188 | Alabama | M12 | 2001 | 5.8 | 124127 | 1998822 | 2122949 |
189 | Alabama | M12 | 2002 | 5.7 | 121340 | 2001747 | 2123087 |
190 | Alabama | M12 | 2003 | 5.7 | 121278 | 2012259 | 2133537 |
191 | Alabama | M12 | 2004 | 4.9 | 105756 | 2041339 | 2147095 |
192 | Alabama | M12 | 2005 | 3.9 | 83794 | 2079724 | 2163518 |
193 | Alabama | M12 | 2006 | 3.7 | 82045 | 2107748 | 2189793 |
194 | Alabama | M12 | 2007 | 4.3 | 93382 | 2096601 | 2189983 |
195 | Alabama | M12 | 2008 | 7.6 | 164258 | 2008873 | 2173131 |
196 | Alabama | M12 | 2009 | 11.4 | 242992 | 1894973 | 2137965 |
197 | Alabama | M12 | 2010 | 9.8 | 217272 | 1996538 | 2213810 |
198 | Alabama | M12 | 2011 | 7.9 | 172004 | 2004151 | 2176155 |
199 | Alabama | M12 | 2012 | 7.2 | 156518 | 2016210 | 2172728 |
200 | Alabama | M12 | 2013 | 6.7 | 143618 | 2013519 | 2157137 |
201 | Alabama | M12 | 2014 | 5.7 | 121281 | 2023427 | 2144708 |
202 | Alabama | M12 | 2015 | 5.9 | 127389 | 2030765 | 2158154 |
203 | Alabama | M12 | 2016 | 5.9 | 129295 | 2050965 | 2180260 |
204 rows × 7 columns
df.loc[df.state =='Alabama', ['unemployment_rate']]
unemployment_rate | |
---|---|
0 | 5.1 |
1 | 5.2 |
2 | 6.5 |
3 | 6.2 |
4 | 6.4 |
5 | 5.4 |
6 | 4.4 |
7 | 4.3 |
8 | 4.8 |
9 | 9.7 |
10 | 12.3 |
11 | 11.0 |
12 | 8.3 |
13 | 8.2 |
14 | 7.5 |
15 | 6.4 |
16 | 6.3 |
17 | 5.1 |
18 | 5.0 |
19 | 6.3 |
20 | 6.0 |
21 | 6.1 |
22 | 5.4 |
23 | 4.5 |
24 | 4.2 |
25 | 4.9 |
26 | 10.5 |
27 | 12.1 |
28 | 10.5 |
29 | 8.4 |
... | ... |
174 | 5.2 |
175 | 4.0 |
176 | 3.7 |
177 | 3.9 |
178 | 6.7 |
179 | 11.1 |
180 | 10.0 |
181 | 8.0 |
182 | 6.9 |
183 | 6.7 |
184 | 5.8 |
185 | 5.7 |
186 | 5.7 |
187 | 4.2 |
188 | 5.8 |
189 | 5.7 |
190 | 5.7 |
191 | 4.9 |
192 | 3.9 |
193 | 3.7 |
194 | 4.3 |
195 | 7.6 |
196 | 11.4 |
197 | 9.8 |
198 | 7.9 |
199 | 7.2 |
200 | 6.7 |
201 | 5.7 |
202 | 5.9 |
203 | 5.9 |
204 rows × 1 columns
df.loc[df.unemployment_rate > 10.0, ['state', 'period', 'year', 'unemployment_rate']]
state | period | year | unemployment_rate | |
---|---|---|---|---|
10 | Alabama | M01 | 2010 | 12.3 |
11 | Alabama | M01 | 2011 | 11.0 |
26 | Alabama | M02 | 2009 | 10.5 |
27 | Alabama | M02 | 2010 | 12.1 |
28 | Alabama | M02 | 2011 | 10.5 |
43 | Alabama | M03 | 2009 | 10.4 |
44 | Alabama | M03 | 2010 | 11.5 |
61 | Alabama | M04 | 2010 | 10.4 |
77 | Alabama | M05 | 2009 | 10.6 |
94 | Alabama | M06 | 2009 | 11.9 |
95 | Alabama | M06 | 2010 | 10.5 |
96 | Alabama | M06 | 2011 | 10.6 |
111 | Alabama | M07 | 2009 | 11.8 |
112 | Alabama | M07 | 2010 | 10.2 |
113 | Alabama | M07 | 2011 | 10.4 |
128 | Alabama | M08 | 2009 | 11.9 |
129 | Alabama | M08 | 2010 | 10.1 |
145 | Alabama | M09 | 2009 | 11.5 |
162 | Alabama | M10 | 2009 | 11.7 |
179 | Alabama | M11 | 2009 | 11.1 |
196 | Alabama | M12 | 2009 | 11.4 |
418 | Arizona | M01 | 2010 | 11.1 |
435 | Arizona | M02 | 2010 | 10.7 |
452 | Arizona | M03 | 2010 | 10.4 |
469 | Arizona | M04 | 2010 | 10.1 |
502 | Arizona | M06 | 2009 | 10.1 |
503 | Arizona | M06 | 2010 | 10.5 |
504 | Arizona | M06 | 2011 | 10.5 |
519 | Arizona | M07 | 2009 | 10.6 |
520 | Arizona | M07 | 2010 | 10.8 |
... | ... | ... | ... | ... |
8526 | South Carolina | M10 | 2009 | 11.5 |
8527 | South Carolina | M10 | 2010 | 10.5 |
8528 | South Carolina | M10 | 2011 | 10.3 |
8543 | South Carolina | M11 | 2009 | 11.3 |
8544 | South Carolina | M11 | 2010 | 10.9 |
8560 | South Carolina | M12 | 2009 | 11.5 |
8561 | South Carolina | M12 | 2010 | 10.8 |
8782 | Tennessee | M01 | 2010 | 11.2 |
8783 | Tennessee | M01 | 2011 | 10.1 |
8798 | Tennessee | M02 | 2009 | 10.5 |
8799 | Tennessee | M02 | 2010 | 10.8 |
8815 | Tennessee | M03 | 2009 | 10.9 |
8816 | Tennessee | M03 | 2010 | 10.6 |
8832 | Tennessee | M04 | 2009 | 10.2 |
8849 | Tennessee | M05 | 2009 | 10.6 |
8866 | Tennessee | M06 | 2009 | 11.3 |
8883 | Tennessee | M07 | 2009 | 11.0 |
8900 | Tennessee | M08 | 2009 | 10.7 |
8917 | Tennessee | M09 | 2009 | 10.4 |
8934 | Tennessee | M10 | 2009 | 10.2 |
8968 | Tennessee | M12 | 2009 | 10.3 |
9802 | Washington | M01 | 2010 | 11.3 |
9803 | Washington | M01 | 2011 | 10.5 |
9819 | Washington | M02 | 2010 | 11.2 |
9820 | Washington | M02 | 2011 | 10.4 |
9836 | Washington | M03 | 2010 | 10.9 |
9988 | Washington | M12 | 2009 | 10.5 |
10210 | Wisconsin | M01 | 2010 | 10.1 |
10227 | Wisconsin | M02 | 2010 | 10.4 |
10244 | Wisconsin | M03 | 2010 | 10.3 |
663 rows × 4 columns
df.loc[(df.unemployment_rate > 10.0) & (df.state == 'California'),
['state', 'period', 'year', 'unemployment_rate']]
state | period | year | unemployment_rate | |
---|---|---|---|---|
825 | California | M01 | 2009 | 10.2 |
826 | California | M01 | 2010 | 12.7 |
827 | California | M01 | 2011 | 12.6 |
828 | California | M01 | 2012 | 11.4 |
829 | California | M01 | 2013 | 10.2 |
842 | California | M02 | 2009 | 10.6 |
843 | California | M02 | 2010 | 12.6 |
844 | California | M02 | 2011 | 12.2 |
845 | California | M02 | 2012 | 11.3 |
859 | California | M03 | 2009 | 10.9 |
860 | California | M03 | 2010 | 12.6 |
861 | California | M03 | 2011 | 12.1 |
862 | California | M03 | 2012 | 11.0 |
876 | California | M04 | 2009 | 10.4 |
877 | California | M04 | 2010 | 12.0 |
878 | California | M04 | 2011 | 11.5 |
879 | California | M04 | 2012 | 10.3 |
893 | California | M05 | 2009 | 10.6 |
894 | California | M05 | 2010 | 11.7 |
895 | California | M05 | 2011 | 11.3 |
896 | California | M05 | 2012 | 10.2 |
910 | California | M06 | 2009 | 11.3 |
911 | California | M06 | 2010 | 12.0 |
912 | California | M06 | 2011 | 12.0 |
913 | California | M06 | 2012 | 10.6 |
927 | California | M07 | 2009 | 11.7 |
928 | California | M07 | 2010 | 12.6 |
929 | California | M07 | 2011 | 12.2 |
930 | California | M07 | 2012 | 10.9 |
944 | California | M08 | 2009 | 11.5 |
945 | California | M08 | 2010 | 12.3 |
946 | California | M08 | 2011 | 11.9 |
947 | California | M08 | 2012 | 10.4 |
961 | California | M09 | 2009 | 11.4 |
962 | California | M09 | 2010 | 11.9 |
963 | California | M09 | 2011 | 11.5 |
978 | California | M10 | 2009 | 11.6 |
979 | California | M10 | 2010 | 11.9 |
980 | California | M10 | 2011 | 11.3 |
995 | California | M11 | 2009 | 11.7 |
996 | California | M11 | 2010 | 12.3 |
997 | California | M11 | 2011 | 11.0 |
1012 | California | M12 | 2009 | 11.8 |
1013 | California | M12 | 2010 | 12.1 |
1014 | California | M12 | 2011 | 11.0 |
Challenge¶
Select all data employment data for Nebraska when unemployment is greater than 5 percent
Select all unemployment data in December for the Carolinas
Select all unemployment data for the years 2007-2010 in California
Solutions¶
df.loc[(df.unemployment_rate > 5) & (df.state == 'Nebraska'),
['state', 'period', 'year', 'qty_employed']]
state | period | year | qty_employed | |
---|---|---|---|---|
5518 | Nebraska | M01 | 2010 | 927304 |
5519 | Nebraska | M01 | 2011 | 937296 |
5535 | Nebraska | M02 | 2010 | 933213 |
5552 | Nebraska | M03 | 2010 | 937821 |
df.loc[(df.state == 'South Carolina') |(df.state == 'North Carolina'),
['state', 'period', 'year', 'unemployment_rate', 'qty_unemployed']]
state | period | year | unemployment_rate | qty_unemployed | |
---|---|---|---|---|---|
6732 | North Carolina | M01 | 2000 | 3.7 | 150873 |
6733 | North Carolina | M01 | 2001 | 4.9 | 203732 |
6734 | North Carolina | M01 | 2002 | 7.4 | 305801 |
6735 | North Carolina | M01 | 2003 | 6.9 | 288078 |
6736 | North Carolina | M01 | 2004 | 6.3 | 266128 |
6737 | North Carolina | M01 | 2005 | 5.7 | 241881 |
6738 | North Carolina | M01 | 2006 | 4.8 | 205540 |
6739 | North Carolina | M01 | 2007 | 5.1 | 226854 |
6740 | North Carolina | M01 | 2008 | 5.3 | 239310 |
6741 | North Carolina | M01 | 2009 | 10.2 | 466462 |
6742 | North Carolina | M01 | 2010 | 12.0 | 549257 |
6743 | North Carolina | M01 | 2011 | 10.9 | 500556 |
6744 | North Carolina | M01 | 2012 | 10.0 | 461382 |
6745 | North Carolina | M01 | 2013 | 9.7 | 451631 |
6746 | North Carolina | M01 | 2014 | 6.9 | 320157 |
6747 | North Carolina | M01 | 2015 | 6.1 | 285454 |
6748 | North Carolina | M01 | 2016 | 5.4 | 260255 |
6749 | North Carolina | M02 | 2000 | 3.8 | 155617 |
6750 | North Carolina | M02 | 2001 | 5.0 | 209858 |
6751 | North Carolina | M02 | 2002 | 7.1 | 296844 |
6752 | North Carolina | M02 | 2003 | 6.7 | 282810 |
6753 | North Carolina | M02 | 2004 | 6.0 | 255398 |
6754 | North Carolina | M02 | 2005 | 6.0 | 253712 |
6755 | North Carolina | M02 | 2006 | 5.0 | 218569 |
6756 | North Carolina | M02 | 2007 | 4.9 | 219842 |
6757 | North Carolina | M02 | 2008 | 5.3 | 236310 |
6758 | North Carolina | M02 | 2009 | 10.7 | 491633 |
6759 | North Carolina | M02 | 2010 | 12.0 | 549878 |
6760 | North Carolina | M02 | 2011 | 10.6 | 486533 |
6761 | North Carolina | M02 | 2012 | 9.9 | 458754 |
... | ... | ... | ... | ... | ... |
8538 | South Carolina | M11 | 2004 | 6.8 | 139685 |
8539 | South Carolina | M11 | 2005 | 6.9 | 143139 |
8540 | South Carolina | M11 | 2006 | 6.2 | 131898 |
8541 | South Carolina | M11 | 2007 | 5.4 | 114045 |
8542 | South Carolina | M11 | 2008 | 8.0 | 172247 |
8543 | South Carolina | M11 | 2009 | 11.3 | 239246 |
8544 | South Carolina | M11 | 2010 | 10.9 | 233542 |
8545 | South Carolina | M11 | 2011 | 10.0 | 215727 |
8546 | South Carolina | M11 | 2012 | 8.2 | 177723 |
8547 | South Carolina | M11 | 2013 | 6.5 | 142009 |
8548 | South Carolina | M11 | 2014 | 6.5 | 146273 |
8549 | South Carolina | M11 | 2015 | 5.3 | 120874 |
8550 | South Carolina | M11 | 2016 | 4.0 | 91905 |
8551 | South Carolina | M12 | 2000 | 3.7 | 72211 |
8552 | South Carolina | M12 | 2001 | 5.8 | 110516 |
8553 | South Carolina | M12 | 2002 | 5.9 | 114920 |
8554 | South Carolina | M12 | 2003 | 6.7 | 133811 |
8555 | South Carolina | M12 | 2004 | 6.8 | 138995 |
8556 | South Carolina | M12 | 2005 | 6.7 | 138832 |
8557 | South Carolina | M12 | 2006 | 6.0 | 127512 |
8558 | South Carolina | M12 | 2007 | 5.8 | 121614 |
8559 | South Carolina | M12 | 2008 | 9.1 | 194839 |
8560 | South Carolina | M12 | 2009 | 11.5 | 244006 |
8561 | South Carolina | M12 | 2010 | 10.8 | 231105 |
8562 | South Carolina | M12 | 2011 | 9.9 | 213746 |
8563 | South Carolina | M12 | 2012 | 8.6 | 186783 |
8564 | South Carolina | M12 | 2013 | 6.2 | 134045 |
8565 | South Carolina | M12 | 2014 | 6.4 | 143637 |
8566 | South Carolina | M12 | 2015 | 5.4 | 122137 |
8567 | South Carolina | M12 | 2016 | 4.2 | 95277 |
408 rows × 5 columns
# alt
df.loc[df.state.str.contains("Carolina"),
['state', 'period', 'year', 'unemployment_rate', 'qty_unemployed']]
state | period | year | unemployment_rate | qty_unemployed | |
---|---|---|---|---|---|
6732 | North Carolina | M01 | 2000 | 3.7 | 150873 |
6733 | North Carolina | M01 | 2001 | 4.9 | 203732 |
6734 | North Carolina | M01 | 2002 | 7.4 | 305801 |
6735 | North Carolina | M01 | 2003 | 6.9 | 288078 |
6736 | North Carolina | M01 | 2004 | 6.3 | 266128 |
6737 | North Carolina | M01 | 2005 | 5.7 | 241881 |
6738 | North Carolina | M01 | 2006 | 4.8 | 205540 |
6739 | North Carolina | M01 | 2007 | 5.1 | 226854 |
6740 | North Carolina | M01 | 2008 | 5.3 | 239310 |
6741 | North Carolina | M01 | 2009 | 10.2 | 466462 |
6742 | North Carolina | M01 | 2010 | 12.0 | 549257 |
6743 | North Carolina | M01 | 2011 | 10.9 | 500556 |
6744 | North Carolina | M01 | 2012 | 10.0 | 461382 |
6745 | North Carolina | M01 | 2013 | 9.7 | 451631 |
6746 | North Carolina | M01 | 2014 | 6.9 | 320157 |
6747 | North Carolina | M01 | 2015 | 6.1 | 285454 |
6748 | North Carolina | M01 | 2016 | 5.4 | 260255 |
6749 | North Carolina | M02 | 2000 | 3.8 | 155617 |
6750 | North Carolina | M02 | 2001 | 5.0 | 209858 |
6751 | North Carolina | M02 | 2002 | 7.1 | 296844 |
6752 | North Carolina | M02 | 2003 | 6.7 | 282810 |
6753 | North Carolina | M02 | 2004 | 6.0 | 255398 |
6754 | North Carolina | M02 | 2005 | 6.0 | 253712 |
6755 | North Carolina | M02 | 2006 | 5.0 | 218569 |
6756 | North Carolina | M02 | 2007 | 4.9 | 219842 |
6757 | North Carolina | M02 | 2008 | 5.3 | 236310 |
6758 | North Carolina | M02 | 2009 | 10.7 | 491633 |
6759 | North Carolina | M02 | 2010 | 12.0 | 549878 |
6760 | North Carolina | M02 | 2011 | 10.6 | 486533 |
6761 | North Carolina | M02 | 2012 | 9.9 | 458754 |
... | ... | ... | ... | ... | ... |
8538 | South Carolina | M11 | 2004 | 6.8 | 139685 |
8539 | South Carolina | M11 | 2005 | 6.9 | 143139 |
8540 | South Carolina | M11 | 2006 | 6.2 | 131898 |
8541 | South Carolina | M11 | 2007 | 5.4 | 114045 |
8542 | South Carolina | M11 | 2008 | 8.0 | 172247 |
8543 | South Carolina | M11 | 2009 | 11.3 | 239246 |
8544 | South Carolina | M11 | 2010 | 10.9 | 233542 |
8545 | South Carolina | M11 | 2011 | 10.0 | 215727 |
8546 | South Carolina | M11 | 2012 | 8.2 | 177723 |
8547 | South Carolina | M11 | 2013 | 6.5 | 142009 |
8548 | South Carolina | M11 | 2014 | 6.5 | 146273 |
8549 | South Carolina | M11 | 2015 | 5.3 | 120874 |
8550 | South Carolina | M11 | 2016 | 4.0 | 91905 |
8551 | South Carolina | M12 | 2000 | 3.7 | 72211 |
8552 | South Carolina | M12 | 2001 | 5.8 | 110516 |
8553 | South Carolina | M12 | 2002 | 5.9 | 114920 |
8554 | South Carolina | M12 | 2003 | 6.7 | 133811 |
8555 | South Carolina | M12 | 2004 | 6.8 | 138995 |
8556 | South Carolina | M12 | 2005 | 6.7 | 138832 |
8557 | South Carolina | M12 | 2006 | 6.0 | 127512 |
8558 | South Carolina | M12 | 2007 | 5.8 | 121614 |
8559 | South Carolina | M12 | 2008 | 9.1 | 194839 |
8560 | South Carolina | M12 | 2009 | 11.5 | 244006 |
8561 | South Carolina | M12 | 2010 | 10.8 | 231105 |
8562 | South Carolina | M12 | 2011 | 9.9 | 213746 |
8563 | South Carolina | M12 | 2012 | 8.6 | 186783 |
8564 | South Carolina | M12 | 2013 | 6.2 | 134045 |
8565 | South Carolina | M12 | 2014 | 6.4 | 143637 |
8566 | South Carolina | M12 | 2015 | 5.4 | 122137 |
8567 | South Carolina | M12 | 2016 | 4.2 | 95277 |
408 rows × 5 columns
df.loc[(df.state == 'California') & (df.year >= 2007) & (df.year <= 2010),
['state', 'period', 'year', 'unemployment_rate', 'qty_unemployed']]
state | period | year | unemployment_rate | qty_unemployed | |
---|---|---|---|---|---|
823 | California | M01 | 2007 | 5.5 | 976429 |
824 | California | M01 | 2008 | 6.4 | 1158755 |
825 | California | M01 | 2009 | 10.2 | 1868614 |
826 | California | M01 | 2010 | 12.7 | 2315237 |
840 | California | M02 | 2007 | 5.4 | 956853 |
841 | California | M02 | 2008 | 6.3 | 1138022 |
842 | California | M02 | 2009 | 10.6 | 1935243 |
843 | California | M02 | 2010 | 12.6 | 2300364 |
857 | California | M03 | 2007 | 5.2 | 932960 |
858 | California | M03 | 2008 | 6.6 | 1185778 |
859 | California | M03 | 2009 | 10.9 | 1991432 |
860 | California | M03 | 2010 | 12.6 | 2312390 |
874 | California | M04 | 2007 | 5.0 | 895083 |
875 | California | M04 | 2008 | 6.1 | 1103750 |
876 | California | M04 | 2009 | 10.4 | 1900577 |
877 | California | M04 | 2010 | 12.0 | 2201236 |
891 | California | M05 | 2007 | 4.8 | 846097 |
892 | California | M05 | 2008 | 6.4 | 1163777 |
893 | California | M05 | 2009 | 10.6 | 1931991 |
894 | California | M05 | 2010 | 11.7 | 2140341 |
908 | California | M06 | 2007 | 5.3 | 940705 |
909 | California | M06 | 2008 | 7.0 | 1273683 |
910 | California | M06 | 2009 | 11.3 | 2060228 |
911 | California | M06 | 2010 | 12.0 | 2206906 |
925 | California | M07 | 2007 | 5.7 | 1023914 |
926 | California | M07 | 2008 | 7.6 | 1397333 |
927 | California | M07 | 2009 | 11.7 | 2142067 |
928 | California | M07 | 2010 | 12.6 | 2316525 |
942 | California | M08 | 2007 | 5.4 | 971032 |
943 | California | M08 | 2008 | 7.8 | 1419146 |
944 | California | M08 | 2009 | 11.5 | 2105749 |
945 | California | M08 | 2010 | 12.3 | 2271603 |
959 | California | M09 | 2007 | 5.3 | 958487 |
960 | California | M09 | 2008 | 7.7 | 1397892 |
961 | California | M09 | 2009 | 11.4 | 2079035 |
962 | California | M09 | 2010 | 11.9 | 2197318 |
976 | California | M10 | 2007 | 5.4 | 968525 |
977 | California | M10 | 2008 | 8.0 | 1461880 |
978 | California | M10 | 2009 | 11.6 | 2114293 |
979 | California | M10 | 2010 | 11.9 | 2188602 |
993 | California | M11 | 2007 | 5.6 | 1011518 |
994 | California | M11 | 2008 | 8.4 | 1541436 |
995 | California | M11 | 2009 | 11.7 | 2127225 |
996 | California | M11 | 2010 | 12.3 | 2268418 |
1010 | California | M12 | 2007 | 5.9 | 1056279 |
1011 | California | M12 | 2008 | 9.0 | 1642244 |
1012 | California | M12 | 2009 | 11.8 | 2134358 |
1013 | California | M12 | 2010 | 12.1 | 2212970 |
#alternatively
df.loc[(df.state == 'California') & df.year.between(2007, 2010, inclusive=True),
['state', 'period', 'year', 'unemployment_rate', 'qty_unemployed']]
state | period | year | unemployment_rate | qty_unemployed | |
---|---|---|---|---|---|
823 | California | M01 | 2007 | 5.5 | 976429 |
824 | California | M01 | 2008 | 6.4 | 1158755 |
825 | California | M01 | 2009 | 10.2 | 1868614 |
826 | California | M01 | 2010 | 12.7 | 2315237 |
840 | California | M02 | 2007 | 5.4 | 956853 |
841 | California | M02 | 2008 | 6.3 | 1138022 |
842 | California | M02 | 2009 | 10.6 | 1935243 |
843 | California | M02 | 2010 | 12.6 | 2300364 |
857 | California | M03 | 2007 | 5.2 | 932960 |
858 | California | M03 | 2008 | 6.6 | 1185778 |
859 | California | M03 | 2009 | 10.9 | 1991432 |
860 | California | M03 | 2010 | 12.6 | 2312390 |
874 | California | M04 | 2007 | 5.0 | 895083 |
875 | California | M04 | 2008 | 6.1 | 1103750 |
876 | California | M04 | 2009 | 10.4 | 1900577 |
877 | California | M04 | 2010 | 12.0 | 2201236 |
891 | California | M05 | 2007 | 4.8 | 846097 |
892 | California | M05 | 2008 | 6.4 | 1163777 |
893 | California | M05 | 2009 | 10.6 | 1931991 |
894 | California | M05 | 2010 | 11.7 | 2140341 |
908 | California | M06 | 2007 | 5.3 | 940705 |
909 | California | M06 | 2008 | 7.0 | 1273683 |
910 | California | M06 | 2009 | 11.3 | 2060228 |
911 | California | M06 | 2010 | 12.0 | 2206906 |
925 | California | M07 | 2007 | 5.7 | 1023914 |
926 | California | M07 | 2008 | 7.6 | 1397333 |
927 | California | M07 | 2009 | 11.7 | 2142067 |
928 | California | M07 | 2010 | 12.6 | 2316525 |
942 | California | M08 | 2007 | 5.4 | 971032 |
943 | California | M08 | 2008 | 7.8 | 1419146 |
944 | California | M08 | 2009 | 11.5 | 2105749 |
945 | California | M08 | 2010 | 12.3 | 2271603 |
959 | California | M09 | 2007 | 5.3 | 958487 |
960 | California | M09 | 2008 | 7.7 | 1397892 |
961 | California | M09 | 2009 | 11.4 | 2079035 |
962 | California | M09 | 2010 | 11.9 | 2197318 |
976 | California | M10 | 2007 | 5.4 | 968525 |
977 | California | M10 | 2008 | 8.0 | 1461880 |
978 | California | M10 | 2009 | 11.6 | 2114293 |
979 | California | M10 | 2010 | 11.9 | 2188602 |
993 | California | M11 | 2007 | 5.6 | 1011518 |
994 | California | M11 | 2008 | 8.4 | 1541436 |
995 | California | M11 | 2009 | 11.7 | 2127225 |
996 | California | M11 | 2010 | 12.3 | 2268418 |
1010 | California | M12 | 2007 | 5.9 | 1056279 |
1011 | California | M12 | 2008 | 9.0 | 1642244 |
1012 | California | M12 | 2009 | 11.8 | 2134358 |
1013 | California | M12 | 2010 | 12.1 | 2212970 |
The Query Syntax¶
Instead of indexing with loc
and iloc
pandas also has a query
method. This is particularly handy when we have multi-index data, since the methods above don’t carry over.
An example useage is:
state = 'California'
df.query('2010 >= year >= 2007 & state==@state')
state | period | year | unemployment_rate | qty_unemployed | qty_employed | labour_force | |
---|---|---|---|---|---|---|---|
823 | California | M01 | 2007 | 5.5 | 976429 | 16797415 | 17773844 |
824 | California | M01 | 2008 | 6.4 | 1158755 | 16844050 | 18002805 |
825 | California | M01 | 2009 | 10.2 | 1868614 | 16372621 | 18241235 |
826 | California | M01 | 2010 | 12.7 | 2315237 | 15863115 | 18178352 |
840 | California | M02 | 2007 | 5.4 | 956853 | 16839213 | 17796066 |
841 | California | M02 | 2008 | 6.3 | 1138022 | 16854691 | 17992713 |
842 | California | M02 | 2009 | 10.6 | 1935243 | 16339886 | 18275129 |
843 | California | M02 | 2010 | 12.6 | 2300364 | 15931087 | 18231451 |
857 | California | M03 | 2007 | 5.2 | 932960 | 16884325 | 17817285 |
858 | California | M03 | 2008 | 6.6 | 1185778 | 16865764 | 18051542 |
859 | California | M03 | 2009 | 10.9 | 1991432 | 16257780 | 18249212 |
860 | California | M03 | 2010 | 12.6 | 2312390 | 15970655 | 18283045 |
874 | California | M04 | 2007 | 5.0 | 895083 | 16863748 | 17758831 |
875 | California | M04 | 2008 | 6.1 | 1103750 | 16936957 | 18040707 |
876 | California | M04 | 2009 | 10.4 | 1900577 | 16300947 | 18201524 |
877 | California | M04 | 2010 | 12.0 | 2201236 | 16156408 | 18357644 |
891 | California | M05 | 2007 | 4.8 | 846097 | 16922244 | 17768341 |
892 | California | M05 | 2008 | 6.4 | 1163777 | 16932233 | 18096010 |
893 | California | M05 | 2009 | 10.6 | 1931991 | 16260585 | 18192576 |
894 | California | M05 | 2010 | 11.7 | 2140341 | 16178497 | 18318838 |
908 | California | M06 | 2007 | 5.3 | 940705 | 16958054 | 17898759 |
909 | California | M06 | 2008 | 7.0 | 1273683 | 16922825 | 18196508 |
910 | California | M06 | 2009 | 11.3 | 2060228 | 16206959 | 18267187 |
911 | California | M06 | 2010 | 12.0 | 2206906 | 16123264 | 18330170 |
925 | California | M07 | 2007 | 5.7 | 1023914 | 16981299 | 18005213 |
926 | California | M07 | 2008 | 7.6 | 1397333 | 16908797 | 18306130 |
927 | California | M07 | 2009 | 11.7 | 2142067 | 16185170 | 18327237 |
928 | California | M07 | 2010 | 12.6 | 2316525 | 16115939 | 18432464 |
942 | California | M08 | 2007 | 5.4 | 971032 | 16958142 | 17929174 |
943 | California | M08 | 2008 | 7.8 | 1419146 | 16873372 | 18292518 |
944 | California | M08 | 2009 | 11.5 | 2105749 | 16149025 | 18254774 |
945 | California | M08 | 2010 | 12.3 | 2271603 | 16171990 | 18443593 |
959 | California | M09 | 2007 | 5.3 | 958487 | 17003870 | 17962357 |
960 | California | M09 | 2008 | 7.7 | 1397892 | 16873297 | 18271189 |
961 | California | M09 | 2009 | 11.4 | 2079035 | 16080505 | 18159540 |
962 | California | M09 | 2010 | 11.9 | 2197318 | 16196643 | 18393961 |
976 | California | M10 | 2007 | 5.4 | 968525 | 16993192 | 17961717 |
977 | California | M10 | 2008 | 8.0 | 1461880 | 16863303 | 18325183 |
978 | California | M10 | 2009 | 11.6 | 2114293 | 16049503 | 18163796 |
979 | California | M10 | 2010 | 11.9 | 2188602 | 16160340 | 18348942 |
993 | California | M11 | 2007 | 5.6 | 1011518 | 17020073 | 18031591 |
994 | California | M11 | 2008 | 8.4 | 1541436 | 16748815 | 18290251 |
995 | California | M11 | 2009 | 11.7 | 2127225 | 16047439 | 18174664 |
996 | California | M11 | 2010 | 12.3 | 2268418 | 16118593 | 18387011 |
1010 | California | M12 | 2007 | 5.9 | 1056279 | 16957504 | 18013783 |
1011 | California | M12 | 2008 | 9.0 | 1642244 | 16629680 | 18271924 |
1012 | California | M12 | 2009 | 11.8 | 2134358 | 15940443 | 18074801 |
1013 | California | M12 | 2010 | 12.1 | 2212970 | 16116806 | 18329776 |