Advanced Python
In-Class Exercise Solutions, Session 8

Please note that the data files are located in the parent directory. This means you must prepend ../ to each filename.

IMPORTING PANDAS AND NUMPY

Ex. 8.1 Import pandas and numpy. Check the .__version__ attribute of the pandas module.
import pandas as pd
import numpy as np

print(pd.__version__)
 
Ex. 8.2 Include some useful settings.
import warnings

import pandas as pd

## Enable multiple outputs from jupyter cells
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# set the "Copy-on-Write" functionality to avoid
# "copy of a slice" warning and behavior
pd.options.mode.copy_on_write = True

## set default number of DataFrame rows printed to 8
pd.set_option('display.max_rows', 8)

# suppress excel reader warning:  'extension is unsupported'
warnings.filterwarnings('ignore', category=UserWarning, module='openpyxl')
 

INITIALIZING A DATAFRAME

Ex. 8.3 DataFrame from a Python struct. Use this 3x3 list of lists to build a DataFrame. Add a keyword argument columns=[] to the DataFrame to assign column labels, and index=[] to assign row labels. Print the DataFrame.
import pandas as pd

# Sample data in the form of a list of lists
data = [
    [1, 2.5, 'apple'],
    [2, 3.0, 'banana'],
    [3, 4.2, 'orange']
]

df = pd.DataFrame(data, columns=['num', 'float', 'text'],
                        index=['a', 'b', 'c'])

print(df)
 

DATAFRAME FROM DATA FILES: CSV, JSON, SQL, HTML

Ex. 8.4 DataFrame from csv or excel. Read ../revenue.csv using the pd.read_csv() function. Then read ../revenue.xlsx using the pd.read_excel() function. Print each DataFrame.

(Note that the openpyxl module is required for reading .xlsx files and the xlrd module is required for reading .xls files. The engine= argument may be required, specifying one of the two, although in tests it appeared that this is no longer necessary.)

import pandas as pd

fname     = '../revenue.csv'
xlsx_name = '../revenue.xlsx'
xls_name  = '../revenue.xls'

df = pd.read_csv(fname)
df2 = pd.read_excel(xlsx_name)
df3 = pd.read_excel(xls_name)

print(df)
print(df2)
print(df3)
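
If your installation does require an explicit engine, a minimal sketch (assuming openpyxl and xlrd are installed):

# sketch: name the reader engine explicitly
df2 = pd.read_excel(xlsx_name, engine='openpyxl')   # .xlsx
df3 = pd.read_excel(xls_name, engine='xlrd')        # .xls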
 
Ex. 8.5 Customize a data import. Add keyword arguments to configure the pd.read_csv() function.

The revenue_extended.csv file has 2 header lines (use skiprows=) and 2 footer lines (use skipfooter=). The column we'd like to use as the index is the 'id' column (use index_col=). Using skipfooter= means we must use the 'python' engine -- configure with engine='python'

import pandas as pd

fname = '../revenue_extended.csv'  # has extra headers, id field

df = pd.read_csv(fname, skiprows=2, skipfooter=2, index_col='id', engine='python')

print(df)
 
Ex. 8.6 DataFrame from SQL. Query the revenue table in ../testdb.db, passing the database connection object and the query below in the pd.read_sql() function. Print the DataFrame.
import pandas as pd
import sqlite3

db_fname = '../testdb.db'

query = 'SELECT * FROM revenue'

# database connection object
conn = sqlite3.connect(db_fname)

df = pd.read_sql(query, conn)

conn.close()   # close the database connection when done

print(df)
 
Ex. 8.7 DataFrame from JSON. Read the JSON file organized as a list of dicts, as well as one organized as a dict of dicts, into a DataFrame using the .read_json() function.
import pandas as pd

# list of dicts
lod_filename = '../weather_newyork_lod.json'

# dict of dicts
dod_filename = '../weather_newyork_dod.json'


df = pd.read_json(lod_filename, orient='records')
print(df)

df = pd.read_json(dod_filename, orient='index')
print(df)
 
Ex. 8.8 DataFrame from embedded JSON. Look at the typical output of an Alphavantage API query, which contains a dict of dicts "embedded" within a larger JSON struct (this is saved in a local file called alphavantage_5min.json). "Drill down" to the dict of dicts by using a subscript, and assign the dod by itself to a variable. Then read the dict of dicts object into a DataFrame using the pd.DataFrame.from_dict() function, with orient='index'.
import pandas as pd
import json


fname = '../alphavantage_5min.json'

# use a with block so the file is closed automatically
with open(fname) as fh:
    obj = json.load(fh)

dod = obj['Time Series (5min)']

df = pd.DataFrame.from_dict(dod, orient='index')

print(df)
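
Note that the values read from this JSON arrive as strings (dtype object). A follow-up sketch, not part of the assigned exercise, converting the values to floats and the index to pandas Timestamps:

# sketch: cast string values to float; parse index strings as dates
df = df.astype(float)
df.index = pd.to_datetime(df.index)

print(df.dtypes)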
 
Ex. 8.9 DataFrame from HTML table. Read the HTML table into a DataFrame using the .read_html() function.

The function returns a list of DataFrames, one for each table found within the HTML page. Subscript this list to get the 1st (and only) table. For a bonus, when reading the DataFrame, set the 'EST' column to the index and use the parse_dates=True parameter to convert the dates in the index to date-type objects (pandas Timestamp). Note that .read_html() requires an HTML parsing library such as lxml to be installed.

import pandas as pd

fname = '../weather_newyork.html'

df_list = pd.read_html(fname, index_col='EST', parse_dates=True)

df = df_list[0]

print(df)
 

QUICK APPRAISAL OF DATAFRAME

Ex. 8.10 Get an overview of the DataFrame:

  • Print df.shape to see the number of columns and rows
  • Print the DataFrame and look at the stats listed below it

import pandas as pd

fname = '../weather_newyork_narrow.csv'

df = pd.read_csv(fname)

print(df.shape)

df
 
Ex. 8.11 See the head and tail of a DataFrame:

  • Use .head() and .tail()
  • Use .columns and .index

import pandas as pd

fname = '../weather_newyork_narrow.csv'

df = pd.read_csv(fname)

print(df.head())
print()

print(df.tail())
print()

print(df.columns)
print()

print(df.index)
print()
 
Ex. 8.12 Get another overview on a larger DataFrame:

  • Look at .dtypes
  • Look at .info()

import pandas as pd

fname = '../weather_newyork_narrow.csv'

df = pd.read_csv(fname)

print(df)
print()

print(df.dtypes)
print()

print(df.info())
 

DATAFRAME SERIES

Ex. 8.13 View a DataFrame column. View the 'company' column using subscript syntax (df['company']), and the same column using attribute syntax (df.company). Print the type of this object, as well as the object itself.
import pandas as pd

fname = '../revenue_extended.csv'  # has extra headers, id field

df = pd.read_csv(fname, skiprows=2,
                        skipfooter=2,
                        index_col='id',
                        engine='python')

print(df['company'])
print()

print(df.company)
 
Ex. 8.14 View a DataFrame row. View the 'hd34' row using .loc['hd34'], and the same row using .iloc[0]. Print the object.
import pandas as pd

fname = '../revenue_extended.csv'  # has extra headers, id field

df = pd.read_csv(fname, skiprows=2,
                        skipfooter=2,
                        index_col='id',
                        engine='python')

print(df.loc['hd34'])
print()

print(df.iloc[0])
 
Ex. 8.15 View Series .name and .index. First assign the 'company' column Series to a variable. Then read these attributes from the Series.
import pandas as pd

fname = '../revenue_extended.csv'  # has extra headers, id field

df = pd.read_csv(fname, skiprows=2,
                        skipfooter=2,
                        index_col='id',
                        engine='python')

s = df.company

print(s.name)

print(s.index)
 

MANIPULATING THE INDEX

Ex. 8.16 Print the .index and .columns attributes on this DataFrame. Then reorder the rows to 'a', 'b', 'c' using .reindex().
import pandas as pd


data = [
    [1, 2.5, 'apple'],
    [2, 3.0, 'banana'],
    [3, 4.2, 'orange']
]

df = pd.DataFrame(data, columns=['num', 'float', 'text'],
                        index=['c', 'b', 'a'])


print(df.index)
print(df.columns)

df = df.reindex(['a', 'b', 'c'])

print(df)
 
Ex. 8.17 Reading from weather_newyork.csv, use .reindex() with the argument axis=1 to select the 'EST', 'Max TemperatureF' and 'Min TemperatureF' columns. (Question: what happens if you leave off the axis=1? Why would pandas respond that way?)
import pandas as pd

df = pd.read_csv('../weather_newyork.csv')


df = df.reindex(['EST', 'Max TemperatureF','Min TemperatureF'], axis=1)

print(df)
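
To answer the question above: axis=0 (the row index) is the default, so without axis=1 pandas looks for 'EST', 'Max TemperatureF' and 'Min TemperatureF' among the row labels, finds none, and returns an all-NaN DataFrame rather than raising an error. A quick sketch to confirm:

# sketch: without axis=1 the labels are treated as row labels;
# none exist in the index, so every value comes back as NaN
df2 = pd.read_csv('../weather_newyork.csv')
print(df2.reindex(['EST', 'Max TemperatureF', 'Min TemperatureF']))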
 
Ex. 8.18 Use .reindex() to reorder the columns (again, with axis=1) to 'float', 'num', and 'terxt' (a deliberate misspelling of 'text'). Note that the 'terxt' column springs into existence with empty values.
import pandas as pd

data = [
    [1, 2.5, 'apple'],
    [2, 3.0, 'banana'],
    [3, 4.2, 'orange']
]

df = pd.DataFrame(data, columns=['num', 'float', 'text'],
                        index=['c', 'b', 'a'])


df = df.reindex(['float', 'num', 'terxt'], axis=1)

print(df)
 
Ex. 8.19 Compare .set_index() to .reset_index().

Use .set_index() to set the 'id' column as the DataFrame's index. Print or display the DataFrame. Next, use .reset_index() to reset the index to the default (integers), moving the old index into a new column. Lastly, add the drop=True argument to discard the old index entirely, and note that it no longer appears in the result.

import pandas as pd

fname = '../revenue_extended.csv'

df = pd.read_csv(fname, skiprows=2,
                        skipfooter=2,
                        engine='python')


df = df.set_index('id')

print(df)
print()

df = df.reset_index(drop=True)

print(df)
 
Ex. 8.20 Using .rename(), rename column labels: 'EST' -> 'date', 'Max TemperatureF' -> 'max_temp' and 'Min TemperatureF' -> 'min_temp'.
import pandas as pd

df = pd.read_csv('../weather_newyork.csv')

dfs = df.reindex(['EST', 'Max TemperatureF', 'Min TemperatureF'], axis=1)


dfs = dfs.rename(columns={'EST': 'date', 'Max TemperatureF': 'max_temp',
                                         'Min TemperatureF': 'min_temp'})

print(dfs)
 

DATAFRAME SLICING

Ex. 8.21 Slice rows using .loc[] and a list of labels. Select just 'hd34', 'hp21' and 'tc77'.
import pandas as pd

fname = '../revenue_extended.csv'

df = pd.read_csv(fname, skiprows=2,
                        skipfooter=2,
                        index_col='id',
                        engine='python')


print(df.loc[['hd34', 'hp21', 'tc77']])
 
Ex. 8.22 Slice rows using .loc[] and a range of labels. Select 'ts06' through 'df8'.
import pandas as pd

fname = '../revenue_extended.csv'

df = pd.read_csv(fname, skiprows=2,
                        skipfooter=2,
                        index_col='id',
                        engine='python')


print(df.loc['ts06': 'df8'])
 
Ex. 8.23 Use .loc[] to slice columns and rows. Select all rows but just the 'company' and 'revenue' columns (use ':' for the row selector, i.e. the 1st argument to .loc[], and a list of column labels for the columns).
import pandas as pd

fname = '../revenue_extended.csv'

df = pd.read_csv(fname, skiprows=2,
                        skipfooter=2,
                        index_col='id',
                        engine='python')


print(df.loc[:, ['company', 'revenue']])
 
Ex. 8.24 Use .iloc[] indexing to select columns and rows. Select a 2x2 grid of the DataFrame using indices.
import pandas as pd

fname = '../weather_newyork.csv'

df = pd.read_csv(fname)


print(df.iloc[0:2, 0:2])
 

CONDITIONAL SLICING

Ex. 8.25 Slice based on single condition. Using .loc[] with a condition, look for all 'high' temps above 100, then try above 90.

Then, add a 2nd 'bucket' to .loc[] to select the 'high' and 'precip' columns.

import pandas as pd

fname = '../weather_newyork_narrow.csv'

df = pd.read_csv(fname)


print(df.loc[  df.high > 90, ['high', 'precip']  ])
 
Ex. 8.26 Slice based on dual condition. Select for a 'high' of > 90 and a 'precip' of > 0.5. Separate the tests with an & and put parentheses around each. (This will fail, and uncover an anomaly that we'll identify next.)
import pandas as pd

fname = '../weather_newyork_narrow.csv'

df = pd.read_csv(fname)

hitemps = df.loc[ (df.high > 90) & (df.precip > 0.5) ]
   # fails with this message:
   # TypeError: '>' not supported between instances of 'str' and 'float'
 
Ex. 8.27 Use .unique() to see all unique values for a Series. Apply this method to the 'precip' column to see all values this column may contain. See if you can identify the value that caused the numeric comparison to fail.
import pandas as pd

fname = '../weather_newyork_narrow.csv'

df = pd.read_csv(fname)


print(df.precip.unique())

Can you see the anomalous value that caused the '>' comparison to fail? We'll fix this issue after we've learned a few more pandas features.

 
Ex. 8.28 Use .isin() as a condition to .loc[]. Select states 'NY' and 'NJ' from revenue.csv -- the argument to .isin() should be a list of the state values.
import pandas as pd

df = pd.read_csv('../revenue.csv')


print(df.loc[  df.state.isin(['NY', 'NJ'])  ])
 
Ex. 8.29 Use a str method condition. Use the str.isdigit() method (accessed through the .str accessor) to filter rows with digits in the 'transactions' column. Following this, put a tilde (~) in front of the test to show those rows with transactions that are not all digits.
import pandas as pd

df = pd.read_csv('../revenue_extended.csv', skiprows=2,
                                            skipfooter=2,
                                            engine='python')

print(df.loc[  ~df.transactions.str.isdigit()  ])
 

"COPY ON WRITE" / "COPY OF A SLICE" WARNING

Ex. 8.30 Trigger a "Copy of a Slice" Warning.
Here we create a slice using a boolean subscript, then write to one of its columns:
import pandas as pd

# set the "Copy-on-Write" functionality to avoid
# "copy of a slice" warning and behavior
# pd.options.mode.copy_on_write = True


dfi = pd.DataFrame({'c1': [ 1,  2, 3,  4],
                    'c2': [16, 17, 18, 19],
                    'c3': [21, 22, 23, 24],
                    'c4': [26, 27, 28, 29] },
           index = ['r1', 'r2', 'r3', 'r4'])


dfis = dfi[ dfi.c1 > 2 ]
print(dfis)

                  #     c1  c2  c3  c4
                  # r3   3  18  23  28
                  # r4   4  19  24  29


dfis.c3 = dfis.c1 * dfis.c2


# alternative: using .loc[] indexing
# dfis.loc[ :, 'c3' ] = dfis.c1 * dfis.c2

The copy_on_write setting (commented out above so the behavior can be observed) forces pandas to create a new copy of the DataFrame any time we attempt to write to a slice; without it, writing to a slice usually triggers the tricky and confusing "copy of a slice" warning.
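
A minimal sketch of the difference, with the option enabled (assumes pandas 1.5+, where copy_on_write is available):

import pandas as pd

pd.options.mode.copy_on_write = True     # enable copy-on-write

dfi2 = pd.DataFrame({'c1': [1, 2, 3, 4],
                     'c2': [16, 17, 18, 19]},
           index = ['r1', 'r2', 'r3', 'r4'])

dfis2 = dfi2[ dfi2.c1 > 2 ]              # slice of dfi2

dfis2.c2 = dfis2.c1 * 2                  # no warning: write creates a copy

print(dfis2)
print(dfi2)                              # original DataFrame is untouched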

 

VECTORIZED OPERATIONS

Ex. 8.31 Use a subscript to add a new column 'tax', set to 0.04
import pandas as pd


df = pd.read_csv('../revenue.csv')


df['tax'] = 0.04

print(df)
 
Ex. 8.32 Vectorize column value. Add a new column 'tax amt', set to the 'revenue' value * 'tax' amount.
import pandas as pd

df = pd.read_csv('../revenue.csv')

df['tax'] = 0.04

df['tax amt'] = df.revenue * df.tax

print(df)
 
Ex. 8.33 Vectorizing with condition. Continuing the above solution, use .loc[] with a condition to set 'tax' to 0.0625 for all 'NJ' rows, then recalculate tax amount.
import pandas as pd

df = pd.read_csv('../revenue.csv')

df['tax'] = 0.04

df.loc[ df.state == 'NJ', 'tax' ] = 0.0625

df['tax amt'] = df.revenue * df.tax

print(df)
 
Ex. 8.34 .replace() method, or .loc[] assignment. Returning to the 'precip' column, replace the non-numeric value we identified previously with 0. Use a vectorized condition as in the previous exercise, or the Series .replace() method.
import pandas as pd

df = pd.read_csv('../weather_newyork_narrow.csv')


df = df.replace('T', 0)

# alt. solution (instead of above) using .loc[]
# df.loc[ df.precip == 'T', 'precip' ] = 0

print(df.precip.unique())

# once the value has been replaced,
# uncomment to try the comparison again
# hitemps = df.loc[ (df.high > 90) & (df.precip > 0.5) ]

After replacing the value, run .unique() to see that the anomalous value is no longer there. Then you may uncomment the .loc[] to see that it still doesn't work. Any idea why? We'll come back to this.
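
(A hint for the question above: even after the replacement, the 'precip' column's dtype is still object, so comparing it to the float 0.5 still raises a TypeError. A quick check:)

# sketch: the column is still object dtype after the replacement
print(df.precip.dtype)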

 
Ex. 8.35 Vectorizing with function. Use .apply() with a condition to create column 'revf' converting 'revenue' to a string with formatting.
import pandas as pd


def format(n):            # note: this name shadows the builtin format()
    return f'${n:,.2f}'   # makes 2.3 into '$2.30'


df = pd.read_csv('../revenue.csv')

df['revf'] = df.revenue.apply(format)

print(df)
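
The same result with an anonymous function, if you prefer not to define format() separately (a behaviorally identical sketch):

# sketch: apply the formatting with a lambda instead of a named function
df['revf'] = df.revenue.apply(lambda n: f'${n:,.2f}')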
 

'SUMMARY' FUNCTION ON SERIES

Ex. 8.36 Use the .sum() method to sum up the 'revenue' column.
import pandas as pd

df = pd.read_csv('../revenue.csv')


print(df.revenue.sum())
 
Ex. 8.37 Use the .mean() method on a single column. Calculate the average of the 'revenue' column.
import pandas as pd

df = pd.read_csv('../revenue.csv')


print(df.revenue.mean())
 
Ex. 8.38 Use the .mean() method on numeric columns. Now use .mean() on the DataFrame itself to calculate the average of the 'revenue' and 'transactions' columns. You must include the argument numeric_only=True, or the method will fail on the columns that contain strings.
import pandas as pd

df = pd.read_csv('../revenue.csv')


print(df.mean(numeric_only=True))
 
Ex. 8.39 Use summary function with condition. Show the average of the 'revenue' column, but only for those rows with 'NY' in the 'state' column.
import pandas as pd

df = pd.read_csv('../revenue.csv')


print(  df.loc[  df.state == 'NY' ].revenue.mean())
 

WORKING WITH DTYPES

Ex. 8.40 Set the dtype for a column. First, use .dtypes to show the column types. You may use .unique() to determine the range of values for the 'transactions' column, which explains why that column is type object.

Next, replace the '-' values in the 'transactions' column with 0's, then again check the .dtypes (and use .unique() if you wish). Finally, use .astype() to set the column dtype to 'int', then check the .dtypes again.

import pandas as pd

df = pd.read_csv('../revenue_extended.csv', skiprows=2,
                                            skipfooter=2,
                                            engine='python')

print(df.dtypes)


print(df.transactions.unique())

df.transactions = df.transactions.replace('-', 0)

df.transactions = df.transactions.astype(int)

print(df.dtypes)
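
An alternative worth knowing (a sketch, not the assigned solution): pd.to_numeric() with errors='coerce' turns any unparseable value into NaN, which can then be filled and cast. This assumes a freshly read DataFrame:

# sketch: coerce non-numeric values to NaN, then fill with 0 and cast
df2 = pd.read_csv('../revenue_extended.csv', skiprows=2,
                                             skipfooter=2,
                                             engine='python')
df2.transactions = pd.to_numeric(df2.transactions, errors='coerce')
df2.transactions = df2.transactions.fillna(0).astype(int)

print(df2.dtypes)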
 
Ex. 8.41 Use pd.to_numeric(). Change the 'a' and 'b' columns to a numeric type, and show dtypes afterwards.
import pandas as pd

dol = {  'a': ['1', '2', '3'],
         'b': ['2.9', '3.5', '4.9'],
         'c': ['yourstr', 'mystr', 'theirstr']  }

df = pd.DataFrame(dol, index=['w', 'x', 'y'])


print(df.dtypes)


df.a = pd.to_numeric(df.a)
df.b = pd.to_numeric(df.b)

print(df.dtypes)
 
Ex. 8.42 Solve the earlier problem. Now that we can 'cast' or change dtypes, change the .precip column to numeric so we can complete the compound test.
import pandas as pd

fname = '../weather_newyork_narrow.csv'

df = pd.read_csv(fname)

df.precip = df.precip.replace('T', 0)


df.precip = df.precip.astype(float)

hitemps = df.loc[ (df.high > 90) & (df.precip > 0.5) ]

print(hitemps)
 

SORTING

Ex. 8.43 Read this csv, set the 'date' column as the index, and sort by index with ascending=False.
import pandas as pd

fname = '../weather_newyork_narrow.csv'

df = pd.read_csv(fname, index_col='date')

dfs = df.sort_index(ascending=False)

dfs
 
Ex. 8.44 Use .sort_values() to sort the DataFrame by the 'high' value. Use ascending=False to reverse the sort.
import pandas as pd

fname = '../weather_newyork_narrow.csv'

df = pd.read_csv(fname, index_col='date')

dfs = df.sort_values('high', ascending=False)

dfs
 
Ex. 8.45 Use axis=1 with .sort_index() to sort the DataFrame by its column labels.
import pandas as pd

fname = '../revenue.csv'

df = pd.read_csv(fname)

dfs = df.sort_index(axis=1)

dfs
 

CONCATENATING / APPENDING

Ex. 8.46 First view, then concatenate the two DataFrames vertically.
import pandas as pd

df = pd.DataFrame({ 'a': [1, 2, 3],
                    'b': [2.9, 3.5, 4.9],
                    'c': ['yourstr', 'mystr', 'theirstr']  },
                    index=['r1', 'r2', 'r3'])

df2 = pd.DataFrame({ 'b': [4, 5, 6],
                     'c': [7.8, 8.2, 9.1],
                     'd': ['thingy', 'thatty', 'flunky']  },
                     index=['r2', 'r3', 'r4'])

print(df)
print()
print(df2)

df3 = pd.concat([df, df2])

df3

Note that the indices are repeated; this can be changed using index manipulation (.rename(), .reindex(), .set_index()), or as shown below.
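
If a fresh default index is acceptable, pd.concat() can renumber the rows directly with its ignore_index= parameter:

# discard both original indices and renumber rows 0..n-1
df4 = pd.concat([df, df2], ignore_index=True)

df4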

 
Ex. 8.47 First view, then concatenate the two DataFrames horizontally.
import pandas as pd

df = pd.DataFrame({ 'a': [1, 2, 3],
                    'b': [2.9, 3.5, 4.9],
                    'c': ['yourstr', 'mystr', 'theirstr']  },
                    index=['r1', 'r2', 'r3'])

df2 = pd.DataFrame({ 'd': ['xylo', 'yellow', 'zello'],
                     'e': [0, 0, 0],
                     'f': [1.1, 1.1, 1.2]  },
                     index=['r1', 'r2', 'r3'])

print(df)
print()
print(df2)
print()

df3 = pd.concat([df, df2], axis=1)

df3
 

MERGE / JOIN

Ex. 8.48 View the two below DataFrames. Then use pd.merge() to join the student_db_grades.csv and student_db_names.csv files into a single DataFrame, joining on the id (use on='id'). Compare how='outer' (19 rows, including missing data at the end) to how='inner' (only 15 rows, with ids common to both tables).
import pandas as pd

sgrades = pd.read_csv('../student_db_grades.csv', sep=':')
snames = pd.read_csv('../student_db_names.csv')

print(sgrades.head(2))
print('===')
print(snames.head(2))
print('===')
sm = pd.merge(sgrades, snames, how='inner', on='id')

sm
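
To see the other side of the comparison, only the how= argument changes:

# outer join: keep ids found in either table; missing values become NaN
smo = pd.merge(sgrades, snames, how='outer', on='id')

smo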
 
Ex. 8.49 Reading the same files but setting the 'id' column as the index of each DataFrame, use df.join() (which performs the same kind of join as pd.merge(), but on the index by default) to perform the above join.
import pandas as pd

sgrades = pd.read_csv('../student_db_grades.csv', sep=':', index_col='id')
snames = pd.read_csv('../student_db_names.csv', index_col='id')

sj = sgrades.join(snames)

sj
 

AGGREGATIONS WITH df.groupby() AND pd.pivot_table()

Please run this cell to view the sales-funnel spreadsheet:

import pandas as pd

df = pd.read_excel("../sales-funnel.xlsx")

df
Ex. 8.50 Single-column aggregation (.groupby() and .pivot_table()). Show the sum of 'SaleAmount' grouped by 'SalesRep'.
import pandas as pd

df = pd.read_excel("../sales-funnel.xlsx")

dfgb = df.groupby('SalesRep').SaleAmount.sum()

dfgb


dfpt = pd.pivot_table(df, index='SalesRep',
                          values='SaleAmount',
                          aggfunc='sum')

dfpt
 
Ex. 8.51 Multi-column aggregation (.groupby() and .pivot_table()). Show sums of 'SaleAmount' and 'Quantity' grouped by 'SalesRep' and by 'Product'.
import pandas as pd

df = pd.read_excel("../sales-funnel.xlsx")

dfgb = df.groupby(['SalesRep', 'Product'])[['SaleAmount', 'Quantity']].sum()

dfgb


dfpt = pd.pivot_table(df, index=['SalesRep', 'Product'],
                          values=['SaleAmount', 'Quantity'], aggfunc='sum')

dfpt
 
Ex. 8.52 Further agg by secondary value (.groupby() and .pivot_table()). Perform the above pivot on just 'SalesRep', but use columns= to show the results broken down by 'Product'.
import pandas as pd
import numpy as np

df = pd.read_excel("../sales-funnel.xlsx")

dfpt = pd.pivot_table(df, index='SalesRep',
                          aggfunc='sum',
                          values=['SaleAmount', 'Quantity'],
                          columns='Product')

dfpt

dfgb = df.groupby(['SalesRep', 'Product'])[['SaleAmount', 'Quantity']].sum()

dfgb
 
Ex. 8.53 Use .stack() and .unstack() (with .groupby()). Looking at the above example, note that .groupby() produces results in "long" format, while .pivot_table() renders them in "wide" format. Use .unstack() and .stack() to move between the two.
import pandas as pd
import numpy as np

df = pd.read_excel("../sales-funnel.xlsx")

dfgb = df.groupby(['SalesRep', 'Product'])[['SaleAmount', 'Quantity']].sum()

dfgb

dfgbus = dfgb.unstack()

dfgbus

dfgbss = dfgbus.stack()

dfgbss
 

MULTIINDEX

Please begin by noting the multi-level index that is produced by the below multi-column aggregation.

import pandas as pd
import numpy as np

df = pd.read_excel("../sales-funnel.xlsx")

table = pd.pivot_table(df, index=['Manager', 'SalesRep'],
                           aggfunc='sum')

table
Ex. 8.54 MultiIndex: slicing at top level. Slice the multi-index to show just 'Account' and 'Quantity', and just for Manager 'Debra Henley'.
import pandas as pd
import numpy as np

df = pd.read_excel("../sales-funnel.xlsx")

table = pd.pivot_table(df, index=['Manager', 'SalesRep'],
                           aggfunc='sum')

table.loc['Debra Henley', ['Account', 'Quantity']]
 
Ex. 8.55 MultiIndex: slicing at 2nd level. Use a 2-item tuple (first-level label, second-level label) to slice for just 'Debra Henley' at the first level and 'Craig Booker' at the 2nd level.
import pandas as pd
import numpy as np

df = pd.read_excel("../sales-funnel.xlsx")

table = pd.pivot_table(df, index=['Manager', 'SalesRep'],
                           aggfunc='sum')

tl = table.loc[('Debra Henley', 'Craig Booker')]

tl
 
Ex. 8.56 MultiIndex: "cross section" slicing. Use the .xs() method with level=1 to show rows just for 'Craig Booker'.
import pandas as pd
import numpy as np

df = pd.read_excel("../sales-funnel.xlsx")

table = pd.pivot_table(df, index=['Manager', 'SalesRep'],
                           aggfunc='sum')

txs = table.xs('Craig Booker', level=1)

txs
 
Ex. 8.57 MultiIndex: use .query('Manager == "Debra Henley"') to select rows.
import pandas as pd
import numpy as np

df = pd.read_excel("../sales-funnel.xlsx")

table = pd.pivot_table(df, index=['Manager', 'SalesRep'],
                           aggfunc='sum')

table.query("Manager == 'Debra Henley'")

# table.query("Manager == 'Debra Henley' and SalesRep == 'Craig Booker'")

#table.query("SalesRep in ['Craig Booker', 'Daniel Hilton']")
 
Ex. 8.58 MultiIndex: First review the below DataFrame; then use .query('Status in ["pending", "won"]') to select rows.
import pandas as pd

df = pd.read_excel("../sales-funnel.xlsx")

dfq = df.query('Status in ["pending", "won"]')

dfq
 
Ex. 8.59 MultiIndex: use .loc[] with a 1- or 2-item tuple to select rows. Keep in mind that a 1-item tuple must have a comma after the item, for example ('Fred Anderson',)
import pandas as pd
import numpy as np

df = pd.read_excel("../sales-funnel.xlsx")

table = pd.pivot_table(df, index=['Manager', 'SalesRep'],
                           aggfunc='sum')

table.loc[('Fred Anderson', 'Wendy Yule')]
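
A 1-item tuple (note the trailing comma) selects every row under one first-level label:

# 1-item tuple: all SalesReps under this Manager
table.loc[('Fred Anderson',)]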
 