Advanced Python
In-Class Exercise Solutions, Session 8

Please note that the data files are located in the parent directory. This means you must prepend ../ to each filename.

IMPORTING PANDAS AND NUMPY

Ex. 8.1 Import pandas and numpy. Check the .__version__ attribute of the pandas module.
import pandas as pd
import numpy as np

print(pd.__version__)
 
Ex. 8.2 Include some useful settings.
import warnings

import pandas as pd

## Enable multiple outputs from jupyter cells
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# set the "Copy-on-Write" functionality to avoid
# "copy of a slice" warning and behavior
pd.options.mode.copy_on_write = True

## set default number of DataFrame rows printed to 8
pd.set_option('display.max_rows', 8)

# suppress excel reader warning:  'extension is unsupported'
warnings.filterwarnings('ignore', category=UserWarning, module='openpyxl')
 

INITIALIZING A DATAFRAME

Ex. 8.3 DataFrame from a Python struct. Use this 3x3 list of lists to build a DataFrame. Add a keyword argument columns=[] to the DataFrame to assign column labels, and index=[] to assign row labels. Print the DataFrame.
import pandas as pd

# Sample data in the form of a list of lists
data = [
    [1, 2.5, 'apple'],
    [2, 3.0, 'banana'],
    [3, 4.2, 'orange']
]

df = pd.DataFrame(data, columns=['num', 'float', 'text'],
                        index=['a', 'b', 'c'])

print(df)
 

DATAFRAME FROM DATA FILES: CSV, JSON, SQL, HTML

Ex. 8.4 DataFrame from csv or excel. Read ../revenue.csv using the pd.read_csv() function. Then read ../revenue.xlsx using the pd.read_excel() function. Print each DataFrame.

(Note that the openpyxl module is required for reading .xlsx files and the xlrd module is required for reading .xls files. The engine= argument may be required, specifying one of the two, although in tests it appeared that this is no longer necessary.)

import pandas as pd

fname     = '../revenue.csv'
xlsx_name = '../revenue.xlsx'
xls_name  = '../revenue.xls'

df = pd.read_csv(fname)
df2 = pd.read_excel(xlsx_name)
df3 = pd.read_excel(xls_name)

print(df)
print(df2)
print(df3)
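
If your installation does require an explicit engine, a minimal sketch (assuming openpyxl and xlrd are installed):

# sketch: name the reader engine explicitly
df2 = pd.read_excel(xlsx_name, engine='openpyxl')   # .xlsx
df3 = pd.read_excel(xls_name, engine='xlrd')        # .xls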
 
Ex. 8.5 Customize a data import. Add keyword arguments to configure the pd.read_csv() function.

The revenue_extended.csv file has 2 header lines (use skiprows=) and 2 footer lines (use skipfooter=). The column we'd like to use as the index is the 'id' column (use index_col=). Using skipfooter= means we must use the 'python' engine -- configure with engine='python'

import pandas as pd

fname = '../revenue_extended.csv'  # has extra headers, id field

df = pd.read_csv(fname, skiprows=2, skipfooter=2, index_col='id', engine='python')

print(df)
 
Ex. 8.6 DataFrame from SQL. Query the revenue table in ../testdb.db, passing the database connection object and the query below in the pd.read_sql() function. Print the DataFrame.
import pandas as pd
import sqlite3

db_fname = '../testdb.db'

query = 'SELECT * FROM revenue'

# database connection object
conn = sqlite3.connect(db_fname)

df = pd.read_sql(query, conn)

conn.close()   # close the database connection when done

print(df)
 
Ex. 8.7 DataFrame from JSON. Read the JSON file organized as a list of dicts, as well as one organized as a dict of dicts, into a DataFrame using the .read_json() function.
import pandas as pd

# list of dicts
lod_filename = '../weather_newyork_lod.json'

# dict of dicts
dod_filename = '../weather_newyork_dod.json'


df = pd.read_json(lod_filename, orient='records')
print(df)

df = pd.read_json(dod_filename, orient='index')
print(df)
 
Ex. 8.8 DataFrame from embedded JSON. Look at the typical output of an Alphavantage API query, which contains a dict of dicts "embedded" within a larger JSON struct (this is saved in a local file called alphavantage_5min.json). "Drill down" to the dict of dicts by using a subscript, and assign the dod by itself to a variable. Then read the dict of dicts object into a DataFrame using the pd.DataFrame.from_dict() function, with orient='index'.
import pandas as pd
import json


fname = '../alphavantage_5min.json'

# use a with block so the file is closed automatically
with open(fname) as fh:
    obj = json.load(fh)

dod = obj['Time Series (5min)']

df = pd.DataFrame.from_dict(dod, orient='index')

print(df)
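
Note that the values read from this JSON arrive as strings (dtype object). A follow-up sketch, not part of the assigned exercise, converting the values to floats and the index to pandas Timestamps:

# sketch: cast string values to float; parse index strings as dates
df = df.astype(float)
df.index = pd.to_datetime(df.index)

print(df.dtypes)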
 
Ex. 8.9 DataFrame from HTML table. Read the HTML table into a DataFrame using the .read_html() function.

The function returns a list of DataFrames, one for each table found within the HTML page. Subscript this list to get the 1st (and only) table. For a bonus, when reading the DataFrame, set the 'EST' column to the index and use the parse_dates=True parameter to convert the dates in the index to date-type objects (pandas Timestamp). Note that .read_html() requires an HTML parsing library such as lxml to be installed.

import pandas as pd

fname = '../weather_newyork.html'

df_list = pd.read_html(fname, index_col='EST', parse_dates=True)

df = df_list[0]

print(df)
 

QUICK APPRAISAL OF DATAFRAME

Ex. 8.10 Get an overview of the DataFrame:

  • Print df.shape to see the number of columns and rows
  • Print the DataFrame and look at the stats listed below it

import pandas as pd

fname = '../weather_newyork_narrow.csv'

df = pd.read_csv(fname)

print(df.shape)

df
 
Ex. 8.11 See the head and tail of a DataFrame:

  • Use .head() and .tail()
  • Use .columns and .index

import pandas as pd

fname = '../weather_newyork_narrow.csv'

df = pd.read_csv(fname)

print(df.head())
print()

print(df.tail())
print()

print(df.columns)
print()

print(df.index)
print()
 
Ex. 8.12 Get another overview on a larger DataFrame:

  • Look at .dtypes
  • Look at .info()

import pandas as pd

fname = '../weather_newyork_narrow.csv'

df = pd.read_csv(fname)

print(df)
print()

print(df.dtypes)
print()

print(df.info())
 

DATAFRAME SERIES

Ex. 8.13 View a DataFrame column. View the 'company' column using subscript syntax (df['company']), and the same column using attribute syntax (df.company). Print the type of this object, as well as the object itself.
import pandas as pd

fname = '../revenue_extended.csv'  # has extra headers, id field

df = pd.read_csv(fname, skiprows=2,
                        skipfooter=2,
                        index_col='id',
                        engine='python')

print(df['company'])
print()

print(df.company)
 
Ex. 8.14 View a DataFrame row. View the 'hd34' row using .loc['hd34'], and the same row using .iloc[0]. Print the object.
import pandas as pd

fname = '../revenue_extended.csv'  # has extra headers, id field

df = pd.read_csv(fname, skiprows=2,
                        skipfooter=2,
                        index_col='id',
                        engine='python')

print(df.loc['hd34'])
print()

print(df.iloc[0])
 
Ex. 8.15 View Series .name and .index. First assign the 'company' column Series to a variable. Then read these attributes from the Series.
import pandas as pd

fname = '../revenue_extended.csv'  # has extra headers, id field

df = pd.read_csv(fname, skiprows=2,
                        skipfooter=2,
                        index_col='id',
                        engine='python')

s = df.company

print(s.name)

print(s.index)
 

MANIPULATING THE INDEX

Ex. 8.16 Print the .index and .columns attributes on this DataFrame. Then reorder the rows to 'a', 'b', 'c' using .reindex().
import pandas as pd


data = [
    [1, 2.5, 'apple'],
    [2, 3.0, 'banana'],
    [3, 4.2, 'orange']
]

df = pd.DataFrame(data, columns=['num', 'float', 'text'],
                        index=['c', 'b', 'a'])


print(df.index)
print(df.columns)

df = df.reindex(['a', 'b', 'c'])

print(df)
 
Ex. 8.17 Reading from weather_newyork.csv, use .reindex() with the argument axis=1 to select the 'EST', 'Max TemperatureF' and 'Min TemperatureF' columns. (Question: what happens if you leave off the axis=1? Why would pandas respond that way?)
import pandas as pd

df = pd.read_csv('../weather_newyork.csv')


df = df.reindex(['EST', 'Max TemperatureF','Min TemperatureF'], axis=1)

print(df)
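
To answer the question above: axis=0 (the row index) is the default, so without axis=1 pandas looks for 'EST', 'Max TemperatureF' and 'Min TemperatureF' among the row labels, finds none, and returns an all-NaN DataFrame rather than raising an error. A quick sketch to confirm:

# sketch: without axis=1 the labels are treated as row labels;
# none exist in the index, so every value comes back as NaN
df2 = pd.read_csv('../weather_newyork.csv')
print(df2.reindex(['EST', 'Max TemperatureF', 'Min TemperatureF']))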
 
Ex. 8.18 Use .reindex() to reorder the columns (again, with axis=1) to 'float', 'num', and 'terxt' (a deliberate misspelling of 'text'). Note that the 'terxt' column springs into existence with empty values.
import pandas as pd

data = [
    [1, 2.5, 'apple'],
    [2, 3.0, 'banana'],
    [3, 4.2, 'orange']
]

df = pd.DataFrame(data, columns=['num', 'float', 'text'],
                        index=['c', 'b', 'a'])


df = df.reindex(['float', 'num', 'terxt'], axis=1)

print(df)
 
Ex. 8.19 Compare .set_index() to .reset_index().

Use .set_index() to set the 'id' column as the DataFrame's index. Print or display the DataFrame. Next, use .reset_index() to reset the index to the default (integers), moving the old index into a new column. Lastly, add the drop=True argument to discard the old index entirely, and note that it no longer appears in the result.

import pandas as pd

fname = '../revenue_extended.csv'

df = pd.read_csv(fname, skiprows=2,
                        skipfooter=2,
                        engine='python')


df = df.set_index('id')

print(df)
print()

df = df.reset_index(drop=True)

print(df)
 
Ex. 8.20 Using .rename(), rename column labels: 'EST' -> 'date', 'Max TemperatureF' -> 'max_temp' and 'Min TemperatureF' -> 'min_temp'.
import pandas as pd

df = pd.read_csv('../weather_newyork.csv')

dfs = df.reindex(['EST', 'Max TemperatureF', 'Min TemperatureF'], axis=1)


dfs = dfs.rename(columns={'EST': 'date', 'Max TemperatureF': 'max_temp',
                                         'Min TemperatureF': 'min_temp'})

print(dfs)
 

DATAFRAME SLICING

Ex. 8.21 Slice rows using .loc[] and a list of labels. Select just 'hd34', 'hp21' and 'tc77'.
import pandas as pd

fname = '../revenue_extended.csv'

df = pd.read_csv(fname, skiprows=2,
                        skipfooter=2,
                        index_col='id',
                        engine='python')


print(df.loc[['hd34', 'hp21', 'tc77']])
 
Ex. 8.22 Slice rows using .loc[] and a range of labels. Select 'ts06' through 'df8'.
import pandas as pd

fname = '../revenue_extended.csv'

df = pd.read_csv(fname, skiprows=2,
                        skipfooter=2,
                        index_col='id',
                        engine='python')


print(df.loc['ts06': 'df8'])
 
Ex. 8.23 Use .loc[] to slice columns and rows. Select all rows but just the 'company' and 'revenue' columns (use ':' for the row selector, i.e. the 1st argument to .loc[], and a list of column labels for the columns).
import pandas as pd

fname = '../revenue_extended.csv'

df = pd.read_csv(fname, skiprows=2,
                        skipfooter=2,
                        index_col='id',
                        engine='python')


print(df.loc[:, ['company', 'revenue']])
 
Ex. 8.24 Use .iloc[] indexing to select columns and rows. Select a 2x2 grid of the DataFrame using indices.
import pandas as pd

fname = '../weather_newyork.csv'

df = pd.read_csv(fname)


print(df.iloc[0:2, 0:2])
 

CONDITIONAL SLICING

Ex. 8.25 Slice based on single condition. Using .loc[] with a condition, look for all 'high' temps above 100, then try above 90.

Then, add a 2nd 'bucket' to .loc[] to select the 'high' and 'precip' columns.

import pandas as pd

fname = '../weather_newyork_narrow.csv'

df = pd.read_csv(fname)


print(df.loc[  df.high > 90, ['high', 'precip']  ])
 
Ex. 8.26 Slice based on dual condition. Select for a 'high' of > 90 and a 'precip' of > 0.5. Separate the tests with an & and put parentheses around each. (This will fail, and uncover an anomaly that we'll identify next.)
import pandas as pd

fname = '../weather_newyork_narrow.csv'

df = pd.read_csv(fname)

hitemps = df.loc[ (df.high > 90) & (df.precip > 0.5) ]
   # fails with this message:
   # TypeError: '>' not supported between instances of 'str' and 'float'
 
Ex. 8.27 Use .unique() to see all unique values for a Series. Apply this method to the 'precip' column to see all values this column may contain. See if you can identify the value that caused the numeric comparison to fail.
import pandas as pd

fname = '../weather_newyork_narrow.csv'

df = pd.read_csv(fname)


print(df.precip.unique())

Can you see the anomalous value that caused the '>' comparison to fail? We'll fix this issue after we've learned a few more pandas features.

 
Ex. 8.28 Use .isin() as a condition to .loc[]. Select states 'NY' and 'NJ' from revenue.csv -- the argument to .isin() should be a list of the state values.
import pandas as pd

df = pd.read_csv('../revenue.csv')


print(df.loc[  df.state.isin(['NY', 'NJ'])  ])
 
Ex. 8.29 Use a str method condition. Use the str.isdigit() method (accessed through the .str accessor) to filter rows with digits in the 'transactions' column. Following this, put a tilde (~) in front of the test to show those rows with transactions that are not all digits.
import pandas as pd

df = pd.read_csv('../revenue_extended.csv', skiprows=2,
                                            skipfooter=2,
                                            engine='python')

print(df.loc[  ~df.transactions.str.isdigit()  ])
 

"COPY ON WRITE" / "COPY OF A SLICE" WARNING

Ex. 8.30 Trigger a "Copy of a Slice" Warning.
Here we create a slice using a boolean subscript, then write to one of its columns:
import pandas as pd

# set the "Copy-on-Write" functionality to avoid
# "copy of a slice" warning and behavior
# pd.options.mode.copy_on_write = True


dfi = pd.DataFrame({'c1': [ 1,  2, 3,  4],
                    'c2': [16, 17, 18, 19],
                    'c3': [21, 22, 23, 24],
                    'c4': [26, 27, 28, 29] },
           index = ['r1', 'r2', 'r3', 'r4'])


dfis = dfi[ dfi.c1 > 2 ]
print(dfis)

                  #     c1  c2  c3  c4
                  # r3   3  18  23  28
                  # r4   4  19  24  29


dfis.c3 = dfis.c1 * dfis.c2


# alternative: using .loc[] indexing
# dfis.loc[ :, 'c3' ] = dfis.c1 * dfis.c2

The copy_on_write setting (commented out above so the behavior can be observed) forces pandas to create a new copy of the DataFrame any time we attempt to write to a slice; without it, writing to a slice usually triggers the tricky and confusing "copy of a slice" warning.
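
A minimal sketch of the difference, with the option enabled (assumes pandas 1.5+, where copy_on_write is available):

import pandas as pd

pd.options.mode.copy_on_write = True     # enable copy-on-write

dfi2 = pd.DataFrame({'c1': [1, 2, 3, 4],
                     'c2': [16, 17, 18, 19]},
           index = ['r1', 'r2', 'r3', 'r4'])

dfis2 = dfi2[ dfi2.c1 > 2 ]              # slice of dfi2

dfis2.c2 = dfis2.c1 * 2                  # no warning: write creates a copy

print(dfis2)
print(dfi2)                              # original DataFrame is untouched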

 

VECTORIZED OPERATIONS

Ex. 8.31 Use a subscript to add a new column 'tax', set to 0.04
import pandas as pd


df = pd.read_csv('../revenue.csv')


df['tax'] = 0.04

print(df)
 
Ex. 8.32 Vectorize column value. Add a new column 'tax amt', set to the 'revenue' value * 'tax' amount.
import pandas as pd

df = pd.read_csv('../revenue.csv')

df['tax'] = 0.04

df['tax amt'] = df.revenue * df.tax

print(df)
 
Ex. 8.33 Vectorizing with condition. Continuing the above solution, use .loc[] with a condition to set 'tax' to 0.0625 for all 'NJ' rows, then recalculate tax amount.
import pandas as pd

df = pd.read_csv('../revenue.csv')

df['tax'] = 0.04

df.loc[ df.state == 'NJ', 'tax' ] = 0.0625

df['tax amt'] = df.revenue * df.tax

print(df)
 
Ex. 8.34 .replace() method, or .loc[] assignment. Returning to the 'precip' column, replace the non-numeric value we identified previously with 0. Use a vectorized condition as in the previous exercise, or the Series .replace() method.
import pandas as pd

df = pd.read_csv('../weather_newyork_narrow.csv')


df = df.replace('T', 0)

# alt. solution (instead of above) using .loc[]
# df.loc[ df.precip == 'T', 'precip' ] = 0

print(df.precip.unique())

# once the value has been replaced,
# uncomment to try the comparison again
# hitemps = df.loc[ (df.high > 90) & (df.precip > 0.5) ]

After replacing the value, run .unique() to see that the anomalous value is no longer there. Then you may uncomment the .loc[] to see that it still doesn't work. Any idea why? We'll come back to this.
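
(A hint for the question above: even after the replacement, the 'precip' column's dtype is still object, so comparing it to the float 0.5 still raises a TypeError. A quick check:)

# sketch: the column is still object dtype after the replacement
print(df.precip.dtype)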

 
Ex. 8.35 Vectorizing with function. Use .apply() with a condition to create column 'revf' converting 'revenue' to a string with formatting.
import pandas as pd


def format(n):            # note: this name shadows the builtin format()
    return f'${n:,.2f}'   # makes 2.3 into '$2.30'


df = pd.read_csv('../revenue.csv')

df['revf'] = df.revenue.apply(format)

print(df)
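
The same result with an anonymous function, if you prefer not to define format() separately (a behaviorally identical sketch):

# sketch: apply the formatting with a lambda instead of a named function
df['revf'] = df.revenue.apply(lambda n: f'${n:,.2f}')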
 

'SUMMARY' FUNCTION ON SERIES

Ex. 8.36 Use the .sum() method to sum up the 'revenue' column.
import pandas as pd

df = pd.read_csv('../revenue.csv')


print(df.revenue.sum())
 
Ex. 8.37 Use the .mean() method on a single column. Calculate the average of the 'revenue' column.
import pandas as pd

df = pd.read_csv('../revenue.csv')


print(df.revenue.mean())
 
Ex. 8.38 Use the .mean() method on numeric columns. Now use .mean() on the DataFrame itself to calculate the average of the 'revenue' and 'transactions' columns. You must include the argument numeric_only=True, or the method will fail on the columns that contain strings.
import pandas as pd

df = pd.read_csv('../revenue.csv')


print(df.mean(numeric_only=True))
 
Ex. 8.39 Use summary function with condition. Show the average of the 'revenue' column, but only for those rows with 'NY' in the 'state' column.
import pandas as pd

df = pd.read_csv('../revenue.csv')


print(  df.loc[  df.state == 'NY' ].revenue.mean())
 

WORKING WITH DTYPES

Ex. 8.40 Set the dtype for a column. First, use .dtypes to show the column types. You may use .unique() to determine the range of values for the 'transactions' column, which explains why that column is type object.

Next, replace the '-' values in the 'transactions' column with 0's, then again check the .dtypes (and use .unique() if you wish). Finally, use .astype() to set the column dtype to 'int', then check the .dtypes again.

import pandas as pd

df = pd.read_csv('../revenue_extended.csv', skiprows=2,
                                            skipfooter=2,
                                            engine='python')

print(df.dtypes)


print(df.transactions.unique())

df.transactions = df.transactions.replace('-', 0)

df.transactions = df.transactions.astype(int)

print(df.dtypes)
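
An alternative worth knowing (a sketch, not the assigned solution): pd.to_numeric() with errors='coerce' turns any unparseable value into NaN, which can then be filled and cast. This assumes a freshly read DataFrame:

# sketch: coerce non-numeric values to NaN, then fill with 0 and cast
df2 = pd.read_csv('../revenue_extended.csv', skiprows=2,
                                             skipfooter=2,
                                             engine='python')
df2.transactions = pd.to_numeric(df2.transactions, errors='coerce')
df2.transactions = df2.transactions.fillna(0).astype(int)

print(df2.dtypes)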
 
Ex. 8.41 Use pd.to_numeric(). Change the 'a' and 'b' columns to a numeric type, and show dtypes afterwards.
import pandas as pd

dol = {  'a': ['1', '2', '3'],
         'b': ['2.9', '3.5', '4.9'],
         'c': ['yourstr', 'mystr', 'theirstr']  }

df = pd.DataFrame(dol, index=['w', 'x', 'y'])


print(df.dtypes)


df.a = pd.to_numeric(df.a)
df.b = pd.to_numeric(df.b)

print(df.dtypes)
 
Ex. 8.42 Solve the earlier problem. Now that we can 'cast' or change dtypes, change the .precip column to numeric so we can complete the compound test.
import pandas as pd

fname = '../weather_newyork_narrow.csv'

df = pd.read_csv(fname)

df.precip = df.precip.replace('T', 0)


df.precip = df.precip.astype(float)

hitemps = df.loc[ (df.high > 90) & (df.precip > 0.5) ]

print(hitemps)
 

SORTING

Ex. 8.43 Read this csv, set the 'date' column as the index, and sort by index with ascending=False.
import pandas as pd

fname = '../weather_newyork_narrow.csv'

df = pd.read_csv(fname, index_col='date')

dfs = df.sort_index(ascending=False)

dfs
 
Ex. 8.44 Use .sort_values() to sort the DataFrame by the 'high' value. Use ascending=False to reverse the sort.
import pandas as pd

fname = '../weather_newyork_narrow.csv'

df = pd.read_csv(fname, index_col='date')

dfs = df.sort_values('high', ascending=False)

dfs
 
Ex. 8.45 Use axis=1 with .sort_index() to sort the DataFrame by its column labels.
import pandas as pd

fname = '../revenue.csv'

df = pd.read_csv(fname)

dfs = df.sort_index(axis=1)

dfs
 

CONCATENATING / APPENDING

Ex. 8.46 First view, then concatenate the two DataFrames vertically.
import pandas as pd

df = pd.DataFrame({ 'a': [1, 2, 3],
                    'b': [2.9, 3.5, 4.9],
                    'c': ['yourstr', 'mystr', 'theirstr']  },
                    index=['r1', 'r2', 'r3'])

df2 = pd.DataFrame({ 'b': [4, 5, 6],
                     'c': [7.8, 8.2, 9.1],
                     'd': ['thingy', 'thatty', 'flunky']  },
                     index=['r2', 'r3', 'r4'])

print(df)
print()
print(df2)

df3 = pd.concat([df, df2])

df3

Note that the indices are repeated; this can be changed using index manipulation (.rename(), .reindex(), .set_index()), or as shown below.
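
If a fresh default index is acceptable, pd.concat() can renumber the rows directly with its ignore_index= parameter:

# discard both original indices and renumber rows 0..n-1
df4 = pd.concat([df, df2], ignore_index=True)

df4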

 
Ex. 8.47 First view, then concatenate the two DataFrames horizontally.
import pandas as pd

df = pd.DataFrame({ 'a': [1, 2, 3],
                    'b': [2.9, 3.5, 4.9],
                    'c': ['yourstr', 'mystr', 'theirstr']  },
                    index=['r1', 'r2', 'r3'])

df2 = pd.DataFrame({ 'd': ['xylo', 'yellow', 'zello'],
                     'e': [0, 0, 0],
                     'f': [1.1, 1.1, 1.2]  },
                     index=['r1', 'r2', 'r3'])

print(df)
print()
print(df2)
print()

df3 = pd.concat([df, df2], axis=1)

df3
 

MERGE / JOIN

Ex. 8.48 View the two below DataFrames. Then use pd.merge() to join the student_db_grades.csv and student_db_names.csv files into a single DataFrame, joining on the id (use on='id'). Compare how='outer' (19 rows, including missing data at the end) to how='inner' (only 15 rows, with ids common to both tables).
import pandas as pd

sgrades = pd.read_csv('../student_db_grades.csv', sep=':')
snames = pd.read_csv('../student_db_names.csv')

print(sgrades.head(2))
print('===')
print(snames.head(2))
print('===')
sm = pd.merge(sgrades, snames, how='inner', on='id')

sm
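
To see the other side of the comparison, only the how= argument changes:

# outer join: keep ids found in either table; missing values become NaN
smo = pd.merge(sgrades, snames, how='outer', on='id')

smo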
 
Ex. 8.49 Reading the same files but setting the 'id' column as the index of each DataFrame, use df.join() (which performs the same kind of join as pd.merge(), but on the index by default) to perform the above join.
import pandas as pd

sgrades = pd.read_csv('../student_db_grades.csv', sep=':', index_col='id')
snames = pd.read_csv('../student_db_names.csv', index_col='id')

sj = sgrades.join(snames)

sj
 

AGGREGATIONS WITH df.groupby() AND pd.pivot_table()

Please run this cell to view the sales-funnel spreadsheet:

import pandas as pd

df = pd.read_excel("../sales-funnel.xlsx")

df
Ex. 8.50 Single-column aggregation (.groupby() and .pivot_table()). Show the sum of 'SaleAmount' grouped by 'SalesRep'.
import pandas as pd

df = pd.read_excel("../sales-funnel.xlsx")

dfgb = df.groupby('SalesRep').SaleAmount.sum()

dfgb


dfpt = pd.pivot_table(df, index='SalesRep',
                          values='SaleAmount',
                          aggfunc='sum')

dfpt
 
Ex. 8.51 Multi-column aggregation (.groupby() and .pivot_table()). Show sums of 'SaleAmount' and 'Quantity' grouped by 'SalesRep' and by 'Product'.
import pandas as pd

df = pd.read_excel("../sales-funnel.xlsx")

dfgb = df.groupby(['SalesRep', 'Product'])[['SaleAmount', 'Quantity']].sum()

dfgb


dfpt = pd.pivot_table(df, index=['SalesRep', 'Product'],
                          values=['SaleAmount', 'Quantity'], aggfunc='sum')

dfpt
 
Ex. 8.52 Further agg by secondary value (.groupby() and .pivot_table()). Perform the above pivot on just 'SalesRep', but use columns= to show the results broken down by 'Product'.
import pandas as pd
import numpy as np

df = pd.read_excel("../sales-funnel.xlsx")

dfpt = pd.pivot_table(df, index='SalesRep',
                          aggfunc='sum',
                          values=['SaleAmount', 'Quantity'],
                          columns='Product')

dfpt

dfgb = df.groupby(['SalesRep', 'Product'])[['SaleAmount', 'Quantity']].sum()

dfgb
 
Ex. 8.53 Use .stack() and .unstack() (with .groupby()). Looking at the above example, note that .groupby() produces results in "long" format, while .pivot_table() renders them in "wide" format. Use .unstack() and .stack() to move between the two.
import pandas as pd
import numpy as np

df = pd.read_excel("../sales-funnel.xlsx")

dfgb = df.groupby(['SalesRep', 'Product'])[['SaleAmount', 'Quantity']].sum()

dfgb

dfgbus = dfgb.unstack()

dfgbus

dfgbss = dfgbus.stack()

dfgbss
 

MULTIINDEX

Please begin by noting the multi-level index that is produced by the below multi-column aggregation.

import pandas as pd
import numpy as np

df = pd.read_excel("../sales-funnel.xlsx")

table = pd.pivot_table(df, index=['Manager', 'SalesRep'],
                           aggfunc='sum')

table
Ex. 8.54 MultiIndex: slicing at top level. Slice the multi-index to show just 'Account' and 'Quantity', and just for Manager 'Debra Henley'.
import pandas as pd
import numpy as np

df = pd.read_excel("../sales-funnel.xlsx")

table = pd.pivot_table(df, index=['Manager', 'SalesRep'],
                           aggfunc='sum')

table.loc['Debra Henley', ['Account', 'Quantity']]
 
Ex. 8.55 MultiIndex: slicing at 2nd level. Use a 2-item tuple (first-level label, second-level label) to slice for just 'Debra Henley' at the first level and 'Craig Booker' at the 2nd level.
import pandas as pd
import numpy as np

df = pd.read_excel("../sales-funnel.xlsx")

table = pd.pivot_table(df, index=['Manager', 'SalesRep'],
                           aggfunc='sum')

tl = table.loc[('Debra Henley', 'Craig Booker')]

tl
 
Ex. 8.56 MultiIndex: "cross section" slicing. Use the .xs() method with level=1 to show rows just for 'Craig Booker'.
import pandas as pd
import numpy as np

df = pd.read_excel("../sales-funnel.xlsx")

table = pd.pivot_table(df, index=['Manager', 'SalesRep'],
                           aggfunc='sum')

txs = table.xs('Craig Booker', level=1)

txs
 
Ex. 8.57 MultiIndex: use .query('Manager == "Debra Henley"') to select rows.
import pandas as pd
import numpy as np

df = pd.read_excel("../sales-funnel.xlsx")

table = pd.pivot_table(df, index=['Manager', 'SalesRep'],
                           aggfunc='sum')

table.query("Manager == 'Debra Henley'")

# table.query("Manager == 'Debra Henley' and SalesRep == 'Craig Booker'")

#table.query("SalesRep in ['Craig Booker', 'Daniel Hilton']")
 
Ex. 8.58 MultiIndex: First review the below DataFrame; then use .query('Status in ["pending", "won"]') to select rows.
import pandas as pd

df = pd.read_excel("../sales-funnel.xlsx")

dfq = df.query('Status in ["pending", "won"]')

dfq
 
Ex. 8.59 MultiIndex: use .loc[] with a 1- or 2-item tuple to select rows. Keep in mind that a 1-item tuple must have a comma after the item, for example ('Fred Anderson',)
import pandas as pd
import numpy as np

df = pd.read_excel("../sales-funnel.xlsx")

table = pd.pivot_table(df, index=['Manager', 'SalesRep'],
                           aggfunc='sum')

table.loc[('Fred Anderson', 'Wendy Yule')]
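
A 1-item tuple (note the trailing comma) selects every row under one first-level label:

# 1-item tuple: all SalesReps under this Manager
table.loc[('Fred Anderson',)]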
 