pandas is a Python module used for manipulation and analysis of tabular data. It offers:

* Excel-like numeric calculations, particularly column-wise and row-wise calculations (vectorization)
* SQL-like merging, grouping and aggregating
* visualizing (line chart, bar chart, etc.)
* emphasis on:
  - aligning data from multiple sources
  - "slicing and dicing" by rows and columns
  - concatenating and joining
  - cleaning and normalizing missing or incorrect data
  - working with time series
  - categorizing
* ability to read and write CSV, XML, Excel, database queries, etc.

numpy is the data analysis library upon which pandas is built. We sometimes make direct calls to numpy -- to some of its variables (such as np.nan), variable-generating functions (such as np.arange or np.linspace) and some processing functions.
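As a taste of the features listed above, here is a minimal sketch (using a small, invented sales table) of an Excel-like vectorized column calculation and a SQL-like groupby aggregation:

import pandas as pd

# invented sample data, just for illustration
df = pd.DataFrame({'region': ['East', 'West', 'East', 'West'],
                   'units':  [10, 20, 30, 40],
                   'price':  [1.5, 2.0, 2.5, 3.0]})

# column-wise ("vectorized") calculation: no explicit loop
df['revenue'] = df['units'] * df['price']

# SQL-like grouping and aggregating
print(df.groupby('region')['revenue'].sum())
# region
# East     90.0
# West    160.0
# Name: revenue, dtype: float64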
Use the docs for an ongoing study of pandas' rich feature set.
full docs (HTML, pdf)
http://pandas.pydata.org/pandas-docs/stable
"10 minutes to pandas"
https://pandas.pydata.org/pandas-docs/stable/10min.html
pandas cookbook
http://pandas.pydata.org/pandas-docs/stable/cookbook.html
matplotlib official documentation
http://matplotlib.org/api/pyplot_api.html
Online resources are many; when you find one you like, stick with it.
Python for Data Analysis (Wes McKinney, O'Reilly)
http://astronomi.erciyes.edu.tr/wp-content/uploads/astronom/pdf/OReilly%20Python%20for%20Data%20Analysis.pdf
(If the above link goes stale, simply search Python for Data Analysis pdf.)
The Second Edition is available from O'Reilly on Safari Bookshelf:
http://shop.oreilly.com/product/0636920050896.do
Please keep in mind that pandas is in active development, which means that features may be added, removed and changed (latest version at this writing: 0.25.2).
Tom Augspurger blog (6-part series)
http://tomaugspurger.github.io/modern-1.html
Greg Reda blog (3-part series)
http://gregreda.com/2013/10/26/intro-to-pandas-data-structures/
pandas cheat sheet (DataCamp)
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PandasPythonForDataScience.pdf
An object type that is new to us can be explored through attribute inspection -- we can list the object's attributes with dir() and see brief documentation on an attribute with help().
import pandas as pd
# list of pandas attributes (global vars)
print(dir(pd))
# list of pandas functions (filtering for <class 'function'> only)
import types
print([ attr for attr in dir(pd) if isinstance(getattr(pd, attr), types.FunctionType) ])
# short doc on read_csv() function
help(pd.read_csv) # help on the read_csv function of pandas
# list of Series attributes
s1 = pd.Series()
print(dir(s1))
# list of DataFrame attributes
df = pd.DataFrame()
print(dir(df))
# list of DataFrame methods (filtering for <class 'method'> only)
print([ attr for attr in dir(df)
if isinstance(getattr(df, attr), types.MethodType) ])
# short doc on DataFrame join() method
help(df.join) # help on the join() method of a DataFrame
DataFrame: rows and columns; Series: a single column or single row; Index: column or row labels.
DataFrame:

* is the core pandas structure -- a 2-dimensional array / list of lists
* is like an Excel spreadsheet -- rows and columns with row and column labels
* is like a "dict of dicts" in that columns and rows can be indexed by label
* is like a "list of lists" in that columns and rows can be indexed by integer index
* offers "vectorized" operations (sum rows or columns, modify values across rows, etc.)
* offers database-like and Excel-like manipulations (merge, groupby, pivot table, etc.)
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3],
'b': [10, 20, 30],
'c': [100, 200, 300]},
index=['r1', 'r2', 'r3'])
print(df)
# a b c
# r1 1 10 100
# r2 2 20 200
# r3 3 30 300
print(df['c']['r2']) # 200
Series:

* a sequence of items
* items addressable by integer index ("list-like")
* items addressable by index label ("dict-like")
* has a dtype attribute that holds its items' common type
# read a column as a Series (use DataFrame subscript)
bcol = df['b']
print(bcol)
# r1 10
# r2 20
# r3 30
# Name: b, dtype: int64
# read a row as a Series (use subscript of df.loc[])
oneidx = df.loc['r2']
print(oneidx)
# a 2
# b 20
# c 200
# Name: r2, dtype: int64
# create a Series from scratch
s1 = pd.Series([1, 3, 5, 7, 9], index=['a', 'b', 'c', 'd', 'e'])
print(s1)
# a 1
# b 3
# c 5
# d 7
# e 9
# dtype: int64
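To illustrate the "list-like" and "dict-like" access described above, a quick sketch using s1 (positional access is done explicitly through .iloc[] to avoid ambiguity with labels):

# dict-like: access by index label
print(s1['c'])        # 5

# list-like: access by integer position
print(s1.iloc[2])     # 5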
Index:

* an object that provides indexing for both the Series (its item index) and the DataFrame (its column or row index)
* is also list-like
columns = df.columns # Index(['a', 'b', 'c'], dtype='object')
idx = df.index # Index(['r1', 'r2', 'r3'], dtype='object')
Like a list, but with item labels... so like a dict too.
Initialize a single series
import pandas as pd
s1 = pd.Series([10, 20, 30], index=['r1', 'r2', 'r3'], name='a')
print(s1)
# r1 10
# r2 20
# r3 30
# Name: a, dtype: int64
s2 = pd.Series(['x', 'y', 'z'], index=['r1', 'r2', 'r3'], name='b')
# combine Series to make a DataFrame
df = pd.concat([s1, s2], axis=1)   # columns 'a' and 'b', index ['r1', 'r2', 'r3']
A Series is a pandas object representing a column or row in a DataFrame.
Every DataFrame column or row is a Series:
df = pd.DataFrame( {'a': [1, 2],
'b': [8, 9] },
index=['r1', 'r2'] )
print(df)
# a b
# r1 1 8
# r2 2 9
# DataFrame string subscript accesses a column
print(df['a']) # r1 1
# r2 2
# Name: a, dtype: int64
print(type(df['a'])) # <class 'pandas.core.series.Series'>
# DataFrame .loc[] indexer accesses the rows
print(df.loc['r1']) # a 1
# b 8
# Name: r1, dtype: int64
print(type(df.loc['r1'])) # <class 'pandas.core.series.Series'>
A Series can also be initialized on its own:
s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([5, 6, 7, 8])
We can combine Series into DataFrames:
df = pd.DataFrame([s1, s2]) # add Series as rows
df = pd.concat([s1, s2], axis=1) # add Series as columns
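The two approaches produce differently shaped DataFrames; a quick sketch of what each returns for the s1 and s2 above:

print(pd.DataFrame([s1, s2]))       # Series as rows: 2 rows x 4 columns
#    0  1  2  3
# 0  1  2  3  4
# 1  5  6  7  8

print(pd.concat([s1, s2], axis=1))  # Series as columns: 4 rows x 2 columns
#    0  1
# 0  1  5
# 1  2  6
# 2  3  7
# 3  4  8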
The DataFrame is the pandas workhorse structure. It is a 2-dimensional structure with columns and rows (i.e., a lot like a spreadsheet).
Rows and columns can be accessed by index labels (dict-like) or by integer indices (list-like).
Initializing
import pandas as pd
# initialize a new, empty DataFrame
df = pd.DataFrame()
# init with dict of lists (keyed to columns) and index
df = pd.DataFrame( {'a': [1, 2, 3],
'b': [1.0, 1.5, 2.0],
'c': ['a', 'b', 'c'] },
index=['r1', 'r2', 'r3'] )
print(df)
# a b c
# r1 1 1.0 a
# r2 2 1.5 b
# r3 3 2.0 c
previewing the DataFrame
print(len(df)) # 3 (# of rows)
print(df.head(2)) # 1st 2 rows
print(df.tail(2)) # last 2 rows
column attribute / subscripting: delivers a Series
sa = df.a # or df['a']
print(sa) # pandas.Series([1, 2, 3], index=['r1', 'r2', 'r3'], name='a')
label and positional access
print(df.loc['r2', 'b']) # 1.5
print(df.iloc[1, 1]) # 1.5
.columns and .index attributes
Columns and rows can be accessed through the DataFrame's attributes:
print(df.columns) # Index(['a', 'b', 'c'], dtype='object')
print(df.index) # Index(['r1', 'r2', 'r3'], dtype='object')
DataFrame as a list of Series objects
Again, any DataFrame's columns or rows can be sliced out as a Series:
# read a column as a Series (use DataFrame subscript)
bcol = df['b']
print(bcol)
# r1 1.0
# r2 1.5
# r3 2.0
# Name: b, dtype: float64
# read a row as a Series (use subscript of df.loc[])
oneidx = df.loc['r2']
print(oneidx)
# a 2
# b 1.5
# c b
# Name: r2, dtype: object
Note: df is a common variable name for pandas DataFrame objects; you will see this name used frequently in these examples.
A DataFrame can be indexed like a list and subscripted like a dictionary.
import pandas as pd
import numpy as np
# initialize a new, empty DataFrame
df = pd.DataFrame()
# initialize a DataFrame with sample data
df = pd.DataFrame( {'a': [1, 2, 3, 4],
'b': [1.0, 1.5, 2.0, 2.5],
'c': ['a', 'b', 'c', 'd'] }, index=['r1', 'r2', 'r3', 'r4'] )
print(df)
# a DataFrame, printed
# a b c
# r1 1 1.0 a
# r2 2 1.5 b
# r3 3 2.0 c
# r4 4 2.5 d
DataFrame subscript: column Series
s = df['a']
print(s) # r1 1
# r2 2
# r3 3
# r4 4
# Name: a, dtype: int64
print(type(s)) # Series
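The "indexed like a list" half of the statement above uses the .iloc[] indexer (covered in more detail later); a quick sketch on the same df:

# list-like access by integer position
print(df.iloc[0])      # the first row, as a Series
# a      1
# b    1.0
# c      a
# Name: r1, dtype: object

print(df.iloc[0, 2])   # 'a'  (first row, third column)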
An Index object is used to specify a DataFrame's columns or index, or a Series index.
Columns and Indices
A DataFrame makes use of two Index objects: one to represent the columns, and one to represent the rows.
df = pd.DataFrame( {'a': [1, 2, 3, 4],
'b': [1.0, 1.5, 2.0, 2.5],
'c': ['a', 'b', 'c', 'd'],
'd': [100, 200, 300, 400] },
index=['r1', 'r2', 'r3', 'r4'] )
print(df)
# a b c d
# r1 1 1.0 a 100
# r2 2 1.5 b 200
# r3 3 2.0 c 300
# r4 4 2.5 d 400
print(df.index)    # Index(['r1', 'r2', 'r3', 'r4'], dtype='object')
print(df.columns)  # Index(['a', 'b', 'c', 'd'], dtype='object')
# set name for index and columns
df.index.name = 'year'
df.columns.name = 'state'
s_index = s1.index # Index(['r1', 'r2', 'r3'])
columns = df.columns # Index(['a', 'b', 'c'], dtype='object')
idx = df.index # Index(['r1', 'r2', 'r3'], dtype='object')
.rename() method: columns or index labels can be reset using this DataFrame method.
df = df.rename(columns={'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'},
index={'r1': 'R1', 'r2': 'R2', 'r3': 'R3', 'r4': 'R4'})
print(df)
# A B C D
# R1 1 1.0 a 100
# R2 2 1.5 b 200
# R3 3 2.0 c 300
# R4 4 2.5 d 400
.columns, .index: the columns or index can also be set directly using the DataFrame's attributes (this would have the same effect as above):
df.columns = ['A', 'B', 'C', 'D']
df.index = ['r1', 'r2', 'r3', 'r4']
we can set names for index and column indices:
df.index.name = 'year'
df.columns.name = 'state'
There are a number of "exotic" Index object types:

* Index (the standard, default and most common Index type)
* RangeIndex (index built from an integer range)
* Int64Index, UInt64Index, Float64Index (index values of specific types)
* DatetimeIndex, TimedeltaIndex, PeriodIndex, IntervalIndex (datetime-related indices)
* CategoricalIndex (index related to the Categorical type)
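For instance, a minimal sketch creating two of these directly (a RangeIndex and a DatetimeIndex) and using them as Series indices:

ri = pd.RangeIndex(start=0, stop=4)           # RangeIndex(start=0, stop=4, step=1)
di = pd.date_range('2017-01-01', periods=4)   # DatetimeIndex of four daily timestamps

print(pd.Series([10, 20, 30, 40], index=ri))
print(pd.Series([10, 20, 30, 40], index=di))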
An Index can be set with a column or other sequence.
Sometimes data read with pd.read_excel() includes the index labels in its first column. We can easily set the index with .set_index():
print(df)
# 0 a b c d
# 0 r1 1 1.0 a 2016-11-01
# 1 r2 2 1.5 b 2016-12-01
# 2 r3 3 2.0 c 2017-01-01
# 3 r4 4 2.5 d 2018-02-01
df = df.set_index(df[0])        # use the values in column 0 as the new index
df = df[['a', 'b', 'c', 'd']]   # keep only the data columns (drops column 0)
print(df)
# a b c d
# 0
# r1 1 1.0 a 2016-11-01
# r2 2 1.5 b 2016-12-01
# r3 3 2.0 c 2017-01-01
# r4 4 2.5 d 2018-02-01
We can reset the index with .reset_index(), although this makes the index into a new column.
df2 = df.reset_index()
print(df2)
# 0 a b c d
#
# 0 r1 1 1.0 a 2016-11-01
# 1 r2 2 1.5 b 2016-12-01
# 2 r3 3 2.0 c 2017-01-01
# 3 r4 4 2.5 d 2018-02-01
Or, to discard the old index entirely when resetting, we can use drop=True:
df3 = df.reset_index(drop=True)
print(df3)
#    a    b  c           d
# 0  1  1.0  a  2016-11-01
# 1  2  1.5  b  2016-12-01
# 2  3  2.0  c  2017-01-01
# 3  4  2.5  d  2018-02-01
We can also sort the DataFrame by index using .sort_index():
df4 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]},
index=['Cello', 'Alpha', 'Bow'])
df5 = df4.sort_index()
print(df5)
# a b
# Alpha 2 5
# Bow 3 6
# Cello 1 4
The default is to sort by the row index; axis=1 sorts the columns by their labels instead, and ascending=False reverses the sort order:
df6 = df5.sort_index(axis=1, ascending=False)
print(df6)
# b a
# Alpha 5 2
# Bow 6 3
# Cello 4 1
Note that .sort_values() offers similar options for sorting rows by the values in a specified column (or columns by the values in a specified row).
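A quick sketch of .sort_values() on the same df4, sorting the rows by the values in column 'b' (df7 is just a new variable name for the result):

df7 = df4.sort_values(by='b', ascending=False)
print(df7)
#        a  b
# Bow    3  6
# Alpha  2  5
# Cello  1  4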
.reindex(): we can change the order of the indices and thus the rows.
df5 = df.reindex(df.index[::-1])             # reverse the row order
df5 = df5.reindex(columns=df.columns[::-1])  # reverse the column order
print(df5)
# state D C B A
# year
# R4 400 d 2.5 4
# R3 300 c 2.0 3
# R2 200 b 1.5 2
# R1 100 a 1.0 1
DataFrames behave as you might expect when converted to any Python container
df = pd.DataFrame( {'a': [1, 2, 3, 4],
'b': [1.0, 1.5, 2.0, 2.5],
'c': ['a', 'b', 'b', 'a'] }, index=['r1', 'r2', 'r3', 'r4'] )
print(len(df)) # 4
print(len(df.columns)) # 3
print(max(df['a'])) # 4
print(list(df['a'])) # [1, 2, 3, 4] (column for 'a')
print(list(df.loc['r2'])) # [2, 1.5, 'b'] (row for 'r2')
print(set(df['c'])) # {'b', 'a'} (a set of unique values)
DataFrame .values -- convert to a numpy array
A numpy array is a list-like object; iterating over a 2-dimensional array yields its rows. A simple list comprehension can convert these rows to a list of lists:
print(df.values)
# array([[1, 1.0, 'a'],
# [2, 1.5, 'b'],
# [3, 2.0, 'b'],
# [4, 2.5, 'a']], dtype=object)
lol = [ list(item) for item in df.values ]
print(lol)
# [ [1, 1.0, 'a'],
# [2, 1.5, 'b'],
# [3, 2.0, 'b'],
# [4, 2.5, 'a'] ]
looping - loops through columns
for colname in df:
print('{}: {}'.format(colname, type(df[colname])))
# a: <class 'pandas.core.series.Series'>
# b: <class 'pandas.core.series.Series'>
# c: <class 'pandas.core.series.Series'>
# looping with iterrows -- loops through rows
for row_index, row_series in df.iterrows():
print('{}: {}'.format(row_index, type(row_series)))
# r1: <class 'pandas.core.series.Series'>
# r2: <class 'pandas.core.series.Series'>
# r3: <class 'pandas.core.series.Series'>
# r4: <class 'pandas.core.series.Series'>
Keep in mind, though, that we generally prefer vectorized operations across columns or rows to looping (discussed later); a brief preview follows.
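As that preview, a minimal sketch of two vectorized operations on the same df, each replacing an explicit loop:

# vectorized: operate on a whole column at once, no explicit loop
print(df['a'].sum())    # 10
print(df['a'] * 2)      # a new Series: r1 2, r2 4, r3 6, r4 8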
A DataFrame can be read from and written to CSV, Excel, JSON, relational databases, the clipboard and other formats.
CSV
# read from file
df = pd.read_csv('quarterly_revenue_2017Q4.csv')
# write to file
wfh = open('output.csv', 'w')
df.to_csv(wfh, na_rep='NULL')
# reading from Fama-French file (the abbreviated file, no header)
# sep= indicates the delimiter on which to split the fields (here, a whitespace regex)
# names= supplies the column names
df = pd.read_csv('FF_abbreviated.txt', sep=r'\s+',
                 names=['date', 'MktRF', 'SMB', 'HML', 'RF'])
# reading from Fama-French non-abbreviated (the main file including headers and footers)
# skiprows=5: start reading 5 rows down
df = pd.read_csv('F-F_Research_Data_Factors_daily.txt', skiprows=5, sep=r'\s+',
                 names=['date', 'MktRF', 'SMB', 'HML', 'RF'])
df.to_csv('newfile.csv')
Excel
# reading from excel file to DataFrame
df = pd.read_excel('revenue.xlsx', sheet_name='Sheet1')
# optional: produce a 'reader' object used to obtain sheet names, etc.
xls_file = pd.ExcelFile('data.xls') # produce a file 'reader' object
df = xls_file.parse('Sheet1') # parse a selected sheet
# write to excel
df.to_excel('data2.xls', sheet_name='Sheet1')
JSON
# sample df for demo purposes
df = pd.DataFrame( {'a': [1, 2, 3, 4],
'b': [1.0, 1.5, 2.0, 2.5],
'c': ['a', 'b', 'c', 'd'] }, index=['r1', 'r2', 'r3', 'r4'] )
# write DataFrame to JSON
df.to_json('df.json')
# read it back into a new DataFrame
new_df = pd.read_json('df.json')
Relational Database
import sqlite3 # file-based database format
conn = sqlite3.connect('example.db') # a db connection object
df = pd.read_sql('SELECT this FROM that', conn)
The above can be used with any database connection (MySQL, Oracle, etc.)
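Writing a DataFrame back to a database table works similarly; a minimal sketch using the same connection and an arbitrary table name (if_exists= controls replace/append behavior):

df.to_sql('that', conn, if_exists='replace')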
From Clipboard: this option is excellent for cutting and pasting data from websites
df = pd.read_clipboard(skiprows=5, sep=r'\s+',
                       names=['date', 'MktRF', 'SMB', 'HML', 'RF'])
pandas infers a column type based on values and applies it to the column automatically.
pandas is built on top of numpy, a numeric processing module compiled in C for efficiency. Unlike core Python containers (but similar to a database table), numpy cares about object type. Wherever possible, numpy will assign a type to a column of values and attempt to maintain the type's integrity. This is done for the same reason it is done with database tables: speed and space efficiency. In the DataFrame below, numpy/pandas "sniffs out" the type of each column Series and sets the type most appropriate to its values.
import pandas as pd
df = pd.DataFrame( {'a': [1, 2, 3, 4],
'b': [1.0, 1.5, 2.0, 2.5],
'c': ['a', 'b', 'c', 'd'],
'd': ['2016-11-01', '2016-12-01', '2017-01-01', '2018-02-01'] },
index=['r1', 'r2', 'r3', 'r4'] )
print(df)
# a b c d
# r1 1 1.0 a 2016-11-01
# r2 2 1.5 b 2016-12-01
# r3 3 2.0 c 2017-01-01
# r4 4 2.5 d 2018-02-01
print(df.dtypes)
# a int64 # note special pandas types int64 and float64
# b float64
# c object # 'object' is general-purpose type,
# d object # covers strings or mixed-type columns
# dtype: object
You can set element values in an existing Series (such as a DataFrame column) by label or integer position. However, the new value should be the same type as that defined for the Series; if not, pandas may refuse, or it may upconvert or cast the column to a more general type (usually object, because numpy is focused on the efficiency of numeric and datetime types).
print(df.b.dtype) # float64
df.loc['r1', 'b'] = 'hello'      # assign a string into the float column 'b'
print(df.b.dtype) # object
Note that we never told pandas to store these values as floats. But since they are all floats, pandas decided to set the type.
We can change a dtype for a Series ourselves with .astype():
df.a = df.a.astype('object') # or df['a'] = df['a'].astype('object')
df.loc['r1', 'a'] = 'hello'      # now allowed: column 'a' is type object
The numpy dtypes you are most likely to see are:
int64 float64 datetime64 object
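Column 'd' above was typed as object because its values are strings; a quick sketch (assuming the same df) of converting those strings to datetime64 with pd.to_datetime(), producing a new Series and leaving df unchanged:

d_dt = pd.to_datetime(df['d'])
print(d_dt.dtype)    # datetime64[ns]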
Checking the memory usage of a DataFrame
.info() provides approximate memory size of a DataFrame
df.info() # on the original example at the top
# <class 'pandas.core.frame.DataFrame'>
# Index: 4 entries, r1 to r4
# Data columns (total 4 columns):
# a 4 non-null int64
# b 4 non-null float64
# c 4 non-null object
# d 4 non-null object
# dtypes: float64(1), int64(1), object(2)
# memory usage: 160.0+ bytes
The '+' means "probably larger" -- .info() only sizes numeric types, not 'object'.
With memory_usage='deep', the reported size includes 'object' columns as well:
df.info(memory_usage='deep')
# memory usage: 832 bytes
Use a subscript (or attribute) to access columns by label; use the .loc[] or .iloc[] attributes to access rows by label or integer index.
a DataFrame:
df = pd.DataFrame( {'a': [1, 2, 3, 4],
'b': [1.0, 1.5, 2.0, 2.5],
'c': ['a', 'b', 'c', 'd'],
'd': [100, 200, 300, 400] },
index=['r1', 'r2', 'r3', 'r4'] )
access column as Series:
cola = df['a'] # Series with [1, 2, 3, 4] and index ['r1', 'r2', 'r3', 'r4']
cola = df.a # same -- can often use attribute labels for column name
print(cola)
# r1 1
# r2 2
# r3 3
# r4 4
# Name: a, dtype: int64
access row as Series using index label 'r2':
row2 = df.loc['r2'] # Series [2, 1.5, 'b', 200] and index ['a', 'b', 'c', 'd']
access row as Series using integer index:
row2 = df.iloc[1] # Series [2, 1.5, 'b', 200] and index ['a', 'b', 'c', 'd'] (same as above)
print(row2)
# a 2
# b 1.5
# c b
# d 200
# Name: r2, dtype: object
(Note that the .ix DataFrame indexer is a legacy feature and is deprecated.)
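The mixed label/position lookups that .ix used to allow can be done with .loc[] and .iloc[] by converting between labels and positions; a minimal sketch on the df above:

# label-based row, positional column: convert the position to a label
print(df.loc['r2', df.columns[1]])          # 1.5

# positional row, label-based column: convert the label to a position
print(df.iloc[1, df.columns.get_loc('b')])  # 1.5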
Some DataFrame operations provide the inplace=True option.
Keep in mind that many operations return a new DataFrame copy; if you are working with a large dataset, inplace=True lets you avoid allocating memory for that copy by modifying the existing DataFrame instead.
import pandas as pd
df = pd.DataFrame({ 'a': [1, 2, 3, 4],
                    'b': [1.0, 1.5, 2.0, 2.5],
                    'c': ['a', 'b', 'c', 'd'] },
                  index=['r1', 'r2', 'r3', 'r4'])
print(df)
# a b c
# r1 1 1.0 a
# r2 2 1.5 b
# r3 3 2.0 c
# r4 4 2.5 d
df2 = df.set_index('a')
print(df2) # new dataframe
# b c
# a
# 1 1.0 a
# 2 1.5 b
# 3 2.0 c
# 4 2.5 d
print(df) # unchanged
# a b c
# r1 1 1.0 a
# r2 2 1.5 b
# r3 3 2.0 c
# r4 4 2.5 d
df.set_index('a', inplace=True)
print(df)
# b c
# a
# 1 1.0 a
# 2 1.5 b
# 3 2.0 c
# 4 2.5 d