Python 3


Reading Multidimensional Containers

Introduction: Reading Multidimensional Containers

Data can be expressed in complex ways using nested containers.


Real-world data is often more complex in structure than a simple sequence (such as a list) or a collection of pairs (such as a dictionary).


Complex data can be structured in Python through the use of multidimensional containers, which are simply containers that contain other containers (lists of lists, lists of dicts, dicts of dicts, etc.) in structures of arbitrary complexity. Most of the time we are not called upon to handle structures of more than 2 dimensions, although some config files and data transmitted between systems (such as API responses) can go deeper. In this unit we'll look at the standard 2-dimensional containers we are most likely to encounter or want to build in our programs.





Example Structure: List of Lists

A list of lists provides a "matrix" structure similar to an Excel spreadsheet.


value_table =       [
                       [ '19260701', 0.09, -0.22, -0.30, 0.009 ],
                       [ '19260702', 0.44, -0.35, -0.08, 0.009 ],
                       [ '19260703', 0.17, 0.26,  -0.37, 0.009 ]

                    ]

Although it is probably used less frequently than the other structures, a list of lists allows us to access values through ordinary list operations (looping and indexed subscripts). The "outer" list has 3 items -- each item is a list, and each list represents a row of data. Each row list has 5 items, which represent the row data from the Fama-French file: the date and the Mkt-RF, SMB, HML and RF values. Looping through this structure is very similar to looping through a delimited file, which after all is a sequence of lines that can be split into fields.


for rowlist in value_table:
    print(f"the MktRF for {rowlist[0]} is {rowlist[1]}")
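Because every row keeps its fields in the same positions, a single index can be used to pull one "column" out of the table, and a row can be unpacked into named variables. The following is a minimal sketch that reuses the table above; the name smb_values is only illustrative.

```python
value_table = [
    ['19260701', 0.09, -0.22, -0.30, 0.009],
    ['19260702', 0.44, -0.35, -0.08, 0.009],
    ['19260703', 0.17,  0.26, -0.37, 0.009],
]

# Collect one "column" by taking the same index from every row.
smb_values = []
for rowlist in value_table:
    smb_values.append(rowlist[2])          # index 2 holds the SMB value

print(smb_values)                          # [-0.22, -0.35, 0.26]

# Each row can also be unpacked into named variables for readability.
for date, mktrf, smb, hml, rf in value_table:
    print(f"{date}: MktRF={mktrf}, RF={rf}")
```

Unpacking only works when every row has exactly the number of items named in the for statement, which is the case for this table.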




Example Structure: List of Dicts

A list of dicts structures tabular rows into field-keyed dictionaries.


value_table = [
                { 'date': '19260701', 'MktRF': 0.09, 'SMB': -0.22,
                                      'HML': -0.30, 'RF': 0.009 },

                { 'date': '19260702', 'MktRF': 0.44, 'SMB': -0.35,
                                      'HML': -0.08, 'RF': 0.009 },

                { 'date': '19260706', 'MktRF': 0.17, 'SMB': 0.26,
                                      'HML': -0.37, 'RF': 0.009 }
              ]

The "outer" list contains 3 items, each being a dictionary with identical keys. The keys in each dict correspond to field / column labels from the table, so it's easy to identify and access a given value within a row dict.


A structure like this might look elaborate, but is very easy to build from a data source. The convenience of named subscripts (as contrasted with the numbered subscripts of a list of lists) lets us loop through each row and name the fields we wish to access:

for rowdict in value_table:
    print(f"the MktRF for {rowdict['date']} is {rowdict['MktRF']}")
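As one illustration of how such a structure might be built, the standard csv module's DictReader produces one dict per row, keyed by the header line. This is only a sketch: the CSV text below is a hypothetical stand-in for a data file, and it assumes the file has a header row with these column names.

```python
import csv
import io

# Hypothetical CSV text standing in for a data file with a header row.
csv_text = """date,MktRF,SMB,HML,RF
19260701,0.09,-0.22,-0.30,0.009
19260702,0.44,-0.35,-0.08,0.009
"""

value_table = []
for rowdict in csv.DictReader(io.StringIO(csv_text)):
    # DictReader yields every value as a string, so convert the numeric fields.
    for key in ('MktRF', 'SMB', 'HML', 'RF'):
        rowdict[key] = float(rowdict[key])
    value_table.append(rowdict)

for rowdict in value_table:
    print(f"the MktRF for {rowdict['date']} is {rowdict['MktRF']}")
```

With a real file, the io.StringIO object would be replaced by an open file handle.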




Example Structure: Dict of Lists

A dict of lists allows association of a sequence of values with unique keys.


yr_vals = { '1926': [ 0.09,  0.44,  0.17, -0.15, -0.06,
                      -0.55,  0.61,  0.05, 0.51 ],

            '1927': [ -0.97,  0.30,  0.13, -0.18,  0.31,
                      0.39,  0.14, -0.27, 0.05 ],

            '1928': [ 0.43, -0.14, -0.71,  0.61,  0.13,
                      -0.88, -0.85,  0.12, 0.48 ]  }

The "outer" dict contains 3 string keys, each associated with a list of float values -- in this case, the MktRF values from the trading days of each year (only the first 9 are included here for brevity). With a structure like this, we can perform the same calculations we have done on this data before, such as finding the max(), min(), sum() and average for a given year.


for year in yr_vals:
    print(f'for year {year}: ')
    print(f'  len: {len(yr_vals[year])}')
    print(f'  sum: {sum(yr_vals[year])}')
    print(f'  avg: {sum(yr_vals[year]) / len(yr_vals[year])}')
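The same loop can be written with .items(), which hands us each key together with its list so the list does not have to be looked up repeatedly. The sketch below also builds a second dict from the first; the name yr_max is hypothetical.

```python
yr_vals = {
    '1926': [0.09, 0.44, 0.17, -0.15, -0.06, -0.55, 0.61, 0.05, 0.51],
    '1927': [-0.97, 0.30, 0.13, -0.18, 0.31, 0.39, 0.14, -0.27, 0.05],
    '1928': [0.43, -0.14, -0.71, 0.61, 0.13, -0.88, -0.85, 0.12, 0.48],
}

# .items() yields each key together with its associated list.
for year, vals in yr_vals.items():
    print(f'{year}: max={max(vals)}, min={min(vals)}')

# Build a new dict mapping each year to its highest value.
yr_max = {}
for year, vals in yr_vals.items():
    yr_max[year] = max(vals)

print(yr_max)          # {'1926': 0.61, '1927': 0.39, '1928': 0.61}
```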




Example Structure: Dict of Dicts

In a dict of dicts, each unique key points to another dict with keys and values.


date_values = {
    '19260701':   { 'MktRF':  0.09,
                    'SMB':   -0.22,
                    'HML':   -0.30,
                    'RF':    0.009 },
    '19260702':   { 'MktRF':  0.44,
                    'SMB':   -0.35,
                    'HML':   -0.08,
                    'RF':    0.009 },
}

The "outer" dict contains string keys, each of which is associated with a dictionary -- each "inner" dictionary is a convenient key/value access to the fields of the table, as we had with a list of dicts.


Again, this structure may seem complex (perhaps even needlessly so?). However, a structure like this is extremely easy to build and is then very convenient to query. For example, the 'HML' value for July 2, 1926 is accessed in a very visual way:

print(date_values['19260702']['HML'])        # -0.08

Looping through a dict of dicts is probably the most challenging part of working with multidimensional structures:


x = {
      'a':  { 'zz': 1,
              'yy': 2  },
      'b':  { 'zz': 5,
              'yy': 10 }
    }

x['a']['yy']  # 2


for i in x:
    print(i)                                 # outer key: a, then b
    for j in x[i]:
        print(x[i][j], end=' ')              # inner values: 1 2, then 5 10
    print()                                  # end the line after each inner dict
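An alternative that some find easier to read uses .items() at both levels, so that each loop variable has a descriptive name instead of a repeated subscript. This is a sketch of the same traversal; the list named pairs is hypothetical and exists only to collect what was visited.

```python
x = {
    'a': {'zz': 1, 'yy': 2},
    'b': {'zz': 5, 'yy': 10},
}

pairs = []    # collects (outer key, inner key, value) for each element visited
for outer_key, inner_dict in x.items():
    for inner_key, value in inner_dict.items():
        print(f"x[{outer_key!r}][{inner_key!r}] = {value}")
        pairs.append((outer_key, inner_key, value))
```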




Example Structure: arbitrary dimensions

Containers can nest in "irregular" configurations, to accommodate more complex orderings of data.


See if you can identify the object type and elements of each of the containers represented below:

conf = [
    {
        "domain": "www.example1.com",
        "database": {
            "host": "localhost1",
            "port": 27017
        },
        "plugins": [
            "plugin1",
            "eslint-plugin-plugin1",
            "plugin2",
            "plugin3"
        ]
    },   # (additional dicts would follow this one in the list)
]

Above we have a list with one item! The item is a dictionary with 3 keys. The "domain" key is associated with a string value. The "database" key is associated with another dictionary, which has string keys and a string and an integer as values. The "plugins" key is associated with a list of strings. Presumably this "outer" list of dicts would normally have more than one item, with additional dictionaries that have the same keys and structure as this one.
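One way to check a reading of a structure like this is to step into it one subscript at a time. The sketch below repeats the structure in compact form; the variable name site is hypothetical.

```python
conf = [
    {
        "domain": "www.example1.com",
        "database": {"host": "localhost1", "port": 27017},
        "plugins": ["plugin1", "eslint-plugin-plugin1", "plugin2", "plugin3"],
    },
]

site = conf[0]                       # first (and only) dict in the outer list
print(type(site))                    # <class 'dict'>
print(site["database"]["port"])      # 27017
print(site["plugins"][1])            # eslint-plugin-plugin1

for plugin in site["plugins"]:       # the inner list loops like any other list
    print(plugin)
```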





Retrieving an "inner" element value

Nested subscripts are the usual way to travel "into" a nested structure to obtain a value.


A list of lists

value_table =       [
                       [ '19260701', 0.09, -0.22, -0.30, 0.009 ],
                       [ '19260702', 0.44, -0.35, -0.08, 0.009 ],
                       [ '19260703', 0.17, 0.26,  -0.37, 0.009 ]

                    ]

print(f"SMB for 7/3/26 is {value_table[2][2]}")

A dict of dicts

date_values = {
    '19260701':   { 'MktRF':  0.09,
                    'SMB':   -0.22,
                    'HML':   -0.30,
                    'RF':    0.009 },
    '19260702':   { 'MktRF':  0.44,
                    'SMB':   -0.35,
                    'HML':   -0.08,
                    'RF':    0.009 },
}

MktRF_thisday = date_values['19260701']['MktRF']   # value is 0.09

print(date_values['19260701']['SMB'])               # -0.22
print(date_values['19260701']['HML'])               # -0.3
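The same idea applies to the mixed structures: the first subscript matches the type of the outer container and the second matches the type of the inner one. A short sketch using abbreviated versions of the earlier list of dicts and dict of lists:

```python
value_table = [
    {'date': '19260701', 'MktRF': 0.09, 'SMB': -0.22},
    {'date': '19260702', 'MktRF': 0.44, 'SMB': -0.35},
]
yr_vals = {'1926': [0.09, 0.44, 0.17], '1927': [-0.97, 0.30, 0.13]}

print(value_table[1]['SMB'])    # list index first, then dict key: -0.35
print(yr_vals['1927'][0])       # dict key first, then list index: -0.97
print(yr_vals['1926'][-1])      # negative indexes work on the inner list: 0.17
```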




Looping through a complex structure

Looping through a nested structure often requires an "inner" loop within an "outer" loop.


Looping through a list of lists

value_table =       [
                       [ '19260701', 0.09, -0.22, -0.30, 0.009 ],
                       [ '19260702', 0.44, -0.35, -0.08, 0.009 ],
                       [ '19260703', 0.17, 0.26,  -0.37, 0.009 ]

                    ]

for row in value_table:
    print(f"MktRF for {row[0]} is {row[1]}")

Looping through a dict of dicts

date_values = {
    '19260701':   { 'MktRF':  0.09,
                    'SMB':   -0.22,
                    'HML':   -0.30,
                    'RF':    0.009 },
    '19260702':   { 'MktRF':  0.44,
                    'SMB':   -0.35,
                    'HML':   -0.08,
                    'RF':    0.009 },
}

for this_date in date_values:
    print(f"MktRF for {this_date} is {date_values[this_date]['MktRF']}")
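The loops above read one value per row, so a single loop is enough. When every value in every row is needed, an inner loop goes inside the outer one. The following sketch totals the numeric values in each row of the list of lists; the name row_totals is hypothetical, and the total itself is only an arithmetic illustration.

```python
value_table = [
    ['19260701', 0.09, -0.22, -0.30, 0.009],
    ['19260702', 0.44, -0.35, -0.08, 0.009],
    ['19260703', 0.17,  0.26, -0.37, 0.009],
]

row_totals = {}                      # maps each date to the sum of that row's values
for row in value_table:              # outer loop: one row per pass
    total = 0.0
    for value in row[1:]:            # inner loop: every value after the date
        total += value
    row_totals[row[0]] = total
    print(f"total for {row[0]} is {total:.3f}")
```

The slice row[1:] skips the date string at index 0 so that only the floats are added.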




Reading and Writing to Files with JSON

JavaScript Object Notation (JSON) is a simple "data interchange" format for sending structured data through text.


Structured simply means that the data is organized into standard programmatic containers (lists and dictionaries). In fact, JSON notation closely resembles Python's notation for lists and dicts, so it is immediately recognizable to us. Once data is loaded from JSON, it takes the form of a standard Python multidimensional structure.


Here is some simple JSON with an arbitrary structure, saved into a file called mystruct.json:

{
   "key1":  ["a", "b", "c"],
   "key2":  {
              "innerkey1": 5,
              "innerkey2": "woah"
            },
   "key3":  55.09,
   "key4":  "hello"
}

Initializing a Python structure read from JSON


We can load this structure from a file or read it from a string:

import json

fh = open('mystruct.json')          # open file for reading (default text mode)
mys = json.load(fh)                 # load from a file
fh.close()

fh = open('mystruct.json')
file_text = fh.read()

mys = json.loads(file_text)         # load from a string

fh.close()

print(mys['key2']['innerkey2'])     # woah

Note: although it resembles Python structures, JSON notation is slightly less forgiving than Python -- for example, double quotes are required around strings, and no trailing comma is allowed after the last element in a dict or list (Python allows this).


For example, I added a comma to the end of the outer dict in the example above:

  "key4":  "hello",

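When I then tried to load it, the json module complained with a helpfully correct location (the exception is a ValueError; in Python 3 it is the subclass json.JSONDecodeError, and the exact wording of the message varies by version):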

ValueError: Expecting property name: line 9 column 1 (char 164)
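A program that reads JSON it does not control may want to catch this error rather than crash. The sketch below uses a shortened, hypothetical string with the same trailing-comma mistake; json.JSONDecodeError and its lineno and colno attributes are part of the standard json module in Python 3.

```python
import json

bad_text = '{"key3": 55.09, "key4": "hello",}'   # trailing comma: invalid JSON

try:
    json.loads(bad_text)
    loaded = True
except json.JSONDecodeError as err:              # a subclass of ValueError
    loaded = False
    print(f"could not parse: line {err.lineno}, column {err.colno}")
```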

Dumping a Python structure to JSON


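import json
from io import StringIO

json_text = json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}], indent=4)   # returns a string
buf = StringIO()                                # stands in for any writable text file
json.dump(['streaming API'], buf, indent=4)     # writes the JSON text to the file object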

indent=4 will write the structure in an indented and thus more readable format.
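To see dumping and loading work together, here is a sketch of a round trip through a file. The data and the filename roundtrip.json are hypothetical, and a temporary directory is used only so that the example leaves no file behind.

```python
import json
import os
import tempfile

data = {'name': 'example', 'values': (1, 2, 3), 'missing': None, 'ok': True}

# 'roundtrip.json' is a hypothetical filename inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'roundtrip.json')

    with open(path, 'w') as fh:
        json.dump(data, fh, indent=4)      # writes JSON text to the open file

    with open(path) as fh:
        restored = json.load(fh)           # reads it back into Python objects

print(restored['values'])    # [1, 2, 3]  (the tuple came back as a list)
print(restored['missing'])   # None       (stored in the file as null)
```

JSON has no tuple type, so a tuple is written as an array and is read back as a list; None and True are written as null and true.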





"pretty-printing" with json.dumps

json.dumps returns a complex structure as a string.


import json

dvs = {'19260701': {'HML': -0.3, 'RF': 0.009, 'MktRF': 0.09, 'SMB': -0.22},
'19260702': {'HML': -0.08, 'RF': 0.009, 'MktRF': 0.44, 'SMB': -0.35}}

pretty = json.dumps(dvs, indent=4)
print(pretty)

    # {
    #     "19260701": {
    #         "HML": -0.3,
    #         "RF": 0.009,
    #         "MktRF": 0.09,
    #         "SMB": -0.22
    #     },
    #     "19260702": {
    #         "HML": -0.08,
    #         "RF": 0.009,
    #         "MktRF": 0.44,
    #         "SMB": -0.35
    #     }
    # }
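Two other json.dumps options are often useful alongside indent: leaving indent out gives a compact single line, and sort_keys=True writes dictionary keys in sorted order, which makes output easier to compare between runs. A small sketch with an abbreviated, deliberately unsorted version of the data:

```python
import json

dvs = {'19260702': {'SMB': -0.35, 'HML': -0.08},
       '19260701': {'SMB': -0.22, 'HML': -0.3}}

compact = json.dumps(dvs)                            # one line, keys in insertion order
pretty = json.dumps(dvs, indent=2, sort_keys=True)   # indented, keys sorted at every level

print(compact)
print(pretty)
```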



