Python 3

Complex Sorting

Summary Function: sorted()

sorted() takes a sequence argument and returns a sorted list. The sequence items are sorted according to their respective types.

sorted() with numbers

mylist = [4, 3, 9, 1, 2, 5, 8, 6, 7]

sorted_list = sorted(mylist)
print(sorted_list)                # [1, 2, 3, 4, 5, 6, 7, 8, 9]

sorted() with strings

namelist = ['jo', 'pete', 'michael', 'zeb', 'avram']

print(sorted(namelist))           # ['avram', 'jo', 'michael', 'pete', 'zeb']
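As a small illustrative sketch (the variable names here are hypothetical), the following shows that sorted() always hands back a new list and leaves its argument untouched, even when the argument is a tuple or a string rather than a list:

```python
# sorted() accepts any sequence and always returns a new list
original = (4, 3, 9, 1)                 # a tuple, which has no .sort() method
result = sorted(original)

print(result)                           # [1, 3, 4, 9]
print(original)                         # (4, 3, 9, 1) -- unchanged

letters = sorted('pete')                # a string is sorted character by character
print(letters)                          # ['e', 'e', 'p', 't']
```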

Summary Task (Review): sorting a dictionary's keys

Sorting a dict means sorting the keys -- sorted() returns a list of sorted keys.

bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'janice': 184}

sorted_keys = sorted(bowling_scores)

print(sorted_keys)                     # ['janice', 'jeb', 'mike', 'zeb']

Indeed, any "listy" sort of operation on a dict assumes the keys: for looping, subscripting, sorted(); even sum(), max() and min().
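A brief sketch of this behavior, reusing the bowling_scores dict from above (the result variable names are made up for illustration). Note that to work with the values instead of the keys, we have to ask for them explicitly:

```python
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'janice': 184}

last_key = max(bowling_scores)          # compares the keys: 'zeb' sorts last
first_key = min(bowling_scores)         # compares the keys: 'janice' sorts first
key_list = list(bowling_scores)         # a list of the keys

total = sum(bowling_scores.values())    # values must be requested explicitly

print(last_key)                         # zeb
print(first_key)                        # janice
print(key_list)                         # ['jeb', 'zeb', 'mike', 'janice']
print(total)                            # 607
```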

Summary Task (Review): sorting a dictionary's keys by its values

The dict get() method returns a value based on a key -- perfect for sorting keys by values.

bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'janice': 184}

sorted_keys = sorted(bowling_scores, key=bowling_scores.get)

print(sorted_keys)                     # ['zeb', 'jeb', 'janice', 'mike']


for player in sorted_keys:
      print(f"{player} scored {bowling_scores[player]}")

        ##  zeb scored 98
        ##  jeb scored 123
        ##  janice scored 184
        ##  mike scored 202

Summary Feature: Custom sort using a sorting helper function

A sorting helper function returns to Python the value by which a given element should be sorted.

Here is the same dict sorted by value in the same way as previously, through a custom sorting helper function.

def by_value(dict_key):                       # a key to be sorted
                                              # (for example, 'mike')

    dict_value = bowling_scores[dict_key]     # retrieve the value for that key
                                              # (for 'mike':  202)

    return dict_value                         # return the value (202)

bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'janice': 184}
sorted_keys = sorted(bowling_scores, key=by_value)

print(sorted_keys)          # ['zeb', 'jeb', 'janice', 'mike']

The dict's keys are sorted by value because of the by_value() sorting helper function:

1. sorted() sees by_value referenced in the function call.
2. sorted() calls by_value() four times: once with each key in the dict.
3. by_value() is called with 'jeb' (which returns 123), 'zeb' (which returns 98), 'mike' (which returns 202), and 'janice' (which returns 184).
4. The return value of the function is the value by which the key will be sorted.

Therefore, because of the return value of the sorting helper function, jeb will be sorted by 123, zeb by 98, etc.
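One way to watch this happen is to have the helper keep a record of each call. This is only a sketch: by_value_traced and the calls list are hypothetical names added for illustration, not part of the example above.

```python
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'janice': 184}
calls = []                                # hypothetical log of every helper call

def by_value_traced(dict_key):
    dict_value = bowling_scores[dict_key]
    calls.append((dict_key, dict_value))  # record the key passed in and the value returned
    return dict_value

sorted_keys = sorted(bowling_scores, key=by_value_traced)

print(calls)        # [('jeb', 123), ('zeb', 98), ('mike', 202), ('janice', 184)]
print(sorted_keys)  # ['zeb', 'jeb', 'janice', 'mike']
```

The log shows one call per key, and the sorted result follows the order of the returned values.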

Summary Task: sort a numeric string by its numeric value

Numeric strings (as we might receive from a file) sort alphabetically:

numbers_from_file = ['1', '10', '3', '20', '110', '1000' ]
sorted_numbers = sorted(numbers_from_file)

print(sorted_numbers)    # ['1', '10', '1000', '110', '20', '3'] (alphabetic sort)

To sort numerically, the sorting helper function can convert to int or float.

def by_numeric_value(this_string):
    return int(this_string)

numbers_from_file = ['1', '10', '3', '20', '110', '1000' ]
sorted_numbers = sorted(numbers_from_file, key=by_numeric_value)

print(sorted_numbers)    # ['1', '3', '10', '20', '110', '1000']

Note that the values returned do not change; they are simply sorted by their integer equivalent.
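The same idea works with float for strings that contain decimals. In this sketch the price data and the by_float_value name are made up for illustration; the type check at the end confirms that the sorted items are still strings.

```python
def by_float_value(this_string):
    return float(this_string)            # sort by the float equivalent

prices_from_file = ['19.99', '5', '100.5', '7.25']   # hypothetical data

sorted_prices = sorted(prices_from_file, key=by_float_value)

print(sorted_prices)                     # ['5', '7.25', '19.99', '100.5']
print(type(sorted_prices[0]))            # <class 'str'> -- still strings
```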

Summary Task: sort a string by its case-insensitive value

Python string sorting sorts uppercase before lowercase:

namelist = ['Jo', 'pete', 'Michael', 'Zeb', 'avram']
print(sorted(namelist))            # ['Jo', 'Michael', 'Zeb', 'avram', 'pete']

To sort "insensitively", the sorting helper function can lowercase each string.

def by_lowercase(my_string):
    return my_string.lower()

namelist = ['Jo', 'pete', 'michael', 'Zeb', 'avram']

print(sorted(namelist, key=by_lowercase))

                                  # ['avram', 'Jo', 'michael', 'pete', 'Zeb']

Summary Task: sort a string by a portion of the string

To sort a string by a portion of the string (for example, the last name in these 2-word names), we can split or slice the string and return the portion.

full_names = ['Jeff Wilson', 'Abe Zimmerman', 'Zoe Apple', 'Will Jefferson']

def by_last_name(fullname):
    fname, lname = fullname.split()
    return lname

sfn = sorted(full_names, key=by_last_name)

print(sfn)                                     #  ['Zoe Apple',
                                               #   'Will Jefferson',
                                               #   'Jeff Wilson',
                                               #   'Abe Zimmerman']
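The example above uses split(); a slice works the same way. The sketch below assumes hypothetical IDs that always begin with exactly two letters followed by a number, and sorts them by the numeric part:

```python
def by_id_number(student_id):
    return int(student_id[2:])           # skip the 2-letter prefix, sort by the number

student_ids = ['jw234', 'ms15', 'pk669', 'ab7']      # hypothetical IDs

sorted_ids = sorted(student_ids, key=by_id_number)

print(sorted_ids)                        # ['ab7', 'ms15', 'jw234', 'pk669']
```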

Summary Task: sort a file line by a field within the line

To sort a string of fields (for example, a CSV line) by a field within the line, we can split() and return a field from the split.

def by_third_field(this_line):
    els = this_line.split(',')
    return els[2]

with open('students.txt') as fh:              # the file is closed when the block ends
    sorted_lines = sorted(fh, key=by_third_field)
print(sorted_lines)


        # [ 'pk669,Pete,Krank,Darkling,NJ,8044894893\n',
        #   'ms15,Mary,Smith,Wilsontown,NY,5185853892\n',
        #   'jw234,Joe,Wilson,Smithtown,NJ,2015585894\n'    ]
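To try this without a students.txt file on disk, here is a minimal sketch in which io.StringIO stands in for the open file (an assumption made only so the example is self-contained). It can be looped over line by line just as a file can:

```python
import io

def by_third_field(this_line):
    els = this_line.split(',')
    return els[2]                        # the last name

# io.StringIO stands in for open('students.txt') in this sketch
fake_file = io.StringIO(
    'jw234,Joe,Wilson,Smithtown,NJ,2015585894\n'
    'ms15,Mary,Smith,Wilsontown,NY,5185853892\n'
    'pk669,Pete,Krank,Darkling,NJ,8044894893\n'
)

sorted_lines = sorted(fake_file, key=by_third_field)

for line in sorted_lines:
    print(line.rstrip())                 # strip the newline before printing

last_names = [line.split(',')[2] for line in sorted_lines]
print(last_names)                        # ['Krank', 'Smith', 'Wilson']
```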

Summary Task: custom sort using a built-in function as the sorting helper function

Built-in functions can be used to help sorted() decide how to sort in the same way as custom functions -- by telling Python to pass an element and sort by the return value.

len() returns string length - so it can be used to sort strings by length

mystrs = ['angie', 'zachary', 'zeb', 'annabelle']

print(sorted(mystrs, key=len))      # ['zeb', 'angie', 'zachary', 'annabelle']

Using a builtin function

os.path.getsize() returns the byte size of any file based on its name (in this example, in the present working directory):

import os

print(os.path.getsize('test.txt'))  # 53 (in this example, the byte size of test.txt)

To sort files by their sizes, we can simply pass this function to sorted():

import os

files = ['test.txt', 'myfile.txt', 'data.csv', 'bigfile.xlsx']

                                    # some files in my current dir

size_files = sorted(files, key=os.path.getsize)
                                    # pass each file to getsize()

for this_file in size_files:
    print(f"{this_file}:  {os.path.getsize(this_file)} bytes")

(Please note that this will only work if your terminal's present working directory is the directory that contains the files being sorted. Otherwise, you would have to prepend the path -- see File I/O, later in this course.)
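As a rough sketch of what prepending the path can look like, the example below creates three throwaway files in a temporary directory (the file names, sizes and the by_size_in_dir helper are all made up for illustration) and joins the directory onto each file name before asking for its size:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as this_dir:
    # create three hypothetical files of different sizes
    for name, size in [('big.txt', 300), ('small.txt', 10), ('medium.txt', 75)]:
        with open(os.path.join(this_dir, name), 'w') as fh:
            fh.write('x' * size)

    def by_size_in_dir(filename):
        # prepend the directory so getsize() can find the file
        return os.path.getsize(os.path.join(this_dir, filename))

    size_files = sorted(os.listdir(this_dir), key=by_size_in_dir)

print(size_files)        # ['small.txt', 'medium.txt', 'big.txt']
```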

Using methods

namelist = ['Jo', 'pete', 'michael', 'Zeb', 'avram']
print(sorted(namelist, key=str.lower))

                                  # ['avram', 'Jo', 'michael', 'pete', 'Zeb']

Using methods called on existing objects

companydict = {'IBM': 18.68, 'Apple': 50.56, 'Google': 21.3}

revc = sorted(companydict, key=companydict.get)   # [ 'IBM',
                                                         #   'Google',
                                                         #   'Apple'   ]

You can use a method here in the same way you would use a function, except that you won't be specifying the specific object as you would normally with a method. To refer to a method "in the abstract", you can say str.upper or str.lower. A method of an existing object, such as companydict.get, is referenced in the same way. However, make sure not to actually call the method (which is done with the parentheses). Instead, you simply refer to the method, i.e., mention the method without using the parentheses.
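The difference between referring to a method and calling it can be sketched as follows. The call_failed flag is a hypothetical name used only to show that the mistaken version raises an error:

```python
namelist = ['Jo', 'pete', 'michael', 'Zeb', 'avram']

# referring to the method: str.lower is handed to sorted() uncalled
print(str.lower('Zeb'))                  # zeb -- the same as 'Zeb'.lower()
print(sorted(namelist, key=str.lower))   # ['avram', 'Jo', 'michael', 'pete', 'Zeb']

# calling the method by mistake: str.lower() runs immediately, with no string
try:
    sorted(namelist, key=str.lower())
    call_failed = False
except TypeError:
    call_failed = True

print(call_failed)                       # True
```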

Sorting with lambda custom function

Functions are useful but they require that we declare them separately, usually elsewhere in our code.

A lambda is a function that is defined in a single expression. Because it is an expression, a lambda can be placed in data structures or passed as an argument in a function call. The advantage here is that our lambda function will be used exactly where it is defined, and in using it we don't have to maintain a separate def statement.

A common use of lambda is in sorting. The format for lambdas is lambda arg: return_val. Compare the regular function and the lambda below, and note the argument and return value in each.

def by_lastname(name):
    fname, lname = name.split()
    return lname

names = [ 'Josh Peschko', 'Gabriel Feghali', 'Billy Woods', 'Arthur Fischer-Zernin' ]
sortednames = sorted(names, key=lambda name:  name.split()[1])

In the above, the label after lambda is the argument, and the expression that follows the colon is the return value. So in the example the lambda argument is name, and the lambda returns name.split()[1]. See how it behaves exactly like the regular function itself? Again, what is the advantage of lambdas? They allow us to design our own functions which can be placed inline, where a named function would go. This is a convenience, not a necessity. But they are in common use, so they must be understood by any serious programmer.
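Because a lambda is an expression, it can also be stored in a data structure. The sketch below keeps several lambdas in a dict; the sort_choices name and its keys are hypothetical, chosen only for illustration:

```python
names = ['Josh Peschko', 'Gabriel Feghali', 'Billy Woods', 'Arthur Fischer-Zernin']

# a dict whose values are lambdas, each one a different way to sort
sort_choices = {
    'first':  lambda name: name.split()[0],
    'last':   lambda name: name.split()[1],
    'length': lambda name: len(name),
}

by_last = sorted(names, key=sort_choices['last'])
by_length = sorted(names, key=sort_choices['length'])

print(by_last)
# ['Gabriel Feghali', 'Arthur Fischer-Zernin', 'Josh Peschko', 'Billy Woods']
print(by_length)
# ['Billy Woods', 'Josh Peschko', 'Gabriel Feghali', 'Arthur Fischer-Zernin']
```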

Lambda expressions: breaking them down

Lambdas seem hard at first, but are quite simple.

Many people have reported that they found lambdas to be challenging to understand, but they're really very simple - they're just so short they're hard to read. Compare these two functions, both of which add/concatenate their arguments:

def addthese(x, y):
    return x + y

addthese2 = lambda x, y:  x + y

print(addthese(5, 9))        # 14
print(addthese2(5, 9))       # 14

The function definition and the lambda expression are equivalent: they both produce a function with the same functionality.

Lambda expression example: dict.get and operator.itemgetter

Here are our standard methods to sort a dictionary:

import operator
mydict = { 'a': 5, 'b': 2, 'c': 1, 'z': 0 }
for key, val in sorted(mydict.items(), key=operator.itemgetter(1)):
    print(f"{key} = {val}")

for key in sorted(mydict, key=mydict.get):
    print(f"{key} = {mydict[key]}")

Imagine we didn't have access to dict.get and operator.itemgetter. What could we do?

mydict = { 'a': 5, 'b': 2, 'c': 1, 'z': 0 }
for key, val in sorted(mydict.items(), key=lambda keyval:  keyval[1]):
    print(f"{key} = {val}")

for key in sorted(mydict, key=lambda key:  mydict[key]):
    print(f"{key} = {mydict[key]}")

These lambdas do the same job as their built-in counterparts: in the case of operator.itemgetter(1), take a 2-element tuple as an argument and return the 2nd element; in the case of dict.get, take a key and return the associated value from the dict.
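A quick way to check this equivalence is to call each version directly and compare the results. This is a sketch only; the names get_second and get_second_lambda are made up for illustration:

```python
import operator

mydict = {'a': 5, 'b': 2, 'c': 1, 'z': 0}
pair = ('a', 5)

get_second = operator.itemgetter(1)            # built-in version
get_second_lambda = lambda keyval: keyval[1]   # lambda version

print(get_second(pair), get_second_lambda(pair))              # 5 5
print(mydict.get('b'), (lambda key: mydict[key])('b'))        # 2 2

sorted_builtin = sorted(mydict.items(), key=get_second)
sorted_lambda = sorted(mydict.items(), key=get_second_lambda)

print(sorted_builtin == sorted_lambda)                        # True
print(sorted_lambda)             # [('z', 0), ('c', 1), ('b', 2), ('a', 5)]
```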

Summary task: sorting dict of dicts with custom function

Having built multidimensional structures in various configurations, we should now learn how to sort them -- for example, to sort the keys in a dictionary of dictionaries by one of the values in the inner dictionary (in this instance, the last name):

(The "thing to sort" is a dict key. The "value by which it should be sorted" is a value within the dict associated with that key, in this case the 'lname' value.)

def by_last_name(key):
    return dod[key]['lname']

dod = {
         'db13':  {
                     'fname': 'Joe',
                     'lname': 'Wilson',
                     'tel':   '9172399895'
                  },
         'mm23':  {
                     'fname': 'Mary',
                     'lname': 'Doodle',
                     'tel':   '2122382923'
                  }
       }

sorted_keys = sorted(dod, key=by_last_name)
print(sorted_keys)                             # ['mm23', 'db13']

The trick here will be to put together what we know about obtaining the value from an inner structure with what we have learned about custom sorting.
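Those two steps can be sketched separately: first reach the inner value for one known key, then do the same for whatever key sorted() passes in. Here a lambda takes the place of the named helper, and the second sort (by 'fname') is added only to show that a different inner value gives a different order:

```python
dod = {
    'db13': {'fname': 'Joe', 'lname': 'Wilson', 'tel': '9172399895'},
    'mm23': {'fname': 'Mary', 'lname': 'Doodle', 'tel': '2122382923'},
}

# step 1: reach the inner value for one key
print(dod['db13']['lname'])                          # Wilson

# step 2: do the same for whatever key sorted() passes in
by_lname = sorted(dod, key=lambda key: dod[key]['lname'])
by_fname = sorted(dod, key=lambda key: dod[key]['fname'])

print(by_lname)                                      # ['mm23', 'db13']
print(by_fname)                                      # ['db13', 'mm23']
```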

Summary task: sorting list of dicts with custom function

Similar to itemgetter, we may want to sort a complex structure by some inner value. If we have a list of dicts to sort, we can use a custom function to specify the sort value from inside each dict.

(The "thing to sort" is a dict. The "value by which it should be sorted" is the 'lname' in the dict.)

def by_dict_lname(this_dict):
    return this_dict['lname'].lower()

list_of_dicts = [
    { 'id': 123,
      'fname': 'Joe',
      'lname': 'Wilson',
    },
    { 'id': 124,
      'fname': 'Sam',
      'lname': 'Jones',
    },
    { 'id': 125,
      'fname': 'Pete',
      'lname': 'abbott',
    },
]
list_of_dicts.sort(key=by_dict_lname)      # custom sort function (above)
for this_dict in list_of_dicts:
    print(f"{this_dict['fname']} {this_dict['lname']}")

# Pete abbott
# Sam Jones
# Joe Wilson

So, although we are sorting dicts, our function says "take this dictionary and sort by this inner element of the dictionary".

Sidebar: cascading sort

Sort a list by multiple criteria by having your sorting helper function return a 2-element tuple.

def by_last_first(name):
    fname, lname = name.split()
    return (lname, fname)

names = ['Zeb Will', 'Deb Will', 'Joe Max', 'Ada Max']

lnamesorted = sorted(names, key=by_last_first)

                             # ['Ada Max', 'Joe Max', 'Deb Will', 'Zeb Will']
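The tuple can also mix directions. In the sketch below (hypothetical data and helper name), negating the score makes higher scores sort first, while the name breaks ties in ordinary alphabetical order. The negation trick assumes the first criterion is numeric.

```python
def by_score_desc_then_name(record):
    name, score = record
    return (-score, name)        # highest score first, then name A-Z

results = [('zeb', 98), ('jeb', 123), ('mike', 98), ('janice', 123)]

ranked = sorted(results, key=by_score_desc_then_name)

print(ranked)
# [('janice', 123), ('jeb', 123), ('mike', 98), ('zeb', 98)]
```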

Sorting review

A quick review of sorting: recall how Python will perform a default sort (numeric or ASCII-betical) depending on the objects sorted. If we wish to modify this behavior, we can pass each element to a function named by the key= parameter:

mylist = ['Alpha', 'Gamma', 'epsilon', 'beta', 'Delta']

print(sorted(mylist))                      # ASCIIbetical sort
                                          # ['Alpha', 'Delta', 'Gamma', 'beta', 'epsilon']

mylist.sort()                             # sort mylist in-place

print(sorted(mylist, key=str.lower))       # alphabetical sort
                                          # (lowercasing each item by telling Python to pass it
                                          # to str.lower)
                                          # ['Alpha', 'beta', 'Delta', 'epsilon', 'Gamma']

print(sorted(mylist, key=len))             # sort by length
                                          # ['beta', 'Alpha', 'Delta', 'Gamma', 'epsilon']

Sorting review: sorting dictionary keys by value: dict.get

When we loop through a dict, we can loop through a list of keys (and use the keys to get values) or loop through items, a list of (key, value) tuple pairs. When sorting a dictionary by the values in it, we can also choose to sort keys or items.

To sort keys, mydict.get is called with each key - and get returns the associated value. So the keys of the dictionary are sorted by their values.

mydict = { 'a': 5, 'b': 2, 'c': 1, 'z': 0 }
mydict_sorted_keys = sorted(mydict, key=mydict.get)
for i in mydict_sorted_keys:
    print(f"{i} = {mydict[i]}")

                 ## z = 0
                 ## c = 1
                 ## b = 2
                 ## a = 5

Sorting dictionary items by value: operator.itemgetter

Recall that we can render a dictionary as a list of tuples with the dict.items() method:

mydict = { 'a': 5, 'b': 2, 'c': 1, 'z': 0 }
mydict_items = list(mydict.items())                        # [('a', 5), ('b', 2), ('c', 1), ('z', 0)]

To sort dictionary items by value, we need to sort each two-element tuple by its second element. The itemgetter function in the built-in operator module builds a function that returns whatever element of a sequence we wish: operator.itemgetter(1) produces a function that returns the item at index 1. In this way it is like a subscript, but in function format (so it can be called by the Python sorting algorithm).

import operator
mydict = { 'a': 5, 'b': 2, 'c': 1, 'z': 0 }
mydict_items = list(mydict.items())                  # [('a', 5), ('b', 2), ('c', 1), ('z', 0)]
mydict_items.sort(key=operator.itemgetter(1))        # list() is needed: items() alone has no sort()
print(mydict_items)                                  # [('z', 0), ('c', 1), ('b', 2), ('a', 5)]
for key, val in mydict_items:
    print(f"{key} = {val}")

                    ## z = 0
                    ## c = 1
                    ## b = 2
                    ## a = 5

The above can be conveniently combined with looping, effectively allowing us to loop through a "sorted" dict:

for key, val in sorted(mydict.items(), key=operator.itemgetter(1)):
    print(f"{key} = {val}")

Database results come as a list of tuples. Perhaps we want our results sorted in different ways, so we can store as a list of tuples and sort using operator.itemgetter. This example sorts by the third field, then by the second field (last name, then first name):

import operator
items =[ (123, 'Joe', 'Wilson', 35, 'mechanic'),
         (124, 'Sam', 'Jones', 22, 'mechanic'),
         (125, 'Pete', 'Jones', 40, 'mechanic'),
         (126, 'Irina', 'Bibi', 31, 'mechanic'),
       ]
items.sort(key=operator.itemgetter(2,1)) # sorts by last, first name
for this_row in items:
    print(f"{this_row[1]} {this_row[2]}")

       ## Irina Bibi
       ## Pete Jones
       ## Sam Jones
       ## Joe Wilson
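To see why this gives a last-name-then-first-name sort, it can help to call the itemgetter function directly: with two indexes it returns a tuple, and tuples compare element by element. The sketch below reuses the rows above; the get_last_first, sort_keys and by_age names are hypothetical.

```python
import operator

items = [(123, 'Joe', 'Wilson', 35, 'mechanic'),
         (124, 'Sam', 'Jones', 22, 'mechanic'),
         (125, 'Pete', 'Jones', 40, 'mechanic'),
         (126, 'Irina', 'Bibi', 31, 'mechanic')]

get_last_first = operator.itemgetter(2, 1)       # returns a (last, first) tuple
sort_keys = [get_last_first(row) for row in items]
print(sort_keys)
# [('Wilson', 'Joe'), ('Jones', 'Sam'), ('Jones', 'Pete'), ('Bibi', 'Irina')]

by_age = sorted(items, key=operator.itemgetter(3))   # a single index: sort by age
print([row[1] for row in by_age])                # ['Sam', 'Irina', 'Joe', 'Pete']
```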

Sorting Multidimensionals with lambda Custom Function

As with all custom sorting, simply consider what you are sorting and what value you would like to sort by.

list_of_dicts = [
    { 'id': 123,
      'fname': 'Joe',
      'lname': 'Wilson',
    },
    { 'id': 124,
      'fname': 'Sam',
      'lname': 'Jones',
    },
    { 'id': 125,
      'fname': 'Pete',
      'lname': 'abbott',
    },
]

def by_dict_lname(this_dict):
    return this_dict['lname'].lower()

sortedlenstrs = sorted(list_of_dicts, key=lambda this_dict:  this_dict['lname'].lower())

Remember that the label after lambda is the argument, and the expression that follows the colon is the return value. This is a list of dicts, so sorting the list means that each item to be sorted is a dict. This means that the argument to the lambda will be one dict, and the value after the colon should be the value in the dict by which we would like to sort.
