sorted() takes a sequence (or any other iterable) as its argument and returns a new sorted list. The items are sorted according to their type: numbers sort numerically, strings sort alphabetically by character code.
sorted() with numbers
mylist = [4, 3, 9, 1, 2, 5, 8, 6, 7]
sorted_list = sorted(mylist)
print(sorted_list) # [1, 2, 3, 4, 5, 6, 7, 8, 9]
sorted() with strings
namelist = ['jo', 'pete', 'michael', 'zeb', 'avram']
print(sorted(namelist)) # ['avram', 'jo', 'michael', 'pete', 'zeb']
Sorting a dict means sorting the keys -- sorted() returns a list of sorted keys.
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'janice': 184}
sorted_keys = sorted(bowling_scores)
print(sorted_keys) # ['janice', 'jeb', 'mike', 'zeb']
Indeed, any "listy" sort of operation on a dict assumes the keys: for looping, subscripting, sorted(); even sum(), max() and min().
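As a small illustration of the point above, here is a sketch that reuses the bowling_scores dict (the variable names keys_seen, largest_key and smallest_key are made up for this example). It shows that looping, max() and min() all operate on the keys and ignore the values.

```python
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'janice': 184}

# Looping over a dict visits its keys (in insertion order)
keys_seen = [player for player in bowling_scores]
print(keys_seen)       # ['jeb', 'zeb', 'mike', 'janice']

# max() and min() compare the keys alphabetically; the values are ignored
largest_key = max(bowling_scores)
smallest_key = min(bowling_scores)
print(largest_key)     # zeb
print(smallest_key)    # janice

# To work with the values instead, ask for them explicitly
total_score = sum(bowling_scores.values())
print(total_score)     # 607
```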
The dict get() method returns a value based on a key -- perfect for sorting keys by values.
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'janice': 184}
sorted_keys = sorted(bowling_scores, key=bowling_scores.get)
print(sorted_keys) # ['zeb', 'jeb', 'janice', 'mike']
for player in sorted_keys:
    print(f"{player} scored {bowling_scores[player]}")
## zeb scored 98
## jeb scored 123
## janice scored 184
## mike scored 202
A sorting helper function returns to python the value by which a given element should be sorted.
Here is the same dict sorted by value in the same way as previously, through a custom sorting helper function.
def by_value(dict_key):                     # a key to be sorted
                                            # (for example, 'mike')
    dict_value = bowling_scores[dict_key]   # retrieving the value based on
                                            # the key ('mike': 202)
    return dict_value                       # returning the value 202
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'janice': 184}
sorted_keys = sorted(bowling_scores, key=by_value)
print(sorted_keys) # ['zeb', 'jeb', 'janice', 'mike']
The dict's keys are sorted by value because of the by_value() sorting helper function:
1. sorted() sees by_value referenced in the function call.
2. sorted() calls by_value() four times: once with each key in the dict.
3. by_value() is called with 'jeb' (which returns 123), 'zeb' (which returns 98), 'mike' (which returns 202), and 'janice' (which returns 184).
4. The return value of the function is the value by which the key will be sorted.
Therefore, because of the return value of the sorting helper function, jeb will be sorted by 123, zeb by 98, etc.
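One way to watch these steps happen is to have the helper record every key it receives. The sketch below is illustrative only: by_value_traced and the calls list are names invented for this example, not part of the original code.

```python
bowling_scores = {'jeb': 123, 'zeb': 98, 'mike': 202, 'janice': 184}

calls = []   # records each key that sorted() passes to the helper

def by_value_traced(dict_key):
    calls.append(dict_key)             # remember that we were called with this key
    return bowling_scores[dict_key]    # the value to sort this key by

sorted_keys = sorted(bowling_scores, key=by_value_traced)

print(calls)         # ['jeb', 'zeb', 'mike', 'janice']
print(sorted_keys)   # ['zeb', 'jeb', 'janice', 'mike']
```

The helper is called exactly once per key, and the keys come back ordered by the values the helper returned.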
Numeric strings (as we might receive from a file) sort alphabetically:
numbers_from_file = ['1', '10', '3', '20', '110', '1000' ]
sorted_numbers = sorted(numbers_from_file)
print(sorted_numbers) # ['1', '10', '1000', '110', '20', '3'] (alphabetic sort)
To sort numerically, the sorting helper function can convert to int or float.
def by_numeric_value(this_string):
    return int(this_string)
numbers_from_file = ['1', '10', '3', '20', '110', '1000' ]
sorted_numbers = sorted(numbers_from_file, key=by_numeric_value)
print(sorted_numbers) # ['1', '3', '10', '20', '110', '1000']
Note that the values returned do not change; they are simply sorted by their integer equivalent.
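The same idea should work for strings holding decimal numbers if the helper converts with float() instead of int(). This is a sketch with made-up sample data (decimals_from_file) and a hypothetical helper name (by_float_value); it also checks that the sorted items are still strings.

```python
def by_float_value(this_string):
    return float(this_string)      # sort by the numeric value of the string

decimals_from_file = ['2.5', '10', '0.75', '3']
sorted_decimals = sorted(decimals_from_file, key=by_float_value)

print(sorted_decimals)             # ['0.75', '2.5', '3', '10']
print(type(sorted_decimals[0]))    # <class 'str'> -- the items are unchanged
```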
Python string sorting sorts uppercase before lowercase:
namelist = ['Jo', 'pete', 'Michael', 'Zeb', 'avram']
print(sorted(namelist)) # ['Jo', 'Michael', 'Zeb', 'avram', 'pete']
To sort "insensitively", the sorting helper function can lowercase each string.
def by_lowercase(my_string):
    return my_string.lower()
namelist = ['Jo', 'pete', 'michael', 'Zeb', 'avram']
print(sorted(namelist, key=by_lowercase))
# ['avram', 'Jo', 'michael', 'pete', 'Zeb']
To sort a string by a portion of the string (for example, the last name in these 2-word names), we can split or slice the string and return the portion.
full_names = ['Jeff Wilson', 'Abe Zimmerman', 'Zoe Apple', 'Will Jefferson']
def by_last_name(fullname):
    fname, lname = fullname.split()
    return lname
sfn = sorted(full_names, key=by_last_name)
print(sfn) # ['Zoe Apple',
# 'Will Jefferson',
# 'Jeff Wilson',
# 'Abe Zimmerman']
To sort a string of fields (for example, a CSV line) by a field within the line, we can split() and return a field from the split.
def by_third_field(this_line):
    els = this_line.split(',')
    return els[2]
with open('students.txt') as lines:        # the file is closed when the block ends
    sorted_lines = sorted(lines, key=by_third_field)
print(sorted_lines)
# [ 'pk669,Pete,Krank,Darkling,NJ,8044894893\n',
# 'ms15,Mary,Smith,Wilsontown,NY,5185853892\n',
# 'jw234,Joe,Wilson,Smithtown,NJ,2015585894\n' ]
Built-in functions can be used to help sorted() decide how to sort in the same way as custom functions -- by telling Python to pass an element and sort by the return value.
len() returns string length - so it can be used to sort strings by length
mystrs = ['angie', 'zachary', 'zeb', 'annabelle']
print(sorted(mystrs, key=len)) # ['zeb', 'angie', 'zachary', 'annabelle']
Using a builtin function
os.path.getsize() returns the byte size of any file based on its name (in this example, in the present working directory):
import os
print(os.path.getsize('test.txt')) # returns 53, the byte size of test.txt
To sort files by their sizes, we can simply pass this function to sorted()
import os
files = ['test.txt', 'myfile.txt', 'data.csv', 'bigfile.xlsx']
# some files in my current dir
size_files = sorted(files, key=os.path.getsize)
# pass each file to getsize()
for this_file in size_files:
    print(f"{this_file}: {os.path.getsize(this_file)} bytes")
(Please note that this will only work if your terminal's present working directory is the directory that contains the files being sorted. Otherwise, you would have to prepend the path to each filename -- see File I/O, later in this course.)
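When the files live somewhere else, one possible approach is a helper that joins the directory onto each filename before asking for the size. The sketch below is self-contained and makes several assumptions for the sake of the demonstration: it creates three throwaway files in a temporary directory, and the names data_dir and by_size_in_dir are invented here.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as data_dir:
    # create three sample files of known sizes (illustrative data only)
    for filename, size in [('big.txt', 300), ('small.txt', 10), ('medium.txt', 50)]:
        with open(os.path.join(data_dir, filename), 'w') as fh:
            fh.write('x' * size)

    def by_size_in_dir(filename):
        # prepend the directory so getsize() can find the file
        return os.path.getsize(os.path.join(data_dir, filename))

    files = ['big.txt', 'small.txt', 'medium.txt']
    size_files = sorted(files, key=by_size_in_dir)

print(size_files)   # ['small.txt', 'medium.txt', 'big.txt']
```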
Using methods
namelist = ['Jo', 'pete', 'michael', 'Zeb', 'avram']
print(sorted(namelist, key=str.lower))
# ['avram', 'Jo', 'michael', 'pete', 'Zeb']
Using methods called on existing objects
companydict = {'IBM': 18.68, 'Apple': 50.56, 'Google': 21.3}
revc = sorted(companydict, key=companydict.get) # [ 'IBM',
# 'Google',
# 'Apple' ]
You can use a method here in the same way you would use a function, except that you won't be specifying the object as you normally would when calling a method. To refer to a method "in the abstract", you can say str.upper or str.lower. However, make sure not to actually call the method (which is done with the parentheses). Instead, you simply refer to the method, i.e., mention the method without using the parentheses.
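To make the difference between referring to a method and calling it concrete, here is a short sketch (sorted_names and call_error are names made up for this example). Passing str.lower hands sorted() the method itself; writing str.lower() tries to call it immediately with no string, which raises a TypeError before sorted() ever runs.

```python
namelist = ['Jo', 'pete', 'michael', 'Zeb', 'avram']

# Referring to the method "in the abstract": the string is passed as the argument
print(str.lower('Jo'))    # jo
print('Jo'.lower())       # jo (the usual way of calling the same method)

# Correct: pass the method itself, without parentheses
sorted_names = sorted(namelist, key=str.lower)
print(sorted_names)       # ['avram', 'Jo', 'michael', 'pete', 'Zeb']

# Incorrect: the parentheses call str.lower right away, with no string to work on
call_error = None
try:
    sorted(namelist, key=str.lower())
except TypeError as err:
    call_error = err
    print("TypeError:", err)
```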
Functions are useful but they require that we declare them separately, usually elsewhere in our code.
A lambda is a function that is defined in a single expression. Because it is an expression, a lambda can be placed in data structures or passed as an argument in a function call. The advantage here is that our lambda function will be used exactly where it is defined, so we don't have to maintain a separate function definition. A common use of lambda is in sorting. The format for lambdas is lambda arg: return_val. Compare the regular function and the lambda below, and note the argument and return val in each.
def by_lastname(name):
    fname, lname = name.split()
    return lname
names = [ 'Josh Peschko', 'Gabriel Feghali', 'Billy Woods', 'Arthur Fischer-Zernin' ]
sortednames = sorted(names, key=lambda name: name.split()[1])
In the above, the label after lambda is the argument, and the expression that follows the colon is the return value. So in the example the lambda argument is name, and the lambda returns name.split()[1]. See how it behaves exactly like the regular function itself? Again, what is the advantage of lambdas? They allow us to design our own functions which can be placed inline, where a named function would go. This is a convenience, not a necessity. But they are in common use, so they must be understood by any serious programmer.
Lambdas seem hard at first, but are quite simple.
Many people have reported that they found lambdas to be challenging to understand, but they're really very simple - they're just so short they're hard to read. Compare these two functions, both of which add/concatenate their arguments:
def addthese(x, y):
    return x + y
addthese2 = lambda x, y: x + y
print(addthese(5, 9)) # 14
print(addthese2(5, 9)) # 14
The function definition and the lambda expression are equivalent - they both produce a function with the same functionality.
Here are our standard methods to sort a dictionary:
import operator
mydict = { 'a': 5, 'b': 2, 'c': 1, 'z': 0 }
for key, val in sorted(mydict.items(), key=operator.itemgetter(1)):
    print(f"{key} = {val}")
for key in sorted(mydict, key=mydict.get):
    print(f"{key} = {mydict[key]}")
Imagine we didn't have access to dict.get and operator.itemgetter. What could we do?
mydict = { 'a': 5, 'b': 2, 'c': 1, 'z': 0 }
for key, val in sorted(mydict.items(), key=lambda keyval: keyval[1]):
    print(f"{key} = {val}")
for key in sorted(mydict, key=lambda key: mydict[key]):
    print(f"{key} = {mydict[key]}")
These lambdas do exactly what their built-in counterparts do: in the case of operator.itemgetter(1), take a 2-element tuple as an argument and return the 2nd element; in the case of dict.get, take a key and return the associated value from the dict.
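The equivalence can be checked directly by calling each function by hand, outside of any sort. In this sketch the names pair and get_second are made up for illustration.

```python
import operator

mydict = {'a': 5, 'b': 2, 'c': 1, 'z': 0}
pair = ('a', 5)                            # one (key, value) item from the dict

get_second = operator.itemgetter(1)        # a callable that returns element 1
print(get_second(pair))                    # 5
print((lambda keyval: keyval[1])(pair))    # 5

print(mydict.get('b'))                     # 2
print((lambda key: mydict[key])('b'))      # 2
```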
Having built multidimensional structures in various configurations, we should now learn how to sort them -- for example, to sort the keys in a dictionary of dictionaries by one of the values in the inner dictionary (in this instance, the last name):
(The "thing to sort" is a dict key. The "value by which it should be sorted" is a value within the dict associated with that key, in this case the 'lname' value.)
def by_last_name(key):
    return dod[key]['lname']
dod = {
    'db13': {
        'fname': 'Joe',
        'lname': 'Wilson',
        'tel': '9172399895'
    },
    'mm23': {
        'fname': 'Mary',
        'lname': 'Doodle',
        'tel': '2122382923'
    }
}
sorted_keys = sorted(dod, key=by_last_name)
print(sorted_keys) # ['mm23', 'db13']
The trick here will be to put together what we know about obtaining the value from an inner structure with what we have learned about custom sorting.
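Putting the two pieces together might look like the following sketch: first reach the inner value step by step, then use that same lookup as the sort key (written here as a lambda rather than a named helper; inner_dict is a name invented for this example).

```python
dod = {
    'db13': {'fname': 'Joe', 'lname': 'Wilson', 'tel': '9172399895'},
    'mm23': {'fname': 'Mary', 'lname': 'Doodle', 'tel': '2122382923'},
}

# Step 1: obtain the inner value for one key
inner_dict = dod['db13']          # the inner dict for this key
print(inner_dict['lname'])        # Wilson
print(dod['db13']['lname'])       # Wilson (same lookup in one expression)

# Step 2: use that lookup as the value to sort each key by
sorted_keys = sorted(dod, key=lambda key: dod[key]['lname'])
print(sorted_keys)                # ['mm23', 'db13']
```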
As with itemgetter, we may want to sort a complex structure by some inner value. If we have a list of dicts to sort, we can use a custom sorting helper function to specify the sort value from inside each dict.
(The "thing to sort" is a dict. The "value by which it should be sorted" is the 'lname' in the dict.)
def by_dict_lname(this_dict):
    return this_dict['lname'].lower()
list_of_dicts = [
    { 'id': 123,
      'fname': 'Joe',
      'lname': 'Wilson',
    },
    { 'id': 124,
      'fname': 'Sam',
      'lname': 'Jones',
    },
    { 'id': 125,
      'fname': 'Pete',
      'lname': 'abbott',
    },
]
list_of_dicts.sort(key=by_dict_lname) # custom sort function (above)
for this_dict in list_of_dicts:
    print(f"{this_dict['fname']} {this_dict['lname']}")
# Pete abbott
# Sam Jones
# Joe Wilson
So, although we are sorting dicts, our helper function says "take this dictionary and sort by this inner element of the dictionary".
Sort a list by multiple criteria by having your sorting helper function return a 2-element tuple.
def by_last_first(name):
    fname, lname = name.split()
    return (lname, fname)
names = ['Zeb Will', 'Deb Will', 'Joe Max', 'Ada Max']
lnamesorted = sorted(names, key=by_last_first)
# ['Ada Max', 'Joe Max', 'Deb Will', 'Zeb Will']
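This works because Python compares tuples element by element: the second element is consulted only when the first elements are equal. The sketch below shows that comparison on its own and then lists the tuple the helper returns for each name (sort_keys is a name made up for this example).

```python
# Tuples compare element by element, left to right
print(('Max', 'Ada') < ('Max', 'Joe'))    # True: last names tie, so 'Ada' < 'Joe' decides
print(('Max', 'Joe') < ('Will', 'Deb'))   # True: 'Max' < 'Will', first names not consulted

def by_last_first(name):
    fname, lname = name.split()
    return (lname, fname)

names = ['Zeb Will', 'Deb Will', 'Joe Max', 'Ada Max']

# The tuple each name will be sorted by
sort_keys = [by_last_first(name) for name in names]
print(sort_keys)
# [('Will', 'Zeb'), ('Will', 'Deb'), ('Max', 'Joe'), ('Max', 'Ada')]

lnamesorted = sorted(names, key=by_last_first)
print(lnamesorted)
# ['Ada Max', 'Joe Max', 'Deb Will', 'Zeb Will']
```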
A quick review of sorting: recall how Python will perform a default sort (numeric or ASCII-betical) depending on the objects sorted. If we wish to modify this behavior, we can pass each element to a function named by the key= parameter:
mylist = ['Alpha', 'Gamma', 'epsilon', 'beta', 'Delta']
print(sorted(mylist)) # ASCIIbetical sort
# ['Alpha', 'Delta', 'Gamma', 'beta', 'epsilon']
mylist.sort() # sort mylist in-place
print(sorted(mylist, key=str.lower)) # alphabetical sort
# (lowercasing each item by telling Python to pass it
# to str.lower)
# ['Alpha', 'beta', 'Delta', 'epsilon', 'Gamma']
print(sorted(mylist, key=len)) # sort by length
# ['beta', 'Alpha', 'Delta', 'Gamma', 'epsilon']
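The review above uses both sorted() and the list's own sort() method, and both accept key=. As a brief sketch of the difference (new_list and result are names made up here): sorted() returns a new list and leaves the original alone, while sort() rearranges the list in place and returns None.

```python
mylist = ['Alpha', 'Gamma', 'epsilon', 'beta', 'Delta']

# sorted() builds and returns a new list
new_list = sorted(mylist, key=str.lower)
print(new_list)   # ['Alpha', 'beta', 'Delta', 'epsilon', 'Gamma']
print(mylist)     # ['Alpha', 'Gamma', 'epsilon', 'beta', 'Delta'] (unchanged)

# list.sort() changes the list itself and returns None
result = mylist.sort(key=str.lower)
print(result)     # None
print(mylist)     # ['Alpha', 'beta', 'Delta', 'epsilon', 'Gamma']
```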
When we loop through a dict, we can loop through a list of keys (and use the keys to get values) or loop through items, a list of (key, value) tuple pairs. When sorting a dictionary by the values in it, we can also choose to sort keys or items.
To sort keys, mydict.get is called with each key - and get returns the associated value. So the keys of the dictionary are sorted by their values.
mydict = { 'a': 5, 'b': 2, 'c': 1, 'z': 0 }
mydict_sorted_keys = sorted(mydict, key=mydict.get)
for i in mydict_sorted_keys:
    print(f"{i} = {mydict[i]}")
## z = 0
## c = 1
## b = 2
## a = 5
Recall that we can render a dictionary as a list of tuples with the dict.items() method:
mydict = { 'a': 5, 'b': 2, 'c': 1, 'z': 0 }
mydict_items = list(mydict.items()) # [('a', 5), ('b', 2), ('c', 1), ('z', 0)]
To sort dictionary items by value, we need to sort each two-element tuple by its second element. The itemgetter function in the built-in operator module returns a function that retrieves whichever element of a sequence we wish - in this way it is like a subscript, but in function format (so it can be called by the Python sorting algorithm).
import operator
mydict = { 'a': 5, 'b': 2, 'c': 1, 'z': 0 }
mydict_items = list(mydict.items()) # [('a', 5), ('b', 2), ('c', 1), ('z', 0)]
mydict_items.sort(key=operator.itemgetter(1)) # list() is needed: items() alone has no sort()
print(mydict_items) # [('z', 0), ('c', 1), ('b', 2), ('a', 5)]
for key, val in mydict_items:
    print(f"{key} = {val}")
## z = 0
## c = 1
## b = 2
## a = 5
The above can be conveniently combined with looping, effectively allowing us to loop through a "sorted" dict:
for key, val in sorted(mydict.items(), key=operator.itemgetter(1)):
    print(f"{key} = {val}")
Database results often come as a list of tuples. Perhaps we want our results sorted in different ways, so we can store them as a list of tuples and sort using operator.itemgetter. This example sorts by the third field, then by the second field (last name, then first name):
import operator
items =[ (123, 'Joe', 'Wilson', 35, 'mechanic'),
(124, 'Sam', 'Jones', 22, 'mechanic'),
(125, 'Pete', 'Jones', 40, 'mechanic'),
(126, 'Irina', 'Bibi', 31, 'mechanic'),
]
items.sort(key=operator.itemgetter(2,1)) # sorts by last, first name
for this_pair in items:
    print(f"{this_pair[1]} {this_pair[2]}")
## Irina Bibi
## Pete Jones
## Sam Jones
## Joe Wilson
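When itemgetter is given more than one index, the function it builds returns a tuple of those fields, which is what makes the two-level sort work. The sketch below calls it by hand on a single row and then re-sorts the same rows a different way, by age (index 3). The names row, last_first and by_age are invented for this example.

```python
import operator

items = [ (123, 'Joe', 'Wilson', 35, 'mechanic'),
          (124, 'Sam', 'Jones', 22, 'mechanic'),
          (125, 'Pete', 'Jones', 40, 'mechanic'),
          (126, 'Irina', 'Bibi', 31, 'mechanic'),
        ]

# itemgetter(2, 1) returns a (last name, first name) tuple for a row
row = items[2]
last_first = operator.itemgetter(2, 1)
print(last_first(row))                 # ('Jones', 'Pete')

# The same rows sorted a different way: by the fourth field (age)
by_age = sorted(items, key=operator.itemgetter(3))
print([row[1] for row in by_age])      # ['Sam', 'Irina', 'Joe', 'Pete']
```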
As with all custom sorting, simply consider what you are sorting and what value you would like to sort by.
list_of_dicts = [
    { 'id': 123,
      'fname': 'Joe',
      'lname': 'Wilson',
    },
    { 'id': 124,
      'fname': 'Sam',
      'lname': 'Jones',
    },
    { 'id': 125,
      'fname': 'Pete',
      'lname': 'abbott',
    },
]
def by_dict_lname(this_dict):
    return this_dict['lname'].lower()
sortedlenstrs = sorted(list_of_dicts, key=lambda this_dict: this_dict['lname'].lower())
Remember that the label after lambda is the argument, and the expression that follows the colon is the return value. This is a list of dicts, so sorting the list means that each item to be sorted is a dict. This means that the argument to the lambda will be one dict, and the value after the colon should be the value in the dict by which we would like to sort.