Network programming requires a shift in thinking about how our programs work.
The Client-Server Networking Protocol
When we run a program locally, we expect the program to do some work, print output to the screen, and then (eventually) stop executing. When we run a program that sends a message over a network (a client program), it relies on another program to respond (a server program). This reliance on another program means that many problems may arise in networking, including blocking conditions (both programs are waiting for the other to say something, or both programs are talking at the same time), or situations in which one program is unable to understand the other. The client-server protocol is a simple understanding between two programs: the client sends a request and waits for a reply; the server listens for requests and sends back a response.
HTTP Protocol
In addition to understanding the client-server protocol of sending and listening, both programs must "speak the same language", meaning they need a more detailed protocol in order to understand each other. This protocol defines what the client may say when making a request and what the server may say in response. HTTP (HyperText Transfer Protocol) is a client-server protocol that defines how clients and servers communicate over the internet. It is in use anytime you use a web browser, and it is the reason you see an http:// in front of most URLs. In this session we'll learn how to use a Python program as a client (i.e., to replace the browser), and how to construct and send HTTP requests and read HTTP responses.
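As a rough sketch of this request/response exchange, the example below uses Python's built-in http.client module (rather than the requests module covered later); the host and headers here are only examples:

import http.client

# open a connection to the server and send a minimal GET request "by hand"
conn = http.client.HTTPSConnection('www.python.org')
conn.request('GET', '/', headers={'Accept': 'text/html'})

# read the server's response: status code, headers and body
resp = conn.getresponse()
print(resp.status, resp.reason)             # e.g., 200 OK
print(resp.getheader('Content-Type'))       # e.g., text/html; charset=utf-8
body = resp.read()                          # raw bytes of the response body

conn.close()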
A request may consist of headers, a URL, parameters, and a body.
Parts of an HTTP Request
URL | the address of the server and the resource requested (i.e., a file or program) |
method | (not a Python method) the type of request, usually GET (requesting information from the server) or POST (posting data to the server) |
parameters | key/value pairs that appear with the URL in a query string |
headers | meta information about the request (date and time, the computer and program making the request, what types of images and files the browser can display, etc.), also as key/value pairs; may include a cookie the client sends to identify itself |
body | data being sent to the server, also key/value pairs |
Parts of an HTTP Response
response code | a 3-digit code that indicates whether the request succeeded (200), the resource was not found (404), caused an error (500), etc. |
headers | meta information about the response and computers involved; may include a cookie to identify this user, to be stored by the client |
body | data being returned from the server (HTML, JSON, plain text, etc.)
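To see how these parts fit together, here is a sketch of the raw text that might be exchanged for a simple GET request (all values are illustrative):

GET /cgi-bin/http_reflect?a=1&b=2 HTTP/1.1      <-- method, URL path and query-string parameters
Host: davidbpython.com                          <-- request headers (key/value pairs)
Accept: text/html

HTTP/1.1 200 OK                                 <-- response code
Content-Type: text/html; charset=utf-8          <-- response headers
Content-Length: 1234

<html> ... </html>                              <-- response body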
The server-side program 'http_reflect' will show you the contents of your request.
The link and the two forms below each send a request to my server-side program http_reflect, which simply reflects back the HTTP data (headers, parameters and body) sent to it.
Here is a GET request using a link: http://davidbpython.com/cgi-bin/http_reflect?a=1&b=2
Here is a form that generates a POST request:
HTML source for the form:
<FORM ACTION="http://davidbpython.com/cgi-bin/http_reflect" METHOD="POST">
  "a" parameter: <INPUT name="a"><br>
  "b" parameter: <INPUT name="b"><br>
  <INPUT type="submit" value="send!">
</FORM>
Here is a file upload:
HTML source for the form:
<form enctype="multipart/form-data" action="http://davidbpython.com/cgi-bin/http_reflect" method="post">
  <p style="font-size: 24px">File: <input type="file" name="filename" /></p>
  <p style="font-size: 24px"><input type="submit" value="Upload" /></p>
</form>
The requests module can handle most aspects of HTTP interaction with a server.
Basic Example: Download and Save Data
import requests
url = 'https://www.python.org/dev/peps/pep-0020/' # the Zen of Python (PEP 20)
response = requests.get(url) # a response object
text = response.text # text of response
# writing the response to a local file -
# you can open this file in a browser to see it
wfh = open('pep_20.html', 'w')
wfh.write(text)
wfh.close()
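The same write can also be done with a with block, which closes the file automatically:

# equivalent write using a context manager (the file is closed automatically)
with open('pep_20.html', 'w') as wfh:
    wfh.write(text)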
More Complex Example: Send Headers, Parameters, Body; Receive Status, Headers, Body
import requests
url = 'http://davidbpython.com/cgi-bin/http_reflect' # my reflection program
div_bar = '=' * 10
# headers, parameters and message data to be passed to request
header_dict = { 'Accept': 'text/plain' } # change to 'text/html' for an HTML response
param_dict = { 'key1': 'val1', 'key2': 'val2' }
data_dict = { 'text1': "We're all out of gouda." }
# a GET request (change to .post for a POST request)
response = requests.get(url, headers=header_dict,
                        params=param_dict,
                        data=data_dict)
response_status = response.status_code # integer status of the response (OK, Not Found, etc.)
response_headers = response.headers # headers sent by the server
response_text = response.text # body sent by server
# outputting response elements (status, headers, body)
# response status
print(f'{div_bar} response status {div_bar}\n')
print(response_status)
print(); print()
# response headers
print(f'{div_bar} response headers {div_bar}\n')
for key in response_headers:
    print(f'{key}: {response_headers[key]}\n')
print()
# response body
print(f'{div_bar} response body {div_bar}\n')
print(response_text)
Note that if import requests raises a ModuleNotFoundError exception, requests must be installed. It is not included with the Standard Distribution from python.org.
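It can usually be installed from the command line, for example:

pip install requests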
Specific techniques for reading the most common data formats.
CSV: feed the string response to .splitlines(), then to csv.reader:
import requests
import csv
url = 'path to csv file'
response = requests.get(url)
text = response.text
lines = text.splitlines()
reader = csv.reader(lines)
for row in reader:
    print(row)
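If the CSV file has a header row, csv.DictReader can be used the same way (the column names below are hypothetical):

import requests
import csv

url = 'path to csv file'

response = requests.get(url)
lines = response.text.splitlines()

# DictReader uses the header row as the keys of each row dict
reader = csv.DictReader(lines)
for row in reader:
    print(row['name'], row['price'])       # hypothetical column names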
JSON: requests provides built-in support through the .json() method:
import requests
url = 'path to json file'
response = requests.get(url)
obj = response.json()
print(type(obj)) # <class 'dict'>
The status code '200' means OK, but other codes may mean an error.
Every HTTP response is expected to return a 3-digit status code. These include codes such as 204 (No Content: there is no data in the response) and 401 (Unauthorized: you do not have the privileges to see this page). A full list of status codes is available online.
import requests
url = 'https://www.python.org/dev/peps/pep-0020/' # the Zen of Python (PEP 20)
response = requests.get(url) # a response object
code = response.status_code # 200
print(requests.status_codes._codes[code]) # ('ok', 'okay', 'all_ok', 'all_okay', 'all_good', '\\o/', '✓')
print(requests.status_codes._codes[500]) # ('internal_server_error', 'server_error', '/o\\', '✗')
In many cases we just want to know whether the request succeeded. As there are many response codes, some of which mean success and some failure, requests can be made to raise an exception if a 'failure' code was received:
import requests
response = requests.get('http://www.yahoo.com/some/wrong/url')
response.raise_for_status()
# raise HTTPError(http_error_msg, response=self)
# requests.exceptions.HTTPError: 404 Client Error:
# Not Found for url: http://yahoo.com/some/wrong/url
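A common pattern (a sketch, not part of the example above) is to wrap the call in a try/except so the program can respond to a failed request:

import requests

try:
    response = requests.get('http://www.yahoo.com/some/wrong/url')
    response.raise_for_status()                    # raises HTTPError on a 'failure' status
except requests.exceptions.HTTPError as e:
    print(f'request failed: {e}')
else:
    print(f'request succeeded: {response.status_code}')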
Data is returned from a server as bytes; requests can decode most plaintext correctly.
Note: if this discussion of encoding is not immediately clear, see the "Understanding Unicode and Character Encodings" slide deck. All plaintext (i.e., the characters we see in files such as .txt, .csv, .json, .html, .xml, etc.) is encoded as integers (called 'bytes' in this context). Bytes are decoded to characters using an encoding. There are many possible encodings on the internet. Many HTML documents use the 'charset' value in the 'Content-Type' header to specify the encoding. If this value is not present, requests uses the chardet library to "sniff" the correct encoding.
requests attempts to handle encoding seamlessly through its .text attribute:
import requests
url = 'http://davidbpython.com/cgi-bin/http_reflect'
r = requests.get(url)
print(r.encoding) # 'utf-8' (specified in the response's Content-Type header)
print(r.text) # requests uses this encoding to decode the text
print(r.apparent_encoding) # 'ascii' (this is what it looks like to requests)
r.encoding = 'utf-16' # force requests to use a different encoding
print(r.text) # oops, wrong encoding:
# '⨪⨪\u202a呈偔删䙅䕌呃⨠⨪⨪琊楨\u2073牰杯慲\u206d敲汦捥獴琠敨攠敬敭瑮\u2073景琠敨䠠呔⁐敲畱獥⁴琊慨⁴慷...
Keep in mind that requests almost always handles encodings correctly; cases in which we have to set the encoding ourselves are rare.
To download raw bytes (for example, images or sound files), we use the response.content attribute and write the data to a file opened in binary mode:
import requests
url = 'https://davidbpython.com/advanced_python/supplementary/python.png' # a URL to an image
response = requests.get(url) # a response object
image_bytes = response.content # response as bytes
print(f'{len(image_bytes)} bytes') # 90835 bytes
wfh = open('python.png', 'wb') # preparing a file to receive bytes
wfh.write(image_bytes)
wfh.close()
We can pass raw bytes to requests to upload a file.
Keep in mind that you cannot upload a file to an arbitrary directory on a server - file upload is designed to send files to applications that are ready to receive them.
import requests

url = 'https://davidbpython.com/cgi-bin/http_reflect'

# open the file in binary mode ('rb'), so reads return bytes rather than str
file_obj = open('../test_file.txt', 'rb')

response = requests.post(url, files={'file': file_obj})

print(response.text)
print(response.status_code)                  # 200 (if all is well)

# to specify a filename and mime type explicitly, pass a tuple instead:
# files={ 'file': ('test_file.txt', file_obj, 'text/plain') }
text/plain is a MIME type and denotes that we are uploading a simple text file. Other types include text/csv, text/html, text/xml, image/jpeg, application/json, application/zip, application/vnd.ms-excel and application/octet-stream (the default for non-text files). See Common Mime Types.
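As a sketch of how a mime type might be declared explicitly (the filename here is hypothetical), the tuple form of the files argument takes the type as its third element:

import requests

url = 'https://davidbpython.com/cgi-bin/http_reflect'

# upload a CSV file, declaring its mime type explicitly (hypothetical filename)
with open('scores.csv', 'rb') as file_obj:
    response = requests.post(url, files={'file': ('scores.csv', file_obj, 'text/csv')})

print(response.status_code)                  # 200 (if all is well)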
For those who cannot install requests, urllib is available.
Although the requests module is strongly favored by some for its simplicity, it has not yet been added to the Python standard library.
The urlopen method takes a url and returns a file-like object that can be read() as a file:
import urllib.request
my_url = 'http://www.yahoo.com'
readobj = urllib.request.urlopen(my_url)
text = readobj.read()
print(text)
readobj.close()
Alternatively, you can call readlines() on the object (keep in mind that many objects that deliver file-like string output can be read with this same-named method):
for line in readobj.readlines():
    print(line)
readobj.close()
The text that is downloaded may be CSV, HTML, Javascript, or other kinds of data.
TypeError: can't use a string pattern on a bytes-like object
This error may occur with some websites. It indicates that the response was received as undecoded bytes.
The response usually comes to us as a special object called a byte string. In order to work with the response as a string, we may need to use the decode() method:
text = text.decode('utf-8')
UnicodeEncodeError
This error may occur if the downloaded page contains characters that Python doesn't know how to handle. In most cases it is fixed by applying the text.decode line above to the text immediately after it is retrieved from urlopen().
SSL Certificate Error
Many websites enable SSL security and require a web request to accept and validate an SSL certificate (certifying the identity of the server). urllib by default requires SSL certificate security, but it can be bypassed (keep in mind that this may be a security risk):
import ssl
import urllib.request
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
my_url = 'https://www.nytimes.com'
readobj = urllib.request.urlopen(my_url, context=ctx)
Download binary files: images and other files can be saved locally using urllib.request.urlretrieve().
import urllib.request
urllib.request.urlretrieve('http://www.azquotes.com/picture-quotes/quote-python-is-an-experiment-in-how-much-freedom-programmers-need-too-much-freedom-and-nobody-guido-van-rossum-133-51-31.jpg', 'guido.jpg')
Note the two arguments to urlretrieve(): the first is a URL to an image, and the second is a filename -- this file will be saved locally under that name.
When including parameters in our requests, we must encode them into our request URL. The urlencode() method does this nicely:
import urllib.request, urllib.parse
params = urllib.parse.urlencode({'choice1': 'spam and eggs', 'choice2': 'spam, spam, bacon and spam'})
print("encoded query string: ", params)
f = urllib.request.urlopen("http://davidbpython.com/cgi-bin/http_reflect?{}".format(params))
print(f.read())
this prints:
encoded query string:  choice1=spam+and+eggs&choice2=spam%2C+spam%2C+bacon+and+spam
choice1: spam and eggs<BR>
choice2: spam, spam, bacon and spam<BR>