Python requests library

The Python requests library is one of the most popular HTTP libraries for Python. It wraps the details of making HTTP requests in a simple, intuitive API, so you can interact with web services, APIs, and websites with little boilerplate. This guide covers installation, core features, advanced usage, and best practices.

Installation

Before using the requests library, you need to install it. It can be installed via pip:

pip install requests

To verify the installation and check the version:

import requests
print(requests.__version__)

Installing the latest stable release is generally recommended.

Basic Usage

The requests library simplifies HTTP requests by providing a straightforward API.

Making GET Requests

A GET request is used to retrieve data from a specified resource.

import requests

response = requests.get('https://api.example.com/data')
print(response.status_code) # HTTP status code
print(response.text) # Response body as text

Making POST Requests

A POST request is used to send data to a server to create/update a resource.

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
response = requests.post('https://api.example.com/data', data=payload)
print(response.status_code)
print(response.json()) # If response is JSON

Request Parameters

When making HTTP requests, you often need to pass additional data like headers, query parameters, form data, etc. The requests library provides several parameters to facilitate this.

URL Parameters (params)

Used to send query parameters in the URL.

import requests

params = {'search': 'python', 'page': 2}
response = requests.get('https://api.example.com/search', params=params)
print(response.url) # https://api.example.com/search?search=python&page=2
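To see how params is encoded without sending anything, the request can be prepared locally. This is a minimal sketch against the same placeholder api.example.com endpoint; the tag parameter is an assumption added only to show how list values are handled.

```python
import requests

# Build the request locally; nothing is sent over the network.
request = requests.Request(
    'GET',
    'https://api.example.com/search',
    params={'search': 'python requests', 'tag': ['http', 'api'], 'page': 2},
)
prepared = request.prepare()

# Values are URL-encoded, and a list value repeats its key once per item.
print(prepared.url)
```

Preparing a request this way is also handy in unit tests, where you want to check the final URL without contacting a server.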

Headers (headers)

Used to send custom HTTP headers with the request.

import requests

headers = {'Authorization': 'Bearer YOUR_TOKEN'}
response = requests.get('https://api.example.com/protected', headers=headers)

Form Data (data)

Used to send form-encoded data, typically for POST requests.

import requests

data = {'username': 'user', 'password': 'pass'}
response = requests.post('https://api.example.com/login', data=data)

JSON Payloads (json)

Used to send JSON-encoded data, automatically setting the Content-Type header to application/json.

import requests

json_data = {'key1': 'value1', 'key2': 'value2'}
response = requests.post('https://api.example.com/json', json=json_data)
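The difference between data and json is easiest to see by preparing both requests locally and comparing them. The sketch below sends nothing; the URL is the same placeholder used above.

```python
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
url = 'https://api.example.com/json'

# Prepare both variants locally; nothing is sent over the network.
form_request = requests.Request('POST', url, data=payload).prepare()
json_request = requests.Request('POST', url, json=payload).prepare()

print(form_request.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form_request.body)                     # key1=value1&key2=value2
print(json_request.headers['Content-Type'])  # application/json
print(json_request.body)                     # the JSON document, encoded as bytes
```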

Files (files)

Used to upload files.

import requests

# Open the file in a with block so it is closed after the upload
with open('report.pdf', 'rb') as f:
    files = {'file': f}
    response = requests.post('https://api.example.com/upload', files=files)

For multiple files or additional form data, combine files with data.

import requests

with open('report.pdf', 'rb') as report, open('image.png', 'rb') as image:
    files = {'file1': report, 'file2': image}
    data = {'description': 'Report and image files'}
    response = requests.post('https://api.example.com/upload', files=files, data=data)
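Each entry in files can also be a tuple of (filename, content, content type), which gives control over the name and MIME type the server sees. The following sketch prepares such an upload locally; the file name, its contents, and the description field are made up for illustration, and bytes stand in for an open file.

```python
import requests

# Each file is (filename, content, content_type); bytes stand in for an open file.
files = {'file': ('report.txt', b'quarterly numbers', 'text/plain')}
data = {'description': 'Quarterly report'}

prepared = requests.Request(
    'POST', 'https://api.example.com/upload', files=files, data=data
).prepare()

# requests builds a multipart/form-data body and chooses the boundary itself.
print(prepared.headers['Content-Type'])
```

Because requests generates the boundary, you should not set the Content-Type header yourself when using files.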

Response Handling

After making a request, the requests library returns a Response object containing the server's response.

Status Codes

HTTP status codes indicate the result of the request.

import requests

response = requests.get('https://api.example.com/data')
print(response.status_code) # e.g., 200
print(response.ok) # True if status_code < 400

Common status codes:

200 OK: The request was successful.
201 Created: The resource was created successfully.
400 Bad Request: The server could not understand the request.
401 Unauthorized: Authentication is required.
403 Forbidden: The server understood the request but refuses to authorize it.
404 Not Found: The requested resource could not be found.
500 Internal Server Error: The server encountered an error.
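The codes listed above are also available by name through requests.codes, which avoids magic numbers in comparisons. The describe function below is a hypothetical helper, not part of requests, that groups codes into the usual ranges.

```python
import requests

# requests.codes maps readable names to numeric status codes.
print(requests.codes.ok)         # 200
print(requests.codes.not_found)  # 404

def describe(status_code):
    """Return a coarse category for an HTTP status code."""
    if 200 <= status_code < 300:
        return 'success'
    if 300 <= status_code < 400:
        return 'redirect'
    if 400 <= status_code < 500:
        return 'client error'
    if 500 <= status_code < 600:
        return 'server error'
    return 'other'

print(describe(requests.codes.created))       # success
print(describe(requests.codes.unauthorized))  # client error
```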

Response Headers

Access response headers using the .headers attribute, which behaves like a case-insensitive dictionary.

import requests

response = requests.get('https://api.example.com/data')
print(response.headers['Content-Type'])
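The case-insensitive behaviour comes from requests.structures.CaseInsensitiveDict, the type used for response.headers. The sketch below builds one by hand, with a made-up header value, so the lookups can be tried without a network call.

```python
from requests.structures import CaseInsensitiveDict

# response.headers is a CaseInsensitiveDict; this one is built by hand for illustration.
headers = CaseInsensitiveDict({'Content-Type': 'application/json; charset=utf-8'})

print(headers['content-type'])           # same value regardless of key case
print(headers.get('CONTENT-TYPE'))
print('content-type' in headers)         # True
print(headers.get('X-Request-Id', '-'))  # missing headers fall back to the default
```

Using .get() with a default is usually safer than indexing, since servers are free to omit optional headers.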

Response Body

The response body can be accessed in various formats.

Text

The .text attribute returns the response body as a string, using the response's encoding.

import requests

response = requests.get('https://api.example.com/data')
print(response.text)

JSON

If the response contains JSON, use the .json() method to parse it into Python data structures.

import requests

response = requests.get('https://api.example.com/data')
data = response.json()
print(data)

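Note: If the response is not valid JSON, .json() raises an exception. In requests 2.27 and later this is requests.exceptions.JSONDecodeError, which is a subclass of ValueError, so catching ValueError works across versions.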
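One way to guard against non-JSON bodies is a small wrapper that catches ValueError. Both functions below are hypothetical helpers: json_or_default is the wrapper, and fake_response builds a Response by hand purely so the sketch runs without a server.

```python
import io
import requests

def json_or_default(response, default=None):
    """Return the parsed JSON body, or `default` if the body is not valid JSON."""
    try:
        return response.json()
    except ValueError:
        return default

def fake_response(body):
    """Build a Response by hand so the example runs without a network call."""
    response = requests.Response()
    response.status_code = 200
    response.raw = io.BytesIO(body)
    return response

print(json_or_default(fake_response(b'{"id": 7}')))              # {'id': 7}
print(json_or_default(fake_response(b'<html>oops</html>'), {}))  # {}
```

In real code the response would come from requests.get(); checking the Content-Type header before parsing is a reasonable additional safeguard.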

Content

The .content attribute returns the response body in bytes, useful for binary data like images.

import requests

response = requests.get('https://api.example.com/image.png')
with open('image.png', 'wb') as f:
    f.write(response.content)

Advanced Features

Sessions

Sessions allow you to persist certain parameters across multiple requests, such as cookies, headers, and authentication.

import requests

session = requests.Session()
session.headers.update({'Authorization': 'Bearer YOUR_TOKEN'})

# All requests using the session will include the Authorization header
response = session.get('https://api.example.com/protected')

Benefits of Sessions:

Connection Persistence: Reuses TCP connections, improving performance.
Session-wide Settings: Set default headers, authentication, etc.
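To see how session-wide settings combine with per-request values, a request can be prepared through the session without sending it. This is a sketch using the placeholder token and URL from above; the Accept header is an assumption added to show a per-request override.

```python
import requests

session = requests.Session()
session.headers.update({'Authorization': 'Bearer YOUR_TOKEN'})

# prepare_request merges session defaults with per-request values; nothing is sent.
request = requests.Request(
    'GET',
    'https://api.example.com/protected',
    headers={'Accept': 'application/json'},
)
prepared = session.prepare_request(request)

print(prepared.headers['Authorization'])  # Bearer YOUR_TOKEN (from the session)
print(prepared.headers['Accept'])         # application/json (from the request)
```

Per-request headers take precedence over session headers with the same name.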

Authentication

requests supports various authentication mechanisms.

Basic Authentication

import requests
from requests.auth import HTTPBasicAuth

response = requests.get(
    'https://api.example.com/user',
    auth=HTTPBasicAuth('username', 'password')
)

Alternatively, use a tuple:

response = requests.get(
    'https://api.example.com/user',
    auth=('username', 'password')
)

OAuth and Other Schemes

For more complex authentication schemes like OAuth, you may need to use external libraries or handle token acquisition manually.
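For token-based schemes where you already hold the token, one option is a custom auth class built on requests.auth.AuthBase. BearerAuth below is a hypothetical helper, not something shipped with requests, and it does not acquire or refresh tokens.

```python
import requests
from requests.auth import AuthBase

class BearerAuth(AuthBase):
    """Attach a bearer token to every request (hypothetical helper)."""

    def __init__(self, token):
        self.token = token

    def __call__(self, request):
        # Called by requests while preparing the request.
        request.headers['Authorization'] = f'Bearer {self.token}'
        return request

# Prepared locally to inspect the result; nothing is sent over the network.
prepared = requests.Request(
    'GET', 'https://api.example.com/user', auth=BearerAuth('YOUR_TOKEN')
).prepare()
print(prepared.headers['Authorization'])  # Bearer YOUR_TOKEN
```

For real calls, pass auth=BearerAuth(token) to requests.get() or assign it to session.auth so every request on the session uses it.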

SSL Verification

By default, requests verifies SSL certificates. You can disable it (not recommended) or provide a custom certificate.

# Disable SSL verification
response = requests.get('https://self-signed.example.com', verify=False)

# Verify against a custom CA bundle or certificate file
response = requests.get('https://secure.example.com', verify='/path/to/cert.pem')

Note: Disabling SSL verification can expose you to security risks like man-in-the-middle attacks.

Timeouts

Set timeouts to prevent your application from hanging indefinitely.

import requests

# Allow 5 seconds to connect and 5 seconds between bytes received
# (this is not a cap on the total request time)
response = requests.get('https://api.example.com/data', timeout=5)

# Set connect and read timeouts separately
response = requests.get('https://api.example.com/data', timeout=(3, 7))

Exceptions:

requests.exceptions.Timeout: Raised when a timeout occurs.
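requests.exceptions.Timeout has two subclasses, ConnectTimeout and ReadTimeout, which tell you which phase timed out. The sketch below provokes a read timeout using a local socket that accepts connections but never replies; that socket is only a stand-in for an unresponsive server, and the timeout values are arbitrary.

```python
import socket
import requests

# A local socket that accepts connections but never answers, standing in
# for an unresponsive server (illustration only).
server = socket.socket()
server.bind(('127.0.0.1', 0))
server.listen(1)
port = server.getsockname()[1]

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local test

try:
    # Up to 2 seconds to connect, then up to 0.5 seconds waiting for data.
    session.get(f'http://127.0.0.1:{port}/', timeout=(2, 0.5))
    outcome = 'completed'
except requests.exceptions.ConnectTimeout:
    outcome = 'connect timeout'
except requests.exceptions.ReadTimeout:
    outcome = 'read timeout'
finally:
    session.close()
    server.close()

print(outcome)
```

Catching requests.exceptions.Timeout alone covers both cases when the distinction does not matter.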

Proxies

Route your requests through proxies.

import requests

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

response = requests.get('https://api.example.com/data', proxies=proxies)

For proxies requiring authentication:

proxies = {
    'http': 'http://user:pass@10.10.1.10:3128/',
    'https': 'http://user:pass@10.10.1.10:1080/',
}

response = requests.get('https://api.example.com/data', proxies=proxies)

Streaming Responses

Stream large responses to avoid loading them entirely into memory.

import requests

response = requests.get('https://api.example.com/largefile', stream=True)

with open('largefile', 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)

Redirection Control

By default, requests follows redirects. You can control this behavior.

import requests

# Do not follow redirects
response = requests.get('https://api.example.com/redirect', allow_redirects=False)

# Limit the number of redirects (set on a Session; the default limit is 30)
session = requests.Session()
session.max_redirects = 3
response = session.get('https://api.example.com/redirect')

Hooks

Hooks allow you to execute custom code at certain points during the request lifecycle.

import requests

def print_url(response, *args, **kwargs):
    print(response.url)

hooks = {'response': print_url}

response = requests.get('https://api.example.com/data', hooks=hooks)

Exception Handling

Proper exception handling ensures your application can gracefully handle errors.

import requests
from requests.exceptions import HTTPError, Timeout, RequestException

try:
    response = requests.get('https://api.example.com/data', timeout=5)
    response.raise_for_status() # Raise HTTPError for bad responses (4xx or 5xx)
except HTTPError as http_err:
    print(f'HTTP error occurred: {http_err}')
except Timeout as timeout_err:
    print(f'Timeout error occurred: {timeout_err}')
except RequestException as req_err:
    print(f'An error occurred: {req_err}')
else:
    print('Success!')

Common Exceptions:

requests.exceptions.HTTPError: Raised by raise_for_status() when the response has a 4xx or 5xx status code.
requests.exceptions.ConnectionError: Network problem (e.g., DNS failure, refused connection).
requests.exceptions.Timeout: Request timed out.
requests.exceptions.TooManyRedirects: Exceeded the configured number of maximum redirects.
requests.exceptions.RequestException: Base class for all exceptions.
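These exceptions form a hierarchy, which determines the order in which except clauses should be written. The sketch below checks a few of the relationships and shows that the HTTPError raised by raise_for_status() carries the response; the Response is built by hand for illustration and its URL is a placeholder.

```python
import requests
from requests import exceptions

# Catching a parent class also catches its subclasses.
print(issubclass(exceptions.HTTPError, exceptions.RequestException))       # True
print(issubclass(exceptions.ReadTimeout, exceptions.Timeout))              # True
# ConnectTimeout is both a ConnectionError and a Timeout.
print(issubclass(exceptions.ConnectTimeout, exceptions.ConnectionError))   # True

# raise_for_status() attaches the response to the HTTPError it raises.
response = requests.Response()  # built by hand for illustration
response.status_code = 404
response.url = 'https://api.example.com/missing'
try:
    response.raise_for_status()
except exceptions.HTTPError as err:
    print(err.response.status_code)  # 404
```

Because of this hierarchy, more specific exceptions such as Timeout should be caught before RequestException, or the specific handler never runs.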

Best Practices

Use Sessions: Reuse sessions to persist parameters and improve performance.

session = requests.Session()
session.headers.update({'User-Agent': 'my-app/0.0.1'})
response = session.get('https://api.example.com/data')

Handle Exceptions: Always anticipate and handle potential errors.

Set Timeouts: Prevent your application from hanging due to unresponsive servers.

Verify SSL Certificates: Ensure secure communication by verifying SSL certificates.

Limit Redirections: Control the number of redirects to prevent infinite loops.

Use Environment Variables for Sensitive Data: Store tokens and credentials securely.

import os
token = os.getenv('API_TOKEN')
headers = {'Authorization': f'Bearer {token}'}

Respect Rate Limits: When interacting with APIs, adhere to their rate limiting policies to avoid being blocked.
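How an API signals rate limiting varies, so the following is only a sketch under one common assumption: the server answers 429 and sends a Retry-After header in seconds. retry_delay is a hypothetical helper; an HTTP-date in Retry-After is not parsed and falls back to the default.

```python
import requests

def retry_delay(response, default=1.0):
    """Return how many seconds to wait before retrying a rate-limited request.

    Assumes rate limiting is signalled with 429 and a Retry-After header
    given in seconds; anything unparseable falls back to `default`.
    """
    if response.status_code != 429:
        return 0.0
    value = response.headers.get('Retry-After', '')
    try:
        return max(float(value), 0.0)
    except ValueError:
        return default

# Responses built by hand for illustration; real ones come from requests.get().
limited = requests.Response()
limited.status_code = 429
limited.headers['Retry-After'] = '3'
print(retry_delay(limited))  # 3.0

ok = requests.Response()
ok.status_code = 200
print(retry_delay(ok))  # 0.0
```

A caller would typically pass the result to time.sleep() before repeating the request.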

Use Streaming for Large Responses: Avoid loading large files into memory.

Clean Up Resources: Close files and other resources appropriately.

with open('file.txt', 'rb') as f:
    response = requests.post('https://api.example.com/upload', files={'file': f})

Logging: Implement logging to monitor requests and responses, especially for debugging.

import logging
import requests

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

response = requests.get('https://api.example.com/data')
logger.info('Status Code: %s', response.status_code)

Extending Requests

The requests library is highly extensible. You can mount custom transport adapters, attach hooks, subclass Session, or even monkey-patch behaviors if needed.

Custom Adapters

Adapters allow you to define custom connection logic.

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(
    total=5,
    backoff_factor=1,
    status_forcelist=[500, 502, 503, 504],
    # allowed_methods replaced method_whitelist in urllib3 1.26
    allowed_methods=["HEAD", "GET", "OPTIONS"]
)
adapter = HTTPAdapter(max_retries=retry)
session.mount('https://', adapter)
session.mount('http://', adapter)

response = session.get('https://api.example.com/data')
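A retry policy can be checked before any request is made. The sketch below builds a policy like the one above and asks urllib3 which method and status combinations it would retry; it assumes urllib3 1.26 or newer, where the parameter is named allowed_methods.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry = Retry(
    total=5,
    backoff_factor=1,
    status_forcelist=[500, 502, 503, 504],
    allowed_methods=["HEAD", "GET", "OPTIONS"],  # needs urllib3 1.26 or newer
)
session = requests.Session()
adapter = HTTPAdapter(max_retries=retry)
session.mount('https://', adapter)

# The session picks the adapter whose prefix matches the URL.
print(session.get_adapter('https://api.example.com/data') is adapter)  # True

# is_retry() reports whether a method/status pair would be retried.
print(retry.is_retry('GET', 503))   # True
print(retry.is_retry('GET', 404))   # False: 404 is not in status_forcelist
print(retry.is_retry('POST', 503))  # False: POST is not an allowed method
```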

Middleware and Plugins

While requests does not have built-in middleware support, you can implement similar patterns using hooks or by subclassing.
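As a sketch of the subclassing approach, the session below adds a base URL and a default timeout to every call. ApiSession and CannedAdapter are hypothetical names, not part of requests; the adapter is an offline stand-in for a server so the example runs without a network, and it builds its Response by hand.

```python
import io
import requests
from requests.adapters import BaseAdapter

class ApiSession(requests.Session):
    """Session with a base URL and a default timeout (hypothetical helper)."""

    def __init__(self, base_url, timeout=5):
        super().__init__()
        self.base_url = base_url.rstrip('/')
        self.timeout = timeout

    def request(self, method, url, **kwargs):
        # Apply the default timeout unless the caller supplied one.
        kwargs.setdefault('timeout', self.timeout)
        full_url = self.base_url + '/' + url.lstrip('/')
        return super().request(method, full_url, **kwargs)

class CannedAdapter(BaseAdapter):
    """Offline stand-in for a server: records each call, returns a fixed reply."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def send(self, request, stream=False, timeout=None, verify=True,
             cert=None, proxies=None):
        self.calls.append((request.method, request.url, timeout))
        response = requests.Response()
        response.status_code = 200
        response.url = request.url
        response.request = request
        response.raw = io.BytesIO(b'{"ok": true}')
        return response

    def close(self):
        pass

session = ApiSession('https://api.example.com')
adapter = CannedAdapter()
session.mount('https://api.example.com', adapter)

print(session.get('/data').json())  # {'ok': True}
print(adapter.calls)                # method, full URL, and timeout of each call
```

Overriding request() keeps the behaviour in one place, since get(), post(), and the other helpers all route through it.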

For advanced use cases, consider using third-party libraries like requests-toolbelt which provide additional utilities.

Examples

Example 1: Downloading an Image

import requests

url = 'https://www.example.com/image.jpg'
response = requests.get(url, stream=True)

if response.status_code == 200:
    with open('image.jpg', 'wb') as f:
        for chunk in response.iter_content(1024):
            f.write(chunk)
else:
    print(f'Failed to retrieve image. Status code: {response.status_code}')

Example 2: Sending JSON Data with Authentication

import requests

url = 'https://api.example.com/create'
payload = {'name': 'John', 'age': 30}
headers = {'Authorization': 'Bearer YOUR_TOKEN'}

response = requests.post(url, json=payload, headers=headers)

if response.ok:
    print('Resource created:', response.json())
else:
    print(f'Error: {response.status_code} - {response.text}')

Example 3: Handling Pagination

Suppose an API uses pagination with page and per_page parameters.

import requests

url = 'https://api.example.com/items'
params = {'page': 1, 'per_page': 50}

all_items = []

while True:
    response = requests.get(url, params=params)
    if response.status_code != 200:
        break
    data = response.json()
    items = data.get('items', [])
    if not items:
        break
    all_items.extend(items)
    params['page'] += 1

print(f'Total items retrieved: {len(all_items)}')

Example 4: Using Sessions with Retries

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
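retry_strategy = Retry(
    total=3,
    status_forcelist=[429, 500, 502, 503, 504],
    # allowed_methods replaced method_whitelist in urllib3 1.26.
    # Retrying POST can repeat side effects; include it only for idempotent endpoints.
    allowed_methods=["GET", "POST"],
    backoff_factor=1
)
adapter = HTTPAdapter(max_retries=retry_strategy)
session.mount('https://', adapter)
session.mount('http://', adapter)

response = session.get('https://api.example.com/data')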

Conclusion

The Python requests library is a powerful tool for making HTTP requests with ease and flexibility. Its simple API abstracts away many of the complexities of HTTP communication, making it accessible for beginners while still providing the depth needed for advanced use cases. By understanding its features, such as sessions, authentication, SSL verification, timeouts, and retries, you can build robust applications that interact reliably with web services and APIs.

Whether you're scraping websites, consuming RESTful APIs, uploading files, or handling complex authentication schemes, the requests library provides the necessary tools to accomplish these tasks efficiently. Remember to follow best practices like handling exceptions, setting timeouts, and managing resources to ensure your applications are reliable and secure.