The Python requests library is one of the most popular and user-friendly HTTP libraries available for Python developers. It hides the complexities of making HTTP requests behind a simple and intuitive API, allowing you to interact with web services, APIs, and websites with ease. This guide covers installation, core features, advanced usage, and best practices.
Installation
Before using the requests library, you need to install it. It can be installed via pip:
```shell
pip install requests
```
To verify the installation and check the version:
```python
import requests

print(requests.__version__)
```
Using the latest stable release is generally recommended.
Basic Usage
The requests library simplifies HTTP requests by providing a straightforward API.
Making GET Requests
A GET request is used to retrieve data from a specified resource.
```python
import requests

response = requests.get('https://api.example.com/data')
print(response.status_code)  # HTTP status code
print(response.text)         # Response body as text
```
Making POST Requests
A POST request is used to send data to a server to create/update a resource.
```python
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
response = requests.post('https://api.example.com/data', data=payload)
print(response.status_code)
print(response.json())  # Only works if the response body is JSON
```
Request Parameters
When making HTTP requests, you often need to pass additional data like headers, query parameters, form data, etc. The requests library provides several parameters to facilitate this.
URL Parameters (params)
Used to send query parameters in the URL.
```python
import requests

params = {'search': 'python', 'page': 2}
response = requests.get('https://api.example.com/search', params=params)
print(response.url)  # https://api.example.com/search?search=python&page=2
```
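requests URL-encodes the params dictionary for you. As a small sketch, the snippet below builds a PreparedRequest locally (no network call) to show two encoding rules: list values become repeated keys, and keys whose value is None are dropped. The URL and parameter names are placeholders.

```python
import requests

# Build and prepare the request locally; nothing is sent over the network
params = {'search': 'python', 'tag': ['web', 'http'], 'page': None}
prepared = requests.Request('GET', 'https://api.example.com/search', params=params).prepare()

# Lists become repeated keys; the None-valued 'page' key is omitted
print(prepared.url)  # https://api.example.com/search?search=python&tag=web&tag=http
```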
Headers (headers)
Used to send custom HTTP headers with the request.
```python
import requests

headers = {'Authorization': 'Bearer YOUR_TOKEN'}
response = requests.get('https://api.example.com/protected', headers=headers)
```
Form Data (data)
Used to send form-encoded data, typically for POST requests.
```python
import requests

data = {'username': 'user', 'password': 'pass'}
response = requests.post('https://api.example.com/login', data=data)
```
JSON Payloads (json)
Used to send JSON-encoded data, automatically setting the Content-Type header to application/json.
```python
import requests

json_data = {'key1': 'value1', 'key2': 'value2'}
response = requests.post('https://api.example.com/json', json=json_data)
```
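To see what json= changes compared with data=, you can prepare both requests locally without sending them. This sketch (placeholder URL) compares the Content-Type header and body that requests produces in each case.

```python
import json

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
url = 'https://api.example.com/json'

# Prepare both variants locally; no request is sent
as_json = requests.Request('POST', url, json=payload).prepare()
as_form = requests.Request('POST', url, data=payload).prepare()

print(as_json.headers['Content-Type'])  # application/json
print(as_form.headers['Content-Type'])  # application/x-www-form-urlencoded
print(as_form.body)                     # key1=value1&key2=value2

# The JSON body is the serialized payload (bytes in recent requests versions)
print(json.loads(as_json.body) == payload)  # True
```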
Files (files)
Used to upload files.
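```python
import requests

# Open in binary mode; the with block closes the file after the upload
with open('report.pdf', 'rb') as f:
    files = {'file': f}
    response = requests.post('https://api.example.com/upload', files=files)
```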
To upload several files, add more entries to the files dictionary; to send extra form fields alongside them, pass data as well.
```python
import requests

data = {'description': 'Report and image files'}
with open('report.pdf', 'rb') as pdf, open('image.png', 'rb') as img:
    files = {'file1': pdf, 'file2': img}
    response = requests.post('https://api.example.com/upload', files=files, data=data)
```
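Each files value can also be a tuple of (filename, file object, content type) when you want to control the filename and MIME type sent in the multipart body. The sketch below uses an in-memory io.BytesIO buffer as a stand-in for a real file and prepares the request locally so the encoded body can be inspected; the field names, description, and URL are placeholders.

```python
import io

import requests

# In-memory stand-in for open('report.pdf', 'rb')
fake_pdf = io.BytesIO(b'%PDF-1.4 example bytes')

files = {'file': ('report.pdf', fake_pdf, 'application/pdf')}
data = {'description': 'Quarterly report'}

# Prepare locally to inspect the multipart encoding; nothing is sent
prepared = requests.Request(
    'POST', 'https://api.example.com/upload', files=files, data=data
).prepare()

print(prepared.headers['Content-Type'])                   # multipart/form-data; boundary=...
print(b'filename="report.pdf"' in prepared.body)          # True
print(b'Content-Type: application/pdf' in prepared.body)  # True
```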
Response Handling
After making a request, the requests library returns a Response object containing the server's response.
Status Codes
HTTP status codes indicate the result of the request.
```python
import requests

response = requests.get('https://api.example.com/data')
print(response.status_code)  # e.g., 200
print(response.ok)           # True if status_code < 400
```
Common status codes:
- 200 OK: The request was successful.
- 201 Created: The resource was created successfully.
- 400 Bad Request: The server could not understand the request.
- 401 Unauthorized: Authentication is required.
- 403 Forbidden: The server understood the request but refuses to authorize it.
- 404 Not Found: The requested resource could not be found.
- 500 Internal Server Error: The server encountered an error.
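Rather than hard-coding numbers, you can refer to status codes by name. requests provides the requests.codes lookup object, and the standard library's http.HTTPStatus enum carries the official reason phrases. A short sketch:

```python
from http import HTTPStatus

import requests

# requests.codes maps lowercase names to numeric codes
print(requests.codes.ok)         # 200
print(requests.codes.not_found)  # 404

# HTTPStatus gives the standard reason phrase for each code listed above
for code in (200, 201, 400, 401, 403, 404, 500):
    print(code, HTTPStatus(code).phrase)
```

In practice this lets you write checks such as `response.status_code == requests.codes.created`.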
Response Headers
Access response headers using the .headers attribute, which behaves like a case-insensitive dictionary.
```python
import requests

response = requests.get('https://api.example.com/data')
print(response.headers['Content-Type'])
```
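The headers attribute is an instance of requests.structures.CaseInsensitiveDict, so lookups ignore capitalization. You can try the class directly, without making a request; the header values here are placeholders.

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({'Content-Type': 'application/json'})

# Any capitalization finds the same entry
print(headers['content-type'])  # application/json
print(headers['CONTENT-TYPE'])  # application/json

# .get() returns a default for missing headers instead of raising KeyError
print(headers.get('X-Request-Id', 'missing'))  # missing
```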
Response Body
The response body can be accessed in various formats.
Text
The .text attribute returns the response body as a string, using the response's encoding.
```python
import requests

response = requests.get('https://api.example.com/data')
print(response.text)
```
JSON
If the response contains JSON, use the .json() method to parse it into Python data structures.
```python
import requests

response = requests.get('https://api.example.com/data')
data = response.json()
print(data)
```
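Note: If the response body is not valid JSON, .json() raises an exception. In requests 2.27 and later this is requests.exceptions.JSONDecodeError, which subclasses ValueError, so catching ValueError works across versions.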
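Because the JSON decoding error is a ValueError, a small helper can treat an unparseable body as missing data. This is one possible pattern, not part of the requests API, and json_or_none is a hypothetical name. To keep the example runnable without a network call, fake_response (also hypothetical, test-only) builds a Response by hand and attaches an in-memory buffer as its raw body.

```python
import io

import requests


def json_or_none(response):
    """Return the parsed JSON body, or None if it is not valid JSON (hypothetical helper)."""
    try:
        return response.json()
    except ValueError:  # requests' JSON decode error subclasses ValueError
        return None


def fake_response(body: bytes) -> requests.Response:
    # Test-only stand-in: a Response whose raw stream is an in-memory buffer
    response = requests.Response()
    response.status_code = 200
    response.raw = io.BytesIO(body)
    return response


print(json_or_none(fake_response(b'{"id": 1}')))      # {'id': 1}
print(json_or_none(fake_response(b'<html></html>')))  # None
```

With a real request you would call `json_or_none(requests.get(url))` instead.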
Content
The .content attribute returns the response body in bytes, useful for binary data like images.
```python
import requests

response = requests.get('https://api.example.com/image.png')
with open('image.png', 'wb') as f:
    f.write(response.content)
```
Advanced Features
Sessions
Sessions allow you to persist certain parameters across multiple requests, such as cookies, headers, and authentication.
```python
import requests

session = requests.Session()
session.headers.update({'Authorization': 'Bearer YOUR_TOKEN'})

# All requests made through the session include the Authorization header
response = session.get('https://api.example.com/protected')
```
Benefits of Sessions:
- Connection Persistence: Reuses TCP connections, improving performance.
- Session-wide Settings: Set default headers, authentication, etc.
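To check which settings a session contributes without sending anything, you can pass a Request through Session.prepare_request(), which merges the session's settings (headers, cookies, auth) into the prepared request. A sketch with a placeholder token and URL:

```python
import requests

session = requests.Session()
session.headers.update({'Authorization': 'Bearer YOUR_TOKEN'})

# Per-request headers are merged with, and take precedence over, session headers
request = requests.Request(
    'GET', 'https://api.example.com/protected', headers={'Accept': 'application/json'}
)
prepared = session.prepare_request(request)

print(prepared.headers['Authorization'])  # Bearer YOUR_TOKEN
print(prepared.headers['Accept'])         # application/json
```

The session's default Accept value is overridden here because the request supplied its own.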
Authentication
requests supports various authentication mechanisms.
Basic Authentication
```python
import requests
from requests.auth import HTTPBasicAuth

response = requests.get(
    'https://api.example.com/user',
    auth=HTTPBasicAuth('username', 'password'),
)
```
Alternatively, use a tuple:
```python
import requests

response = requests.get(
    'https://api.example.com/user',
    auth=('username', 'password'),
)
```
OAuth and Other Schemes
For more complex authentication schemes like OAuth, you may need to use external libraries or handle token acquisition manually.
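For token-based schemes, one option is to subclass requests.auth.AuthBase, the extension point requests provides for custom authentication. The sketch below assumes a simple bearer token; BearerAuth is a hypothetical class name and the token is a placeholder. Preparing the request locally shows the header the auth object adds.

```python
import requests
from requests.auth import AuthBase


class BearerAuth(AuthBase):
    """Attach an 'Authorization: Bearer <token>' header (hypothetical helper)."""

    def __init__(self, token: str):
        self.token = token

    def __call__(self, request):
        # requests calls the auth object with the PreparedRequest and uses what it returns
        request.headers['Authorization'] = f'Bearer {self.token}'
        return request


prepared = requests.Request(
    'GET', 'https://api.example.com/user', auth=BearerAuth('YOUR_TOKEN')
).prepare()
print(prepared.headers['Authorization'])  # Bearer YOUR_TOKEN
```

The same object works with `requests.get(url, auth=BearerAuth(token))` or as `session.auth`.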
SSL Verification
By default, requests verifies the server's SSL/TLS certificate. You can disable verification (not recommended) or pass verify the path to a custom CA bundle.
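```python
import requests

# Disable SSL verification (insecure; urllib3 emits an InsecureRequestWarning)
response = requests.get('https://self-signed.example.com', verify=False)

# Verify against a custom CA bundle instead of the default one
response = requests.get('https://secure.example.com', verify='/path/to/cert.pem')
```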
Note: Disabling SSL verification can expose you to security risks like man-in-the-middle attacks.
Timeouts
By default, requests waits indefinitely for a response. Set timeouts to prevent your application from hanging on an unresponsive server.
```python
import requests

# A single value sets both the connect and the read timeout to 5 seconds;
# it is not a limit on the total duration of the request
response = requests.get('https://api.example.com/data', timeout=5)

# Set connect (3 s) and read (7 s) timeouts separately
response = requests.get('https://api.example.com/data', timeout=(3, 7))
```
Exceptions:
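- requests.exceptions.Timeout: Raised when a timeout occurs. Its subclasses ConnectTimeout (no connection within the connect timeout) and ReadTimeout (no data from the server within the read timeout) distinguish the two cases.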
Proxies
Route your requests through proxies.
```python
import requests

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
response = requests.get('https://api.example.com/data', proxies=proxies)
```
For proxies requiring authentication:
```python
import requests

proxies = {
    'http': 'http://user:pass@10.10.1.10:3128/',
    'https': 'http://user:pass@10.10.1.10:1080/',
}
response = requests.get('https://api.example.com/data', proxies=proxies)
```
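Proxy keys are matched against the request URL: a 'scheme://hostname' key applies to that host only and takes precedence over the scheme-wide entry. The sketch below illustrates the matching with requests.utils.select_proxy, a helper requests uses internally; it is not part of the documented public API, so treat it as an illustration rather than something to rely on. The addresses and hostnames are placeholders.

```python
from requests.utils import select_proxy

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
    # Host-specific key: used only for https://internal.example.com
    'https://internal.example.com': 'http://10.10.1.20:8080',
}

print(select_proxy('https://api.example.com/data', proxies))       # http://10.10.1.10:1080
print(select_proxy('https://internal.example.com/data', proxies))  # http://10.10.1.20:8080
print(select_proxy('http://api.example.com/data', proxies))        # http://10.10.1.10:3128
```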
Streaming Responses
Stream large responses to avoid loading them entirely into memory.
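```python
import requests

# stream=True defers downloading the body; the with block releases the connection
with requests.get('https://api.example.com/largefile', stream=True) as response:
    with open('largefile', 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:  # Skip keep-alive chunks
                f.write(chunk)
```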
Redirection Control
By default, requests follows redirects (except for HEAD requests, where allow_redirects defaults to False). You can control this behavior.
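```python
import requests

# Do not follow redirects
response = requests.get('https://api.example.com/redirect', allow_redirects=False)

# Limit the number of redirects: max_redirects is a Session attribute, not a request argument
session = requests.Session()
session.max_redirects = 3
response = session.get('https://api.example.com/redirect')
```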
Hooks
Hooks allow you to execute custom code at certain points during the request lifecycle.
```python
import requests

def print_url(response, *args, **kwargs):
    print(response.url)

hooks = {'response': print_url}
response = requests.get('https://api.example.com/data', hooks=hooks)
```
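Hooks can also be registered on a Session so they run for every response. As one possible pattern (not a built-in feature), the hypothetical check_status hook below calls raise_for_status() so that 4xx and 5xx responses raise HTTPError automatically. The demonstration calls the hook on a hand-built Response instead of contacting a server; the URL is a placeholder.

```python
import requests


def check_status(response, *args, **kwargs):
    """Response hook that turns 4xx/5xx responses into HTTPError (hypothetical helper)."""
    response.raise_for_status()


session = requests.Session()
session.hooks['response'].append(check_status)

# Offline demonstration with a hand-built 404 response
fake = requests.Response()
fake.status_code = 404
fake.url = 'https://api.example.com/missing'
try:
    check_status(fake)
except requests.exceptions.HTTPError as err:
    print(err)  # 404 Client Error: None for url: https://api.example.com/missing
```

With the hook registered, `session.get(...)` raises HTTPError on error responses without an explicit raise_for_status() call at each site.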
Exception Handling
Proper exception handling ensures your application can gracefully handle errors.
```python
import requests
from requests.exceptions import HTTPError, Timeout, RequestException

try:
    response = requests.get('https://api.example.com/data', timeout=5)
    response.raise_for_status()  # Raise HTTPError for 4xx or 5xx responses
except HTTPError as http_err:
    print(f'HTTP error occurred: {http_err}')
except Timeout as timeout_err:
    print(f'Timeout error occurred: {timeout_err}')
except RequestException as req_err:
    print(f'An error occurred: {req_err}')
else:
    print('Success!')
```
Common Exceptions:
- requests.exceptions.HTTPError: Raised by raise_for_status() for 4xx and 5xx responses.
- requests.exceptions.ConnectionError: Network problem (e.g., DNS failure, refused connection).
- requests.exceptions.Timeout: Request timed out.
- requests.exceptions.TooManyRedirects: Exceeded the configured maximum number of redirects.
- requests.exceptions.RequestException: Base class for all exceptions raised by requests.
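These classes overlap: ConnectTimeout inherits from both ConnectionError and Timeout, and ReadTimeout inherits from Timeout. When branching on exception type, check the more specific classes first. The hypothetical describe_failure helper below sketches one way to do that; the returned labels are arbitrary.

```python
from requests.exceptions import (
    ConnectionError as RequestsConnectionError,  # avoid shadowing the built-in ConnectionError
    ConnectTimeout,
    HTTPError,
    ReadTimeout,
    RequestException,
    Timeout,
    TooManyRedirects,
)


def describe_failure(exc: RequestException) -> str:
    """Map a requests exception to a short label (hypothetical helper)."""
    # Order matters: ConnectTimeout is also a ConnectionError and a Timeout
    if isinstance(exc, ConnectTimeout):
        return 'connect-timeout'
    if isinstance(exc, ReadTimeout):
        return 'read-timeout'
    if isinstance(exc, Timeout):
        return 'timeout'
    if isinstance(exc, RequestsConnectionError):
        return 'connection-error'
    if isinstance(exc, HTTPError):
        return 'http-error'
    if isinstance(exc, TooManyRedirects):
        return 'too-many-redirects'
    return 'other'


print(describe_failure(ConnectTimeout()))  # connect-timeout
print(describe_failure(ReadTimeout()))     # read-timeout
```

A distinction like this matters for retries: a ConnectTimeout means the request never reached the server, while after a ReadTimeout it may already have been processed.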
Best Practices
Use Sessions: Reuse sessions to persist parameters and improve performance.
```python
import requests

session = requests.Session()
session.headers.update({'User-Agent': 'my-app/0.0.1'})
response = session.get('https://api.example.com/data')
```
Handle Exceptions: Always anticipate and handle potential errors.
Set Timeouts: Prevent your application from hanging due to unresponsive servers.
Verify SSL Certificates: Ensure secure communication by verifying SSL certificates.
Limit Redirections: requests stops after 30 redirects by default and raises TooManyRedirects; lower session.max_redirects if you need a tighter bound.
Use Environment Variables for Sensitive Data: Keep tokens and credentials out of your source code.
```python
import os

token = os.getenv('API_TOKEN')  # None if API_TOKEN is not set
headers = {'Authorization': f'Bearer {token}'}
```
Respect Rate Limits: When interacting with APIs, adhere to their rate limiting policies to avoid being blocked.
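Many rate-limited APIs answer with 429 Too Many Requests and a Retry-After header, given either as a number of seconds or as an HTTP date. The sketch below parses both forms with the standard library; retry_after_seconds is a hypothetical helper that you would call with response.headers. urllib3's Retry class also has a respect_retry_after_header option, enabled by default, if you prefer to let an adapter handle the wait.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_after_seconds(headers: Mapping[str, str], now: Optional[datetime] = None) -> Optional[float]:
    """Seconds to wait according to a Retry-After header, or None (hypothetical helper)."""
    value = headers.get('Retry-After')
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None  # Neither form: ignore the header
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # Treat naive dates as UTC
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


print(retry_after_seconds({'Retry-After': '120'}))  # 120.0
reference = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_after_seconds({'Retry-After': 'Wed, 21 Oct 2015 07:28:00 GMT'}, now=reference))  # 60.0
```

A caller might sleep for `retry_after_seconds(response.headers) or 1` seconds before retrying a 429 response; the fallback of 1 second is an arbitrary choice.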
Use Streaming for Large Responses: Avoid loading large files into memory.
Clean Up Resources: Close files and other resources appropriately.
```python
import requests

with open('file.txt', 'rb') as f:
    response = requests.post('https://api.example.com/upload', files={'file': f})
```
Logging: Implement logging to monitor requests and responses, especially for debugging.
```python
import logging

import requests

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

response = requests.get('https://api.example.com/data')
logger.info('Status Code: %s', response.status_code)
```
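Connection-level messages come from urllib3, the library requests uses underneath, which logs through the standard logging module under the logger name 'urllib3'. A sketch for turning those messages on while debugging; the chosen levels are only a suggestion.

```python
import logging

logging.basicConfig(level=logging.INFO)

# urllib3 logs new connections and each request/status line at DEBUG level
urllib3_logger = logging.getLogger('urllib3')
urllib3_logger.setLevel(logging.DEBUG)

# Optional and very verbose: dump raw HTTP traffic via http.client (prints to stdout, not logging)
# import http.client
# http.client.HTTPConnection.debuglevel = 1
```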
Extending Requests
The requests library is highly extensible. You can mount custom transport adapters, register event hooks, write custom authentication classes, subclass Session, or, if needed, monkey-patch behavior.
Custom Adapters
Transport adapters control how requests are sent for the URL prefixes they are mounted on. A common use is attaching an automatic retry policy:
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(
    total=5,
    backoff_factor=1,
    status_forcelist=[500, 502, 503, 504],
    # allowed_methods replaced method_whitelist, which urllib3 2.0 removed
    allowed_methods=["HEAD", "GET", "OPTIONS"],
)
adapter = HTTPAdapter(max_retries=retry)
session.mount('https://', adapter)
session.mount('http://', adapter)

response = session.get('https://api.example.com/data')
```
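Adapters are also a convenient place for defaults that Session lacks. requests has no session-wide timeout setting (assigning session.timeout has no effect), so one common pattern is an HTTPAdapter subclass that fills in a timeout when the caller does not pass one. TimeoutHTTPAdapter and its 5-second default are assumptions for illustration.

```python
import requests
from requests.adapters import HTTPAdapter


class TimeoutHTTPAdapter(HTTPAdapter):
    """HTTPAdapter that applies a default timeout when none is given (hypothetical helper)."""

    def __init__(self, *args, timeout=5, **kwargs):
        self.timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # Session.send passes timeout=None when the caller did not set one
        if kwargs.get('timeout') is None:
            kwargs['timeout'] = self.timeout
        return super().send(request, **kwargs)


session = requests.Session()
adapter = TimeoutHTTPAdapter(timeout=(3, 10))
session.mount('https://', adapter)
session.mount('http://', adapter)

# session.get('https://api.example.com/data') now uses a (3, 10) timeout unless one is passed
```

The subclass also accepts HTTPAdapter's own arguments, so it can be combined with a retry policy, e.g. `TimeoutHTTPAdapter(timeout=5, max_retries=retry)`.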
Middleware and Plugins
While requests does not have built-in middleware support, you can implement similar patterns using hooks or by subclassing.
For advanced use cases, consider using third-party libraries like requests-toolbelt which provide additional utilities.
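As an example of the subclassing approach, the sketch below resolves relative URLs against a base URL by overriding Session.prepare_request(), which Session.request() calls for every request. BaseUrlSession and the base URL are hypothetical here (requests-toolbelt ships a ready-made class with the same purpose).

```python
from urllib.parse import urljoin

import requests


class BaseUrlSession(requests.Session):
    """Session that resolves relative URLs against a base URL (hypothetical helper)."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url

    def prepare_request(self, request):
        # Session.request() builds a Request and passes it here before sending
        request.url = urljoin(self.base_url, request.url)
        return super().prepare_request(request)


session = BaseUrlSession('https://api.example.com/v1/')
prepared = session.prepare_request(requests.Request('GET', 'items', params={'page': 2}))
print(prepared.url)  # https://api.example.com/v1/items?page=2
```

Note that urljoin keeps the last path segment only if the base URL ends with a slash; without it, 'items' would replace 'v1'.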
Examples
Example 1: Downloading an Image
```python
import requests

url = 'https://www.example.com/image.jpg'
with requests.get(url, stream=True) as response:
    if response.status_code == 200:
        with open('image.jpg', 'wb') as f:
            for chunk in response.iter_content(1024):
                f.write(chunk)
    else:
        print(f'Failed to retrieve image. Status code: {response.status_code}')
```
Example 2: Sending JSON Data with Authentication
```python
import requests

url = 'https://api.example.com/create'
payload = {'name': 'John', 'age': 30}
headers = {'Authorization': 'Bearer YOUR_TOKEN'}

response = requests.post(url, json=payload, headers=headers)
if response.ok:
    print('Resource created:', response.json())
else:
    print(f'Error: {response.status_code} – {response.text}')
```
Example 3: Handling Pagination
Suppose an API uses pagination with page and per_page parameters.
```python
import requests

url = 'https://api.example.com/items'
params = {'page': 1, 'per_page': 50}
all_items = []

while True:
    response = requests.get(url, params=params)
    if response.status_code != 200:
        break  # Stop on any non-200 response
    data = response.json()
    items = data.get('items', [])
    if not items:
        break  # An empty page means there is nothing left
    all_items.extend(items)
    params['page'] += 1

print(f'Total items retrieved: {len(all_items)}')
```
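Some APIs paginate with a Link response header instead of page numbers. requests parses that header into response.links, a dictionary keyed by each link's rel value. The generator below is a sketch under two assumptions: the API sends rel="next" links, and each page's JSON body is a list. iter_linked_pages is a hypothetical name; it accepts a requests.Session or any object with a compatible get().

```python
from typing import Any, Iterator


def iter_linked_pages(session: Any, url: str, **kwargs) -> Iterator[Any]:
    """Yield items from every page, following Link: rel="next" (hypothetical helper)."""
    while url:
        response = session.get(url, **kwargs)
        response.raise_for_status()
        yield from response.json()  # Assumes each page body is a JSON array
        # response.links is parsed from the Link header; it is {} when the header is absent
        url = response.links.get('next', {}).get('url')
        kwargs.pop('params', None)  # The next URL usually already carries the query string
```

A call might look like `list(iter_linked_pages(requests.Session(), 'https://api.example.com/items', params={'per_page': 50}, timeout=10))`. Dropping params after the first page avoids duplicating query parameters that the server already embedded in the next link.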
Example 4: Using Sessions with Retries
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry_strategy = Retry(
    total=3,
    status_forcelist=[429, 500, 502, 503, 504],
    # POST is not idempotent: retrying it may create duplicate resources
    allowed_methods=["GET", "POST"],  # formerly method_whitelist (removed in urllib3 2.0)
    backoff_factor=1,
)
adapter = HTTPAdapter(max_retries=retry_strategy)
session.mount('https://', adapter)
session.mount('http://', adapter)

response = session.get('https://api.example.com/data')
```
Conclusion
The Python requests library is a powerful tool for making HTTP requests with ease and flexibility. Its simple API abstracts away many of the complexities involved in handling HTTP communications, making it accessible for beginners while still providing the depth needed for advanced use cases. By understanding and leveraging its features, such as sessions, authentication, SSL verification, timeouts, and transport adapters, you can build robust applications that interact seamlessly with web services and APIs.
Whether you're scraping websites, consuming RESTful APIs, uploading files, or handling complex authentication schemes, the requests library provides the necessary tools to accomplish these tasks efficiently. Remember to follow best practices like handling exceptions, setting timeouts, and managing resources to ensure your applications are reliable and secure.