How to add headers to all or some scrapy requests?

by scrapecrow Apr 23, 2023

There are several ways to add headers in scrapy spiders. This can be done for each request manually:

    import scrapy


    class MySpider(scrapy.Spider):
        def parse(self, response):
            # headers passed here apply only to this single request
            yield scrapy.Request(..., headers={"x-token": "123"})

However, to automatically add headers to every outgoing scrapy request, the DEFAULT_REQUEST_HEADERS setting can be used:

    # settings.py
    DEFAULT_REQUEST_HEADERS = {
        "User-Agent": "my awesome scrapy robot",
    }
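Headers passed to an individual scrapy.Request still take precedence over these defaults, because Scrapy's built-in DefaultHeadersMiddleware only fills in headers that are missing from the request. The plain-Python sketch below mimics that behaviour. Note that `apply_default_headers` is a hypothetical helper written for illustration, not a Scrapy function, and it ignores the case-insensitive header matching that Scrapy performs:

```python
# Same defaults as in the settings.py example
DEFAULT_REQUEST_HEADERS = {
    "User-Agent": "my awesome scrapy robot",
}


def apply_default_headers(request_headers, defaults):
    """Hypothetical helper: return request headers with missing defaults filled in."""
    merged = dict(request_headers)
    for name, value in defaults.items():
        # setdefault keeps any value already set on the request
        merged.setdefault(name, value)
    return merged


# The request sets no User-Agent, so the default one is added
print(apply_default_headers({"x-token": "123"}, DEFAULT_REQUEST_HEADERS))

# The request sets its own User-Agent, so the default one is skipped
print(apply_default_headers({"User-Agent": "custom agent"}, DEFAULT_REQUEST_HEADERS))
```

In other words, the setting acts as a fallback and a per-request `headers` argument can still override it.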

In case more complex logic is needed, like adding headers only to some requests or setting a random User-Agent header, a downloader middleware is the best option:

    # middlewares.py
    import random


    class RandomUserAgentMiddleware:
        def __init__(self, user_agents):
            self.user_agents = user_agents

        @classmethod
        def from_crawler(cls, crawler):
            """retrieve user agent list from settings.USER_AGENTS"""
            user_agents = crawler.settings.get('USER_AGENTS', [])
            if not user_agents:
                raise ValueError('No user agents found in settings. Please provide a list of user agents in the USER_AGENTS setting.')
            return cls(user_agents)

        def process_request(self, request, spider):
            """attach random user agent to every outgoing request"""
            user_agent = random.choice(self.user_agents)
            # assign directly: at priority 760 the built-in UserAgentMiddleware (500)
            # has already set a User-Agent, so setdefault() would change nothing
            request.headers['User-Agent'] = user_agent
            spider.logger.debug(f'Using User-Agent: {user_agent}')

    # settings.py
    DOWNLOADER_MIDDLEWARES = {
        # ...
        'myproject.middlewares.RandomUserAgentMiddleware': 760,
        # ...
    }
    USER_AGENTS = [
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
        'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
        # ...
    ]
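The middleware above covers the random User-Agent case. For the other case mentioned, adding headers only to some requests, a minimal sketch could look like the following. The class name `ConditionalHeadersMiddleware`, the `HEADERS_BY_DOMAIN` mapping, the `api.example.com` domain and the priority `543` are all assumptions made for illustration:

```python
# middlewares.py
from urllib.parse import urlparse


class ConditionalHeadersMiddleware:
    """Add extra headers only to requests that target selected domains."""

    # Hypothetical mapping of domain -> headers, replace with your own
    HEADERS_BY_DOMAIN = {
        "api.example.com": {"x-token": "123"},
    }

    def process_request(self, request, spider):
        """attach extra headers when the request domain is in the mapping"""
        domain = urlparse(request.url).hostname
        for name, value in self.HEADERS_BY_DOMAIN.get(domain, {}).items():
            # setdefault keeps a value set explicitly on the request itself
            request.headers.setdefault(name, value)
        return None  # let scrapy continue processing the request


# settings.py
DOWNLOADER_MIDDLEWARES = {
    # assumed priority, adjust to fit your other middlewares
    "myproject.middlewares.ConditionalHeadersMiddleware": 543,
}
```

Requests to any other domain pass through unchanged, and a spider can still override the header by passing it directly in `scrapy.Request(..., headers=...)`.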

Note that if you're using Scrapfly's scrapy SDK, some headers like the User-Agent string are set automatically by the smart anti-blocking API.
