Closed
Labels
bug: Something isn't working
Description
Hello world!
I am building a generic spider that crawls sites and records the requests each page makes.
I use scrapy-playwright to load websites first and get the requests that are sent.
I noticed that when I parse URLs whose responses have an empty body, execution freezes and Playwright's browser shows an empty tab.
To be clear, the problem reproduces when I parse a URL for which the following condition is true:

    response_body_text = await response.text()
    response_body_text == ''

For the URLs where this condition is false, the spider works perfectly!
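To help narrow down where the freeze happens, the empty-body check above could be wrapped in a timeout. This is only a rough sketch, not a fix: it assumes `response` is any object with an async `text()` method (as Playwright's `Response` has), and the helper names `body_text_or_none` and `is_empty_body` and the 5-second default are made up for illustration.

```python
import asyncio


async def body_text_or_none(response, timeout=5.0):
    """Return the body text, or None if reading it did not finish in time.

    `response` is assumed to expose an async `text()` method.
    """
    try:
        return await asyncio.wait_for(response.text(), timeout=timeout)
    except asyncio.TimeoutError:
        # Reading the body stalled; report that instead of hanging forever.
        return None


async def is_empty_body(response, timeout=5.0):
    """True only when the body was read successfully and is the empty string."""
    text = await body_text_or_none(response, timeout=timeout)
    return text == ""
```

If `body_text_or_none` returns `None` for the problematic URLs, the stall is in reading the body; if it returns `''` promptly, the freeze is happening somewhere else.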
For the reproduction, I have a fairly common configuration:

    CrawlerProcess({
        ...
        # Playwright settings
        'DOWNLOAD_HANDLERS': {
            "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
        'TWISTED_REACTOR': "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        'PLAYWRIGHT_BROWSER_TYPE': 'chromium',
        'PLAYWRIGHT_MAX_PAGES_PER_CONTEXT': 10,
        'PLAYWRIGHT_LAUNCH_OPTIONS': {
            'headless': True,
        }
    })

and on each scrapy.Request() I pass the following meta:

    { "playwright": True }

Has anybody else come across this issue?
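Put together, the setup described above might look roughly like the sketch below. It contains only the Playwright settings listed in this report (the elided settings are left out), and the names `SETTINGS`, `request_kwargs`, `run` and `ReproSpider` are placeholders for illustration, not my actual spider. Scrapy is imported inside `run()` so the settings can be inspected without starting a crawl.

```python
# Playwright-related settings as listed in this report; other settings omitted.
SETTINGS = {
    "DOWNLOAD_HANDLERS": {
        "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    },
    "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    "PLAYWRIGHT_BROWSER_TYPE": "chromium",
    "PLAYWRIGHT_MAX_PAGES_PER_CONTEXT": 10,
    "PLAYWRIGHT_LAUNCH_OPTIONS": {"headless": True},
}


def request_kwargs(urls):
    """Build keyword arguments for scrapy.Request, one dict per URL."""
    return [{"url": url, "meta": {"playwright": True}} for url in urls]


def run(urls):
    """Crawl `urls` through scrapy-playwright and log each body size."""
    import scrapy
    from scrapy.crawler import CrawlerProcess

    class ReproSpider(scrapy.Spider):
        name = "empty_body_repro"  # hypothetical spider name

        def start_requests(self):
            for kwargs in request_kwargs(urls):
                yield scrapy.Request(callback=self.parse, **kwargs)

        def parse(self, response):
            self.logger.info("%s -> %d bytes", response.url, len(response.body))

    process = CrawlerProcess(SETTINGS)
    process.crawl(ReproSpider)
    process.start()  # blocks until the crawl finishes
```

Calling `run([...])` with one URL that returns an empty body and one that does not should be enough to compare the two cases.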
Thank you all!