[Bug]: A conflict occurs when crawler_strategy and screenshot=True are set together #1704

@OpenAI-Insights

crawl4ai version

0.7.8

Expected Behavior

I normally run crawl4ai with Stealth Mode and Undetected Browser Mode enabled, and the crawled content is returned as Markdown or HTML without any issues. However, setting screenshot=True in the run config, i.e. config=CrawlerRunConfig(screenshot=True), raises the error shown in the error logs further down.

Current Behavior

Root cause: self.logger is None when the export step runs, but the code still calls its .info() method, which raises: 'NoneType' object has no attribute 'info'.
Workaround: edit site-packages/crawl4ai/async_crawler_strategy.py and add a None check for the logger around line 1005.
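The failure mode can be reproduced without crawl4ai or a browser. The following is a minimal sketch, assuming only what the report states (a logger attribute that is None at the time `.info()` is called). `StandInStrategy`, `export_media`, and `reproduce` are hypothetical names for illustration, not crawl4ai code.

```python
import time


class StandInStrategy:
    """Hypothetical stand-in for a crawler strategy; not crawl4ai code."""

    def __init__(self, logger=None):
        # No logger was supplied, so the attribute stays None.
        self.logger = logger

    def export_media(self, screenshot_data):
        start_export_time = time.perf_counter()
        if screenshot_data:
            # Unguarded call: raises AttributeError when self.logger is None.
            self.logger.info(
                message="Exporting media took {duration:.2f}s",
                tag="EXPORT",
                params={"duration": time.perf_counter() - start_export_time},
            )


def reproduce():
    """Return the message of the error raised by the unguarded logger call."""
    try:
        StandInStrategy().export_media(screenshot_data=b"fake-png-bytes")
    except AttributeError as exc:
        return str(exc)
    return None


print(reproduce())
```

The printed message matches the one in the error log, which is consistent with the logger being None rather than the screenshot capture itself failing.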

Lines 1004 to 1009 of the original code
if screenshot_data or pdf_data or mhtml_data:
    self.logger.info(
        message="Exporting media (PDF/MHTML/screenshot) took {duration:.2f}s",
        tag="EXPORT",
        params={"duration": time.perf_counter() - start_export_time},
    )

After modification

if screenshot_data or pdf_data or mhtml_data:
    if self.logger is not None:  # Add None check
        self.logger.info(
            message="Exporting media (PDF/MHTML/screenshot) took {duration:.2f}s",
            tag="EXPORT",
            params={"duration": time.perf_counter() - start_export_time},
        )

This change enables my code to run properly.
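Guarding a single call site fixes this one crash, but any other unguarded `self.logger` call would fail the same way. One alternative, sketched here under the assumption that the strategy may be constructed without a logger, is to fall back to a no-op logger in the constructor. `NullLogger`, `RecordingLogger`, and `FallbackStrategy` are hypothetical names for illustration and are not part of crawl4ai.

```python
class NullLogger:
    """Hypothetical no-op logger: accepts any logging call and discards it."""

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, e.g. info, error.
        def _noop(*args, **kwargs):
            return None

        return _noop


class RecordingLogger:
    """Hypothetical logger that stores formatted messages, for demonstration."""

    def __init__(self):
        self.messages = []

    def info(self, message, tag=None, params=None):
        self.messages.append((tag, message.format(**(params or {}))))


class FallbackStrategy:
    """Hypothetical stand-in for a crawler strategy; not crawl4ai code."""

    def __init__(self, logger=None):
        # Substitute a no-op logger so every later call site is safe.
        self.logger = logger if logger is not None else NullLogger()

    def export_media(self, screenshot_data, duration=0.0):
        if screenshot_data:
            self.logger.info(
                message="Exporting media took {duration:.2f}s",
                tag="EXPORT",
                params={"duration": duration},
            )
        return screenshot_data
```

With this shape, `FallbackStrategy().export_media(b"png")` returns the data silently, while passing a real logger keeps the EXPORT message. Whether the maintainers prefer this over a per-call check is their decision.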

Is this reproducible?

Yes

Inputs Causing the Bug

Steps to Reproduce

Code snippets

import asyncio
import base64

from crawl4ai import (
    AsyncWebCrawler,
    BrowserConfig,
    CacheMode,
    CrawlerRunConfig,
    UndetectedAdapter,
)
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy


async def main():
    browser_config = BrowserConfig(
        enable_stealth=True,  # Enable stealth mode
        headless=True,
        headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36 Edg/143.0.0.0"},
    )

    # Undetected browser adapter, passed in through a custom crawler strategy
    adapter = UndetectedAdapter()

    strategy = AsyncPlaywrightCrawlerStrategy(
        browser_config=browser_config,
        browser_adapter=adapter,
    )

    crawler_config = CrawlerRunConfig(
        wait_for_images=True,
        scan_full_page=True,
        scroll_delay=0.5,
        cache_mode=CacheMode.BYPASS,
        excluded_tags=["form", "header", "footer"],
        keep_data_attributes=False,
        screenshot=True,  # The error only appears with this option set
    )

    async with AsyncWebCrawler(
        crawler_strategy=strategy,
        config=browser_config,
    ) as crawler:
        result = await crawler.arun(
            url="",
            config=crawler_config,
        )

        # Written before checking result.success: when the crawl fails,
        # result.markdown is None and this write raises the TypeError in the logs.
        output_path = "article.md"
        with open(output_path, "w", encoding="utf-8") as f:
            f.write(result.markdown)

        output_path = "article.html"
        with open(output_path, "w", encoding="utf-8") as f:
            f.write(result.cleaned_html)

        if result.success:
            print("✓ Successfully accessed bot detection test site")
            # Save screenshot to verify detection results
            if result.screenshot:
                with open("stealth_test2.png", "wb") as f:
                    f.write(base64.b64decode(result.screenshot))
                print("✓ Screenshot saved - check for green (passed) tests")


asyncio.run(main())

OS

Windows (WSL)

Python version

3.12

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

[INIT].... → Crawl4AI 0.7.8
[ERROR]... × https://mp.weixin.qq.com/s/M1vGqFZV5MWREkSyx2-ITw | Error: Unexpected error in _crawl_web at
line 1005 in _crawl_web
(../../../../../miniconda3/envs/rag/lib/python3.12/site-packages/crawl4ai/async_crawler_strategy.py):
Error: 'NoneType' object has no attribute 'info'

Code context:
1000 screenshot_data = await self.take_screenshot(
1001 page, screenshot_height_threshold=config.screenshot_height_threshold
1002 )
1003
1004 if screenshot_data or pdf_data or mhtml_data:
1005 → self.logger.info(
1006 message="Exporting media (PDF/MHTML/screenshot) took {duration:.2f}s",
1007 tag="EXPORT",
1008 params={"duration": time.perf_counter() - start_export_time},
1009 )
1010
Traceback (most recent call last):
  File "/home/edy/data/project/HAICHENG_QA/code/src/html2images.py", line 80, in <module>
    asyncio.run(main())
  File "/home/edy/miniconda3/envs/rag/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/edy/miniconda3/envs/rag/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/edy/miniconda3/envs/rag/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/edy/data/project/HAICHENG_QA/code/src/html2images.py", line 59, in main
    f.write(result.markdown)
TypeError: write() argument must be str, not None
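The final TypeError is a secondary effect: the crawl had already failed with the logger error, so result.markdown was None by the time the script tried to write it. A minimal sketch of guarding the write is shown here. `StandInResult` and `save_markdown` are hypothetical names for illustration, and the stand-in only mirrors the `success` and `markdown` attributes used in the script above, plus an assumed `error_message` field.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class StandInResult:
    """Hypothetical stand-in for a crawl result; not the crawl4ai class."""

    success: bool
    markdown: Optional[str] = None
    error_message: Optional[str] = None


def save_markdown(result, output_path):
    """Write result.markdown to output_path only if the crawl succeeded."""
    if not result.success or result.markdown is None:
        print(f"Crawl failed, nothing written: {result.error_message}")
        return False
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(result.markdown)
    return True


# Demonstration with a temporary directory so no real files are left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "article.md")
    failed = StandInResult(success=False, error_message="logger was None")
    print(save_markdown(failed, path), os.path.exists(path))
    ok = StandInResult(success=True, markdown="# Title")
    print(save_markdown(ok, path), os.path.exists(path))
```

With a check like this, a failed crawl surfaces the original error message instead of a misleading TypeError from the file write.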


Labels

Bug (Something isn't working), Root caused (identified the root cause of bug)
