Description
crawl4ai version
0.7.8
Expected Behavior
I usually run crawl4ai with Stealth Mode and Undetected Browser Mode enabled, and the crawled content is returned as Markdown or HTML without any issues. Taking a screenshot should work the same way. However, an error occurs when the screenshot parameter is set via config=CrawlerRunConfig(screenshot=True), as shown below:
Current Behavior
The root cause is that self.logger is None when the code calls its .info() method, which raises: 'NoneType' object has no attribute 'info'.
As a workaround, I added a None check for the logger around line 1005 of site-packages/crawl4ai/async_crawler_strategy.py.
Lines 1004 to 1009 of the original code
if screenshot_data or pdf_data or mhtml_data:
    self.logger.info(
        message="Exporting media (PDF/MHTML/screenshot) took {duration:.2f}s",
        tag="EXPORT",
        params={"duration": time.perf_counter() - start_export_time},
    )
After modification
if screenshot_data or pdf_data or mhtml_data:
    if self.logger is not None:  # Add a None check
        self.logger.info(
            message="Exporting media (PDF/MHTML/screenshot) took {duration:.2f}s",
            tag="EXPORT",
            params={"duration": time.perf_counter() - start_export_time},
        )
This change enables my code to run properly.
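An alternative to guarding each call site would be to fall back to a no-op logger once, at construction time. The following is only a minimal sketch of that pattern: NullLogger and Strategy are hypothetical stand-ins written for illustration, not crawl4ai classes, and the export method only mimics the shape of the logging call quoted above.

```python
import time


class NullLogger:
    """Hypothetical no-op logger that accepts the keyword arguments used above."""

    def info(self, message="", tag="", params=None, **kwargs):
        pass  # Deliberately does nothing.


class Strategy:
    """Hypothetical stand-in for a crawler strategy (not crawl4ai's real class)."""

    def __init__(self, logger=None):
        # Substitute a no-op logger once, so later calls need no None checks.
        self.logger = logger if logger is not None else NullLogger()

    def export(self):
        start_export_time = time.perf_counter()
        screenshot_data = b"fake-bytes"  # Placeholder for the real export work.
        if screenshot_data:
            self.logger.info(
                message="Exporting media (PDF/MHTML/screenshot) took {duration:.2f}s",
                tag="EXPORT",
                params={"duration": time.perf_counter() - start_export_time},
            )
        return screenshot_data
```

With this shape, Strategy().export() runs without a logger, and any object exposing a compatible info() method can still be passed in. Whether this fits the library's actual constructor is an assumption that would need checking against the source.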
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
import asyncio
import base64

from crawl4ai import (
    AsyncWebCrawler,
    BrowserConfig,
    CacheMode,
    CrawlerRunConfig,
    UndetectedAdapter,
)
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy


async def main():
    browser_config = BrowserConfig(
        enable_stealth=True,  # Enable stealth mode
        headless=True,
        headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36 Edg/143.0.0.0"}
    )
    adapter = UndetectedAdapter()
    strategy = AsyncPlaywrightCrawlerStrategy(
        browser_config=browser_config,
        browser_adapter=adapter,
    )
    crawler_config = CrawlerRunConfig(
        wait_for_images=True,
        scan_full_page=True,
        scroll_delay=0.5,
        cache_mode=CacheMode.BYPASS,
        excluded_tags=["form", "header", "footer"],
        keep_data_attributes=False,
        screenshot=True
    )
    async with AsyncWebCrawler(
        crawler_strategy=strategy,
        config=browser_config,
    ) as crawler:
        result = await crawler.arun(
            url="",
            config=crawler_config,
        )
        # Note: result.markdown is None when the crawl fails, which is what
        # raises the TypeError shown in the traceback.
        output_path = "article.md"
        with open(output_path, "w", encoding="utf-8") as f:
            f.write(result.markdown)
        output_path = "article.html"
        with open(output_path, "w", encoding="utf-8") as f:
            f.write(result.cleaned_html)
        if result.success:
            print("✓ Successfully accessed bot detection test site")
            # Save screenshot to verify detection results
            if result.screenshot:
                with open("stealth_test2.png", "wb") as f:
                    f.write(base64.b64decode(result.screenshot))
                print("✓ Screenshot saved - check for green (passed) tests")


asyncio.run(main())
OS
Windows (WSL)
Python version
3.12
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
[INIT].... → Crawl4AI 0.7.8
[ERROR]... × https://mp.weixin.qq.com/s/M1vGqFZV5MWREkSyx2-ITw | Error: Unexpected error in _crawl_web at
line 1005 in _crawl_web
(../../../../../miniconda3/envs/rag/lib/python3.12/site-packages/crawl4ai/async_crawler_strategy.py):
Error: 'NoneType' object has no attribute 'info'
Code context:
1000 screenshot_data = await self.take_screenshot(
1001 page, screenshot_height_threshold=config.screenshot_height_threshold
1002 )
1003
1004 if screenshot_data or pdf_data or mhtml_data:
1005 → self.logger.info(
1006 message="Exporting media (PDF/MHTML/screenshot) took {duration:.2f}s",
1007 tag="EXPORT",
1008 params={"duration": time.perf_counter() - start_export_time},
1009 )
1010
Traceback (most recent call last):
  File "/home/edy/data/project/HAICHENG_QA/code/src/html2images.py", line 80, in <module>
    asyncio.run(main())
  File "/home/edy/miniconda3/envs/rag/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/edy/miniconda3/envs/rag/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/edy/miniconda3/envs/rag/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/edy/data/project/HAICHENG_QA/code/src/html2images.py", line 59, in main
    f.write(result.markdown)
TypeError: write() argument must be str, not None
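
The final TypeError is a secondary failure: the crawl had already failed, so result.markdown was None when it was written to disk. On the caller's side this can be avoided by skipping empty fields. A minimal sketch, where save_text is a hypothetical helper introduced here for illustration and not part of crawl4ai:

```python
from pathlib import Path
from typing import Optional


def save_text(path: str, content: Optional[str]) -> bool:
    """Write content to path; return False and write nothing if content is None."""
    if content is None:
        return False
    Path(path).write_text(content, encoding="utf-8")
    return True
```

In the script above this would be called as save_text("article.md", result.markdown), ideally after checking result.success, so that the logger error is the only exception reported.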