A performant, extensible, and lean web crawler that uses all available CPUs by default.
It uses an event loop for I/O and separate processes for analyzing pages.
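
The split between event-loop I/O and process-based analysis can be sketched with the standard library alone. This is a minimal illustration, not the crawler's actual code: `download`, `analyze`, `crawl_one`, and `crawl` are hypothetical names, and the download is simulated with `asyncio.sleep` instead of a real HTTP request.

```python
import asyncio
from concurrent.futures import Executor, ProcessPoolExecutor


async def download(url: str) -> str:
    """Hypothetical stand-in for an HTTP request; awaits like network I/O would."""
    await asyncio.sleep(0.01)
    return f"<html><title>{url}</title></html>"


def analyze(html: str) -> int:
    """Hypothetical CPU-bound step; runs in a pool worker, off the event loop."""
    return len(html)


async def crawl_one(url: str, pool: Executor) -> tuple[str, int]:
    # I/O stays on the event loop.
    html = await download(url)
    # Analysis is handed to the executor so the loop keeps serving other downloads.
    loop = asyncio.get_running_loop()
    size = await loop.run_in_executor(pool, analyze, html)
    return url, size


async def crawl(urls: list[str], pool: Executor) -> dict[str, int]:
    # Run all downloads concurrently and collect the per-URL analysis results.
    results = await asyncio.gather(*(crawl_one(u, pool) for u in urls))
    return dict(results)


if __name__ == "__main__":
    urls = ["https://example.com/a", "https://example.com/b"]
    # With no max_workers argument, the pool size is based on the machine's CPU count.
    with ProcessPoolExecutor() as pool:
        print(asyncio.run(crawl(urls, pool)))
```

Passing the executor in as a parameter keeps the sketch easy to test: a thread pool can replace the process pool without touching the crawl logic.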

Features:

- Basic `httpx` page downloader
- `S3` page storage
- Local filesystem page storage

Getting started:

- Have a look at `tests/integration/test_crawl.py`
- Implement your own `PageAnalyzer` and `PageDownloader` classes
- Optionally customize `structlog` logging, see configuration
- Have fun!

All classes in the modules folder can be replaced with your custom implementation.
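
As a rough sketch of what a replacement might look like, the example below defines two small interfaces and a custom implementation of each. The method names and signatures (`download`, `analyze`) are assumptions made for illustration; the project's real `PageDownloader` and `PageAnalyzer` base classes may look different, so check the modules folder before copying this. `CannedDownloader`, `LinkAnalyzer`, and `process` are hypothetical names.

```python
import asyncio
import re
from typing import Protocol


class PageDownloader(Protocol):
    """Assumed interface (hypothetical): fetch a page body for a URL."""

    async def download(self, url: str) -> str: ...


class PageAnalyzer(Protocol):
    """Assumed interface (hypothetical): extract data from a downloaded page."""

    def analyze(self, url: str, html: str) -> list[str]: ...


class CannedDownloader:
    """Custom downloader that serves pages from memory, e.g. for tests."""

    def __init__(self, pages: dict[str, str]) -> None:
        self.pages = pages

    async def download(self, url: str) -> str:
        # Unknown URLs yield an empty page rather than raising.
        return self.pages.get(url, "")


class LinkAnalyzer:
    """Custom analyzer that collects href values with a simple regex."""

    HREF = re.compile(r'href="([^"]+)"')

    def analyze(self, url: str, html: str) -> list[str]:
        return self.HREF.findall(html)


async def process(url: str, downloader: PageDownloader, analyzer: PageAnalyzer) -> list[str]:
    # The caller depends only on the interfaces, so either part can be swapped.
    html = await downloader.download(url)
    return analyzer.analyze(url, html)
```

A call such as `asyncio.run(process(url, CannedDownloader(pages), LinkAnalyzer()))` then runs the pipeline with both custom parts plugged in.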