6 changes: 3 additions & 3 deletions README.md
@@ -1,6 +1,6 @@
# Nextdoor Scraper

At the time of development, Nextdoor could not be scraped easily using more traditional methods (e.g. Scrapy, Beautiful Soup) because requests to retrieve the next set of posts use a "random" number as a parameter.

Thus, this is a simple Python script that uses Selenium to simulate user input and scrape relevant data from nextdoor.com. It uses ChromeDriver (included in this repo) to drive the browser.
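The flow above can be sketched roughly as follows. This is not the repo's actual script: the Selenium half is shown as comments because it needs a live browser, and the `.post` class name is a placeholder, not Nextdoor's real markup.

```python
# Sketch of the scraping flow (assumptions: selenium installed, chromedriver
# on PATH, and a placeholder "post" CSS class standing in for the real one).
#
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://nextdoor.com/news_feed/")
#   driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
#   html = driver.page_source          # rendered HTML after scrolling
#
# Once the rendered HTML is in hand, post text can be pulled out with the
# standard-library parser:
from html.parser import HTMLParser

class PostExtractor(HTMLParser):
    """Collect the text inside elements whose class list contains 'post'."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # nesting depth inside the current matching element
        self.posts = []

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class") or ""
        if self.depth:
            self.depth += 1            # a tag nested inside a post
        elif "post" in classes.split():
            self.depth = 1
            self.posts.append("")      # start collecting a new post

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.posts[-1] += data
```

Feeding `driver.page_source` into `PostExtractor().feed(...)` would then yield the collected post text; the repo's real selectors may differ.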

@@ -15,8 +15,8 @@ Once a virtual environment is built, `pip install -r requirements.txt` must be run
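The environment setup mentioned above might look like this (assuming a Unix-like shell and Python 3; the `.venv` directory name is a convention, not required by the repo):

```shell
# Create and activate a virtual environment, then install dependencies
# (run from inside the Nextdoor_Scraper directory, where requirements.txt lives)
python3 -m venv .venv
. .venv/bin/activate             # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
```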
Feel free to fork this repo and make it your own! This was just a personal project of mine, but if it is useful to anyone else, I'm happy to share this project. If you'd like to use it as is:

1. Clone the repository into a directory of your choosing.
2. Create your own `.env` file and fill out the variables.
3. Open a command prompt, navigate to the Nextdoor_Scraper directory, and run one of the following:
* `python nextdoor.py` if you don't want to save the HTML files separately (as a backup in case of failure)
* `python html_saver.py` to save the HTML files, then `python html_scraper.py` to scrape the local files (more stable for longer scrapes, since the files are saved first)
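The `.env` file in step 2 holds the script's configuration. A hypothetical example is below; the variable names are placeholders, so check the script for the names it actually reads:

```shell
# Hypothetical .env contents -- names are placeholders, not necessarily
# the variables the script reads
NEXTDOOR_EMAIL=you@example.com
NEXTDOOR_PASSWORD=your-password
```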