A Python tool to scrape, clean, and analyze the complete database of anime songs from Aniplaylist.com.
This project uses the site's Algolia API and works around the standard pagination limit (1,000 hits per query), allowing the entire dataset (25,000+ songs) to be extracted. It includes tools for data cleaning and visualization.
- Partitioning: Automatically splits queries by 'Season' and 'Song Type' to bypass API limits.
- Resumable Scraping: Saves data in chunks (/aniplaylist_chunks). If the script crashes, it resumes exactly where it left off.
- Data Flattening: Converts nested JSON (Spotify links, artist arrays) into a clean, flat CSV format.
- Analysis & Viz: Includes a Jupyter Notebook for generating insights and charts.
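The partitioning and resume behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in scrape_aniplaylist.py: the function names, the one-file-per-partition layout, and the `fetch_hits` callback (standing in for the real Algolia request) are all assumptions made for the example.

```python
import itertools
import json
from pathlib import Path


def build_partitions(seasons, song_types):
    """Return one (season, song_type) pair per sub-query.

    Each pair is meant to match fewer hits than the pagination cap,
    so the union of all partitions covers the full dataset.
    """
    return list(itertools.product(seasons, song_types))


def chunk_path(chunk_dir, season, song_type):
    """Map a partition to a stable file name so finished work can be detected."""
    safe = f"{season}__{song_type}".replace(" ", "_").replace("/", "-")
    return Path(chunk_dir) / f"{safe}.json"


def scrape_partitions(partitions, fetch_hits, chunk_dir="aniplaylist_chunks"):
    """Fetch each partition once, skipping any whose chunk file already exists.

    fetch_hits is a hypothetical callable (season, song_type) -> list of dicts.
    Returns the partitions that were actually fetched in this run.
    """
    Path(chunk_dir).mkdir(parents=True, exist_ok=True)
    fetched = []
    for season, song_type in partitions:
        path = chunk_path(chunk_dir, season, song_type)
        if path.exists():
            continue  # already saved by an earlier run
        hits = fetch_hits(season, song_type)
        # Write to a temporary file first so a crash never leaves a
        # half-written chunk that would look complete on the next run.
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(hits), encoding="utf-8")
        tmp.replace(path)
        fetched.append((season, song_type))
    return fetched
```

Because a chunk file only appears after its partition has been fully written, rerunning the same call after an interruption picks up at the first partition without a file.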
- Clone the repository:
    git clone https://github.com/TheInternetUse7/aniplaylist-scraper.git
    cd aniplaylist-scraper
- Install requirements:
    pip install pandas requests matplotlib seaborn
Use this if you just want the raw data (.csv).
python scrape_aniplaylist.py
- This will create a folder aniplaylist_chunks containing partial downloads.
- Upon completion, it will generate aniplaylist_complete_dump.csv.
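As a rough illustration of how partial downloads could be flattened and merged into a single CSV, here is a small standard-library sketch. The record fields (`title`, `anime`, `artists`, `links`) and the JSON chunk format are assumptions chosen for the example; the actual API response and the script's own implementation may differ.

```python
import csv
import json
from pathlib import Path

# Hypothetical column set for the flat output file.
FIELDS = ["title", "anime", "artists", "spotify_url"]


def flatten_hit(hit):
    """Flatten one nested record into a single-level dict of strings."""
    artists = hit.get("artists") or []
    links = hit.get("links") or {}
    return {
        "title": hit.get("title", ""),
        "anime": hit.get("anime", ""),
        # Join the artist array into one delimited cell.
        "artists": "; ".join(a.get("name", "") for a in artists),
        # Pull the nested Spotify link up to a top-level column.
        "spotify_url": links.get("spotify", ""),
    }


def merge_chunks(chunk_dir, out_csv):
    """Read every JSON chunk in chunk_dir and write one flat CSV.

    Returns the number of data rows written.
    """
    rows = []
    for path in sorted(Path(chunk_dir).glob("*.json")):
        hits = json.loads(path.read_text(encoding="utf-8"))
        rows.extend(flatten_hit(h) for h in hits)
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `merge_chunks("aniplaylist_chunks", "aniplaylist_complete_dump.csv")` would then produce one row per song, with missing nested fields left as empty strings.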
Use this if you want to scrape, analyze, and generate charts interactively.
- Open aniplaylist_notebook.ipynb in JupyterLab, VS Code, or Google Colab.
- Run the cells sequentially.
This repository is for educational and analytical purposes only.
All data belongs to Aniplaylist.com or their respective copyright holders. I do not claim ownership of the data provided by the API.



