About me:
I am a university professor specialising in Arabic Natural Language Processing and computational linguistics. I lead a long-term research programme on the computational analysis of Prophetic Hadith, covering lexical structure, stylometric fingerprinting, and transmission history — with publications in progress.
Email: moataz@cu.edu.eg
Purpose:
I am building a structured Hadith corpus for use across multiple research tracks, including:
- Lexical and stylometric analysis of the nine canonical collections (Kutub al-Tisʿa) as a unified corpus
- Computational investigation of transmission variation and riwāyah bi-l-maʿnā at the matn level
- Arabic historical lexicography and register analysis
- Future work involving both Arabic text and English translations
The common requirement across all these tracks is access to hadith data with isnād and matn structurally separated at the record level. Plain-text editions do not provide this distinction, and it is methodologically essential for isolating transmission-formula vocabulary from hadith content vocabulary.
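To illustrate the record-level separation described here, the following is a minimal Python sketch. The class name `HadithRecord`, its field names, and the helper `vocabulary_counts` are hypothetical and do not reflect any provider's actual schema. The sample tokens are placeholders, not real hadith text, and whitespace splitting stands in for proper Arabic tokenisation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class HadithRecord:
    """Hypothetical record shape: isnad and matn stored as separate fields."""
    collection: str
    number: str
    isnad: str  # chain of transmission
    matn: str   # body text of the report


def vocabulary_counts(records):
    """Count tokens separately for the isnad and matn fields.

    Whitespace tokenisation is a simplification for illustration only.
    """
    isnad_counts = Counter()
    matn_counts = Counter()
    for rec in records:
        isnad_counts.update(rec.isnad.split())
        matn_counts.update(rec.matn.split())
    return isnad_counts, matn_counts


# Placeholder tokens only; no real hadith text is reproduced here.
sample = [
    HadithRecord("example", "1", "haddathana A ʿan B", "w1 w2 w1"),
    HadithRecord("example", "2", "haddathana C ʿan B", "w2 w3"),
]
isnad_vocab, matn_vocab = vocabulary_counts(sample)
```

With the two fields kept apart, transmission formulae such as `haddathana` are counted only in `isnad_vocab` and never contaminate the matn statistics.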
===================================
Offline dump or API:
An offline dump in structured format (JSON or equivalent) is strongly preferred. It eliminates repeated server requests, guarantees data completeness and consistency across the full corpus, and is better suited to the batch processing this research requires. I would be grateful for a dump covering all available collections in both Arabic and English.
===================================
API rate limits:
Maximum requests per second: 2
Maximum requests per day: 2,000
(Single-run collection; will not be repeated after the initial dump is complete)
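Should API access be granted instead of a dump, the limits above could be enforced on the client side roughly as follows. This is a sketch under stated assumptions: the `Throttle` class is hypothetical, the "day" is treated as a 24-hour window starting at the first request, and no real endpoint is called.

```python
import time


class Throttle:
    """Hypothetical client-side limiter: N requests/second, M requests/day."""

    def __init__(self, per_second=2, per_day=2000,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / per_second
        self.per_day = per_day
        self.clock = clock    # injectable for testing
        self.sleep = sleep    # injectable for testing
        self.last = None      # time of the previous request
        self.day_start = None
        self.used_today = 0

    def wait(self):
        """Block until the next request is allowed, or raise if the
        daily budget is spent."""
        now = self.clock()
        # Assumption: a "day" is a 24-hour window from the first request.
        if self.day_start is None or now - self.day_start >= 86400:
            self.day_start = now
            self.used_today = 0
        if self.used_today >= self.per_day:
            raise RuntimeError("daily request budget exhausted")
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
        self.used_today += 1
```

A collection script would call `throttle.wait()` immediately before each request, and stop and resume the next day when `RuntimeError` is raised.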
===================================
Languages:
Arabic and English
===================================
Programming language:
Python