- update `config.yaml`
- update `params.yaml`
- update the entity
- update the configuration manager in `src/config`
- update the components
- update the pipeline
- update `main.py`
- update `app.py`
This project is a comprehensive End-to-End Text Summarizer built with Python and the Hugging Face Transformers library. It uses Google's Pegasus model, fine-tuned on the SAMSum dataset, to generate concise summaries of dialogue-based text. The project follows a modular pipeline design for scalability and ease of maintenance.
- Modular Pipeline: Distinct stages for Data Ingestion, Validation, Transformation, Model Training, and Evaluation.
- State-of-the-Art Model: Utilizes Google's Pegasus model for high-quality abstractive summarization.
- Configuration Management: Centralized configuration via `config.yaml` and `params.yaml`.
- Logging: Robust logging system for tracking pipeline execution.
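The configuration manager typically reads `config.yaml` and exposes each section as a typed entity object. A minimal sketch of that pattern, assuming PyYAML is installed (the class names, YAML keys, and the `source_URL` value below are illustrative, not the repository's exact API):

```python
from dataclasses import dataclass
from pathlib import Path

import yaml  # PyYAML

# Inline stand-in for config.yaml; in the project this is read from disk.
CONFIG_YAML = """
data_ingestion:
  root_dir: artifacts/data_ingestion
  source_URL: https://example.com/samsum.zip
"""


@dataclass(frozen=True)
class DataIngestionConfig:
    """Entity: typed view of the data_ingestion section."""
    root_dir: Path
    source_URL: str


class ConfigurationManager:
    def __init__(self, config_text: str = CONFIG_YAML):
        # Parse the YAML once; getters hand out typed config entities.
        self.config = yaml.safe_load(config_text)

    def get_data_ingestion_config(self) -> DataIngestionConfig:
        section = self.config["data_ingestion"]
        return DataIngestionConfig(
            root_dir=Path(section["root_dir"]),
            source_URL=section["source_URL"],
        )
```

Keeping parsing in one place means pipeline components receive plain dataclasses and never touch YAML directly.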
- Language: Python
- Libraries: Hugging Face Transformers, PyTorch, Datasets, Pandas, NLTK
- Tools: Docker, FastAPI (planned)
To get started, clone the repository to your local machine:
```bash
git clone https://github.com/krishnab0841/End_To_End_Text_Summarizer.git
cd End_To_End_Text_Summarizer
```

It is recommended to use a virtual environment to manage dependencies:

```bash
conda create -n summary python=3.8 -y
conda activate summary
```

Install the required Python packages:

```bash
pip install -r requirements.txt
```

Execute the main script to run the entire training and evaluation pipeline:

```bash
python main.py
```

- `config/`: Configuration files.
- `src/`: Source code for components and pipelines.
- `research/`: Jupyter notebooks for experimentation.
- `artifacts/`: Generated artifacts (datasets, models, metrics).
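`main.py` usually just orchestrates the pipeline stages in order, wrapping each one in logging and error handling. A minimal sketch of that orchestration (the stage and class names here are illustrative placeholders, not the repository's exact API):

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s] %(levelname)s: %(message)s",
)
logger = logging.getLogger("textSummarizer")


class DataIngestionPipeline:
    def main(self):
        logger.info("Downloading and extracting the dataset...")


class ModelTrainerPipeline:
    def main(self):
        logger.info("Fine-tuning Pegasus on the prepared dataset...")


# Stages run strictly in order; a failure aborts the run.
STAGES = [
    ("Data Ingestion", DataIngestionPipeline),
    ("Model Trainer", ModelTrainerPipeline),
]


def run_all():
    completed = []
    for name, pipeline_cls in STAGES:
        try:
            logger.info(f">>> Stage {name} started <<<")
            pipeline_cls().main()
            logger.info(f">>> Stage {name} completed <<<")
            completed.append(name)
        except Exception:
            logger.exception(f"Stage {name} failed")
            raise
    return completed


if __name__ == "__main__":
    run_all()
```

Re-raising after logging keeps the traceback visible in the console while still recording which stage failed.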