An end-to-end automated data pipeline engineered to extract raw textual data from remote sources, sanitize the ingestion stream, and interact directly with a Large Language Model to synthesize structured podcast scripts.
This system is written in 100% native Python with strictly no external library requirements. It adheres to a zero-`pip install` policy, relying exclusively on the Python standard library for all operations, including network requests, data sanitization, and JSON serialization.
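As an illustration of the standard-library-only approach, fetching a remote page with a spoofed browser `User-Agent` needs nothing beyond `urllib.request`. This is a minimal sketch; the function name and the exact User-Agent string are assumptions, not the repository's code:

```python
import urllib.request

def fetch_raw(url: str) -> str:
    # Spoof a browser User-Agent so naive bot filters serve the page.
    req = urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        # Decode using the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```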
The architecture is divided into a 4-tier modular structure:
- `text_extractor.py`: The ingestion layer, responsible for executing HTTP GET requests, spoofing the HTTP `User-Agent` header to bypass basic bot protection, and providing robust I/O failure handling during data persistence.
- `data_cleaner.py`: The sanitization engine, utilizing class-level regex pre-compilation for optimal CPU execution and lazy-loading iterators (`re.finditer`) for strict RAM optimization to prevent out-of-memory (OOM) faults.
- `script_generator.py`: The AI orchestration layer, executing raw REST API POST requests directly against the Gemini 2.5 Flash endpoint. It constructs the JSON payload, deserializes the response, and enforces robust API exception handling.
- `main_pipeline.py`: The centralized execution entry point orchestrating the sequence of operations. It employs absolute path resolution via `pathlib` for environment-agnostic execution and enforces fail-fast logic, instantly terminating the process (`sys.exit(1)`) upon any component failure.
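The cleaner's two optimizations (class-level regex pre-compilation and lazy iteration via `re.finditer`) can be sketched as follows; the class and pattern names here are illustrative, not the repository's actual API:

```python
import re

class DataCleaner:
    # Class-level pre-compilation: each pattern is compiled once per
    # process, not once per call.
    TAG_RE = re.compile(r"<[^>]+>")
    WS_RE = re.compile(r"\s+")

    @classmethod
    def clean(cls, text: str) -> str:
        # Strip HTML tags, then collapse runs of whitespace.
        stripped = cls.TAG_RE.sub(" ", text)
        return cls.WS_RE.sub(" ", stripped).strip()

    @staticmethod
    def iter_tokens(text: str):
        # finditer yields matches lazily instead of materializing a list,
        # keeping memory flat even on very large inputs.
        for match in re.finditer(r"\S+", text):
            yield match.group()
```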
The execution lifecycle follows a linear procedural pipeline:

1. Network Ingestion: The system initializes by acquiring the target URL via CLI arguments; `text_extractor.py` retrieves the raw payload and writes it to `data/input/raw.txt`.
2. Sanitization Stream: `data_cleaner.py` applies pre-compiled regular expressions to strip HTML tags and normalize whitespace. The optimized output is written to `data/input/clean.txt`.
3. AI Generation: `script_generator.py` transmits the sanitized payload via a POST request to the Google API gateway.
4. Output Persistence: The resulting Markdown script is safely persisted to `data/output/podcast_script.md`.
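The AI generation step can be sketched with the stdlib only, as below. The endpoint path and payload schema are assumed from the public Generative Language REST API; the repository's actual request construction may differ:

```python
import json
import urllib.error
import urllib.request

# Assumed public endpoint format for Gemini 2.5 Flash.
API_URL = ("https://generativelanguage.googleapis.com/v1beta/models/"
           "gemini-2.5-flash:generateContent")

def build_request(clean_text: str, api_key: str) -> urllib.request.Request:
    # Construct the JSON payload by hand: no SDK, stdlib json only.
    payload = json.dumps({
        "contents": [{"parts": [{"text": clean_text}]}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API_URL}?key={api_key}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate_script(clean_text: str, api_key: str) -> str:
    try:
        req = build_request(clean_text, api_key)
        with urllib.request.urlopen(req, timeout=120) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # Robust API exception handling: surface the error body,
        # not just the bare status code.
        raise RuntimeError(f"Gemini API error {exc.code}: {exc.read().decode()}") from exc
    # Deserialize the first candidate's text from the response structure.
    return body["candidates"][0]["content"]["parts"][0]["text"]
```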
```
.
├── data/
│   ├── input/
│   │   ├── .gitkeep
│   │   ├── clean.txt
│   │   └── raw.txt
│   └── output/
│       ├── .gitkeep
│       └── podcast_script.md
├── src/
│   ├── check_models.py
│   ├── data_cleaner.py
│   ├── main_pipeline.py
│   ├── script_generator.py
│   └── text_extractor.py
├── .gitignore
└── README.md
```
```bash
# Clone the repository
git clone https://github.com/idkBsy/native-python-ai-orchestrator.git

# Execute the central orchestrator pipeline
python native-python-ai-orchestrator/src/main_pipeline.py --url "<TARGET_URL>" --api-key "<GEMINI_KEY>"

# Execute the diagnostic probe to verify API access and model authorization
python native-python-ai-orchestrator/src/check_models.py --api-key "<GEMINI_KEY>"
```

CRITICAL WARNING: CREDENTIAL LEAKAGE. Under no circumstances should production authentication tokens be committed to version control. Gemini API keys grant access to sensitive network resources.
- API keys must be injected exclusively via command-line interface (CLI) arguments.
- API keys must never be hardcoded into source files or stored in unencrypted environment configurations.
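A minimal sketch of CLI-only key injection with the stdlib `argparse` module. The flag names mirror the usage shown above, but the actual parser in `main_pipeline.py` may differ:

```python
import argparse

def parse_args(argv=None):
    # The key is accepted only as a runtime CLI argument, never read
    # from a source file or a checked-in configuration.
    parser = argparse.ArgumentParser(description="Native Python AI orchestrator")
    parser.add_argument("--url", required=True, help="Target URL to ingest")
    parser.add_argument("--api-key", required=True,
                        help="Gemini API key, injected at invocation time")
    return parser.parse_args(argv)
```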