# Contributing to Medium Scraper

Thank you for your interest in contributing to Medium Scraper! This guide will help you get started.

## Prerequisites
- Python 3.10 or higher
- Git
- UV (recommended package manager) or pip
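If you are unsure whether your interpreter is recent enough, a small version check can tell you; this is an illustrative sketch, and the `meets_requirement` helper is not part of the project:

```python
import sys

MINIMUM = (3, 10)

def meets_requirement(version_info, minimum=MINIMUM) -> bool:
    """True when the interpreter's (major, minor) is at least the required minimum."""
    return tuple(version_info[:2]) >= minimum

# Example: check the running interpreter
supported = meets_requirement(sys.version_info)
```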
## Development Setup

```bash
# Clone the repository
git clone https://github.com/BehindTheStack/medium-scrap.git
cd medium-scrap

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate   # Linux/Mac
# or
.venv\Scripts\activate      # Windows

# Install dependencies
pip install -e .
```

## Architecture

The project follows Clean Architecture and Domain-Driven Design principles:
```
src/
├── domain/              # Business rules
│   ├── entities/        # Core entities (Post, Author, etc.)
│   ├── repositories/    # Repository interfaces
│   └── services/        # Domain services
├── application/         # Use cases
│   └── use_cases/       # Business logic orchestration
├── infrastructure/      # Technical implementations
│   ├── adapters/        # External API adapters
│   ├── config/          # Configuration management
│   └── external/        # Repositories and integrations
└── presentation/        # User interface
    └── cli.py           # Command line interface
```
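To illustrate how the layers depend on each other (the domain defines interfaces, the infrastructure implements them, and the application layer orchestrates through the interface), here is a minimal sketch; all names (`PostRepository`, `ListPostsUseCase`, `InMemoryPostRepository`) are hypothetical stand-ins, not the project's actual classes:

```python
from dataclasses import dataclass
from typing import List, Protocol

# domain/entities: a core entity with no external dependencies
@dataclass
class Post:
    id: str
    title: str

# domain/repositories: an interface owned by the domain layer
class PostRepository(Protocol):
    def list_posts(self) -> List[Post]: ...

# application/use_cases: orchestrates domain objects through the interface
class ListPostsUseCase:
    def __init__(self, repository: PostRepository) -> None:
        self.repository = repository

    def execute(self) -> List[str]:
        return [post.title for post in self.repository.list_posts()]

# infrastructure: a concrete implementation (in-memory here, an API adapter in practice)
class InMemoryPostRepository:
    def list_posts(self) -> List[Post]:
        return [Post(id="1", title="Hello")]

use_case = ListPostsUseCase(InMemoryPostRepository())
```

The key property is that the inner layers never import from the outer ones; the use case only knows the `PostRepository` interface.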
## Tests

The project has an organized test suite:

```bash
# All organized tests
pytest tests/unit/ tests/integration/ -v

# Unit tests only
pytest tests/unit/ -v

# Integration tests only
pytest tests/integration/ -v

# With coverage
pytest tests/unit/ tests/integration/ --cov=src --cov-report=html
```

- Unit tests (`tests/unit/`): test components in isolation
- Integration tests (`tests/integration/`): test complete flows
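As an illustration of the unit/integration split, a unit test replaces external collaborators with fakes so it stays isolated; the names in this sketch (`FakeRepository`, `count_posts`) are hypothetical, not actual project code:

```python
# tests/unit/test_example.py (illustrative)
class FakeRepository:
    """Stands in for a real API-backed repository so the test stays isolated."""
    def list_posts(self):
        return [{"title": "A"}, {"title": "B"}]

def count_posts(repository) -> int:
    """Hypothetical function under test."""
    return len(repository.list_posts())

def test_count_posts_uses_repository():
    assert count_posts(FakeRepository()) == 2
```

An integration test, by contrast, would exercise the real adapters end to end.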
## Code Style

- Follow PEP 8
- Use type hints whenever possible
- Write docstrings following the Google style
"""
Module docstring explaining the purpose
"""
from typing import List, Optional
from dataclasses import dataclass
@dataclass
class ExampleEntity:
"""
Example entity following domain patterns
"""
id: str
name: str
optional_field: Optional[str] = None
def validate(self) -> None:
"""Validate entity rules"""
if not self.id:
raise ValueError("ID is required")We use Conventional Commits:
```
feat: add support for custom domains
fix: resolve pagination issue in API adapter
docs: update README with new features
test: add integration tests for scraping
refactor: improve error handling in CLI
style: format code according to PEP 8
```
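A commit subject can be checked against the Conventional Commits shape with a simple pattern; this regex is an illustrative sketch covering the types listed above, not an official grammar:

```python
import re

# type(optional scope)!: description — the common Conventional Commits form
PATTERN = re.compile(r"^(feat|fix|docs|test|refactor|style|chore)(\([\w-]+\))?!?: .+")

def is_conventional(subject: str) -> bool:
    """True when the commit subject matches the Conventional Commits pattern."""
    return bool(PATTERN.match(subject))
```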
## Issues

### Reporting Bugs

- Use the bug issue template
- Include steps to reproduce
- Provide environment information
- Add error logs when possible

### Requesting Features

- Use the feature issue template
- Explain the use case
- Provide examples of how it would be used
- Consider architecture impacts
### Contributing a Feature

1. Create an issue discussing the feature
2. Fork the repository
3. Create a specific branch: `feature/feature-name`
4. Implement it following the existing architecture
5. Add tests (unit and/or integration)
6. Update the documentation if necessary
7. Open a Pull Request
### Fixing a Bug

1. Create an issue describing the bug
2. Fork the repository
3. Create a branch: `fix/bug-name`
4. Fix the bug
5. Add a test that reproduces the bug and validates the fix
6. Open a Pull Request
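Step 5 above, adding a test that reproduces the bug, usually means encoding the failing input as an assertion; here is an illustrative sketch with a hypothetical `slugify` helper (not actual project code):

```python
def slugify(title: str) -> str:
    """Hypothetical helper: lowercase the title and join words with hyphens."""
    return "-".join(title.lower().split())

def test_slugify_collapses_repeated_spaces():
    # This input is imagined to have produced "hello--world" before the fix.
    assert slugify("Hello  World") == "hello-world"
```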
### Adding a New Source

Add an entry to `medium_sources.yaml`:

```yaml
new-publication:
  type: publication       # or "username" for user profiles
  name: domain.com        # or @username
  description: "Publication description"
  auto_discover: true
  custom_domain: true     # only for custom domains
```

You can also add or update sources with the built-in `add-source` CLI subcommand, which is useful for registering a publication without editing the YAML file by hand.
Example:

```bash
python main.py add-source \
  --key pinterest \
  --type publication \
  --name pinterest \
  --description "Pinterest Engineering" \
  --auto-discover
```

Notes:

- The command writes to `medium_sources.yaml` in the repository root and will create the `sources` section if it does not exist.
- The implementation avoids importing optional network adapters when running this subcommand, so it can run even if HTTP dependencies (like `httpx`) are not installed.
- After running `add-source`, verify the changes with `python main.py --list-sources` or by inspecting `medium_sources.yaml`.
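Once loaded, each source entry is just a mapping; a minimal validation sketch (the `validate_source` helper and its rules are hypothetical, not part of the project):

```python
REQUIRED_KEYS = {"type", "name", "description"}
VALID_TYPES = {"publication", "username"}

def validate_source(entry: dict) -> list:
    """Return a list of problems found in one source entry (empty when valid)."""
    problems = []
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if entry.get("type") not in VALID_TYPES:
        problems.append(f"unknown type: {entry.get('type')!r}")
    return problems

entry = {
    "type": "publication",
    "name": "domain.com",
    "description": "Publication description",
    "auto_discover": True,
}
```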
### Custom Publications

For publications with specific logic, add the configuration to the repository:

```python
# In src/infrastructure/external/repositories.py
def _load_predefined_publications(self):
    # Add your custom configuration
    new_config = PublicationConfig(
        id=PublicationId("new-pub"),
        name="New Publication",
        type=PublicationType.CUSTOM_DOMAIN,
        domain="domain.com",
        graphql_url="https://domain.com/_/graphql",
        known_post_ids=[]
    )
```

```bash
# Test with a known publication
python main.py --publication netflix --limit 5 --format table --skip-session

# Test with a configured source
python main.py --source netflix --limit 3 --format json

# Test with a custom domain
python main.py --publication example.com --limit 5 --skip-session

# Run the test suite
pytest tests/integration/test_comprehensive_scenarios.py -v

# Tests specific to your changes
pytest tests/unit/test_[your_module].py -v
```

### Pull Request Checklist

- Code follows project standards
- Tests added/updated
- Documentation updated
- Commits follow Conventional Commits
- Branch is up to date with main
- No merge conflicts
### PR Template

```markdown
## Description
Brief description of changes

## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Documentation
- [ ] Refactoring

## How to Test
1. Steps to test the change
2. Specific commands
3. Expected results

## Checklist
- [ ] Tests passing
- [ ] Code reviewed
- [ ] Documentation updated
```

### Main Entities

```python
@dataclass
class Post:
    """Represents a Medium post"""

    id: PostId
    title: str
    slug: str
    author: Author
    published_at: datetime
    reading_time: float


@dataclass
class PublicationConfig:
    """Publication configuration"""

    id: PublicationId
    name: str
    type: PublicationType
    domain: str
    graphql_url: str
    known_post_ids: List[PostId]
```

### Debugging and Logs

#### Local Debugging

```bash
# Enable verbose logs (if implemented)
python main.py --publication netflix --limit 5 --verbose

# Use Python's debugger
python -m pdb main.py --publication netflix --limit 5
```

#### Logging

```python
import logging

logger = logging.getLogger(__name__)
logger.info("Important information")
logger.debug("Debug details")
logger.warning("Warning about something")
logger.error("Recoverable error")
```

### Performance Considerations

- Rate limiting: respect Medium API limits
- Caching: consider caching data that does not change
- Pagination: implement efficient pagination
- Error handling: handle errors gracefully

```python
import time

def with_rate_limit(self, delay: float = 0.5):
    """Apply rate limiting between requests"""
    time.sleep(delay)
    # Your logic here
```

### Useful Dependencies

- Rich Library - User interface
- Click - CLI framework
- Pytest - Testing framework
### Tooling

```bash
# Code formatting
black src/ tests/

# Linting
flake8 src/ tests/

# Type checking
mypy src/
```

### Getting Help

- Issues: for bugs and feature requests
- Discussions: For general questions
- Wiki: Additional documentation
Before opening an issue:

- Check whether the problem has already been reported
- Use the appropriate issue template
- Provide as much context as possible
- Include versions and environment details
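For the last item, versions and environment, a short snippet can collect the basics to paste into the issue; this is a sketch, and the report format is arbitrary:

```python
import platform
import sys

def environment_report() -> str:
    """Gather basic version/environment details for a bug report."""
    return "\n".join([
        f"Python: {platform.python_version()}",
        f"Implementation: {platform.python_implementation()}",
        f"OS: {platform.system()} {platform.release()}",
        f"Executable: {sys.executable}",
    ])

print(environment_report())
```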
### License

By contributing to this project, you agree that your contributions will be licensed under the same MIT License as the project.
- ✅ Your contributions can be used commercially
- ✅ They can be modified and redistributed
- ✅ You retain copyright of your original contributions
⚠️ You guarantee you have the right to license your contributions
No separate CLA signing is required. The MIT license is sufficient and clear about rights and responsibilities.
### Code of Conduct

- Be respectful with other contributors
- Keep discussions constructive in issues and PRs
- Document your changes adequately
- Test before submitting changes
Thank you for contributing to make Medium Scraper even better! 🚀
Need help? Open an issue or start a discussion. We're here to help! 😊