This document provides context for AI coding assistants (Claude, GPT, Copilot, Cursor, etc.) working with the ryandata-address-utils codebase.
Package: ryandata-address-utils
Purpose: US address parsing library with Pydantic models, validation, and pandas integration
Python Version: 3.12+ required (<=3.13)
License: MIT

Design patterns:

- Facade Pattern: `AddressService` provides a unified interface to parsers, validators, and data sources
- Protocol-based Interfaces: Uses Python `Protocol` classes instead of ABCs for loose coupling
- Factory Pattern: `ParserFactory`, `DataSourceFactory` for extensible component creation
- Composite Pattern: `CompositeValidator` chains multiple validators
- Builder Pattern: `AddressBuilder` for fluent address construction

Key dependencies:

- `pydantic>=2.0.0` - Data validation and serialization
- `usaddress>=0.5.16` - Default US address parser backend
- `abstract-validation-base` - ProcessLog system for transformation tracking
- `typer` + `trogon` - CLI with interactive TUI

| File | Purpose |
|---|---|
| `src/ryandata_address_utils/__init__.py` | Public API exports - check here for available symbols |
| `src/ryandata_address_utils/service.py` | AddressService facade - main entry point |
| `src/ryandata_address_utils/models/address.py` | Address Pydantic model with 26+ fields |
| `src/ryandata_address_utils/models/results.py` | ParseResult, ZipInfo dataclasses |
| `src/ryandata_address_utils/protocols.py` | Protocol definitions for extensibility |
| `src/ryandata_address_utils/parsers/` | Parser implementations (usaddress, libpostal) |
| `src/ryandata_address_utils/validation/` | Validators (ZIP, state, composite) |
| `src/ryandata_address_utils/data/` | Data sources (CSV-backed ZIP database) |
| `src/ryandata_address_utils/core/` | Shared utilities (formatters, tracking, errors) |

- Formatter: Ruff (`ruff format`)
- Linter: Ruff with `E, F, I, UP, B, SIM` rule sets
- Type Checker: MyPy in strict mode (`disallow_untyped_defs = true`)
- Line Length: 100 characters

```python
from pydantic import AliasChoices, Field

# Use Field() with descriptions for all model fields
field_name: str | None = Field(
    default=None,
    description="Clear description of the field",
    validation_alias=AliasChoices("FieldName", "alias"),
)
```

```python
from typing import Protocol

# Define protocols in protocols.py
class ValidatorProtocol(Protocol):
    def validate(self, address: Address) -> ValidationResult: ...

# Implementations satisfy protocols implicitly
class ZipCodeValidator:
    def validate(self, address: Address) -> ValidationResult:
        # Implementation
        ...
```

```python
# Models inherit from RyanDataValidationBase which provides process_log
address.add_cleaning_process(
    field="StateName",
    original_value="Texas",
    new_value="TX",
    reason="Normalized state name to abbreviation",
)
```

- Use `RyanDataAddressError` for address-specific errors
- Use `RyanDataValidationError` for validation failures
- All errors include package context via `PACKAGE_NAME`

To add a new validator:

- Create class in `validation/validators.py`
- Implement `ValidatorProtocol` (must have `validate(address) -> ValidationResult`)
- Register with `CompositeValidator` if needed

```python
class MyValidator:
    def validate(self, address: Address) -> ValidationResult:
        errors = []
        # validation logic
        return ValidationResult(is_valid=len(errors) == 0, errors=errors)
```

To add a new parser:

- Create class in `parsers/` implementing `AddressParserProtocol`
- Register with `ParserFactory.register("name", MyParser)`
- Use via `AddressService(parser=ParserFactory.create("name"))`
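
The register-then-create flow in the parser steps above follows a common registry-based factory shape, sketched here in isolation. `SketchParser`, `SketchParserFactory`, and `CommaSplitParser` are hypothetical names invented for the example; the real `ParserFactory` and `AddressParserProtocol` may differ in signatures and return types.

```python
from typing import Callable, Protocol


class SketchParser(Protocol):
    """Hypothetical stand-in for AddressParserProtocol."""

    def parse(self, text: str) -> dict[str, str]: ...


class SketchParserFactory:
    """Hypothetical registry-based factory; not the package's ParserFactory."""

    _registry: dict[str, Callable[[], SketchParser]] = {}

    @classmethod
    def register(cls, name: str, parser_cls: Callable[[], SketchParser]) -> None:
        cls._registry[name] = parser_cls

    @classmethod
    def create(cls, name: str) -> SketchParser:
        if name not in cls._registry:
            raise ValueError(f"unknown parser: {name!r}")
        # Instantiate lazily so each caller gets a fresh parser.
        return cls._registry[name]()


class CommaSplitParser:
    """Toy parser: splits 'street, city' on the first comma."""

    def parse(self, text: str) -> dict[str, str]:
        street, _, city = text.partition(",")
        return {"street": street.strip(), "city": city.strip()}


SketchParserFactory.register("comma", CommaSplitParser)
parser = SketchParserFactory.create("comma")
parsed = parser.parse("123 Main St, Austin")
print(parsed)
```

Keeping the registry keyed by name means new parsers can be added without touching the code that consumes them.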

To add a new address field:

- Add field to `models/address.py` with proper `Field()` definition
- Add to `ADDRESS_FIELDS` enum if needed for iteration
- Update `AddressFormatter` if field affects full address computation

```python
from ryandata_address_utils import AddressService

service = AddressService()
df = service.parse_dataframe(df, "address_column", prefix="addr_")
# Returns DataFrame with addr_StreetName, addr_ZipCode, etc.
```

- Framework: pytest with pytest-cov
- Test Location: `tests/` directory
- Run Tests: `uv run pytest`
- Coverage: Target 80%+ coverage

```bash
uv run pytest                    # Run all tests
uv run pytest -v                 # Verbose output
uv run pytest --cov=src          # With coverage
uv run pytest -k "test_parse"    # Run specific tests
```

```bash
uv sync                               # Install dependencies
uv run pytest                         # Run tests
uv run ruff check src/                # Lint
uv run ruff format src/               # Format
uv run mypy src/                      # Type check
uv run ryandata-address-utils-setup   # Setup libpostal (optional)
```

See `docs/ARCHITECTURE.md` for:
- Detailed data flow diagrams
- SOLID principles applied
- DRY improvements made
- Full package structure
- Do not redesign architecture without explicit approval - stick to incremental changes
- Use existing patterns - follow the protocol/factory patterns already in place
- ProcessLog is preferred over legacy `CleaningTracker` for new code
- Check `__init__.py` for the public API before suggesting imports
- Run `uv run pytest` to verify changes don't break existing tests
- Cursor-specific workflows are documented in `.cursor/agents.md`

If you're an AI agent assisting with a project that uses ryandata-address-utils and you discover a bug, missing feature, or documentation issue, you can file an issue against this package.
Repository: Abstract-Data/RyanData-Address-Utils
If your AI client (Cursor, Claude Desktop, etc.) has the GitHub MCP server configured, use the mcp_github_create_issue tool targeting Abstract-Data/RyanData-Address-Utils.
Cursor (~/.cursor/mcp.json):
```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "<your-token>"
      }
    }
  }
}
```

This repository uses GitHub's YAML-based issue forms. When creating issues via MCP, format the body to match the template fields:

Bug Report — Use title prefix [Bug]:
````markdown
### Prerequisites
- [x] I have searched existing issues
- [x] I am using the latest version

### Bug Description
[Clear description of the bug]

### Steps to Reproduce
```python
from ryandata_address_utils import AddressService, parse
# Minimal code that reproduces the issue
```

### Expected Behavior
[What should happen]

### Actual Behavior
[What actually happens]

### Affected Component
[AddressService / Parsers / Models / Validators / Pandas / CLI]

### Error Output
[Stack trace if applicable]

### Environment
- Package Version: [e.g., 0.7.1]
- Python Version: [3.12 / 3.13]
- OS: [macOS / Linux / Windows]
````

**Feature Request** — Use title prefix `[Feature]: `
````markdown
### Prerequisites
- [x] I have searched existing issues
- [x] I have read the documentation

### Problem Statement
[What limitation or pain point does this address?]

### Proposed Solution
[What would you like to see?]

### Alternatives Considered
[Other approaches you've thought about]

### Affected Component
[AddressService / Parsers / Models / Validators / Pandas / CLI / New Component]

### Use Case Example
```python
# Example code showing how this feature would be used
```

### Priority
[Nice to have / Would significantly improve workflow / Blocking use case]
````

**Documentation Issue** — Use title prefix `[Docs]: `
```markdown
### Issue Type
[Missing / Incorrect / Unclear / Needs example / Typo / Outdated]
### Location
[README.md / AGENTS.md / docs/ARCHITECTURE.md / Docstrings / API reference]
### Problem Description
[What's wrong or missing?]
### Suggested Improvement
[How should the documentation be improved?]
```

Issues are automatically labeled based on content:
| Keywords in Issue | Label Applied |
|---|---|
| AddressService, parse, facade | component:service |
| Address, Pydantic, model, Field | component:models |
| usaddress, libpostal, parser | component:parsers |
| validator, ZIP, state, validation | component:validators |
| pandas, DataFrame, series | component:pandas |
| CLI, setup, TUI, typer | component:cli |
| ProcessLog, cleaning, tracking | component:tracking |
Bug reports automatically receive a helpful comment with relevant documentation links.
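
To preview which labels an issue is likely to receive, the keyword table above can be mirrored in a small helper. This is a sketch only: the repository's actual labeling automation is not shown in this document, so the case-insensitive whole-word matching used here is an assumption, and `LABEL_KEYWORDS` and `suggest_labels` are hypothetical names.

```python
import re

# Keyword-to-label mapping copied from the table above.
LABEL_KEYWORDS: dict[str, tuple[str, ...]] = {
    "component:service": ("AddressService", "parse", "facade"),
    "component:models": ("Address", "Pydantic", "model", "Field"),
    "component:parsers": ("usaddress", "libpostal", "parser"),
    "component:validators": ("validator", "ZIP", "state", "validation"),
    "component:pandas": ("pandas", "DataFrame", "series"),
    "component:cli": ("CLI", "setup", "TUI", "typer"),
    "component:tracking": ("ProcessLog", "cleaning", "tracking"),
}


def suggest_labels(issue_text: str) -> list[str]:
    """Return labels whose keywords appear as whole words (assumed matching rule)."""
    labels: list[str] = []
    for label, keywords in LABEL_KEYWORDS.items():
        for keyword in keywords:
            pattern = rf"\b{re.escape(keyword)}\b"
            if re.search(pattern, issue_text, flags=re.IGNORECASE):
                labels.append(label)
                break  # one matching keyword is enough for this label
    return labels


labels = suggest_labels("DataFrame output is missing the ZIP column")
print(labels)
```

Mentioning the component name explicitly in the issue title or body makes the expected label easier to predict.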
When submitting PRs to this repository, ensure the following checks pass locally:
```bash
# Linting
uv run ruff check src tests
uv run ruff format src tests

# Type checking
uv run mypy src

# Tests
uv run pytest
```

- Link to related issue: reference with `Closes #123`
- Type of change: bug fix, feature, docs, refactor, tests
- Tests: add tests for new functionality
- Documentation: update docs/docstrings for user-facing changes

These run automatically on PRs:

- `ruff check` and `ruff format --check`
- `mypy src`
- `pytest --cov`

PRs cannot be merged until all CI checks pass.