Browser-Bookmark Checker follows a modular architecture with clear separation of concerns:
Input Files → Parsers → Normalization → Deduplication → Merge → Exporters → Output Files
- Responsibility: Parse different bookmark file formats
- Supported Formats: Netscape HTML, Chrome JSON
- Extensibility: Easy to add new parsers (Firefox `.jsonlz4`, Safari `.plist`)
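
As an illustration of what a parser in this layer might look like, the sketch below flattens a Chrome `Bookmarks` JSON file into a list of records. The `Bookmark` dataclass, `_walk`, and `parse_chrome_json` are hypothetical names chosen for this example, not the project's actual API. It assumes the usual Chrome layout of a top-level `roots` object whose folders hold `children`, with URL entries marked `"type": "url"`.

```python
import json
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Bookmark:
    """Hypothetical minimal record; the real model may carry more fields."""
    title: str
    url: str
    folder: str = ""


def _walk(node: dict, folder: str) -> Iterator[Bookmark]:
    """Recursively yield URL entries from one Chrome bookmark tree node."""
    if node.get("type") == "url":
        yield Bookmark(title=node.get("name", ""), url=node.get("url", ""), folder=folder)
    for child in node.get("children", []):
        # Join folder names with "/" to record where the bookmark lived.
        sub_folder = f"{folder}/{node.get('name', '')}".strip("/")
        yield from _walk(child, sub_folder)


def parse_chrome_json(text: str) -> list[Bookmark]:
    """Parse the text of a Chrome `Bookmarks` file into a flat list."""
    data = json.loads(text)
    found: list[Bookmark] = []
    for root in data.get("roots", {}).values():
        if isinstance(root, dict):  # ignore any non-folder values under "roots"
            found.extend(_walk(root, ""))
    return found
```

A parser for another format would only need to return the same record type, which is what keeps the later stages format-agnostic.
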
- Responsibility: Normalize URLs for duplicate detection
- Features:
  - Remove tracking parameters
  - Normalize scheme and host
  - Strip fragments and default ports
  - Preserve non-tracking parameters
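
The normalization features listed above can be sketched with `urllib.parse` alone. This is a minimal example, not the project's implementation. The function name `canonicalize` and the tracking-parameter list (`utm_*`, `fbclid`, `gclid`) are assumptions made for illustration, since the actual list is not given here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; the project's real list may differ.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    """Return a normalized form of `url` suitable for duplicate detection."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    # `hostname` is lowercased and excludes userinfo and port.
    host = (parts.hostname or "").lower()
    if ":" in host:  # IPv6 literal: restore the brackets that `hostname` strips
        host = f"[{host}]"
    port = parts.port
    # Keep the port only when it is not the scheme's default.
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    # Drop tracking parameters, preserve everything else in original order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_NAMES
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # An empty final component strips the fragment.
    return urlunsplit((scheme, netloc, parts.path or "/", urlencode(query), ""))
```

With these assumptions, `canonicalize("HTTPS://Example.COM:443/path?utm_source=x&id=7#top")` yields `https://example.com/path?id=7`.
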
- Responsibility: Identify duplicate bookmarks
- Strategies:
  - Canonical URL matching (primary)
  - Fuzzy title matching within same domain (optional)
- Algorithm: RapidFuzz `partial_ratio` for fuzzy matching
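
The two strategies could look roughly like the sketch below. To keep the example dependency-free, title similarity uses the standard library's `difflib.SequenceMatcher` as a stand-in. It is not equivalent to RapidFuzz's `rapidfuzz.fuzz.partial_ratio`, which scores the best-matching substring, so in the real tool that call would take its place. The names `group_by_canonical`, `title_similarity`, and `fuzzy_pairs`, and the threshold of 90, are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from difflib import SequenceMatcher
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Bookmark:
    """Hypothetical record carrying an already computed canonical URL."""
    title: str
    canonical_url: str


def group_by_canonical(bookmarks: list[Bookmark]) -> dict[str, list[Bookmark]]:
    """Primary strategy: bookmarks sharing a canonical URL are duplicates."""
    groups: dict[str, list[Bookmark]] = defaultdict(list)
    for bm in bookmarks:
        groups[bm.canonical_url].append(bm)
    # Keep only URLs that occur more than once.
    return {url: members for url, members in groups.items() if len(members) > 1}


def title_similarity(a: str, b: str) -> float:
    """0-100 score; difflib stand-in for RapidFuzz's partial_ratio."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_pairs(
    bookmarks: list[Bookmark], threshold: float = 90.0
) -> list[tuple[Bookmark, Bookmark]]:
    """Optional strategy: similar titles on the same domain, different URLs."""
    by_domain: dict[str, list[Bookmark]] = defaultdict(list)
    for bm in bookmarks:
        by_domain[urlsplit(bm.canonical_url).hostname or ""].append(bm)
    pairs: list[tuple[Bookmark, Bookmark]] = []
    for members in by_domain.values():
        for i, first in enumerate(members):
            for second in members[i + 1:]:
                if first.canonical_url == second.canonical_url:
                    continue  # already covered by exact matching
                if title_similarity(first.title, second.title) >= threshold:
                    pairs.append((first, second))
    return pairs
```

Restricting the fuzzy comparison to one domain at a time keeps the pairwise loop small and avoids matching unrelated pages that happen to share a generic title.
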
- Responsibility: Select representative bookmarks and organize output
- Selection: Earliest `ADD_DATE` or first bookmark
- Organization: By domain (`Merged/<domain>/`)
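
A possible reading of the selection rule is: take the bookmark with the earliest `ADD_DATE` when any member of the group has one, otherwise fall back to the first bookmark. The sketch below implements that interpretation and places each representative under `Merged/<domain>/`. The field and function names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Bookmark:
    """Hypothetical record; `add_date` is the ADD_DATE Unix timestamp, if any."""
    title: str
    canonical_url: str
    add_date: Optional[int] = None
    folder: str = ""


def pick_representative(group: list[Bookmark]) -> Bookmark:
    """Earliest ADD_DATE wins; without any dates, the first bookmark wins."""
    dated = [bm for bm in group if bm.add_date is not None]
    if dated:
        # min() returns the first minimal element, so ties keep input order.
        return min(dated, key=lambda bm: bm.add_date)
    return group[0]


def merge(groups: dict[str, list[Bookmark]]) -> list[Bookmark]:
    """Select one bookmark per group and file it under Merged/<domain>/."""
    merged: list[Bookmark] = []
    for url in sorted(groups):  # sorted keys keep the output order stable
        chosen = pick_representative(groups[url])
        domain = urlsplit(url).hostname or "unknown"
        merged.append(replace(chosen, folder=f"Merged/{domain}/"))
    return merged
```
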
- Responsibility: Export results in various formats
- Formats: Netscape HTML, CSV report
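
For illustration, both export formats can be produced with the standard library. The sketch below writes a flat Netscape bookmark file and a CSV report. The function names and the CSV columns (`title`, `url`, `add_date`) are assumptions, and folder nesting is omitted for brevity.

```python
import csv
import html
import io
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bookmark:
    """Hypothetical minimal record for export."""
    title: str
    url: str
    add_date: Optional[int] = None


def to_netscape_html(bookmarks: list[Bookmark]) -> str:
    """Render bookmarks as a flat Netscape bookmark file."""
    lines = [
        "<!DOCTYPE NETSCAPE-Bookmark-file-1>",
        '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">',
        "<TITLE>Bookmarks</TITLE>",
        "<H1>Bookmarks</H1>",
        "<DL><p>",
    ]
    for bm in bookmarks:
        date_attr = f' ADD_DATE="{bm.add_date}"' if bm.add_date is not None else ""
        href = html.escape(bm.url, quote=True)  # escape & and quotes in attributes
        lines.append(f'    <DT><A HREF="{href}"{date_attr}>{html.escape(bm.title)}</A>')
    lines.append("</DL><p>")
    return "\n".join(lines) + "\n"


def to_csv_report(bookmarks: list[Bookmark]) -> str:
    """Render bookmarks as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "url", "add_date"])
    for bm in bookmarks:
        writer.writerow([bm.title, bm.url, "" if bm.add_date is None else bm.add_date])
    return buffer.getvalue()
```

Returning strings rather than writing files directly keeps the exporters easy to test; the caller decides where the output goes.
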
- Framework: PyQt6
- Features:
  - Drag & drop support
  - Multi-language support (11 languages)
  - Progress indicators
  - Dark theme
- ScanThread: Background scanning of directories
- MetadataThread: Background metadata extraction
- Prevents UI freezing during long operations
- Parse: Read bookmark files → `BookmarkCollection`
- Annotate: Add canonical URLs → Updated `BookmarkCollection`
- Group: Identify duplicates → `dict[str, list[Bookmark]]`
- Merge: Select representatives → `BookmarkCollection`
- Export: Write to files → HTML + CSV
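
The stages above compose as plain functions whose return types match the next stage's input. The sketch below shows only that shape: `Bookmark` and `BookmarkCollection` are modeled as simple dataclasses, the stage bodies are deliberately trivial placeholders (the canonicalization here just lowercases and trims a trailing slash), and none of the names are confirmed to match the project's code.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Bookmark:
    title: str
    url: str
    canonical_url: str = ""


@dataclass
class BookmarkCollection:
    """Hypothetical container; assumed here to wrap an ordered list."""
    items: list[Bookmark] = field(default_factory=list)


def annotate(collection: BookmarkCollection) -> BookmarkCollection:
    """Annotate: attach a canonical URL (placeholder rule, lossy for paths)."""
    return BookmarkCollection(
        [replace(bm, canonical_url=bm.url.lower().rstrip("/")) for bm in collection.items]
    )


def group(collection: BookmarkCollection) -> dict[str, list[Bookmark]]:
    """Group: map each canonical URL to the bookmarks that share it."""
    groups: dict[str, list[Bookmark]] = {}
    for bm in collection.items:
        groups.setdefault(bm.canonical_url, []).append(bm)
    return groups


def merge(groups: dict[str, list[Bookmark]]) -> BookmarkCollection:
    """Merge: keep one representative per group (placeholder: the first)."""
    return BookmarkCollection([members[0] for members in groups.values()])


def run_pipeline(parsed: BookmarkCollection) -> BookmarkCollection:
    """Parse output goes in; the result is ready for the exporters."""
    return merge(group(annotate(parsed)))
```

Because dictionaries preserve insertion order and every stage is a pure function of its input, running the pipeline twice on the same collection gives the same result.
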
- Modularity: Each component has a single responsibility
- Extensibility: Easy to add new parsers and exporters
- Determinism: Identical inputs produce identical outputs
- Performance: Efficient algorithms for large datasets
- Privacy: Fully offline, no network calls
- PyQt6: GUI framework
- BeautifulSoup4: HTML parsing
- RapidFuzz: Fuzzy string matching
- Standard Library: `urllib.parse`, `pathlib`, `dataclasses`
- Unit Tests: `tests/test_*.py`
- Coverage: Aim for 85%+ coverage
- Tools: pytest, pytest-cov
- Plugin system for custom parsers
- Caching layer for metadata
- Database backend for large-scale operations
- REST API for programmatic access