Architecture

Overview

Browser-Bookmark Checker follows a modular architecture with clear separation of concerns:

Input Files → Parsers → Normalization → Deduplication → Merge → Exporters → Output Files

Core Components

Parsers (`bookmark_checker/core/parsers.py`)

Responsibility: Parse each supported bookmark file format into a BookmarkCollection
Supported Formats: Netscape HTML, Chrome JSON
Extensibility: Further formats (e.g., Firefox .jsonlz4, Safari .plist) can be supported by adding a new parser
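As an illustration of the parser layer, here is a minimal sketch of reading a Chrome `Bookmarks` JSON file. It assumes Chrome's usual layout: a `roots` object whose folders hold `children` lists of `"type": "url"` entries, with `date_added` in microseconds since 1601-01-01.

The `Bookmark` dataclass, the `PARSERS` registry, `sniff_format`, and `parse_text` are hypothetical names and not the project's actual API. The `netscape_html` slot is deliberately left unregistered; the project's BeautifulSoup-based HTML parser would go there.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class Bookmark:
    """Hypothetical record type; field names are illustrative."""
    title: str
    url: str
    folder: str                     # slash-joined folder path, e.g. "Bookmarks bar/Docs"
    add_date: Optional[int] = None  # Unix seconds, like Netscape ADD_DATE


# Chrome's date_added counts microseconds since 1601-01-01 UTC; this is the
# number of seconds between that epoch and the Unix epoch.
_WEBKIT_TO_UNIX_S = 11_644_473_600


def _webkit_to_unix(raw: object) -> Optional[int]:
    try:
        micros = int(raw)
    except (TypeError, ValueError):
        return None
    return micros // 1_000_000 - _WEBKIT_TO_UNIX_S if micros > 0 else None


def _walk(node: dict, path: list[str]) -> Iterator[Bookmark]:
    """Recursively yield URL entries, tracking the enclosing folder names."""
    if node.get("type") == "url":
        yield Bookmark(node.get("name", ""), node["url"], "/".join(path),
                       _webkit_to_unix(node.get("date_added")))
    elif node.get("type") == "folder":
        child_path = path + [node.get("name", "")]
        for child in node.get("children", []):
            yield from _walk(child, child_path)


def parse_chrome_json(text: str) -> list[Bookmark]:
    roots = json.loads(text).get("roots", {})
    return [bm for root in roots.values() if isinstance(root, dict) for bm in _walk(root, [])]


# Hypothetical registry: supporting a new format means adding one entry here.
PARSERS: dict[str, Callable[[str], list[Bookmark]]] = {"chrome_json": parse_chrome_json}


def sniff_format(text: str) -> str:
    # Chrome's file is a JSON object; Netscape exports are HTML markup.
    return "chrome_json" if text.lstrip().startswith("{") else "netscape_html"


def parse_text(text: str) -> list[Bookmark]:
    fmt = sniff_format(text)
    if fmt not in PARSERS:
        raise ValueError(f"no parser registered for format {fmt!r}")
    return PARSERS[fmt](text)
```

Dispatching on content rather than file extension is a deliberate choice here, since Chrome's profile file is named `Bookmarks` with no extension.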

URL Canonicalization (`bookmark_checker/core/utils.py`)

Responsibility: Normalize URLs for duplicate detection
Features:
- Remove tracking parameters
- Normalize scheme and host
- Strip fragments and default ports
- Preserve non-tracking parameters
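The four canonicalization steps above can be sketched with `urllib.parse` from the standard library.

The tracking-parameter list (`utm_*`, `fbclid`, `gclid`, `mc_cid`, `mc_eid`) and the rule that an empty path becomes `/` are assumptions for illustration. The project's actual list and rules may differ, and `canonicalize_url` is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking-parameter rules; the project's real list may differ.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"fbclid", "gclid", "mc_cid", "mc_eid"}
DEFAULT_PORTS = {"http": 80, "https": 443}


def _is_tracking(key: str) -> bool:
    k = key.lower()
    return k in TRACKING_PARAMS or k.startswith(TRACKING_PREFIXES)


def canonicalize_url(url: str) -> str:
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # .hostname drops any user:password@ prefix
    if ":" in host:  # re-bracket IPv6 literals, which .hostname returns bare
        host = f"[{host}]"
    port = parts.port
    # Strip the port only when it is the scheme's default.
    netloc = host if port is None or DEFAULT_PORTS.get(scheme) == port else f"{host}:{port}"
    # Keep non-tracking parameters in their original order.
    query = urlencode([
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not _is_tracking(key)
    ])
    # Assumption: an empty path becomes "/"; the fragment is always dropped.
    return urlunsplit((scheme, netloc, parts.path or "/", query, ""))
```

Parameter order is preserved rather than sorted, so `?a=1&b=2` and `?b=2&a=1` remain distinct under this sketch.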

Deduplication (`bookmark_checker/core/dedupe.py`)

Responsibility: Identify duplicate bookmarks
Strategies:
1. Canonical URL matching (primary)
2. Fuzzy title matching within same domain (optional)
Algorithm: RapidFuzz `partial_ratio` for fuzzy title matching
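The two strategies can be sketched as follows. Bookmarks are first grouped by exact canonical URL. Then, optionally, a group is folded into an earlier group on the same domain when their titles score at or above a threshold.

The block calls `rapidfuzz.fuzz.partial_ratio` when RapidFuzz is installed. Otherwise it falls back to `difflib.SequenceMatcher.ratio`, a different, whole-string measure that would need its own threshold. The 90.0 threshold and the function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from urllib.parse import urlsplit

try:
    from rapidfuzz import fuzz

    def title_similarity(a: str, b: str) -> float:
        # RapidFuzz partial_ratio: best substring alignment, scored 0-100
        return fuzz.partial_ratio(a, b)
except ImportError:
    from difflib import SequenceMatcher

    def title_similarity(a: str, b: str) -> float:
        # Stand-in when RapidFuzz is absent: whole-string ratio scaled to 0-100
        return 100.0 * SequenceMatcher(None, a, b).ratio()


@dataclass
class Bookmark:
    title: str
    url: str
    canonical_url: str


def group_duplicates(bookmarks: list[Bookmark], fuzzy: bool = False,
                     threshold: float = 90.0) -> dict[str, list[Bookmark]]:
    # Strategy 1: exact match on canonical URL (dict order keeps output deterministic).
    groups: dict[str, list[Bookmark]] = defaultdict(list)
    for bm in bookmarks:
        groups[bm.canonical_url].append(bm)
    if not fuzzy:
        return dict(groups)

    # Strategy 2 (optional): fold a group into an earlier group on the same
    # domain when their first titles are similar enough.
    merged: dict[str, list[Bookmark]] = {}
    keys_by_domain: dict[str, list[str]] = defaultdict(list)
    for key, members in groups.items():
        domain = urlsplit(key).hostname or ""
        title = members[0].title
        for other in keys_by_domain[domain]:
            if title and title_similarity(title, merged[other][0].title) >= threshold:
                merged[other].extend(members)
                break
        else:
            merged[key] = list(members)
            keys_by_domain[domain].append(key)
    return merged
```

Restricting fuzzy comparisons to one domain keeps unrelated sites with similar titles apart. It also limits the number of pairwise comparisons.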

Merging (`bookmark_checker/core/merge.py`)

Responsibility: Select representative bookmarks and organize output
Selection: The bookmark with the earliest ADD_DATE; if no bookmark in the group has an ADD_DATE, the first one
Organization: Representatives are grouped by domain under `Merged/<domain>/`
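A minimal sketch of the selection and organization rules, assuming each bookmark carries an optional Unix-seconds `add_date` taken from Netscape `ADD_DATE`.

`pick_representative`, `organize_by_domain`, the `"unknown"` fallback folder, and the alphabetical folder ordering are hypothetical choices for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Bookmark:
    title: str
    url: str
    add_date: Optional[int] = None  # Netscape ADD_DATE, Unix seconds


def pick_representative(group: list[Bookmark]) -> Bookmark:
    dated = [b for b in group if b.add_date is not None]
    if dated:
        # min() returns the first of several equally early bookmarks, so ties are deterministic.
        return min(dated, key=lambda b: b.add_date)
    return group[0]  # no ADD_DATE anywhere: keep the first bookmark


def organize_by_domain(groups: dict[str, list[Bookmark]]) -> dict[str, list[Bookmark]]:
    folders: dict[str, list[Bookmark]] = defaultdict(list)
    for members in groups.values():
        rep = pick_representative(members)
        domain = urlsplit(rep.url).hostname or "unknown"  # hypothetical fallback folder
        folders[f"Merged/{domain}"].append(rep)
    # Sorting folder names keeps the output order independent of input order.
    return dict(sorted(folders.items()))
```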

Exporters (`bookmark_checker/core/exporters.py`)

Responsibility: Export results in various formats
Formats: Netscape HTML, CSV report
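A sketch of the two exporters, assuming the merged bookmarks arrive as a `{domain: [Bookmark, ...]}` mapping. The HTML follows the common Netscape bookmark layout: a `<!DOCTYPE NETSCAPE-Bookmark-file-1>` header, nested `<DL><p>` lists, `<DT><H3>` folders, and `<DT><A HREF=... ADD_DATE=...>` links.

The function names and the CSV columns (`canonical_url`, `count`, `titles`) are hypothetical, not the project's actual report format.

```python
import csv
import io
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class Bookmark:
    title: str
    url: str
    add_date: Optional[int] = None


def export_netscape_html(by_domain: dict[str, list[Bookmark]]) -> str:
    lines = [
        "<!DOCTYPE NETSCAPE-Bookmark-file-1>",
        '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">',
        "<TITLE>Bookmarks</TITLE>",
        "<H1>Bookmarks</H1>",
        "<DL><p>",
        "    <DT><H3>Merged</H3>",  # top-level Merged/ folder
        "    <DL><p>",
    ]
    for domain, items in by_domain.items():
        lines += [f"        <DT><H3>{escape(domain)}</H3>", "        <DL><p>"]
        for bm in items:
            date_attr = f' ADD_DATE="{bm.add_date}"' if bm.add_date is not None else ""
            # escape() also escapes quotes, so URLs are safe inside HREF="..."
            lines.append(f'            <DT><A HREF="{escape(bm.url)}"{date_attr}>{escape(bm.title)}</A>')
        lines.append("        </DL><p>")
    lines += ["    </DL><p>", "</DL><p>"]
    return "\n".join(lines) + "\n"


def export_csv_report(groups: dict[str, list[Bookmark]]) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["canonical_url", "count", "titles"])
    for canonical, members in groups.items():
        writer.writerow([canonical, len(members), " | ".join(b.title for b in members)])
    return buf.getvalue()
```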

GUI Architecture

Main Window (`bookmark_checker/ui/main_window.py`)

Framework: PyQt6
Features:
- Drag & drop support
- Multi-language support (11 languages)
- Progress indicators
- Dark theme

Threading

ScanThread: Background scanning of directories
MetadataThread: Background metadata extraction
Both run off the GUI thread, so the UI stays responsive during long operations

Data Flow

1. Parse: Read bookmark files → `BookmarkCollection`
2. Annotate: Add canonical URLs → updated `BookmarkCollection`
3. Group: Identify duplicates → `dict[str, list[Bookmark]]`
4. Merge: Select representatives → `BookmarkCollection`
5. Export: Write output files → HTML + CSV
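The stages chain as plain function composition, with each stage's output type matching the next stage's input. This sketch covers only the annotate → group → merge middle of the pipeline; parsing and export are omitted.

All names are hypothetical. The canonicalizer is deliberately simplified (lowercase scheme and host, drop the fragment), and the merge step keeps each group's first member in place of the real selection rules.

```python
from dataclasses import dataclass, field, replace
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class Bookmark:
    title: str
    url: str
    canonical_url: Optional[str] = None


@dataclass
class BookmarkCollection:
    bookmarks: list[Bookmark] = field(default_factory=list)


def annotate(coll: BookmarkCollection) -> BookmarkCollection:
    def canon(url: str) -> str:
        # Simplified for the sketch: lowercase scheme/host, drop the fragment.
        p = urlsplit(url)
        return urlunsplit((p.scheme.lower(), p.netloc.lower(), p.path or "/", p.query, ""))
    # Returns a new collection; the input bookmarks are left unchanged.
    return BookmarkCollection([replace(b, canonical_url=canon(b.url)) for b in coll.bookmarks])


def group(coll: BookmarkCollection) -> dict[str, list[Bookmark]]:
    groups: dict[str, list[Bookmark]] = {}
    for b in coll.bookmarks:
        groups.setdefault(b.canonical_url, []).append(b)
    return groups


def merge(groups: dict[str, list[Bookmark]]) -> BookmarkCollection:
    # First member stands in for the representative-selection rules.
    return BookmarkCollection([members[0] for members in groups.values()])


def run_pipeline(parsed: BookmarkCollection) -> BookmarkCollection:
    return merge(group(annotate(parsed)))
```

Keeping each stage a pure function of its input is one straightforward way to get the determinism listed under Design Principles below.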

Design Principles

Modularity: Each component has a single responsibility
Extensibility: Easy to add new parsers and exporters
Determinism: Identical inputs produce identical outputs
Performance: Efficient algorithms for large datasets
Privacy: Fully offline, no network calls

Dependencies

PyQt6: GUI framework
BeautifulSoup4: HTML parsing
RapidFuzz: Fuzzy string matching
Standard Library: urllib.parse, pathlib, dataclasses

Testing

Unit Tests: `tests/test_*.py`
Coverage: Target of 85% or higher
Tools: pytest, pytest-cov

Future Improvements

Plugin system for custom parsers
Caching layer for metadata
Database backend for large-scale operations
REST API for programmatic access
