OSM & ATLAS Synchronization

Welcome! This project provides a systematic pipeline to identify, analyze, and resolve discrepancies between public transport stop data from ATLAS (Swiss official data) and OpenStreetMap (OSM).

It automates data download and processing (ATLAS, OSM, GTFS), performs exact/distance-based/route-based matching, and serves an interactive web app for inspecting matches, problems, and manual fixes.

(Screenshot: Geneva stops)


Prerequisites

  • Docker Desktop with Compose v2 (required)
  • Internet connection to download datasets (ATLAS, OSM, GTFS)

Installation & Setup (with Docker)

Just want to run it? Here's the fastest path:

  1. Clone the repository

    git clone https://github.com/openTdataCH/stop_sync_osm_atlas.git
    cd stop_sync_osm_atlas
  2. Configure environment (optional):

    • The application works out of the box without a .env file. To customize settings (database users/passwords, connection URIs, feature flags, pipeline timezone), copy env.example to .env and adjust the values.
  3. Build and Run with Docker Compose:

    docker compose up --build

    Docker will automatically:

    • Build the application images
    • Download and start Postgres (PostGIS) database
    • Start the redis container
    • Start the web app container
    • Start the scheduler container (daily pipeline at 2:00 Europe/Zurich)

    Note: The data pipeline (downloading and matching ATLAS/OSM/GTFS data) does not run automatically on startup. It runs in the dedicated scheduler service at the configured time. To run it immediately, use the VS Code Task "Docker: Trigger Scheduled Pipeline Now" (see below), or run:

    docker exec stop_sync_osm_atlas_scheduler python -m matching_and_import_db.scheduler.job_runner --mode full --trigger manual

    Data and database state are cached across runs (./data directory and the postgres_data volume).

  4. Access the application at http://localhost:5001/.

  5. To stop the services:

    docker compose down

    To remove all data: docker compose down -v
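If you do create a .env, a minimal sketch might look like the following. Only the three scheduler variables below are named in this README; see env.example for the authoritative list of keys, and treat anything else (database credentials, URIs, flags) as project-specific.

```ini
# Hypothetical .env sketch — copy env.example for the real key names.
PIPELINE_TIMEZONE=Europe/Zurich
PIPELINE_SCHEDULE_HOUR=2
PIPELINE_SCHEDULE_MINUTE=0
```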

Pipeline

```mermaid
flowchart LR
    subgraph Sources["Data Sources"]
        A[("ATLAS<br/>Official Swiss Data")]
        O[("OSM<br/>Community Data")]
    end

    subgraph Pipeline["Processing Pipeline"]
        direction TB
        D["1. Download & Process"]
        M["2. Multi-Stage Matching"]
        P["3. Problem Detection"]
        I["4. Database Import"]
        D --> M --> P --> I
    end

    subgraph Output["Output"]
        DB[("PostgreSQL<br/>+ PostGIS")]
        W["Web Application"]
        DB --> W
    end

    A --> D
    O --> D
    I --> DB
```

When the daily scheduled job runs (or when manually triggered), the pipeline executes:

  • matching_and_import_db/downloader/get_atlas_data.py: downloads ATLAS data and GTFS, builds optimized route/stop artifacts
  • matching_and_import_db/downloader/get_osm_data.py: fetches OSM data via Overpass and processes it
  • matching_and_import_db/orchestrator.py: runs the matching pipeline
  • matching_and_import_db/database/importer.py: imports refreshed data into the import database

Downloads are cached under data/raw/ and processed artifacts under data/processed/ — see 1. Download and process data for details.
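The four stages above can be sketched as a simple sequential runner. This is a simplification and an assumption about invocation: the real job_runner also handles locking, status publishing, maintenance mode, and error handling, and the stage modules may not all be runnable with `python -m` exactly as shown. The module paths themselves are taken from this README.

```python
# Minimal sketch of the pipeline stage order; the real scheduler's
# job_runner adds distributed locking, status reporting and maintenance mode.
import subprocess
import sys

# Stage modules, in execution order (paths taken from this README).
STAGES = [
    "matching_and_import_db.downloader.get_atlas_data",  # ATLAS + GTFS download
    "matching_and_import_db.downloader.get_osm_data",    # OSM data via Overpass
    "matching_and_import_db.orchestrator",               # matching pipeline
    "matching_and_import_db.database.importer",          # import into Postgres
]

def run_pipeline() -> None:
    """Run each stage as a subprocess, aborting on the first failure."""
    for module in STAGES:
        subprocess.run([sys.executable, "-m", module], check=True)
```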

Data Import

After acquisition, matching_and_import_db/database/importer.py populates the Postgres import database (tables such as stops, problems, persistent_data, atlas_stops, osm_nodes, and routes_and_directions).

During import, the UI shows a global maintenance popup. Downloading and matching stages run in the background without blocking normal browsing.

Background Scheduler & Microservices

Docker Compose runs five primary services:

  • app: Flask web app and API.
  • scheduler: Dedicated background worker that runs the daily pipeline at 2:00 (PIPELINE_TIMEZONE, default Europe/Zurich).
  • db: Postgres + PostGIS import database.
  • redis: Shared cache/rate-limit and pipeline status/lock storage.
  • migrator: One-shot startup service that runs flask db upgrade before app and scheduler.

For local test execution, there is also a dedicated test service/image with both app and pipeline dependencies.

Scheduler behavior:

  • Uses APScheduler cron trigger (PIPELINE_SCHEDULE_HOUR, PIPELINE_SCHEDULE_MINUTE).
  • Publishes run status to /api/system/pipeline_status.
  • Sets maintenance mode only for the import phase so the UI can show "Data update in progress" with elapsed/ETA.
  • Uses a distributed lock to prevent concurrent runs.
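The cron schedule above can be illustrated with a small stdlib-only sketch of how the next fire time follows from those environment variables. The real service uses APScheduler for this; the fallback defaults below are assumptions matching the 2:00 Europe/Zurich default mentioned earlier.

```python
# Sketch: derive the next daily run time from PIPELINE_TIMEZONE,
# PIPELINE_SCHEDULE_HOUR and PIPELINE_SCHEDULE_MINUTE (defaults assumed).
import os
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def next_run(now=None):
    """Return the next scheduled fire time as an aware datetime."""
    tz = ZoneInfo(os.environ.get("PIPELINE_TIMEZONE", "Europe/Zurich"))
    hour = int(os.environ.get("PIPELINE_SCHEDULE_HOUR", "2"))
    minute = int(os.environ.get("PIPELINE_SCHEDULE_MINUTE", "0"))
    now = (now or datetime.now(tz)).astimezone(tz)
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:          # today's slot already passed
        candidate += timedelta(days=1)
    return candidate
```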

Manual Import & Testing (VS Code Tasks)

If you use VS Code, predefined tasks let you run commands inside the running Docker containers without restarting the stack:

  1. Open the VS Code Command Palette (Cmd+Shift+P on macOS, Ctrl+Shift+P on Windows/Linux).
  2. Select Tasks: Run Task.
  3. Choose one of the predefined tasks:
    • Docker: Run All Tests: Executes the pytest suite.
    • Docker: Run Matching & Import (Existing Data): Runs matching + import on already downloaded files through the scheduler runner.
    • Docker: Run Full Data Pipeline (Download & Match & Import): Runs full download + matching + import through the scheduler runner.
    • Docker: Trigger Scheduled Pipeline Now: Fires a full manual run equivalent to the scheduled daily run.

You can do this while the app container is running in the background.

Running the Web Application

The Flask server is started automatically by Docker Compose.

Access it at http://localhost:5001/.

Usage

  • Map View: Browse stops by type (matched, unmatched, osm) and match method.
  • Filters & Search: Filter by ATLAS SLOID, OSM Node ID, UIC reference, or route.
  • Problems: Review and resolve detected discrepancies on the problems page. See 3. Problems.
  • Manage Data: See 4. Database.
  • Generating Reports: The web app can generate CSV and PDF reports. See 5.5 Generate Reports and PDFs.

CI & Tests

This repository uses GitHub Actions for continuous integration.

Contributing and Project Status

This project is a work in progress. Feedback and improvements are welcome! Feel free to submit issues and pull requests. Thank you for your interest! 🚀


About

This project aims to:

  • Establish a reliable methodology for comparing data between ATLAS and OSM.
  • Identify problematic cases requiring special attention.
  • Provide tools to facilitate the resolution of inconsistencies.
  • Contribute to the continuous improvement of public transport data quality in Switzerland.
