Welcome! This project provides a systematic pipeline to identify, analyze, and resolve discrepancies between public transport stop data from ATLAS (Swiss official data) and OpenStreetMap (OSM).
It automates data download and processing (ATLAS, OSM, GTFS), performs exact/distance-based/route-based matching, and serves an interactive web app for inspecting matches, problems, and manual fixes.
Contents:
- Prerequisites
- Installation & Setup (with Docker)
- Pipeline
- Background Scheduler & Microservices
- Running the Web Application
- Environment & Secrets
- CI & Tests
- Contributing and Project Status
Prerequisites:
- Docker Desktop with Compose v2 (required)
- An internet connection to download the datasets (ATLAS, OSM, GTFS)
Just want to run it? Here's the fastest path:
- Clone the repository:

  ```bash
  git clone https://github.com/openTdataCH/stop_sync_osm_atlas.git
  cd stop_sync_osm_atlas
  ```
- Configure the environment (optional): the application works out of the box locally without a `.env` file. If you need to customize settings (DB users/passwords, URIs, flags, pipeline timezone), copy `env.example` to `.env` and adjust the values.
- Build and run with Docker Compose:

  ```bash
  docker compose up --build
  ```
  Docker will automatically:
  - Build the application images
  - Download and start the Postgres (PostGIS) database
  - Start the Redis container
  - Start the web app container
  - Start the scheduler container (daily pipeline at 2:00 Europe/Zurich)
  Note: the data pipeline (downloading and matching ATLAS/OSM/GTFS data) does not run automatically on startup; it runs in the dedicated scheduler service at the configured time. To run it immediately, use the VS Code task "Docker: Trigger Scheduled Pipeline Now" (see below), or run:

  ```bash
  docker exec stop_sync_osm_atlas_scheduler python -m matching_and_import_db.scheduler.job_runner --mode full --trigger manual
  ```

  Data and database state are cached across runs (the `./data` directory and the `postgres_data` volume).
- Access the application:
  - Web app: http://localhost:5001
  - Postgres database: `localhost:5432` (user: `stops_user`, password: `1234`); a connection sketch follows this list.
- To stop the services:

  ```bash
  docker compose down
  ```
- To remove all data (including the `postgres_data` volume):

  ```bash
  docker compose down -v
  ```
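For quick inspection of the imported data, you can also connect to Postgres programmatically. Below is a minimal Python sketch using `psycopg2` with the credentials from the quickstart; the database name `stops_db` is an assumption, so check `env.example` for the value your setup actually uses.

```python
# Minimal sketch: connect to the Dockerized Postgres instance and list the
# public tables. Credentials are the quickstart defaults; the database name
# "stops_db" is an ASSUMPTION -- check env.example for the real value.
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=5432,
    user="stops_user",
    password="1234",
    dbname="stops_db",  # assumed name; adjust to match your .env
)
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = 'public' ORDER BY table_name;"
    )
    for (table_name,) in cur.fetchall():
        print(table_name)
conn.close()
```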
```mermaid
flowchart LR
    subgraph Sources["Data Sources"]
        A[("ATLAS<br/>Official Swiss Data")]
        O[("OSM<br/>Community Data")]
    end
    subgraph Pipeline["Processing Pipeline"]
        direction TB
        D["1. Download & Process"]
        M["2. Multi-Stage Matching"]
        P["3. Problem Detection"]
        I["4. Database Import"]
        D --> M --> P --> I
    end
    subgraph Output["Output"]
        DB[("PostgreSQL<br/>+ PostGIS")]
        W["Web Application"]
        DB --> W
    end
    A --> D
    O --> D
    I --> DB
```
When the daily scheduled job runs (or when manually triggered), the pipeline executes:
- `matching_and_import_db/downloader/get_atlas_data.py`: downloads ATLAS data and GTFS, builds optimized route/stop artifacts
- `matching_and_import_db/downloader/get_osm_data.py`: fetches OSM data via Overpass and processes it
- `matching_and_import_db/orchestrator.py`: runs the matching pipeline
- `matching_and_import_db/database/importer.py`: imports refreshed data into the import database
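Conceptually the stages run strictly in sequence; the sketch below expresses that order by invoking each one as a module, in the style of the `python -m` command from the quickstart. Standalone `-m` execution of these modules is an assumption made for illustration; the documented entry point is `matching_and_import_db.scheduler.job_runner`.

```python
# Illustrative only: the pipeline's stage order, expressed as sequential
# subprocess calls. Standalone "-m" execution of each module is an
# assumption; in practice the scheduler's job_runner drives these stages.
import subprocess

STAGES = [
    "matching_and_import_db.downloader.get_atlas_data",  # ATLAS + GTFS download
    "matching_and_import_db.downloader.get_osm_data",    # OSM via Overpass
    "matching_and_import_db.orchestrator",               # multi-stage matching
    "matching_and_import_db.database.importer",          # import into Postgres
]

for module in STAGES:
    # check=True aborts the run as soon as one stage fails.
    subprocess.run(["python", "-m", module], check=True)
```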
Downloads are cached under `data/raw/` and processed artifacts under `data/processed/`; see 1. Download and process data for details.
After acquisition, `matching_and_import_db/database/importer.py` populates the Postgres database tables (e.g., `stops`, `problems`, `persistent_data`, `atlas_stops`, `osm_nodes`, `routes_and_directions`).
During import, the UI shows a global maintenance popup. Downloading and matching stages run in the background without blocking normal browsing.
Docker Compose now runs five primary services:
- `app`: Flask web app and API.
- `scheduler`: dedicated background worker that runs the daily pipeline at 2:00 (`PIPELINE_TIMEZONE`, default `Europe/Zurich`).
- `db`: Postgres + PostGIS import database.
- `redis`: shared cache/rate-limiting and pipeline status/lock storage.
- `migrator`: one-shot startup service that runs `flask db upgrade` before `app` and `scheduler` start.
For local test execution, there is also a dedicated test service/image with both app and pipeline dependencies.
Scheduler behavior (see the sketch below):
- Uses an APScheduler cron trigger (`PIPELINE_SCHEDULE_HOUR`, `PIPELINE_SCHEDULE_MINUTE`).
- Publishes run status to `/api/system/pipeline_status`.
- Sets maintenance mode only for the import phase, so the UI can show "Data update in progress" with elapsed time/ETA.
- Uses a distributed lock to prevent concurrent runs.
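A condensed sketch of that behavior, assuming APScheduler and redis-py; the job body, the lock key, and the function name `run_full_pipeline` are illustrative stand-ins, not the project's actual identifiers:

```python
# Condensed sketch (illustrative names, not the project's actual code):
# a cron trigger built from the env vars above, plus a Redis lock so that
# two workers never run the pipeline concurrently.
import os

import redis
from apscheduler.schedulers.blocking import BlockingScheduler

r = redis.Redis(host="redis", port=6379)

def run_full_pipeline():
    # SET NX EX doubles as a distributed lock with a TTL safety net,
    # so a crashed worker cannot hold the lock forever.
    if not r.set("pipeline:lock", "1", nx=True, ex=6 * 3600):
        return  # another run is already in progress
    try:
        ...  # download -> match -> detect problems -> import
    finally:
        r.delete("pipeline:lock")

scheduler = BlockingScheduler(timezone=os.getenv("PIPELINE_TIMEZONE", "Europe/Zurich"))
scheduler.add_job(
    run_full_pipeline,
    "cron",
    hour=int(os.getenv("PIPELINE_SCHEDULE_HOUR", "2")),
    minute=int(os.getenv("PIPELINE_SCHEDULE_MINUTE", "0")),
)
scheduler.start()
```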
If you have VS Code installed, built-in tasks are provided to quickly run commands inside the running Docker containers without restarting them:
- Open the VS Code Command Palette (`Cmd+Shift+P` on Mac).
- Select `Tasks: Run Task`.
- Choose one of the predefined tasks:
  - `Docker: Run All Tests`: executes the `pytest` suite.
  - `Docker: Run Matching & Import (Existing Data)`: runs matching + import on already downloaded files through the scheduler runner.
  - `Docker: Run Full Data Pipeline (Download & Match & Import)`: runs the full download + matching + import through the scheduler runner.
  - `Docker: Trigger Scheduled Pipeline Now`: fires a full manual run, equivalent to the scheduled daily run.
You can do this while the app container is running in the background.
The Flask server is started automatically by Docker Compose.
Access it at http://localhost:5001/.
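If you want to check from a script whether a pipeline run is currently in progress, you can poll the status endpoint mentioned above. The response schema is not documented here, so this sketch simply prints the raw JSON:

```python
# Poll the pipeline status endpoint exposed by the app service.
# The JSON schema is not documented here, so print it verbatim.
import requests

resp = requests.get("http://localhost:5001/api/system/pipeline_status", timeout=10)
resp.raise_for_status()
print(resp.json())
```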
- Map View: browse stops by type (`matched`, `unmatched`, `osm`) and by match method.
- Filters & Search: filter by ATLAS SLOID, OSM Node ID, UIC reference, or route.
- Problems: review and resolve detected problems on the problems page. See 3. Problems.
- Manage Data: see 4. Database.
- Generating Reports: the web app can generate CSV and PDF reports. See 5.5 Generate Reports and PDFs.
This repository uses GitHub Actions for continuous integration.
- Workflow: tests.yml
- CI documentation: CI and Tests
This project is a work in progress. Feedback and improvements are welcome! Feel free to submit issues and pull requests. Thank you for your interest! 🚀
