Welcome! This project provides a systematic pipeline to identify, analyze, and resolve discrepancies between public transport stop data from ATLAS (Swiss official data) and OpenStreetMap (OSM).
It automates data download and processing (ATLAS, OSM, GTFS), performs exact/distance-based/route-based matching, and serves an interactive web app for inspecting matches, problems, and manual fixes.
Contents:
- Prerequisites
- Installation & Setup (with Docker)
- Pipeline
- Background Scheduler & Microservices
- Running the Web Application
- Environment & Secrets
- CI & Tests
- Contributing and Project Status
Prerequisites:
- Docker Desktop with Compose v2 (required)
- An internet connection to download the datasets (ATLAS, OSM, GTFS)
Just want to run it? Here's the fastest path:
- Clone the repository:

  ```bash
  git clone https://github.com/openTdataCH/stop_sync_osm_atlas.git
  cd stop_sync_osm_atlas
  ```
- Configure the environment (optional): the application works out of the box locally without a `.env` file. If you need to customize settings (DB users/passwords, URIs, flags, pipeline timezone), copy `env.example` to `.env` and adjust the values.
- Build and run with Docker Compose:

  ```bash
  docker compose up --build
  ```
  Docker will automatically:
  - Build the application images
  - Download and start the Postgres (PostGIS) database
  - Start the Redis container
  - Start the web app container
  - Start the scheduler container (daily pipeline at 2:00 Europe/Zurich)
  Note: the data pipeline (downloading and matching ATLAS/OSM/GTFS data) does not run automatically on startup; it runs in the dedicated scheduler service at the configured time. To run it immediately, use the VS Code task "Docker: Trigger Scheduled Pipeline Now" (see below), or run:

  ```bash
  docker exec stop_sync_osm_atlas_scheduler python -m matching_and_import_db.scheduler.job_runner --mode full --trigger manual
  ```

  Data and database state are cached across runs (the `./data` directory and the `postgres_data` volume).
- Access the application:
  - Web app: http://localhost:5001
  - Postgres database: `localhost:5432` (user: `stops_user`, password: `1234`); a connection sketch follows this list.
- To stop the services:

  ```bash
  docker compose down
  ```
- To remove all data (including the `postgres_data` volume):

  ```bash
  docker compose down -v
  ```
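For quick inspection of the imported data, you can also connect to Postgres programmatically. Below is a minimal Python sketch using `psycopg2` with the credentials from the quickstart; the database name `stops_db` is an assumption, so check `env.example` for the value your setup actually uses.

```python
# Minimal sketch: connect to the Dockerized Postgres instance and list the
# public tables. Credentials are the quickstart defaults; the database name
# "stops_db" is an ASSUMPTION -- check env.example for the real value.
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=5432,
    user="stops_user",
    password="1234",
    dbname="stops_db",  # assumed name; adjust to match your .env
)
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = 'public' ORDER BY table_name;"
    )
    for (table_name,) in cur.fetchall():
        print(table_name)
conn.close()
```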
```mermaid
flowchart LR
    subgraph Sources["Data Sources"]
        A[("ATLAS<br/>Official Swiss Data")]
        O[("OSM<br/>Community Data")]
    end
    subgraph Pipeline["Processing Pipeline"]
        direction TB
        D["1. Download & Process"]
        M["2. Multi-Stage Matching"]
        P["3. Problem Detection"]
        I["4. Database Import"]
        D --> M --> P --> I
    end
    subgraph Output["Output"]
        DB[("PostgreSQL<br/>+ PostGIS")]
        W["Web Application"]
        DB --> W
    end
    A --> D
    O --> D
    I --> DB
```
When the daily scheduled job runs (or when manually triggered), the pipeline executes:
- `matching_and_import_db/downloader/get_atlas_data.py`: downloads ATLAS data and GTFS, builds optimized route/stop artifacts
- `matching_and_import_db/downloader/get_osm_data.py`: fetches OSM data via Overpass and processes it
- `matching_and_import_db/orchestrator.py`: runs the matching pipeline
- `matching_and_import_db/database/importer.py`: imports refreshed data into the import database
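Conceptually the stages run strictly in sequence; the sketch below expresses that order by invoking each one as a module, in the style of the `python -m` command from the quickstart. Standalone `-m` execution of these modules is an assumption made for illustration; the documented entry point is `matching_and_import_db.scheduler.job_runner`.

```python
# Illustrative only: the pipeline's stage order, expressed as sequential
# subprocess calls. Standalone "-m" execution of each module is an
# assumption; in practice the scheduler's job_runner drives these stages.
import subprocess

STAGES = [
    "matching_and_import_db.downloader.get_atlas_data",  # ATLAS + GTFS download
    "matching_and_import_db.downloader.get_osm_data",    # OSM via Overpass
    "matching_and_import_db.orchestrator",               # multi-stage matching
    "matching_and_import_db.database.importer",          # import into Postgres
]

for module in STAGES:
    # check=True aborts the run as soon as one stage fails.
    subprocess.run(["python", "-m", module], check=True)
```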
Downloads are cached under `data/raw/` and processed artifacts under `data/processed/`; see 1. Download and process data for details.
After acquisition, `matching_and_import_db/database/importer.py` populates the Postgres database tables (e.g., `stops`, `problems`, `persistent_data`, `atlas_stops`, `osm_nodes`, `routes_and_directions`).
During import, the UI shows a global maintenance popup. Downloading and matching stages run in the background without blocking normal browsing.
Docker Compose now runs five primary services:
- `app`: Flask web app and API.
- `scheduler`: dedicated background worker that runs the daily pipeline at 2:00 (`PIPELINE_TIMEZONE`, default `Europe/Zurich`).
- `db`: Postgres + PostGIS import database.
- `redis`: shared cache/rate-limiting and pipeline status/lock storage.
- `migrator`: one-shot startup service that runs `flask db upgrade` before `app` and `scheduler` start.
For local test execution, there is also a dedicated test service/image with both app and pipeline dependencies.
Scheduler behavior (see the sketch below):
- Uses an APScheduler cron trigger (`PIPELINE_SCHEDULE_HOUR`, `PIPELINE_SCHEDULE_MINUTE`).
- Publishes run status to `/api/system/pipeline_status`.
- Sets maintenance mode only for the import phase, so the UI can show "Data update in progress" with elapsed time/ETA.
- Uses a distributed lock to prevent concurrent runs.
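A condensed sketch of that behavior, assuming APScheduler and redis-py; the job body, the lock key, and the function name `run_full_pipeline` are illustrative stand-ins, not the project's actual identifiers:

```python
# Condensed sketch (illustrative names, not the project's actual code):
# a cron trigger built from the env vars above, plus a Redis lock so that
# two workers never run the pipeline concurrently.
import os

import redis
from apscheduler.schedulers.blocking import BlockingScheduler

r = redis.Redis(host="redis", port=6379)

def run_full_pipeline():
    # SET NX EX doubles as a distributed lock with a TTL safety net,
    # so a crashed worker cannot hold the lock forever.
    if not r.set("pipeline:lock", "1", nx=True, ex=6 * 3600):
        return  # another run is already in progress
    try:
        ...  # download -> match -> detect problems -> import
    finally:
        r.delete("pipeline:lock")

scheduler = BlockingScheduler(timezone=os.getenv("PIPELINE_TIMEZONE", "Europe/Zurich"))
scheduler.add_job(
    run_full_pipeline,
    "cron",
    hour=int(os.getenv("PIPELINE_SCHEDULE_HOUR", "2")),
    minute=int(os.getenv("PIPELINE_SCHEDULE_MINUTE", "0")),
)
scheduler.start()
```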
If you have VS Code installed, built-in tasks are provided to quickly run commands inside the running Docker containers without restarting them:
- Open the VS Code Command Palette (`Cmd+Shift+P` on Mac).
- Select `Tasks: Run Task`.
- Choose one of the predefined tasks:
  - `Docker: Run All Tests`: executes the `pytest` suite.
  - `Docker: Run Matching & Import (Existing Data)`: runs matching + import on already downloaded files through the scheduler runner.
  - `Docker: Run Full Data Pipeline (Download & Match & Import)`: runs the full download + matching + import through the scheduler runner.
  - `Docker: Trigger Scheduled Pipeline Now`: fires a full manual run, equivalent to the scheduled daily run.
You can do this while the app container is running in the background.
The Flask server is started automatically by Docker Compose.
Access it at http://localhost:5001/.
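If you want to check from a script whether a pipeline run is currently in progress, you can poll the status endpoint mentioned above. The response schema is not documented here, so this sketch simply prints the raw JSON:

```python
# Poll the pipeline status endpoint exposed by the app service.
# The JSON schema is not documented here, so print it verbatim.
import requests

resp = requests.get("http://localhost:5001/api/system/pipeline_status", timeout=10)
resp.raise_for_status()
print(resp.json())
```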
- Map View: browse stops by type (`matched`, `unmatched`, `osm`) and by match method.
- Filters & Search: filter by ATLAS SLOID, OSM Node ID, UIC reference, or route.
- Problems: review and resolve detected problems on the problems page. See 3. Problems.
- Manage Data: see 4. Database.
- Generating Reports: the web app can generate CSV and PDF reports. See 5.5 Generate Reports and PDFs.
This repository uses GitHub Actions for continuous integration.
- Workflow: tests.yml
- CI documentation: CI and Tests
This project is a work in progress. Feedback and improvements are welcome! Feel free to submit issues and pull requests. Thank you for your interest! 🚀
