A static site generator and publishing platform for TEI (Text Encoding Initiative) XML data, built for Syriaca.org: The Syriac Reference Portal.
Gaddel transforms TEI XML scholarly data into static HTML pages, indexes them into OpenSearch for search and browse, generates RDF triples for a Neptune graph database, and deploys to AWS via GitHub Actions — replacing the previous eXist-db dynamic application with a serverless architecture.
View interactive diagram on Lucidchart
View as Mermaid diagram
graph TB
subgraph GitHub["GitHub (srophe org)"]
Repo["Gaddel Repository"]
GHA["GitHub Actions"]
end
subgraph DataSources["TEI Data Sources"]
TEI["TEI XML Records"]
RDF["RDF Taxonomy"]
end
subgraph SiteGenerator["Site Generator (XSLT 3.0 / XQuery)"]
Static["staticHTML.xsl"]
TEI2HTML["tei2html.xsl"]
JsonXSL["json.xsl"]
IndexXSL["createSiteIndex.xsl"]
TEI2TTL["tei2ttl.xqm"]
Templates["HTML Templates\n(components/)"]
Config["repo-config.xml"]
end
subgraph AWS["AWS Cloud"]
S3["S3 Bucket\nsrophe-syriaca-front-end"]
CF["CloudFront\nDistribution"]
OAC["Origin Access\nControl"]
APIGW["API Gateway"]
OS["OpenSearch\nIndex"]
Neptune["Neptune\nGraph DB"]
SPARQL["SPARQL\nEndpoint"]
end
Users["End Users"]
TEI --> SiteGenerator
RDF --> SiteGenerator
Config --> SiteGenerator
Templates --> SiteGenerator
SiteGenerator --> Repo
JsonXSL -->|Generate JSON docs| OS
IndexXSL -->|Create index mapping| OS
TEI2TTL -->|Generate RDF Turtle| Neptune
Repo --> GHA
GHA -->|OIDC Auth| AWS
GHA -->|Sync changed files| S3
GHA -->|Invalidate cache| CF
OAC --> S3
CF --> OAC
Users -->|HTTPS| CF
Users -->|Search & Browse| APIGW
APIGW --> OS
APIGW -->|Linked Data queries| SPARQL
SPARQL --> Neptune
Gaddel/
├── .github/workflows/ # CI/CD pipeline (deploy to S3)
├── siteGenerator/ # XSLT-based static site generator
│ ├── components/ # HTML page templates per collection
│ └── xsl/ # XSLT/XQuery for TEI→HTML, OpenSearch, and RDF
├── resources/ # Frontend assets
│ ├── bootstrap/ # Bootstrap 3 CSS/JS
│ ├── css/ # Custom stylesheets
│ ├── fonts/ # Syriac and other fonts
│ ├── images/ # Logos, icons, sponsor images
│ ├── js/ # jQuery, search.js (OpenSearch client), maps
│ ├── keyboard/ # Virtual keyboards (Syriac, Arabic, Hebrew, etc.)
│ └── leaflet/ # Leaflet.js map library
├── data/taxonomy/ # RDF taxonomy data
│
│── # Collection modules (each with browse, search, record pages):
├── geo/ # The Syriac Gazetteer (places)
├── persons/ # Syriac Biographical Dictionary (SBD)
├── authors/ # A Guide to Syriac Authors
├── q/ # Qadishe: Guide to Syriac Saints
├── bhse/ # Bibliotheca Hagiographica Syriaca Electronica
├── nhsl/ # New Handbook of Syriac Literature
├── cbss/ # Comprehensive Bibliography on Syriac Studies
├── saints/ # Gateway to the Syriac Saints
├── taxonomy/ # Taxonomy of Syriac Studies
├── johnofephesus/ # John of Ephesus project
├── lives/ # Lives of saints
│
├── documentation/ # Schemas, TEI ODD, editorial docs
├── api-documentation/ # API reference
├── index.html # Homepage
├── search.html # Global advanced search
├── browse.html # Global browse
└── LICENSE # GPL-3.0
| Collection | Path | Description |
|---|---|---|
| Gazetteer | /geo/ |
Historical geography of places related to Syriac studies |
| SBD | /persons/ |
Syriac Biographical Dictionary |
| Authors | /authors/ |
Guide to Syriac Authors |
| Qadishe | /q/ |
Catalogue of saints in the Syriac tradition |
| BHSE | /bhse/ |
Bibliotheca Hagiographica Syriaca Electronica |
| NHSL | /nhsl/ |
New Handbook of Syriac Literature |
| CBSS | /cbss/ |
Comprehensive Bibliography on Syriac Studies (27,000+ publications) |
| Saints | /saints/ |
Gateway to the Syriac Saints |
| Taxonomy | /taxonomy/ |
Taxonomy of Syriac Studies (RDF/SKOS) |
| John of Ephesus | /johnofephesus/ |
Prosopography and gazetteer for John of Ephesus |
The core of Gaddel is an XSLT 3.0 / XQuery pipeline in siteGenerator/ that transforms TEI XML and RDF data into static HTML, OpenSearch index documents, and RDF triples:
graph LR
subgraph Input
TEI["TEI XML"]
RDF["RDF/SKOS"]
HTML["HTML Templates"]
end
subgraph Transform["staticHTML.xsl"]
Route{"File Type?"}
TEI2HTML["tei2html.xsl"]
RDFProc["RDF Processing"]
HTMLConv["HTML Conversion"]
end
subgraph IndexPipeline["OpenSearch Indexing"]
JsonXSL["json.xsl"]
IndexXSL["createSiteIndex.xsl"]
end
subgraph LinkedData["Linked Data Pipeline"]
TEI2TTL["tei2ttl.xqm"]
end
subgraph Output
Pages["Static HTML Pages"]
JSON["JSON Index Docs"]
Mapping["Index Mapping"]
Turtle["RDF Turtle Triples"]
end
TEI --> Route
RDF --> Route
HTML --> Route
Route -->|TEI| TEI2HTML
Route -->|RDF| RDFProc
Route -->|HTML| HTMLConv
TEI2HTML --> Pages
RDFProc --> Pages
HTMLConv --> Pages
TEI --> JsonXSL --> JSON
TEI --> IndexXSL --> Mapping
TEI --> TEI2TTL --> Turtle
Key stylesheets and modules:
staticHTML.xsl— Main entry point; routes input to appropriate transformationtei2html.xsl— Converts TEI elements to HTMLjson.xsl— Generates OpenSearch JSON index documents from TEI recordscreateSiteIndex.xsl— Generates OpenSearch index mapping from repo-config search fieldstei2ttl.xqm— XQuery module that converts TEI records to RDF Turtle triples for Neptunemaps.xsl— Generates Leaflet map markup fromtei:geocoordinatesbibliography.xsl/citation.xsl— Bibliographic formattingrelationships.xsl— Renders TEI relationship datageoJSON.xsl— Generates GeoJSON from place datahelper-functions.xsl— Shared utility functions
Configuration is driven by siteGenerator/components/repo-config.xml, which defines collections, search fields, keyboard options, and URI patterns.
Search and browse functionality is powered by an OpenSearch index accessed via API Gateway:
graph LR
Browser["Browser\n(search.js)"] -->|REST GET| APIGW["API Gateway"]
APIGW --> OS["OpenSearch"]
OS -->|JSON results| APIGW
APIGW -->|JSON| Browser
The search client (resources/js/search.js) supports:
- Full-text keyword search across all collections
- Alphabetical browse with multilingual support (English, Syriac, Arabic, Hebrew, Greek, Russian, Armenian)
- Advanced search with field-specific filters (author, title, dates, place names, person names, etc.)
- CBSS subject-based browsing with related subject discovery and infinite scroll
- Historical era browsing for CBSS (century-based and era-grouped queries)
- Pagination and URL state management for shareable search results
The OpenSearch index is populated by running json.xsl over TEI records, which extracts fields defined in repo-config.xml (titles, authors, dates, coordinates, abstracts, etc.) into JSON documents. The index mapping is generated by createSiteIndex.xsl.
Gaddel publishes linked open data as RDF triples stored in an Amazon Neptune graph database, queryable via a SPARQL endpoint:
graph LR
TEI["TEI XML"] --> TEI2TTL["tei2ttl.xqm"]
TEI2TTL --> Turtle["RDF Turtle"]
Turtle --> Neptune["Neptune Graph DB"]
Neptune --> SPARQL["SPARQL Endpoint"]
SPARQL --> RecordPages["Record Pages\n(rdfRelationships)"]
SPARQL --> ExternalConsumers["External\nLinked Data Consumers"]
The tei2ttl.xqm XQuery module converts TEI records into RDF Turtle triples using standard ontologies:
lawd:(Linked Ancient World Data) — for persons, places, and conceptual worksfoaf:— for person relationships and topic referencesdcterms:— for bibliographic resources and relationsschema:— for descriptions and abstractsskos:— for taxonomy conceptsgeosparql:— for geographic coordinates
Triples generated include:
- Entity types (persons, places, works, bibliographic resources)
- Name variants in multiple languages (English, Syriac, Arabic, French)
- Biographical data (birth/death dates and places, floruit, gender, occupation, status)
- Geographic data (GPS coordinates, existence dates, religious communities)
- Relationships between entities (active/passive/mutual relations from TEI)
- Citations and cross-references between records
- Links to external authority files (close matches to non-Syriaca URIs)
Record pages dynamically load relationship data from the SPARQL endpoint via the rdfRelationships component, displaying linked entities in the sidebar.
Deployment is automated via GitHub Actions (.github/workflows/main.yml):
- On push to
main, the workflow triggers - Authenticates to AWS via OIDC (no stored credentials)
- Detects changed files using
git diff - Uploads only changed files to S3 bucket
- Falls back to full
aws s3 syncif a file was deleted - Invalidates the CloudFront cache
Data directories (json-data/, person/, place/, work/, cbss/, etc.) are excluded from sync as they are managed separately.
- Bootstrap 3 for layout and responsive design
- jQuery + jQuery UI for interactivity
- Leaflet.js for interactive maps
- Virtual keyboard support for Syriac (phonetic & standard), Arabic, Hebrew, Greek, and Russian
- Switchable Syriac fonts: Estrangelo Edessa, East Syriac Adiabene, Serto Batnan
- Plausible Analytics for privacy-friendly usage tracking
- GitHub OIDC — Deployments authenticate via OpenID Connect, eliminating stored AWS credentials. Short-lived tokens are scoped to specific repos and branches
- Origin Access Control (OAC) — S3 bucket is fully private; only CloudFront can read objects via SigV4-signed requests
- S3 Public Access Block — All four public access block settings enabled (BlockPublicAcls, BlockPublicPolicy, IgnorePublicAcls, RestrictPublicBuckets)
- Least-privilege IAM — Deploy role is limited to s3:PutObject, s3:DeleteObject, s3:ListBucket, and cloudfront:CreateInvalidation
- Secrets management — AWS account IDs, role ARNs, and distribution IDs stored as GitHub encrypted secrets, never in code
- CloudFront CDN — Global edge caching for low-latency content delivery with automatic failover across edge locations
- AWS WAF — Web Application Firewall attached to CloudFront and API Gateway for rate limiting, bot mitigation, and protection against common web exploits (SQL injection, XSS)
- HTTPS enforced — ViewerProtocolPolicy set to redirect-to-https; all traffic encrypted in transit
- API Gateway throttling — Request rate limits and burst controls on the search/SPARQL API to prevent abuse and protect backend services
- API Gateway resource policies — Restrict API access by origin, IP range, or referrer to prevent unauthorized consumption of OpenSearch and Neptune resources
- CORS configuration — API Gateway configured with strict Cross-Origin Resource Sharing headers, limiting API access to trusted domains
- Cache invalidation — Automated on every deployment to ensure users always receive current content
- Static site architecture — No server-side runtime to fail; S3 provides 99.999999999% (11 nines) durability and 99.99% availability
- S3 versioning — Object versioning enables rollback to any previous deployment state in case of accidental overwrites or deletions
- S3 cross-region replication — Data replicated to a secondary region for disaster recovery and business continuity
- S3 lifecycle policies — Automated transition of old object versions to cost-effective storage tiers; expired versions cleaned up automatically
- Incremental deploys — Only changed files are uploaded (via git diff), minimizing blast radius and deployment time
- Sync fallback — If a file deletion is detected, falls back to full aws s3 sync with exact-timestamps for consistency
- Data isolation — Data directories (person/, place/, work/, cbss/, etc.) are excluded from sync to prevent accidental overwrites from the code repo
- Branch protection — Deployments only trigger on push to main
- Full git history — Checkout uses fetch-depth: 0 for accurate diff comparisons
- TEI XML validation — Source data conforms to TEI P5 with project-specific ODD constraints (syriaca-tei-main.odd)
- Stable URIs — All entities have persistent identifiers (http://syriaca.org/{type}/{id}) ensuring long-term citability
- Creative Commons licensing — CC BY 4.0 on all content with clear attribution requirements
- Git version control — Full history of all data and code changes across both code and data repositories
GPL-3.0 — See LICENSE for details.
Content is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
Notes: David A. Michelson, general editor; Daniel L. Schwartz, director; Jeanne-Nicole Mellon Saint-Laurent, associate director; Nathan Gibson, William L. Potter, and James E. Walters, editors; Erin Geier and Winona Salesky, senior programmers, Syriaca.org: The Syriac Reference Portal (2014–), http://syriaca.org
