YamlDB is a file-based database that uses YAML for storage. It provides an API for managing nested configuration data with support for atomic writes, concurrency locking, and advanced querying.
- Nested Key Access: Use dot-notation (e.g.,
user.profile.name) to get or set values. - Atomic Writes: Writes to a temporary file before replacing the original to prevent data loss.
- Concurrency Locking: Uses system-level advisory locks (
portalocker) to prevent corruption during concurrent access. - Comment Preservation: Uses
ruamel.yamlto preserve comments and formatting in your YAML files. - Write Optimization: An
auto_flushmechanism and_dirtyflag reduce disk I/O. - Advanced Querying: Integrated JMESPath support for complex searches.
- Type Casting: Hybrid system for explicit casting during storage and retrieval.
- Transactions: Atomic bulk updates with full rollback support.
- CLI Tool: Manage your YAML databases directly from the terminal.
pip install yamldbTo use the :encrypt: backend, you need the cryptography library:
pip install "yamldb[encrypt]"Use cases include managing infrastructure manifests, cluster configurations, and node metadata.
from yamldb import YamlDB
# Initialize DB for cluster configuration
db = YamlDB(filename="cluster_config.yml", auto_flush=True)
# Define infrastructure components using dot-notation
db.set("cluster.name", "hpc-cluster-01")
db.set("cluster.nodes.node01.gpu_count", "8", cast=int)
db.set("cluster.nodes.node01.status", "online")
db.set("cluster.nodes.node02.gpu_count", "4", cast=int)
db.set("cluster.nodes.node02.status", "maintenance")
# Retrieve infrastructure details
gpu_count = db.get_as("cluster.nodes.node01.gpu_count", int)
status = db.get("cluster.nodes.node01.status")
# Advanced Search (JMESPath)
# Find all nodes that are currently 'online'
online_nodes = db.search("cluster.nodes.[?status=='online']")
# Bulk Updates in a Transaction (e.g., updating cluster version)
with db.transaction():
db.set("cluster.version", "2.4.1")
db.set("cluster.last_updated", "2026-04-27")
# If an exception occurs, the version won't be partially updatedThe yamldb CLI allows direct interaction with YAML databases from the terminal.
yamldb [OPTIONS] COMMAND [ARGS]...get: Retrieve a value using dot-notation.
yamldb get <file> <key>
# Example: yamldb get config.yml user.profile.nameset: Set a value. Automatically creates parent keys if they don't exist.
yamldb set <file> <key> <value>
# Example: yamldb set config.yml app.version 1.2.0delete: Remove a key from the database.
yamldb delete <file> <key>
# Example: yamldb delete config.yml user.old_settingsearch: Query the database using JMESPath expressions.
yamldb search <file> <query>
# Example: yamldb search config.yml "[?status=='active']"stats: Display write efficiency and I/O statistics.
yamldb stats <file>
# Example: yamldb stats config.ymlA generator that yields all leaf nodes in the database as (dot_notation_key, value) pairs. This allows auditing of infrastructure states.
for key, value in db.items_recursive():
print(f"{key}: {value}")
# Output: cluster.nodes.node01.gpu_count: 8 ...Locate infrastructure components based on their state.
# Find all nodes that are in 'maintenance' mode
maintenance_nodes = db.find_all("maintenance")
# Find all nodes with more than 4 GPUs
high_capacity_nodes = db.filter(lambda v: isinstance(v, int) and v > 4)Perform multiple infrastructure updates atomically.
db.update_many({
"cluster.nodes.node01.status": "offline",
"cluster.nodes.node01.last_reboot": "2026-04-27",
"cluster.global.maintenance_mode": True
})You can use the * wildcard in get() or via bracket access to retrieve multiple values across the database. This is powered by JMESPath under the hood.
# Get the status of ALL nodes in the cluster
# Returns a list: ['online', 'maintenance', 'online']
statuses = db.get("cluster.nodes.*.status")
# Get the GPU count for all nodes
# Returns a list: [8, 4, 16]
gpu_counts = db["cluster.nodes.*.gpu_count"]Track how many disk writes were avoided thanks to the _dirty flag.
stats = db.get_stats()
print(f"Write Efficiency: {stats['write_efficiency']}")filename: Path to the YAML file.backend::file:(default): Standard human-readable YAML storage.:memory:: In-memory storage (no disk I/O).:binary:: High-performance binary storage using JSON serialization.
auto_flush: IfTrue(default), changes are written to disk immediately unless inside a transaction.
For applications requiring smaller file sizes and faster I/O, use the :binary: backend.
db = YamlDB(filename="data.bin", backend=":binary:")
db.set("metrics.cpu", 45)
# Export binary data to human-readable YAML for debugging
db.convert_to_yaml("debug_export.yml")For sensitive data, use the :encrypt: backend. This encrypts the entire database file (including keys and structure) using AES-128 symmetric encryption.
Note: The
:encrypt:backend is currently experimental. We are actively refining its implementation and would greatly appreciate your feedback!
# Initialize an encrypted database
db = YamlDB(
filename="secrets.enc",
backend=":encrypt:",
password="your-strong-password"
)
# Use it exactly like a normal YamlDB
db.set("cluster.admin_password", "super-secret-123")
db.set("cluster.api_key", "abc-123-def-456")
# The file 'secrets.enc' is now a binary blob that is unreadable
# without the correct password.YamlDB comes with a lightweight Web UI for visual data management. This is a prototype designed to demonstrate how easily YamlDB can be integrated into other frameworks (such as FastAPI).
To run the Web UI:
- Install dependencies:
pip install fastapi uvicorn - Run the server:
python yamldb/bin/run_webui.py - Open your browser to
http://localhost:8000
The Web UI allows you to browse the database tree, set/delete values via dot-notation, and monitor write efficiency in real-time.