The pycemrg library provides a decoupled, configuration-driven system for managing common development and research tasks, including Machine Learning models, anatomical labels, file management, and system command execution. The core principle is that the library is stateless and generic; the consuming application provides configuration to direct its behavior.
Typical Workflow:
- (Optional) Use the ConfigScaffolder or the pycemrg CLI to generate template configuration files.
- Populate these YAML files with application-specific data.
- Instantiate the required managers (ModelManager, LabelManager, OutputManager, CommandRunner).
- Use the manager instances to retrieve model paths, translate label values, generate output paths, and execute external processes.
Entry Point: pycemrg.files.ConfigScaffolder
Programmatically creates template configuration files. This is the recommended first step for a new project.
Instantiation:
from pycemrg.files import ConfigScaffolder
scaffolder = ConfigScaffolder()
Methods:
create_models_manifest: Creates a starter models.yaml file with usage examples.
- Signature: (output_path: Union[str, Path] = "models.yaml", overwrite: bool = False) -> None
- Args:
  - output_path (str | Path): The location to save the new file. Defaults to "models.yaml".
  - overwrite (bool): If True, will overwrite an existing file at the output_path. Defaults to False.
- Raises:
  - FileExistsError: If the file at output_path exists and overwrite is False.
Example:
scaffolder = ConfigScaffolder()
scaffolder.create_models_manifest(output_path="config/models.yaml", overwrite=True)
create_labels_manifest: Creates a starter labels.yaml file with a customizable placeholder structure.
- Signature: (output_path: Union[str, Path] = "labels.yaml", overwrite: bool = False, num_labels: int = 3, num_groups: int = 1) -> None
- Args:
  - output_path (str | Path): The location to save the new file. Defaults to "labels.yaml".
  - overwrite (bool): If True, will overwrite an existing file at the output_path. Defaults to False.
  - num_labels (int): Number of placeholder labels to generate (e.g., structure_1, structure_2). Defaults to 3.
  - num_groups (int): Number of placeholder groups to generate (e.g., group_a, group_b). Labels are distributed evenly across groups. Defaults to 1.
- Raises:
  - FileExistsError: If the file at output_path exists and overwrite is False.
Example:
scaffolder = ConfigScaffolder()
scaffolder.create_labels_manifest(
    output_path="config/labels.yaml",
    num_labels=10,
    num_groups=3,
    overwrite=True
)
Generated Structure:
labels:
  background: 0
  structure_1: 1
  structure_2: 2
  # ...
groups:
  group_a:
    - structure_1
    - structure_2
  group_b:
    - structure_3
    # ...
Entry Point: pycemrg.files.OutputManager
Generates consistent output paths with a centralized prefix/suffix pattern. This utility is critical for orchestrators managing multiple related output files while maintaining naming conventions.
Instantiation:
from pycemrg.files import OutputManager
from pathlib import Path
# Initialize with output directory and file prefix
mgr = OutputManager(
    output_dir="/path/to/output",
    output_prefix="case_01"
)
Methods:
get_path: Constructs the full, absolute path for a file with a given suffix.
- Signature: (suffix: str) -> Path
- Args:
  - suffix (str): The descriptive suffix for the file, including the extension (e.g., "_segmentation.nii.gz", "_mesh.vtk").
- Returns:
pathlib.Path: The absolute path for the output file.
- Raises:
  - ValueError: If suffix is empty or not a string.
Behavior:
- Creates the output directory if it doesn't exist (on initialization).
- Returns paths as {output_dir}/{prefix}{suffix}.
Example:
mgr = OutputManager("/data/results", "patient_042")
seg_path = mgr.get_path("_segmentation.nii.gz")
# Returns: /data/results/patient_042_segmentation.nii.gz
mesh_path = mgr.get_path("_heart_mesh.vtk")
# Returns: /data/results/patient_042_heart_mesh.vtk
Design Rationale:
OutputManager enforces consistency without imposing rigid structure. It's the orchestrator's responsibility to define meaningful suffixes; the manager ensures they're applied uniformly.
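The prefix/suffix pattern described above can be sketched in a few lines. This is a hypothetical stand-in for illustration, not the library's implementation; the function name build_output_path is an assumption:

```python
from pathlib import Path

def build_output_path(output_dir: str, prefix: str, suffix: str) -> Path:
    """Mimic the {output_dir}/{prefix}{suffix} pattern described above."""
    if not isinstance(suffix, str) or not suffix:
        raise ValueError("suffix must be a non-empty string")
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # directory is created up front
    return (out / f"{prefix}{suffix}").resolve()
```

The sketch keeps the same contract: the caller supplies meaningful suffixes, and the helper guarantees they are applied uniformly under one directory and prefix.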
Entry Point: pycemrg.models.ModelManager
Manages downloading, caching, and providing local filesystem paths to ML models defined in a manifest. Models are versioned, integrity-verified via SHA256, and cached to avoid redundant downloads.
Instantiation:
from pycemrg.models import ModelManager
from pathlib import Path
# The path to your application's models.yaml is required.
model_manager = ModelManager(manifest_path=Path("path/to/your/models.yaml"))
# Optionally, specify a custom cache directory.
model_manager = ModelManager(
    manifest_path=Path("path/to/your/models.yaml"),
    cache_dir=Path("/tmp/my-app-cache")  # Default: ~/.cache/pycemrg
)
Manifest Format:
segmentation_model:
  default: v2.1
  versions:
    v2.1:
      url: "https://example.com/models/seg_v2.1.zip"
      sha256: "abc123def456..."
      unzipped_target_path: "checkpoints/model.pth"
    v2.0:
      url: "file://local/path/to/seg_v2.0.zip"
      sha256: "xyz789..."
      unzipped_target_path: "model.pth"
Methods:
get_model_path: The primary method. Returns the local path to a model's weights, handling download, verification, and unzipping as needed. The operation is idempotent; subsequent calls for the same model return the cached path immediately without network activity.
- Signature: (model_name: str, version: str = 'default') -> Path
- Args:
  - model_name (str): The logical name of the model (a top-level key in models.yaml).
  - version (str): The specific version to retrieve. If 'default', uses the version specified by the default key in the manifest.
- Returns:
pathlib.Path: A resolved, absolute path to the ready-to-use model file.
- Raises:
  - FileNotFoundError: If the provided manifest_path does not exist, or if a file:// URL points to a missing local file.
  - KeyError: If the model_name or version is not found in the manifest.
  - ValueError: If the manifest entry is malformed (e.g., missing unzipped_target_path).
  - IOError: If the downloaded file's SHA256 hash does not match the manifest.
  - RuntimeError: If a network, extraction, or file system error occurs during processing.
Example:
manager = ModelManager("models.yaml")
# Get default version
model_path = manager.get_model_path("segmentation_model")
# First call: Downloads, verifies, extracts → /home/user/.cache/pycemrg/.../model.pth
# Subsequent calls: Returns cached path immediately
# Get specific version
legacy_path = manager.get_model_path("segmentation_model", version="v2.0")
Design Rationale:
- Models are never auto-updated to prevent silent breaking changes in production environments.
- Hash verification runs whenever a sha256 entry is present in the manifest, detecting corruption and man-in-the-middle tampering.
- Local file:// URLs support air-gapped or institutional network scenarios.
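The SHA256 check described above amounts to streaming the downloaded archive through a hash and comparing hex digests. A minimal sketch (illustrative helper, not the library's code):

```python
import hashlib
from pathlib import Path

def sha256_matches(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Stream the file in chunks so large model archives never load fully into RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

A mismatch would correspond to the IOError described for get_model_path; chunked reading matters because model archives are often multiple gigabytes.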
Entry Point: pycemrg.data.LabelManager
Manages translations between human-readable label names, groups, and their corresponding integer values based on a label manifest. Supports hierarchical group definitions for complex anatomical structures.
Instantiation:
from pycemrg.data import LabelManager
from pathlib import Path
# The path to your application's labels.yaml is required.
label_manager = LabelManager(config_path=Path("path/to/your/labels.yaml"))
Manifest Format:
labels:
  background: 0
  LV_myo: 2
  RV_myo: 3
  LA_wall: 4
  RA_wall: 5
groups:
  ventricles:
    - LV_myo
    - RV_myo
  atria:
    - LA_wall
    - RA_wall
  all_chambers:
    - ventricles # Groups can reference other groups
    - atria
Methods:
get_value: Translates a single label name to its integer value.
- Signature: (name: str) -> int
- Args:
  - name (str): The human-readable label name (e.g., "LV_myo").
- Returns:
int: The corresponding integer value.
- Raises:
  - KeyError: If name is not defined in the manifest's labels section.
Example:
lv_value = label_manager.get_value("LV_myo")  # Returns: 2
get_name: Translates an integer value back to its human-readable name.
- Signature: (value: int) -> str
- Args:
  - value (int): The integer value of the label.
- Returns:
str: The corresponding human-readable name.
- Raises:
  - KeyError: If value is not defined in the manifest's labels section.
Example:
name = label_manager.get_name(2)  # Returns: "LV_myo"
get_values_from_names: Translates a list of strings into a sorted, unique list of integer label values. The input list can contain individual label names, group names (recursively resolved), or numbers as strings.
- Signature: (names: List[str]) -> List[int]
- Args:
  - names (List[str]): A list of strings to translate. Can include keys from labels, keys from groups, or numeric strings (e.g., ['ventricles', 'LA_wall', '5']).
- Returns:
List[int]: A sorted list of unique integer values corresponding to the input names.
- Raises:
  - KeyError: If any name in the list is not a valid label, group, or parseable integer. The error message includes all available keys.
Example:
# Mix of individual labels, groups, and raw integers
values = label_manager.get_values_from_names(["ventricles", "LA_wall", "0"])
# Returns: [0, 2, 3, 4] (sorted, deduplicated)
# Recursive group resolution
all_values = label_manager.get_values_from_names(["all_chambers"])
# Returns: [2, 3, 4, 5]
get_tags_string: Convenience method that returns a comma-separated string of tag values. Useful for command-line tools that expect tag lists as strings.
- Signature: (names: List[str], separator: str = ",") -> str
- Args:
  - names (List[str]): A list of label/group names to resolve.
  - separator (str): The character to use between values. Defaults to ",".
- Returns:
str: A separator-delimited string of integer values.
Example:
tags = label_manager.get_tags_string(["ventricles", "atria"])
# Returns: "2,3,4,5"
# Custom separator for tool compatibility
tags = label_manager.get_tags_string(["LV_myo", "RV_myo"], separator=":")
# Returns: "2:3"
Design Rationale:
- LabelManager never validates that integer values make sense (e.g., non-negative, unique). It's a pure translation layer.
- Groups support recursive definitions to model anatomical hierarchies.
- Numeric strings are accepted to support mixed-mode orchestrators that may receive raw tag values.
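The resolution rules above (plain labels, groups that may reference other groups, and numeric strings) can be sketched as follows. This is a simplified stand-in for illustration, not the library's code:

```python
def resolve_names(names, labels, groups):
    """Resolve a mixed list of label names, group names, and numeric strings."""
    values = set()
    for name in names:
        if name in labels:
            values.add(labels[name])
        elif name in groups:
            # Groups may reference other groups, so recurse.
            values |= set(resolve_names(groups[name], labels, groups))
        else:
            try:
                values.add(int(name))
            except ValueError:
                raise KeyError(f"Unknown label or group: {name!r}")
    return sorted(values)  # sorted, deduplicated, as documented above
```

Using the manifest shown earlier, resolve_names(["all_chambers"], labels, groups) would walk ventricles and atria and return the four chamber tags.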
Entry Point: pycemrg.data.LabelMapper
Maps between two different label standards (e.g., source segmentation labels → simulation mesh tags). Uses composition of two LabelManager instances to create bidirectional translations based on shared anatomical names.
Instantiation:
from pycemrg.data import LabelManager, LabelMapper
from pathlib import Path
# Define two different label standards
source_mgr = LabelManager("source_labels.yaml") # e.g., clinical segmentation
target_mgr = LabelManager("target_labels.yaml") # e.g., simulation mesh
# Create mapper
mapper = LabelMapper(source=source_mgr, target=target_mgr)
Example Manifests:
source_labels.yaml:
labels:
  LV_myo: 100
  RV_myo: 101
  LA_wall: 102
target_labels.yaml:
labels:
  LV_myo: 2
  RV_myo: 3
  LA_wall: 4
Methods:
get_source_to_target_mapping: Generates a dictionary mapping source integer tags to target integer tags. Only labels with matching names are included.
- Signature: () -> Dict[int, int]
- Returns:
Dict[int, int]: A dictionary of {source_tag: target_tag}.
Example:
mapper = LabelMapper(source_mgr, target_mgr)
mapping = mapper.get_source_to_target_mapping()
# Returns: {100: 2, 101: 3, 102: 4}
# Use in mesh relabeling
for source_val, target_val in mapping.items():
    mesh_array[mesh_array == source_val] = target_val
get_source_tags: Convenience method to resolve names/groups using the source standard.
- Signature: (names: List[str]) -> List[int]
- Returns:
List[int]: Resolved tags from the source LabelManager.
- Equivalent to: mapper.source.get_values_from_names(names)
get_target_tags: Convenience method to resolve names/groups using the target standard.
- Signature: (names: List[str]) -> List[int]
- Returns:
List[int]: Resolved tags from the target LabelManager.
- Equivalent to: mapper.target.get_values_from_names(names)
Example:
# Extract source mesh with clinical labels
source_tags = mapper.get_source_tags(["LV_myo", "RV_myo"]) # [100, 101]
# Validate target mesh has simulation labels
target_tags = mapper.get_target_tags(["LV_myo", "RV_myo"])  # [2, 3]
Design Rationale:
- LabelMapper never modifies the underlying LabelManager instances; it's purely a query interface.
- Unmatched labels (present in source but not target) are silently ignored in the mapping to support partial overlaps.
- The mapper enables "schema evolution" workflows where label standards change across pipeline stages.
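The mapping construction amounts to intersecting the two labels dictionaries on shared names. A minimal sketch (illustrative, not the library's implementation):

```python
def source_to_target_mapping(source_labels: dict, target_labels: dict) -> dict:
    """Map source tags to target tags for names present in both standards."""
    # Names missing from the target are silently skipped, matching the
    # partial-overlap behavior described above.
    return {
        src_val: target_labels[name]
        for name, src_val in source_labels.items()
        if name in target_labels
    }
```

With the example manifests above, this yields {100: 2, 101: 3, 102: 4}, which can be applied tag-by-tag to relabel a mesh array.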
Entry Point: pycemrg.system.CommandRunner
A robust utility for safely running and logging external shell commands. Provides a consistent interface for executing system processes, capturing their output, and validating results without using an insecure shell.
Instantiation:
import logging
from pycemrg.system import CommandRunner
# Basic instantiation, uses a default logger
runner = CommandRunner()
# Optionally, inject an application-specific logger for unified log handling
app_logger = logging.getLogger("my_application")
runner = CommandRunner(logger=app_logger)
Methods:
run: Executes a command safely, captures its output, and handles errors.
- Signature: (cmd: Sequence[Union[str, Path]], expected_outputs: Optional[Sequence[Path]] = None, cwd: Optional[Path] = None, ignore_errors: Optional[Sequence[str]] = None, env: Optional[Dict[str, str]] = None) -> str
- Args:
  - cmd (Sequence[str | Path]): A sequence of command parts (e.g., ['docker', 'run', Path('/tmp')]). Each part is converted to a string. Never passed to a shell interpreter.
  - expected_outputs (Optional[Sequence[Path]]): A sequence of pathlib.Path objects that are expected to exist after a successful run. If any are missing, raises FileNotFoundError.
  - cwd (Optional[Path]): The working directory from which to run the command.
  - ignore_errors (Optional[Sequence[str]]): A sequence of strings. If the command fails but one of these strings is found in stderr, the error is treated as a warning and no exception is raised.
  - env (Optional[Dict[str, str]]): Environment variables dict. If None, inherits the current process environment. If provided, replaces the entire environment (use with caution or merge with os.environ).
- Returns:
str: The captured stdout from the command.
- Raises:
  - CommandExecutionError: If the command returns a non-zero exit code and the error is not matched by the ignore_errors list.
  - FileNotFoundError: If the command completes successfully but an expected_outputs file is missing.
Example:
runner = CommandRunner()
# Basic execution
output = runner.run(['ls', '-la', '/tmp'])
# With output validation
runner.run(
    cmd=['convert', 'input.nii', 'output.inr'],
    expected_outputs=[Path('output.inr')]
)
# With error tolerance (some tools write warnings to stderr)
runner.run(
    cmd=['legacy_tool', '--process', 'data.txt'],
    ignore_errors=["WARNING: deprecated flag"]
)
# With custom environment
import os

custom_env = os.environ.copy()
custom_env['CUDA_VISIBLE_DEVICES'] = '0,1'
runner.run(
    cmd=['python', 'train.py'],
    env=custom_env
)
Design Rationale:
- Never uses shell=True: Prevents command injection vulnerabilities.
- Explicit environment control: The env parameter enables isolated execution (critical for tools like CARPentry that require specific environments).
- Validation as a first-class concern: expected_outputs catches silent failures where a tool exits successfully but produces no output.
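In standard-library terms, the core of this contract looks roughly like the sketch below. It is an illustration built on subprocess.run, not the library's code; run_checked is a hypothetical name:

```python
import subprocess
from pathlib import Path

def run_checked(cmd, expected_outputs=(), cwd=None, env=None) -> str:
    argv = [str(part) for part in cmd]  # Path objects become plain strings
    # No shell=True: the argv list goes straight to the OS, so shell
    # metacharacters in arguments are never interpreted as shell syntax.
    result = subprocess.run(argv, capture_output=True, text=True, cwd=cwd, env=env)
    if result.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    missing = [p for p in expected_outputs if not Path(p).exists()]
    if missing:
        raise FileNotFoundError(f"Expected outputs missing: {missing}")
    return result.stdout
```

The real method additionally logs output and supports ignore_errors, but the ordering is the key point: exit-code check first, then output-existence validation, then stdout is returned.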
Associated Exception:
CommandExecutionError: A custom exception raised by CommandRunner.run() on failure. Subclass of RuntimeError providing rich context for programmatic error handling.
- Attributes:
  - .returncode (int): The exit code of the failed command.
  - .stdout (str): The captured standard output from the command.
  - .stderr (str): The captured standard error from the command.
Example:
from pycemrg.system import CommandRunner, CommandExecutionError
runner = CommandRunner()
try:
    runner.run(['false'])  # Command that always fails
except CommandExecutionError as e:
    print(f"Command failed with exit code {e.returncode}")
    print(f"Error output: {e.stderr}")
    # Log to monitoring system, retry with different parameters, etc.
Entry Point: pycemrg.system.CarpRunner
A specialized runner for executing commands from the CARPentry/openCARP ecosystem. Its primary responsibility is to correctly source the config.sh file from a CARPentry installation, setting up the complex environment (PATH, PYTHONPATH, LD_LIBRARY_PATH, license variables, etc.) before delegating execution to a generic CommandRunner.
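The environment-capture step can be approximated with the standard library: spawn a bash process that sources the file, then dump the resulting environment NUL-separated and parse it. A sketch under the assumption that bash and GNU env (with -0) are available; the function name is illustrative:

```python
import subprocess

def source_environment(config_sh: str) -> dict:
    """Source a shell config file and return the resulting environment."""
    # NUL-separated output (env -0) keeps multi-line variable values intact.
    script = f'source "{config_sh}" >/dev/null 2>&1 && env -0'
    out = subprocess.run(
        ["bash", "-c", script], capture_output=True, check=True
    ).stdout
    env = {}
    for entry in out.split(b"\x00"):
        key, sep, value = entry.partition(b"=")
        if sep:  # skip empty trailing entries
            env[key.decode()] = value.decode()
    return env
```

The resulting dict can then be passed as the env argument of a generic runner, which is effectively how the sourced PATH, PYTHONPATH, and license variables reach each command.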
Instantiation:
There are two primary ways to initialize the CarpRunner: by providing an explicit path or by using the auto-discovery class method.
1. Explicit Path (Recommended):
import logging
from pycemrg.system import CommandRunner, CarpRunner
# A generic CommandRunner is required
runner = CommandRunner()
# Instantiate CarpRunner with the path to the installation's config.sh
carp_runner = CarpRunner(
    runner=runner,
    carp_config_path="/path/to/your/carpentry_bundle/config.sh"
)
2. Auto-Discovery:
from pycemrg.system import CommandRunner, CarpRunner
runner = CommandRunner()
# Use the classmethod to find the config file in common locations
config_path = CarpRunner.find_installation()
if config_path:
    carp_runner = CarpRunner(runner=runner, carp_config_path=config_path)
else:
    raise RuntimeError("Could not automatically locate CARPentry installation.")
Methods & Properties:
run: Executes a command within the fully configured CARPentry environment.
- Signature: (cmd: Sequence[Union[str, Path]], expected_outputs: Optional[Sequence[Path]] = None, cwd: Optional[Path] = None, ignore_errors: Optional[Sequence[str]] = None) -> str
- Args:
  - cmd (Sequence[str | Path]): Command to execute (e.g., ['openCARP', '+F', 'sim.par'], ['meshtool', 'extract', 'mesh']).
  - Other arguments are passed directly to the underlying CommandRunner.run() method.
- Returns:
str: The captured stdout from the command.
- Raises:
  - CommandExecutionError: If the command fails.
  - CarpEnvironmentError: If the CARPentry environment fails to load during initialization or reload.
  - FileNotFoundError: If expected outputs are missing after a successful run.
Example:
carp = CarpRunner(runner, carp_config_path="/opt/carpentry_bundle/config.sh")
# Run openCARP simulation
carp.run(
    cmd=['openCARP', '+F', 'experiment.par'],
    expected_outputs=[Path('experiment_vm.igb')],
    cwd=Path('/simulations/case_01')
)
# Run meshtool
carp.run(['meshtool', 'extract', 'surface', '-msh=heart', '-surf=epi'])
carp_env: A read-only property that returns the loaded CARPentry environment. The environment is lazy-loaded on first access and cached for efficiency.
- Type: property
- Returns:
Dict[str, str]: A dictionary of all environment variables sourced from config.sh.
Key Variables Sourced:
- PATH: Binaries for openCARP, meshtool, meshalyzer, etc.
- PYTHONPATH: carputils and related Python modules
- LD_LIBRARY_PATH: PETSc and other shared libraries
- CARPENTRY_LICENSE: License file location
- CARPUTILS_SETTINGS: carputils configuration file
- OPAL_PREFIX, OPAL_BINDIR, OPAL_LIBDIR: MPI settings
- VIRTUAL_ENV: Virtual environment paths (if created during installation)
Example:
env = carp.carp_env
print(f"CARPentry PATH: {env['PATH']}")
print(f"License file: {env['CARPENTRY_LICENSE']}")
installation_root: A read-only property that returns the root directory of the CARPentry installation.
- Type: property
- Returns:
pathlib.Path: The absolute path to the CARPentry installation directory (the parent directory of config.sh).
Example:
root = carp.installation_root
meshes_dir = root / "meshes"
examples_dir = root / "carp-examples"
reload_environment: Forces a reload of the CARPentry environment by re-sourcing config.sh.
- Signature: () -> None
Use Cases:
- The config.sh file has been modified externally
- The license file has been updated
- Debugging environment issues
Example:
import shutil

# Update license file
shutil.copy("new_license.bin", carp.get_license_path())
# Force reload to pick up changes
carp.reload_environment()
get_carp_path: Gets a path relative to the CARPentry installation directory.
- Signature: (relative_path: str = "") -> Path
- Args:
  - relative_path (str): Path relative to the installation root (default: "").
- Returns:
pathlib.Path: Absolute path to the requested location.
Example:
bin_dir = carp.get_carp_path("bin")
petsc_lib = carp.get_carp_path("petsc/lib")
example_mesh = carp.get_carp_path("meshes/torso/torso")
validate_command_exists: Checks if a specific command (e.g., openCARP, meshtool) is available in the sourced environment's PATH.
- Signature: (command: str) -> bool
- Args:
  - command (str): The name of the executable to check. Common commands include:
    - openCARP: Main cardiac simulation solver
    - meshtool: Mesh manipulation tool
    - cusummary: CARPutils summary tool
    - meshalyzer: Visualization tool
    - bench: Benchmarking tool
- Returns:
bool: True if the command is found and executable, False otherwise.
Example:
# Validate required tools before a workflow
if not carp.validate_command_exists('openCARP'):
    raise RuntimeError("openCARP not found in CARPentry installation")
# Check optional tools
if carp.validate_command_exists('meshalyzer'):
    print("Visualization tools available")
get_carputils_settings_path: Gets the path to the carputils settings.yaml file.
- Signature: () -> Optional[Path]
- Returns:
Optional[pathlib.Path]: Path to the settings file if the CARPUTILS_SETTINGS environment variable is set, None otherwise.
Example:
import yaml

settings_path = carp.get_carputils_settings_path()
if settings_path and settings_path.exists():
    with open(settings_path) as f:
        config = yaml.safe_load(f)
get_license_path: Gets the path to the CARPentry license file.
- Signature: () -> Optional[Path]
- Returns:
Optional[pathlib.Path]: Path to license.bin if the CARPENTRY_LICENSE environment variable is set, None otherwise.
Example:
license_path = carp.get_license_path()
if license_path and license_path.exists():
    print(f"License found at: {license_path}")
else:
    raise RuntimeError("CARPentry license not configured")
find_installation: Attempts to locate a CARPentry installation by searching for config.sh in common locations.
- Signature: (search_paths: Optional[Sequence[Path]] = None) -> Optional[Path]
- Type: classmethod
- Args:
  - search_paths (Optional[Sequence[Path]]): A list of directories to search. If None, uses default common locations:
    - ~/carpentry_bundle
    - ~/CARPentry
    - ~/opencarp
    - /opt/carpentry_bundle
    - /opt/CARPentry
    - /usr/local/carpentry_bundle
- Returns:
Optional[pathlib.Path]: The path to the first config.sh file found, or None if not found.
Example:
# Auto-discover with defaults
config_path = CarpRunner.find_installation()
# Search custom locations
custom_paths = [
    Path("/data/software/carpentry"),
    Path("/shared/tools/opencarp")
]
config_path = CarpRunner.find_installation(custom_paths)
Associated Exception:
CarpEnvironmentError: A custom exception raised by CarpRunner if it fails to source or validate the CARPentry environment from the config.sh file. This can happen if:
- The file is corrupted or incomplete
- The sourcing command fails
- Required environment variables are missing after sourcing
Subclass of RuntimeError.
Example:
from pycemrg.system import CarpRunner, CarpEnvironmentError
try:
    carp = CarpRunner(runner, carp_config_path="broken_config.sh")
except CarpEnvironmentError as e:
    print(f"Failed to load CARPentry environment: {e}")
    # Fall back to an alternative installation or fail gracefully
For interactive use, the library provides a CLI to perform scaffolding operations.
Command: pycemrg
Sub-commands:
init-models: Creates a models.yaml template.
Usage:
pycemrg init-models --output config/models.yaml --force
Options:
- --output, -o PATH: Specify the output path (default: ./models.yaml)
- --force: Overwrite if the file exists
init-labels: Creates a labels.yaml template.
Usage:
pycemrg init-labels \
    --output config/labels.yaml \
    --num-labels 10 \
    --num-groups 3 \
    --force
Options:
- --output, -o PATH: Specify the output path (default: ./labels.yaml)
- --num-labels INT: Number of placeholder labels (default: 3)
- --num-groups INT: Number of placeholder groups (default: 1)
- --force: Overwrite if the file exists
When working with data from multiple sources (e.g., clinical segmentation, research atlas, simulation mesh), use LabelMapper to create explicit translation layers:
from pycemrg.data import LabelManager, LabelMapper
# Define three standards
clinical_mgr = LabelManager("clinical_labels.yaml") # Hospital PACS labels
atlas_mgr = LabelManager("atlas_labels.yaml") # Research atlas
sim_mgr = LabelManager("simulation_labels.yaml") # openCARP mesh tags
# Create mappers for each transition
clinical_to_atlas = LabelMapper(clinical_mgr, atlas_mgr)
atlas_to_sim = LabelMapper(atlas_mgr, sim_mgr)
# Orchestrator workflow:
# 1. Load clinical segmentation
seg = load_nifti("patient_seg.nii.gz")
# 2. Translate to atlas standard
atlas_mapping = clinical_to_atlas.get_source_to_target_mapping()
for old_val, new_val in atlas_mapping.items():
    seg[seg == old_val] = new_val
# 3. Further translate to simulation standard
sim_mapping = atlas_to_sim.get_source_to_target_mapping()
for old_val, new_val in sim_mapping.items():
    seg[seg == old_val] = new_val
# 4. Save mesh with correct tags
save_mesh("heart_mesh", seg)
Rationale: Explicit mapping layers prevent "tag confusion" bugs and make the data provenance traceable.
Use OutputManager to enforce consistent naming across all generated files:
from pycemrg.files import OutputManager
from pathlib import Path
def run_segmentation_pipeline(case_id: str, input_image: Path, output_dir: Path):
    # Setup output management
    mgr = OutputManager(output_dir=output_dir, output_prefix=case_id)
    # All outputs share the same prefix
    raw_seg_path = mgr.get_path("_raw_segmentation.nii.gz")
    smooth_seg_path = mgr.get_path("_smooth_segmentation.nii.gz")
    mesh_path = mgr.get_path("_heart_mesh.vtk")
    fiber_path = mgr.get_path("_fibers.lon")
    # Execute workflow steps
    segment_image(input_image, output=raw_seg_path)
    smooth_segmentation(raw_seg_path, output=smooth_seg_path)
    generate_mesh(smooth_seg_path, output=mesh_path)
    generate_fibers(mesh_path, output=fiber_path)
    # All files follow pattern: {case_id}_*.{ext}
    # e.g., patient_042_raw_segmentation.nii.gz
Rationale: Centralizing path generation prevents typos, ensures consistency, and simplifies batch processing.
When processing multiple cases, distinguish between retryable failures and fatal errors:
from pycemrg.system import CommandRunner, CommandExecutionError
import logging
logger = logging.getLogger(__name__)
runner = CommandRunner(logger=logger)
failed_cases = []
retryable_cases = []
for case in case_list:
    try:
        runner.run(
            cmd=['process_case', case.input_path],
            expected_outputs=[case.output_path]
        )
    except CommandExecutionError as e:
        # Check for known retryable errors
        if "out of memory" in e.stderr.lower():
            logger.warning(f"Case {case.id} failed due to memory. Queueing for retry.")
            retryable_cases.append(case)
        elif "cuda" in e.stderr.lower():
            logger.error(f"Case {case.id} failed due to GPU error. Skipping.")
            failed_cases.append((case, e))
        else:
            # Unknown error - fail fast
            raise
    except FileNotFoundError as e:
        # Tool ran but produced no output - likely a data issue
        logger.error(f"Case {case.id} produced no output: {e}")
        failed_cases.append((case, e))

# Retry with more resources
for case in retryable_cases:
    runner.run(
        cmd=['process_case', '--memory-limit', '32G', case.input_path],
        expected_outputs=[case.output_path]
    )
Rationale: Structured exception handling enables robust batch workflows with intelligent retry logic.
Important: Most pycemrg components are not thread-safe:
- ModelManager: Cache writes are not atomic. Use a single instance per process or protect with locks.
- LabelManager / LabelMapper: Read-only after initialization, safe for concurrent access.
- CommandRunner / CarpRunner: Each thread should have its own instance to avoid log interleaving.
- OutputManager: Path generation is safe, but filesystem operations (creating directories) are not coordinated.
Safe Pattern for Parallel Processing:
from concurrent.futures import ProcessPoolExecutor
from pycemrg.data import LabelManager
from pycemrg.system import CommandRunner
def process_case(case_id: str, labels_config: Path):
    # Each process gets its own instances
    runner = CommandRunner()
    label_mgr = LabelManager(labels_config)
    # ... processing logic ...

# Use a process pool, not threads
with ProcessPoolExecutor(max_workers=4) as executor:
    futures = [
        executor.submit(process_case, case_id, labels_config)
        for case_id in case_list
    ]
Rationale: Process-based parallelism avoids GIL contention and shared-state bugs.
The pycemrg suite follows these architectural principles:
- Radical Separation of Concerns: Libraries provide stateless logic; orchestrators handle I/O and persistence.
- Contract-Driven Architecture: Complex workflows use dataclass contracts to pass data between layers.
- Explicit Dependency Injection: Components receive dependencies at initialization (no globals, no singletons).
- Never Derive Paths: Libraries accept explicit path contracts; they never construct or assume file structures.
- Semantic Mapping for Domain Flexibility: Generic logic accepts semantic maps to decouple algorithms from user-specific schemas.
- Tool Wrappers, Not Monoliths: External tools get thin, focused wrappers exposing Pythonic APIs.
- No Premature Abstraction: Duplication is preferred over the wrong abstraction.
For detailed architectural guidelines, see pycemrg_suite_guidelines.txt.
# Configuration
from pycemrg.files import ConfigScaffolder, OutputManager
# Data Management
from pycemrg.data import LabelManager, LabelMapper
from pycemrg.models import ModelManager
# System Execution
from pycemrg.system import CommandRunner, CarpRunner
from pycemrg.system import CommandExecutionError, CarpEnvironmentError
# Logging
from pycemrg.core import setup_logging
import logging
from pathlib import Path
from pycemrg.core import setup_logging
from pycemrg.files import OutputManager
from pycemrg.data import LabelManager
from pycemrg.system import CommandRunner
# 1. Setup
setup_logging(log_level=logging.INFO, log_file=Path("pipeline.log"))
logger = logging.getLogger(__name__)
# 2. Initialize managers
output_mgr = OutputManager(output_dir=Path("results"), output_prefix="case_01")
label_mgr = LabelManager(config_path=Path("config/labels.yaml"))
runner = CommandRunner(logger=logger)
# 3. Define paths explicitly
input_path = Path("data/input.nii.gz")
seg_path = output_mgr.get_path("_segmentation.nii.gz")
mesh_path = output_mgr.get_path("_mesh.vtk")
# 4. Execute workflow with validation
runner.run(
    cmd=['segment', str(input_path), str(seg_path)],
    expected_outputs=[seg_path]
)
tags = label_mgr.get_tags_string(["myocardium"])
runner.run(
    cmd=['generate_mesh', str(seg_path), str(mesh_path), '--tags', tags],
    expected_outputs=[mesh_path]
)
logger.info("Pipeline completed successfully")
# Validate all inputs upfront with clear error messages
required = [("input", input_path), ("config", config_path)]
missing = [(name, p) for name, p in required if not p.exists()]
if missing:
    raise FileNotFoundError(
        "Missing files:\n" + "\n".join(f"  {n}: {p}" for n, p in missing)
    )
# Context manager for debugging workflows
import tempfile, shutil
from pathlib import Path
class DebugTemp:
    def __init__(self, prefix="debug"):
        self.dir = None
        self.prefix = prefix

    def __enter__(self):
        self.dir = tempfile.mkdtemp(prefix=f"{self.prefix}_")
        return Path(self.dir)

    def __exit__(self, exc_type, *_):
        if exc_type is None:
            shutil.rmtree(self.dir)
        else:
            print(f"Debug files: {self.dir}")
        return False
# Always copy os.environ before modifying
import os

env = os.environ.copy()
env['CUDA_VISIBLE_DEVICES'] = '0'
runner.run(cmd, env=env)
# Never modify os.environ directly (affects the entire process)
# Example: Retry with exponentially reduced batch size
for batch_size in [32, 16, 8, 4]:
    try:
        runner.run(['train', f'--batch-size={batch_size}'])
        break
    except CommandExecutionError as e:
        if "out of memory" not in e.stderr.lower():
            raise
        logger.warning(f"OOM at batch_size={batch_size}, retrying smaller...")