Note: this issue will be part of a series describing limitations and questions that arose while implementing CARMIN within CBRAIN.
CARMIN has a very simple data management model: everything is just files and directories that are stored server-side under an abstract 'root' chosen by the people who deployed the CARMIN server.
Launching pipelines on these files implies providing their paths as arguments to the pipeline's parameters. It is not clear whether the paths are expected to be relative to the server's data root (e.g. some/stuff/file.txt) or absolute (e.g. /mnt/nfs/data1/carmin_data/bradley/some/stuff/file.txt). In the latter case, how does the CARMIN API user even get that path? In the former case, how can the CARMIN API user be sure that some/stuff/file.txt will be an appropriate argument for any of the pipelines? Is it expected that all pipelines will run with their cwd set to the root directory of the CARMIN storage area?
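To make the two interpretations concrete, here is a minimal sketch of what a server would have to do if paths are meant to be relative to the data root. The `DATA_ROOT` value and the `resolve` helper are hypothetical and only illustrate the question; nothing here is taken from the CARMIN specification.

```python
import posixpath

# Hypothetical server-side data root; the real value is deployment-specific
# and is normally not visible to the CARMIN API user.
DATA_ROOT = "/mnt/nfs/data1/carmin_data/bradley"


def resolve(carmin_path: str, data_root: str = DATA_ROOT) -> str:
    """Map a root-relative CARMIN path to an absolute server-side path."""
    # Treat the path as relative to the root even if it starts with a slash.
    relative = posixpath.normpath(carmin_path).lstrip("/")
    # Normalize after joining so that ".." components are collapsed.
    absolute = posixpath.normpath(posixpath.join(data_root, relative))
    # Refuse paths that would escape the data root.
    if absolute != data_root and not absolute.startswith(data_root + "/"):
        raise ValueError(f"path escapes the data root: {carmin_path!r}")
    return absolute


print(resolve("some/stuff/file.txt"))
```

Under this reading, a pipeline would either need its cwd set to the data root or receive the resolved absolute path; the API user would only ever see the relative form.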
In CBRAIN, data files are registered and given a unique numerical ID. I'm not going to go into the details of our framework, but the basic idea is that data files don't have a fixed path. The path is determined at the moment the pipeline starts, because the pipeline can be executed on any number of remote servers that have distinct file system configurations (think supercomputer clusters). A CBRAIN pipeline (task) asks for a file to be 'synchronized' by ID, which brings a copy of the file to the remote server, and then its local path is provided to the pipeline.
So right now to use a CARMIN pipeline in CBRAIN, one has to:
1. upload a file to a PATH with CARMIN
2. find the associated ID using CBRAIN's interface
3. launch the pipeline providing the ID in the parameters, instead of the path
Steps 1 and 3 work in CARMIN, but step 2 doesn't have any CARMIN API equivalent.
What we would need in CARMIN is an extension to an existing call:
- Extension to GET /PATH: the JSON record should contain an entry for a platform-specific ID associated with the path
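As a rough sketch of how a client could use such an extension, the snippet below picks the value to pass as a pipeline parameter from a path record. The field name `platformSpecificId` is purely hypothetical (it is the entry being proposed here), and the other keys in the sample record are assumptions about what a path record looks like rather than a statement of the specification.

```python
from typing import Any, Dict

# Hypothetical name for the proposed entry; CARMIN does not currently define it.
PLATFORM_ID_FIELD = "platformSpecificId"


def input_value_for(path_record: Dict[str, Any]) -> str:
    """Return the value to pass as a pipeline parameter for this path.

    Prefer the platform-specific ID when the server provides one,
    and fall back to the path itself otherwise.
    """
    platform_id = path_record.get(PLATFORM_ID_FIELD)
    if platform_id is not None:
        return str(platform_id)
    return path_record["platformPath"]


# Sample record as a server with the extension might return it (assumed shape).
record = {
    "platformPath": "some/stuff/file.txt",
    "isDirectory": False,
    "platformSpecificId": 12345,
}
print(input_value_for(record))
```

With this, step 2 of the workflow above would no longer require going through CBRAIN's own interface: the ID comes back from the same call that describes the path.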
A more generic solution that other implementers would probably like (but that CBRAIN doesn't need) would be:
PUT /executions/{executionIdentifier}/preparePath/some/stuff/file.txt which would tell the server side to prepare the path /some/stuff/file.txt specifically for the execution by task executionIdentifier.
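For illustration, a client could build the URL for this proposed call as follows. The `preparePath` route does not exist in CARMIN today; the `prepare_path_url` helper and the base URL are hypothetical, and the sketch only shows how the execution identifier and the path would be combined.

```python
from urllib.parse import quote


def prepare_path_url(base_url: str, execution_identifier: str, path: str) -> str:
    """Build the URL of the proposed (not yet existing) preparePath call."""
    # Encode the identifier fully; keep slashes in the path as separators.
    safe_id = quote(execution_identifier, safe="")
    safe_path = quote(path.lstrip("/"), safe="/")
    return f"{base_url.rstrip('/')}/executions/{safe_id}/preparePath/{safe_path}"


# Hypothetical server and execution identifier.
url = prepare_path_url("https://carmin.example.org", "exec-42", "/some/stuff/file.txt")
print("PUT", url)
```

The server's response to such a call could then carry whatever value the pipeline should actually receive for that file, which sidesteps the relative-versus-absolute question entirely.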