-
Notifications
You must be signed in to change notification settings - Fork 15
Validate CustomResourceConfig shape + GLT-backend compatibility #627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
kmontemayor2-sc
wants to merge
13
commits into
main
Choose a base branch
from
kmonte/custom-resource-config-pr3-validation
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Changes from all commits
Commits
Show all changes
13 commits
Select commit
Hold shift + click to select a range
2b69b8f
Add `CustomResourceConfig` proto for shell-command launchers
ddbafc2
Merge branch 'main' into kmonte/custom-resource-config-pr1-proto
kmontemayor2-sc 0425547
Skip machine-config validation for CustomResourceConfig
066c473
Add `launch_custom` subprocess dispatcher for CustomResourceConfig
5742480
Validate `CustomResourceConfig` shape + GLT-backend compatibility
b606b28
Merge branch 'main' into kmonte/custom-resource-config-pr1-proto
kmontemayor2-sc 844c434
Update docstring
kmonte 937276f
Merge branch 'kmonte/custom-resource-config-pr1-proto' of ssh://githu…
kmonte d0fb63f
Rename CustomResourceConfig -> CustomLauncherConfig
a3b8e82
Merge branch 'kmonte/custom-resource-config-pr1-proto' into kmonte/cu…
f23fb44
Apply CustomResourceConfig -> CustomLauncherConfig rename to launcher
4d583d4
Merge branch 'kmonte/custom-resource-config-pr2-launcher' into kmonte…
009a7af
Apply CustomResourceConfig -> CustomLauncherConfig rename to validators
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,96 @@ | ||
| """Subprocess dispatch for ``CustomLauncherConfig``-backed launchers. | ||
|
|
||
| Takes ``CustomLauncherConfig.command`` and ``CustomLauncherConfig.args`` | ||
| verbatim and shells out via ``subprocess.run(shell_line, shell=True)``. | ||
| The shell-style invocation honors leading ``KEY=VALUE`` env-var | ||
| assignments in ``command`` so callers can self-document required env | ||
| without forcing the dispatcher to parse env separately. | ||
|
|
||
| The receiving subprocess has no special protocol — it is expected to be | ||
| a plain CLI that argparses whatever flags the YAML wires up via | ||
| ``args[]``. The dispatcher performs no template substitution; any | ||
| dynamic content (runtime URIs, image refs, etc.) is the caller's | ||
| responsibility — typically resolved at YAML-load time before the | ||
| proto reaches this module. | ||
| """ | ||
|
|
||
| import shlex | ||
| import subprocess | ||
| from collections.abc import Mapping | ||
| from typing import Optional | ||
|
|
||
| from gigl.common import Uri | ||
| from gigl.common.logger import Logger | ||
| from gigl.src.common.constants.components import GiGLComponents | ||
| from snapchat.research.gbml.gigl_resource_config_pb2 import CustomLauncherConfig | ||
|
|
||
| logger = Logger() | ||
|
|
||
| _LAUNCHABLE_COMPONENTS: frozenset[GiGLComponents] = frozenset( | ||
| {GiGLComponents.Trainer, GiGLComponents.Inferencer} | ||
| ) | ||
|
|
||
|
|
||
| def launch_custom( | ||
| custom_launcher_config: CustomLauncherConfig, | ||
| applied_task_identifier: str, | ||
| task_config_uri: Uri, | ||
| resource_config_uri: Uri, | ||
| process_command: str, | ||
| process_runtime_args: Mapping[str, str], | ||
| cpu_docker_uri: Optional[str], | ||
| cuda_docker_uri: Optional[str], | ||
| component: GiGLComponents, | ||
| ) -> None: | ||
| """Shell out to ``custom_launcher_config.command`` with ``args[]`` appended. | ||
|
|
||
| Composes a shell line as ``command`` followed by each ``args[]`` | ||
| element passed through ``shlex.quote``, then invokes | ||
| ``subprocess.run(shell_line, shell=True, check=True)``. | ||
|
|
||
| The dispatcher takes ``command`` and ``args[]`` verbatim — no | ||
| template substitution of any kind. Any placeholder text in those | ||
| fields reaches ``subprocess.run`` literally; consumers that want | ||
| substitution should resolve it at YAML-load time before the proto | ||
| reaches this module. | ||
|
|
||
| ``applied_task_identifier``, ``task_config_uri``, | ||
| ``resource_config_uri``, ``process_command``, | ||
| ``process_runtime_args``, ``cpu_docker_uri``, and ``cuda_docker_uri`` | ||
| are accepted for API symmetry with the GLT-side Vertex AI launchers | ||
| but are intentionally not plumbed into the subprocess — the | ||
| receiving CLI is expected to source whatever context it needs from | ||
| the resource config it gets handed (or from env vars inherited from | ||
| the parent process). | ||
|
|
||
| Args: | ||
| custom_launcher_config: Proto whose ``command`` is the shell | ||
| snippet to execute and whose ``args`` are positional | ||
| arguments appended verbatim. | ||
| applied_task_identifier: Accepted for back-compat; ignored. | ||
| task_config_uri: Accepted for back-compat; ignored. | ||
| resource_config_uri: Accepted for back-compat; ignored. | ||
| process_command: Accepted for back-compat; ignored. | ||
| process_runtime_args: Accepted for back-compat; ignored. | ||
| cpu_docker_uri: Accepted for back-compat; ignored. | ||
| cuda_docker_uri: Accepted for back-compat; ignored. | ||
| component: Which GiGL component is being launched. Must be in | ||
| ``_LAUNCHABLE_COMPONENTS``. | ||
|
|
||
| Raises: | ||
| ValueError: If ``component`` is not Trainer or Inferencer, or if | ||
| ``custom_launcher_config.command`` is empty. | ||
| subprocess.CalledProcessError: If the spawned subprocess exits | ||
| non-zero. | ||
| """ | ||
| if component not in _LAUNCHABLE_COMPONENTS: | ||
| raise ValueError(f"Invalid component: {component}") | ||
| if not custom_launcher_config.command: | ||
| raise ValueError("CustomLauncherConfig.command must be set") | ||
|
|
||
| command: str = custom_launcher_config.command | ||
| args: list[str] = list(custom_launcher_config.args) | ||
|
|
||
| shell_line = " ".join([command, *(shlex.quote(a) for a in args)]) | ||
| logger.info(f"Launching {component.name} via subprocess: {shell_line!r}") | ||
| subprocess.run(shell_line, shell=True, check=True) | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semgrep identified an issue in your code:
The
subprocess.run()call usesshell=Trueon user-controlled command strings, allowing shell injection attacks that could execute arbitrary commands with the process's privileges.More details about this
The
subprocess.run()call executesshell_linewithshell=True, which spawns a shell process to interpret the command. This is dangerous because ifresolved_commandor any element inresolved_argscomes from untrusted input (e.g., user-provided configuration, external data), an attacker can inject arbitrary shell commands.For example, if an attacker controls
custom_resource_config.commandand sets it toid; rm -rf /, the shell will execute bothidand the destructiverm -rf /command. Even thoughshlex.quote()escapes individual arguments, it doesn't protect against injection in the command itself—only in the args list. An attacker who controls the command field can bypass this protection entirely.Exploit scenario:
custom_resource_config.command = "echo test; cat /etc/passwd #"shell_linebecomes something like"echo test; cat /etc/passwd #"subprocess.run()executes this withshell=True, the shell interprets the semicolon as a command separatorecho testandcat /etc/passwdexecute, leaking sensitive system filesThe shell inherits environment variables and settings from the parent process, which further expands the attack surface. Using
shell=Falsewould treat the entire string as a single command name rather than allowing shell metacharacters to be interpreted.To resolve this comment:
💡 Follow autofix suggestion
View step-by-step instructions
subprocess.run(shell_line, shell=True, check=True)with an invocation that does not useshell=True.subprocess.run([resolved_command] + resolved_args, check=True).shlex.quotewhen building the argument list, since quoting is only necessary when passing a string to the shell.resolved_commandand every item inresolved_argsare unquoted strings representing the command and its arguments.Passing the command and arguments as a list with
shell=False(the default) is safer, because it avoids any interpretation by a shell, preventing command injection vulnerabilities.💬 Ignore this finding
Reply with Semgrep commands to ignore this finding.
/fp <comment>for false positive/ar <comment>for acceptable risk/other <comment>for all other reasonsAlternatively, triage in Semgrep AppSec Platform to ignore the finding created by subprocess-shell-true.
You can view more details about this finding in the Semgrep AppSec Platform.