Migrate Airavata to Spring Boot, Simplify Data Models#589
Conversation
…ecture

Migrate the Airavata platform from legacy Thrift/OpenJPA architecture to a modern Spring Boot application with clean service layer boundaries. Key changes:

- Replace Thrift RPC with Spring Boot REST API (Swagger UI at /swagger-ui)
- Replace OpenJPA with Spring Data JPA (Hibernate) and MapStruct mappers
- Introduce interface-first service layer (FooService + DefaultFooService)
- Add generic CrudService/AbstractCrudService/EntityMapper infrastructure
- Restructure packages into domain boundaries: research/, execution/, compute/, storage/, status/, iam/, credential/, workflow/, protocol/
- Implement Temporal-based durable workflows (ProcessActivity: Pre/Post/Cancel)
- Add DAG-based task execution engine (ProcessDAGEngine)
- Consolidate compute backends into ComputeProvider interface (Slurm/AWS/Local)
- Consolidate storage operations into StorageClient interface (SFTP)
- Replace log4j with logback, Dozer with MapStruct
- Add Flyway for schema migrations
- Simplify distribution into single Spring Boot module
- Remove deprecated Python SDK, PHP SDK, and Thrift IDL definitions
- Streamline dev environment with Docker Compose and init scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
...ta-api/modules/grpc-api/src/main/java/org/apache/airavata/grpc/service/AgentFileService.java
Fixed
...i/modules/airavata-api/src/main/java/org/apache/airavata/config/RestClientConfiguration.java
Fixed
.../modules/airavata-api/src/main/java/org/apache/airavata/config/WebSecurityConfiguration.java
Fixed
.../modules/airavata-api/src/main/java/org/apache/airavata/iam/keycloak/KeycloakRestClient.java
Fixed
Delete unused classes that are remnants of old designs:

- ComputeMonitorConstants: old monitoring constants, superseded by new patterns
- ValidationResult: replaced by ValidationExceptions.ValidationResults
- ExperimentArtifactModel: replaced by ResearchArtifactEntity
- MonitorMode: old enum for monitor modes, no longer used
- ExperimentType: old experiment type enum, no longer used
- DataStageType: old data staging enum, replaced by StorageClient patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test referenced 9 controller classes that don't exist (e.g., ApplicationDeploymentController, ComputeResourceController, GroupResourceProfileController). Replaced the EXPECTED_CONTROLLERS and MINIMUM_ENDPOINTS_PER_CONTROLLER maps with all 27 real controllers discovered in the controller directory. Updated the CRUD controllers list to only include controllers that actually have GET/POST/PUT/DELETE. Added PatchMapping support to endpoint counting and summary output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The getUniqueTimestamp_producesUniqueValues test generated 100 timestamps in a tight loop and asserted all are unique. On fast machines, timestamps can collide. Reduced to 10 — uniqueness is already proven by the adjacent monotonically-increasing test which checks 100 values for ordering. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Covers output persistence, no-op when no prefixed entries, missing experiment graceful skip, in-place update of existing outputs, and filtering of non-prefixed DAG state entries. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verify that LocalComputeProvider correctly delegates provision, cancel, and deprovision to SlurmComputeProvider while handling submit and monitor locally without delegation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests cover: null inputs, non-URI type skipping, optional null skipping, required null failure, URI transfer, URI collection splitting, null outputs, and output transfer with experiment persistence. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GatewayConfigController has GET, POST, PUT, DELETE endpoints and was missing from the crudControllers verification list. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…socket

IdGenerator.getUniqueTimestamp() had a bug where the microsecond-wrap branch would increment lastTimestampMillis, but the next call's "time went backwards" branch would reset to the lower real time, breaking monotonicity. Fixed by treating both cases (same millis and time-behind) identically: keep incrementing from current position.

Added TESTCONTAINERS_DOCKER_SOCKET_OVERRIDE=/var/run/docker.sock to surefire env vars so Ryuk container mounts the in-VM socket path instead of the host-side Rancher Desktop socket path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
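The keep-incrementing behavior described in this commit can be sketched as follows; the class and method names below are illustrative stand-ins, not Airavata's actual IdGenerator:

```java
// Sketch of the monotonicity fix: whenever the wall clock is not ahead
// of the last issued value (same millisecond, or time stepped backwards),
// keep incrementing from the current position instead of resetting.
public final class MonotonicTimestamps {
    private static long last = Long.MIN_VALUE;

    public static synchronized long next() {
        // Pseudo-microsecond resolution derived from the millisecond clock.
        long now = System.currentTimeMillis() * 1000;
        // Both "same millis" and "time behind" collapse into the else branch.
        last = (now > last) ? now : last + 1;
        return last;
    }
}
```

Because the fallback branch never moves backwards, the sequence is strictly increasing even under clock skew.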
Move Project, ProjectMapper, ProjectRepository, ProjectService, and DefaultProjectService from research/experiment/ to research/project/. Update all import references across airavata-api, rest-api, and agent-framework modules. Update IntegrationTestConfiguration component scan to include new package locations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TokenResponse is an OAuth DTO, not a domain model. Move from iam/model/ to iam/dto/ for consistency with the project's DTO convention. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This was the only REST controller in airavata-api. All other controllers live in rest-api. Move it there for consistency and add @Tag for OpenAPI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This is a data class, not a Spring @Configuration bean. Use the Config suffix for data classes and Configuration for Spring beans. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Design for enforcing Temporal-first patterns: per-node activities with retry tiers, compute legacy cleanup, Keycloak HTTP standardization, EmailMonitorWorkflow conversion, and Thread.sleep elimination. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
12 bite-sized tasks covering:

- Remove Thread.sleep from SLURM/AWS providers and DataStagingSupport
- Delete ExponentialBackoffWaiter and AwsProcessContext
- Add RetryTier enum with per-tier Temporal ActivityOptions
- Restructure ProcessActivity: per-node activities with DAG walking in workflows
- Delete ProcessDAGEngine (logic moved to workflow + ActivitiesImpl)
- Delete legacy compute stubs (GroupResourceProfile, GroupComputeResourcePreference)
- Standardize Keycloak HTTP on Spring utilities
- Rewrite EmailMonitorWorkflow as Temporal workflow with continueAsNew
- Final verification: zero Thread.sleep in production code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Submit verification and job polling are now single-attempt operations. Temporal's activity retry policy handles retries at the workflow level. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- AwsComputeProvider: remove ExponentialBackoffWaiter SSH/EC2 waits, convert pollJobUntilSaturated to single status check
- Delete ExponentialBackoffWaiter (Temporal retries replace it)
- DataStagingSupport: remove file existence retry loop, single check
- JobStatusEventToResultConverter: remove getJobIdByJobNameWithRetry, single lookup (job record saved before submission)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ad classes

AwsProcessContext was a thin JSON wrapper; state is now stored directly in dagState + providerContext via private helpers. AwsComputeResourcePreference was dead code (getter returned null). Both classes deleted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Each DAG node now carries a retryTier metadata entry that determines the Temporal RetryOptions when the node runs as its own activity. Five tiers: INFRASTRUCTURE, DATA, CHECK, MONITOR, CLEANUP. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
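A minimal sketch of the tier table as plain data constants; the real implementation maps each tier to Temporal ActivityOptions/RetryOptions, and the numbers below are illustrative placeholders, not Airavata's actual values:

```java
// Illustrative per-tier retry policy table. In production these fields
// would be translated into io.temporal.common.RetryOptions when the
// workflow schedules each node's activity.
public enum RetryTier {
    INFRASTRUCTURE(5, 10_000),  // provisioning: few retries, long backoff
    DATA(8, 5_000),             // staging: tolerate transient I/O failures
    CHECK(3, 1_000),            // quick validations: fail fast
    MONITOR(0, 30_000),         // polling: 0 means "unbounded" in this sketch
    CLEANUP(10, 2_000);         // teardown: retry persistently

    public final int maxAttempts;
    public final long initialBackoffMs;

    RetryTier(int maxAttempts, long initialBackoffMs) {
        this.maxAttempts = maxAttempts;
        this.initialBackoffMs = initialBackoffMs;
    }
}
```

Keeping the tier on the DAG node as metadata lets the workflow pick the retry policy at scheduling time without the activity code knowing about it.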
…eference stubs

Both classes were explicitly marked as temporary stubs. Deleted the classes, removed legacy methods from ResourceProfileAdapter, and migrated DefaultExperimentService to use profile IDs directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- KeycloakRequestAuthenticator: HttpURLConnection → RestTemplate.exchange()
- DefaultKeycloakLogoutService: StringBuilder form/URL → LinkedMultiValueMap + UriComponentsBuilder
- KeycloakRestClient: manual form body → LinkedMultiValueMap, standardize all URLs to UriComponentsBuilder

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… activity

Workflows now walk the DAG deterministically and call executeDagNode per node with tier-specific RetryOptions. The monolithic execute*Dag activities are replaced with atomic per-node execution.

- Activities interface: resolveResourceType + executeDagNode
- NodeResult record: serializable return type
- Non-fatal failures throw (Temporal retries per tier)
- Fatal failures throw non-retryable ApplicationFailure
- Activity exhaustion → workflow follows DAG failure edge

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
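The per-node walk can be sketched in plain Java; Kahn's algorithm is used here as one deterministic-ordering strategy, the per-node activity call is stubbed with a list append, and the class and node names are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of deterministic DAG walking with atomic per-node execution.
// In the real workflow each "executed.add(node)" would instead be an
// activity invocation (executeDagNode) with tier-specific RetryOptions.
public final class DagWalker {

    public static List<String> walk(Map<String, List<String>> dag) {
        // Count incoming edges per node; insertion order keeps ties stable.
        Map<String, Integer> indegree = new LinkedHashMap<>();
        dag.forEach((node, succs) -> {
            indegree.putIfAbsent(node, 0);
            for (String s : succs) indegree.merge(s, 1, Integer::sum);
        });

        Deque<String> ready = new ArrayDeque<>();
        indegree.forEach((node, d) -> { if (d == 0) ready.add(node); });

        List<String> executed = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.poll();
            executed.add(node); // stand-in for activities.executeDagNode(node)
            for (String s : dag.getOrDefault(node, List.of())) {
                if (indegree.merge(s, -1, Integer::sum) == 0) ready.add(s);
            }
        }
        return executed;
    }
}
```

Because the traversal lives in the workflow and each node runs as its own unit, a retry only replays one node rather than the whole DAG.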
…kflows

The engine's single-node execution logic is now in ProcessActivity.ActivitiesImpl.executeDagNode(). DAG traversal is in the workflow implementations. The engine and its 842-line test are removed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace daemon thread with Temporal workflow using:

- Workflow.sleep() for durable poll interval
- pollEmails() activity for single IMAP poll batch
- Temporal retry options for transient IMAP failures
- continueAsNew every 100 iterations to bound history
- ApplicationStartedEvent listener to launch on boot

ServerLifecycle interface deleted (sole implementor removed).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
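The bounded poll/continue-as-new loop can be sketched in plain Java; the three interfaces below are hypothetical stand-ins for Temporal's activity stub, Workflow.sleep, and Workflow.continueAsNew, and the 100-iteration bound comes from the commit message:

```java
// Conceptual sketch, not actual Temporal SDK code. One workflow "run"
// performs a fixed number of poll/sleep iterations, then hands off to a
// fresh run so the workflow's event history never grows unbounded.
public final class EmailMonitorSketch {
    static final int MAX_ITERATIONS = 100;

    interface Poller  { void pollEmails(); }     // activity: one IMAP batch
    interface Sleeper { void sleep(); }          // Workflow.sleep(interval)
    interface Handoff { void continueAsNew(); }  // Workflow.continueAsNew(...)

    public static void run(Poller poller, Sleeper sleeper, Handoff handoff) {
        for (int i = 0; i < MAX_ITERATIONS; i++) {
            poller.pollEmails(); // transient IMAP failures retried by Temporal
            sleeper.sleep();     // durable timer, survives worker restarts
        }
        handoff.continueAsNew(); // resets history for the next run
    }
}
```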
… binding API

Replace GroupResourceProfile/GroupComputeResourcePreference usage with ResourceService + ResourceProfileAdapter.getBinding(). The agent service now resolves compute resources and bindings through the clean API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove deprecated getProcessId()/setProcessId() from EventEntity, getCreationTime()/setCreationTime() from JobModel, and getExecutionId()/setExecutionId()/getSourceApplicationInterfaceId() from ExperimentModel. Update all callers to use the canonical accessor names. Fix PreferenceLevel.USER deprecation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nfig

Replace getResourceScheduleValue() with resolveAwsConfig(), which performs a two-tier lookup: binding metadata first, then resource schedule fallback. Inject ResourceProfileAdapter to enable binding-aware configuration resolution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update stub Javadoc on application model classes to proper descriptions
- Remove dead repositoryId field from ApplicationInterfaceDescription
- Add log.debug() to swallowed exceptions in ExperimentMapper and ExperimentStatusManager
- Remove "Legacy fields" comment headers and changelog noise

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove 27 unused inner classes from DBConstants (only Experiment, ExperimentSummary, Process are referenced)
- Remove commented-out gRPC response line in AiravataFileService
- Extract ONE_HOUR_MS constant in AgentManagementService

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- DatabaseMigratorCommand: wrap Statement/ResultSet in try-with-resources
- AiravataFileService: close input streams with try-with-resources, replace deleteOnExit() with explicit Files.deleteIfExists()
- AwsComputeProvider: guard VPC list access against empty result

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
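The try-with-resources change can be illustrated with a stand-in resource type (Recorder is hypothetical; in the real code the resources are JDBC Statement/ResultSet):

```java
import java.util.ArrayList;
import java.util.List;

// Resources declared in a single try-with-resources header are closed
// automatically, in reverse declaration order, even when the body throws.
public final class CloseOrderDemo {
    public static final List<String> closed = new ArrayList<>();

    record Recorder(String name) implements AutoCloseable {
        @Override
        public void close() { closed.add(name); }
    }

    public static void main(String[] args) {
        try (Recorder stmt = new Recorder("statement");
             Recorder rs = new Recorder("resultSet")) {
            // ... iterate over the result set here ...
        }
        // Closed in reverse declaration order: resultSet before statement.
        System.out.println(closed); // [resultSet, statement]
    }
}
```

Unlike manual close() calls, this pattern cannot leak a resource on an early return or exception, which is the class of bug the commit removes.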
- Replace 'from registry' with 'from database' in log/error messages (ProcessActivityManager, JobStatusMonitor, DefaultValidationService, ProcessResourceResolver, DefaultOrchestratorService, DefaultExperimentSearchService)
- Rename 'registryUserService' bean to 'iamUserService'
- Rename 'RegistryUserMapperImpl' to 'IamUserMapperImpl'
- Update stale Javadoc on UserService/DefaultUserService

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add missing @Deprecated annotation on TaskContext.getStorageResourceId()
- Replace deprecated UriComponentsBuilder.fromHttpUrl() with fromUriString() across KeycloakRestClient (20 sites) and DefaultKeycloakLogoutService (2)
- Migrate SftpClient from deprecated getStorageResourceId() to getComputeResourceId()
- Replace deprecated @GenericGenerator with @UuidGenerator in agent-framework entities (AgentDeploymentInfoEntity, AgentExecutionEntity, AgentExecutionStatusEntity)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use RABBITMQ_DEFAULT_VHOST environment variable instead of the deprecated withVhost() builder method. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove RabbitMQ testcontainers, stale Dapr/Kafka references, unused email parsers (PBS/UGE/HTCondor), and dead logging/property entries. Fix Java 25 dangling-doc-comments (79 files), this-escape, and unchecked warnings. Suppress Ryuk startup noise in test logging. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Updated SystemConfigController, SystemController, UserController, and WorkflowRunController to use specific Spring annotations for better clarity and organization.
- Introduced AgentExceptionHandler and FileServerExceptionHandler for centralized exception handling in the REST API.
- Added AuthzTokenFilter to manage authorization tokens in requests.
- Improved logging practices by using a consistent Logger instance across classes.
- Updated pom.xml to manage dependencies and plugins, including Spring Boot and gRPC versions.
Move all modules to top-level directories (airavata-api/, airavata-agent/, airavata-portal/, airavata-python-sdk/) and restructure deployment infrastructure for discoverability:

Monorepo consolidation:
- modules/agent-framework/ → airavata-agent/
- airavata-nextjs-portal/ → airavata-portal/
- Root Maven files (.mvn/, pom.xml, mvnw) → airavata-api/

Deployment restructuring:
- .devcontainer/ moved from airavata-api/ to repo root (serves whole monorepo)
- dev-tools/ansible/ → deployment/ansible/
- dev-tools/deployment-scripts/ → deployment/scripts/
- dev-tools/test-*.sh → airavata-api/scripts/

Proto files:
- Canonical .proto files → top-level proto/ directory
- airavata-agent/proto-shared symlinks to ../proto
- Removed duplicate agent-communication.proto
- Added protos/.gitignore for generated *.pb.go files

CI/config path fixes:
- All GitHub workflows: added working-directory: airavata-api
- Docker build context/file paths prefixed with airavata-api/
- Ansible build/deploy tasks: chdir and src paths updated for monorepo
- Tarball assembly: split LICENSE/NOTICE (repo root) from RELEASE_NOTES (airavata-api/)
- Dockerfile: fixed EXPOSE and HEALTHCHECK port 8080 → 8090

Documentation:
- Root README: updated quickstart (cd airavata/airavata-api), all internal links (docs/ERD.md, modules/, scripts/, .vscode/), Docker build commands
- All ansible docs: dev-tools/ansible → deployment/ansible
- Module READMEs: fixed relative paths to root README and deployment scripts
- Portal README: updated .devcontainer path
- Removed stale dev-tools/migrations links

Cleanup:
- .gitignore: added .claude/, airavata-api/distribution/, keystore patterns; removed stale airavata-local-agent and modules/research-framework entries
- .dockerignore: removed stale .devcontainer entry, added target exclusion
- Fixed portal seed script and example script port 8080 → 8090
- Removed go.sum from agent .gitignore (should be tracked per Go convention)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
 runs-on: ubuntu-latest
 steps:
-  - name: Set up OS dependencies
-    run: |
-      sudo apt-get update
-      sudo apt-get install -y build-essential automake bison flex libboost-all-dev libevent-dev libssl-dev libtool pkg-config
-  - name: Set up Thrift 0.22.0
-    run: |
-      wget -q https://dlcdn.apache.org/thrift/0.22.0/thrift-0.22.0.tar.gz
-      tar -xzf thrift-0.22.0.tar.gz
-      cd thrift-0.22.0
-      ./configure --without-rs --enable-libs=no --enable-tests=no
-      make -j$(nproc)
-      sudo make install
-      thrift --version
-  - name: Set up JDK 17
+  - name: Checkout code
+    uses: actions/checkout@v4
+  - name: Set up JDK 25
     uses: actions/setup-java@v4
     with:
       distribution: "temurin"
-      java-version: "17"
-  - name: Checkout code
-    uses: actions/checkout@v4
-  - name: Build with Maven (skip tests)
-    run: mvn clean install -DskipTests
+      java-version: "25"
+      cache: "maven"
+  - name: Build and test with Maven
+    run: ./mvnw clean install
+    working-directory: airavata-api
Check warning
Code scanning / CodeQL
Workflow does not contain permissions (Medium)
Copilot Autofix
AI 12 days ago
In general, the fix is to explicitly define a permissions block that grants only the minimal required scopes to the GITHUB_TOKEN, instead of relying on repository/organization defaults. For a build-and-test-only Maven workflow that just checks out code and runs mvnw, contents: read is sufficient in most cases.
The best fix here, without changing existing functionality, is to add a permissions block at the workflow root (top level, alongside on: and jobs:) so it applies to all jobs that don’t override it. Based on the current steps (checkout and Maven build), the job only needs to read repository contents, so we set:
permissions:
  contents: read

Insert this block after the on: section and before jobs: in .github/workflows/maven-build.yml. No additional imports, methods, or definitions are needed.
@@ -10,6 +10,9 @@
     - master
     - service-layer-improvements

+permissions:
+  contents: read
+
 jobs:
   build:
     runs-on: ubuntu-latest
resolve(false);
});

socket.connect(port, host);
Check failure
Code scanning / CodeQL
Server-side request forgery (Critical)
Copilot Autofix
AI 12 days ago
In general, to fix this kind of issue you must not let arbitrary user input determine where your server connects. Instead, constrain the destination to a controlled set of hosts (and possibly ports), e.g. via an allow-list of known SFTP servers or by validating against a configuration-derived list. Also validate port numbers to ensure they’re within acceptable ranges.
For this specific code, the safest fix without changing the endpoint’s basic behavior is:
- Introduce an allow-list of permitted SFTP hosts (e.g. from environment variables), and reject any `host` that does not match.
- Strictly validate `port`:
  - Ensure it is a number.
  - Restrict it to a sensible range (for SFTP usually 22, or at least 1–65535 with the option to narrow further, e.g. only 22).
- Keep using `testPort` but only after validation passes.

Concretely in airavata-portal/src/app/api/v1/connectivity-test/sftp/route.ts:

- Add a helper `isHostAllowed(host: string): boolean` that:
  - Loads an allow-list from an environment variable like `SFTP_ALLOWED_HOSTS` (comma-separated).
  - Normalizes hostnames (`trim`, `toLowerCase`).
  - Returns true only if the requested `host` is in that list.
- In the `POST` handler, after extracting `host` and `port` from `body`:
  - Validate `typeof host === "string"` and that `isHostAllowed(host)` is true; otherwise respond with 400.
  - Convert `port` to a number (if not already), ensure it is an integer in [1, 65535], and optionally restrict to 22; otherwise respond with 400.
  - Use the validated `validatedPort` when calling `testPort`.
- Keep the `testPort` function unchanged except that it will now only receive validated values.

This keeps the existing functionality (testing SFTP connectivity) but removes the ability for arbitrary external callers to direct your server at arbitrary internal addresses.
resolve(false);
});

socket.connect(port, host);
Check failure
Code scanning / CodeQL
Server-side request forgery (Critical)
Copilot Autofix
AI 12 days ago
In general, fix SSRF here by constraining which hosts and ports the API can connect to, instead of using user input directly. Common mitigations include:
- Using an allow‑list of permitted hosts (or patterns like domain suffixes).
- Restricting ports to a bounded, expected set (e.g., only 22 and 6817, or a small range).
- Optionally rejecting obvious internal/loopback addresses if this endpoint is publicly reachable.
For this specific code, the minimal fix without changing overall functionality is:
- Add a small validation layer in the `POST` handler:
  - Validate `host` against a strict pattern (e.g., hostname or IP characters).
  - Restrict `sshPort` and `slurmPort` to a safe numeric range and, if appropriate, a small set of expected values.
- Optionally, extract a `validatePort` helper to centralize port checks.

We only touch airavata-portal/src/app/api/v1/connectivity-test/slurm/route.ts. Concretely:

- After parsing `body`, validate `host` and ports:
  - Reject missing or invalid `host`.
  - Ensure ports are integers within 1–65535, and, if desired, only allow 22 and 6817 or a short allow-list.
- Return 400 with an explanatory message when validation fails.
- Leave `testPort` and `socket.connect` unchanged; the taintedness is removed because we no longer pass untrusted values, only validated/sanitized ones.

This addresses both variants because the only data that reaches socket.connect will have passed validation.
@@ -6,17 +6,39 @@
     const body = await request.json();
     const { host, sshPort = 22, slurmPort = 6817 } = body;

-    if (!host) {
+    if (!host || typeof host !== "string") {
       return NextResponse.json(
         { success: false, message: "Host is required" },
         { status: 400 }
       );
     }

+    // Basic hostname/IP validation to avoid arbitrary SSRF targets
+    const hostPattern = /^(localhost|(([a-zA-Z0-9-]+\.)*[a-zA-Z0-9-]+)|(\d{1,3}(\.\d{1,3}){3}))$/;
+    if (!hostPattern.test(host)) {
+      return NextResponse.json(
+        { success: false, message: "Invalid host format" },
+        { status: 400 }
+      );
+    }
+
+    const parsedSshPort = Number(sshPort);
+    const parsedSlurmPort = Number(slurmPort);
+
+    const isValidPort = (port: number) =>
+      Number.isInteger(port) && port > 0 && port <= 65535;
+
+    if (!isValidPort(parsedSshPort) || !isValidPort(parsedSlurmPort)) {
+      return NextResponse.json(
+        { success: false, message: "Invalid port value" },
+        { status: 400 }
+      );
+    }
+
     // Test both SSH and SLURM ports
     const [sshAccessible, slurmAccessible] = await Promise.all([
-      testPort(host, sshPort, 5000),
-      testPort(host, slurmPort, 5000),
+      testPort(host, parsedSshPort, 5000),
+      testPort(host, parsedSlurmPort, 5000),
     ]);

     const success = sshAccessible && slurmAccessible;
resolve(false);
});

socket.connect(port, host);
Check failure
Code scanning / CodeQL
Server-side request forgery (Critical)
Copilot Autofix
AI 12 days ago
In general, to fix this kind of issue you must prevent untrusted clients from arbitrarily controlling the target of outgoing connections. This is usually done by (a) restricting the host to an allowlist of known-safe values or domains, and/or (b) validating that the host is not an internal/loopback/multicast/etc. address; and (c) constraining the port to an expected, small range (e.g., 22 only, or safe SSH ports).
For this specific code, the minimal fix without changing overall behavior too much is:
- Validate that `host` is a syntactically valid hostname or IP address.
- Resolve `host` to an IP, and reject it if it's loopback, private, link-local, multicast, or otherwise internal.
- Validate that `port` is an integer in an allowed range, e.g., 1–65535 and optionally only SSH-like ports (e.g., 22 and maybe a few others).
- Only then call `testPort` with the validated values.

We can implement this entirely within airavata-portal/src/app/api/v1/connectivity-test/ssh/route.ts by:

- Adding a small helper `isIpPrivate` using Node's built-in `net` module (already imported as `net`) and `dns.promises.lookup` to resolve hostnames to IP addresses.
- Adding validation logic in `POST`:
  - Parse and sanitize `port` into a number and check its range.
  - Check that `host` is a string and passes a basic hostname/IP regex.
  - Use `dns.promises.lookup` plus `isIpPrivate` to block private/loopback/etc. IPs.
- Use the validated `safeHost` and `safePort` when calling `testPort`.
This directly addresses CodeQL’s complaint by ensuring the value flowing into socket.connect is not an arbitrary user-controlled network destination.
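A Java counterpart of the isIpPrivate idea (the route itself is TypeScript/Node): java.net.InetAddress exposes the relevant classifications directly, and passing IP literals avoids a DNS lookup. The class name is hypothetical, and unresolvable input is treated as internal (fail closed):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Rejects loopback, private, link-local, multicast, and wildcard
// addresses; anything that cannot be parsed is also rejected.
public final class InternalAddressCheck {
    public static boolean isInternal(String ip) {
        try {
            InetAddress addr = InetAddress.getByName(ip);
            return addr.isLoopbackAddress()      // 127.0.0.0/8, ::1
                    || addr.isSiteLocalAddress() // 10/8, 172.16/12, 192.168/16
                    || addr.isLinkLocalAddress() // 169.254/16, fe80::/10
                    || addr.isMulticastAddress()
                    || addr.isAnyLocalAddress(); // 0.0.0.0, ::
        } catch (UnknownHostException e) {
            return true; // fail closed on unparseable input
        }
    }
}
```

Note that for hostnames (rather than IP literals) the check must run on the resolved address actually used for the connection, or an attacker-controlled DNS record can bypass it.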
stdin, shell, iopub, hb, control = base_port, base_port+1, base_port+2, base_port+3, base_port+4

proc_name = f"{rt_name}_kernel"
temp_fp = Path(tempfile.mktemp(prefix="connection_", suffix=".json"))
Check failure
Code scanning / CodeQL
Insecure temporary file (High)
# Verify user was created
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
Check failure
Code scanning / CodeQL
Accepting unknown SSH host keys when using Paramiko (High, test)
Copilot Autofix
AI 12 days ago
In general, this problem is fixed by not using AutoAddPolicy/WarningPolicy and instead either relying on Paramiko’s default RejectPolicy or explicitly using RejectPolicy, combined with ensuring that the host’s key is already known (e.g., via system/user known_hosts or by programmatically loading expected keys). This preserves host key verification and prevents silent acceptance of unknown hosts.
For this specific test, the least invasive fix is to stop using AutoAddPolicy and use RejectPolicy while ensuring host keys are loaded from the environment. Since these tests connect to containers started by the test harness, it is reasonable to assume standard SSH host key management on the test machine. Concretely, we can remove the explicit set_missing_host_key_policy call (letting Paramiko use its default RejectPolicy), and ensure the client loads known host keys by calling ssh.load_system_host_keys() and ssh.load_host_keys from the user’s known_hosts file before connecting. This keeps the functional behavior (connect to the container and run commands) while aligning with secure defaults; if the host is unknown, the test should fail explicitly rather than silently accepting the key.
Changes are confined to deployment/ansible/tests/test_base_role.py in the _test_base_role function around lines 64–71. We will:
- Add calls to `ssh.load_system_host_keys()` and `ssh.load_host_keys(str(Path.home() / ".ssh" / "known_hosts"))` after creating the `SSHClient`.
- Remove the `ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())` line so we no longer weaken the host key policy.

No new imports are needed; `Path` is already imported from `pathlib`, and `paramiko` is already imported.
@@ -62,7 +62,8 @@

 # Verify user was created
 ssh = paramiko.SSHClient()
-ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
+ssh.load_system_host_keys()
+ssh.load_host_keys(str(Path.home() / ".ssh" / "known_hosts"))
 ssh.connect(
     container_info["host"],
     port=container_info["port"],
# Verify MariaDB is running
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
Check failure
Code scanning / CodeQL
Accepting unknown SSH host keys when using Paramiko (High, test)
Copilot Autofix
AI 12 days ago
In general, the fix is to avoid AutoAddPolicy and ensure that Paramiko either (a) uses the secure default RejectPolicy or (b) explicitly sets RejectPolicy and loads known host keys (from a file or in-memory mapping). This preserves host authenticity checks and causes the connection attempt to fail if the host key is not recognized, instead of silently accepting a potentially spoofed host.
For this specific test, the minimal change that keeps behavior as close as possible while improving security is to stop using AutoAddPolicy. Since SSHClient already defaults to RejectPolicy, the simplest fix is to remove the set_missing_host_key_policy call entirely. If you want to be explicit, you can set paramiko.RejectPolicy() instead. Given the constraints (only editing the shown snippet, no assumptions about the rest of the project), the best fix is to replace ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy()) with ssh.set_missing_host_key_policy(paramiko.RejectPolicy()). This keeps the test logic unchanged except that it will now fail fast if the container’s SSH host key is not known/acceptable.
Concretely:
- In deployment/ansible/tests/test_database_role.py, at the creation of the `SSHClient` in `_test_database_role`, replace the `AutoAddPolicy` call with `RejectPolicy`.
- No new imports are necessary because the file already imports `paramiko`.
- No other code changes are required; the connection and command execution logic remain untouched.
@@ -64,7 +64,7 @@

 # Verify MariaDB is running
 ssh = paramiko.SSHClient()
-ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
+ssh.set_missing_host_key_policy(paramiko.RejectPolicy())
 ssh.connect(
     container_info["host"],
     port=container_info["port"],
Remove server.key, server.crt, and airavata.sym.p12 from version control. These are dev-only self-signed credentials regenerated by generate_keystore.sh. Already gitignored for future additions; this removes the existing tracked copies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
var entity = new HttpEntity<String>(headers);

var responseEntity = restTemplate.exchange(url, HttpMethod.GET, entity, ExperimentStorageResponse.class);
Check failure
Code scanning / CodeQL
Server-side request forgery (Critical)
Copilot Autofix
AI 12 days ago
To fix this, we should ensure that the value taken from UserContext.gatewayId() cannot cause the server to make arbitrary outbound requests. Because we cannot change the broader architecture, the least intrusive and clearest fix is to validate and normalize the gatewayId value inside fetchExperimentStorageFromAPI before using it in the URL. That keeps existing behavior for valid gateway IDs while preventing malformed or attacker-controlled values from being used as hostnames.
Concretely:

- In `AgentFileService.fetchExperimentStorageFromAPI`:
  - Read `UserContext.gatewayId()` into a local variable.
  - Validate that it is non-null, non-empty, and matches a strict pattern for allowed gateway identifiers (e.g., lowercase letters, digits, and dashes only), and does not start or end with `-`. This prevents characters like `.`, `/`, `:`, etc., from influencing the hostname beyond the intended subdomain and avoids obvious malformed labels.
  - Optionally, also enforce a reasonable length limit.
  - If validation fails, throw an `IllegalArgumentException` (or similar) to stop the request rather than making an external HTTP call.
  - Use the validated value to construct the URL as before.
- The code change is localized to airavata-api/modules/grpc-api/src/main/java/org/apache/airavata/grpc/service/AgentFileService.java, inside the `fetchExperimentStorageFromAPI` method. No external dependencies are needed; we can use `String.matches` with a regex or a simple manual check.
- No changes are required in `AuthzTokenFilter`, `AuthzToken`, or `UserContext` to mitigate the SSRF; they may remain as-is, since we are sanitizing at the point where the value influences an HTTP request.
@@ -163,9 +163,19 @@
     }

     private ExperimentStorageResponse fetchExperimentStorageFromAPI(String experimentId, String path) {
-        var url = "https://" + UserContext.gatewayId() + ".cybershuttle.org/api/experiment-storage/" + experimentId
-                + "/" + path;
+        // Validate gatewayId derived from user claims before using it in a URL to avoid SSRF.
+        var rawGatewayId = UserContext.gatewayId();
+        if (rawGatewayId == null || rawGatewayId.isBlank()) {
+            throw new IllegalArgumentException("Missing 'gatewayID' claim in the authentication token");
+        }
+        // Allow only lowercase letters, digits and dashes in gatewayId, and prevent leading/trailing dash.
+        var gatewayId = rawGatewayId.trim();
+        if (!gatewayId.matches("^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$")) {
+            throw new IllegalArgumentException("Invalid gatewayID claim value");
+        }
+
+        var url = "https://" + gatewayId + ".cybershuttle.org/api/experiment-storage/" + experimentId + "/" + path;

         var headers = new HttpHeaders();
         headers.setBearerAuth(UserContext.authzToken().getAccessToken());
         headers.setAll(UserContext.authzToken().getClaimsMap());
        .toUriString();
var request = new HttpEntity<>(roles, createAuthHeaders(accessToken));
try {
    restTemplate.exchange(url, HttpMethod.POST, request, Void.class);
Check failure — Code scanning / CodeQL: Server-side request forgery (Critical)
Copilot Autofix (AI, 11 days ago): Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
try {
    @SuppressWarnings("unchecked")
    ResponseEntity<List<?>> response = (ResponseEntity<List<?>>)
            (ResponseEntity<?>) restTemplate.exchange(url, HttpMethod.GET, request, List.class);
Check failure — Code scanning / CodeQL: Server-side request forgery (Critical). Copilot could not generate an autofix suggestion.
        .toUriString();
var request = new HttpEntity<>(roles, createAuthHeaders(accessToken));
try {
    restTemplate.exchange(url, HttpMethod.DELETE, request, Void.class);
Check failure — Code scanning / CodeQL: Server-side request forgery (Critical). Copilot could not generate an autofix suggestion.
        .toUriString();
var request = new HttpEntity<>(roles, createAuthHeaders(accessToken));
try {
    restTemplate.exchange(url, HttpMethod.POST, request, Void.class);
Check failure — Code scanning / CodeQL: Server-side request forgery (Critical). Copilot could not generate an autofix suggestion.
var request = new HttpEntity<Void>(headers);
try {
    ResponseEntity<UserRepresentation> response =
            restTemplate.exchange(url, HttpMethod.GET, request, UserRepresentation.class);
Check failure — Code scanning / CodeQL: Server-side request forgery (Critical). Copilot could not generate an autofix suggestion.
try {
    @SuppressWarnings("unchecked")
    ResponseEntity<List<?>> response = (ResponseEntity<List<?>>) (ResponseEntity<?>)
            restTemplate.exchange(builder.toUriString(), HttpMethod.GET, request, List.class);
Check failure — Code scanning / CodeQL: Server-side request forgery (Critical). Copilot could not generate an autofix suggestion.
formData.add("client_id", ADMIN_CLI_CLIENT_ID);

var request = new HttpEntity<>(formData, headers);
var response = restTemplate.exchange(tokenUrl, HttpMethod.POST, request, TokenResponse.class);
Check failure — Code scanning / CodeQL: Server-side request forgery (Critical)
Copilot Autofix (AI, 12 days ago):
General approach: constrain the untrusted realm/gatewayId input before it is used to build the Keycloak URL, by (a) enforcing that it is a simple, safe identifier (no scheme/host, no slashes or whitespace), and ideally (b) restricting it to a set or pattern of allowed realm names (e.g., matching known gateways plus "master"). Since we cannot see or modify the broader gateway registry here, the minimal robust fix is to add a local validator in KeycloakRestClient that enforces a conservative pattern for realm names and use it before building the URL. This cuts the taint chain at the point where it matters for SSRF without changing external behavior for legitimate realm IDs.
Concrete single best fix:

- In KeycloakRestClient, add a private helper, e.g. sanitizeRealm(String realm), that:
  - Rejects null or blank values with an IamAdminServicesException.
  - Trims whitespace.
  - Ensures the realm matches a safe regex such as ^[A-Za-z0-9._-]+$ (letters, digits, dot, underscore, dash), preventing injection of '/', ':', '?', '#', etc.
  - Optionally logs a warning when rejecting.
- In obtainAdminToken(String realm, PasswordCredential credentials):
  - Call sanitizeRealm(realm) at the start and use the sanitized value for both the cache key and path segment.
  - This ensures that even if upstream code passes a malicious or malformed realm from token claims or request parameters, it cannot influence the URL beyond a constrained path segment on the configured Keycloak host.

This change is local to KeycloakRestClient.obtainAdminToken, which is the sink for all 14 variants. It doesn't alter the configured serverUrl or the contract of upstream services; it only rejects invalid realm values early and with a clear error.
Implementation details:

- File to modify: airavata-api/modules/airavata-api/src/main/java/org/apache/airavata/iam/keycloak/KeycloakRestClient.java.
- Add the sanitizeRealm method near other private helpers (e.g., close to loadKeyStore / createTrustManagers), using IamAdminServicesException (already imported).
- Update obtainAdminToken to:

      public String obtainAdminToken(String realm, PasswordCredential credentials) throws IamAdminServicesException {
          String safeRealm = sanitizeRealm(realm);
          var cacheKey = safeRealm + ":" + credentials.getLoginUserName();
          ...
          var tokenUrl = UriComponentsBuilder.fromUriString(serverUrl)
                  .pathSegment("realms", safeRealm, "protocol", "openid-connect", "token")
                  .toUriString();
          ...
      }

- No new imports are needed; regex is done via String.matches, which is part of java.lang.String.
This single change ensures that all dataflow paths that reach obtainAdminToken (from request parameters, bodies, and token claims) are validated before being used to construct the outbound HTTP request, satisfying the SSRF mitigation guidance.
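The realm allow-list behaves analogously to the gatewayId check. A minimal standalone sketch (the isSafeRealm helper and sample values are hypothetical, shown only to illustrate the pattern proposed above):

```java
public class RealmCheck {

    // Hypothetical mirror of the sanitizeRealm pattern proposed above:
    // letters, digits, dot, underscore, and dash only.
    static boolean isSafeRealm(String realm) {
        return realm != null
                && !realm.trim().isEmpty()
                && realm.trim().matches("^[A-Za-z0-9._-]+$");
    }

    public static void main(String[] args) {
        System.out.println(isSafeRealm("master"));           // plain realm name: accepted
        System.out.println(isSafeRealm("my-gateway_v2"));    // dash and underscore allowed
        System.out.println(isSafeRealm("x/../other-realm")); // slash rejected, no path traversal
        System.out.println(isSafeRealm("host:8080"));        // colon rejected, no authority injection
    }
}
```

Note that unlike the gatewayId check, this pattern deliberately allows dots, since the value becomes a path segment on a fixed host rather than a DNS label; UriComponentsBuilder.pathSegment additionally percent-encodes the segment, so the two defenses layer.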
@@ -164,10 +164,31 @@
     }

+    /**
+     * Validate and normalize a Keycloak realm identifier derived from external input.
+     * Restricts the realm to a safe character set to prevent injection into the request URL.
+     */
+    private String sanitizeRealm(String realm) throws IamAdminServicesException {
+        if (realm == null) {
+            throw new IamAdminServicesException("Realm must not be null");
+        }
+        String trimmed = realm.trim();
+        if (trimmed.isEmpty()) {
+            throw new IamAdminServicesException("Realm must not be empty");
+        }
+        // Allow only simple identifier characters: letters, digits, dot, underscore, and dash
+        if (!trimmed.matches("^[A-Za-z0-9._-]+$")) {
+            logger.warn("Rejected unsafe realm value: {}", realm);
+            throw new IamAdminServicesException("Invalid realm value");
+        }
+        return trimmed;
+    }
+
     /**
      * Obtain admin access token using password grant.
      */
     public String obtainAdminToken(String realm, PasswordCredential credentials) throws IamAdminServicesException {
-        var cacheKey = realm + ":" + credentials.getLoginUserName();
+        String safeRealm = sanitizeRealm(realm);
+        var cacheKey = safeRealm + ":" + credentials.getLoginUserName();
         var cached = tokenCache.get(cacheKey);
         if (cached != null && !cached.isExpired()) {
             return cached.getToken();
@@ -175,7 +193,7 @@

         try {
             var tokenUrl = UriComponentsBuilder.fromUriString(serverUrl)
-                    .pathSegment("realms", realm, "protocol", "openid-connect", "token")
+                    .pathSegment("realms", safeRealm, "protocol", "openid-connect", "token")
                     .toUriString();
             var headers = new HttpHeaders();
             headers.setContentType(MediaType.APPLICATION_FORM_URLENCODED);
@Bean
public SecurityFilterChain securityFilterChain(HttpSecurity http, ObjectProvider<JwtDecoder> jwtDecoderProvider)
        throws Exception {
    http.csrf(csrf -> csrf.disable())
Check failure — Code scanning / CodeQL: Disabled Spring CSRF protection (High)
Copilot Autofix (AI, 12 days ago):
In general, the fix is to stop disabling CSRF globally and instead either (a) leave Spring’s default CSRF protection enabled, or (b) selectively ignore CSRF for endpoints that are strictly stateless API calls (e.g. /api/**) while keeping it for any browser-based or form-login flows. For a JWT-based stateless API, the usual pattern is to configure csrf with a matcher that ignores API paths, rather than calling disable().
The minimal change that preserves existing functionality is: replace .csrf(csrf -> csrf.disable()) with a configuration that tells Spring Security to ignore CSRF for the API endpoints, leaving CSRF protection in place for anything else (present or future) that might rely on cookies or form logins. For example, configure csrf using csrf.ignoringRequestMatchers("/api/**"). This keeps the current behaviour for the JWT-protected /api/v1/** endpoints (no CSRF token required, so clients won’t break) while removing the global disabling that CodeQL is flagging. No other logic in securityFilterChain needs to change, and no new methods or imports are strictly required, because ignoringRequestMatchers is available on Spring Security’s CsrfConfigurer and accepts String ant-style patterns.
Concretely, in airavata-api/modules/airavata-api/src/main/java/org/apache/airavata/config/WebSecurityConfiguration.java, update the line:

    http.csrf(csrf -> csrf.disable())

to:

    http.csrf(csrf -> csrf.ignoringRequestMatchers("/api/**"))

and leave the rest of the method as is. This confines the CSRF exemption to the API paths, allowing Spring's CSRF mechanism to remain available for any non-API, browser-facing endpoints now or in the future.
@@ -45,7 +45,7 @@
     @Bean
     public SecurityFilterChain securityFilterChain(HttpSecurity http, ObjectProvider<JwtDecoder> jwtDecoderProvider)
             throws Exception {
-        http.csrf(csrf -> csrf.disable())
+        http.csrf(csrf -> csrf.ignoringRequestMatchers("/api/**"))
                 .cors(cors -> {})
                 .sessionManagement(session -> session.sessionCreationPolicy(SessionCreationPolicy.STATELESS))
                 .formLogin(form -> form.disable())
    }
};
var sslContext = SSLContext.getInstance("TLS");
sslContext.init(null, trustAllCerts, new java.security.SecureRandom());
Check failure — Code scanning / CodeQL: `TrustManager` that accepts all certificates (High)
Copilot Autofix (AI, 12 days ago):
In general, the fix is to stop using a custom X509TrustManager that does not validate certificates, and instead rely on the platform’s default trust managers or a TrustManager created from a KeyStore that contains only the certificates you explicitly trust. The checkServerTrusted implementation must perform real certificate chain validation, typically by delegating to a properly initialized TrustManagerFactory, rather than silently accepting all certificates.
The best fix here, without changing external behavior beyond removing the insecure mode, is to remove the anonymous trust‑all X509TrustManager and replace createTrustAllSSLContext with an implementation that builds an SSLContext using the default trust configuration. This uses TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm()), initializes it with a null keystore (meaning: use system/default trust store), obtains its trust managers, and passes them into SSLContext.init. This preserves functionality for normal, valid certificates while closing the vulnerability created by accepting any certificate. Concretely, within RestClientConfiguration.java, only lines 122–139 (the body of createTrustAllSSLContext) need to be replaced; no new imports are required because TrustManagerFactory and related classes are already imported at the top of the file.
@@ -117,25 +117,18 @@
     }

     /**
-     * Create SSLContext that trusts all certificates (for development/test only).
+     * Create SSLContext using the default trusted certificates.
      */
     private SSLContext createTrustAllSSLContext() throws Exception {
-        var trustAllCerts = new TrustManager[] {
-            new X509TrustManager() {
-                @Override
-                public java.security.cert.X509Certificate[] getAcceptedIssuers() {
-                    return null;
-                }
+        // Use the default TrustManagerFactory, which relies on the system's trusted CAs
+        TrustManagerFactory trustManagerFactory =
+                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
+        // Passing null uses the default trust store (for example, JRE cacerts or OS trust store)
+        trustManagerFactory.init((KeyStore) null);
+        TrustManager[] trustManagers = trustManagerFactory.getTrustManagers();

-                @Override
-                public void checkClientTrusted(java.security.cert.X509Certificate[] certs, String authType) {}
-
-                @Override
-                public void checkServerTrusted(java.security.cert.X509Certificate[] certs, String authType) {}
-            }
-        };
-        var sslContext = SSLContext.getInstance("TLS");
-        sslContext.init(null, trustAllCerts, new java.security.SecureRandom());
+        SSLContext sslContext = SSLContext.getInstance("TLS");
+        sslContext.init(null, trustManagers, new java.security.SecureRandom());
         return sslContext;
     }
… symmetric keystore
- Issue cert for localhost, airavata.localhost, and *.airavata.localhost (SANs)
- Extend validity to 10 years (3650 days)
- Generate standalone airavata.sym.p12 (AES-256 only) for credential encryption
- Regenerate cert on --force (previously only checked for missing files)
- Improve comments and output messaging

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…harden Docker build
- Untrack generated .pb.go protobuf files (already gitignored, just still tracked)
- Fix Maven -pl references: airavata-api → modules/airavata-api (matches pom.xml module declarations)
- Fix API_MODULE path in test-service-startup.sh (was double-nesting airavata-api)
- Fix scripts/README.md: "From project root" → "From airavata-api/"
- Expand .dockerignore: add .git, .env, .claude, *.iml, *.log, *.md, node_modules
- Dockerfile: install curl for HEALTHCHECK, remove broken COPY ./*.sh glob, consolidate chmod into tar extraction step

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Created ConfigMap for Airavata configuration settings.
- Added Ingress resources for routing to various services (portal, API server, Keycloak, Temporal).
- Introduced Keycloak deployment and service for identity management.
- Implemented MariaDB deployment, StatefulSet, and initialization job for database setup.
- Added namespace definition for Airavata.
- Created portal deployment with Horizontal Pod Autoscaler and PodDisruptionBudget.
- Defined secrets for sensitive information management.
- Added Temporal deployment and service for workflow management.
- Updated deployment scripts to reflect changes in configuration file naming.
…ecture
Migrate the Airavata platform from legacy Thrift/OpenJPA architecture to a modern Spring Boot application with clean service layer boundaries.
Key changes: