83 changes: 83 additions & 0 deletions AGENTS.md
@@ -0,0 +1,83 @@
# AI Agent Guide: Psoxy AWS Example Repository

## What is This Repository?

This is a **Terraform template repository** for deploying the [Worklytics Pseudonymizing Proxy (Psoxy)](https://github.com/Worklytics/psoxy) on **Amazon Web Services (AWS)**.

Psoxy is a serverless, pseudonymizing Data Loss Prevention (DLP) layer that sits between Worklytics and your organization's data sources (SaaS APIs, cloud storage, etc.). It replaces PII with hash tokens, enabling analysis on anonymized data while enforcing access controls and compliance requirements.

## Purpose

This example repository provides:
- **Pre-configured Terraform modules** that reference the main Psoxy repository
- **Example configurations** for common data sources (Google Workspace, Microsoft 365, Slack, GitHub, etc.)
- **Helper scripts** for initialization, prerequisite checking, and testing
- **Infrastructure-as-code** templates ready for customization

## Key Relationships

- **Main Repository**: [https://github.com/Worklytics/psoxy](https://github.com/Worklytics/psoxy)
  - Contains the core Psoxy Java implementation
  - Provides Terraform modules used by this example
  - Houses documentation and development resources

- **Documentation**: [https://docs.worklytics.co/psoxy](https://docs.worklytics.co/psoxy)
  - Comprehensive deployment guides
  - Configuration reference
  - Troubleshooting and best practices
  - Data source-specific documentation

- **This Example**: A template that customers clone and customize for their AWS deployment

## How This Repository Works

1. **Template Structure**: Customers use this as a GitHub template or clone it to create their own deployment repository
2. **Terraform Modules**: References modules from the main Psoxy repo via Git URLs (e.g., `git::https://github.com/worklytics/psoxy//infra/modules/...`)
3. **Version Pinning**: Each release of this example references a specific version tag of the main Psoxy repository
4. **Customization**: Customers modify `terraform.tfvars` and Terraform files to match their environment and data sources

## Common Tasks for AI Agents

### Understanding the Deployment

- **Read the main README.md** in this repository for human-facing setup instructions
- **Review terraform.tfvars** to understand configuration variables
- **Examine main.tf** to see which modules are being used
- **Run the `available-connectors` script** to see supported data sources

### Helping Users Deploy

1. **Prerequisites**: Guide users to run `./check-prereqs` and install missing tools
2. **Authentication**: Help configure AWS CLI, GCloud CLI (for Google Workspace), or Azure CLI (for Microsoft 365)
3. **Initialization**: Run `./init` to generate `terraform.tfvars` from prompts
4. **Customization**: Help users modify Terraform files to enable/disable data sources
5. **Deployment**: Guide through `terraform plan` and `terraform apply`
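
The flow above can be sketched as a small dry-run wrapper. This is only an illustration, not part of the repository: `run_step` and `deploy_flow` are hypothetical helper names, and the saved-plan file name `tfplan` is an assumption. Step 4 (customizing the Terraform files) is manual, so it does not appear. Whether `./init` already runs `terraform init` is not stated here; if Terraform reports an uninitialized working directory, run `terraform init` first.

```shell
# DRY_RUN=1 (the default in this sketch) only prints each command;
# set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command, and run it only when DRY_RUN is 0.
run_step() {
  printf '+ %s\n' "$*"
  if [[ "$DRY_RUN" == "0" ]]; then
    "$@"
  fi
}

# Hypothetical helper: the deployment steps in order, stopping at the first failure.
deploy_flow() {
  run_step ./check-prereqs || return 1              # 1. prerequisites
  run_step aws sts get-caller-identity || return 1  # 2. confirm the AWS identity in use
  run_step ./init || return 1                       # 3. generate terraform.tfvars
  run_step terraform plan -out=tfplan || return 1   # 5. write a plan to review
  run_step terraform apply tfplan                   #    apply exactly the reviewed plan
}
```

Calling `deploy_flow` with the default `DRY_RUN=1` prints the five commands without touching any infrastructure, which is a safe way to show a user what will run before they commit to it.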

### Troubleshooting

- **Reference the main docs**: [https://docs.worklytics.co/psoxy](https://docs.worklytics.co/psoxy)
- **Check AWS-specific docs**: [https://docs.worklytics.co/psoxy/aws/getting-started](https://docs.worklytics.co/psoxy/aws/getting-started)
- **Review Terraform state** and error messages
- **Validate module versions** match the referenced Psoxy release
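
One way to validate module versions is to list every `?ref=` value used in the Psoxy module sources and confirm there is only one. The sketch below assumes the module sources follow the `git::https://github.com/worklytics/psoxy//infra/modules/...?ref=<tag>` form described earlier and that the `.tf` files sit in a single directory; `list_psoxy_refs` and `refs_are_consistent` are hypothetical names, not scripts shipped with this repository.

```shell
# Print the distinct ?ref= values used by Psoxy module sources in a directory.
list_psoxy_refs() {
  local dir="${1:-.}"
  # -h: omit file names; -o: print only the matched text; -E: extended regex.
  grep -hoE 'github\.com/[Ww]orklytics/psoxy//[^"]*\?ref=[^"&]+' "$dir"/*.tf 2>/dev/null \
    | sed 's/.*[?]ref=//' \
    | sort -u
}

# Succeed only if every Psoxy module reference pins the same single ref.
refs_are_consistent() {
  local count
  count="$(list_psoxy_refs "${1:-.}" | wc -l | tr -d ' ')"
  [[ "$count" == "1" ]]
}
```

For example, `list_psoxy_refs .` run from the root of a deployment repository prints one line per distinct tag; more than one line suggests a partially upgraded configuration.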

### Code Navigation

- **Terraform files** (`.tf`) define the infrastructure
- **Helper scripts** (`init`, `check-prereqs`, `available-connectors`) assist with setup
- **Module references** point to the main Psoxy repository at specific version tags
- **Example configurations** show how to enable various data source connectors

## Important Notes

- This is a **template repository** - users should create their own copy, not commit directly to this repo
- **Version compatibility**: The Terraform modules reference specific Psoxy release tags
- **AWS-specific**: This example is for AWS deployments; see `psoxy-example-gcp` for Google Cloud Platform
- **Security**: Users must configure authentication credentials and IAM permissions appropriately
- **Data sources**: Not all connectors are enabled by default; users customize based on their needs

## Getting More Help

- **Documentation**: [https://docs.worklytics.co/psoxy](https://docs.worklytics.co/psoxy)
- **Main Repository Issues**: [https://github.com/Worklytics/psoxy/issues](https://github.com/Worklytics/psoxy/issues)
- **Support**: [sales@worklytics.co](mailto:sales@worklytics.co)

74 changes: 67 additions & 7 deletions check-prereqs
@@ -7,26 +7,50 @@ printf "See https://github.com/Worklytics/psoxy#prerequisites for more informati

HOMEBREW_AVAILABLE=`brew -v &> /dev/null`

CI_MODE=false
for arg in "$@"; do
if [[ "$arg" == "--ci" ]] || [[ "$arg" == "--non-interactive" ]]; then
CI_MODE=true
fi
done

# Source centralized color scheme
source "$(dirname "$0")/set-term-colorscheme.sh"

if ! git --version &> /dev/null ; then
printf "${ERR}Git not installed.${NC} Not entirely sure how you got here without it, but to install see https://git-scm.com/book/en/v2/Getting-Started-Installing-Git\n"
if $HOMEBREW_AVAILABLE; then printf " or, as you have Homebrew available, run ${CODE}brew install git${NC}\n"; fi
exit 1
if [[ "$CI_MODE" != "true" ]]; then
exit 1
fi
fi

if ! terraform -v &> /dev/null ; then
printf "${ERR}Terraform CLI not available.${NC} Psoxy examples / deployment scripts require it. See ${CODE}https://developer.hashicorp.com/terraform/downloads${NC} for installation options\n"
exit 1
if [[ "$CI_MODE" != "true" ]]; then
exit 1
fi
else
TF_VERSION_FULL=$(terraform -version | head -n 1)
TF_VERSION_MAJOR_MINOR=$(echo "$TF_VERSION_FULL" | sed -n 's/^Terraform v\([0-9]*\.[0-9]*\).*$/\1/p')
TF_MAJOR=$(echo "$TF_VERSION_MAJOR_MINOR" | cut -d. -f1)
TF_MINOR=$(echo "$TF_VERSION_MAJOR_MINOR" | cut -d. -f2)
if (( TF_MAJOR < 1 || (TF_MAJOR == 1 && TF_MINOR < 7) )); then
printf "${ERR}This Terraform version appears to be unsupported.${NC} Psoxy requires a supported version of Terraform 1.7 or later.\n"
printf "We recommend you upgrade. See https://developer.hashicorp.com/terraform/downloads\n"
else
printf "Your Terraform version is ${CODE}${TF_VERSION_FULL}${NC}.\n"
fi
fi

# Check Maven installation

if ! mvn -v &> /dev/null ; then
printf "${WARN}Maven not installed.${NC} It is REQUIRED unless you will use a pre-built JAR. To install, see https://maven.apache.org/install.html\n"
printf "${WARN}Maven not installed.${NC} It is REQUIRED unless you will use a pre-built JAR.\n"
printf " Note: Java JDK and Maven are only needed if building and bundling the java from source.\n"
printf " To install Maven, see https://maven.apache.org/install.html\n"
if $HOMEBREW_AVAILABLE; then printf " or, as you have Homebrew available, run ${CODE}brew install maven${NC}\n"; fi
printf " (Using a prebuilt jar requires adding ${CODE}deployment_bundle=""${NC} to your ${CODE}terraform.tfvars${NC} file, and filling with s3/gcs uri for your desired JAR)\n"
printf " (Using a prebuilt jar requires adding ${CODE}deployment_bundle=""${NC} to your ${CODE}terraform.tfvars${NC} file, and filling with s3/gcs uri for your desired JAR. The JRE of your host platform (AWS/GCP) will still be used at runtime).\n"
else
MVN_VERSION=`mvn -v | grep "Apache Maven"`
MVN_VERSION_MAJOR_MINOR=$(echo $MVN_VERSION | sed -n 's/^Apache Maven \([0-9]*\.[0-9]*\).*$/\1/p')
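
The Terraform version gate added earlier in this hunk (parse the first line of `terraform -version`, then require 1.7 or later) can be isolated into a function for testing. This is a sketch under that reading of the diff; `tf_version_supported` is a hypothetical name, and treating an unparseable version string as unsupported is a choice made here, not something the script states.

```shell
# Return 0 if the given first line of `terraform -version` is 1.7 or later.
tf_version_supported() {
  local full="$1" major_minor major minor
  # Same extraction as the script: keep only MAJOR.MINOR after "Terraform v".
  major_minor="$(printf '%s\n' "$full" | sed -n 's/^Terraform v\([0-9]*\.[0-9]*\).*$/\1/p')"
  # Explicitly reject anything that did not parse as MAJOR.MINOR.
  [[ "$major_minor" =~ ^[0-9]+\.[0-9]+$ ]] || return 1
  major="${major_minor%%.*}"
  minor="${major_minor#*.}"
  # 10# forces base 10 so a leading zero is never read as octal.
  (( 10#$major > 1 || (10#$major == 1 && 10#$minor >= 7) ))
}
```

Usage: `tf_version_supported "$(terraform -version | head -n 1)" || echo "upgrade Terraform"`.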
@@ -49,9 +73,9 @@ else

printf "Your Maven installation uses ${CODE}${JAVA_VERSION}${NC}.\n"

if [[ "$JAVA_VERSION_MAJOR" != 17 && "$JAVA_VERSION_MAJOR" != 21 && "$JAVA_VERSION_MAJOR" != 23 && "$JAVA_VERSION_MAJOR" != 24 ]]; then
printf "${ERR}This Java version appears to be unsupported. You should upgrade it, or may have compile errors.${NC} Psoxy requires an Oracle-supported version of Java 17 or later; as of April 2025, this includes Java 17, 21, or 24. See https://maven.apache.org/install.html\n"
if $HOMEBREW_AVAILABLE; then printf "or as you have Homebrew available, run ${CODE}brew install openjdk@17${NC}\n"; fi
if [[ "$JAVA_VERSION_MAJOR" != 21 && "$JAVA_VERSION_MAJOR" != 25 && "$JAVA_VERSION_MAJOR" != 26 ]]; then
Member

@jlorper jlorper May 12, 2026

I don't think we should support non LTS versions (26), then why not the ones between 21 and 25?
if [[ "$JAVA_VERSION_MAJOR" -lt 21 ]]; then

Member Author

22-24 are unsupported generally ... not sure we should; but yeah, i guess reasonable to just not force people to upgrade; will add back in next version.

Member Author

as i recall, 24 has issues w some versions of maven though ...

Member Author

and this is warning, not blocker.

printf "${ERR}This Java version appears to be unsupported. You should upgrade it, or may have compile errors.${NC} Psoxy requires an Oracle-supported version of Java 21 or later; as of March 2026, this includes Java 21, 25, and 26. See https://maven.apache.org/install.html\n"
if $HOMEBREW_AVAILABLE; then printf "or as you have Homebrew available, run ${CODE}brew install openjdk@21${NC}\n"; fi
printf "If you have an alternative JDK installed, then you must update your ${CODE}JAVA_HOME${NC} environment variable to point to it.\n"
fi
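
The review thread in this hunk contrasts two ways of writing the Java check: the allow-list in the new code (21, 25, 26) and the minimum-version test the reviewer proposes (`-lt 21`). The sketch below puts both side by side so the difference for versions 22 to 24 is easy to see. The function names are hypothetical, and nothing here indicates which form was ultimately merged.

```shell
# Allow-list form, as in the new version of the check: only 21, 25 and 26 pass.
java_major_in_allowlist() {
  case "$1" in
    21|25|26) return 0 ;;
    *) return 1 ;;
  esac
}

# Minimum-version form, as suggested in review: any numeric major >= 21 passes.
java_major_at_least_21() {
  [[ "$1" =~ ^[0-9]+$ ]] && (( 10#$1 >= 21 ))
}
```

With the allow-list, a user on Java 22, 23, or 24 sees the warning; with the minimum-version form they do not. Either way the message is a warning, since the branch does not exit.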

@@ -87,6 +111,30 @@ else
printf "AWS CLI version ${CODE}`aws --version`${NC} is installed.\n"
printf ""
printf "\t- make sure ${CODE}aws sts get-caller-identity${NC} returns the user/role/account you expect. $AWSCLI_REASON\n"

if aws sts get-caller-identity &> /dev/null; then
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text 2>/dev/null)
# the || true ensures that we fail silently even if set -e is on, and the 2>/dev/null handles standard error
AWS_CONCURRENCY=$(aws lambda get-account-settings --query 'AccountLimit.ConcurrentExecutions' --output text 2>/dev/null || true)
if [[ -n "$AWS_CONCURRENCY" && "$AWS_CONCURRENCY" =~ ^[0-9]+$ ]]; then
if (( AWS_CONCURRENCY < 1000 )); then
printf "\t- ${WARN}Warning: AWS Lambda account-level concurrency quota for account $AWS_ACCOUNT_ID is $AWS_CONCURRENCY, which is < 1000.${NC}\n"
printf "\t If this is the AWS account to which your lambda instances will be deployed, ensure that this amount is sufficient for your use case (we recommend at least 100).\n"
else
printf "\t- AWS Lambda account-level concurrency quota for account ${CODE}${AWS_ACCOUNT_ID}${NC} is ${CODE}${AWS_CONCURRENCY}${NC}.\n"
fi
fi

# Check for IAM Role quotas
AWS_IAM_ROLES_QUOTA=$(aws service-quotas get-service-quota --service-code iam --quota-code L-FE177D64 --query 'Quota.Value' --output text 2>/dev/null || true)
if [[ -n "$AWS_IAM_ROLES_QUOTA" && "$AWS_IAM_ROLES_QUOTA" =~ ^[0-9]+(\.[0-9]+)?$ ]]; then
AWS_IAM_ROLES_QUOTA=${AWS_IAM_ROLES_QUOTA%.*} # truncate decimals
printf "\t- AWS IAM Roles quota for account ${CODE}${AWS_ACCOUNT_ID}${NC} is ${CODE}${AWS_IAM_ROLES_QUOTA}${NC}.\n"
if (( AWS_IAM_ROLES_QUOTA < 1000 )); then
printf "\t ${WARN}Warning: you may need a higher limit if deploying many Psoxy instances.${NC}\n"
fi
fi
fi
fi

printf "\n"
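
Both quota checks in this hunk follow the same pattern: accept the CLI output only if it looks numeric, drop any decimal part, then compare against a threshold. A minimal sketch of that pattern, with hypothetical function names (`normalize_quota`, `quota_below`) that do not appear in the script:

```shell
# Print the integer part of a quota value; return 1 if it is not numeric.
normalize_quota() {
  local raw="$1"
  [[ "$raw" =~ ^[0-9]+(\.[0-9]+)?$ ]] || return 1
  # ${raw%.*} strips a trailing ".digits" and leaves plain integers unchanged.
  printf '%s\n' "${raw%.*}"
}

# Return 0 when the quota is numeric and strictly below the threshold in $2.
quota_below() {
  local value
  value="$(normalize_quota "$1")" || return 1
  (( 10#$value < 10#$2 ))
}
```

Because a non-numeric value makes `quota_below` return non-zero, a failed or empty CLI call produces no warning, which matches the script's fail-silent intent.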
@@ -99,6 +147,18 @@ if ! gcloud --version &> /dev/null ; then
else
printf "Google Cloud SDK version ${CODE}`gcloud --version 2> /dev/null | head -n 1`${NC} is installed.\n"
printf "\t- make sure ${CODE}gcloud auth list --filter=\"status:ACTIVE\"${NC} returns the account you expect. $GCLOUD_REASON\n"

if gcloud auth list --filter="status:ACTIVE" --format="value(account)" 2>/dev/null | grep -q '@'; then
GCP_PROJECT_ID=$(gcloud config get-value project 2>/dev/null || true)
if [[ -n "$GCP_PROJECT_ID" ]]; then
# Check Cloud Functions Quota
GCP_FUNCTIONS_QUOTA=$(gcloud compute project-info describe --project="$GCP_PROJECT_ID" --format="value(quotas.value)" --flatten="quotas[]" --filter="quotas.metric:CLOUD_FUNCTIONS_API_REQUESTS_PER_100_SECONDS" 2>/dev/null || true)
if [[ -n "$GCP_FUNCTIONS_QUOTA" && "$GCP_FUNCTIONS_QUOTA" =~ ^[0-9]+(\.[0-9]+)?$ ]]; then
GCP_FUNCTIONS_QUOTA=${GCP_FUNCTIONS_QUOTA%.*} # truncate decimals
printf "\t- GCP Cloud Functions (per 100s) quota for project ${CODE}${GCP_PROJECT_ID}${NC} is ${CODE}${GCP_FUNCTIONS_QUOTA}${NC}.\n"
fi
fi
fi
fi

printf "\n"
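
The `--ci` / `--non-interactive` handling this diff adds at the top of `check-prereqs` changes what a missing hard prerequisite (Git, Terraform) does: outside CI mode the script exits with status 1, while in CI mode it prints the message and keeps going. The sketch below restates that decision as two small functions; both names are hypothetical.

```shell
# Print "true" if any argument is --ci or --non-interactive, else "false".
parse_ci_mode() {
  local arg
  for arg in "$@"; do
    if [[ "$arg" == "--ci" || "$arg" == "--non-interactive" ]]; then
      printf 'true\n'
      return 0
    fi
  done
  printf 'false\n'
}

# Print what a missing hard prerequisite should do for the given CI mode:
# "continue" in CI mode, "exit" otherwise.
missing_prereq_action() {
  if [[ "$1" == "true" ]]; then
    printf 'continue\n'
  else
    printf 'exit\n'
  fi
}
```

In practice this means `./check-prereqs --ci` reports every problem in one pass, so a pipeline that needs a hard failure would have to inspect the output rather than rely on the exit status for those two checks.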
17 changes: 12 additions & 5 deletions google-workspace-variables.tf
@@ -27,7 +27,7 @@ variable "google_workspace_sa_to_impersonate" {

variable "google_workspace_terraform_principal_email" {
type = string
description = "Email of GCP principal that will be used to provision GCP resources via impersonation. Leave 'null' to use application default for you environment."
description = "Email of the principal (human user or service account) actively running Terraform. Used internally to grant this runner identity access to newly provisioned resources (like buckets/secrets). This is your 'true identity', distinct from `google_workspace_sa_to_impersonate` which is the identity Terraform assumes to create resources."
default = null

validation {
@@ -38,25 +38,25 @@ variable "google_workspace_terraform_principal_email" {

variable "google_workspace_example_user" {
type = string
description = "user to impersonate for Google Workspace API calls (null for none)"
description = "[DEPRECATED - use map instead] user to impersonate for Google Workspace API calls (null for none)"
default = null
}

variable "google_workspace_example_admin" {
type = string
description = "user to impersonate for Google Workspace API calls (null for value of `google_workspace_example_user`)"
description = "[DEPRECATED - use map instead] user to impersonate for Google Workspace API calls (null for value of `google_workspace_example_user`)"
default = null # will failover to user
}

variable "google_workspace_provision_keys" {
type = bool
description = "whether to provision key for each Google Workspace connector's GCP Service Account (OAuth Client). If false, you must create the key manually and provide it."
description = "[DEPRECATED - use map instead] whether to provision key for each Google Workspace connector's GCP Service Account (OAuth Client). If false, you must create the key manually and provide it."
default = true
}

variable "google_workspace_key_rotation_days" {
type = number
description = "rotation period for the GCP Service Account keys, in days; not applicable if provision_gcp_sa_keys is false"
description = "[DEPRECATED - use map instead] rotation period for the GCP Service Account keys, in days; not applicable if provision_gcp_sa_keys is false"
default = 60

validation {
@@ -76,3 +76,10 @@ locals {
? local.validate_google_workspace_gcp_project_id_message
: ""))
}


variable "google_workspace_connector_settings" {
type = map(any)
description = "Map of configuration settings specifically for Google Workspace connectors (e.g. example users). Note that provider-controlling parameters (like GCP project IDs or impersonation SAs) remain top-level variables."
default = {}
}
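
Four variables in this file are now marked `[DEPRECATED - use map instead]`. A quick way to see whether an existing deployment still sets any of them is to scan its `terraform.tfvars`. The sketch below assumes one assignment per line and ignores commented-out lines; `find_deprecated_gw_vars` is a hypothetical helper, and the keys expected inside `google_workspace_connector_settings` are not shown in this diff, so no migration target is suggested.

```shell
# Print deprecated Google Workspace variables still assigned in a tfvars file.
find_deprecated_gw_vars() {
  local tfvars="${1:-terraform.tfvars}"
  # Match only real assignments at the start of a line (comments are skipped).
  grep -oE '^[[:space:]]*google_workspace_(example_user|example_admin|provision_keys|key_rotation_days)[[:space:]]*=' "$tfvars" 2>/dev/null \
    | sed -E 's/^[[:space:]]*//; s/[[:space:]]*=$//'
}
```

An empty result means none of the deprecated variables are set in that file.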
7 changes: 5 additions & 2 deletions google-workspace.tf
@@ -2,19 +2,22 @@ provider "google" {
alias = "google_workspace"

project = var.google_workspace_gcp_project_id
impersonate_service_account = var.google_workspace_sa_to_impersonate != null ? var.google_workspace_sa_to_impersonate : var.google_workspace_terraform_sa_account_email # TODO: remove ternary in 0.6.x
impersonate_service_account = var.google_workspace_sa_to_impersonate
}


module "worklytics_connectors_google_workspace" {
source = "git::https://github.com/worklytics/psoxy//infra/modules/worklytics-connectors-google-workspace?ref=v0.5.18"
source = "git::https://github.com/worklytics/psoxy//infra/modules/worklytics-connectors-google-workspace?ref=v0.6.0"

google_workspace_connector_settings = var.google_workspace_connector_settings


providers = {
google = google.google_workspace
}

environment_id = var.environment_name
base_dir = var.psoxy_base_dir
enabled_connectors = var.enabled_connectors
gcp_project_id = var.google_workspace_gcp_project_id
tf_gcp_principal_email = var.google_workspace_terraform_principal_email
Expand Down
41 changes: 37 additions & 4 deletions init
@@ -1,5 +1,9 @@
#!/bin/bash

# Use a local Azure CLI config directory if present to avoid conflicts with other Azure tenants
if [ -d "${PWD}/.azure" ]; then
export AZURE_CONFIG_DIR="${PWD}/.azure"
fi
# Psoxy init script - lite version
#
# Usage:
@@ -57,12 +61,41 @@ if [[ -z "$EXPLICIT_REPO_CLONE_DIR" ]]; then
exit 1
fi
else
# append trailing slash if not present
if [[ "${EXPLICIT_REPO_CLONE_DIR}" != */ ]]; then
EXPLICIT_REPO_CLONE_DIR="${EXPLICIT_REPO_CLONE_DIR}/"
# Walk up from the given path to find the repo root (identified by tools/init-example-full.sh)
CANDIDATE="$EXPLICIT_REPO_CLONE_DIR"
# strip trailing slash for consistent dirname handling
CANDIDATE="${CANDIDATE%/}"
# normalize to an absolute path so dirname traversal always makes progress toward /
if [[ -d "$CANDIDATE" ]]; then
CANDIDATE="$(cd "$CANDIDATE" 2>/dev/null && pwd -P)"
else
CANDIDATE_PARENT="$(dirname "$CANDIDATE")"
CANDIDATE_BASENAME="$(basename "$CANDIDATE")"
CANDIDATE="$(cd "$CANDIDATE_PARENT" 2>/dev/null && printf "%s/%s" "$(pwd -P)" "$CANDIDATE_BASENAME")"
fi

FOUND_REPO_ROOT=""
while [[ -n "$CANDIDATE" ]] && [[ "$CANDIDATE" != "/" ]]; do
if [[ -f "${CANDIDATE}/tools/init-example-full.sh" ]]; then
FOUND_REPO_ROOT="$CANDIDATE"
break
fi
NEXT_CANDIDATE="$(dirname "$CANDIDATE")"
if [[ "$NEXT_CANDIDATE" == "$CANDIDATE" ]]; then
break
fi
CANDIDATE="$NEXT_CANDIDATE"
done

if [[ -z "$FOUND_REPO_ROOT" ]]; then
printf "${ERR}Could not find repo root (tools/init-example-full.sh) at or above: ${EXPLICIT_REPO_CLONE_DIR}${NC}\n"
printf "Pass the path to the root of a clone of https://github.com/Worklytics/psoxy as the first argument.\n"
printf " eg ${CODE}./init ~/code/psoxy${NC}\n"
exit 1
fi

REPO_CLONE_BASE_DIR="$EXPLICIT_REPO_CLONE_DIR"
# append trailing slash
REPO_CLONE_BASE_DIR="${FOUND_REPO_ROOT}/"
fi

# pass control to the full init script.
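
The repo-root search added above (resolve the path, then walk up with `dirname` until `tools/init-example-full.sh` is found) can be expressed as a reusable function. This is a simplified sketch, not the script's code: `find_repo_root` is a hypothetical name, it assumes the starting directory exists (the script also handles a non-existent leaf by resolving its parent), and unlike the script's loop it also checks `/` itself before giving up.

```shell
# Walk up from a starting directory until a marker file is found.
# Prints the directory containing the marker; returns 1 if none is found.
find_repo_root() {
  local marker="$2" dir next
  # Resolve to an absolute, symlink-free path so dirname always moves toward /.
  dir="$(cd "$1" 2>/dev/null && pwd -P)" || return 1
  while :; do
    if [[ -f "${dir}/${marker}" ]]; then
      printf '%s\n' "$dir"
      return 0
    fi
    next="$(dirname "$dir")"
    # dirname of / is /, so an unchanged value means the top was reached.
    [[ "$next" != "$dir" ]] || return 1
    dir="$next"
  done
}
```

Usage: `find_repo_root ~/code/psoxy/infra tools/init-example-full.sh` prints the clone root when the marker exists at or above the given directory.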