diff --git a/docs/1-whats-recce/cloud-vs-oss.md b/docs/1-whats-recce/cloud-vs-oss.md
new file mode 100644
index 00000000..4d6da7a1
--- /dev/null
+++ b/docs/1-whats-recce/cloud-vs-oss.md
@@ -0,0 +1,125 @@
+---
+title: Cloud vs Open Source
+---
+
+# Cloud vs Open Source
+
+Validating data changes manually takes time and slows PR review. Recce is a data validation agent: Open Source gives you the core validation engine to run yourself, while Cloud gives you the full Agent experience with automated validation on every PR.
+
+```mermaid
+flowchart LR
+ subgraph Cloud
+ direction LR
+ C1[You open PR] --> C2[Agent validates automatically]
+ C2 --> C3[Summary posted to PR]
+ end
+
+ subgraph OSS["Open Source"]
+ direction LR
+ O1[You open PR] --> O2[You run checks manually]
+ O2 --> O3[You copy results to PR]
+ end
+```
+
+## The Core Difference
+
+| | Cloud | Open Source |
+|--|-------|-------------|
+| **Experience** | Recce Agent works alongside you | You run validation manually |
+| **PR validation** | Agent validates automatically, posts summary | You run checks, copy results to PR |
+| **During development** | CLI + Agent assistance | CLI tools only |
+| **Learning curve** | Agent guides you through validation | Learn the tools, run them yourself |
+
+## Cloud
+
+Recce Cloud connects to your Git repository and data warehouse so the Recce Agent can validate your data changes automatically. When you open a PR, the Agent analyzes your changes, runs validation checks, and posts findings directly to your PR — no manual work required.
+
+**On pull requests:**
+
+The Agent runs automatically when you open a PR. It:
+
+- Analyzes your data model changes
+- Runs relevant validation checks
+- Posts a summary to your PR with findings
+- Updates as you push new commits
+
+**During development:**
+
+The Agent works with your CLI through [Recce MCP](/5-data-diffing/mcp-server/) (Model Context Protocol):
+
+- Answers questions about your changes
+- Suggests validation approaches
+- Helps interpret diff results
+
+**For your team:**
+
+- Define what "correct" means for your repo with preset checks that apply across all PRs
+- Share validation standards as institutional knowledge — everyone validates the same way
+- Developers and reviewers collaborate on validation, going back and forth until the change is verified
+
+**Pricing:**
+
+Recce Cloud is free to start. See [Pricing](https://www.reccehq.com/pricing) for plan details.
+
+**Choose Cloud when:**
+
+- You want automated validation on every PR
+- You want Agent assistance during development
+- Your team reviews data changes in PRs
+
+## Open Source
+
+Recce OSS is the core validation engine you run locally. You control when and how validation happens — run checks, explore results, and decide what to share. Everything stays on your machine unless you export it.
+
+You get:
+
+- Lineage Diff between branches
+- Data comparison (row count, schema, profile, value, top-k, histogram diff)
+- Query diff for custom validations
+- Checklist to track your checks
+
+**Choose OSS when:**
+
+- Exploring Recce before adopting Cloud
+- Working in environments without external connectivity
+- Contributing to Recce development
+
+## Feature Comparison
+
+| Feature | Cloud | OSS |
+|---------|-------|-----|
+| Lineage Diff | :white_check_mark: | :white_check_mark: |
+| Data diff (row count, schema, profile, value, top-k, histogram diff) | :white_check_mark: | :white_check_mark: |
+| Query diff | :white_check_mark: | :white_check_mark: |
+| Checklist | :white_check_mark: | :white_check_mark: |
+| Recce Agent on PRs | :white_check_mark: | :x: |
+| Agent CLI assistance | :white_check_mark: | Manual |
+| Preset checks across PRs | :white_check_mark: | Manual |
+| Shared validation standards | :white_check_mark: | Manual |
+| Developer-reviewer collaboration | :white_check_mark: | Manual |
+| PR comments & summaries | :white_check_mark: | :x: |
+| LLM-powered insights | :white_check_mark: | :x: |
+
+## FAQ
+
+**Can I start with OSS and upgrade to Cloud later?**
+
+Yes. OSS and Cloud use the same validation engine. Your existing checklists and workflows carry over when you connect to Cloud.
+
+**Does Cloud require a different setup than OSS?**
+
+Cloud connects to your Git repository and data warehouse directly. You don't need to generate artifacts locally — the Agent handles that automatically.
+
+**What data does Recce Cloud access?**
+
+Recce Cloud accesses your dbt artifacts (manifest.json, catalog.json) and runs queries against your data warehouse to perform validation. Your data stays in your warehouse.
+
+## Getting Started
+
+- **Cloud:** [Start Free with Cloud](../2-getting-started/start-free-with-cloud.md)
+- **OSS:** [OSS Setup](../2-getting-started/oss-setup.md)
+
+## Related
+
+- [What the Agent Does](../5-what-the-agent-does/index.md) — How the Recce Agent validates your changes
+- [Data Developer Workflow](../3-using-recce/data-developer.md) — Using Recce throughout development
diff --git a/docs/2-getting-started/connect-git.md b/docs/2-getting-started/connect-git.md
new file mode 100644
index 00000000..a702c603
--- /dev/null
+++ b/docs/2-getting-started/connect-git.md
@@ -0,0 +1,85 @@
+# Connect Your Repository
+
+**Goal:** Connect your GitHub or GitLab repository to Recce Cloud for automated PR data review.
+
+Recce Cloud supports GitHub and GitLab. Using a different provider? Contact us at support@reccehq.com.
+
+## Prerequisites
+
+- [x] Recce Cloud account (free trial at cloud.reccehq.com)
+- [x] Repository admin access (required to authorize app installation)
+- [x] dbt project in the repository
+
+## How It Works
+
+When you connect a Git provider, Recce Cloud maps your setup:
+
+| Git Provider | Recce Cloud |
+|--------------|-------------|
+| Organization | Organization |
+| Repository | Project |
+
+Every Recce Cloud account starts with one organization and one project. When you connect your Git provider, you select which organization and repository to link.
+
+**Monorepo support:** If you have multiple dbt projects in one repository, you can create multiple Recce Cloud projects that connect to the same repo.
+
+
+## Connect GitHub
+
+### 1. Authorize the Recce GitHub App
+
+Navigate to Settings → Git Provider in Recce Cloud. Click **Connect GitHub**.
+
+**Expected result:** GitHub authorization page opens.
+
+### 2. Select Organization and Repository
+
+Choose which GitHub organization to connect. This becomes your Recce Cloud organization.
+
+Then select the repository containing your dbt project. This becomes your Recce Cloud project.
+
+**Expected result:** Repository connected. Your Recce Cloud project is ready to use.
+
+## Connect GitLab
+
+GitLab uses Personal Access Tokens (PAT) instead of OAuth.
+
+### 1. Create a Personal Access Token
+
+In GitLab: User Settings → Access Tokens → Add new token.
+
+**Token scope:**
+
+- `api` - Full access. Required for Recce to post PR comments.
+- `read_api` - Read-only alternative. Recce can read the repository but cannot post PR comments.
+
+**Expected result:** Token string displayed (copy immediately).
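+
+Before pasting the token into Recce Cloud, you can optionally confirm it works. A minimal sketch, assuming a gitlab.com-hosted project (adjust `GITLAB_HOST` for self-managed GitLab) and the token exported as `GITLAB_TOKEN`:
+
+```shell
+# Optional sanity check: a valid token returns the token owner's profile.
+# GITLAB_HOST is an assumption; replace with your self-managed host if needed.
+GITLAB_HOST="${GITLAB_HOST:-https://gitlab.com}"
+if [ -n "${GITLAB_TOKEN:-}" ]; then
+  # GET /api/v4/user authenticates via the PRIVATE-TOKEN header
+  curl -sSf -H "PRIVATE-TOKEN: ${GITLAB_TOKEN}" "${GITLAB_HOST}/api/v4/user"
+else
+  echo "Set GITLAB_TOKEN to run this check"
+fi
+```
+
+A `401 Unauthorized` response here means the token is invalid or expired; regenerate it before continuing.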
+
+### 2. Add Token to Recce Cloud
+
+Navigate to Settings → Git Provider. Select GitLab, paste token.
+
+## Verify Success
+
+In Recce Cloud, navigate to your repository. You should see:
+
+- Connection status: "Connected"
+- Your Recce Cloud project is linked to the Git repository
+
+
+## Troubleshooting
+
+| Issue | Solution |
+| --- | --- |
+| Repository not found | Ensure proper permissions are granted (GitLab: token access, GitHub: app authorized) |
+| Invalid token (GitLab) | Generate new token with `api` scope |
+| Cannot post PR comments (GitLab) | Regenerate token with `api` scope instead of `read_api` |
+
+## Next Steps
+
+- [Connect Data Warehouse](connect-to-warehouse.md)
+- [Add Recce to CI/CD](../7-cicd/ci-cd-getting-started.md)
diff --git a/docs/2-getting-started/connect-to-warehouse.md b/docs/2-getting-started/connect-to-warehouse.md
new file mode 100644
index 00000000..ef369112
--- /dev/null
+++ b/docs/2-getting-started/connect-to-warehouse.md
@@ -0,0 +1,107 @@
+# Connect Data Warehouse
+
+**Goal:** Connect your data warehouse to Recce Cloud to enable data diffing on PRs.
+
+Recce Cloud supports **[Snowflake](#connect-snowflake), [Databricks](#connect-databricks), [BigQuery](#connect-bigquery), and [Redshift](#connect-redshift)**. Using a different warehouse? Contact us at support@reccehq.com.
+
+## Prerequisites
+
+- [x] Warehouse credentials with read access
+- [x] Network access configured (IP whitelisting if required)
+
+## Security
+
+Recce Cloud queries your warehouse directly to compare Base and Current environments. Recce encrypts and stores credentials securely. Read-only access is sufficient for all data diffing features.
+
+## Connect Snowflake
+
+### Option 1: Username/Password
+
+| Field | Description | Example |
+|-------|-------------|---------|
+| Account | Snowflake account identifier | `xxxxxx.us-central1.gcp` |
+| Username | Database username | `MY_USER` |
+| Password | Database password | `my_password` |
+| Role | Role with read access | `ANALYST_ROLE` |
+| Warehouse | Compute warehouse name | `WH_LOAD` |
+
+### Option 2: Key Pair Authentication
+
+| Field | Description | Example |
+|-------|-------------|---------|
+| Account | Snowflake account identifier | `xxxxxx.us-central1.gcp` |
+| Username | Service account username | `MY_USER` |
+| Private Key | PEM-formatted private key | `-----BEGIN RSA PRIVATE KEY-----...` |
+| Passphrase | Key passphrase (if encrypted) | `my_passphrase` |
+| Role | Role with read access | `ANALYST_ROLE` |
+| Warehouse | Compute warehouse name | `WH_LOAD` |
+
+## Connect Databricks
+
+### Option 1: Personal Access Token
+
+| Field | Description | Example |
+|-------|-------------|---------|
+| Host | Workspace URL | `adb-1234567890123456.7.azuredatabricks.net` |
+| HTTP Path | SQL warehouse path | `/sql/1.0/warehouses/abc123def456` |
+| Token | Personal access token | `dapiXXXXXXXXXXXXXXXXXXXXXXX` |
+| Catalog | Unity Catalog name (optional) | `my_catalog` |
+
+### Option 2: OAuth (M2M)
+
+| Field | Description | Example |
+|-------|-------------|---------|
+| Host | Workspace URL | `adb-1234567890123456.7.azuredatabricks.net` |
+| HTTP Path | SQL warehouse path | `/sql/1.0/warehouses/abc123def456` |
+| Client ID | Service principal client ID | `12345678-1234-1234-1234-123456789012` |
+| Client Secret | Service principal secret | `dose1234567890abcdef` |
+| Catalog | Unity Catalog name (optional) | `my_catalog` |
+
+
+> **Note**: OAuth M2M is auto-enabled in Databricks accounts. For setup details, see [dbt Databricks setup](https://docs.getdbt.com/docs/core/connect-data-platform/databricks-setup#oauth-machine-to-machine-m2m-authentication).
+
+## Connect BigQuery
+
+| Field | Description | Example |
+|-------|-------------|---------|
+| Project | GCP project ID | `my-gcp-project-123456` |
+| Service Account JSON | Full JSON key file contents | `{"type": "service_account", ...}` |
+
+
+> **Note**: Service account JSON is currently the only supported authentication method. See the [dbt BigQuery setup docs](https://docs.getdbt.com/docs/core/connect-data-platform/bigquery-setup#service-account-json) for details.
+
+## Connect Redshift
+
+| Field | Description | Example |
+|-------|-------------|---------|
+| Host | Cluster endpoint | `my-cluster.abc123xyz.us-west-2.redshift.amazonaws.com` |
+| Port | Database port | `5439` (Default) |
+| Database | Database name | `analytics_db` |
+| Username | Database user | `admin_user` |
+| Password | Database password | `my_password` |
+
+
+> **Note**: Password-based database authentication is currently the only supported method. See the [dbt Redshift setup docs](https://docs.getdbt.com/docs/core/connect-data-platform/redshift-setup#authentication-parameters) for details.
+
+## Save Connection
+
+After entering your connection details, click **Save**. Recce Cloud runs a connection test automatically and displays "Connected" on success.
+
+## Verify Success
+
+Navigate to Organization Settings in Recce Cloud. Your data warehouse should appear.
+
+## Troubleshooting
+
+| Issue | Solution |
+| --- | --- |
+| Connection refused | Whitelist Recce Cloud IP ranges in your network configuration |
+| Authentication failed | Verify credentials and regenerate if expired |
+| Permission denied on table | Grant SELECT permissions on target schemas |
+
+## Next Steps
+
+- [Add Recce to CI/CD](../7-cicd/setup-ci.md)
+- [Run Your First Data Diff](../5-data-diffing/row-count-diff.md)
diff --git a/docs/2-getting-started/dbt-cloud-setup.md b/docs/2-getting-started/dbt-cloud-setup.md
new file mode 100644
index 00000000..91646845
--- /dev/null
+++ b/docs/2-getting-started/dbt-cloud-setup.md
@@ -0,0 +1,239 @@
+---
+title: dbt Cloud Setup
+---
+
+# dbt Cloud Setup
+
+When your dbt project runs on dbt Cloud, validating PR data changes requires retrieving artifacts from the dbt Cloud API rather than generating them locally.
+
+## Goal
+
+After completing this tutorial, every PR triggers automated data validation. Recce compares your PR changes against production, with results visible in Recce Cloud.
+
+## Prerequisites
+
+- [x] **Recce Cloud account**: free trial at [cloud.reccehq.com](https://cloud.reccehq.com)
+- [x] **dbt Cloud account**: with CI (continuous integration) and CD (continuous deployment) jobs configured
+- [x] **dbt Cloud API token**: with read access to job artifacts
+- [x] **GitHub repository**: with admin access to add workflows and secrets
+
+## How Recce retrieves dbt Cloud artifacts
+
+Recce needs both base (production) and current (PR) dbt artifacts to compare changes. When using dbt Cloud, these artifacts live in dbt Cloud's API rather than your local filesystem. Your GitHub Actions workflows retrieve them via API calls and upload to Recce Cloud.
+
+Two workflows handle this:
+
+1. **Base workflow** (on merge to main): Downloads production artifacts from your CD job → uploads with `recce-cloud upload --type prod`
+2. **PR workflow** (on pull request): Downloads PR artifacts from your CI job → uploads with `recce-cloud upload`
+
+## Setup steps
+
+### 1. Enable "Generate docs on run" in dbt Cloud
+
+Recce requires `catalog.json` for schema comparisons. Enable documentation generation for both your CI and CD jobs in dbt Cloud.
+
+**For CD jobs (production):**
+
+1. Go to your CD job settings in dbt Cloud
+2. Under **Execution settings**, enable **Generate docs on run**
+
+**For CI jobs (pull requests):**
+
+1. Go to your CI job settings in dbt Cloud
+2. Under **Advanced settings**, enable **Generate docs on run**
+
+!!! note
+ Without this setting, dbt Cloud won't generate `catalog.json`, and Recce won't be able to compare schemas between environments.
+
+### 2. Get your dbt Cloud credentials
+
+Collect the following from your dbt Cloud account:
+
+| Credential | Where to find it |
+| --- | --- |
+| **Account ID** | URL when viewing any job: `cloud.getdbt.com/deploy/{ACCOUNT_ID}/projects/...` |
+| **CD Job ID** | URL of your production/CD job: `...jobs/{JOB_ID}` |
+| **CI Job ID** | URL of your PR/CI job: `...jobs/{JOB_ID}` |
+| **API Token** | Account Settings > API Tokens > Create Service Token |
+
+!!! tip
+ Create a service token with "Job Admin" or "Member" permissions. This allows read access to job artifacts.
+
+### 3. Configure GitHub secrets
+
+Add the following secrets to your GitHub repository (Settings > Secrets and variables > Actions):
+
+**dbt Cloud secrets:**
+
+- `DBT_CLOUD_API_TOKEN` - Your dbt Cloud API token
+- `DBT_CLOUD_ACCOUNT_ID` - Your dbt Cloud account ID
+- `DBT_CLOUD_CD_JOB_ID` - Your production/CD job ID
+- `DBT_CLOUD_CI_JOB_ID` - Your PR/CI job ID
+
+!!! note
+    `GITHUB_TOKEN` is provided automatically by GitHub Actions; no configuration is needed.
+
+### 4. Create the base workflow (CD)
+
+Create `.github/workflows/recce-base.yml` to update your production baseline when merging to main.
+
+```yaml
+name: Update Base Metadata (dbt Cloud)
+
+on:
+ push:
+ branches: [main]
+ workflow_dispatch:
+
+env:
+ DBT_CLOUD_API_BASE: "https://cloud.getdbt.com/api/v2/accounts/${{ secrets.DBT_CLOUD_ACCOUNT_ID }}"
+ DBT_CLOUD_API_TOKEN: ${{ secrets.DBT_CLOUD_API_TOKEN }}
+
+jobs:
+ update-base:
+ runs-on: ubuntu-latest
+ steps:
+ - uses: actions/checkout@v4
+
+ - uses: actions/setup-python@v5
+ with:
+ python-version: "3.10"
+
+ - name: Install recce-cloud
+ run: pip install recce-cloud
+
+ - name: Retrieve artifacts from CD job
+ env:
+ DBT_CLOUD_CD_JOB_ID: ${{ secrets.DBT_CLOUD_CD_JOB_ID }}
+ run: |
+ set -eo pipefail
+ CD_RUNS_URL="${DBT_CLOUD_API_BASE}/runs/?job_definition_id=${DBT_CLOUD_CD_JOB_ID}&order_by=-id&limit=1"
+ CD_RUNS_RESPONSE=$(curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${CD_RUNS_URL}")
+ DBT_CLOUD_CD_RUN_ID=$(echo "${CD_RUNS_RESPONSE}" | jq -r ".data[0].id")
+ mkdir -p target
+ for artifact in manifest.json catalog.json; do
+ ARTIFACT_URL="${DBT_CLOUD_API_BASE}/runs/${DBT_CLOUD_CD_RUN_ID}/artifacts/${artifact}"
+ curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${ARTIFACT_URL}" -o "target/${artifact}"
+ done
+
+ - name: Upload to Recce Cloud
+ env:
+ GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+ run: recce-cloud upload --type prod
+```
+
+### 5. Create the PR workflow (CI)
+
+Create `.github/workflows/recce-pr.yml` to validate PR changes.
+
+```yaml
+name: Validate PR (dbt Cloud)
+
+on:
+ pull_request:
+ branches: [main]
+
+env:
+ DBT_CLOUD_API_BASE: "https://cloud.getdbt.com/api/v2/accounts/${{ secrets.DBT_CLOUD_ACCOUNT_ID }}"
+ DBT_CLOUD_API_TOKEN: ${{ secrets.DBT_CLOUD_API_TOKEN }}
+
+jobs:
+ validate-pr:
+ runs-on: ubuntu-latest
+ steps:
+ - uses: actions/checkout@v4
+
+ - uses: actions/setup-python@v5
+ with:
+ python-version: "3.10"
+
+ - name: Install recce-cloud
+ run: pip install recce-cloud
+
+ - name: Wait for dbt Cloud CI job
+ env:
+ DBT_CLOUD_CI_JOB_ID: ${{ secrets.DBT_CLOUD_CI_JOB_ID }}
+ CURRENT_GITHUB_SHA: ${{ github.event.pull_request.head.sha }}
+ run: |
+ set -eo pipefail
+ CI_RUNS_URL="${DBT_CLOUD_API_BASE}/runs/?job_definition_id=${DBT_CLOUD_CI_JOB_ID}&order_by=-id"
+ fetch_ci_run_id() {
+ CI_RUNS_RESPONSE=$(curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${CI_RUNS_URL}")
+ echo "${CI_RUNS_RESPONSE}" | jq -r ".data[] | select(.git_sha == \"${CURRENT_GITHUB_SHA}\") | .id" | head -n1
+ }
+ DBT_CLOUD_CI_RUN_ID=$(fetch_ci_run_id)
+ while [ -z "$DBT_CLOUD_CI_RUN_ID" ]; do
+ echo "Waiting for dbt Cloud CI job to start..."
+ sleep 10
+ DBT_CLOUD_CI_RUN_ID=$(fetch_ci_run_id)
+ done
+ echo "DBT_CLOUD_CI_RUN_ID=${DBT_CLOUD_CI_RUN_ID}" >> $GITHUB_ENV
+ CI_RUN_URL="${DBT_CLOUD_API_BASE}/runs/${DBT_CLOUD_CI_RUN_ID}/"
+ while true; do
+ CI_RUN_RESPONSE=$(curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${CI_RUN_URL}")
+ CI_RUN_SUCCESS=$(echo "${CI_RUN_RESPONSE}" | jq '.data.is_complete and .data.is_success')
+ CI_RUN_FAILED=$(echo "${CI_RUN_RESPONSE}" | jq '.data.is_complete and (.data.is_error or .data.is_cancelled)')
+ if $CI_RUN_SUCCESS; then
+ echo "dbt Cloud CI job completed successfully."
+ break
+ elif $CI_RUN_FAILED; then
+        status=$(echo "${CI_RUN_RESPONSE}" | jq -r '.data.status_humanized')
+ echo "dbt Cloud CI job failed or was cancelled. Status: $status"
+ exit 1
+ fi
+ echo "Waiting for dbt Cloud CI job to complete..."
+ sleep 10
+ done
+
+ - name: Retrieve artifacts from CI job
+ run: |
+ set -eo pipefail
+ mkdir -p target
+ for artifact in manifest.json catalog.json; do
+ ARTIFACT_URL="${DBT_CLOUD_API_BASE}/runs/${DBT_CLOUD_CI_RUN_ID}/artifacts/${artifact}"
+ curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${ARTIFACT_URL}" -o "target/${artifact}"
+ done
+
+ - name: Upload to Recce Cloud
+ env:
+ GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+ run: recce-cloud upload
+```
+
+## Verification
+
+After setting up:
+
+1. **Trigger the base workflow** - Push to main or run manually to upload production baseline
+2. **Create a test PR** with a small model change
+3. **Wait for dbt Cloud CI job** to complete
+4. **Check GitHub Actions** - the Recce PR workflow should run after dbt Cloud CI completes
+5. **Open Recce Cloud** - the PR session appears with validation results
+
+!!! tip
+ Run the base workflow first to establish your production baseline. The PR workflow compares against this baseline.
+
+## Troubleshooting
+
+| Issue | Solution |
+| --- | --- |
+| "CD run not found" | Ensure your CD job has run on the base branch commit. Try rebasing your PR to trigger a new CD run. |
+| "CI job timeout" | The workflow waits for dbt Cloud CI to complete. Check if your CI job is stuck or taking longer than expected. |
+| "Artifact not found" | Verify "Generate docs on run" is enabled for both CI and CD jobs. |
+| "API authentication failed" | Check your `DBT_CLOUD_API_TOKEN` has correct permissions and is stored in GitHub secrets. |
+
+### CD job timing considerations
+
+The base workflow retrieves artifacts from the latest CD job run. For accurate comparisons, ensure your dbt Cloud CD job runs on every merge to main.
+
+If your CD job runs on a schedule:
+
+- The baseline may be outdated compared to the actual main branch
+- Consider triggering the CD job manually before validating PRs
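+
+Triggering the CD job on demand can be scripted against the dbt Cloud API's job-run endpoint. A sketch under the assumption that the same env vars as the workflows above are set (the fallback IDs are placeholders; the network call only runs when a token is present):
+
+```shell
+# Sketch: trigger the CD job so the baseline reflects the latest main branch.
+API_BASE="https://cloud.getdbt.com/api/v2/accounts/${DBT_CLOUD_ACCOUNT_ID:-12345}"
+TRIGGER_URL="${API_BASE}/jobs/${DBT_CLOUD_CD_JOB_ID:-678}/run/"
+echo "Would POST to ${TRIGGER_URL}"
+if [ -n "${DBT_CLOUD_API_TOKEN:-}" ]; then
+  # The v2 Administrative API requires a JSON body with a "cause" field
+  curl -sSf -X POST \
+    -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" \
+    -H "Content-Type: application/json" \
+    -d '{"cause": "Refresh production baseline before PR validation"}' \
+    "${TRIGGER_URL}"
+fi
+```
+
+Once the triggered run completes, re-run the base workflow so the fresh artifacts are uploaded to Recce Cloud.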
+
+## Next steps
+
+- [Get Started with Recce Cloud](./start-free-with-cloud.md) - Standard setup for self-hosted dbt
+- [Configure CD to establish your production baseline](../7-cicd/setup-cd.md)
+- [Configure CI for automated PR validation](../7-cicd/setup-ci.md)
+- [Learn environment strategies for reliable comparisons](../7-cicd/best-practices-prep-env.md)
diff --git a/docs/2-getting-started/environment-best-practices.md b/docs/2-getting-started/environment-best-practices.md
new file mode 100644
index 00000000..3845d66f
--- /dev/null
+++ b/docs/2-getting-started/environment-best-practices.md
@@ -0,0 +1,193 @@
+---
+title: Environment Best Practices
+---
+
+# Environment Best Practices
+
+Unreliable comparison environments produce misleading validation results. When source data drifts, branches fall behind, or environments collide, you cannot trust what Recce reports.
+
+This guide covers strategies to prepare reliable, efficient environments for Recce data validation. Recce compares a *base environment* (production or staging, representing your main branch) against a *current environment* (representing your pull request branch).
+
+## When to use this guide
+
+- Setting up CI/CD for Recce for the first time
+- Seeing inconsistent diff results across PRs
+- Managing warehouse costs from accumulated PR environments
+- Troubleshooting validation results that don't match expectations
+
+## Challenges this guide addresses
+
+Several factors can affect comparison accuracy:
+
+- Source data updates continuously
+- Transformations take time to run
+- Other pull requests (PRs) merge into the base branch
+- Generated environments accumulate in the warehouse
+
+## Use per-PR schemas
+
+Each PR should have its own isolated schema. This prevents interference between concurrent PRs and makes cleanup straightforward.
+
+```yaml
+# profiles.yml
+ci:
+ schema: "{{ env_var('CI_SCHEMA') }}"
+
+# CI workflow
+env:
+ CI_SCHEMA: "pr_${{ github.event.pull_request.number }}"
+```
+
+Benefits:
+
+- Complete isolation between PRs
+- Parallel validation without conflicts
+- Easy cleanup by dropping the schema
+
+See [Environment Setup](environment-setup.md) for detailed configuration.
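+
+Resolved locally, the schema name is plain string interpolation; a quick sketch using the variable names from the snippet above:
+
+```shell
+# How the per-PR schema name resolves at runtime
+PR_NUMBER=123                 # supplied by CI, e.g. github.event.pull_request.number
+CI_SCHEMA="pr_${PR_NUMBER}"
+echo "${CI_SCHEMA}"           # pr_123
+```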
+
+## Prepare a single base environment
+
+Use one consistent base environment for all PRs to compare against. Options:
+
+| Base Environment | Characteristics | Best For |
+|------------------|-----------------|----------|
+| Production | Latest merged code, full data | Accurate production comparison |
+| Staging | Latest merged code, limited data | Faster comparisons, lower cost |
+
+If using staging as base:
+
+- Ensure transformed results reflect the latest commit of the base branch
+- Use the same source data as PR environments
+- Use the same transformation logic as PR environments
+
+The staging environment should match PR environments as closely as possible, differing only in git commit.
+
+## Limit source data range
+
+Most data is temporal. Using only recent data reduces transformation time while still validating correctness.
+
+**Strategy:** Use data from the last month, excluding the current week. This ensures consistent results regardless of when transformations run.
+
+```sql
+SELECT *
+FROM {{ source('your_source_name', 'orders') }}
+{% if target.name != 'prod' %}
+WHERE
+ order_date >= DATEADD(month, -1, CURRENT_DATE)
+ AND order_date < DATE_TRUNC('week', CURRENT_DATE)
+{% endif %}
+```
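+
+To see the concrete window that filter selects on a given day, you can compute it locally. This sketch assumes GNU `date` and a Monday-start week (Snowflake's default for `DATE_TRUNC('week', ...)`):
+
+```shell
+# Illustrative only: print the date window the SQL filter selects
+start_date=$(date -d "1 month ago" +%F)
+days_into_week=$(( $(date +%u) - 1 ))            # %u: Monday=1 ... Sunday=7
+week_start=$(date -d "${days_into_week} days ago" +%F)
+echo "orders from ${start_date} (inclusive) to ${week_start} (exclusive)"
+```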
+
+Benefits:
+
+- Faster transformation execution
+- Consistent comparison results
+- Reduced warehouse costs
+
+## Reduce source data volatility
+
+If source data updates frequently (hourly or more), comparison results can vary based on timing rather than code changes.
+
+**Strategies:**
+
+- **Zero-copy clone** (Snowflake, BigQuery, Databricks): Freeze source data at a specific point in time
+- **Weekly snapshots**: Update source data weekly to reduce variability
+
+## Keep base environment current
+
+The base environment can become outdated in two scenarios:
+
+1. **New source data**: If you update data weekly, update the base environment at least weekly
+2. **PRs merged to main**: Trigger base environment update on merge events
+
+Configure your CD workflow to run:
+
+- On merge to main (immediate update)
+- On schedule (e.g., daily at 2 AM UTC)
+
+See [Setup CD](../7-cicd/setup-cd.md) for workflow configuration.
+
+## Obtain artifacts for environments
+
+Recce uses base and current environment artifacts (`manifest.json`, `catalog.json`) to find corresponding tables in the data warehouse for comparison.
+
+**Recommended approaches:**
+
+- **Recce Cloud** - Automatic artifact management via `recce-cloud upload`. See [Setup CD](../7-cicd/setup-cd.md) and [Setup CI](../7-cicd/setup-ci.md).
+- **dbt Cloud** - Download artifacts from dbt Cloud jobs. See [dbt Cloud Setup](dbt-cloud-setup.md).
+
+**Alternative approaches** (for custom setups):
+
+- **Cloud storage** - Upload artifacts to S3, GCS, or Azure Blob in CI
+- **GitHub Actions artifacts** - Use `gh run download` to retrieve from workflow runs
+- **Stateless** - Checkout the base branch and run `dbt docs generate` on-demand
+
+## Keep PR branch in sync with base
+
+If a PR runs after other PRs merge to main, the comparison mixes:
+
+- Changes from the current PR
+- Changes from other merged PRs
+
+This produces comparison results that don't accurately reflect the current PR's impact.
+
+**GitHub**: Enable [branch protection](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/keeping-your-pull-request-in-sync-with-the-base-branch) to show when PRs are outdated.
+
+**CI check**: Add a workflow step to verify the PR is up-to-date:
+
+```yaml
+- name: Check if PR is up-to-date
+ if: github.event_name == 'pull_request'
+ run: |
+    UPSTREAM=${GITHUB_BASE_REF:-'main'}
+    HEAD=${GITHUB_HEAD_REF:-${GITHUB_REF#refs/heads/}}
+    git fetch origin ${UPSTREAM}
+    # Count base-branch commits that are missing from the PR branch
+    if [ "$(git rev-list --right-only --count ${HEAD}...origin/${UPSTREAM})" -eq 0 ]; then
+ echo "Branch is up-to-date"
+ else
+ echo "Branch is not up-to-date"
+ exit 1
+ fi
+```
+
+## Clean up PR environments
+
+As PRs accumulate, so do generated schemas. Implement cleanup to manage warehouse storage.
+
+**On PR close**: Create a workflow that drops the PR schema when the PR closes.
+
+```jinja
+{% macro clear_schema(schema_name) %}
+{% set drop_schema_command = "DROP SCHEMA IF EXISTS " ~ schema_name ~ " CASCADE;" %}
+{% do run_query(drop_schema_command) %}
+{% endmacro %}
+```
+
+Run the cleanup:
+
+```shell
+dbt run-operation clear_schema --args "{'schema_name': 'pr_123'}"
+```
+
+**Scheduled cleanup**: Remove schemas not used for a week.
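+
+A scheduled job can loop over stale schemas and reuse the same macro. A sketch with a hypothetical schema list (in practice, derive the list by querying `information_schema.schemata` for `pr_%` schemas unused for a week):
+
+```shell
+# Hypothetical list of stale PR schemas; replace with a warehouse query in practice
+stale_schemas="pr_101 pr_102"
+for schema in ${stale_schemas}; do
+  # Echoed for illustration; drop the echo to actually run the cleanup
+  echo dbt run-operation clear_schema --args "{'schema_name': '${schema}'}"
+done
+```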
+
+## Example configuration
+
+| Environment | Schema | When to Run | Count | Data Range |
+|-------------|--------|-------------|-------|------------|
+| Production | `public` | Daily | 1 | All |
+| Staging | `staging` | Daily + on merge | 1 | 1 month, excluding current week |
+| PR | `pr_<number>` | On push | # of open PRs | 1 month, excluding current week |
+
+## Next steps
+
+- [Environment Setup](environment-setup.md) - Technical configuration for profiles.yml and CI/CD
+- [Setup CD](../7-cicd/setup-cd.md) - Configure automatic baseline updates
+- [Setup CI](../7-cicd/setup-ci.md) - Configure PR validation
diff --git a/docs/2-getting-started/environment-setup.md b/docs/2-getting-started/environment-setup.md
new file mode 100644
index 00000000..641a8504
--- /dev/null
+++ b/docs/2-getting-started/environment-setup.md
@@ -0,0 +1,201 @@
+---
+title: Environment Setup
+description: >-
+ Configure dbt profiles and CI/CD environment variables for Recce data validation.
+ Set up isolated schemas for base vs current comparison on pull requests.
+---
+
+!!! tip "Following the onboarding guide?"
+ Return to [Get Started with Recce Cloud](start-free-with-cloud.md#3-add-recce-to-cicd) after completing this page.
+
+# Environment Setup
+
+Configure your dbt profiles and CI/CD environment variables for Recce data validation.
+
+## Goal
+
+Set up isolated schemas for base vs current comparison. After completing this guide, your CI/CD workflows automatically create per-PR schemas and compare them against production.
+
+## Prerequisites
+
+- [ ] **dbt project**: A working dbt project with `profiles.yml` configured
+- [ ] **CI/CD platform**: GitHub Actions, GitLab CI, or similar
+- [ ] **Warehouse access**: Credentials with permissions to create schemas dynamically
+
+## Why separate schemas matter
+
+Recce compares two sets of data to validate changes:
+
+- **Base**: The production state (main branch)
+- **Current**: The PR branch with your changes
+
+For accurate validation, these must point to different schemas in your warehouse. Without separation, you would compare identical data and miss meaningful differences.
+
+## How CI/CD works with Recce
+
+Recce uses both continuous delivery (CD) and continuous integration (CI) to automate data validation:
+
+- **CD (Continuous Delivery)**: Runs after merge to main. Updates baseline artifacts with latest production state.
+- **CI (Continuous Integration)**: Runs on PR. Validates proposed changes against baseline.
+
+**Set up CD first**, then CI. CD establishes your baseline (production artifacts), which CI uses for comparison.
+
+## Configure profiles.yml
+
+Your `profiles.yml` file defines how dbt connects to your warehouse. Add a `ci` target with a dynamic schema for PR isolation.
+
+```yaml
+jaffle_shop:
+ target: dev
+ outputs:
+ dev:
+ type: snowflake
+ account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
+ user: "{{ env_var('SNOWFLAKE_USER') }}"
+ password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
+ database: analytics
+ warehouse: COMPUTE_WH
+ schema: dev
+ threads: 4
+
+ # CI environment with dynamic schema per PR
+ ci:
+ type: snowflake
+ account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
+ user: "{{ env_var('SNOWFLAKE_USER') }}"
+ password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
+ database: analytics
+ warehouse: COMPUTE_WH
+ schema: "{{ env_var('SNOWFLAKE_SCHEMA') }}"
+ threads: 4
+
+ prod:
+ type: snowflake
+ account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
+ user: "{{ env_var('SNOWFLAKE_USER') }}"
+ password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
+ database: analytics
+ warehouse: COMPUTE_WH
+ schema: public
+ threads: 4
+```
+
+After saving, your profile supports three targets: `dev` for local development, `ci` for PR validation, and `prod` for production.
+
+Key points:
+
+- The `ci` target uses `env_var('SNOWFLAKE_SCHEMA')` for dynamic schema assignment (other warehouses use their own variable name)
+- The `prod` target uses a fixed schema (`public`) for consistency
+- Adapt this pattern for other warehouses (BigQuery uses `dataset` instead of `schema`)
+
+## Set CI/CD environment variables
+
+Your CI/CD workflow sets the schema dynamically for each PR. The key configuration:
+
+**GitHub Actions:**
+
+```yaml
+env:
+ SNOWFLAKE_SCHEMA: "PR_${{ github.event.pull_request.number }}"
+```
+
+**GitLab CI:**
+
+```yaml
+variables:
+ SNOWFLAKE_SCHEMA: "MR_${CI_MERGE_REQUEST_IID}"
+```
+
+This creates an isolated schema per change automatically, such as `PR_123` on GitHub or `MR_123` on GitLab. When a PR opens, the workflow sets `SNOWFLAKE_SCHEMA` and dbt writes to that schema.
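+
+Putting it together, the dbt step in the PR workflow might look like this (a sketch for GitHub Actions; the step name is illustrative):
+
+```yaml
+steps:
+  - name: Build PR schema
+    run: dbt run --target ci
+    env:
+      SNOWFLAKE_SCHEMA: "PR_${{ github.event.pull_request.number }}"
+```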
+
+For complete workflow examples, see [Setup CD](setup-cd.md) and [Setup CI](setup-ci.md).
+
+## Recommended pattern: Schema-per-PR
+
+Create an isolated schema for each PR. This is the recommended approach for teams.
+
+| Base Schema | Current Schema | Example |
+|-------------|----------------|---------|
+| `public` (prod) | `PR_123` | PR #123 gets its own schema |
+
+**Why this pattern:**
+
+- Complete isolation between PRs
+- Multiple PRs can run validation in parallel without conflicts
+- Easy cleanup by dropping the schema when PR closes
+- Clear audit trail of what data each PR produced
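+
+The cleanup step can itself be automated. A hedged GitHub Actions sketch (the workflow name and `snowsql` invocation are assumptions; adapt to your tooling) that drops the schema when a PR closes:
+
+```yaml
+name: Cleanup PR schema
+on:
+  pull_request:
+    types: [closed]
+jobs:
+  drop-schema:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Drop PR schema
+        run: snowsql -q "DROP SCHEMA IF EXISTS analytics.PR_${{ github.event.pull_request.number }}"
+```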
+
+## Alternative patterns
+
+### Using staging as base
+
+Instead of comparing against production, compare against a staging environment with limited data.
+
+| Base Schema | Current Schema | Use Case |
+|-------------|----------------|----------|
+| `staging` | `PR_123` | Teams wanting faster comparisons |
+
+**Pros:**
+
+- Faster diffs with limited data ranges
+- Consistent source data between base and current
+- Reduced warehouse costs
+
+**Cons:**
+
+- Staging may drift from production
+- Issues caught in staging might not reflect production behavior
+- Requires maintaining an additional environment
+
+See [Environment Best Practices](environment-best-practices.md) for strategies on limiting data ranges.
+
+### Shared development schema (not recommended)
+
+Using a single `dev` schema for all development work.
+
+| Base Schema | Current Schema | Use Case |
+|-------------|----------------|----------|
+| `public` (prod) | `dev` | Solo developers only |
+
+**Why this is not recommended:**
+
+- Multiple PRs overwrite each other's data
+- Cannot run parallel validations
+- Comparison results may include changes from other work
+- Difficult to isolate issues to specific PRs
+
+Only use this pattern for individual local development, not for CI/CD automation.
+
+## Verification
+
+After configuring your setup, verify that both base and current schemas are accessible.
+
+### Check configuration locally
+
+```shell
+export SNOWFLAKE_SCHEMA=PR_TEST  # any placeholder value; the ci target requires this env var
+dbt debug --target ci
+```
+
+### Verify in Recce interface
+
+Launch Recce and check **Environment Info** in the top-right corner. You should see:
+
+- **Base**: Your production schema (e.g., `public`)
+- **Current**: Your PR-specific schema (e.g., `PR_123`)
+
+## Troubleshooting
+
+| Issue | Solution |
+|-------|----------|
+| Schema creation fails | Verify your CI credentials have `CREATE SCHEMA` permissions |
+| Environment variable not found | Check that secrets are configured in your CI/CD platform settings |
+| Base and current show same schema | Ensure `--target ci` is used in CI, not `--target dev` |
+| Profile not found | Verify `profiles.yml` is accessible in CI (check path or use `DBT_PROFILES_DIR`) |
+| Connection timeout | Check warehouse IP allowlists include CI runner IP ranges |
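+
+For the profile-path issue, pointing dbt at the copy checked into the repo is usually enough (a GitHub Actions sketch; it assumes `profiles.yml` lives at the repo root):
+
+```yaml
+env:
+  DBT_PROFILES_DIR: ${{ github.workspace }}
+```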
+
+## Next steps
+
+- [Get Started with Recce Cloud](start-free-with-cloud.md) - Complete onboarding guide
+- [Environment Best Practices](environment-best-practices.md) - Strategies for source data and schema management
+- [Setup CD](setup-cd.md) - CD workflow for GitHub Actions and GitLab CI
+- [Setup CI](setup-ci.md) - CI workflow for GitHub Actions and GitLab CI
diff --git a/docs/2-getting-started/get-started-jaffle-shop.md b/docs/2-getting-started/get-started-jaffle-shop.md
deleted file mode 100644
index 80babdb1..00000000
--- a/docs/2-getting-started/get-started-jaffle-shop.md
+++ /dev/null
@@ -1,93 +0,0 @@
----
-title: Open Source Tutorial
----
-
-Jaffle Shop is an example project officially provided by [dbt Labs](https://www.getdbt.com). This document uses [jaffle_shop_duckdb](https://github.com/dbt-labs/jaffle_shop_duckdb) to enable you to start using Recce locally from scratch within five minutes.
-
-!!! tip
-
- [DuckDB](https://duckdb.org/) projects like jaffle_shop_duckdb don’t use a server-based connection or cloud warehouse credentials. Be aware that a few setup steps differ from those for cloud-based warehouses.
-
-## Step by Step
-
-1. Clone the “Jaffle Shop” dbt data project
- ```shell
- git clone git@github.com:dbt-labs/jaffle_shop_duckdb.git
- cd jaffle_shop_duckdb
- ```
-2. Prepare virtual env
- ```shell
- python -m venv venv
- source venv/bin/activate
- ```
-3. Installation
- ```shell
- pip install -r requirements.txt
- pip install recce
- ```
-4. Provide additional environment to compare
- Edit `./profiles.yml` to add one more target to serve as the base environment for comparison.
-
-    Note: This step is only necessary for file-based engines like DuckDB. For cloud warehouses (e.g., Snowflake, BigQuery), Recce auto-detects your active dbt profile and schema, so no changes required.
- ```diff
- jaffle_shop:
- target: dev
- outputs:
- dev:
- type: duckdb
- path: 'jaffle_shop.duckdb'
- threads: 24
- + prod:
- + type: duckdb
- + path: 'jaffle_shop.duckdb'
- + schema: prod
- + threads: 24
- ```
-5. Prepare production environment
- Using DuckDB, you need to generate the artifacts for the base environment. Checkout the `main` branch of your project and generate the required artifacts into `target-base`. You can skip `dbt build` if this environment already exists.
-
-    Note: This step is only necessary for file-based engines like DuckDB. For most data warehouses, you don’t need to re-run production locally. You can download the dbt artifacts generated from the main branch, and save them to a `target-base/` folder.
- ```shell
- dbt seed --target prod
- dbt run --target prod
- dbt docs generate --target prod --target-path ./target-base
- ```
-6. Prepare development environment. First, edit an existing model `./models/staging/stg_payments.sql`.
- ```diff
- ...
-
- renamed as (
- payment_method,
-
- - -- `amount` is currently stored in cents, so we convert it to dollars
- - amount / 100 as amount
- + amount
-
- from source
- )
- ```
- run on development environment.
- ```shell
- dbt seed
- dbt run
- dbt docs generate
- ```
-7. Run the recce server
- ```shell
- recce server
- ```
- Open the link http://0.0.0.0:8000, you can see the lineage diff
- 
-8. Switch to the **Query** tab, run this query
- ```sql
- select * from {{ ref("orders") }} order by 1
- ```
- Click the `Run Diff` or press `Cmd + Shift + Enter`
- Click on the 🔑 icon next to the `order_id` column to compare records that are uniquely identified by their `order_id`.
- 
-9. Click the blue `Add to Checklist` button on the right bottom corner to add the query result to checklist
- 
-
-## What’s Next
-By following this DuckDB tutorial, you’ve seen how Recce works locally.
-You can now return to the [Open Source Setup](./installation.md) to set up Recce with your cloud data warehouse.
-
-Got questions? [Let us know](../1-whats-recce/community-support.md). We're happy to help!
diff --git a/docs/2-getting-started/installation.md b/docs/2-getting-started/installation.md
deleted file mode 100644
index 764de7aa..00000000
--- a/docs/2-getting-started/installation.md
+++ /dev/null
@@ -1,202 +0,0 @@
----
-title: Open Source Setup
----
-
-# Open Source Setup
-
-## Install Open Source
-
-From within a dbt project directory:
-```shell
-cd your-dbt-project/ # if you're not already there
-pip install -U recce
-```
-
-
-## Launch
-To start Recce in the current environment:
-```shell
-recce server
-```
-Launching Recce enables:
-
-- **Lineage clarity**: Trace changes down to the column level
-
-- **Query insights**: Explore logic and run custom queries
-
-- **Live diffing**: Reload and inspect changes as you iterate
-
-Best suited for quick exploration before moving to structured validation using Diff.
-
-
-
-
-## Configure Diff
-
-To compare changes, Recce needs a baseline. This guide explains the concept of Diff in Recce and how it fits into data validation workflows. Setup steps vary by environment, so this guide focuses on the core ideas rather than copy-paste instructions.
-
-For a concrete example, refer to the [5-minute Jaffle Shop tutorial](./get-started-jaffle-shop.md).
-
-To configure a comparison in Recce, two components are required:
-
-### 1. Artifacts
-
-Recce uses dbt [artifacts](https://docs.getdbt.com/reference/artifacts/dbt-artifacts) to perform diffs. These files are generated with each dbt run and typically saved in the `target/` folder.
-
-In addition to the current artifacts, a second set is needed to serve as the baseline for comparison. Recce looks for these in the `target-base/` folder.
-
-- `target/` – Artifacts from the current development environment
-- `target-base/` – Artifacts from a baseline environment (e.g., production)
-
-For most setups, retrieve the existing artifacts that generated from the main branch (usually from a CI run or build cache) and save them into a `target-base/` folder.
-
-### 2. Schemas
-
-Recce also compares the actual query results between two dbt [environments](https://docs.getdbt.com/docs/core/dbt-core-environments), each pointing to a different [schema](https://docs.getdbt.com/docs/core/connect-data-platform/connection-profiles#understanding-target-schemas). This allows validation beyond metadata by comparing the data itself.
-
-For example:
-
-- `prod` schema for production
-- `dev` schema for development
-
-These schemas represent where dbt builds its models.
-
-!!! tip
-
- In dbt, an environment typically maps to a schema. To compare data results, separate schemas are required. Learn more in [dbt environments](https://docs.getdbt.com/docs/core/dbt-core-environments).
-
-Schemas are typically configured in the `profiles.yml` file, which defines how dbt connects to the data platform. Both schemas must be accessible for Recce to perform environment-based comparisons.
-
-Once both artifacts and schemas are configured, Recce can surface meaningful diffs across logic, metadata, and data.
-
-## Verify your setup
-
-There are two ways to check that your configuration is complete:
-
-### 1. Debug Command (CLI)
-
-Run `recce debug` from the command line to verify your setup before launching the server:
-
-```bash
-recce debug
-```
-
-This command checks artifacts, directories, and warehouse connection, providing detailed feedback on any missing components.
-
-### 2. Environment Info (Web UI)
-
-Use **Environment Info** in the top-right corner of the Recce web interface to verify your configuration.
-
-A correctly configured setup will display two environments:
-
-- **Base** – the reference schema used for comparison (e.g., production)
-- **Current** – the schema for the environment under development (e.g., staging or dev)
-
-This confirms that both the artifacts and schemas are properly connected for diffing.
-
-
-
-## Start with dbt Cloud
-
-dbt Cloud is a hosted service that provides a managed environment for running dbt projects by [dbt Labs](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features). This document provides a step-by-step guide to get started Recce with dbt Cloud.
-
-### Prerequisites
-
-Recce will compare the data models between two environments. That means you need to have two environments in your dbt Cloud project. For example, one for production and another for development.
-Also, you need to provide the credentials profile for both environments in your `profiles.yml` file to let Recce access your data warehouse.
-
-#### Suggestions for setting up dbt Cloud
-
-To integrate the dbt Cloud with Recce, we suggest to set up two run jobs in your dbt Cloud project.
-
-#### Production Run Job
-
-The production run should be the main branch of your dbt project. You can trigger the dbt Cloud job on every merge to the main branch or schedule it to run at a daily specific time.
-
-### Development Run Job
-
-The development run should be a separate branch of your dbt project. You can trigger the dbt Cloud job on every merge to the pull-request branch.
-
-### Set up dbt profiles with credentials
-
-You need to provide the credentials profile for both environments in your `profiles.yml` file. Here is an example of how your `profiles.yml` file might look like:
-
-```yaml
-dbt-example-project:
- target: dev
- outputs:
- dev:
- type: snowflake
- account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
-
- # User/password auth
- user: "{{ env_var('SNOWFLAKE_USER') | as_text }}"
- password: "{{ env_var('SNOWFLAKE_PASSWORD') | as_text }}"
-
- role: DEVELOPER
- database: cloud_database
- warehouse: LOAD_WH
- schema: "{{ env_var('SNOWFLAKE_SCHEMA') | as_text }}"
- threads: 4
- prod:
- type: snowflake
- account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
-
- # User/password auth
- user: "{{ env_var('SNOWFLAKE_USER') | as_text }}"
- password: "{{ env_var('SNOWFLAKE_PASSWORD') | as_text }}"
-
- role: DEVELOPER
- database: cloud_database
- warehouse: LOAD_WH
- schema: PUBLIC
- threads: 4
-```
-
-
-### Execute Recce with dbt Cloud
-
-To compare the data models between two environments, you need to download the dbt Cloud artifacts for both environments. The artifacts include the manifest.json file and the catalog.json file. You can download the artifacts from the dbt Cloud UI.
-
-#### Login to your dbt Cloud account
-
-
-
-#### Go to the project you want to compare
-
-
-
-#### Download the dbt artifacts
-
-Download the artifacts from the latest run of both run jobs. You can download the artifacts from the `Artifacts` tab.
-
-
-
-
-### Set up the dbt artifacts folders
-
-Extract the downloaded artifacts and keep them in a separate folder. The production artifacts should be in the `target-base` folder and the development artifacts should be in the `target` folder.
-
-```bash
-$ tree target target-base
-target
-├── catalog.json
-└── manifest.json
-target-base/
-├── catalog.json
-└── manifest.json
-```
-
-### Setup dbt project
-
-Move the `target` and `target-base` folders to the root of your dbt project.
-You should also have the `profiles.yml` file in the root of your dbt project with the credentials profile for both environments.
-
-### Launch Recce
-
-Run the command to compare the data models between the two environments.
-
-```shell
-recce server
-```
-
diff --git a/docs/2-getting-started/jaffle-shop-tutorial.md b/docs/2-getting-started/jaffle-shop-tutorial.md
new file mode 100644
index 00000000..46432fe5
--- /dev/null
+++ b/docs/2-getting-started/jaffle-shop-tutorial.md
@@ -0,0 +1,175 @@
+---
+title: Jaffle Shop Tutorial
+---
+
+# Jaffle Shop Tutorial
+
+When you change a dbt model, how do you know what data actually changed? Running your model isn't enough — you need to compare outputs against the previous version.
+
+**Goal:** Make a model change and validate the data impact using Recce with the dbt Labs example project.
+
+This tutorial uses [jaffle_shop_duckdb](https://github.com/dbt-labs/jaffle_shop_duckdb), a sample project from dbt Labs. You'll modify a model, see how the change affects downstream data, and add a validation to your checklist.
+
+## Prerequisites
+
+- [ ] Python 3.9+ installed
+- [ ] Git installed
+
+## Steps
+
+### 1. Clone Jaffle Shop
+
+```shell
+git clone git@github.com:dbt-labs/jaffle_shop_duckdb.git
+cd jaffle_shop_duckdb
+```
+
+**Expected result:** You're in the `jaffle_shop_duckdb` directory.
+
+### 2. Set up virtual environment
+
+```shell
+python -m venv venv
+source venv/bin/activate
+```
+
+**Expected result:** Your terminal prompt shows `(venv)`.
+
+### 3. Install dependencies
+
+```shell
+pip install -r requirements.txt
+pip install recce
+```
+
+**Expected result:** Both dbt and Recce install without errors.
+
+### 4. Configure DuckDB profile for comparison
+
+Recce compares two environments. Edit `./profiles.yml` to add a `prod` target for the base environment.
+
+Add the following under `outputs:`:
+
+```yaml
+ prod:
+ type: duckdb
+ path: 'jaffle_shop.duckdb'
+ schema: prod
+ threads: 24
+```
+
+Your complete `profiles.yml` should look like:
+
+```yaml
+jaffle_shop:
+ target: dev
+ outputs:
+ dev:
+ type: duckdb
+ path: 'jaffle_shop.duckdb'
+ threads: 24
+ prod:
+ type: duckdb
+ path: 'jaffle_shop.duckdb'
+ schema: prod
+ threads: 24
+```
+
+**Expected result:** `profiles.yml` has both `dev` and `prod` targets.
+
+### 5. Build base environment
+
+Generate the production data and artifacts that Recce uses as baseline.
+
+```shell
+dbt seed --target prod
+dbt run --target prod
+dbt docs generate --target prod --target-path ./target-base
+```
+
+**Expected result:** `target-base/` folder contains `manifest.json` and `catalog.json`.
+
+### 6. Make a model change
+
+Edit `./models/staging/stg_payments.sql` to introduce a data change:
+
+```diff
+renamed as (
+ payment_method,
+
+- -- `amount` is currently stored in cents, so we convert it to dollars
+- amount / 100 as amount
++ amount
+
+ from source
+)
+```
+
+This removes the cents-to-dollars conversion — downstream models will now show values 100x larger.
+
+**Expected result:** `stg_payments.sql` outputs `amount` in cents instead of dollars.
+
+### 7. Build development environment
+
+```shell
+dbt seed
+dbt run
+dbt docs generate
+```
+
+**Expected result:** `target/` folder contains updated `manifest.json` and `catalog.json`.
+
+### 8. Start Recce server
+
+```shell
+recce server
+```
+
+**Expected result:** Server starts at http://0.0.0.0:8000
+
+Open http://localhost:8000 in your browser. The Lineage tab shows `stg_payments` and downstream models highlighted.
+
+
+
+### 9. Run a Query Diff
+
+Switch to the **Query** tab and run:
+
+```sql
+select * from {{ ref("orders") }} order by 1
+```
+
+Click **Run Diff** (or press `Cmd+Shift+Enter`).
+
+**Expected result:** Query Diff shows the `amount` column with values 100x larger in the current environment.
+
+
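+If you want a single-number confirmation of the shift, you can also diff an aggregate in the same Query tab (optional):
+
+```sql
+select sum(amount) as total_amount from {{ ref("orders") }}
+```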
+
+### 10. Add to checklist
+
+Click **Add to Checklist** (blue button, bottom right) to save this validation.
+
+**Expected result:** Checklist tab shows your saved Query Diff.
+
+
+
+## Verify Success
+
+Confirm you completed the tutorial:
+
+1. Lineage Diff shows `stg_payments` and downstream models highlighted
+2. Query Diff on `orders` shows the amount change (100x difference)
+3. Checklist contains your saved validation
+
+## Troubleshooting
+
+| Issue | Solution |
+|-------|----------|
+| "No artifacts found" error | Run `dbt docs generate` for both prod (`--target-path ./target-base`) and dev |
+| Empty Lineage Diff | Ensure you made the model change in step 6 and ran `dbt run` + `dbt docs generate` |
+| DuckDB lock error | Close any other processes using `jaffle_shop.duckdb` |
+
+## Next Steps
+
+- [OSS Setup](oss-setup.md) — Set up Recce with your own dbt project
+- [Cloud vs Open Source](../1-whats-recce/cloud-vs-oss.md) — Compare OSS and Cloud features
diff --git a/docs/2-getting-started/oss-setup.md b/docs/2-getting-started/oss-setup.md
new file mode 100644
index 00000000..085713cc
--- /dev/null
+++ b/docs/2-getting-started/oss-setup.md
@@ -0,0 +1,141 @@
+---
+title: OSS Setup
+---
+
+# Set Up Recce OSS
+
+When you change data models, you need to compare the data before and after to catch unintended impacts. Recce OSS lets you run this validation locally.
+
+**Goal:** Install and run Recce locally for manual data validation.
+
+Recce OSS gives you the core validation engine to run locally. For the full experience with Recce Agent assistance on PRs and during development, see [Cloud vs Open Source](../1-whats-recce/cloud-vs-oss.md).
+
+## Prerequisites
+
+- [ ] Python 3.9+ installed
+- [ ] A dbt project with at least one model
+- [ ] Git installed (for version comparison)
+
+## Steps
+
+### 1. Install Recce
+
+Install Recce in your dbt project's virtual environment.
+
+```shell
+pip install recce
+```
+
+**Expected result:** Installation completes without errors.
+
+### 2. Generate base environment artifacts
+
+Recce compares two states of your dbt project. First, generate artifacts for your base (production) state.
+
+```shell
+git checkout main
+dbt docs generate --target-path ./target-base
+```
+
+**Expected result:** `target-base/` folder contains `manifest.json` and `catalog.json`.
+
+!!! note "Different approaches by environment"
+ - **File-based (DuckDB):** Run `dbt build` first to create data. See [Jaffle Shop Tutorial](jaffle-shop-tutorial.md).
+ - **Cloud warehouses with dbt Cloud:** Download artifacts from dbt Cloud API. See [For dbt Cloud Users](#for-dbt-cloud-users) below.
+
+### 3. Generate current environment artifacts
+
+Switch to your development branch and generate artifacts for comparison.
+
+```shell
+git checkout your-feature-branch
+dbt run
+dbt docs generate
+```
+
+**Expected result:** `target/` folder contains updated `manifest.json` and `catalog.json`.
+
+### 4. Start Recce server
+
+Launch the Recce web interface.
+
+```shell
+recce server
+```
+
+**Expected result:** Server starts and displays:
+
+```
+Recce server is running at http://0.0.0.0:8000
+```
+
+### 5. Explore changes in the UI
+
+Open http://localhost:8000 in your browser.
+
+- **Lineage tab:** See which models changed and their downstream impact
+- **Query tab:** Run SQL queries to compare data between base and current states
+
+**Expected result:** Lineage Diff shows your modified models highlighted.
+
+### 6. Add validation checks to checklist
+
+After running a query or diff:
+
+1. Review the results
+2. Click **Add to Checklist** to save the validation
+3. Repeat for each check you want to track
+
+**Expected result:** Checklist shows your saved validations.
+
+## Verify Success
+
+Run `recce server` and confirm you can:
+
+1. See Lineage Diff between base and current
+2. Run a Query Diff on a modified model
+3. Add the result to your checklist
+
+## Try It: Jaffle Shop Tutorial
+
+Want a hands-on walkthrough with DuckDB? The [Jaffle Shop Tutorial](jaffle-shop-tutorial.md) guides you through making a model change, comparing data, and validating the impact.
+
+## For dbt Cloud Users
+
+If you use dbt Cloud for CI/CD, download production artifacts instead of generating them locally.
+
+**Get artifacts from dbt Cloud API:**
+
+```shell
+# Set your dbt Cloud credentials
+export DBT_CLOUD_API_TOKEN="your-token"
+export DBT_CLOUD_ACCOUNT_ID="your-account-id"
+export DBT_CLOUD_JOB_ID="your-production-job-id"
+
+mkdir -p target-base  # ensure the folder exists before downloading into it
+
+# Download artifacts from your production job
+curl -H "Authorization: Token $DBT_CLOUD_API_TOKEN" \
+ "https://cloud.getdbt.com/api/v2/accounts/$DBT_CLOUD_ACCOUNT_ID/jobs/$DBT_CLOUD_JOB_ID/artifacts/manifest.json" \
+ -o target-base/manifest.json
+
+curl -H "Authorization: Token $DBT_CLOUD_API_TOKEN" \
+ "https://cloud.getdbt.com/api/v2/accounts/$DBT_CLOUD_ACCOUNT_ID/jobs/$DBT_CLOUD_JOB_ID/artifacts/catalog.json" \
+ -o target-base/catalog.json
+```
+
+Then generate current artifacts locally (`dbt docs generate`) and run `recce server` as usual.
+
+!!! tip "Recce Cloud automates this"
+ With Recce Cloud, the Agent retrieves artifacts automatically — no manual downloads. See [Start Free with Cloud](start-free-with-cloud.md).
+
+## Troubleshooting
+
+| Issue | Solution |
+|-------|----------|
+| "No artifacts found" error | Run `dbt docs generate` for both base and current states |
+| Empty Lineage Diff | Ensure your models differ from the base artifacts, then re-run `dbt run` and `dbt docs generate` |
+| Port 8000 already in use | Use `recce server --port 8001` to specify a different port |
+
+## Next Steps
+
+- [Cloud vs Open Source](../1-whats-recce/cloud-vs-oss.md) — Compare OSS and Cloud features
+- [Start Free with Cloud](start-free-with-cloud.md) — Get Recce Agent on your PRs and CLI
diff --git a/docs/2-getting-started/oss-vs-cloud.md b/docs/2-getting-started/oss-vs-cloud.md
deleted file mode 100644
index b88f5ba0..00000000
--- a/docs/2-getting-started/oss-vs-cloud.md
+++ /dev/null
@@ -1,52 +0,0 @@
----
-title: Choose best for you
----
-
-# Choose what's best for you
-
-Recce offers two ways to validate your data changes: Open Source and Recce Cloud. **We recommend starting with Recce Cloud** for the easiest setup and team collaboration.
-
-## Quick Comparison
-
-| | **Recce Cloud** ⭐ *Recommended* | **Open Source** |
-|---|---|---|
-| **Setup** | 30 seconds - just sign up | 5-10 minutes installation |
-| **Team Collaboration** | ✅ Real-time sharing, checklist sync | ❌ Local only |
-| **PR Integration** | ✅ Automatic PR gating | ⚠️ Manual setup required |
-| **Exclusive Features** | ✅ LLM validation insights, upcoming innovations | ❌ Core features only |
-| **Best For** | Data teams focused on validation | Initial experimentation |
-
-## Choose Recce Cloud If You...
-
-- Want the **fastest setup** and immediate results
-- Want to **focus on data validation**, not infrastructure setup
-- Want to do a **PoC** for your team to evaluate data tools
-- Need **team collaboration** and stakeholder reviews
-- Want **exclusive features** like LLM validation insights
-- Want **automatic updates** with new innovations as they're released
-- Prefer to spend time on data problems, not tool configuration
-
-👉 **[Start with Recce Cloud](start-free-with-cloud.md)**
-
-### Cloud Plans Overview
-
-- **Free Plan**: All core validation features with zero setup effort
-- **Team Plan**: Advanced collaboration + LLM validation insights + automated workflows
-- **Enterprise Plan**: Full governance + BYOC + SSO + dedicated support
-
-View [pricing](https://reccehq.com/pricing) to learn more
-
-## Choose Open Source If You...
-
-- Want to **experiment locally** before team adoption
-- Need to **explore Recce** without creating accounts first
-- Want to **understand the basics** before moving to Cloud
-
-👉 **[Open Source Setup](installation.md)**
-
-
-## What's Next
-
-**New to Recce?** We recommend **[Cloud](start-free-with-cloud.md)** for the smoothest experience. You can validate data changes and collaborate with your team within minutes.
-
-**Already technical?** **Cloud is still the smart choice.** Why spend time on infrastructure when you could focus on data validation? Plus you get exclusive features like LLM insights and automatic updates with new innovations.
\ No newline at end of file
diff --git a/docs/7-cicd/setup-cd.md b/docs/2-getting-started/setup-cd.md
similarity index 78%
rename from docs/7-cicd/setup-cd.md
rename to docs/2-getting-started/setup-cd.md
index 075d0d55..a24756d7 100644
--- a/docs/7-cicd/setup-cd.md
+++ b/docs/2-getting-started/setup-cd.md
@@ -1,10 +1,18 @@
---
title: Setup CD
+description: >-
+ Automate baseline updates for Recce Cloud with a continuous deployment workflow.
+ Keep your dbt validation baseline current after every merge to main.
---
+!!! tip "Following the onboarding guide?"
+ Return to [Get Started with Recce Cloud](start-free-with-cloud.md#3-add-recce-to-cicd) after completing this page.
+
# Setup CD - Auto-Update Baseline
-Set up automatic updates for your Recce Cloud base sessions. Keep your data comparison baseline current every time you merge to main, with no manual work required.
+Manually updating your Recce Cloud baseline after every merge is tedious and error-prone. This guide shows you how to automate baseline updates so your data comparison stays current without manual intervention.
+
+After completing this guide, your continuous deployment (CD) workflow automatically uploads dbt artifacts to Cloud whenever code merges to main.
## What This Does
@@ -12,15 +20,22 @@ Set up automatic updates for your Recce Cloud base sessions. Keep your data comp
- **Triggers**: Merge to main + scheduled updates + manual runs
- **Action**: Auto-update base Recce session with latest production artifacts
-- **Benefit**: Current comparison baseline for all future PRs/MRs
+- **Benefit**: Current comparison baseline for all future PRs
## Prerequisites
Before setting up CD, ensure you have:
-- [x] **Recce Cloud account** - [Start free trial](https://cloud.reccehq.com/)
-- [x] **Repository connected** to Recce Cloud - [Connect Git Provider](../2-getting-started/start-free-with-cloud.md#2-connect-git-provider)
-- [x] **dbt artifacts** - Know how to generate `manifest.json` and `catalog.json` from your dbt project
+- [ ] **Cloud account** - [Start free trial](https://cloud.reccehq.com/)
+- [ ] **Repository connected** to Cloud - [Connect Git Provider](start-free-with-cloud.md#2-connect-git-provider)
+- [ ] **dbt artifacts** - Know how to generate `manifest.json` and `catalog.json` from your dbt project
+- [ ] **Environment configured** - [Environment Setup](environment-setup.md) with `prod` target for base artifacts
+
+## Environment strategy
+
+This workflow uses the **main branch** with the `prod` target as the base environment. The base artifacts represent your production state, which PRs compare against.
+
+See [Environment Setup](environment-setup.md) for profiles.yml configuration.
## Setup
@@ -85,8 +100,8 @@ jobs:
**Key points:**
- `dbt build` and `dbt docs generate` create the required artifacts (`manifest.json` and `catalog.json`)
-- `recce-cloud upload --type prod` uploads the Base metadata to Recce Cloud
-- [`GITHUB_TOKEN`](https://docs.github.com/en/actions/concepts/security/github_token) authenticates with Recce Cloud
+- `recce-cloud upload --type prod` uploads the Base metadata to Cloud
+- [`GITHUB_TOKEN`](https://docs.github.com/en/actions/concepts/security/github_token) authenticates with Cloud
### GitLab CI/CD
@@ -153,7 +168,7 @@ recce-upload-prod:
**GitHub:**
-1. Go to **Actions** tab → Select "Update Base Recce Session"
+1. Go to **Actions** tab → Select "Update Base Metadata"
2. Click **Run workflow** → Monitor for completion
**GitLab:**
@@ -165,16 +180,16 @@ recce-upload-prod:
Look for these indicators:
-- [x] **Workflow/Pipeline completes** without errors
-- [x] **Base session updated** in [Recce Cloud](https://cloud.reccehq.com)
+- [ ] **Workflow/Pipeline completes** without errors
+- [ ] **Base session updated** in [Cloud](https://cloud.reccehq.com)
**GitHub:**
-{: .shadow}
+{: .shadow}
**GitLab:**
-{: .shadow}
+{: .shadow}
### Expected Output
@@ -290,12 +305,12 @@ prod-build:
**Solutions**:
-1. Verify your repository is connected in [Recce Cloud settings](https://cloud.reccehq.com/settings)
+1. Verify your repository is connected in [Cloud settings](https://cloud.reccehq.com/settings)
2. **For GitHub**: Ensure `GITHUB_TOKEN` is passed explicitly to the upload step and the job has `contents: read` permission
3. **For GitLab**: Verify project has GitLab integration configured
- - Check that you've created a [Personal Access Token](../2-getting-started/gitlab-pat-guide.md)
+ - Check that you've created a [Personal Access Token](gitlab-pat-guide.md)
- Ensure the token has appropriate scope (`api` or `read_api`)
- - Verify the project is connected in Recce Cloud settings
+ - Verify the project is connected in Cloud settings
### Upload failures
@@ -303,7 +318,7 @@ prod-build:
**Solutions**:
-1. Check network connectivity to Recce Cloud
+1. Check network connectivity to Cloud
2. Verify artifact files exist in `target/` directory
3. Review workflow/pipeline logs for detailed error messages
4. **For GitLab**: Ensure artifacts are passed between jobs:
@@ -321,14 +336,14 @@ prod-build:
### Session not appearing
-**Issue**: Upload succeeds but session doesn't appear in Recce Cloud
+**Issue**: Upload succeeds but session doesn't appear in Cloud
**Solutions**:
-1. Check you're viewing the correct repository in Recce Cloud
+1. Check you're viewing the correct repository in Cloud
2. Verify you're looking at the production/base sessions (not PR/MR sessions)
-3. Check session filters in Recce Cloud (may be hidden by filters)
-4. Refresh the Recce Cloud page
+3. Check session filters in Cloud (may be hidden by filters)
+4. Refresh the Cloud page
### Schedule not triggering (GitLab only)
@@ -344,4 +359,4 @@ prod-build:
## Next Steps
-**[Setup CI](./setup-ci.md)** to automatically validate PR/MR changes against your updated base session. This completes your CI/CD pipeline by adding automated data validation for every pull request or merge request.
+**[Setup CI](setup-ci.md)** to automatically validate PR/MR changes against your updated base session. This completes your CI/CD pipeline by adding automated data validation for every pull request or merge request.
diff --git a/docs/7-cicd/setup-ci.md b/docs/2-getting-started/setup-ci.md
similarity index 73%
rename from docs/7-cicd/setup-ci.md
rename to docs/2-getting-started/setup-ci.md
index 9310ee03..49175985 100644
--- a/docs/7-cicd/setup-ci.md
+++ b/docs/2-getting-started/setup-ci.md
@@ -1,27 +1,42 @@
---
title: Setup CI
+description: >-
+ Set up continuous integration to automatically validate dbt data changes on
+ every pull request. Prevent data quality regressions before merge.
---
-# Setup CI - Auto-Validate PRs/MRs
+!!! tip "Following the onboarding guide?"
+ Return to [Get Started with Recce Cloud](start-free-with-cloud.md#3-add-recce-to-cicd) after completing this page.
-Automatically validate your data changes in every pull request or merge request using Recce Cloud. Catch data issues before they reach production, with validation results right in your PR/MR.
+# Setup CI - Auto-Validate PRs
+
+Manual data validation before merging is error-prone and slows down PR reviews. This guide shows you how to set up continuous integration (CI) that automatically validates data changes in every pull request (PR).
+
+After completing this guide, your CI workflow validates every PR against your production baseline, with results appearing in Recce Cloud.
## What This Does
-**Automated PR/MR Validation** prevents data regressions before merge:
+**Automated PR Validation** prevents data regressions before merge:
-- **Triggers**: PR/MR opened or updated against main
+- **Triggers**: PR opened or updated against main
- **Action**: Auto-update Recce session for validation
-- **Benefit**: Automated data validation and comparison visible in your PR/MR
+- **Benefit**: Automated data validation and comparison visible in your PR
## Prerequisites
Before setting up CI, ensure you have:
-- [x] **Recce Cloud account** - [Start free trial](https://cloud.reccehq.com/)
-- [x] **Repository connected** to Recce Cloud - [Connect Git Provider](../2-getting-started/start-free-with-cloud.md#2-connect-git-provider)
-- [x] **dbt artifacts** - Know how to generate `manifest.json` and `catalog.json` from your dbt project
-- [x] **CD configured** - [Setup CD](./setup-cd.md) to establish baseline for comparisons
+- [ ] **Cloud account** - [Start free trial](https://cloud.reccehq.com/)
+- [ ] **Repository connected** to Cloud - [Connect Git Provider](start-free-with-cloud.md#2-connect-git-provider)
+- [ ] **dbt artifacts** - Know how to generate `manifest.json` and `catalog.json` from your dbt project
+- [ ] **CD configured** - [Setup CD](setup-cd.md) to establish baseline for comparisons
+- [ ] **Environment configured** - [Environment Setup](environment-setup.md) with `ci` target for per-PR schemas
+
+## Environment strategy
+
+This workflow uses **per-PR schemas** with the `ci` target as the current environment. Each PR gets an isolated schema (e.g., `PR_123`) that is compared against the base artifacts from CD.
+
+See [Environment Setup](environment-setup.md) for the `profiles.yml` configuration and why per-PR schemas are recommended.
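+
+As a sketch, the `ci` target in `profiles.yml` reads its schema from an environment variable that CI sets per PR (Snowflake shown, mirroring the earlier sample; adapt field values to your warehouse):
+
+```yaml
+ci:
+  type: snowflake
+  account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
+  user: "{{ env_var('SNOWFLAKE_USER') }}"
+  password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
+  database: cloud_database
+  warehouse: LOAD_WH
+  # CI exports SNOWFLAKE_SCHEMA per PR, e.g. PR_123
+  schema: "{{ env_var('SNOWFLAKE_SCHEMA') }}"
+  threads: 4
+```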
## Setup
@@ -89,7 +104,7 @@ jobs:
- Creates a per-PR schema (`PR_123`, `PR_456`, etc.) using the dynamic `SNOWFLAKE_SCHEMA` environment variable to isolate each PR's data
- `dbt build` and `dbt docs generate` create the required artifacts (`manifest.json` and `catalog.json`)
- `recce-cloud upload` (without `--type`) auto-detects this is a PR session
-- [`GITHUB_TOKEN`](https://docs.github.com/en/actions/concepts/security/github_token) authenticates with Recce Cloud
+- [`GITHUB_TOKEN`](https://docs.github.com/en/actions/concepts/security/github_token) authenticates with Cloud
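+
+In shell terms, the workflow's core steps boil down to the following (a sketch assuming a `ci` target in `profiles.yml` and warehouse credentials in the environment; CI derives the schema name from the PR number):
+
+```shell
+export SNOWFLAKE_SCHEMA="PR_123"   # set from the PR number by CI
+dbt deps
+dbt build --target ci
+dbt docs generate --target ci      # writes manifest.json and catalog.json
+pip install recce-cloud
+recce-cloud upload                 # no --type: auto-detects a PR session
+```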
### GitLab CI/CD
@@ -150,7 +165,7 @@ recce-upload:
## Verification
-### Test with a PR/MR
+### Test with a PR
**GitHub:**
@@ -168,17 +183,17 @@ recce-upload:
Look for these indicators:
-- [x] **Workflow/Pipeline completes** without errors
-- [x] **PR/MR session created** in [Recce Cloud](https://cloud.reccehq.com)
-- [x] **Session URL** appears in workflow/pipeline output
+- [ ] **Workflow/Pipeline completes** without errors
+- [ ] **PR session created** in [Cloud](https://cloud.reccehq.com)
+- [ ] **Session URL** appears in workflow/pipeline output
**GitHub:**
-{: .shadow}
+{: .shadow}
**GitLab:**
-{: .shadow}
+{: .shadow}
### Expected Output
@@ -232,12 +247,12 @@ Artifacts from: "/builds/your-org/your-project/target"
Change request: https://gitlab.com/your-org/your-project/-/merge_requests/4
```
-### Review PR/MR Session
+### Review PR Session
To analyze the changes in detail:
-1. Go to your [Recce Cloud](https://cloud.reccehq.com)
-2. Find the PR/MR session that was created
+1. Go to your [Cloud](https://cloud.reccehq.com)
+2. Find the PR session that was created
3. Launch Recce instance to explore data differences
## Advanced Options
@@ -269,18 +284,17 @@ If CI is not working, the issue is likely in your CD setup. Most problems are sh
- Upload errors
- Sessions not appearing
-**→ See the [Setup CD Troubleshooting section](./setup-cd.md#troubleshooting)** for detailed solutions.
+**→ See the [Setup CD Troubleshooting section](setup-cd.md#troubleshooting)** for detailed solutions.
**CI-specific tip:** If CD works but CI doesn't, verify:
-1. PR/MR trigger conditions in your workflow configuration
-2. The PR/MR is targeting the correct base branch (usually `main`)
-3. You're looking at PR/MR sessions in Recce Cloud (not production sessions)
+1. PR trigger conditions in your workflow configuration
+2. The PR is targeting the correct base branch (usually `main`)
+3. You're looking at PR sessions in Cloud (not production sessions)
## Next Steps
-After setting up CI, explore these workflow guides:
+After setting up CI, explore these guides:
-- [PR/MR review workflow](./scenario-pr-review.md) - Collaborate with teammates using Recce
-- [Preset checks](./preset-checks.md) - Configure automatic validation checks
-- [Best practices](./best-practices-prep-env.md) - Environment preparation tips
+- [Environment Best Practices](environment-best-practices.md) - Strategies for source data and schema management
+- [Get Started with Cloud](start-free-with-cloud.md) - Complete onboarding guide
diff --git a/docs/2-getting-started/start-free-with-cloud.md b/docs/2-getting-started/start-free-with-cloud.md
index 4edc0d04..36ecb712 100644
--- a/docs/2-getting-started/start-free-with-cloud.md
+++ b/docs/2-getting-started/start-free-with-cloud.md
@@ -1,45 +1,34 @@
---
title: Get Started with Recce Cloud
+description: >-
+ Set up Recce Cloud to automate dbt validation on pull requests. Follow this
+ tutorial to enable automated data review for your team.
---
# Get Started with Recce Cloud
-This tutorial helps analytics engineers and data engineers set up Recce Cloud to automate data review on pull requests.
+Set up Cloud to automate data review on every pull request. This guide walks you through each onboarding step.
-[**Get Started**](https://cloud.reccehq.com/onboarding/get-started)
+[**Get Started**](https://cloud.reccehq.com/onboarding/get-started)
## Goal
-Reviewing data changes in PRs is error-prone without visibility into downstream impact. After setup, the Recce agent reviews your data changes on every PR—showing what changed and what it affects.
+Recce compares **Base** vs **Current** environments to validate data changes in every PR:
-To validate changes, Recce compares **Base** vs **Current** environments:
+- **Base**: your main branch (production)
+- **Current**: your PR branch (development)
+- **Per-PR schema**: an isolated database schema created for each pull request, so multiple PRs can validate simultaneously without conflicts
-- **Base**: models in the main branch (production)
-- **Current**: models in the PR branch
-
-Recce requires dbt artifacts from both environments. This guide covers:
-
-- dbt profile configuration for Base and Current
-- CI/CD workflow setup
-
-For accurate comparisons, both environments should use consistent data ranges. See [Best Practices for Preparing Environments](../7-cicd/best-practices-prep-env.md) for environment strategies.
-
-This guide uses Snowflake, GitHub, and GitHub Actions examples, but can be adapted to your configuration. The setup assumes:
-
-- Production runs a full daily refresh
-- No pre-configured per-PR environments exist yet
-- Each developer has their own dev environment for local work
-
-We'll configure CI to create isolated, per-PR schemas automatically.
+For accurate comparisons, both environments should use consistent data ranges. See [Best Practices for Preparing Environments](environment-best-practices.md) for environment strategies.
## Prerequisites
-- [x] **Recce Cloud account**: free trial at [cloud.reccehq.com](https://cloud.reccehq.com)
-- [x] **dbt project in a git repository that runs successfully:** your environment can execute `dbt build` and `dbt docs generate`
-- [x] **Repository admin access for setup**: required to add workflows and secrets
-- [x] **Data warehouse**: read access to your warehouse for data diffing
+- [ ] **Cloud account**: free trial at [cloud.reccehq.com](https://cloud.reccehq.com)
+- [ ] **dbt project in a git repository that runs successfully:** your environment can execute `dbt build` and `dbt docs generate`
+- [ ] **Repository admin access for setup**: required to add workflows and secrets
+- [ ] **Data warehouse**: read access to your warehouse for data diffing
## Onboarding Process Overview
@@ -50,7 +39,7 @@ After signing up, you'll enter the onboarding flow:
3. Add Recce to CI/CD
4. Merge the CI/CD change
-## Recce Web Agent Setup [Experimental]
+## Recce Web Agent Setup
You can use the Recce Web Agent to help automate your setup. Currently it handles **step 3** (Add Recce to CI/CD):
@@ -60,8 +49,6 @@ You can use the Recce Web Agent to help automate your setup. Currently it handle
The agent covers common setups and continues to expand coverage. If your setup isn't supported yet, the agent directs you to the Setup Guide below for manual configuration. Need help? Contact us at support@reccehq.com.
-**Coming soon**: The agent will guide you through steps 1–3, including warehouse connection, Git connection, and CI/CD configuration.
-
---
## Setup Guide
@@ -72,234 +59,71 @@ First, go to [cloud.reccehq.com](https://cloud.reccehq.com) and create your free
### 1. Connect Data Warehouse
-1. Select your data warehouse (e.g. Snowflake)
-2. Provide your read-only warehouse credentials
+Provide read-only credentials so Recce can run data diffs against your warehouse.
-> **Note**: This guide uses Snowflake. For supported warehouses, see [Connect to Warehouse](../5-data-diffing/connect-to-warehouse.md).
+**[Connect Data Warehouse](connect-to-warehouse.md)**
### 2. Connect Git Provider
-1. Click **Connect GitHub**
-2. Authorize the Recce app installation
-3. Select the repositories you want to connect
+Authorize the Recce app and select the repositories you want to connect.
-> **Note**: This guide uses GitHub. For GitLab setup, see [GitLab Personal Access Token](gitlab-pat-guide.md).
+**[Connect Your Repository](connect-git.md)**
### 3. Add Recce to CI/CD
-This step adds CI/CD workflow files to your repository. The agent creates these automatically. For manual setup, create and merge a PR with the templates below.
-
-> **Note**: This guide uses GitHub Actions. For other CI/CD platforms, see [Setup CD](../7-cicd/setup-cd.md) and [Setup CI](../7-cicd/setup-ci.md).
-
-#### Set Up Profile.yml
-
-The profile.yml file tells your system where to look for the "base" and "current" builds. We have a sample `profile.yml` file:
-
-```yaml
-:
- target: dev
- outputs:
- dev:
- type: snowflake
- account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
- user: "{{ env_var('SNOWFLAKE_USER') | as_text }}"
- password: "{{ env_var('SNOWFLAKE_PASSWORD') | as_text }}"
- role: DEVELOPER
- database: cloud_database
- warehouse: LOAD_WH
- schema: "{{ env_var('SNOWFLAKE_SCHEMA') | as_text }}"
- threads: 4
-
- ## Add a new target for CI
- ci:
- type: snowflake
- account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
- user: "{{ env_var('SNOWFLAKE_USER') | as_text }}"
- password: "{{ env_var('SNOWFLAKE_PASSWORD') | as_text }}"
- role: DEVELOPER
- database: cloud_database
- warehouse: LOAD_WH
- schema: "{{ env_var('SNOWFLAKE_SCHEMA') | as_text }}"
- threads: 4
-
- prod:
- type: snowflake
- account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
- user: "{{ env_var('SNOWFLAKE_USER') | as_text }}"
- password: "{{ env_var('SNOWFLAKE_PASSWORD') | as_text }}"
- role: DEVELOPER
- database: cloud_database
- warehouse: LOAD_WH
- schema: PUBLIC
- threads: 4
-```
-
-In this sample:
-
-1. **Base** uses the `prod` target pointing to the `PUBLIC` schema (your production data)
-2. **Current** uses the `ci` target with a dynamic schema via `env_var('SNOWFLAKE_SCHEMA')`
-
-The `ci` target uses an environment variable for the schema name. In `pr-workflow.yml` below, we set `SNOWFLAKE_SCHEMA: "PR_${{ github.event.pull_request.number }}"` to create isolated environments per PR (e.g., `PR_123`, `PR_456`). This isolates each PR's data so multiple PRs can run without conflicts.
-
-> NOTE: Ensure your data warehouse allows creating schemas dynamically. The CI runner needs write permissions to create PR-specific schemas (e.g., `PR_123`).
-
-#### About Secrets
-
-The workflows use two types of secrets:
-
-- **`GITHUB_TOKEN`**: automatically provided by GitHub Actions, no configuration needed. This is used by the GitHub integration you just set up to connect the results of the call to Recce.
-- **Warehouse credentials**: your existing secrets for dbt (e.g., `SNOWFLAKE_ACCOUNT`, `SNOWFLAKE_USER`, `SNOWFLAKE_PASSWORD`). If your dbt project already runs in CI, you have these configured.
-
-#### Set Up Base Metadata Updates
-
-The Base environment should reflect the dbt configuration in the main branch. Example workflow file: `base-workflow.yml`
-
-```yaml
-name: Update Base Metadata
-on:
- push:
- branches: ["main"]
- schedule:
- - cron: "0 2 * * *"
- workflow_dispatch:
-
-concurrency:
- group: ${{ github.workflow }}
- cancel-in-progress: true
-
-jobs:
- update-base-session:
- runs-on: ubuntu-latest
- timeout-minutes: 30
- permissions:
- contents: read
- steps:
- - name: Checkout code
- uses: actions/checkout@v4
-
- - name: Setup Python
- uses: actions/setup-python@v5
- with:
- python-version: "3.11"
- cache: "pip"
-
- - name: Install dependencies
- run: pip install -r requirements.txt
-
- - name: Prepare dbt artifacts
- run: |
- dbt deps
- dbt build --target prod
- dbt docs generate --target prod
- env:
- DBT_ENV_SECRET_KEY: ${{ secrets.DBT_ENV_SECRET_KEY }}
- SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
- SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
- SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
- SNOWFLAKE_DATABASE: ${{ secrets.SNOWFLAKE_DATABASE }}
- SNOWFLAKE_WAREHOUSE: ${{ secrets.SNOWFLAKE_WAREHOUSE }}
-
- ## Add this part
- - name: Upload to Recce Cloud
- run: |
- pip install recce-cloud
- recce-cloud upload --type prod
- env:
- GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-```
-
-This sample workflow:
-
-- Runs once a day
-- Installs Python 3.11 and the contents of `requirements.txt`, and recce-cloud
-- **Calls `dbt docs generate`** to generate artifacts
-- **Calls `recce-cloud upload --type prod`** to upload the Base metadata, using `GITHUB_TOKEN` for authentication
-
-To integrate into your own configuration, ensure your workflow includes the bolded steps.
-
-#### Set Up Current Metadata Updates
-
-The Current environment should reflect the dbt configuration in the PR branch. Recce provides an example workflow file: `pr-workflow.yml`
-
-```yaml
-name: Validate PR Changes
-on:
- pull_request:
- branches: ["main"]
-
-concurrency:
- group: ${{ github.workflow }}-${{ github.ref }}
- cancel-in-progress: true
-
-jobs:
- validate-changes:
- runs-on: ubuntu-latest
- timeout-minutes: 45
- permissions:
- contents: read
- pull-requests: write
- steps:
- - name: Checkout PR branch
- uses: actions/checkout@v4
- with:
- fetch-depth: 2
-
- - name: Setup Python
- uses: actions/setup-python@v5
- with:
- python-version: "3.11"
- cache: "pip"
-
- - name: Install dependencies
- run: pip install -r requirements.txt
-
- - name: Build current branch artifacts
- run: |
- dbt deps
- dbt build --target ci
- dbt docs generate --target ci
- env:
- DBT_ENV_SECRET_KEY: ${{ secrets.DBT_ENV_SECRET_KEY }}
- SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
- SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
- SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
- SNOWFLAKE_DATABASE: ${{ secrets.SNOWFLAKE_DATABASE }}
- SNOWFLAKE_WAREHOUSE: ${{ secrets.SNOWFLAKE_WAREHOUSE }}
- SNOWFLAKE_SCHEMA: "PR_${{ github.event.pull_request.number }}"
-
- - name: Upload to Recce Cloud
- run: |
- pip install recce-cloud
- recce-cloud upload
- env:
- GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-```
-
-This sample workflow:
-
-- Runs on every PR targeting main
-- Installs Python 3.11, dependencies from `requirements.txt`, and recce-cloud
-- **Creates a per-PR schema** (`PR_123`, `PR_456`, etc.) using the dynamic `SNOWFLAKE_SCHEMA` environment variable—this isolates each PR's data so multiple PRs can run simultaneously without conflicts
-- **Calls `dbt docs generate --target ci`** to generate artifacts for the PR branch
-- **Calls `recce-cloud upload`** to upload the Current metadata, using `GITHUB_TOKEN` for authentication
-
-To integrate into your own configuration, ensure your workflow includes the bolded steps.
+This step adds CI/CD workflow files to your repository. The web agent detects your setup and guides you through the process. For manual setup, follow the linked guides below.
+
+#### Choose your setup
+
+| Question | If this is you... | Then... |
+|----------|-------------------|---------|
+| **How do you run dbt?** | You own your dbt run (GitHub Actions, GitLab CI, CircleCI) | Continue reading below |
+| | You run dbt on a platform (dbt Cloud, Paradime, etc.) | See [dbt Cloud Setup](dbt-cloud-setup.md) |
+| **How complex is your environment?** | Simple (prod and dev targets) | Continue reading below. We use per-PR schemas for fast setup. See [Environment Setup](environment-setup.md) for why. |
+| | Advanced (multiple schemas, staging environments) | See [Environment Setup](environment-setup.md) |
+| **What's your CI/CD platform?** | GitHub Actions | Continue reading below |
+| | Other (GitLab CI, CircleCI, etc.) | See [Setup CD](setup-cd.md) and [Setup CI](setup-ci.md) |
+
+Configure in this order: profile, then CD, then CI. CD establishes the production baseline that CI compares against.
+
+**a. Configure your dbt profile**
+
+Add `ci` and `prod` targets to your `profiles.yml` so Recce can compare base and current environments.
+
+**[Environment Setup](environment-setup.md)**
+
+**b. Set up baseline updates (CD)**
+
+Add a workflow that uploads production artifacts to Cloud after every merge to main.
+
+**[Setup CD](setup-cd.md)**
+
+**c. Set up PR validation (CI)**
+
+Add a workflow that uploads PR branch artifacts so Recce can validate changes before merge.
+
+**[Setup CI](setup-ci.md)**
+
+Your workflows use `GITHUB_TOKEN` (automatically provided by GitHub Actions) and your existing warehouse credential secrets.
+
+!!! note "recce vs recce-cloud"
+ `pip install recce` is the open source CLI for local validation. `pip install recce-cloud` is the CI/CD uploader for Cloud.
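+
+For example, the CD and CI workflows each finish with the uploader (illustrative: `--type prod` marks the base session, while omitting `--type` lets a PR session be auto-detected):
+
+```shell
+pip install recce-cloud
+recce-cloud upload --type prod   # CD: upload the production baseline
+recce-cloud upload               # CI: upload the PR branch artifacts
+```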
### 4. Merge the CI/CD change
Merge the PR containing the workflow files. After merging:
-- The **Base workflow** automatically uploads your Base to Recce Cloud
+- The **Base workflow** automatically uploads your Base to Cloud
- The **Current workflow** is ready to validate future PRs
-In Recce Cloud, verify you see:
+In Cloud, verify you see:
- GitHub Integration: Connected
- Warehouse Connection: Connected
- Production Metadata: Updated automatically
- PR Sessions: all open PRs appear in the list. Only PRs with uploaded metadata can be launched for review.
-{: .shadow}
+{: .shadow}
### 5. Final Steps
@@ -313,24 +137,24 @@ You can now:
## Verification Checklist
-- [x] **Base workflow**: Trigger manually, check Base metadata appears in Recce Cloud
-- [x] **Current workflow**: Create a test PR, verify PR session appears
-- [x] **Data diff**: Open PR session, run Row Count Diff
+- [ ] **Base workflow**: Trigger manually, check Base metadata appears in Cloud
+- [ ] **Current workflow**: Create a test PR, verify PR session appears
+- [ ] **Data diff**: Open PR session, run Row Count Diff
## Troubleshooting
| Issue | Solution |
| --- | --- |
-| Authentication errors | Confirm repository is connected in Recce Cloud settings |
+| Authentication errors | Confirm repository is connected in Cloud settings |
| Push to main blocked | Check branch protection rules |
| Secret names don't match | Update template to use your existing secret names |
| Workflow fails | Check secrets are configured correctly |
| Artifacts missing | Ensure `dbt docs generate` completes before upload |
| Warehouse connection fails | Check IP whitelisting; add GitHub Actions IP ranges |
-## Related Resources
+## Next Steps
-- [CI/CD Getting Started](../7-cicd/ci-cd-getting-started.md)
-- [Setup CD](../7-cicd/setup-cd.md)
-- [Setup CI](../7-cicd/setup-ci.md)
-- [Best Practices for Preparing Environments](../7-cicd/best-practices-prep-env.md)
+- [Environment Setup](environment-setup.md) - Configure dbt profiles and CI/CD variables
+- [Setup CD](setup-cd.md) - Detailed CD workflow guide
+- [Setup CI](setup-ci.md) - Detailed CI workflow guide
+- [Environment Best Practices](environment-best-practices.md) - Strategies for source data and schema management
diff --git a/docs/3-using-recce/admin-setup.md b/docs/3-using-recce/admin-setup.md
new file mode 100644
index 00000000..a11d72c5
--- /dev/null
+++ b/docs/3-using-recce/admin-setup.md
@@ -0,0 +1,119 @@
+---
+title: Admin Setup
+---
+
+# Set Up Your Organization
+
+After connecting your Git repo to Recce Cloud, you need to configure your organization so your team can collaborate on PR validation.
+
+**Goal:** Configure your Recce Cloud organization for team collaboration.
+
+When you sign up for Recce Cloud, you get one organization and one project. After connecting to Git, your organization and project names automatically map to your Git provider's names. You can rename them and invite team members.
+
+## Prerequisites
+
+- [x] Recce Cloud account with owner/admin access
+- [x] Git repository connected to Recce Cloud
+- [x] Team members' email addresses
+
+## Steps
+
+### 1. Access organization settings
+
+Navigate to your organization configuration.
+
+1. Log in to [Recce Cloud](https://cloud.reccehq.com)
+2. Click **Settings** → **Organization** in the side panel
+
+**Expected result:** Organization settings page displays your current organization.
+
+{: .shadow}
+
+### 2. Rename your organization (optional)
+
+Update the organization name to match your company or team.
+
+1. In Organization Settings, find the **Organization Name** field
+2. Enter your preferred name
+3. Click **Save**
+
+**Expected result:** Organization name updates across all Recce Cloud pages.
+
+### 3. Set up additional projects (monorepo)
+
+!!! note "For monorepo users"
+ If your repository contains multiple dbt projects, set up additional projects before inviting team members. Skip this step if you have a single dbt project.
+
+1. In Organization Settings, navigate to **Projects**
+2. Click **Add Project**
+3. Enter the project name and select the subdirectory path
+4. Click **Create**
+
+**Expected result:** New project appears in the project list and sidebar.
+
+### 4. Rename your project (optional)
+
+Update the project name if needed.
+
+1. In Organization Settings, navigate to **Projects**
+2. Click on the project you want to rename
+3. Enter the new project name
+4. Click **Save**
+
+**Expected result:** Project name updates in the sidebar and project list.
+
+### 5. Invite team members
+
+Add collaborators to your organization.
+
+1. In Organization Settings, find the **Members** section
+2. Click **Invite Members**
+3. Enter email addresses (use SSO email if members use SSO login)
+4. Select a role for each invitee
+5. Click **Send Invitation**
+6. Tell invitees: when they log in, a modal appears asking them to accept the invitation. See [For Invited Users](#for-invited-users).
+
+| Role | Permissions |
+|------|-------------|
+| **Owner** | The user who created the organization. Full organization management: update info, manage roles, remove members |
+| **Admin** | Same permissions as Owner |
+| **Member** | Upload metadata, launch Recce instances, view organization info |
+
+!!! tip "SSO login requires Team plan or above"
+ SSO login is available on the Team plan and above. See [Pricing](https://www.reccehq.com/pricing) for plan details.
+
+**Expected result:** Invitees receive email invitations and see notifications when logged in.
+
+{: .shadow}
+
+## Verify Success
+
+Confirm your setup by checking:
+
+1. Organization name displays correctly in the sidebar
+2. Invited members appear in the Members list (pending or active)
+3. All projects are listed under Settings → Projects
+
+## Troubleshooting
+
+| Issue | Solution |
+|-------|----------|
+| Invitation not received | Check spam folder; verify email address matches SSO provider |
+| Member sees their own org, not company org | They may have signed up with a different email than the one you invited; ask them to log in with the invited email |
+| Cannot change organization name | Confirm you have Admin role |
+| Project not appearing | Refresh the page; verify the subdirectory path is correct |
+
+## For Invited Users
+
+When you receive an invitation:
+
+1. **Immediate response:** A notification modal appears on login — accept or decline directly
+2. **Later:** Navigate to **Settings** → **Organization** to view pending invitations
+
+{: .shadow}
+
diff --git a/docs/3-using-recce/data-developer.md b/docs/3-using-recce/data-developer.md
new file mode 100644
index 00000000..c0e13db4
--- /dev/null
+++ b/docs/3-using-recce/data-developer.md
@@ -0,0 +1,162 @@
+---
+title: Data Developer Workflow
+---
+
+# Data Developer Workflow
+
+Validate data changes throughout your development lifecycle. This guide covers validating changes before creating a PR (dev sessions) and iterating on feedback after your PR is open.
+
+**Goal:** Validate data changes at every stage of development, from local work through PR merge.
+
+## Prerequisites
+
+- [x] Recce Cloud account
+- [x] dbt project with CI/CD configured for Recce
+- [x] Access to your data warehouse
+
+## Development Stages
+
+### Before PR: Dev Sessions
+
+Validate changes locally before pushing to remote. Dev sessions let you run Recce validation without creating a PR.
+
+#### Upload via Web UI
+
+1. Go to [Recce Cloud](https://cloud.reccehq.com)
+2. Navigate to your project
+3. Click **New Dev Session**
+4. Upload your dbt artifacts:
+ - `target/manifest.json`
+ - `target/catalog.json`
+5. Select your base environment for comparison
+
+**Expected result:** Dev session opens with lineage diff showing your changes.
+
+#### Upload via CLI
+
+Run from your dbt project directory:
+
+```bash
+recce-cloud upload --type dev
+```
+
+This uploads your current `target/` artifacts and creates a dev session.
+
+**Required files:**
+
+| File | Location | Generated by |
+|------|----------|--------------|
+| `manifest.json` | `target/` | `dbt run`, `dbt build`, or `dbt compile` |
+| `catalog.json` | `target/` | `dbt docs generate` |
+
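+Putting the table together, a typical dev-session flow from your dbt project directory (assuming `recce-cloud` is installed and your warehouse credentials are configured):
+
+```shell
+dbt build                 # writes target/manifest.json
+dbt docs generate         # writes target/catalog.json
+recce-cloud upload --type dev
+```
+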
+#### When to Use Dev Sessions
+
+- Testing changes before committing
+- Validating complex refactoring locally
+- Exploring impact without creating a PR
+- Sharing work-in-progress with teammates
+
+### After PR: CI/CD Validation
+
+Once you push changes and open a PR, the Recce Agent validates automatically.
+
+#### What Happens
+
+1. Your CI pipeline runs `recce-cloud upload`
+2. The agent compares your PR branch against the base branch
+3. The agent runs validation checks based on detected changes
+4. A data review summary posts to your PR
+
+#### Understanding the Agent Summary
+
+The summary includes:
+
+- **Change overview** - Which models changed and how
+- **Impact analysis** - Downstream models affected
+- **Validation results** - Schema diffs, row counts, and other checks
+- **Recommendations** - Suggested actions for review
+
+#### Fixing Issues
+
+When the agent identifies issues:
+
+1. Review the validation results in the PR comment
+2. Click **Launch Recce** to explore details in the web UI
+3. Identify the root cause using lineage and data diffs
+4. Make fixes in your branch
+5. Push changes - the agent re-validates automatically
+
+#### Iterating Until Checks Pass
+
+Each push triggers a new validation cycle:
+
+1. Agent re-analyzes your changes
+2. New validation results post to the PR
+3. Previous results are updated (not duplicated)
+4. Continue until all checks pass
+
+## Validation Techniques
+
+### Check Lineage First
+
+Start with lineage diff to understand your change scope:
+
+- Modified models highlighted in the DAG
+- Downstream impact visible at a glance
+- Schema changes shown per model
+
+### Validate Metadata
+
+Low-cost checks using model metadata:
+
+- **Schema diff** - Column additions, removals, type changes
+- **Row count diff** - Record count comparison (uses warehouse metadata)
+
+### Validate Data
+
+Higher-cost checks that query your warehouse:
+
+- **Value diff** - Column-level match percentage
+- **Profile diff** - Statistical comparison (count, distinct, min, max, avg)
+- **Histogram diff** - Distribution changes for numeric columns
+- **Top-K diff** - Distribution changes for categorical columns
+
+### Custom Queries
+
+For flexible validation, use query diff:
+
+```sql
+SELECT
+ date_trunc('month', order_date) AS month,
+ SUM(amount) AS revenue
+FROM {{ ref('orders') }}
+GROUP BY month
+ORDER BY month DESC
+```
+
+Add queries to your checklist for repeated use.
+
+## Verification
+
+Confirm your workflow works:
+
+1. Make a small model change locally
+2. Generate artifacts: `dbt build && dbt docs generate`
+3. Upload dev session: `recce-cloud upload --type dev`
+4. Verify session appears in Recce Cloud
+5. Create PR and confirm agent posts summary
+
+## Troubleshooting
+
+| Issue | Solution |
+|-------|----------|
+| Dev session upload fails | Check artifacts exist in `target/`; run `dbt docs generate` |
+| Agent doesn't run on PR | Verify CI workflow includes `recce-cloud upload` |
+| Validation results missing | Check warehouse credentials in CI secrets |
+| Summary not appearing | Confirm `GITHUB_TOKEN` has PR write permissions |
+
+## Related
+
+- [Data Reviewer Workflow](data-reviewer.md) - How reviewers use Recce
+- [Admin Setup](admin-setup.md) - Set up your organization
+- [PR/MR Data Review](../7-cicd/pr-mr-summary.md) - Understanding agent summaries
diff --git a/docs/3-using-recce/data-reviewer.md b/docs/3-using-recce/data-reviewer.md
new file mode 100644
index 00000000..3aceef0e
--- /dev/null
+++ b/docs/3-using-recce/data-reviewer.md
@@ -0,0 +1,125 @@
+---
+title: Data Reviewer Workflow
+---
+
+# Data Reviewer Workflow
+
+Review data changes in pull requests using Recce. Your admin set up Recce for your team - here's how to use it as a reviewer.
+
+**Goal:** Review and approve data changes in PRs with confidence.
+
+## Prerequisites
+
+- [x] Recce Cloud account (via team invitation)
+- [x] Access to the project in Recce Cloud
+- [x] PR with Recce validation results
+
+## Reviewing a PR
+
+### 1. Find the Data Review Summary
+
+When a PR modifies dbt models, the Recce Agent posts a summary comment:
+
+1. Open the PR in GitHub/GitLab
+2. Scroll to the Recce bot comment
+3. Review the summary sections
+
+**Expected result:** Summary shows change overview, impact analysis, and validation results.
+
+### 2. Understand the Summary
+
+The summary includes:
+
+| Section | What It Shows |
+|---------|---------------|
+| **Change Overview** | Which models changed and the type of change |
+| **Impact Analysis** | Downstream models affected by the changes |
+| **Validation Results** | Schema diffs, row counts, and check outcomes |
+| **Recommendations** | Suggested actions based on findings |
+
+### 3. Explore in Recce Cloud
+
+For deeper investigation:
+
+1. Click **Launch Recce** in the PR comment (or go to Recce Cloud)
+2. Select the PR session from the list
+3. Explore the changes interactively
+
+**What you can do:**
+
+- View lineage diff to see affected models
+- Drill into schema changes per model
+- Run additional data diffs (row count, profile, value)
+- Execute custom queries to investigate specific concerns
+
+### 4. Review Validation Results
+
+Check each validation result:
+
+- **Pass** - Change validated successfully
+- **Warning** - Review recommended but not blocking
+- **Fail** - Issue detected that needs attention
+
+For failures, click through to see:
+- What was compared
+- Expected vs actual results
+- Specific differences found
+
+### 5. Approve or Request Changes
+
+Based on your review:
+
+**Approve the PR:**
+
+- Validation results meet expectations
+- Impact scope is understood and acceptable
+- No unexpected data changes
+
+**Request changes:**
+
+- Validation failures need investigation
+- Impact scope is broader than expected
+- Questions about specific changes
+
+Leave comments referencing specific validation results to help the developer address issues.
+
+## Common Review Scenarios
+
+### Schema Changes
+
+When columns are added, removed, or modified:
+
+1. Check if downstream models are affected
+2. Verify the change is intentional
+3. Confirm breaking changes are coordinated
+
+### Row Count Differences
+
+When record counts change:
+
+1. Determine if the change is expected
+2. Check if filters or joins were modified
+3. Verify the magnitude is reasonable
+
+### Performance Impact
+
+When models are refactored:
+
+1. Compare query complexity
+2. Check for unintended full table scans
+3. Review impact on downstream refresh times
+
+## Verification
+
+Confirm you can review PRs:
+
+1. Open a PR with Recce validation results
+2. Find the Recce bot comment
+3. Click Launch Recce to open the session
+4. Navigate the lineage and view a diff result
+
+## Related
+
+- [Data Developer Workflow](data-developer.md) - How developers validate changes
+- [Admin Setup](admin-setup.md) - Organization and team setup
+- [Checklist](../6-collaboration/checklist.md) - Adding checks to track
diff --git a/docs/4-what-the-agent-does/automated-validation.md b/docs/4-what-the-agent-does/automated-validation.md
new file mode 100644
index 00000000..a5b42900
--- /dev/null
+++ b/docs/4-what-the-agent-does/automated-validation.md
@@ -0,0 +1,61 @@
+---
+title: Automated Validation
+---
+
+# Automated Validation
+
+Manual data validation slows down every pull request. Developers must remember which checks to run, execute them correctly, and communicate results to reviewers. The Recce Agent automates this process, running the right validation checks based on what changed in your PR.
+
+## How It Works
+
+When a PR is opened or updated, the Recce Agent analyzes your changes and determines what needs validation.
+
+### 1. PR Triggers the Agent
+
+Your CI/CD pipeline runs `recce-cloud upload` when dbt metadata is updated. This triggers the agent to analyze the changes.
+
+### 2. Agent Analyzes Changes
+
+The agent reads dbt artifacts from both your base branch and PR branch. It identifies:
+
+- Which models were modified
+- What schema changes occurred
+- Which downstream models are affected
+
+### 3. Agent Runs Validation
+
+Based on the analysis, the agent executes appropriate validation checks against your warehouse:
+
+- **Schema diff** - Detects added, removed, or modified columns
+- **Row count diff** - Compares record counts between branches
+- **Profile diff** - Analyzes statistical changes in column values
+- **Breaking change analysis** - Identifies changes that affect downstream models
+
+### 4. Agent Posts Summary
+
+The agent generates a data review summary and posts it directly to your PR. Reviewers see:
+
+- What changed and why it matters
+- Validation results with pass/fail status
+- Recommended actions for review
+
+## When to Use
+
+- **Every PR that modifies dbt models** - The agent runs automatically for all data changes
+- **Large-scale refactoring** - When many models change, automated validation catches issues you might miss
+- **Critical path changes** - When modifying models that power dashboards or reports
+- **Continuous integration** - As part of your CI pipeline to validate every change
+
+## Triggering Validation
+
+You can trigger the data review summary in three ways:
+
+1. **Automatic trigger** - Runs when `recce-cloud upload` executes in CI
+2. **Manual trigger from UI** - Click the Data Review button in a PR/MR session
+3. **GitHub comment** - Comment `/recce` on your GitHub PR to generate a new summary
+
+## Related
+
+- [Impact Analysis](impact-analysis.md) - How the agent analyzes change scope
+- [PR/MR Data Review Summary](../7-cicd/pr-mr-summary.md) - Understanding the summary output
+- [Setup CI](../7-cicd/setup-ci.md) - Configure automated validation
diff --git a/docs/4-what-the-agent-does/impact-analysis.md b/docs/4-what-the-agent-does/impact-analysis.md
new file mode 100644
index 00000000..b523bf6d
--- /dev/null
+++ b/docs/4-what-the-agent-does/impact-analysis.md
@@ -0,0 +1,70 @@
+---
+title: Impact Analysis
+---
+
+# Impact Analysis
+
+A single column change can break dashboards, reports, and downstream models you never intended to affect. Impact analysis maps the full scope of your changes before they reach production, helping you understand exactly what will be affected.
+
+## How It Works
+
+The Recce Agent analyzes your changes at multiple levels to determine their true impact.
+
+### Lineage Analysis
+
+The agent traces dependencies through your dbt project to identify all models affected by your changes. It builds a graph of:
+
+- **Direct dependencies** - Models that reference your modified model
+- **Transitive dependencies** - Models further downstream in the lineage
+- **Column-level dependencies** - Specific columns that reference modified columns
+
+### Schema Comparison
+
+The agent compares schemas between your base and PR branches to detect:
+
+- Added columns
+- Removed columns
+- Renamed columns
+- Data type changes
+
+### Change Classification
+
+The agent categorizes each change based on its downstream impact:
+
+| Type | Description | Example |
+|------|-------------|---------|
+| **Breaking** | Affects all downstream models | Adding a filter condition, changing GROUP BY |
+| **Partial breaking** | Affects only models that reference specific modified columns | Removing or renaming a column |
+| **Non-breaking** | Does not affect downstream models | Adding a new column, formatting changes |
+
+### Downstream Effects
+
+For each modified model, the agent identifies:
+
+- Which downstream models are affected
+- Which specific columns in those models are impacted
+- Whether the impact is direct or indirect
+
+## When to Use
+
+- **Before merging any PR** - Understand the full scope of your changes
+- **During development** - Validate that changes are isolated to intended models
+- **Code review** - Help reviewers understand what will be affected
+- **Breaking change assessment** - Determine if coordination with downstream consumers is needed
+
+## Example: Column Change Impact
+
+When you modify a column like `stg_orders.status`:
+
+1. The agent identifies that `orders` model selects this column directly (partial impact)
+2. The agent detects that `customers` model uses `status` in a WHERE clause (full impact)
+3. The agent traces that `customer_segments` depends on `customers` (indirect impact)
+
+This lets you know that your seemingly simple column change affects models you may not have considered.
+
+## Related
+
+- [Impact Radius](../4-downstream-impacts/impact-radius.md) - Visualize affected models
+- [Breaking Change Analysis](../4-downstream-impacts/breaking-change-analysis.md) - Understand change types
+- [Lineage Diff](../3-visualized-change/lineage.md) - See lineage changes
+- [Column-Level Lineage](../3-visualized-change/column-level-lineage.md) - Trace column dependencies
diff --git a/docs/4-what-the-agent-does/index.md b/docs/4-what-the-agent-does/index.md
new file mode 100644
index 00000000..40625b3c
--- /dev/null
+++ b/docs/4-what-the-agent-does/index.md
@@ -0,0 +1,54 @@
+---
+title: What the Agent Does
+---
+
+# What the Recce Agent Does
+
+Data validation for pull requests is time-consuming. You need to understand what changed, identify downstream impacts, run the right checks, and communicate findings to reviewers. The Recce Agent automates this entire workflow.
+
+## How It Works
+
+The Recce Agent monitors your pull requests and acts as an automated data reviewer. When you open or update a PR that modifies dbt models, the agent:
+
+1. **Analyzes your changes** - Reads dbt artifacts and compares your branch against the base branch
+2. **Identifies impact** - Traces lineage to find all affected models and columns
+3. **Runs validation checks** - Executes schema diffs, row count comparisons, and other relevant checks
+4. **Generates insights** - Produces a data review summary with actionable findings
+5. **Posts results** - Adds the summary directly to your PR for reviewers to see
+
+This happens automatically in your CI/CD pipeline. No manual intervention required.
+
+## When to Use
+
+- **Every PR with data changes** - The agent runs automatically when dbt models are modified
+- **Complex refactoring** - When changes affect many models, the agent maps the full impact radius
+- **Critical model updates** - When validating changes to models that power dashboards or reports
+- **Team collaboration** - When reviewers need context about data changes without running Recce locally
+
+## Agent Capabilities
+
+The Recce Agent provides three core capabilities:
+
+### Automated Validation
+
+The agent determines what needs validation based on your changes and runs appropriate checks automatically. It executes schema comparisons, row count diffs, and other validation queries against your warehouse.
+
+[Learn more about Automated Validation](automated-validation.md)
+
+### Impact Analysis
+
+Before running checks, the agent analyzes your model changes to understand the scope of impact. It traces column-level lineage and categorizes changes as breaking, partial breaking, or non-breaking.
+
+[Learn more about Impact Analysis](impact-analysis.md)
+
+### Data Review Summary
+
+After validation completes, the agent generates a comprehensive summary that explains what changed, what was validated, and whether the changes are safe to merge.
+
+[Learn more about the Data Review Summary](../7-cicd/pr-mr-summary.md)
+
+## Related
+
+- [Data Developer Workflow](../3-using-recce/data-developer.md) - How developers validate changes
+- [Data Reviewer Workflow](../3-using-recce/data-reviewer.md) - How reviewers approve PRs
+- [CI/CD Getting Started](../7-cicd/ci-cd-getting-started.md) - Set up automated validation
diff --git a/docs/5-what-you-can-explore/breaking-change-analysis.md b/docs/5-what-you-can-explore/breaking-change-analysis.md
new file mode 100644
index 00000000..6ddca9ad
--- /dev/null
+++ b/docs/5-what-you-can-explore/breaking-change-analysis.md
@@ -0,0 +1,131 @@
+---
+title: Breaking Change Analysis
+---
+
+# Breaking Change Analysis
+
+**Breaking Change Analysis** examines modified models and categorizes changes into three types:
+
+- Breaking changes
+- Partial breaking changes
+- Non-breaking changes
+
+It's generally assumed that any modification to a model’s SQL will affect all downstream models. However, not all changes have the same level of impact. For example, formatting adjustments or the addition of a new column should not break downstream dependencies. Breaking change analysis helps you assess whether a change affects downstream models and, if so, to what extent.
+
+
+## Usage
+
+Use the [impact radius](./impact-radius.md#usage) view to analyze changes and see the impacted downstream models.
+
+## Categories of change
+
+### Non-breaking change
+
+No downstream models are affected. Common cases are adding new columns, comments, or formatting changes that don't alter logic.
+
+**Example: Adding a new column**
+
+Adding a new column like `status` doesn't affect models that don't reference it.
+
+```diff
+select
+++ status,
+ user_id,
+ user_name
+from
+ {{ ref("orders") }}
+```
+
+
+
+
+### Partial breaking change
+
+Only downstream models that reference specific columns are affected. Common cases are removing, renaming, or redefining a column.
+
+**Example: Removing a column**
+
+```diff
+select
+ user_id,
+-- status,
+ order_date
+from
+ {{ ref("orders") }}
+```
+
+**Example: Renaming a column**
+
+```diff
+select
+ user_id,
+-- status
+++ order_status
+from
+ {{ ref("orders") }}
+```
+
+
+**Example: Redefining a column**
+
+```diff
+select
+ user_id,
+-- discount
+++ coalesce(discount, 0) as discount
+from
+ {{ ref("orders") }}
+```
+
+
+### Breaking change
+
+All downstream models are affected. Common cases are adding a filter condition or changing the GROUP BY columns.
+
+**Example: Adding a filter condition**
+
+This may reduce the number of rows, affecting all downstream logic that depends on the original row set.
+
+```diff
+select
+ user_id,
+ order_date
+from
+ {{ ref("orders") }}
+++ where status = 'completed'
+```
+
+
+**Example: Adding a GROUP BY column**
+
+Changes the granularity of the result set, which can break all dependent models.
+
+```diff
+select
+ user_id,
+++ order_date,
+ count(*) as total_orders
+from
+ {{ ref("orders") }}
+-- group by user_id
+++ group by user_id, order_date
+```
+
+
+## Limitations
+
+Our breaking change analysis is intentionally conservative to prioritize safety. As a result, a modified model may be classified as a breaking change when it is actually a non-breaking or partial breaking change. Common cases include:
+
+1. Logical equivalence in operations, such as changing `a + b` to `b + a`.
+1. Adding a `LEFT JOIN` to a table and selecting columns from it. This is often used to enrich the current model with additional dimension table data without affecting existing downstream tables.
+1. All modified Python models and seeds are treated as breaking changes.
+
+## When to Use
+
+- Determine which downstream models need validation after a change
+- Prioritize review effort based on impact severity
+- Understand if a refactor will break dependent models
+- Assess risk before merging model changes
+
+## Technology
+
+Breaking Change Analysis is powered by the SQL analysis and AST diff capabilities of [SQLGlot](https://github.com/tobymao/sqlglot) to compare two SQL semantic trees.
+
+## Related
+
+- [Impact Radius](impact-radius.md) - Visualize affected downstream models
+- [Column-Level Lineage](column-level-lineage.md) - Trace column dependencies
+- [Code Change](code-change.md) - Review the actual SQL modifications
diff --git a/docs/5-what-you-can-explore/code-change.md b/docs/5-what-you-can-explore/code-change.md
new file mode 100644
index 00000000..e61c1119
--- /dev/null
+++ b/docs/5-what-you-can-explore/code-change.md
@@ -0,0 +1,84 @@
+---
+title: Code Change
+---
+
+# Code Change
+
+The Code Change feature allows you to compare the SQL code changes between your current branch and the base branch, helping you understand exactly what has been modified in your dbt models.
+
+## Viewing Code Change
+
+When you identify a modified model in the [Lineage Diff](lineage-diff.md), you can examine the specific code changes to understand the nature of the modifications.
+
+### Opening Code Change
+
+To view the code changes for a model:
+
+1. Click on any modified (orange) model node in the lineage view
+2. In the node details panel that opens, navigate to the **Code** tab
+3. The code diff will display showing the changes between branches
+
+
+ {: .shadow}
+ Viewing code changes for a modified model
+
+
+### Understanding the Code Diff
+
+The code diff uses standard diff formatting to highlight changes:
+
+- **Red lines** (with `-` prefix) show code that was removed
+- **Green lines** (with `+` prefix) show code that was added
+- **Unchanged lines** appear in normal formatting for context
+
+This visual comparison makes it easy to identify:
+- New columns or transformations
+- Modified business logic
+- Changes to joins or filters
+- Updated column names or data types
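+
+For instance, a change that redefines a column might render like this (a hypothetical model and columns, shown only to illustrate the diff formatting):
+
+```diff
+ select
+   order_id,
+-  amount,
++  amount - coalesce(discount, 0) as net_amount,
+   order_date
+ from {{ ref("orders") }}
+```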
+
+### Full Screen View
+
+For complex changes or detailed review, you can expand the code diff to full screen:
+
+1. Click the expand button in the top-right corner of the code diff panel
+2. Review the changes in the larger view for better readability
+3. Use this view when conducting thorough code reviews or sharing changes with team members
+
+
+ {: .shadow}
+ Full-screen view for detailed code review
+
+
+## Why Code Diff Matters
+
+Understanding code changes is essential for:
+
+- **Impact Assessment**: Determining if changes affect downstream models or reports
+- **Code Review**: Validating that modifications align with business requirements
+- **Collaboration**: Clearly communicating what changed to stakeholders
+- **Quality Assurance**: Ensuring changes don't introduce errors or break existing logic
+
+## Next Steps
+
+After reviewing code changes, you can:
+
+- Examine the [impact radius](impact-radius.md) to see which downstream models are affected
+- Run [data diffs](data-diffing.md) to validate that the changes produce expected results
+- Add your findings to the [collaboration checklist](../6-collaboration/checklist.md) for team review
+
+!!! tip "Best Practice"
+ Always review code changes alongside data validation checks to ensure your modifications produce the expected results and don't break downstream dependencies.
+
+## When to Use
+
+- Review what SQL logic changed before running data diffs
+- Understand the scope of a PR during code review
+- Identify which columns or joins were modified
+- Document changes for team communication
+
+## Related
+
+- [Breaking Change Analysis](breaking-change-analysis.md) - Classify impact severity
+- [Impact Radius](impact-radius.md) - See affected downstream models
+- [Data Diffing](data-diffing.md) - Validate data changes
\ No newline at end of file
diff --git a/docs/5-what-you-can-explore/column-level-lineage.md b/docs/5-what-you-can-explore/column-level-lineage.md
new file mode 100644
index 00000000..cc53e944
--- /dev/null
+++ b/docs/5-what-you-can-explore/column-level-lineage.md
@@ -0,0 +1,52 @@
+---
+title: Column-Level Lineage
+---
+
+# Column-Level Lineage
+
+Column-Level Lineage provides visibility into the upstream and downstream relationships of a column.
+
+Common use-cases for column-level lineage are:
+
+1. **Source Exploration**: During development, column-level lineage helps you understand how a column is derived.
+2. **Impact Analysis**: When modifying the logic of a column, column-level lineage enables you to assess the potential impact across the entire DAG.
+3. **Root Cause Analysis**: Column-level lineage helps identify the possible source of errors by tracing data lineage at the column level.
+
+## Usage
+
+1. Select a node in the lineage DAG, then click the column you want to view.
+
+ {: .shadow}
+
+1. The column-level lineage for the selected column will be displayed.
+
+ {: .shadow}
+
+1. To exit column-level lineage view, click the close button in the upper-left corner.
+
+ {: .shadow}
+
+## Transformation Types
+
+Each column also displays its transformation type, which helps you understand how the column was generated or modified.
+
+| Type | Description |
+|------|--------------|
+| Pass-through | The column is directly selected from the upstream table. |
+| Renamed | The column is selected from the upstream table but with a different name. |
+| Derived | The column is created through transformations applied to upstream columns, such as calculations, conditions, functions, or aggregations. |
+| Source | The column is not derived from any upstream data. It may originate from a seed/source node, literal value or data generation function. |
+| Unknown | We have no information about the transformation type. This could be due to a parse error or other unknown reason. |
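+
+A single model can mix several of these transformation types at once. A hypothetical example for illustration:
+
+```sql
+select
+    order_id,                          -- Pass-through: selected as-is
+    status as order_status,            -- Renamed: same data, new name
+    amount - discount as net_amount,   -- Derived: computed from upstream columns
+    current_timestamp as loaded_at     -- Source: generated, not from upstream data
+from {{ ref("stg_orders") }}
+```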
+
+## When to Use
+
+- Trace where a column's data originates
+- Understand which downstream columns depend on a specific column
+- Assess the impact of modifying a column's logic
+- Debug data quality issues by following the transformation chain
+
+## Related
+
+- [Impact Radius](impact-radius.md) - See column-level impact on downstream models
+- [Breaking Change Analysis](breaking-change-analysis.md) - Classify change severity
+- [Data Diffing](data-diffing.md) - Validate column-level data changes
+
+
diff --git a/docs/5-what-you-can-explore/data-diffing.md b/docs/5-what-you-can-explore/data-diffing.md
new file mode 100644
index 00000000..1208abbb
--- /dev/null
+++ b/docs/5-what-you-can-explore/data-diffing.md
@@ -0,0 +1,256 @@
+---
+title: Data Diffing
+---
+
+# Data Diffing
+
+Data diffing validates that your model changes produce the expected results. Each diff type serves a different validation purpose, from quick row counts to detailed value comparisons.
+
+## Overview
+
+| Diff Type | Purpose | Query Cost | Best For |
+|-----------|---------|------------|----------|
+| [Row Count](#row-count-diff) | Compare record counts | Low | Quick sanity check |
+| [Profile](#profile-diff) | Column-level statistics | Medium | Distribution analysis |
+| [Value](#value-diff) | Row-by-row comparison | High | Exact match verification |
+| [Top-K](#top-k-diff) | Categorical distribution | Medium | Categorical columns |
+| [Histogram](#histogram-diff) | Numeric distribution | Medium | Numeric columns |
+| [Query](#query-diff) | Custom SQL comparison | Varies | Flexible validation |
+
+## Choosing the Right Diff
+
+A common approach is to start with lightweight checks and progressively drill down as needed. This decision tree provides a suggested workflow:
+
+```
+Start with Row Count
+ │
+ ├─ Counts match? → Profile Diff for deeper stats
+ │
+ └─ Counts differ?
+ │
+ ├─ Expected? → Document in checklist
+ │
+ └─ Unexpected? → Value Diff to find specific changes
+ │
+ └─ For specific columns:
+ • Categorical → Top-K Diff
+ • Numeric → Histogram Diff
+ • Custom logic → Query Diff
+```
+
+
+## Row Count Diff
+
+Compare the number of rows between base and current environments.
+
+**When to use:** Quick validation that filters or joins didn't unexpectedly add or remove records.
+
+### Running Row Count Diff
+
+1. Click a model in the Lineage DAG
+2. Click **Explore Change** > **Row Count Diff**
+
+
+ {: .shadow}
+ Row Count Diff for a single model
+
+
+### Interpreting Results
+
+| Result | Meaning |
+|--------|---------|
+| Count unchanged | No records added or removed |
+| Count increased | New records added (check if expected) |
+| Count decreased | Records removed (verify filters/joins) |
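+
+Conceptually, the check is equivalent to counting rows in each environment. An illustrative sketch with hypothetical schema names, not Recce's exact query:
+
+```sql
+select 'base' as environment, count(*) as row_count
+from prod_schema.orders
+union all
+select 'current', count(*)
+from dev_schema.orders
+```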
+
+---
+
+## Profile Diff
+
+Compare column-level statistics between environments.
+
+**When to use:** Validate that transformations didn't unexpectedly change data distributions.
+
+### Statistics Compared
+
+| Statistic | Description |
+|-----------|-------------|
+| Row count | Total records |
+| Not null % | Proportion of non-null values |
+| Distinct % | Proportion of unique values |
+| Distinct count | Number of unique values |
+| Is unique | Whether all values are unique |
+| Min / Max | Range of values |
+| Average / Median | Central tendency |
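+
+As a rough sketch, these statistics correspond to per-column aggregates like the following (hypothetical columns; illustrative only, not Recce's actual query):
+
+```sql
+select
+    count(*) as row_count,
+    count(status) * 100.0 / count(*) as not_null_pct,
+    count(distinct status) as distinct_count,
+    min(order_date) as min_value,
+    max(order_date) as max_value
+from {{ ref("orders") }}
+```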
+
+### Running Profile Diff
+
+1. Select a model from the Lineage DAG
+2. Click **Explore Change** > **Profile Diff**
+
+
+ 
+ Profile Diff showing column statistics
+
+
+### Interpreting Results
+
+Look for unexpected changes in:
+
+- **Null rates** - Did a column become more/less nullable?
+- **Distinct counts** - Did cardinality change unexpectedly?
+- **Min/Max** - Did value ranges shift?
+
+---
+
+## Value Diff
+
+Compare actual values row-by-row using primary keys.
+
+**When to use:** Verify exact data matches when precision matters.
+
+### How It Works
+
+Value Diff uses primary keys to match records between environments, then compares each column value. Primary keys are auto-detected from columns with the `unique` test.
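+
+The underlying idea can be sketched as joining the two environments on the primary key and surfacing rows that differ (hypothetical schema, table, and column names; not Recce's actual query):
+
+```sql
+select
+    coalesce(b.order_id, c.order_id) as order_id,
+    b.status as base_status,
+    c.status as current_status
+from prod_schema.orders b
+full outer join dev_schema.orders c
+    on b.order_id = c.order_id
+where b.order_id is null                    -- primary key added in current
+   or c.order_id is null                    -- primary key removed in current
+   or b.status is distinct from c.status    -- mismatched value
+```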
+
+
+ 
+ Value Diff showing match percentages
+
+
+### Result Columns
+
+| Column | Meaning |
+|--------|---------|
+| **Added** | New PKs in current (not in base) |
+| **Removed** | PKs in base (not in current) |
+| **Matched** | Count of matching values for common PKs |
+| **Matched %** | Percentage match for common PKs |
+
+### Viewing Mismatches
+
+Click **show mismatched values** on a column to see row-level differences:
+
+{: .shadow}
+
+---
+
+## Top-K Diff
+
+Compare the distribution of categorical columns by showing the most frequent values.
+
+**When to use:** Validate categorical data hasn't shifted unexpectedly (status codes, categories, regions).
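+
+The comparison builds on a frequency query per environment, along these lines (hypothetical column name; illustrative only):
+
+```sql
+select status, count(*) as occurrences
+from {{ ref("orders") }}
+group by 1
+order by occurrences desc
+limit 10
+```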
+
+### Running Top-K Diff
+
+**Via Explore Change:**
+
+1. Select model > **Explore Change** > **Top-K Diff**
+2. Select a column
+3. Click **Execute**
+
+**Via Column Menu:**
+
+1. Hover over a column in Node Details
+2. Click **...** > **Top-K Diff**
+
+
+ {: .shadow}
+ Generate a Top-K Diff from the column menu
+
+
+### Options
+
+| Option | Description |
+|--------|-------------|
+| Top 10 | Default view |
+| Top 50 | Expanded view for more categories |
+
+
+ 
+ Top-K Diff comparing category distributions
+
+
+---
+
+## Histogram Diff
+
+Compare the distribution of numeric columns using binned histograms.
+
+**When to use:** Validate numeric distributions haven't shifted (amounts, scores, durations).
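+
+Under the hood this amounts to bucketing the column and counting per bucket in each environment, roughly (hypothetical column and bin width; illustrative only):
+
+```sql
+select
+    floor(amount / 100) * 100 as bucket_start,
+    count(*) as bucket_count
+from {{ ref("orders") }}
+group by 1
+order by 1
+```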
+
+### Running Histogram Diff
+
+**Via Explore Change:**
+
+1. Select model > **Explore Change** > **Histogram Diff**
+2. Select a numeric column
+3. Click **Execute**
+
+**Via Column Menu:**
+
+1. Hover over a numeric column
+2. Click **...** > **Histogram Diff**
+
+
+ {: .shadow}
+ Generate a Histogram Diff from column options
+
+
+
+ 
+ Histogram Diff showing overlaid distributions
+
+
+---
+
+## Query Diff
+
+Write custom SQL to compare any query results between environments.
+
+**When to use:** Flexible validation for complex scenarios not covered by standard diffs.
+
+### Running Query Diff
+
+1. Open the Query page
+2. Write SQL using dbt syntax:
+ ```sql
+ select * from {{ ref("mymodel") }}
+ ```
+3. Click **Run Diff**
+
+
+ 
+ Query Diff interface
+
+
+### Comparison Modes
+
+| Mode | When to Use | How It Works |
+|------|-------------|--------------|
+| **Client-side** | No primary key | Fetches first 2,000 rows, compares locally |
+| **Warehouse** | Primary key specified | Compares in warehouse, shows only differences |
+
+!!! tip "Keyboard Shortcuts (Mac)"
+ - `⌘ Enter` - Run query
+ - `⌘ ⇧ Enter` - Run query diff
+
+### Result Options
+
+| Option | Description |
+|--------|-------------|
+| **Primary Key** | Click key icon to set comparison key |
+| **Pinned Column** | Show specific columns first |
+| **Changed Only** | Hide unchanged rows and columns |
+
+
+ {: .shadow}
+ Query Diff with filtering options
+
+
+---
+
+## Related
+
+- [Lineage Diff](lineage-diff.md) - Visualize change impact
+- [Checklist](../6-collaboration/checklist.md) - Save validation results
diff --git a/docs/5-what-you-can-explore/impact-radius.md b/docs/5-what-you-can-explore/impact-radius.md
new file mode 100644
index 00000000..89cc8717
--- /dev/null
+++ b/docs/5-what-you-can-explore/impact-radius.md
@@ -0,0 +1,158 @@
+---
+title: Impact Radius
+---
+
+# Impact Radius
+
+**Impact Radius** helps you analyze changes and identify downstream impacts at the column level.
+
+While dbt provides a similar capability using the [state selector](https://docs.getdbt.com/reference/node-selection/methods#state) with `state:modified+` to identify modified nodes and their downstream dependencies, Recce goes further. By analyzing SQL code directly, Recce enables **fine-grained impact radius analysis**. It reveals how changes to specific columns can ripple through your data pipeline, helping you prioritize which models—and even which columns—deserve closer attention.
+
+=== "Impact Radius"
+
+ {: .shadow}
+
+=== "state:modified+"
+
+ {: .shadow}
+
+
+## Usage
+
+### Show impact radius
+
+1. Click the **Impact Radius** button in the upper-left corner.
+
+ {: .shadow}
+
+1. The impact radius will be displayed.
+
+ {: .shadow}
+
+1. To exit impact radius view, click the close button in the upper-left corner.
+
+ {: .shadow}
+
+### Show impact radius for a single changed model
+
+1. Hover over a changed model and click the **target icon**, or right-click the model and select **Show Impact Radius**.
+
+ {: .shadow}
+
+1. The impact radius for this model will be displayed.
+
+ {: .shadow}
+
+1. To exit impact radius view, click the close button in the upper-left corner.
+
+ {: .shadow}
+
+## Impact Radius of a Column
+
+The **right side of the [Column-Level Lineage](column-level-lineage.md) (CLL)** graph represents the **impact radius** of a selected column.
+This view helps you quickly understand what will be affected if that column changes.
+
+### What does the impact radius include?
+
+- **Downstream columns** that directly reference the selected column
+- **Downstream models** that directly depend on the selected column
+- **All indirect downstream columns and models** that transitively depend on it
+
+This helps you evaluate both the direct and downstream effects of a column change, making it easier to understand its overall impact.
+
+
+### Example: Simplified Model Chain
+
+Given the following models, here's how changes to `stg_orders.status` would impact downstream models:
+
+```sql
+-- stg_orders.sql
+select
+ order_id,
+ customer_id,
+ status,
+ ...
+from {{ ref("raw_orders") }}
+
+
+-- orders.sql
+select
+ order_id,
+ customer_id,
+ status,
+ ...
+from {{ ref("stg_orders") }}
+
+
+-- customers.sql
+select
+ c.customer_id,
+ ...
+from {{ ref("stg_customers") }} as c
+join {{ ref("stg_orders") }} as o
+ on c.customer_id = o.customer_id
+where o.status = 'completed'
+group by c.customer_id
+
+
+-- customer_segments.sql
+select
+ customer_id,
+ ...
+from {{ ref("customers") }}
+```
+
+{: .shadow}
+
+The following impact is detected:
+
+- **orders**: This model is partially impacted, as it selects the `status` column directly from `stg_orders` but does not apply any transformation or filtering logic. The change is limited to the `status` column only.
+
+- **customers**: This model is fully impacted, because it uses `status` in a WHERE clause (`where o.status = 'completed'`). Any change to the logic in `stg_orders.status` can affect the entire output of the model.
+
+- **customer_segments**: This model is indirectly impacted, as it depends on the `customers` model, which itself is fully impacted. Even though `customer_segments` does not directly reference `status`, changes can still propagate downstream via its upstream dependency.
+
+
+
+## How it works
+
+Two core features power the impact radius analysis:
+
+**[Breaking Change Analysis](./breaking-change-analysis.md)** classifies modified models into three categories:
+
+- **Breaking changes**: Impact all downstream **models**
+- **Non-breaking changes**: Do not impact any downstream **models**
+- **Partial breaking changes**: Impact only downstream **models or columns** that depend on the modified columns
+
+**[Column-level lineage](column-level-lineage.md)** analyzes your model's SQL to identify column-level dependencies:
+
+- Which upstream **columns** are used as filters or grouping keys. If those upstream **columns** change, the current **model** is impacted.
+- Which upstream **columns** a specific column references. If those upstream **columns** change, the specific **column** is impacted.
+
+## Putting It Together
+
+With the insights from the two features above, Recce determines the impact radius:
+
+1. If a model has a **[breaking change](breaking-change-analysis.md#breaking-change)**, include all downstream models in the impact radius.
+2. If a model has a **[non-breaking change](breaking-change-analysis.md#non-breaking-change)**, include only the downstream columns and models of newly added columns.
+3. If a model has a **[partial breaking change](breaking-change-analysis.md#partial-breaking-change)**, include the downstream columns and models of added, removed, or modified columns.
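
Rule 1 above reduces to a downstream traversal of the model-level lineage graph. Below is a minimal sketch over a toy graph — the `LINEAGE` map and `impact_radius` helper are illustrative only, not Recce's implementation, and rules 2 and 3 would additionally require column-level lineage:

```python
from collections import deque

# Toy model-level lineage: model -> direct downstream models (illustrative)
LINEAGE = {
    "stg_orders": ["orders", "customers"],
    "orders": [],
    "customers": ["customer_segments"],
    "customer_segments": [],
}

def downstream(model, lineage):
    """Collect every model reachable downstream of `model`."""
    seen, queue = set(), deque(lineage.get(model, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(lineage.get(node, []))
    return seen

def impact_radius(changes, lineage):
    """changes: {model: 'breaking' | 'non_breaking' | 'partial'}.
    Only rule 1 is modeled here: a breaking change pulls in all
    downstream models; the other rules need column-level lineage."""
    impacted = set()
    for model, kind in changes.items():
        impacted.add(model)
        if kind == "breaking":
            impacted |= downstream(model, lineage)
    return impacted

print(sorted(impact_radius({"stg_orders": "breaking"}, LINEAGE)))
# ['customer_segments', 'customers', 'orders', 'stg_orders']
```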
+
+## When to Use
+
+- Identify which downstream models need validation after a change
+- Prioritize data diff effort based on actual impact
+- Assess risk before merging model modifications
+- Understand the blast radius of column-level changes
+
+## Related
+
+- [Breaking Change Analysis](breaking-change-analysis.md) - Understand how changes are classified
+- [Column-Level Lineage](column-level-lineage.md) - Trace column dependencies
+- [Data Diffing](data-diffing.md) - Validate data changes in impacted models
+
+
+
+
+
+
+
+
+
diff --git a/docs/5-what-you-can-explore/lineage-diff.md b/docs/5-what-you-can-explore/lineage-diff.md
new file mode 100644
index 00000000..f1a16453
--- /dev/null
+++ b/docs/5-what-you-can-explore/lineage-diff.md
@@ -0,0 +1,141 @@
+---
+title: Lineage Diff
+---
+
+# Lineage Diff
+
+The Lineage view shows how your data model changes impact your data pipeline. It visualizes the potential area of impact from your modifications, helping you determine which models need further investigation.
+
+## How It Works
+
+Recce compares your base and current branch artifacts to identify:
+
+- **Dependencies** - Which models depend on others
+- **Change Impact** - How modifications ripple through your pipeline
+- **Data Flow** - The path data takes from sources to final outputs
+
+
+ {: .shadow}
+ Interactive lineage graph showing modified models
+
+
+### Visual Status Indicators
+
+Models are color-coded to indicate their status:
+
+| Color | Status |
+|-------|--------|
+| **Green** | Added (new to your project) |
+| **Red** | Removed (deleted from your project) |
+| **Orange** | Modified (changed code or configuration) |
+| **Gray** | Unchanged (shown for context) |
+
+
+ {: .shadow}
+ Model node with status indicators
+
+
+### Change Detection Icons
+
+Each model displays icons in the bottom-right corner:
+
+- **Row Count Icon** - Shows when row count differences are detected
+- **Schema Icon** - Shows when column or data type changes are detected
+
+Grayed-out icons indicate no changes were detected.
+
+
+ {: .shadow}
+ Model with schema change detected
+
+
+## Filtering and Selection
+
+### Filter Options
+
+In the top control bar:
+
+| Filter | Description |
+|--------|-------------|
+| **Mode** | Changed Models (modified + downstream) or All |
+| **Package** | Filter by dbt package names |
+| **Select** | Select nodes by [node selection](multi-models.md) |
+| **Exclude** | Exclude nodes by [node selection](multi-models.md) |
+
+### Selecting Models
+
+Click a node to select it, or use **Select nodes** to select multiple models for batch operations.
+
+### Row Count Diff by Selector
+
+Run row count diff on selected nodes:
+
+1. Use `select` and `exclude` to filter nodes
+2. Click the **...** button in the top-right corner
+3. Click **Row Count Diff by Selector**
+
+{: .shadow}
+
+## Investigating Changes
+
+### Node Details Panel
+
+Click any model to open the node details panel:
+
+
+ {: .shadow}
+ Open the node details panel
+
+
+From this panel you can:
+
+- View model metadata (type, materialization)
+- Examine schema changes
+- Run validation checks
+- Add findings to your checklist
+
+### Available Validations
+
+Click **Explore Change** to access:
+
+- [Row Count Diff](data-diffing.md#row-count-diff) - Compare record counts
+- [Profile Diff](data-diffing.md#profile-diff) - Analyze column statistics
+- [Value Diff](data-diffing.md#value-diff) - Identify specific value changes
+- [Top-K Diff](data-diffing.md#top-k-diff) - Compare common values
+- [Histogram Diff](data-diffing.md#histogram-diff) - Visualize distributions
+
+
+ {: .shadow}
+ Node details with exploration options
+
+
+## Schema Diff
+
+Schema diff identifies structural changes to your models:
+
+- **Added columns** - New fields (shown in green)
+- **Removed columns** - Deleted fields (shown in red)
+- **Renamed columns** - Changed names (shown with arrows)
+- **Data type changes** - Modified column types
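
Conceptually, schema diff compares the column-to-type mappings extracted from each environment's `catalog.json`. A minimal sketch with made-up column data (rename detection omitted for brevity; this is not Recce's actual implementation):

```python
def schema_diff(base_cols, curr_cols):
    """Compare {column_name: data_type} mappings from two environments."""
    added = sorted(set(curr_cols) - set(base_cols))
    removed = sorted(set(base_cols) - set(curr_cols))
    type_changed = sorted(
        col for col in set(base_cols) & set(curr_cols)
        if base_cols[col] != curr_cols[col]
    )
    return {"added": added, "removed": removed, "type_changed": type_changed}

# Hypothetical column metadata for one model in each environment
base = {"order_id": "integer", "status": "varchar", "amount": "integer"}
curr = {"order_id": "integer", "status": "varchar", "amount": "numeric",
        "discount": "numeric"}

print(schema_diff(base, curr))
# {'added': ['discount'], 'removed': [], 'type_changed': ['amount']}
```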
+
+
+ {: .shadow}
+ Interactive schema diff showing column changes
+
+
+!!! warning "Requirements"
+ Schema diff requires `catalog.json` in both environments. Run `dbt docs generate` in both before starting your Recce session.
+
+## When to Use
+
+- **Starting your review** - Get an overview of all changes and their downstream impact
+- **Identifying affected models** - Find models that need validation
+- **Understanding dependencies** - See how changes propagate through your pipeline
+- **Scoping your validation** - Determine which models to diff
+
+## Related
+
+- [Code Change](code-change.md) - View SQL changes for a model
+- [Column-Level Lineage](column-level-lineage.md) - Trace column dependencies
+- [Multi-Model Selection](multi-models.md) - Batch operations on models
+- [Data Diffing](data-diffing.md) - Validate data changes
diff --git a/docs/5-what-you-can-explore/multi-models.md b/docs/5-what-you-can-explore/multi-models.md
new file mode 100644
index 00000000..541bc2b4
--- /dev/null
+++ b/docs/5-what-you-can-explore/multi-models.md
@@ -0,0 +1,115 @@
+---
+title: Multi-Models
+---
+
+
+# Multi-Models Selection
+
+Multiple models can be selected in the Lineage DAG, enabling actions such as Row Count Diff or Value Diff to be performed on several models at once.
+
+### Select Models Individually
+
+To select multiple models individually, click the checkbox on the models you wish to select.
+
+
+ {: .shadow}
+ Select multiple models individually
+
+
+### Select Parent or Child models
+
+To select a node and all of its parents or children:
+
+1. Click the checkbox on the node
+2. Right-click the node
+3. Click to select either parent or child models
+
+
+ {: .shadow}
+ Select a node and its parents or children
+
+
+### Perform actions on multiple models
+
+After selecting the desired models, use the Actions menu at the top right of the screen to perform diffs or add checks.
+
+
+ {: .shadow}
+ Perform actions on multiple models
+
+
+### Example - Row Count Diff
+
+An example of selecting multiple models to perform a multi-node row count diff:
+
+
+ {: .shadow}
+ Perform a Row Count Diff on multiple models
+
+
+### Example - Value Diff
+
+An example of selecting multiple models to perform a multi-node Value Diff:
+
+
+ {: .shadow}
+ Perform a Value Diff on multiple models
+
+
+
+### Schema and Lineage Diff
+
+From the Lineage DAG, open the Actions dropdown menu and select Lineage Diff or Schema Diff from the Add to Checklist section. This will add:
+
+- Lineage Diff: The current Lineage view, dependent on your [model selection](lineage-diff.md#selecting-models) options.
+- Schema Diff: A diff of all models if none are selected, or [specific selected models](#multi-models-selection).
+
+
+ {: .shadow}
+ Add a Lineage Diff Check or Schema Check via the Actions dropdown menu
+
+
+
+
+## Node Selection
+
+Recce supports dbt [node selection](https://docs.getdbt.com/reference/node-selection/syntax) in the [lineage diff](lineage-diff.md). This enables you to target specific resources with data checks by selecting or excluding models.
+
+## Supported syntax and methods
+
+Since Recce uses dbt's built-in node selector, it supports most of dbt's selection methods. Here are some examples:
+
+- Select a node: `my_model`
+- Select by tag: `tag:nightly`
+- Select by wildcard: `customer*`
+- Select by graph operators: `my_model+`, `+my_model`, `+my_model+`, `1+my_model+`
+- Select by union: `model1 model2`
+- Select by intersection: `stg_invoices+,stg_accounts+`
+- Select by state: `state:modified`, `state:modified+`
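
The graph operators above expand to ancestor or descendant traversals. A rough sketch of how `+my_model` and `N+my_model` might resolve over a toy parent map (illustrative only, not dbt's or Recce's implementation):

```python
# Toy parent map: model -> direct upstream models (illustrative)
PARENTS = {
    "stg_orders": [],
    "orders": ["stg_orders"],
    "customers": ["stg_orders"],
    "customer_segments": ["customers"],
}

def ancestors(model, parents, depth=None):
    """Expand `+my_model` (all ancestors) or `N+my_model` (depth-limited)."""
    out, frontier, level = set(), {model}, 0
    while frontier and (depth is None or level < depth):
        # Step one level upstream from the current frontier
        frontier = {p for m in frontier for p in parents.get(m, [])}
        out |= frontier
        level += 1
    return out

print(sorted(ancestors("customer_segments", PARENTS)))           # ['customers', 'stg_orders']
print(sorted(ancestors("customer_segments", PARENTS, depth=1)))  # ['customers']
```

A `my_model+` (descendants) expansion would be the same traversal over a child map instead of a parent map.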
+
+
+### Use `state` method
+
+In dbt, the `state` method requires passing the `--state` option on the CLI. Recce instead uses the base environment as the state, allowing you to use the selector on the fly.
+
+
+### Removed models
+
+Unlike dbt, Recce can select removed models, and you can also find them using graph operators.
+
+
+## Limitations
+
+- ["result" method](https://docs.getdbt.com/reference/node-selection/syntax#the-result-status) not supported
+- ["source_status" method](https://docs.getdbt.com/reference/node-selection/syntax#the-source_status-status) not supported
+- [YAML selectors](https://docs.getdbt.com/reference/node-selection/yaml-selectors) not supported
+
+## When to Use
+
+- Run Row Count Diff across multiple related models at once
+- Perform bulk Value Diff on a set of staging models
+- Validate all models in a specific path or tag
+- Compare schema changes across an entire model group
+
+## Related
+
+- [Lineage Diff](lineage-diff.md) - Visualize model dependencies and changes
+- [Data Diffing](data-diffing.md) - Run diffs on selected models
+- [Impact Radius](impact-radius.md) - See downstream effects of changes
diff --git a/docs/6-collaboration/checklist.md b/docs/6-collaboration/checklist.md
index 76386604..172bb2ac 100644
--- a/docs/6-collaboration/checklist.md
+++ b/docs/6-collaboration/checklist.md
@@ -2,42 +2,51 @@
title: Checklist
---
-## What's Checklist
+# Checklist
-Save your validation checks to the Recce checklist with a description of your findings.
+Save validation checks to track your findings and share them with reviewers. The checklist becomes your proof-of-correctness for modeling changes.
-These checks can later be added to your pull request comment as proof-of-correctness for your modeling changes.
+## How It Works
+
+When you run a diff or query in Recce, you can add the result to your checklist. Each check captures:
+
+- The validation type (schema diff, row count, query, etc.)
+- The result at the time of capture
+- Your notes explaining what the result means

- Checklist
+ Checklist with saved validation checks
+### Adding Checks
-## Diffs performed via the Explore Change dropdown menu
-
-For the majority of diffs, which are performed via the Explore Change dropdown menu, the Check can be added by clicking the Add to Checklist button in the results panel:
+For diffs performed via the Explore Change dropdown menu, click **Add to Checklist** in the results panel:
{: .shadow}
- Add a Check by clicking the Add to Checklist button in the diff results panel
+ Add to Checklist button in diff results panel
-An example performing a Top-K diff and adding the results to the Checklist:
-
{: .shadow}
- Example adding a Top-K Diff to the Checklist
+ Example: Adding a Top-K Diff to the Checklist
-## Add to Checklist
+### Re-running Checks
+
+After making additional changes to your models, re-run checks from the checklist to verify your updates. This lets you iterate until all validations pass.
+
+For checks you want to run on every PR automatically, see [Preset Checks](preset-checks.md).
-The Recce Checklist provides a way to record the results of a data check during change exploration. The purpose of adding Checks to the Checklist is to enable you to:
+## When to Use
-- Save Checks with notes of your interpretation of the data
-- Re-run checks following further data modeling changes
-- Share Checks as part of PR or stakeholder review
+- **During development** - Save checks as you validate each change, building evidence as you go
+- **Before creating a PR** - Compile all validations that prove your changes are correct
+- **For recurring validations** - Use [Preset Checks](preset-checks.md) to automate checks that should run on every PR
+- **Stakeholder review** - [Share](share.md) your checklist to give reviewers full context
-## Preset Check
+## Related
-Preset checks can be the fixed checks that are generated every time a new Recce instance is initiated.
\ No newline at end of file
+- [Preset Checks](preset-checks.md) - Automate recurring validation checks
+- [Share](share.md) - Share your checklist with reviewers
diff --git a/docs/6-collaboration/invitation.md b/docs/6-collaboration/invitation.md
deleted file mode 100644
index bf108754..00000000
--- a/docs/6-collaboration/invitation.md
+++ /dev/null
@@ -1,57 +0,0 @@
----
-title: Invitation
----
-
-## Inviting Team Members to Your Recce Organization
-
-To collaborate effectively within Recce Cloud, you can invite team members to join your organization. Follow these steps to send invitations:
-
-### Step 1: Access Organization Settings
-- Log in to your Recce Cloud account
-- Navigate to **Settings** → **Organization** from the side panel
-- Alternatively, you can access directly via: `https://cloud.reccehq.com/settings#organization`
-- In the Organization Settings section, select your desired organization
-
-{: .shadow}
-
-### Step 2: Invite Members
-
-!!! Note
- Please use the SSO email address if your member uses SSO login.
-
-- In the **Members** section, click the **Invite Members** button
-- Enter the email addresses of the individuals you wish to invite
-- Select the appropriate role for each invitee based on the roles below:
-
-#### Organization Roles
-
-| Role | Key Responsibilities | Permissions |
-|------|---------------------|-------------|
-| **ADMIN** | Full organization management | • Update organization info • Manage member roles • Remove members • Transfer storage regions |
-| **MEMBER** | Upload metadata and launch instances | • Upload metadata • Launch Recce instances • View organization info and member list • Leave organization |
-| **VIEWER** | Only instance launch | • Launch Recce instances • View organization info and member list • Leave organization |
-
-{: .shadow}
-
-### Step 3: Send Invitation
-- Click the **Send Invitation** button to dispatch the invites
-- Each invitee will receive an email with a link to join your organization
-- Logged-in invitees will also see notifications on their home page or can view pending invitations in **Settings** → **Organization**
-
-## For Invited Users
-
-When you receive an invitation to join a Recce organization, you have several ways to respond:
-
-### Immediate Response
-- Upon login, you'll see a notification modal with the invitations
-- You can immediately accept or decline the invitations directly from the notification without navigating elsewhere
-
-{: .shadow}
-
-### Managing Invitations Later
-- Navigate to **Settings** → **Organization** in your account
-- View all pending invitations in the "Pending Invitations" section
-- Review the organization and role
-- Accept or decline each invitation as needed
-
-{: .shadow}
diff --git a/docs/6-collaboration/preset-checks.md b/docs/6-collaboration/preset-checks.md
new file mode 100644
index 00000000..5f17c806
--- /dev/null
+++ b/docs/6-collaboration/preset-checks.md
@@ -0,0 +1,143 @@
+---
+title: Preset Checks
+---
+
+# Preset Checks
+
+Define validation checks that run automatically for every PR. Preset checks ensure consistent validation across your team.
+
+**Goal:** Configure recurring checks that execute automatically when Recce runs.
+
+## Prerequisites
+
+- [x] Recce Cloud account or Recce installed in your dbt project
+- [x] At least one validation check you want to automate
+
+## Recce Cloud
+
+Create preset checks directly in the Recce Cloud interface. When a PR is created, preset checks run automatically.
+
+### From the checklist
+
+Mark any existing check as a preset check:
+
+1. Run a diff or query in your Recce session
+2. Add the result to your checklist
+3. Open the check menu and select **Mark as Preset Check**
+
+{: .shadow}
+
+### From project settings
+
+Create preset checks directly in your project configuration:
+
+1. Navigate to your project's **Preset Checks** page
+2. Click **Add Preset Check**
+3. Configure the check type and parameters
+
+{: .shadow}
+
+{: .shadow}
+
+When preset checks are configured, they run automatically each time a PR is created.
+
+## Recce OSS
+
+For local Recce, configure preset checks in `recce.yml` and run them manually or in CI.
+
+### Configure in recce.yml
+
+1. Start by adding a check to your checklist manually:
+
+ 1. Run a diff or query in Recce
+ 2. Add the result to your checklist
+
+ {: .shadow}
+
+ 3. Open the check menu and select **Get Preset Check Template**
+ 4. Copy the YAML config from the dialog
+
+ {: .shadow}
+
+2. Paste the config into `recce.yml` at your project root:
+
+ ```yaml
+ # recce.yml
+ checks:
+ - name: Query diff of customers
+ description: |
+ This is the demo preset check.
+
+ Please run the query and paste the screenshot to the PR comment.
+ type: query_diff
+ params:
+ sql_template: select * from {{ ref("customers") }}
+ view_options:
+ primary_keys:
+ - customer_id
+ ```
+
+### Run preset checks
+
+#### In Recce server
+
+When you launch Recce, preset checks are loaded into your checklist automatically but are not yet executed:
+
+{: .shadow}
+
+Click **Run Query** to execute each check.
+
+#### With recce run
+
+Execute all preset checks from the command line:
+
+```bash
+recce run
+```
+
+Output:
+```
+───────────────────────────────── DBT Artifacts ─────────────────────────────────
+Base:
+ Manifest: 2024-04-10 08:54:41.546402+00:00
+ Catalog: 2024-04-10 08:54:42.251611+00:00
+Current:
+ Manifest: 2024-04-22 03:24:11.262489+00:00
+ Catalog: 2024-04-10 06:15:13.813125+00:00
+───────────────────────────────── Preset checks ─────────────────────────────────
+ Recce Preset Checks
+──────────────────────────────────────────────────────────────────────────────
+Status Name Type Execution Time Failed Reason
+──────────────────────────────────────────────────────────────────────────────
+[Success] Query of customers Query Diff 0.10 seconds N/A
+──────────────────────────────────────────────────────────────────────────────
+The state file is stored at [recce_state.json]
+```
+
+View results by launching the server with the state file:
+
+```bash
+recce server recce_state.json
+```
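
Because the state file is plain JSON, check results can also be inspected programmatically. The structure below is a simplified assumption for illustration; the actual `recce_state.json` schema may differ:

```python
import json

# Hypothetical, simplified state-file content; field names here are
# assumptions for illustration, not the documented schema.
state = json.loads("""
{
  "checks": [
    {"name": "Query diff of customers", "type": "query_diff", "is_checked": true}
  ]
}
""")

for check in state["checks"]:
    mark = "x" if check.get("is_checked") else " "
    print(f"[{mark}] {check['name']} ({check['type']})")
# [x] Query diff of customers (query_diff)
```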
+
+### Verification
+
+Confirm preset checks work:
+
+1. Add a check config to `recce.yml`
+2. Run `recce run`
+3. Verify the check appears in output with `[Success]` status
+4. Launch `recce server recce_state.json` and confirm the check appears in your checklist
+
+### Troubleshooting
+
+| Issue | Solution |
+|-------|----------|
+| Check not appearing | Verify `recce.yml` is in project root and YAML syntax is valid |
+| Check fails to run | Check that the SQL template references valid models |
+| Wrong results | Ensure base and current artifacts are up to date |
+
+## Related
+
+- [Checklist](checklist.md) - Manually add checks during development
+- [Configuration](../8-technical-concepts/configuration.md) - Full recce.yml reference
diff --git a/docs/6-collaboration/share.md b/docs/6-collaboration/share.md
index 6640e3ff..90f4cd01 100644
--- a/docs/6-collaboration/share.md
+++ b/docs/6-collaboration/share.md
@@ -2,152 +2,126 @@
title: Share
---
-## Share Recce Sessions Securely
+# Share
-Recce provides two secure methods to share your validation results with team members and stakeholders, ensuring everyone can access the insights they need for informed decision-making.
+Share your validation results with team members and stakeholders.
-## Sharing Methods Overview
+**Goal:** Give reviewers access to your Recce session so they can explore validation results.
-
- {: .shadow}
- Access sharing options from the Share button
-
-Choose the sharing method that best fits your collaboration needs:
+## Recce Cloud
+
+Share your session by copying the URL directly from your browser. Team members with organization access can view any session immediately.
-1. **Copy to Clipboard** - Quick screenshot sharing for PR comments and discussions
-2. **Recce Cloud Sharing** - Full interactive session sharing with complete context
+To invite team members to your organization, see [Admin Setup](../3-using-recce/admin-setup.md#5-invite-team-members).
-## Method 1: Copy to Clipboard
+## Recce OSS
-For quick sharing of specific results, use the **Copy to Clipboard** button found in diff results. This feature captures a screenshot image that you can paste directly into PR comments, Slack messages, or other communication channels.
+For local Recce sessions, use these sharing methods:
+
+| Method | Best For | Requires |
+|--------|----------|----------|
+| **Copy to Clipboard** | Quick screenshots in PR comments | Nothing |
+| **Upload to Recce Cloud** | Full interactive session access | Recce Cloud account |
+
+### Copy to Clipboard
+
+For quick sharing of specific results, use **Copy to Clipboard** in any diff result. Paste the screenshot directly into PR comments, Slack, or other channels.
{: .shadow}
- Copy a diff result screenshot to the clipboard and paste to GitHub
+ Copy diff result and paste to GitHub
!!! note "Browser Compatibility"
- Firefox does not support copying images to the clipboard. Instead, Recce displays a modal where you can download the image locally or right-click to copy the image.
+ Firefox does not support copying images to the clipboard. Recce displays a modal where you can download or right-click to copy the image.
-## Method 2: Recce Cloud Sharing
+### Upload to Recce Cloud
-When stakeholders need full context but don't have the environment to run Recce locally, use Recce Cloud sharing. This method creates a read-only link that provides complete access to your validation results.
+When reviewers need full context, upload your session to Recce Cloud. This creates a shareable link with complete access to your validation results.
-### Benefits of Recce Cloud Sharing
+**Benefits:**
-- **No Setup Required** - Stakeholders access results instantly in their browser
-- **Full Context** - Complete lineage exploration, query results, and validation checklists
-- **Read-Only Access** - Secure viewing without ability to modify your work
-- **Simple Link Sharing** - Share via any communication channel
+- No setup required for viewers
+- Full lineage exploration, query results, and checklists
+- Read-only access (secure viewing)
+- Simple link sharing via any channel
!!! warning "Access Control"
- Anyone with the shared link can view your Recce session after signing into Recce Cloud. For restricted access requirements, [contact our team](https://cal.com/team/recce/chat).
-
-## Setting Up Recce Cloud Sharing
+ Anyone with the link can view your session after signing into Recce Cloud. For restricted access, [contact our team](https://cal.com/team/recce/chat).
-The first time you share via Recce Cloud, you'll need to associate your local Recce with your cloud account. This one-time setup enables secure hosting of your state files.
-### Step 1: Enable Recce Cloud Connection
+#### First-time setup
-Launch the Recce server and click the **Use Recce Cloud** button if your local installation isn't already connected to Recce Cloud.
+1. Launch Recce server and click **Use Recce Cloud** if not already connected
-{: .shadow}
+ {: .shadow}
-### Step 2: Sign In and Grant Access
+2. Sign in and authorize your local Recce to connect with Recce Cloud
-After successful login, authorize your local Recce to connect with Recce Cloud. This authorization enables the sharing functionality and secure state file hosting.
+ {: .shadow}
-{: .shadow}
+3. Refresh the page to activate the connection. The **Share** button is now available.
-### Step 3: Complete the Setup
+ {: .shadow}
-Refresh the Recce page to activate the cloud connection. Once connected, the **Share** button will be available, allowing you to generate shareable links.
-
-{: .shadow}
-
-!!! tip "Alternative Setup Method"
- You can also connect to Recce Cloud using the command line:
-
+!!! tip "Alternative: CLI Setup"
```bash
recce connect-to-cloud
```
-
- This command handles the sign-in and authorization process directly from your terminal.
-
-
-## Manual Configuration (Advanced)
-
-For containerized environments or when you prefer manual setup, you can configure the Recce Cloud connection directly using your API token.
-### Step 1: Retrieve Your API Token
+#### Manual configuration (advanced)
-Sign in to Recce Cloud and copy your API token from the [personal settings page](https://cloud.reccehq.com/settings#tokens).
+For containerized environments or manual setup:
-{: .shadow}
+1. Get your API token from [Recce Cloud settings](https://cloud.reccehq.com/settings#tokens)
-### Step 2: Configure Local Connection
+ {: .shadow}
-Choose one of the following methods to configure your local Recce:
+2. Configure using one of these methods:
-#### Option A: Command Line Flag
-
-Launch Recce server with your API token. The token will be saved to your profile for future use:
-
-```bash
-recce server --api-token
-```
-
-#### Option B: Profile Configuration
-
-Edit your `~/.recce/profile.yml` file to include the API token:
-
-```yaml
-api_token:
-```
-
-!!! info "Configuration File Location"
- **Mac/Linux:**
- ```shell
- cd ~/.recce
- ```
-
- **Windows:**
- ```powershell
- cd ~\.recce
+ **Option A: Command line flag**
+ ```bash
+ recce server --api-token <token>
```
-
- Navigate to `C:\Users\\.recce` or use the PowerShell command above.
+ **Option B: Profile configuration**
+ ```yaml
+ # ~/.recce/profile.yml
+ api_token: <token>
+ ```
-## Command Line Sharing
+#### Share from UI or CLI
-For automated workflows or when working with existing state files, use the `recce share` command to generate shareable links directly from the terminal.
+**From UI:** Click the **Share** button and select Recce Cloud.
-### Basic Sharing
+
+ {: .shadow}
+ Access sharing options from the Share button
+
-If your Recce is already connected to Recce Cloud:
+**From CLI:** Share existing state files directly from the terminal:
```bash
+# If already connected to Recce Cloud
recce share
-```
-### Sharing with API Token
-
-For environments where Recce isn't pre-configured with cloud access:
-
-```bash
+# With API token
+recce share --api-token <token>
```
{: .shadow}
-## Security Best Practices
+## Verification
+
+Confirm sharing works:
-When sharing Recce sessions, consider these security guidelines:
+1. Add a check to your checklist
+2. Share via your preferred method (URL for Cloud, Share button for OSS)
+3. Open the link in an incognito window
+4. Verify you can view the session
-- **Review Content**: Ensure shared sessions don't contain sensitive data before generating links
-- **Access Control**: Be aware that anyone with the link can view your session after signing in
-- **Token Security**: Keep your API tokens secure and rotate them periodically
-- **Team Communication**: Share links through secure channels when possible
+## Related
-For additional security requirements or enterprise features, [contact our team](https://cal.com/team/recce/chat) to discuss custom access controls.
+- [Admin Setup](../3-using-recce/admin-setup.md) - Invite team members to your organization
+- [Checklist](checklist.md) - Save validation checks to share
+- [Preset Checks](preset-checks.md) - Automate recurring checks
diff --git a/docs/7-cicd/ci-cd-getting-started.md b/docs/7-cicd/ci-cd-getting-started.md
index eda39817..1bfda5c3 100644
--- a/docs/7-cicd/ci-cd-getting-started.md
+++ b/docs/7-cicd/ci-cd-getting-started.md
@@ -65,14 +65,14 @@ Before setting up, ensure you have:
Both GitHub and GitLab follow the same simple pattern:
### 1. Setup CD - Auto-update baseline
-[**Setup CD Guide**](./setup-cd.md) - Configure automatic baseline updates when you merge to main
+[**Setup CD Guide**](../2-getting-started/setup-cd.md) - Configure automatic baseline updates when you merge to main
- Updates your production baseline artifacts automatically
- Runs on merge to main + optional scheduled updates
- Works with both GitHub Actions and GitLab CI/CD
### 2. Setup CI - Auto-validate PRs/MRs
-[**Setup CI Guide**](./setup-ci.md) - Enable automatic validation for every PR/MR
+[**Setup CI Guide**](../2-getting-started/setup-ci.md) - Enable automatic validation for every PR/MR
- Validates data changes in every pull request or merge request
- Catches issues before they reach production
@@ -84,9 +84,9 @@ Start with **CD first** to establish your baseline (production artifacts), then
## Next Steps
-1. **[Setup CD](./setup-cd.md)** - Establish automatic baseline updates
-2. **[Setup CI](./setup-ci.md)** - Enable PR/MR validation
-3. Review [best practices](./best-practices-prep-env.md) for environment preparation
+1. **[Setup CD](../2-getting-started/setup-cd.md)** - Establish automatic baseline updates
+2. **[Setup CI](../2-getting-started/setup-ci.md)** - Enable PR/MR validation
+3. Review [Environment Best Practices](../2-getting-started/environment-best-practices.md) for environment preparation
## Related workflows
diff --git a/docs/7-reference/cli-reference.md b/docs/7-reference/cli-reference.md
new file mode 100644
index 00000000..8a22fdac
--- /dev/null
+++ b/docs/7-reference/cli-reference.md
@@ -0,0 +1,296 @@
+---
+title: CLI Reference
+---
+
+# CLI Reference
+
+This reference documents the command-line interfaces for Recce OSS (`recce`) and Recce Cloud (`recce-cloud`).
+
+## Overview
+
+Recce provides two CLI tools:
+
+- **`recce`** - The open source CLI for local data validation and diffing
+- **`recce-cloud`** - The cloud CLI for uploading artifacts to Recce Cloud in CI/CD workflows
+
+## recce Commands
+
+### recce server
+
+Starts the Recce web server for interactive data validation.
+
+**Syntax:**
+
+```bash
+recce server [OPTIONS] [STATE_FILE]
+```
+
+**Arguments:**
+
+| Argument | Description |
+|----------|-------------|
+| `STATE_FILE` | Optional path to a state file. If specified and exists, loads the state. If specified and does not exist, creates a new state file at that path. |
+
+**Options:**
+
+| Option | Description |
+|--------|-------------|
+| `--review` | Enable review mode. Uses dbt artifacts from the state file instead of `target/` and `target-base/` directories. |
+| `--api-token <token>` | API token for Recce Cloud connection. |
+
+**Examples:**
+
+Start server with default settings:
+
+```bash
+recce server
+```
+
+Start server with a state file:
+
+```bash
+recce server my_recce_state.json
+```
+
+Start server in review mode (uses artifacts from state file):
+
+```bash
+recce server --review my_recce_state.json
+```
+
+Start server with Recce Cloud connection:
+
+```bash
+recce server --api-token <token>
+```
+
+**Notes:**
+
+- The server runs on `http://localhost:8000` by default
+- Requires dbt artifacts in `target/` (current) and `target-base/` (base) directories unless using `--review` mode
+- State is auto-saved when the Save button is clicked in the UI
+
+### recce run
+
+Executes preset checks and saves results to a state file.
+
+**Syntax:**
+
+```bash
+recce run [OPTIONS]
+```
+
+**Options:**
+
+| Option | Description |
+|--------|-------------|
+| `--state-file <path>` | Path to state file. Default: `recce_state.json` |
+| `--github-pull-request-url <url>` | GitHub PR URL for CI context |
+
+**Examples:**
+
+Run all preset checks:
+
+```bash
+recce run
+```
+
+Run checks and save to specific state file:
+
+```bash
+recce run --state-file my_state.json
+```
+
+Run checks with GitHub PR context:
+
+```bash
+recce run --github-pull-request-url ${{ github.event.pull_request.html_url }}
+```
+
+**Notes:**
+
+- Executes all checks defined in `recce.yml`
+- Outputs results to the state file (default: `recce_state.json`)
+- Used primarily in CI/CD pipelines for automated validation
+
+### recce summary
+
+Generates a summary report from a state file.
+
+**Syntax:**
+
+```bash
+recce summary STATE_FILE
+```
+
+**Arguments:**
+
+| Argument | Description |
+|----------|-------------|
+| `STATE_FILE` | Path to the state file to summarize |
+
+**Examples:**
+
+Generate summary from state file:
+
+```bash
+recce summary recce_state.json
+```
+
+Generate summary and save to file:
+
+```bash
+recce summary recce_state.json > recce_summary.md
+```
+
+**Notes:**
+
+- Outputs summary in Markdown format
+- Useful for generating PR comments in CI/CD workflows
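In CI, `recce run` and `recce summary` are typically chained: run the preset checks, then turn the resulting state file into a Markdown comment. A minimal GitHub Actions sketch (the dbt build steps and the PR-comment mechanism are placeholders, not part of Recce — adapt them to your pipeline):

```yaml
# Illustrative CI job: run preset checks, then post the summary to the PR.
jobs:
  recce:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ... build dbt artifacts into target/ and target-base/ here ...
      - name: Run Recce preset checks
        run: recce run --github-pull-request-url ${{ github.event.pull_request.html_url }}
      - name: Generate summary
        run: recce summary recce_state.json > recce_summary.md
      # ... post recce_summary.md as a PR comment with your preferred action ...
```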
+
+### recce debug
+
+Verifies Recce configuration and environment setup.
+
+**Syntax:**
+
+```bash
+recce debug
+```
+
+**Examples:**
+
+```bash
+recce debug
+```
+
+**Notes:**
+
+- Checks for required artifacts in `target/` and `target-base/` directories
+- Verifies warehouse connection
+- Useful for troubleshooting setup issues before launching the server
+
+## recce-cloud Commands
+
+The `recce-cloud` CLI is a lightweight tool for uploading dbt artifacts to Recce Cloud in CI/CD pipelines.
+
+### Installation
+
+```bash
+pip install recce-cloud
+```
+
+### recce-cloud upload
+
+Uploads dbt artifacts to Recce Cloud.
+
+**Syntax:**
+
+```bash
+recce-cloud upload [OPTIONS]
+```
+
+**Options:**
+
+| Option | Description |
+|--------|-------------|
+| `--type <TYPE>` | Session type: `prod` for baseline, omit for PR/MR auto-detection |
+| `--target-path <PATH>` | Path to dbt artifacts directory. Default: `target/` |
+| `--dry-run` | Test configuration without uploading |
+
+**Examples:**
+
+Upload baseline artifacts (for CD workflow):
+
+```bash
+recce-cloud upload --type prod
+```
+
+Upload PR/MR artifacts (auto-detected):
+
+```bash
+recce-cloud upload
+```
+
+Upload from custom artifact path:
+
+```bash
+recce-cloud upload --target-path custom-target
+```
+
+Test configuration without uploading:
+
+```bash
+recce-cloud upload --dry-run
+```
+
+**Notes:**
+
+- Automatically detects CI platform (GitHub Actions, GitLab CI)
+- Uses `GITHUB_TOKEN` for GitHub authentication
+- Uses `CI_JOB_TOKEN` for GitLab authentication
+- Session type is auto-detected from PR/MR context when `--type` is omitted
+
+**Environment Variables:**
+
+| Platform | Variable | Description |
+|----------|----------|-------------|
+| GitHub | `GITHUB_TOKEN` | Authentication token (automatically available in Actions) |
+| GitLab | `CI_JOB_TOKEN` | Authentication token (automatically available in CI/CD) |
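Putting these pieces together, a minimal CD sketch in GitHub Actions that refreshes the baseline on every merge to `main` (the dbt build step is an assumption — substitute your own commands):

```yaml
# Illustrative CD workflow: rebuild prod artifacts and upload them as the
# Recce Cloud baseline whenever main changes.
name: Update Recce baseline
on:
  push:
    branches: [main]
jobs:
  upload-baseline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ... run dbt to produce manifest.json and catalog.json in target/ ...
      - name: Upload baseline artifacts
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          pip install recce-cloud
          recce-cloud upload --type prod
```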
+
+### Expected Output
+
+Successful upload displays:
+
+```
+─────────────────────────── CI Environment Detection ───────────────────────────
+Platform: github-actions
+Session Type: prod
+Commit SHA: abc123de...
+Source Branch: main
+Repository: your-org/your-repo
+Info: Using GITHUB_TOKEN for platform-specific authentication
+────────────────────────── Creating/touching session ───────────────────────────
+Session ID: f8b0f7ca-ea59-411d-abd8-88b80b9f87ad
+Uploading manifest from path "target/manifest.json"
+Uploading catalog from path "target/catalog.json"
+Notifying upload completion...
+──────────────────────────── Uploaded Successfully ─────────────────────────────
+Uploaded dbt artifacts to Recce Cloud for session ID "f8b0f7ca-ea59-411d-abd8-88b80b9f87ad"
+```
+
+## Common Workflows
+
+### Local Development
+
+```bash
+# Start interactive session
+recce server
+
+# Or continue from saved state
+recce server my_state.json
+```
+
+### CI/CD Pipeline
+
+```bash
+# CD: Update baseline after merge to main
+recce-cloud upload --type prod
+
+# CI: Upload PR artifacts for validation
+recce-cloud upload
+```
+
+### Review Workflow
+
+```bash
+# Reviewer loads state file in review mode
+recce server --review recce_state.json
+```
+
+## Related
+
+- [Configuration](./configuration.md) - Preset check configuration in `recce.yml`
+- [State File](./state-file.md) - State file format and usage
+- [Setup CI](../7-cicd/setup-ci.md) - CI/CD integration guide
+- [Setup CD](../7-cicd/setup-cd.md) - CD workflow setup
diff --git a/docs/7-reference/configuration.md b/docs/7-reference/configuration.md
new file mode 100644
index 00000000..157453b0
--- /dev/null
+++ b/docs/7-reference/configuration.md
@@ -0,0 +1,353 @@
+---
+title: Configuration
+---
+
+# Configuration
+
+This reference documents the `recce.yml` configuration file, which defines preset checks and their parameters for automated data validation.
+
+## Overview
+
+Recce's configuration lives in `recce.yml` at the root of your dbt project. Use this file to define preset checks that run automatically with `recce server` or `recce run`.
+
+## File Location
+
+| Path | Description |
+|------|-------------|
+| `recce.yml` | Main configuration file in dbt project root |
+
+## Preset Checks
+
+Preset checks define automated validations that execute when you run `recce server` or `recce run`. Each check specifies a type of comparison and its parameters.
+
+### Check Structure
+
+```yaml
+# recce.yml
+checks:
+ - name: Query diff of customers
+ description: |
+ This is the demo preset check.
+
+ Please run the query and paste the screenshot to the PR comment.
+ type: query_diff
+ params:
+ sql_template: select * from {{ ref("customers") }}
+ view_options:
+ primary_keys:
+ - customer_id
+```
+
+### Check Fields
+
+| Field | Description | Type | Required |
+|-------|-------------|------|----------|
+| `name` | The title of the check | string | Yes |
+| `description` | The description of the check | string | |
+| `type` | The type of the check (see types below) | string | Yes |
+| `params` | The parameters for running the check | object | Yes |
+| `view_options` | The options for presenting the run result | object | |
+
+## Check Types
+
+### Row Count Diff
+
+Compares row counts between base and current environments.
+
+**Type:** `row_count_diff`
+
+**Parameters:**
+
+| Field | Description | Type | Required |
+|-------|-------------|------|----------|
+| `node_names` | List of node names | `string[]` | *1 |
+| `node_ids` | List of node IDs | `string[]` | *1 |
+| `select` | Node selection syntax. See [dbt docs](https://docs.getdbt.com/reference/node-selection/syntax) | `string` | |
+| `exclude` | Node exclusion syntax. See [dbt docs](https://docs.getdbt.com/reference/node-selection/syntax) | `string` | |
+| `packages` | Package filter | `string[]` | |
+| `view_mode` | Quick filter for changed models | `all`, `changed_models` | |
+
+**Notes:**
+
+*1: If `node_ids` or `node_names` is specified, it will be used; otherwise, nodes will be selected using the criteria defined by `select`, `exclude`, `packages`, and `view_mode`.
+
+**Examples:**
+
+Using node selector:
+
+```yaml
+checks:
+ - name: Row count for modified tables
+ description: Check row counts for all modified table models
+ type: row_count_diff
+ params:
+ select: state:modified,config.materialized:table
+ exclude: tag:dev
+```
+
+Using node names:
+
+```yaml
+checks:
+ - name: Row count for key models
+ description: Check row counts for customers and orders
+ type: row_count_diff
+ params:
+ node_names: ['customers', 'orders']
+```
+
+### Schema Diff
+
+Compares schema structure between base and current environments.
+
+**Type:** `schema_diff`
+
+**Parameters:**
+
+| Field | Description | Type | Required |
+|-------|-------------|------|----------|
+| `node_id` | The node ID or list of node IDs to check | `string` or `string[]` | *1 |
+| `select` | Node selection syntax. See [dbt docs](https://docs.getdbt.com/reference/node-selection/syntax) | `string` | |
+| `exclude` | Node exclusion syntax. See [dbt docs](https://docs.getdbt.com/reference/node-selection/syntax) | `string` | |
+| `packages` | Package filter | `string[]` | |
+| `view_mode` | Quick filter for changed models | `all`, `changed_models` | |
+
+**Notes:**
+
+*1: If `node_id` is specified, it will be used; otherwise, nodes will be selected using the criteria defined by `select`, `exclude`, `packages`, and `view_mode`.
+
+**Examples:**
+
+Using node selector:
+
+```yaml
+checks:
+ - name: Schema diff for modified models
+ description: Check schema changes for modified models and downstream
+ type: schema_diff
+ params:
+ select: state:modified+
+ exclude: tag:dev
+```
+
+Using node ID:
+
+```yaml
+checks:
+ - name: Schema diff for customers
+ description: Check schema for customers model
+ type: schema_diff
+ params:
+ node_id: model.jaffle_shop.customers
+```
+
+### Lineage Diff
+
+Compares lineage structure between base and current environments.
+
+**Type:** `lineage_diff`
+
+**Parameters:**
+
+| Field | Description | Type | Required |
+|-------|-------------|------|----------|
+| `select` | Node selection syntax. See [dbt docs](https://docs.getdbt.com/reference/node-selection/syntax) | `string` | |
+| `exclude` | Node exclusion syntax. See [dbt docs](https://docs.getdbt.com/reference/node-selection/syntax) | `string` | |
+| `packages` | Package filter | `string[]` | |
+| `view_mode` | Quick filter for changed models | `all`, `changed_models` | |
+
+**Examples:**
+
+```yaml
+checks:
+ - name: Lineage diff for modified models
+ description: Check lineage changes for modified models and downstream
+ type: lineage_diff
+ params:
+ select: state:modified+
+ exclude: tag:dev
+```
+
+### Query
+
+Executes a custom SQL query in the current environment.
+
+**Type:** `query`
+
+**Parameters:**
+
+| Field | Description | Type | Required |
+|-------|-------------|------|----------|
+| `sql_template` | SQL statement using Jinja templating | `string` | Yes |
+
+**Examples:**
+
+```yaml
+checks:
+ - name: Customer count
+ description: Get total customer count
+ type: query
+ params:
+ sql_template: select count(*) from {{ ref("customers") }}
+```
+
+### Query Diff
+
+Compares query results between base and current environments.
+
+**Type:** `query_diff`
+
+**Parameters:**
+
+| Field | Description | Type | Required |
+|-------|-------------|------|----------|
+| `sql_template` | SQL statement using Jinja templating | `string` | Yes |
+| `base_sql_template` | SQL statement for base environment (if different) | `string` | |
+| `primary_keys` | Primary keys for record identification | `string[]` | *1 |
+
+**Notes:**
+
+*1: If `primary_keys` is specified, the query diff is performed in the warehouse. Otherwise, the query result (up to the first 2000 records) is returned, and the diff is executed on the client side.
+
+**Examples:**
+
+```yaml
+checks:
+ - name: Customer data diff
+ description: Compare customer data between environments
+ type: query_diff
+ params:
+ sql_template: select * from {{ ref("customers") }}
+ primary_keys:
+ - customer_id
+```
+
+### Value Diff
+
+Compares values for a specific model between environments.
+
+**Type:** `value_diff` or `value_diff_detail`
+
+**Parameters:**
+
+| Field | Description | Type | Required |
+|-------|-------------|------|----------|
+| `model` | The name of the model | `string` | Yes |
+| `primary_key` | Primary key(s) for record identification | `string` or `string[]` | Yes |
+| `columns` | List of columns to include in diff | `string[]` | |
+
+**Examples:**
+
+Value diff summary:
+
+```yaml
+checks:
+ - name: Customer value diff
+ description: Compare customer values
+ type: value_diff
+ params:
+ model: customers
+ primary_key: customer_id
+```
+
+Value diff with detailed rows:
+
+```yaml
+checks:
+ - name: Customer value diff (detailed)
+ description: Compare customer values with row details
+ type: value_diff_detail
+ params:
+ model: customers
+ primary_key: customer_id
+```
+
+### Profile Diff
+
+Compares statistical profiles of a model between environments.
+
+**Type:** `profile_diff`
+
+**Parameters:**
+
+| Field | Description | Type | Required |
+|-------|-------------|------|----------|
+| `model` | The name of the model | `string` | Yes |
+
+**Examples:**
+
+```yaml
+checks:
+ - name: Customer profile diff
+ description: Compare statistical profile of customers
+ type: profile_diff
+ params:
+ model: customers
+```
+
+### Histogram Diff
+
+Compares histogram distributions for a column between environments.
+
+**Type:** `histogram_diff`
+
+**Parameters:**
+
+| Field | Description | Type | Required |
+|-------|-------------|------|----------|
+| `model` | The name of the model | `string` | Yes |
+| `column_name` | The name of the column | `string` | Yes |
+| `column_type` | The type of the column | `string` | Yes |
+
+**Examples:**
+
+```yaml
+checks:
+ - name: CLV histogram diff
+ description: Compare customer lifetime value distribution
+ type: histogram_diff
+ params:
+ model: customers
+ column_name: customer_lifetime_value
+ column_type: BIGINT
+```
+
+### Top-K Diff
+
+Compares top-K values for a column between environments.
+
+**Type:** `top_k_diff`
+
+**Parameters:**
+
+| Field | Description | Type | Required |
+|-------|-------------|------|----------|
+| `model` | The name of the model | `string` | Yes |
+| `column_name` | The name of the column | `string` | Yes |
+| `k` | Number of top items to include | `number` | Default: 50 |
+
+**Examples:**
+
+```yaml
+checks:
+ - name: Top 50 customer values
+ description: Compare top 50 customer lifetime values
+ type: top_k_diff
+ params:
+ model: customers
+ column_name: customer_lifetime_value
+ k: 50
+```
+
+## Default Behavior
+
+- Preset checks are loaded from `recce.yml` when Recce starts
+- Checks execute automatically with `recce run`
+- Results are stored in the state file
+- View options control how results are displayed in the UI
+
+## Related
+
+- [Preset Checks Guide](../7-cicd/preset-checks.md) - How to use preset checks in workflows
+- [State File](./state-file.md) - Understanding the state file format
+- [CLI Reference](./cli-reference.md) - Command-line options for running checks
diff --git a/docs/7-reference/state-file.md b/docs/7-reference/state-file.md
new file mode 100644
index 00000000..4b36e6b2
--- /dev/null
+++ b/docs/7-reference/state-file.md
@@ -0,0 +1,145 @@
+---
+title: State File
+---
+
+# State File
+
+This reference documents the Recce state file format, which stores validation results, checks, and environment information.
+
+## Overview
+
+The state file represents the serialized state of a Recce instance. It is a JSON-formatted file containing checks, runs, environment artifacts, and runtime information.
+
+## File Format
+
+| Aspect | Details |
+|--------|---------|
+| Format | JSON |
+| Default name | `recce_state.json` |
+| Location | dbt project root |
+
+## Contents
+
+The state file contains the following information:
+
+- **Checks**: Data from the checks added to the checklist on the Checklist page
+- **Runs**: Each diff execution in Recce corresponds to a run, similar to a query in a data warehouse. Typically, a single run submits a series of queries to the warehouse and retrieves the final results
+- **Environment Artifacts**: Includes `manifest.json` and `catalog.json` files for both the base and current environments
+- **Runtime Information**: Metadata such as Git branch details and pull request (PR) information from the CI runner
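Because the state file is plain JSON, it can be inspected with ordinary JSON tooling. A rough sketch of doing so in Python — the top-level key names `checks` and `runs` are assumptions inferred from the contents above, so verify them against your own `recce_state.json`:

```python
import json

def summarize_state(path):
    """Return (check_count, run_count) for a Recce state file.

    The "checks"/"runs" keys are assumptions; inspect your own
    state file for the exact schema.
    """
    with open(path) as f:
        state = json.load(f)
    checks = state.get("checks", [])
    runs = state.get("runs", [])
    for check in checks:
        # Each check carries at least a name and a type, mirroring recce.yml.
        print(f"- {check.get('name')} ({check.get('type')})")
    return len(checks), len(runs)
```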
+
+## Saving the State File
+
+There are multiple ways to save the state file.
+
+### Save from Web UI
+
+Click the **Save** button at the top of the app. After the first save, Recce continuously writes updates to the state file — effectively an auto-save — for as long as the instance is running. The file is saved with the specified filename in the directory where the `recce server` command is run.
+
+### Export from Web UI
+
+Click the **Export** button located in the top-right corner to download the current Recce state to any location on your machine.
+
+
+### Start with State File
+
+Provide a state file as an argument when launching Recce. If the file does not exist, Recce will create a state file and start with an empty state. If the file exists, Recce will load the state and continue working from it.
+
+```bash
+recce server my_recce_state.json
+```
+
+## Using the State File
+
+The state file can be used in several ways:
+
+### Continue State
+
+Launch Recce with the specified state file to continue from where you left off.
+
+```bash
+recce server my_recce_state.json
+```
+
+### Review Mode
+
+Running Recce with the `--review` option enables review mode. In this mode, Recce uses the dbt artifacts in the state file instead of those in the `target/` and `target-base/` directories. This option is useful for distinguishing between development and review purposes.
+
+```bash
+recce server --review my_recce_state.json
+```
+
+### Import Checklist
+
+To preserve favorite checks across different branches, import a checklist by clicking the **Import** button at the top of the checklist.
+
+### Continue from `recce run`
+
+Execute the checks in the specified state file.
+
+```bash
+recce run --state-file my_recce_state.json
+```
+
+## Workflow Examples
+
+### Development Workflow
+
+In the development workflow, the state file acts as a session for developing a feature. It allows you to store checks to verify the diff results against the base environment.
+
+1. Run the recce server without a state file
+
+ ```bash
+ recce server
+ ```
+
+2. Add checks to the checklist
+3. Save the state by clicking the **Save** or **Export** button
+4. Resume your session by launching Recce with the specific state file
+
+ ```bash
+ recce server recce_issue_1.json
+ ```
+
+
+
+### PR Review Workflow
+
+During the PR review process, the state file serves as a communication medium between the submitter and the reviewer.
+
+1. Start the Recce server without a state file
+
+ ```bash
+ recce server
+ ```
+
+2. Add checks to the checklist
+3. Save the state by clicking the **Save** or **Export** button
+4. Share the state file with the reviewer or attach it as a comment in the pull request
+5. The reviewer reviews the results using the state file
+
+ ```bash
+ recce server --review recce_issue_1.json
+ ```
+
+
+
+## CLI Options
+
+| Option | Description |
+|--------|-------------|
+| `recce server <STATE_FILE>` | Start server with state file |
+| `recce server --review <STATE_FILE>` | Start in review mode using state file artifacts |
+| `recce run --state-file <STATE_FILE>` | Run checks from state file |
+
+## Default Behavior
+
+- If no state file is specified, Recce starts with an empty state
+- State files are saved to the current working directory by default
+- Review mode (`--review`) uses artifacts embedded in the state file
+
+## Related
+
+- [CLI Reference](./cli-reference.md) - Command-line options
+- [Configuration](./configuration.md) - Preset check configuration
+- [PR Review Workflow](../7-cicd/scenario-pr-review.md) - Using state files in reviews
diff --git a/docs/8-community/changelog.md b/docs/8-community/changelog.md
new file mode 100644
index 00000000..c03d0086
--- /dev/null
+++ b/docs/8-community/changelog.md
@@ -0,0 +1,15 @@
+---
+title: Changelog
+---
+
+# Changelog
+
+Stay informed about Recce updates, new features, and improvements.
+
+## Release notes
+
+For a quick overview of recent releases, visit the [Recce Changelog](https://reccehq.com/changelog/).
+
+## Detailed release posts
+
+For in-depth coverage of new features and how to use them, see our [release blog posts](https://blog.reccehq.com/tag/release).
diff --git a/docs/8-community/support.md b/docs/8-community/support.md
new file mode 100644
index 00000000..48eb23d8
--- /dev/null
+++ b/docs/8-community/support.md
@@ -0,0 +1,36 @@
+---
+title: Community & Support
+---
+
+# Community & Support
+
+Connect with the Recce team and community for help and updates.
+
+## Get help
+
+- [Discord](https://discord.com/invite/VpwXRC34jz) - Join our community for discussions and quick support
+- [dbt Slack](https://www.getdbt.com/community/join-the-community) - Find us in the [#tools-recce](https://getdbt.slack.com/archives/C05C28V7CPP) channel
+- [Email](mailto:help@reccehq.com) - Reach us at help@reccehq.com
+
+## Report issues
+
+Found a bug or have a feature request? Open a [GitHub Issue](https://github.com/DataRecce/recce/issues) on our repository.
+
+## Follow Recce
+
+Stay updated with news and insights from the team:
+
+- [LinkedIn](https://www.linkedin.com/company/datarecce)
+- [Recce Blog](https://blog.reccehq.com/)
+- [X (Twitter)](https://x.com/DataRecce)
+- [Mastodon](https://mastodon.social/@DataRecce)
+- [Bluesky](https://bsky.app/profile/datarecce.bsky.social)
+
+## Subscribe to our newsletter
+
+Stay updated with Recce news, data engineering insights, and product updates.
+
+
+
+Sign up for Recce Updates
+
+
diff --git a/docs/CLEANUP-TODO.md b/docs/CLEANUP-TODO.md
new file mode 100644
index 00000000..65f5909c
--- /dev/null
+++ b/docs/CLEANUP-TODO.md
@@ -0,0 +1,85 @@
+# Post-Merge Cleanup Tasks
+
+This document tracks files to delete after all documentation restructuring PRs (6-9) are merged into `docs-v3`.
+
+## Important
+
+- Do NOT delete these files until all PRs are merged
+- The redirects plugin in `mkdocs.yml` handles URL redirections automatically
+- Delete this file after cleanup is complete
+
+## Files to Delete
+
+### From PR 6 - Lineage and Data Diffing Consolidation
+
+**Old Visualized Change section:**
+
+- `docs/3-visualized-change/lineage.md` → Moved to `5-what-you-can-explore/lineage-diff.md`
+
+**Old Downstream Impacts section:**
+
+- `docs/4-downstream-impacts/impact-radius.md` → Moved to `5-what-you-can-explore/impact-radius.md`
+- `docs/4-downstream-impacts/breaking-change-analysis.md` → Moved to `5-what-you-can-explore/breaking-change-analysis.md`
+- `docs/4-downstream-impacts/metadata-first.md` → Deprecated (if exists)
+- `docs/4-downstream-impacts/transformation-types.md` → Deprecated (if exists)
+
+**Old Data Diffing section:**
+
+- `docs/5-data-diffing/row-count-diff.md` → Consolidated into `5-what-you-can-explore/data-diffing.md`
+- `docs/5-data-diffing/profile-diff.md` → Consolidated into `5-what-you-can-explore/data-diffing.md`
+- `docs/5-data-diffing/value-diff.md` → Consolidated into `5-what-you-can-explore/data-diffing.md`
+- `docs/5-data-diffing/topK-diff.md` → Consolidated into `5-what-you-can-explore/data-diffing.md`
+- `docs/5-data-diffing/histogram-diff.md` → Consolidated into `5-what-you-can-explore/data-diffing.md`
+- `docs/5-data-diffing/query.md` → Consolidated into `5-what-you-can-explore/data-diffing.md`
+
+### From PR 7 - CI/CD Reorganization
+
+- `docs/7-cicd/preset-checks.md` → Moved to `6-collaboration/preset-checks.md`
+
+### From PR 8 - Reference Section
+
+- `docs/8-technical-concepts/configuration.md` → Moved to `7-reference/configuration.md`
+- `docs/8-technical-concepts/state-file.md` → Moved to `7-reference/state-file.md`
+
+### From PR 9 - Community Section
+
+- `docs/1-whats-recce/community-support.md` → Moved to `8-community/support.md`
+
+## Empty Directories to Remove
+
+After deleting files, remove these directories if empty:
+
+- `docs/4-downstream-impacts/`
+- `docs/5-data-diffing/`
+- `docs/8-technical-concepts/`
+
+## Cleanup Command
+
+After all PRs are merged, run this to delete old files:
+
+```bash
+# Delete old files (run from repository root)
+rm -f docs/3-visualized-change/lineage.md
+rm -f docs/4-downstream-impacts/impact-radius.md
+rm -f docs/4-downstream-impacts/breaking-change-analysis.md
+rm -f docs/4-downstream-impacts/metadata-first.md
+rm -f docs/4-downstream-impacts/transformation-types.md
+rm -f docs/5-data-diffing/row-count-diff.md
+rm -f docs/5-data-diffing/profile-diff.md
+rm -f docs/5-data-diffing/value-diff.md
+rm -f docs/5-data-diffing/topK-diff.md
+rm -f docs/5-data-diffing/histogram-diff.md
+rm -f docs/5-data-diffing/query.md
+rm -f docs/7-cicd/preset-checks.md
+rm -f docs/8-technical-concepts/configuration.md
+rm -f docs/8-technical-concepts/state-file.md
+rm -f docs/1-whats-recce/community-support.md
+
+# Remove empty directories
+rmdir docs/4-downstream-impacts/ 2>/dev/null || true
+rmdir docs/5-data-diffing/ 2>/dev/null || true
+rmdir docs/8-technical-concepts/ 2>/dev/null || true
+
+# Remove this file
+rm -f docs/CLEANUP-TODO.md
+```
diff --git a/docs/assets/images/2-getting-started/connect-dw.png b/docs/assets/images/2-getting-started/connect-dw.png
new file mode 100644
index 00000000..aa05e9ee
Binary files /dev/null and b/docs/assets/images/2-getting-started/connect-dw.png differ
diff --git a/docs/assets/images/2-getting-started/connect-github.png b/docs/assets/images/2-getting-started/connect-github.png
new file mode 100644
index 00000000..5fc4ba0d
Binary files /dev/null and b/docs/assets/images/2-getting-started/connect-github.png differ
diff --git a/docs/assets/images/2-getting-started/connect-gitlab.png b/docs/assets/images/2-getting-started/connect-gitlab.png
new file mode 100644
index 00000000..75ace086
Binary files /dev/null and b/docs/assets/images/2-getting-started/connect-gitlab.png differ
diff --git a/docs/assets/images/2-getting-started/org-projects.png b/docs/assets/images/2-getting-started/org-projects.png
new file mode 100644
index 00000000..c127ef3e
Binary files /dev/null and b/docs/assets/images/2-getting-started/org-projects.png differ
diff --git a/mkdocs.yml b/mkdocs.yml
index bee87747..10604fae 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -49,52 +49,53 @@ nav:
- Guides:
- What's Recce:
- index.md
+ - 1-whats-recce/cloud-vs-oss.md
- 1-whats-recce/community-support.md
- Changelog: "https://reccehq.com/changelog/"
- Getting Started:
- - 2-getting-started/oss-vs-cloud.md
- 2-getting-started/start-free-with-cloud.md
- #- 2-getting-started/cloud-5min-tutorial.md
- - 2-getting-started/installation.md
+ - dbt Cloud Setup: 2-getting-started/dbt-cloud-setup.md
+ - Environment Setup: 2-getting-started/environment-setup.md
+ - Environment Best Practices: 2-getting-started/environment-best-practices.md
+ - Setup CD: 2-getting-started/setup-cd.md
+ - Setup CI: 2-getting-started/setup-ci.md
- Claude Plugin: 2-getting-started/claude-plugin.md
- - 2-getting-started/get-started-jaffle-shop.md
- - Visualized Change:
- - 3-visualized-change/lineage.md
- - 3-visualized-change/code-change.md
- - 3-visualized-change/column-level-lineage.md
- - 3-visualized-change/multi-models.md
- - Downstream Impacts:
- #- 4-downstream-impacts/metadata-first.md
- - 4-downstream-impacts/impact-radius.md
- - 4-downstream-impacts/breaking-change-analysis.md
- #- 4-downstream-impacts/transformation-types.md
- - Data Diffing:
- - 5-data-diffing/connect-to-warehouse.md
- - 5-data-diffing/row-count-diff.md
- - 5-data-diffing/profile-diff.md
- - 5-data-diffing/value-diff.md
- - 5-data-diffing/topK-diff.md
- - 5-data-diffing/histogram-diff.md
- - 5-data-diffing/query.md
- - MCP Server (AI Agents): 5-data-diffing/mcp-server.md
+ - 2-getting-started/oss-setup.md
+ - 2-getting-started/jaffle-shop-tutorial.md
+ - Using Recce:
+ - 3-using-recce/admin-setup.md
+ - 3-using-recce/data-developer.md
+ - 3-using-recce/data-reviewer.md
+ - What the Agent Does:
+ - 4-what-the-agent-does/index.md
+ - 4-what-the-agent-does/automated-validation.md
+ - 4-what-the-agent-does/impact-analysis.md
+ - What You Can Explore:
+ - 5-what-you-can-explore/lineage-diff.md
+ - 5-what-you-can-explore/code-change.md
+ - 5-what-you-can-explore/column-level-lineage.md
+ - 5-what-you-can-explore/impact-radius.md
+ - 5-what-you-can-explore/breaking-change-analysis.md
+ - 5-what-you-can-explore/data-diffing.md
+ - 5-what-you-can-explore/multi-models.md
- Collaborate Validation:
- - 6-collaboration/invitation.md
- 6-collaboration/checklist.md
+ - 6-collaboration/preset-checks.md
- 6-collaboration/share.md
- CI/CD:
- - 7-cicd/ci-cd-getting-started.md
- - 7-cicd/setup-cd.md
- - 7-cicd/setup-ci.md
- 7-cicd/pr-mr-summary.md
#- 7-cicd/recce-debug.md # content outdated
- - 7-cicd/scenario-dev.md
+ - 7-cicd/scenario-dev.md
- 7-cicd/scenario-pr-review.md
- - 7-cicd/preset-checks.md
- - 7-cicd/best-practices-prep-env.md
- - Technical Concepts:
- - 8-technical-concepts/state-file.md
- - 8-technical-concepts/configuration.md
+ - Reference:
+ - 7-reference/configuration.md
+ - 7-reference/state-file.md
+ - 7-reference/cli-reference.md
+
+ - Community:
+ - 8-community/support.md
+ - 8-community/changelog.md
- Blog: "https://blog.reccehq.com"
- Changelog: "https://reccehq.com/changelog/"
@@ -155,6 +156,32 @@ theme:
plugins:
- search
+ - redirects:
+ redirect_maps:
+ '6-collaboration/invitation.md': '3-using-recce/admin-setup.md'
+ '2-getting-started/installation.md': '2-getting-started/oss-setup.md'
+ '2-getting-started/get-started-jaffle-shop.md': '2-getting-started/jaffle-shop-tutorial.md'
+ '2-getting-started/oss-vs-cloud.md': '1-whats-recce/cloud-vs-oss.md'
+ '7-cicd/setup-cd.md': '2-getting-started/setup-cd.md'
+ '7-cicd/setup-ci.md': '2-getting-started/setup-ci.md'
+ '7-cicd/ci-cd-getting-started.md': '2-getting-started/environment-setup.md'
+ '7-cicd/best-practices-prep-env.md': '2-getting-started/environment-best-practices.md'
+ '3-visualized-change/lineage.md': '5-what-you-can-explore/lineage-diff.md'
+ '3-visualized-change/code-change.md': '5-what-you-can-explore/code-change.md'
+ '3-visualized-change/column-level-lineage.md': '5-what-you-can-explore/column-level-lineage.md'
+ '3-visualized-change/multi-models.md': '5-what-you-can-explore/multi-models.md'
+ '4-downstream-impacts/impact-radius.md': '5-what-you-can-explore/impact-radius.md'
+ '4-downstream-impacts/breaking-change-analysis.md': '5-what-you-can-explore/breaking-change-analysis.md'
+ '5-data-diffing/row-count-diff.md': '5-what-you-can-explore/data-diffing.md'
+ '5-data-diffing/profile-diff.md': '5-what-you-can-explore/data-diffing.md'
+ '5-data-diffing/value-diff.md': '5-what-you-can-explore/data-diffing.md'
+ '5-data-diffing/topK-diff.md': '5-what-you-can-explore/data-diffing.md'
+ '5-data-diffing/histogram-diff.md': '5-what-you-can-explore/data-diffing.md'
+ '5-data-diffing/query.md': '5-what-you-can-explore/data-diffing.md'
+ '7-cicd/preset-checks.md': '6-collaboration/preset-checks.md'
+ '8-technical-concepts/configuration.md': '7-reference/configuration.md'
+ '8-technical-concepts/state-file.md': '7-reference/state-file.md'
+ '1-whats-recce/community-support.md': '8-community/support.md'
- glightbox:
skip_classes:
- skip-glightbox
diff --git a/requirements.txt b/requirements.txt
index 7359048b..44e20f72 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,3 +1,4 @@
mkdocs-material
+mkdocs-redirects
mkdocs-glightbox
mkdocs-material[imaging]
\ No newline at end of file