drift

Detecting Riparian and Inland Floodplain Transitions — track land cover change in riparian and floodplain areas using free satellite imagery.

Repository Context

Repository: NewGraphEnvironment/drift
Primary Language: R
pkgdown site: https://www.newgraphenvironment.com/drift/

Architecture

  • dft_ prefix for all exported functions
  • Generic STAC pipeline — works with any classified raster (IO LULC, ESA WorldCover, custom COGs)
  • R/ — package functions, tests/testthat/ — testthat 3e tests, vignettes/ — worked examples
  • inst/lulc_classes/ — shipped CSV class tables (code, class_name, color, description)
  • inst/extdata/ — small test rasters (Neexdzii Kwa reach, 204KB total)
  • data-raw/ — scripts to regenerate test data (flooded + gdalcubes)

Core Pipeline

rasters    <- dft_stac_fetch(aoi, source = "io-lulc", years = c(2017, 2020, 2023))
classified <- dft_rast_classify(rasters, source = "io-lulc")
summary    <- dft_rast_summarize(classified, unit = "ha")
dft_map_interactive(classified, aoi = aoi)

Key Patterns

  • Dual-mode maps: dft_map_interactive() uses addRasterImage() for local SpatRasters, addTiles() via titiler for remote COGs
  • titiler URL from option: getOption("drift.titiler_url") — keeps infrastructure private
  • class_table fallback: All functions accept class_table tibble or fall back to dft_class_table(source)
  • List or single: Core functions accept a single SpatRaster or a named list — names become layer labels / year column
  • gdalcubes for STAC: Server-side crop, not full tile download. Orders of magnitude faster than /vsicurl/

Related Packages

  • flooded — generates floodplain AOI polygons (upstream of drift)
  • gq — cartographic style registry (planned bridge for leaflet translator)

Development

devtools::document()   # after roxygen changes
devtools::test()       # 100 tests, all local (no network)
devtools::install()    # needed before rendering vignettes

Code Check Conventions

Structured checklist for reviewing diffs before commit. Used by /code-check. Add new checks here when a bug class is discovered — they compound over time.

Shell Scripts

Quoting

  • Variables in double-quoted strings containing single quotes break if value has '
  • "echo '${VAR}'" — if VAR contains ', shell syntax breaks
  • Use printf '%s\n' "$VAR" | command to pipe values safely
  • Heredocs: unquoted <<EOF expands variables locally, <<'EOF' does not — know which you need
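A minimal sketch of the quoting points above. The variable `value` and its contents are made up for illustration; the pattern is what matters, not the names.

```shell
# Hypothetical value containing a single quote and a dollar sign.
value="it's \$HOME"

# A fixed format string passes the value through unchanged,
# so it is safe to pipe into another command.
piped="$(printf '%s\n' "$value" | cat)"

# Unquoted delimiter: the local shell expands $value inside the body.
expanded="$(cat <<EOF
$value
EOF
)"

# Quoted delimiter: the body is passed through literally.
literal="$(cat <<'EOF'
$value
EOF
)"
```

Here `piped` and `expanded` both hold the original value, while `literal` holds the five characters `$value`. The quoted form is usually the one wanted when the heredoc body is a script that should run on a remote machine.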

Paths

  • Hardcoded absolute paths (/Users/airvine/...) break for other users
  • Use REPO_ROOT="$(cd "$(dirname "$0")/<relative>" && pwd)"
  • After moving scripts, verify ../ depth still resolves correctly
  • Usage comments should match actual script location
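One way to make the `../` depth easy to verify after a move is to wrap the lookup in a small function. This is only a sketch: `resolve_root` is a hypothetical helper name, and in a real script the first argument would be `"$0"`.

```shell
# resolve_root SCRIPT_PATH RELATIVE_HOP
# Print the absolute directory reached by going RELATIVE_HOP from the
# directory that contains SCRIPT_PATH. Fails if that directory does not exist.
resolve_root() {
  # The subshell keeps the caller's working directory untouched.
  (cd "$(dirname "$1")/$2" && pwd)
}
```

A script living in `scripts/` might then use `REPO_ROOT="$(resolve_root "$0" "..")" || exit 1`, so a wrong depth fails immediately instead of producing a path that points somewhere unexpected.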

Silent Failures

  • || true hides real errors — is the failure actually safe to ignore?
  • Empty variable before destructive operation (rm, destroy) — add guard: [ -n "$VAR" ] || exit 1
  • grep returning empty silently — downstream commands get empty input
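The guard and the empty-grep case can be combined in two small helpers. The function names are illustrative, not part of any existing script; the sketch uses `return` rather than `exit` so it can be sourced safely.

```shell
# require_nonempty NAME VALUE: fail loudly if VALUE is empty.
require_nonempty() {
  if [ -z "$2" ]; then
    echo "ERROR: $1 is empty, refusing to continue" >&2
    return 1
  fi
}

# first_match PATTERN FILE: print the first matching line,
# or fail instead of handing empty input to the next command.
first_match() {
  match="$(grep -- "$1" "$2" | head -n 1)"
  require_nonempty "match for '$1'" "$match" || return 1
  printf '%s\n' "$match"
}
```

Before a destructive step, a call such as `require_nonempty SNAPSHOT_ID "$SNAPSHOT_ID" || exit 1` turns a silent empty value into an explicit stop.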

Process Visibility

  • Secrets passed as command-line args are visible in ps aux
  • Use env files, stdin pipes, or temp files with chmod 600 instead
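A sketch of the temp-file option, assuming `mktemp` is available. `write_secret_file` is a hypothetical helper; the consuming command would need its own way to read a secret from a file.

```shell
# write_secret_file VALUE: store VALUE in a private temp file and print its path.
write_secret_file() {
  (
    umask 077                         # anything created here is owner-only
    path="$(mktemp)" || exit 1        # exits only the subshell
    printf '%s' "$1" > "$path"        # printf is a shell builtin, so VALUE
    printf '%s\n' "$path"             # does not appear in the process list
  )
}
```

The caller is responsible for cleanup, for example `secret_file="$(write_secret_file "$TOKEN")"` followed by `rm -f "$secret_file"` once the consuming command has finished.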

Cloud-Init (YAML)

ASCII

  • Must be pure ASCII — em dashes, curly quotes, arrows cause silent parse failure
  • Check with: perl -ne 'print "$.: $_" if /[^\x00-\x7F]/' file.yaml

State

  • cloud-init clean causes full re-provisioning on next boot — almost never what you want before snapshot
  • Use tailscale logout not tailscale down before snapshot (deregister vs disconnect)

Template Variables

  • Secrets rendered via templatefile() are readable at 169.254.169.254 metadata endpoint
  • Acceptable for ephemeral machines, document the tradeoff

OpenTofu / Terraform

State

  • Parsing tofu state show text output is fragile — use tofu output instead
  • Missing outputs that scripts need — add them to main.tf
  • Snapshot/image IDs in tfvars after deleting the snapshot — stale reference

Destructive Operations

  • Validate resource IDs before destroy: [ -n "$ID" ] || exit 1
  • tofu destroy without -target destroys everything including reserved IPs
  • Snapshot ID extraction: use --resource droplet and grep -F for exact match

Security

Secrets in Committed Files

  • .tfvars must be gitignored (contains tokens, passwords)
  • .tfvars.example should have all variables with empty/placeholder values
  • Sensitive variables need sensitive = true in variables.tf

Firewall Defaults

  • 0.0.0.0/0 for SSH is world-open — document if intentional
  • If access is gated by Tailscale, say so explicitly

Credentials

  • Passwords with special chars (', ", $, !) break naive shell quoting
  • printf '%q' escapes values for shell safety (a bash/zsh extension, not available in POSIX sh)
  • Temp files for secrets: create with chmod 600, delete after use

R / Package Installation

pak Behavior

  • pak stops on first unresolvable package — all subsequent packages are skipped
  • Removed CRAN packages (like leaflet.extras) must move to GitHub source
  • PPPM binaries may lag a few hours behind new CRAN releases

Reproducibility

  • Branch pins (pkg@branch) are not reproducible — document why used
  • Pinned download URLs (RStudio .deb) go stale — document where to update

General

Adopting Existing Config

When importing config from one location into a canonical one (legacy ~/.bash_profile → dotfiles repo, old script's env → repo, another project's settings.json → soul):

  • Verify every referenced path/binary exists. Dead PATH exports, missing interpreters, stale env vars should be cut, not codified. Shell paths: for p in $(echo "$PATH" | tr ':' ' '); do [ -d "$p" ] || echo "DEAD: $p"; done
  • Ask before dropping a reference — it may be something the user forgot to reinstall on this machine, not something to delete.
  • Curated subset, not verbatim copy. The diff should reflect what you verified, not the whole source.
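The one-line PATH check above splits on whitespace, so an entry containing a space is reported as two dead paths. A sketch that reads one entry per line instead (`dead_path_entries` is a hypothetical name):

```shell
# dead_path_entries PATHSTRING: print each colon-separated entry
# that is not an existing directory.
dead_path_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r p; do
    [ -z "$p" ] && continue           # skip empty entries such as "::"
    [ -d "$p" ] || echo "DEAD: $p"
  done
}
```

Run it as `dead_path_entries "$PATH"` and treat the output as a list of questions for the user, not as a delete list.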

Documentation Staleness

  • Moving/renaming scripts: update CLAUDE.md, READMEs, usage comments
  • New variables: update .tfvars.example
  • New workflows: update relevant README

LLM Behavioral Guidelines

Behavioral guidelines to reduce common LLM coding mistakes. Merge with project-specific instructions as needed.

Tradeoff: These guidelines bias toward caution over speed. For trivial tasks, use judgment.

1. Think Before Coding

Don't assume. Don't hide confusion. Surface tradeoffs.

Before implementing:

  • State your assumptions explicitly. If uncertain, ask.
  • If multiple interpretations exist, present them - don't pick silently.
  • If a simpler approach exists, say so. Push back when warranted.
  • If something is unclear, stop. Name what's confusing. Ask.

2. Simplicity First

Minimum code that solves the problem. Nothing speculative.

  • No features beyond what was asked.
  • No abstractions for single-use code.
  • No "flexibility" or "configurability" that wasn't requested.
  • No error handling for impossible scenarios.
  • If you write 200 lines and it could be 50, rewrite it.

Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.

3. Surgical Changes

Touch only what you must. Clean up only your own mess.

When editing existing code:

  • Don't "improve" adjacent code, comments, or formatting.
  • Don't refactor things that aren't broken.
  • Match existing style, even if you'd do it differently.
  • If you notice unrelated dead code, mention it - don't delete it.

When your changes create orphans:

  • Remove imports/variables/functions that YOUR changes made unused.
  • Don't remove pre-existing dead code unless asked.

The test: Every changed line should trace directly to the user's request.

4. Goal-Driven Execution

Define success criteria. Loop until verified.

Transform tasks into verifiable goals:

  • "Add validation" → "Write tests for invalid inputs, then make them pass"
  • "Fix the bug" → "Write a test that reproduces it, then make it pass"
  • "Refactor X" → "Ensure tests pass before and after"

For multi-step tasks, state a brief plan:

1. [Step] → verify: [check]
2. [Step] → verify: [check]
3. [Step] → verify: [check]

Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.

These guidelines are working if: fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes.

Planning Conventions

How Claude manages structured planning for complex tasks using planning-with-files (PWF).

When to Plan

Use PWF when a task has multiple phases, requires research, or involves more than ~5 tool calls. Triggers:

  • User says "let's plan this", "plan mode", "use planning", or invokes /planning-init
  • Complex issue work begins (multi-step, uncertain approach)
  • Claude judges the task warrants structured tracking

Skip planning for single-file edits, quick fixes, or tasks with obvious next steps.

The Workflow

  1. Explore first — Enter plan mode (read-only). Read code, trace paths, understand the problem before proposing anything.
  2. Plan to files — Write the plan into 3 files in planning/active/:
    • task_plan.md — Phases with checkbox tasks
    • findings.md — Research, discoveries, technical analysis
    • progress.md — Session log with timestamps and commit refs
  3. Commit the plan — Commit the planning files before starting implementation. This is the baseline.
  4. Work in atomic commits — Each commit bundles code changes WITH checkbox updates in the planning files. The diff shows both what was done and the checkbox marking it done.
  5. Code check before commit — Run /code-check on staged diffs before committing. Don't mark a task done until the diff passes review.
  6. Archive when complete — Move planning/active/ to planning/archive/ via /planning-archive. Write a README.md in the archive directory with a one-paragraph outcome summary and closing commit/PR ref — future sessions scan these to catch up fast.

Atomic Commits (Critical)

Every commit that completes a planned task MUST include:

  • The code/script changes
  • The checkbox update in task_plan.md (- [ ] -> - [x])
  • A progress entry in progress.md if meaningful

This creates a git audit trail where git log -- planning/ tells the full story. Each commit is self-documenting — you can backtrack with git and understand everything that happened.
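The checkbox flip can be scripted so it is never forgotten when staging a commit. This is a sketch only: `mark_done` is a hypothetical helper, and it assumes each task occupies exactly one line of the form `- [ ] Task description`.

```shell
# mark_done FILE TASK: change "- [ ] TASK" to "- [x] TASK" in FILE.
# Returns non-zero and leaves FILE untouched if the task line is not found.
mark_done() {
  tmp="$(mktemp)" || return 1
  if awk -v task="$2" '
       $0 == ("- [ ] " task) { print "- [x] " task; found = 1; next }
       { print }
       END { exit (found ? 0 : 1) }
     ' "$1" > "$tmp"; then
    cat "$tmp" > "$1"                 # overwrite in place, keeping file mode
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

A typical sequence would be `mark_done planning/active/task_plan.md "Task description"`, then `git add` on both the code and the planning file, then a single `git commit`.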

File Formats

task_plan.md

Phases with checkboxes. This is the core tracking file.

# Task Plan

## Phase 1: [Name]
- [ ] Task description
- [ ] Another task

## Phase 2: [Name]
- [ ] Task description

Mark tasks done as they're completed: - [x] Task description

findings.md

Append-only research log. Discoveries, technical analysis, things learned.

# Findings

## [Topic]
[What was found, with source/date]

progress.md

Session entries with commit references.

# Progress

## Session YYYY-MM-DD
- Completed: [items]
- Commits: [refs]
- Next: [items]

Directory Structure

planning/
  active/          <- Current work (3 PWF files)
  archive/         <- Completed issues
    YYYY-MM-issue-N-slug/

If planning/ doesn't exist in the repo, run /planning-init first.

Skills

| Skill | When to use |
|---|---|
| /planning-init | First time in a repo: creates directory structure |
| /planning-update | Mid-session: sync checkboxes and progress |
| /planning-archive | Issue complete: archive and create fresh active/ |

R Package Development Conventions

Standards for R package development across New Graph Environment repositories. Based on R Packages (2e) by Hadley Wickham and Jenny Bryan.

Reference packages: When starting a new package, study these existing packages for patterns: flooded, gq. They demonstrate the conventions below in practice (DESCRIPTION fields, README layout, NEWS.md style, pkgdown setup, test structure, hex sticker, etc.).

Style

  • tidyverse style guide: snake_case, pipe operators (|> or %>%)
  • Match existing patterns in each codebase
  • Use pak for package installation (not install.packages)
  • Prefix column name vectors with cols_ for discoverability in the environment pane: cols_all, cols_carry, cols_split, cols_writable. Same principle for other grouped vectors (params_, tbl_, etc.)

Package Structure

Follow R Packages (2e) conventions:

  • R/ for functions, tests/testthat/ for tests, man/ for docs
  • DESCRIPTION with proper fields (Title, Description, Authors@R)
  • DESCRIPTION URL field: include both the GitHub repo and the pkgdown site so pkgdown links correctly (e.g., URL: https://github.com/OWNER/PKG, https://owner.github.io/PKG/)
  • NAMESPACE managed by roxygen2 (#' @export, #' @import, #' @importFrom)
  • Never edit NAMESPACE or man/ by hand

One Function, One File

Each exported function gets its own R file and its own test file:

  • R/fl_mask.R → tests/testthat/test-fl_mask.R
  • Commit the function and its tests together
  • Use Fixes #N in the commit message to close the corresponding issue

GitHub Issues and SRED Tracking

Issue-per-function workflow

File a GitHub issue for each function before building it. This creates a traceable record of what was planned, built, and verified.

Branching for SRED

For new packages or major features, work on a branch and merge via PR:

main ← scaffold-branch (PR closes with "Relates to NewGraphEnvironment/sred-2025-2026#N")

This gives one PR that contains all commits — a single SRED cross-reference covers the entire body of work. Individual commits within the branch close their respective function issues with Fixes #N.

Closing issues

Close function issues via commit messages — see Closing Issues in newgraph conventions.

Testing

  • Use testthat 3e (Config/testthat/edition: 3 in DESCRIPTION)
  • Run devtools::test() before committing
  • Test files mirror source: R/utils.R -> tests/testthat/test-utils.R
  • Test for edge cases and potential failures, not just happy paths
  • Tests must pass before closing the function's issue
  • Always grep for errors in the same command as the test run to avoid running twice:
    Rscript -e 'devtools::test()' 2>&1 | grep -E "(FAIL|ERROR|PASS)" | tail -5
    For error context: grep -E "(ERROR:|FAIL )" -A 10 | head -25
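Both grep commands can share one captured log, which keeps the "run once" goal while still giving the error context. A sketch, with `report_tests` as a hypothetical helper name and the same patterns as above:

```shell
# report_tests LOG: print the last status lines, then failure context,
# from test output that was captured once into a variable.
report_tests() {
  printf '%s\n' "$1" | grep -E "(FAIL|ERROR|PASS)" | tail -5
  printf '%s\n' "$1" | grep -E "(ERROR:|FAIL )" -A 10 | head -25
  return 0                            # reporting only; never fails the caller
}
```

Usage would be `log="$(Rscript -e 'devtools::test()' 2>&1)"` followed by `report_tests "$log"`. Note that a summary line containing `FAIL ` matches both patterns, so it can appear twice in the output.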

Examples and Vignettes

Runnable examples on every exported function

Examples are how users discover what a function does. They must:

  • Actually run — no \dontrun{} unless external resources are required
  • Use bundled test data via system.file() so they work for anyone
  • Show why the function is useful — not just that it runs, but what it produces and why you'd use it
  • Use qualified names for non-exported dependencies (terra::rast(), sf::st_read()) since examples run in the user's environment

Vignettes

At least one vignette showing the full pipeline on real data:

  • Demonstrates the package solving an actual problem end-to-end
  • Uses bundled test data (committed to inst/testdata/)
  • Hosted on pkgdown so users can read it without installing

Output format: Use bookdown::html_vignette2 (not rmarkdown::html_vignette) for figure numbering and cross-references. Requires bookdown in Suggests and chunks must have fig.cap for numbered figures. Cross-reference with Figure \@ref(fig:chunk-name).

Vignettes that need external resources (DB, API, STAC): Do NOT use the .Rmd.orig pre-knit pattern — it breaks bookdown figure numbering because knitr evaluates chunks during pre-knit and emits ![](path) markdown that bookdown can't number.

Instead, separate data generation from presentation:

  1. data-raw/vignette_data.R — runs the queries, saves results as .rds to inst/testdata/ (or inst/vignette-data/)
  2. Vignette loads .rds files, all chunks run live during pkgdown build
  3. Note at top of vignette: "Data generated by data-raw/script.R"
  4. bookdown controls all chunks — figure numbers, cross-refs work

This is the same pattern as test data: data-raw/ documents how the data was produced, committed artifacts make vignettes reproducible without the external resource.

Test data

  • Created via a script in data-raw/ that documents exactly how the data was produced (database queries, spatial crops, etc.)
  • Committed to inst/testdata/ — small enough to ship with the package
  • Used by tests, examples, and vignettes — one dataset, three purposes

Documentation

  • roxygen2 for all exported functions
  • @import or @importFrom in the package-level doc (R/<pkg>-package.R) to populate NAMESPACE — don't rely on :: everywhere in function bodies
  • pkgdown site for public packages with _pkgdown.yml (bootstrap 5)
  • GitHub Action for pkgdown (usethis::use_github_action("pkgdown"))

lintr

Run lintr::lint_package() before committing R package code. Fix all warnings — every lint should be worth fixing.

Recommended .lintr config

linters: linters_with_defaults(
    line_length_linter(120),
    object_name_linter(styles = c("snake_case", "dotted.case")),
    commented_code_linter = NULL
  )
exclusions: list(
    "renv" = list(linters = "all")
  )
  • 120 char line length (default 80 is too strict for data pipelines)
  • Allow dotted.case (common in base R and legacy code)
  • Suppress commented code lints (exploratory R scripts often have commented alternatives)
  • Exclude renv directory entirely

Dependencies

  • Minimize Imports — use Suggests for packages only needed in tests/vignettes
  • Pin versions only when breaking changes are known
  • Prefer packages already in the tidyverse ecosystem

Releasing

  1. Update NEWS.md — keep it concise:
    • First release: one line (e.g., "Initial release. Brief description.")
    • Later releases: describe what changed and why, not function-by-function. Link to the pkgdown reference page for details — don't duplicate it.
    • Don't list every function; the pkgdown reference page is the single source of truth for what's in the package.
  2. Bump version in DESCRIPTION (e.g., 0.0.0.9000 → 0.1.0) as the final commit of the branch, after verification numbers/tests are final. Mid-branch bumps are premature and cause churn: additional code changes end up bundled inside a "release" that already claimed the version.
  3. Commit as "Release vX.Y.Z"
  4. Tag: git tag vX.Y.Z && git push && git push --tags
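To keep the tag from drifting away from DESCRIPTION, the tag name can be derived from the file instead of typed by hand. A sketch under the assumption that DESCRIPTION has a single `Version:` line; `desc_version` and `release_tag` are hypothetical helper names.

```shell
# desc_version FILE: print the value of the Version field.
desc_version() {
  sed -n 's/^Version:[[:space:]]*//p' "$1"
}

# release_tag FILE: print "vX.Y.Z" for the version in FILE, or fail if none.
release_tag() {
  v="$(desc_version "$1")"
  if [ -z "$v" ]; then
    echo "ERROR: no Version field in $1" >&2
    return 1
  fi
  printf 'v%s\n' "$v"
}
```

The tagging step then becomes `tag="$(release_tag DESCRIPTION)" && git tag "$tag" && git push && git push --tags`.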

Repository Setup

Branch protection

Protect main from deletion and force pushes:

gh api repos/OWNER/REPO/rulesets --method POST --input - <<'EOF'
{
  "name": "Protect main",
  "target": "branch",
  "enforcement": "active",
  "bypass_actors": [
    { "actor_id": 5, "actor_type": "RepositoryRole", "bypass_mode": "always" }
  ],
  "conditions": { "ref_name": { "include": ["refs/heads/main"], "exclude": [] } },
  "rules": [ { "type": "deletion" }, { "type": "non_fast_forward" } ]
}
EOF

Scaffold checklist

  • usethis::create_package(".")
  • usethis::use_mit_license("New Graph Environment Ltd.")
  • usethis::use_testthat(edition = 3)
  • usethis::use_pkgdown()
  • usethis::use_github_action("pkgdown")
  • usethis::use_directory("dev") — reproducible setup script
  • usethis::use_directory("data-raw") — data generation scripts
  • Hex sticker via hexSticker (see data-raw/make_hexsticker.R)
  • Set GitHub Pages to serve from gh-pages branch

dev/dev.R

Keep a dev/dev.R file that documents every setup step. Not idempotent — run interactively. This is the reproducible recipe for the package scaffold.

README

Keep the README lean:

  • Hex sticker, one-line description, install, example showing why it's useful
  • Link to pkgdown vignette and function reference — don't duplicate them
  • Don't maintain a function table — it's just another thing to keep updated and pkgdown's reference page is the single source of truth

LLM Workflow

When an LLM assistant modifies R package code:

  1. Run lintr::lint_package() — fix issues before committing
  2. Run devtools::test() with error grep — ensure tests pass in one call:
    Rscript -e 'devtools::test()' 2>&1 | grep -E "(FAIL|ERROR|PASS)" | tail -5
  3. Run devtools::document() and grep for results:
    Rscript -e 'devtools::document()' 2>&1 | grep -E "(Writing|Updating|warning)" | tail -10
  4. Check devtools::check() passes for releases — capture results in one call:
    Rscript -e 'devtools::check()' 2>&1 | grep -E "(ERROR|WARNING|NOTE|errors|warnings|notes)" | tail -10

Reference Management Conventions

How references flow between Claude Code, Zotero, and technical writing at New Graph Environment.

Tool Routing

Three tools, different purposes. Use the right one.

| Need | Tool | Why |
|---|---|---|
| Search by keyword, read metadata/fulltext, semantic search | MCP zotero_* tools | pyzotero, works with Zotero item keys |
| Look up by citation key (e.g., irvine2020ParsnipRiver) | /zotero-lookup skill | Citation keys are a BBT feature; pyzotero can't resolve them |
| Create items, attach PDFs, deduplicate | /zotero-api skill | Connector API for writes, JS console for attachments |

Citation keys vs item keys: Citation keys (like irvine2020ParsnipRiver) come from Better BibTeX. Item keys (like K7WALMSY) are native Zotero. The MCP works with item keys. /zotero-lookup bridges citation keys to item data.

BBT citation key storage: As of Feb 2025+, BBT stores citation keys as a citationKey field directly in zotero.sqlite (via Zotero's item data system), not in a separate BBT database. The old better-bibtex.sqlite and better-bibtex.migrated files are stale and no longer updated. Query citation keys with: SELECT idv.value FROM items i JOIN itemData id ON i.itemID = id.itemID JOIN itemDataValues idv ON id.valueID = idv.valueID JOIN fields f ON id.fieldID = f.fieldID WHERE f.fieldName = 'citationKey'.

Adding References Workflow

1. Search and flag

When research turns up a reference:

  • DOI available: Tell the user — Zotero's magic wand (DOI lookup) is the fastest path
  • ResearchGate link: Flag to user for manual check — programmatic fetch is blocked (403), but full text is often there
  • BC gov report: Search ACAT, for.gov.bc.ca library, EIRS viewer
  • Paywalled: Note it, move on. Don't waste time trying to bypass.

2. Add to Zotero

Preferred order:

  1. DOI magic wand in Zotero UI (fastest, most complete metadata)
  2. Web API POST with collections array (grey literature, local PDFs — targets collection directly, no UI interaction needed)
  3. saveItems via /zotero-api (batch creation from structured data — requires UI collection selection)
  4. JS console script for group library (when connector can't target the right collection)

Collection targeting: saveItems drops items into whatever collection is selected in Zotero's UI. Always confirm with the user before calling it. Web API bypasses this — include "collections": ["KEY"] in the POST body. Find collection keys with ?q=name search on the collections endpoint.

3. Attach PDFs

saveItems attachments silently fail. Don't use them. Instead:

  1. Web API S3 upload (preferred): Create attachment item → get upload auth → build S3 body (Python: prefix + file bytes + suffix) → POST to S3 → register with uploadKey. Works without Zotero running. See /zotero-api skill section 4.
  2. JS console fallback: Download with curl, attach via item_attach_pdf.js in Zotero JS console.
  3. Verify attachment exists via MCP: zotero_get_item_children

4. Verify

After manual adds, confirm via MCP:

  • zotero_search_items — find by title
  • zotero_get_item_metadata — check fields are complete
  • zotero_get_item_children — confirm PDF attached

5. Clean up

If duplicates were created (common with saveItems retries):

  • Run collection_dedup.js via Zotero JS console
  • It keeps the copy with the most attachments, trashes the rest

In Reports (bookdown)

Bibliography generation

# index.Rmd — dynamic bib from Zotero via Better BibTeX
bibliography: "`r rbbt::bbt_write_bib('references.bib', overwrite = TRUE)`"

rbbt pulls from BBT, which syncs with Zotero. Edit references in Zotero → rebuild report → bibliography updates.

Library targeting: rbbt must know which Zotero library to search. This is set globally in ~/.Rprofile:

# default library — NewGraphEnvironment group (libraryID 9, group 4733734)
options(rbbt.default.library_id = 9)

Without this option, rbbt searches only the personal library (libraryID 1) and won't find group library references. The library IDs map to Zotero's internal numbering — use /zotero-lookup with SELECT DISTINCT libraryID FROM citationkey against the BBT database to discover available libraries.

Citation syntax

  • [@key2020] — parenthetical: (Author 2020)
  • @key2020 — narrative: Author (2020)
  • [@key1; @key2] — multiple
  • nocite: in YAML — include uncited references

Cite primary sources

When a review paper references an older study, trace back to the original and cite it. Don't attribute findings to the review when the original exists. (See LLM Agent Conventions in newgraph.md.)

When the original is unavailable (paywalled, out of print, can't locate): use secondary citation format in the prose and include bib entries for both sources:

Smith et al. (2003; as cited in Doctor 2022) found that...

Both @smith2003 and @doctor2022 go in the .bib file. The reader can then track down the original themselves. Flag incomplete metadata on the primary entry — it's better to have a partial reference than none at all.

PDF Fallback Chain

When you need a PDF and the obvious URL doesn't work:

  1. DOI resolver → publisher site (often has OA link)
  2. Europe PMC (europepmc.org/backend/ptpmcrender.fcgi?accid=PMC{ID}&blobtype=pdf) — ncbi blocks curl
  3. SciELO — needs User-Agent: Mozilla/5.0 header
  4. ResearchGate — flag to user for manual download
  5. Semantic Scholar — sometimes has OA links
  6. Ask user for institutional access

Always verify downloads: file paper.pdf should say "PDF document", not HTML.
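Where the `file` utility is not installed, checking the leading magic bytes is a reasonable stand-in. This sketch swaps in that technique rather than calling `file`; `is_pdf` is a hypothetical helper name.

```shell
# is_pdf FILE: succeed only if FILE starts with the PDF signature "%PDF-".
# An HTML error page saved under a .pdf name fails this check.
is_pdf() {
  [ "$(head -c 5 "$1" 2>/dev/null)" = "%PDF-" ]
}
```

A download step can then end with `is_pdf paper.pdf || echo "not a PDF, try the next source" >&2` before anything is attached in Zotero.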

Searching Paper Content (ragnar)

Setup (per project)

  • scripts/rag_build.R — maps citation keys to Zotero PDF attachment keys, builds DuckDB
  • data/rag/ gitignored — store is local, not committed
  • Dependencies: ragnar, Ollama with nomic-embed-text model
  • See /lit-search skill for full recipe

Query

ragnar_store_connect() then ragnar_retrieve() — returns chunks with source file attribution.

Anti-patterns

  • NEVER write abstracts manually — if CrossRef has no abstract, leave blank
  • NEVER cite specific numbers without verifying from the source PDF via ragnar search
  • NEVER paraphrase equations — copy exact notation and cite page/section