29 changes: 13 additions & 16 deletions .github/actions/setup-env/action.yml
@@ -1,11 +1,11 @@
 name: Setup Environment
 description: Sets up Go (dynamically from go.mod) and installs system dependencies

-inputs: {}
-  # bust_lumera_retag:
-  #   description: "One-time: remove lumera sums after retag"
-  #   required: false
-  #   default: 'false'
+inputs:
+  bust_lumera_retag:
+    description: "One-time: remove cached Lumera module artifacts after a retag/checksum refresh"
+    required: false
+    default: 'false'
 outputs:
   go-version:
     description: "Go version parsed from go.mod"
@@ -33,17 +33,14 @@ runs:
         sudo apt-get update
         sudo apt-get install -y libwebp-dev make

-    # - name: One-time reset retagged lumera checksums
-    #   if: ${{ inputs.bust_lumera_retag == 'true' }}
-    #   shell: bash
-    #   run: |
-    #     echo "Busting go.sum entries for github.com/LumeraProtocol/lumera v1.11.0-rc (one-time)"
-    #     # Remove stale checksums in all local modules
-    #     find . -name 'go.sum' -maxdepth 3 -print0 | xargs -0 -I{} sed -i \
-    #       '/github.com\/LumeraProtocol\/lumera v1.11.0-rc/d' {}
-    #     # Clear module/build caches to avoid cached zips
-    #     go clean -modcache || true
-    #     rm -rf "$(go env GOCACHE)" || true
+    - name: Bust cached Lumera module artifacts
+      if: ${{ inputs.bust_lumera_retag == 'true' }}
+      shell: bash
+      run: |
+        echo "Busting cached Lumera module artifacts before go mod download"
+        go clean -modcache || true
+        rm -rf "$(go env GOCACHE)" || true
+        rm -rf "$(go env GOPATH)/pkg/mod/cache/download/github.com/!lumera!protocol/lumera" || true

     - name: Set Go Private Modules
       shell: bash
8 changes: 4 additions & 4 deletions .github/workflows/build&release.yml
@@ -27,8 +27,8 @@ jobs:

       - name: Setup Go and dependencies
         uses: ./.github/actions/setup-env
-        # with:
-        #   bust_lumera_retag: 'true'
+        with:
+          bust_lumera_retag: 'true'

       - name: Build binaries
         run: |
@@ -74,8 +74,8 @@ jobs:

       - name: Setup Go and dependencies
         uses: ./.github/actions/setup-env
-        # with:
-        #   bust_lumera_retag: 'true'
+        with:
+          bust_lumera_retag: 'true'

       - name: Prepare Release Variables
         id: vars
34 changes: 28 additions & 6 deletions .github/workflows/tests.yml
@@ -17,8 +17,8 @@ jobs:
         uses: actions/checkout@v6.0.1
       - name: Setup Go and system deps
         uses: ./.github/actions/setup-env
-        # with:
-        #   bust_lumera_retag: 'true'
+        with:
+          bust_lumera_retag: 'true'
       - name: Go mod tidy
         run: go mod tidy

@@ -35,8 +35,8 @@ jobs:

       - name: Setup Go and system deps
         uses: ./.github/actions/setup-env
-        # with:
-        #   bust_lumera_retag: 'true'
+        with:
+          bust_lumera_retag: 'true'

       - name: Go mod tidy
         run: go mod tidy
@@ -54,8 +54,8 @@ jobs:

       - name: Setup Go and system deps
         uses: ./.github/actions/setup-env
-        # with:
-        #   bust_lumera_retag: 'true'
+        with:
+          bust_lumera_retag: 'true'

       - name: Go mod tidy
         run: go mod tidy
@@ -70,6 +70,28 @@ jobs:
       - name: Run cascade e2e tests
         run: make test-cascade

+  lep6-e2e-tests:
+    name: lep6-e2e-tests
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v6.0.1
+
+      - name: Setup Go and system deps
+        uses: ./.github/actions/setup-env
+        with:
+          bust_lumera_retag: 'true'
+
+      - name: Go mod tidy
+        run: go mod tidy
+
+      - name: Install Lumera
+        run: make install-lumera
+
+      - name: Run LEP-6 e2e tests
+        run: make test-lep6
+
   # sn-manager-e2e-tests:
   #   name: sn-manager-e2e-tests
   #   runs-on: ubuntu-latest
2 changes: 2 additions & 0 deletions .gitignore
@@ -22,6 +22,8 @@ go.work
 go.work.sum
 tests/system/testnet
 tests/system/**/supernode-data*
+tests/system/supernode-lep6-data*/
+.lep6-wip-backup/
 tests/system/data
 tests/system/1
 # env file
45 changes: 41 additions & 4 deletions Makefile
@@ -120,9 +120,9 @@ release:
 ###################################################
 ### Tests and Simulation ###
 ###################################################
-.PHONY: test-e2e test-unit test-integration test-system test-cascade test-sn-manager
-.PHONY: install-lumera setup-supernodes system-test-setup install-deps
-.PHONY: gen-cascade gen-supernode
+.PHONY: test-e2e test-unit test-integration test-system test-cascade test-lep6 test-sn-manager
+.PHONY: install-lumera setup-supernodes setup-lep6-supernodes system-test-setup install-deps
+.PHONY: gen-cascade gen-supernode audit-mod-clean lep6-reset-dedup lep6-validate-config
 test-unit:
 	${GO} test -v ./...

@@ -152,16 +152,22 @@ gen-supernode:
 		--grpc-gateway_out=gen \
 		--grpc-gateway_opt=paths=source_relative \
 		--openapiv2_out=gen \
-		proto/supernode/service.proto proto/supernode/status.proto proto/supernode/storage_challenge.proto
+		proto/supernode/service.proto proto/supernode/status.proto proto/supernode/storage_challenge.proto proto/supernode/self_healing.proto

 # Define the paths
 SUPERNODE_SRC=supernode/main.go
 DATA_DIR=tests/system/supernode-data1
 DATA_DIR2=tests/system/supernode-data2
 DATA_DIR3=tests/system/supernode-data3
+LEP6_DATA_DIR=tests/system/supernode-lep6-data1
+LEP6_DATA_DIR2=tests/system/supernode-lep6-data2
+LEP6_DATA_DIR3=tests/system/supernode-lep6-data3
 CONFIG_FILE=tests/system/config.test-1.yml
 CONFIG_FILE2=tests/system/config.test-2.yml
 CONFIG_FILE3=tests/system/config.test-3.yml
+LEP6_CONFIG_FILE=tests/system/config.lep6-1.yml
+LEP6_CONFIG_FILE2=tests/system/config.lep6-2.yml
+LEP6_CONFIG_FILE3=tests/system/config.lep6-3.yml

 # Setup script
 SETUP_SCRIPT=tests/scripts/setup-supernodes.sh
@@ -186,6 +192,12 @@ setup-supernodes:
 	@chmod +x $(SETUP_SCRIPT)
 	@bash $(SETUP_SCRIPT) all $(SUPERNODE_SRC) $(DATA_DIR) $(CONFIG_FILE) $(DATA_DIR2) $(CONFIG_FILE2) $(DATA_DIR3) $(CONFIG_FILE3)

+setup-lep6-supernodes:
+	@echo "Setting up isolated LEP-6 supernode environments..."
+	@rm -rf tests/system/heal-staging
+	@chmod +x $(SETUP_SCRIPT)
+	@bash $(SETUP_SCRIPT) all $(SUPERNODE_SRC) $(LEP6_DATA_DIR) $(LEP6_CONFIG_FILE) $(LEP6_DATA_DIR2) $(LEP6_CONFIG_FILE2) $(LEP6_DATA_DIR3) $(LEP6_CONFIG_FILE3)
+
 # Complete system test setup (Lumera + Supernodes)
 system-test-setup: install-lumera setup-supernodes
 	@echo "System test environment setup complete."
@@ -201,6 +213,31 @@ test-cascade:
 	@echo "Running cascade e2e tests..."
 	@cd tests/system && ${GO} mod tidy && ${GO} test -tags=system_test -v -run TestCascadeE2E .

+# Run LEP-6 e2e tests only against the real lumerad/local-chain system harness.
+# The runtime test uses isolated supernode-lep6-data* directories so per-node
+# SQLite history/dedup state is not shared with Cascade fixtures or other nodes.
+test-lep6: setup-lep6-supernodes
+	@echo "Running LEP-6 e2e tests..."
+	@cd tests/system && ${GO} mod tidy && ${GO} test -tags=system_test -timeout=900s -v -run '^TestLEP6' .
+
+# Validate LEP-6 local config/default/fixture coverage without starting a network.
+lep6-validate-config:
+	@echo "Validating LEP-6 supernode config fixtures..."
+	@${GO} test ./supernode/config -run 'TestLoadConfig_LEP6|TestCreateDefaultConfig_IncludesExplicitLEP6Blocks|TestSystemConfigFixturesIncludeLEP6' -count=1
+
+# Recover from stale Lumera module checksum/cache issues during local PR-6 work.
+audit-mod-clean:
+	@echo "Cleaning Go module cache and re-resolving modules..."
+	@${GO} clean -modcache
+	@${GO} mod download
+
+# Reset local LEP-6 dedup/reconciliation tables. Requires DB=/absolute/path/to/local.db.
+lep6-reset-dedup:
+	@if [ -z "$(DB)" ]; then echo "DB=/absolute/path/to/local.db is required"; exit 2; fi
+	@test -f "$(DB)" || (echo "DB does not exist: $(DB)"; exit 2)
+	@echo "Resetting LEP-6 local dedup tables in $(DB): heal_claims_submitted, heal_verifications_submitted, storage_recheck_submissions, recheck_attempt_failures"
+	@sqlite3 "$(DB)" "DELETE FROM heal_claims_submitted; DELETE FROM heal_verifications_submitted; DELETE FROM storage_recheck_submissions; DELETE FROM recheck_attempt_failures;"
+
 # Run sn-manager e2e tests only
 test-sn-manager:
 	@echo "Running sn-manager e2e tests..."
111 changes: 111 additions & 0 deletions docs/lep6-supernode-runbook.md
@@ -0,0 +1,111 @@
# LEP-6 Supernode Release Runbook

This runbook covers the Supernode-side LEP-6 storage-truth enforcement support introduced across the LEP-6 PR stack and finalized in PR-6.

## Scope

Supernode LEP-6 provides runtime support for Lumera `v1.12.0` audit/storage-truth APIs:

- storage challenge ticket discovery and transcript/evidence submission;
- storage recheck candidate discovery, local retry budget, and `MsgSubmitStorageRecheckEvidence` submission;
- self-healing heal-op dispatch, healer claim submission, verifier attestation submission, and finalizer publication only after chain-verified heal success;
- repo-native in-process observability snapshots plus structured `logtrace` events.

The chain remains the source of truth for heal-op scheduling, verifier assignment, verification quorum, rejected/failed/expired status, and scoring/probation changes.

## Release prerequisites

1. Supernode must depend on Lumera `v1.12.0` APIs.
2. Operators must run against a Lumera chain whose audit module includes LEP-6 storage-truth endpoints.
3. Supernode local SQLite storage must be writable; PR-6 adds local idempotency state for pending/submitted heal and recheck txs.
4. Existing Supernode status/log collection should be enabled so LEP-6 snapshot counters and structured logs are visible through the same operator workflow used by storage challenge, Cascade, and supernode metrics.

## Local validation commands

From the supernode repository root:

```bash
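# Machine-specific: point PATH at your own Go toolchain if it is not already on PATH.
export PATH=/home/openclaw/.local/go/bin:$PATH
# Run tests for every package whose import path does not contain /tests.
go test $(go list ./... | grep -v '/tests')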
```

For the real-chain LEP-6 system test:

```bash
make system-test-setup
make test-lep6
```

`make test-lep6` first runs `setup-lep6-supernodes`, then runs the tests in `tests/system` whose names match `^TestLEP6` (including `TestLEP6RealChainIntegration`), using the same real `lumerad`/local-chain harness as Cascade e2e. It does not use chain mocks.

## Observability

LEP-6 uses the repo-native Supernode observability pattern: in-process atomic snapshots plus structured `logtrace` fields. PR-6 does **not** add a LEP-6-only Prometheus endpoint.

LEP-6 snapshot signals include:

- challenge dispatch results by chain result class;
- challenge dispatch throttling drops by reason;
- challenge dispatch epoch duration totals/counts by role;
- ticket discovery outcomes;
- no-ticket-provider-active state;
- recheck candidates discovered and current pending candidate gauge;
- recheck submissions by result class/result;
- recheck already-submitted dedupe count;
- recheck failure counts by stage;
- heal claims by result;
- heal claim reconciliation count;
- heal verifications by result/vote;
- heal verification already-recorded dedupe count;
- self-healing pending claim gauge;
- self-healing staging bytes gauge;
- finalizer publish count;
- finalizer cleanup count by terminal chain status.
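
As a rough illustration of the atomic-snapshot pattern described above, the sketch below keeps a few counters and one gauge in `sync/atomic` values and copies them into a plain struct on demand. The type and field names (`lep6Metrics`, `healClaimsOK`, and so on) are assumptions made for this example and are not the Supernode's actual identifiers.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// lep6Metrics is a hypothetical holder for a few LEP-6 signals. The field
// names are illustrative, not the Supernode's actual identifiers.
type lep6Metrics struct {
	healClaimsOK    atomic.Int64 // counter: heal claims accepted by the chain
	healClaimErrors atomic.Int64 // counter: heal claim submit errors
	stagingBytes    atomic.Int64 // gauge: bytes currently held in staging
}

// lep6Snapshot is a plain-value copy that is safe to log or serialize.
type lep6Snapshot struct {
	HealClaimsOK    int64
	HealClaimErrors int64
	StagingBytes    int64
}

// Snapshot loads each field atomically. Fields are read one at a time, so
// the result is not a single consistent cut across all fields.
func (m *lep6Metrics) Snapshot() lep6Snapshot {
	return lep6Snapshot{
		HealClaimsOK:    m.healClaimsOK.Load(),
		HealClaimErrors: m.healClaimErrors.Load(),
		StagingBytes:    m.stagingBytes.Load(),
	}
}

func main() {
	var m lep6Metrics
	m.healClaimsOK.Add(1)      // counters only ever increase
	m.healClaimErrors.Add(1)
	m.stagingBytes.Store(4096) // gauges are overwritten with the current value
	fmt.Printf("%+v\n", m.Snapshot())
}
```

Counters use `Add` and gauges use `Store`, which is one simple way to keep the two kinds of signal distinct without a metrics library.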

Suggested alerts/signals from snapshots/logs:

- sustained heal-claim `submit_error` or `stage_error` increases;
- sustained heal-verification `submit_error` or `stage_error` increases;
- sustained recheck failure increases by stage;
- challenge dispatch throttling drops approaching the chain cap;
- no-ticket-provider-active remaining true after candidate-producing epochs;
- self-healing staging bytes increasing without matching finalizer publish/cleanup progress;
- rejected/failed/expired finalizer cleanup spikes after a release.
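
One of the signals above, staging bytes growing without finalizer progress, can be checked by comparing two consecutive snapshots. The following is a minimal sketch under that assumption; the `sample` type and its fields are hypothetical and only stand in for whatever the real snapshot exposes.

```go
package main

import "fmt"

// sample is a hypothetical subset of a LEP-6 snapshot taken at one instant.
type sample struct {
	StagingBytes     int64 // gauge
	FinalizerPublish int64 // counter
	FinalizerCleanup int64 // counter
}

// stagingStalled reports whether staged bytes grew between two samples while
// the finalizer made no publish or cleanup progress in the same interval.
func stagingStalled(prev, cur sample) bool {
	grew := cur.StagingBytes > prev.StagingBytes
	progressed := cur.FinalizerPublish > prev.FinalizerPublish ||
		cur.FinalizerCleanup > prev.FinalizerCleanup
	return grew && !progressed
}

func main() {
	prev := sample{StagingBytes: 1024, FinalizerPublish: 3, FinalizerCleanup: 1}
	cur := sample{StagingBytes: 8192, FinalizerPublish: 3, FinalizerCleanup: 1}
	fmt.Println("staging stalled:", stagingStalled(prev, cur))
}
```

A single stalled interval is usually noise; an alert would more plausibly fire only after several consecutive stalled intervals.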

## Operational behavior

### Successful healing

1. Chain schedules a heal-op and assigns a healer/verifiers.
2. Healer stages recovered data locally and pre-stages a local dedup row.
3. Healer submits `MsgClaimHealComplete`.
4. On chain acceptance, Supernode marks the local row as submitted.
5. Verifiers fetch and verify the staged manifest/hash, pre-stage local dedup rows, and submit `MsgSubmitHealVerification`.
6. Once chain marks the heal-op verified, the finalizer publishes the healed artifact to the P2P layer.

Important: the healed file is not published as durable P2P recovery output before successful chain verification.
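
The publish-only-after-verification rule can be pictured as a small decision function in the finalizer. This is a sketch, not the actual finalizer: the status strings and the `decide` function are assumptions chosen to mirror the verified, rejected, failed, and expired outcomes named in this runbook.

```go
package main

import "fmt"

// healOpStatus mirrors the chain-side outcomes named in this runbook.
// The string values are hypothetical, not the chain's actual enum.
type healOpStatus string

const (
	healPending  healOpStatus = "pending"
	healVerified healOpStatus = "verified"
	healRejected healOpStatus = "rejected"
	healFailed   healOpStatus = "failed"
	healExpired  healOpStatus = "expired"
)

type finalizerAction string

const (
	actionWait    finalizerAction = "wait"    // keep staged data, check again later
	actionPublish finalizerAction = "publish" // publish the healed artifact to P2P
	actionCleanup finalizerAction = "cleanup" // discard staged data without publishing
)

// decide maps the chain's heal-op status to the finalizer's next step.
// Only a chain-verified heal is ever published.
func decide(status healOpStatus) finalizerAction {
	switch status {
	case healVerified:
		return actionPublish
	case healRejected, healFailed, healExpired:
		return actionCleanup
	default:
		return actionWait
	}
}

func main() {
	for _, s := range []healOpStatus{healPending, healVerified, healRejected} {
		fmt.Printf("%s -> %s\n", s, decide(s))
	}
}
```

Defaulting unknown statuses to `wait` keeps the sketch conservative: nothing is published or deleted unless the chain has reported a terminal state.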

### Rejected healing

If verifier quorum rejects the heal, the chain marks the heal-op rejected/failed according to Lumera `v1.12.0` keeper rules. Supernode does not publish the healer output as recovered data.

### Healer cannot heal / no-show

If the healer cannot produce a valid manifest or misses the deadline, the chain eventually expires/fails the heal-op and applies LEP-6 scoring/probation rules. Supernode records errors and retry/backoff state locally where applicable, but does not override chain status.

### Restart/idempotency

PR-6 closes the submit-success/persist-crash window by pre-staging local pending rows before chain tx submission for:

- heal claims;
- heal verifications;
- recheck evidence submissions.

Pending rows deduplicate retries after a restart; a transaction is marked submitted once the chain accepts it. A submit failure removes the pending row so the operation can be retried later.

## Troubleshooting

- If duplicate tx errors appear after restart, inspect local SQLite `status` values for LEP-6 pending/submitted tables and compare with chain heal/recheck state.
- If recheck candidates stop processing, inspect `recheck_attempt_failures`; failures expire after the configured TTL and successful submissions clear the failure budget.
- If LEP-6 counters are flat while work is expected, inspect service startup/configuration first, then check structured `logtrace` events for the challenge, recheck, and self-healing services.
- If `make test-lep6` fails before tests start, run `make system-test-setup` and confirm `lumerad version` matches the Lumera dependency version.