Skip to content

M1.5: refactor block-validation swallow path to typed errors + caller filter #3434

@bdchatham

Description

@bdchatham

Problem

PR #3429 (M1 of #3427) implements the `mock_chain_validation` build-tag plumbing in sei-tendermint by refactoring 13 swallow-eligible halting-check call sites in `internal/state/validation.go` (9 sites) and `types/block.go` (4 sites) to a per-site compute-log-conditional-return pattern:

```go
err := fmt.Errorf("wrong Block.Header.AppHash. Expected %X, got %v", state.AppHash, block.AppHash)
if policy.SwallowAppHashFailure() {
types.LogSwallowedFailure(logger, "app_hash", "internal/state/validation.go:52", block.Height, state.AppHash, block.AppHash)
} else {
return err
}
```

The pattern works and ships correct behavior, but the per-site code is verbose and fans the swallow decision across 13 locations. Review feedback on #3429 (thread 1, thread 2) proposed a more idiomatic shape: return typed validation errors from each site, and consult `ConsensusPolicy` once in a single caller-side filter that decides swallow vs return.

Impact

  • Smaller per-site diff for future audit-row additions. Adding a new swallow-eligible check becomes "return `NewXError(...)`" + one new constructor + one new `Swallow*Failure()` method — not fanning a new if-block to every site.
  • DRY swallow logic. One filter, one log+counter call site, one place to evolve the swallow behavior (e.g., adding richer telemetry, structured retry hints).
  • Idiomatic Go. `errors.As` + typed-error hierarchy matches stdlib patterns; the per-site if-block was style-driven by the original audit's call-site enumeration, not by a structural requirement.
  • Future M3/M4 reuse. When cosmos-sdk module-genesis panics get swallowed in M3 and sei-chain modules in M4, the typed-error pattern extends naturally to those layers — same struct, new constructors.

Relevant experts

Proposed approach

Builds on top of merged #3429. `ConsensusPolicy` interface unchanged — the 13 `Swallow*Failure()` methods stay as the source of truth.

New file: `sei-tendermint/types/validation_errors.go` (~120 lines)

```go
package types

// SwallowableValidationError is returned by validateBlock and Block.ValidateBasic
// from sites enumerated in the M1.0 audit. The caller-side filter in validateBlock
// consults ConsensusPolicy.Swallow*Failure() per Kind and decides whether to
// swallow (log + counter + return nil) or propagate (return the wrapped error).
type SwallowableValidationError struct {
Kind string // audit-row label: "app_hash", "validators_hash", ...
Site string // file:line for log + counter labels
Height int64
Expected interface{}
Got interface{}
Wrapped error // optional: original VerifyCommit / ev.ValidateBasic err
}

func (e SwallowableValidationError) Error() string { / preserve existing fmt */ }
func (e *SwallowableValidationError) Unwrap() error { return e.Wrapped }

// 13 constructors, one per audit-row kind (matches the Swallow*Failure
// method set on ConsensusPolicy 1:1):
func NewAppHashError(height int64, expected, got []byte) *SwallowableValidationError { ... }
func NewDataHashError(...) *SwallowableValidationError { ... }
func NewLastResultsHashError(...) *SwallowableValidationError { ... }
func NewLastBlockIDError(...) *SwallowableValidationError { ... }
func NewConsensusHashError(...) *SwallowableValidationError { ... }
func NewValidatorsHashError(...) *SwallowableValidationError { ... }
func NewNextValidatorsHashError(...) *SwallowableValidationError { ... }
func NewLastCommitVerifyError(...) *SwallowableValidationError { ... }
func NewProposerNotInValidatorSetError(...) *SwallowableValidationError { ... }
func NewEvidenceOverflowError(...) *SwallowableValidationError { ... }
func NewLastCommitHashError(...) *SwallowableValidationError { ... }
func NewEvidenceHashError(...) *SwallowableValidationError { ... }
func NewPerEvidenceValidateBasicError(...) *SwallowableValidationError { ... }
```

Filter location: inside `validateBlock`

Not at `BlockExecutor.ValidateBlock`. `Block.ValidateBasic` is called from non-state callers (`BlockFromProto` at `types/block.go:347/667/1025`, `internal/test/factory/block.go:84`); filtering at the lowest common ancestor catches both surfaces. Caller-graph confirmed: no caller introspects raw validation errors for retry / peer-ban logic — the typed-error pattern is invisible to them.

Filter shape (~40 lines):

```go
func swallowOrReturn(logger log.Logger, policy ConsensusPolicy, err error) error {
if err == nil {
return nil
}
var ve *types.SwallowableValidationError
if !errors.As(err, &ve) {
return err // not swallow-eligible
}
swallow := false
switch ve.Kind {
case "app_hash": swallow = policy.SwallowAppHashFailure()
case "data_hash": swallow = policy.SwallowDataHashFailure()
case "last_results_hash": swallow = policy.SwallowLastResultsHashFailure()
case "last_block_id": swallow = policy.SwallowLastBlockIDFailure()
case "consensus_hash": swallow = policy.SwallowConsensusHashFailure()
case "validators_hash": swallow = policy.SwallowValidatorsHashFailure()
case "next_validators_hash": swallow = policy.SwallowNextValidatorsHashFailure()
case "last_commit_verify": swallow = policy.SwallowLastCommitVerifyFailure()
case "proposer_not_in_validator_set": swallow = policy.SwallowProposerNotInValidatorSetFailure()
case "evidence_overflow": swallow = policy.SwallowEvidenceOverflowFailure()
case "last_commit_hash": swallow = policy.SwallowLastCommitHashFailure()
case "evidence_hash": swallow = policy.SwallowEvidenceHashFailure()
case "per_evidence_validate_basic": swallow = policy.SwallowPerEvidenceValidateBasicFailure()
}
if !swallow {
return err
}
types.LogSwallowedFailure(logger, ve.Kind, ve.Site, ve.Height, ve.Expected, ve.Got)
return nil
}
```

Per-site refactor

Each of the 13 sites in `validation.go` + `block.go` becomes mechanical:

```go
// before (post-M1)
if !bytes.Equal(block.AppHash, state.AppHash) {
err := fmt.Errorf("wrong Block.Header.AppHash. Expected %X, got %v", state.AppHash, block.AppHash)
if policy.SwallowAppHashFailure() {
types.LogSwallowedFailure(logger, "app_hash", "internal/state/validation.go:52",
block.Height, state.AppHash, block.AppHash)
} else {
return err
}
}

// after (M1.5)
if !bytes.Equal(block.AppHash, state.AppHash) {
return types.NewAppHashError(block.Height, state.AppHash, block.AppHash)
}
```

The `Skip*` short-circuits (`SkipAppHashValidation`, `SkipDataHashValidation`, `tmtypes.SkipLastResultsHashValidation.Load()`) stay as inline guards before the comparison runs — Giga's fast-path preserved.

Acceptance criteria

  • `sei-tendermint/types/validation_errors.go` lands with the struct + 13 constructors
  • `validateBlock` wraps every `return err` from swallow-eligible sites through `swallowOrReturn`
  • All 13 sites in `validation.go` + `block.go` use the typed-error constructors instead of inline if-blocks
  • `ConsensusPolicy` interface unchanged (13 `Swallow*Failure()` methods, no breaking changes)
  • `go build` + `go test` pass under default build and `-tags mock_block_validation` (matching feat(consensus-policy): mock_chain_validation — Swallow* surface + validation refactor (M1) #3429's acceptance)
  • M1.0 audit doc (`sei-protocol/platform:docs/designs/mock-chain-validation-m1-audit.md`) gets a one-line note pointing at this M1.5 PR

Out of scope (defer)

  • Collapsing `ConsensusPolicy` to a kind-based `ShouldSwallow(kind string) bool` method. The 13-method shape is more verbose but `go vet`-detectable; collapsing is a one-way door that loses compile-time coverage. Un-defer trigger: someone proposes a fourth dispatch axis (e.g., per-Kind metric labels stamped on the policy itself) where the 13-method shape becomes a cost.
  • Extending the typed-error pattern into sei-cosmos module-genesis panics (M3) or sei-chain modules (M4). Same struct + new constructors, but the architecture is settled by M1.5; M3/M4 just consume it.

References

🤖 Generated with Claude Code

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions