Join and PartialJoin traits#20

Open
nothingmuch wants to merge 5 commits into main from lattice-partial

Collaborator

@nothingmuch nothingmuch commented May 26, 2026

Introduces Join and PartialJoin, traits for combining values with operations that are commutative, associative, and idempotent.

This allows collections of values to be merged, tracking conflicts between them.

Join::join is total (infallible), whereas PartialJoin::try_join returns a JoinResult. JoinResult itself implements Join. PartialJoin::wrap wraps self in Ok, injecting or "lifting" it into the JoinResult space, which is essentially a completion of join by way of a formal product: the Conflict struct just accumulates the arguments to join when no join exists for them in the partial definition.

Lifting to the result type is more convenient since a total join can be used to reduce items in any order and obtain the least upper bound of all of the inputs, which in this case may be a conflict.
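
To make the lifting concrete, here is a minimal sketch of how these pieces might fit together. The trait and type names follow the description above, but every body, the `Vec`-backed `Conflict`, the `from_values` helper, the `Amount` type, and `join_all` are assumptions made for illustration and are not the code in this PR.

```rust
/// Total merge; expected to be idempotent, commutative, and associative.
pub trait Join {
    fn join(self, other: Self) -> Self;
}

/// Values for which no join exists, kept in first-seen order (assumed layout).
#[derive(Debug, Clone)]
pub struct Conflict<V>(Vec<V>);

impl<V: PartialEq> Conflict<V> {
    /// Hypothetical helper: collects distinct values, preserving order.
    fn from_values(values: impl IntoIterator<Item = V>) -> Self {
        let mut distinct = Vec::new();
        for v in values {
            if !distinct.contains(&v) {
                distinct.push(v);
            }
        }
        Conflict(distinct)
    }
}

/// Set equality: the order of the conflicting values is ignored.
impl<V: PartialEq> PartialEq for Conflict<V> {
    fn eq(&self, other: &Self) -> bool {
        self.0.iter().all(|v| other.0.contains(v))
            && other.0.iter().all(|v| self.0.contains(v))
    }
}

pub type JoinResult<V> = Result<V, Conflict<V>>;

pub trait PartialJoin: Sized {
    fn try_join(self, other: Self) -> JoinResult<Self>;

    /// Lifts a clean value into the `JoinResult` space.
    fn wrap(self) -> JoinResult<Self> {
        Ok(self)
    }
}

impl<V: PartialJoin + PartialEq> Join for JoinResult<V> {
    fn join(self, other: Self) -> Self {
        match (self, other) {
            (Ok(a), Ok(b)) => a.try_join(b),
            (Ok(v), Err(c)) => Err(Conflict::from_values(std::iter::once(v).chain(c.0))),
            (Err(c), Ok(v)) => {
                Err(Conflict::from_values(c.0.into_iter().chain(std::iter::once(v))))
            }
            (Err(a), Err(b)) => Err(Conflict::from_values(a.0.into_iter().chain(b.0))),
        }
    }
}

/// Hypothetical domain type: two amounts join only when they are equal.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Amount(pub u64);

impl PartialJoin for Amount {
    fn try_join(self, other: Self) -> JoinResult<Self> {
        if self == other {
            Ok(self)
        } else {
            Err(Conflict(vec![self, other]))
        }
    }
}

/// Lifts every value with `wrap`, then reduces with the total `join`.
pub fn join_all(values: impl IntoIterator<Item = Amount>) -> Option<JoinResult<Amount>> {
    values.into_iter().map(|v| v.wrap()).reduce(|a, b| a.join(b))
}
```

Under these assumptions, reducing the same inputs in a different order yields results that compare equal, whether the outcome is a clean value or a conflict.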

@nothingmuch nothingmuch force-pushed the lattice-partial branch 2 times, most recently from 43b2c19 to e295fb3 Compare May 26, 2026 17:14
@nothingmuch nothingmuch marked this pull request as ready for review May 26, 2026 20:44
@nothingmuch nothingmuch changed the title Lattice partial Join and PartialJoin traits May 26, 2026

@bc1cindy bc1cindy left a comment

looks good, utACK

accumulating conflicts into a set instead of picking a value arbitrarily is the right call

Collaborator

@arminsabouri arminsabouri left a comment

PartialJoin and Join definitions looked sane. Conflicts are still tripping me up.
What is the flow? If there is a conflict between two PSBTs and a third needs to be partially joined, is it:

  1. Join all three in one operation and deal with the conflicts?
  2. Join the first two, then resolve the conflicts or reject. If Ok(), then join the third?

The latter makes more sense to me. Curious to know your thoughts.

Comment thread src/lattice/partial.rs Outdated
Comment thread src/lattice/partial.rs Outdated
Collaborator Author

@nothingmuch nothingmuch left a comment

for the record, i'm requesting changes on my own pull request.

Comment thread src/lattice/partial.rs Outdated
@nothingmuch nothingmuch force-pushed the lattice-partial branch 3 times, most recently from dc777b0 to d8edb43 Compare May 27, 2026 23:10
@nothingmuch nothingmuch requested a review from arminsabouri May 27, 2026 23:10
@nothingmuch
Collaborator Author

Re-ack required, IntoIterator &Conflict implementation was a brain fart, not sure why my brain expected it to magically implement .iter() on the target type (but now that that exists, its implementation is in terms of .iter())

I also reordered some of the impl blocks, consolidated two of them by changing my mind for the 40th time about whether Conflict should require V : PartialJoin or not, added the giant rationale wall of text, and moved the provenance/ordering discussion to try_join.

Collaborator

@arminsabouri arminsabouri left a comment

re-ACK

Had a question about the use of wrap() in a unit test. Shouldn't be a blocker.

Comment thread src/lattice/partial.rs
Comment thread src/lattice/partial.rs
@nothingmuch nothingmuch force-pushed the lattice-partial branch 2 times, most recently from 89ed490 to 2a8d621 Compare May 27, 2026 23:56
Comment thread src/lattice/partial.rs
Join: infallible binary merge (semilattice operation). Implementors
must satisfy the idempotence, commutativity, and associativity laws.

JoinMut: in-place variant; blanket impl provides Join automatically.

assert_join_laws! is a crate-internal macro (gated behind prop-tests
feature, scoped via #[macro_use]). Given an arbitrary strategy, it
generates three proptests verifying the semilattice laws for any
Join + Clone + PartialEq + Debug type. Used by downstream commits
to validate collection and domain-type Join implementations.
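
The macro itself is not shown in this message. As a rough, hand-rolled analogue of the three law checks, the sketch below tests the laws exhaustively over a small list of samples instead of using proptest strategies. The `Join` trait shape, `check_join_laws`, `sample_sets`, and the deliberately broken `KeepLeft` type are all assumptions for illustration.

```rust
use std::collections::BTreeSet;
use std::fmt::Debug;

/// Assumed shape of the trait: a total, by-value binary merge.
pub trait Join {
    fn join(self, other: Self) -> Self;
}

/// Set union is a standard example of a semilattice join.
impl<T: Ord> Join for BTreeSet<T> {
    fn join(mut self, other: Self) -> Self {
        self.extend(other);
        self
    }
}

/// Deliberately broken example: keeping the left operand is idempotent
/// and associative, but not commutative.
#[derive(Debug, Clone, PartialEq)]
pub struct KeepLeft(pub u8);

impl Join for KeepLeft {
    fn join(self, _other: Self) -> Self {
        self
    }
}

/// Checks idempotence, commutativity, and associativity on every
/// combination of values drawn from `samples`.
pub fn check_join_laws<T>(samples: &[T]) -> Result<(), String>
where
    T: Join + Clone + PartialEq + Debug,
{
    for a in samples {
        if a.clone().join(a.clone()) != *a {
            return Err(format!("idempotence fails for {:?}", a));
        }
        for b in samples {
            if a.clone().join(b.clone()) != b.clone().join(a.clone()) {
                return Err(format!("commutativity fails for {:?} and {:?}", a, b));
            }
            for c in samples {
                let left = a.clone().join(b.clone()).join(c.clone());
                let right = a.clone().join(b.clone().join(c.clone()));
                if left != right {
                    return Err(format!("associativity fails for {:?}, {:?}, {:?}", a, b, c));
                }
            }
        }
    }
    Ok(())
}

/// A few small sets to exercise the union-based implementation.
pub fn sample_sets() -> Vec<BTreeSet<u8>> {
    vec![
        BTreeSet::new(),
        vec![1].into_iter().collect(),
        vec![2, 3].into_iter().collect(),
        vec![1, 3].into_iter().collect(),
    ]
}
```

A property-based version would draw `a`, `b`, and `c` from a strategy rather than a fixed list, but the assertions would be the same three equalities.
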
PartialJoin<V>::try_join returns JoinResult<V>: either Ok(v), where v is
the least upper bound, or an Err(Conflict<V>), where the conflict contains
the values that could not be joined.

Conflict<V> is a multiset with set-equality semantics (order-independent).
It implements JoinMut to merge conflict sets (union of distinct values).

JoinResult<V> itself implements Join:

  (Ok(a), Ok(b))   => a.try_join(b)      (delegate to PartialJoin on V)
  (Ok(v), Err(c))  => Err(c ∪ {v})       (absorb into conflict)
  (Err(c), Ok(v))  => Err(c ∪ {v})       (absorb into conflict)
  (Err(a), Err(b)) => Err(a ∪ b)         (join conflict sets)

Containers or product types that wrap their fields in JoinResult<V> can
implement Join recursively to compute the field-by-field merge without
early exit.
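
The recursive, field-by-field merge can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: the `Record` struct, the `agree_or_conflict` helper, the `from_values` and `into_values` methods, and the `PartialJoin` impls for `u64` and `String` are all hypothetical.

```rust
pub trait Join {
    fn join(self, other: Self) -> Self;
}

/// Conflicting values in first-seen order (assumed `Vec`-backed layout).
#[derive(Debug, Clone)]
pub struct Conflict<V>(Vec<V>);

impl<V: PartialEq> Conflict<V> {
    /// Hypothetical helper: union of distinct values, order preserved.
    fn from_values(values: impl IntoIterator<Item = V>) -> Self {
        let mut distinct = Vec::new();
        for v in values {
            if !distinct.contains(&v) {
                distinct.push(v);
            }
        }
        Conflict(distinct)
    }

    /// Hypothetical accessor used to inspect the conflict.
    pub fn into_values(self) -> Vec<V> {
        self.0
    }
}

pub type JoinResult<V> = Result<V, Conflict<V>>;

pub trait PartialJoin: Sized {
    fn try_join(self, other: Self) -> JoinResult<Self>;
    fn wrap(self) -> JoinResult<Self> {
        Ok(self)
    }
}

/// The four cases listed above, one match arm each.
impl<V: PartialJoin + PartialEq> Join for JoinResult<V> {
    fn join(self, other: Self) -> Self {
        match (self, other) {
            (Ok(a), Ok(b)) => a.try_join(b),
            (Ok(v), Err(c)) => Err(Conflict::from_values(std::iter::once(v).chain(c.0))),
            (Err(c), Ok(v)) => {
                Err(Conflict::from_values(c.0.into_iter().chain(std::iter::once(v))))
            }
            (Err(a), Err(b)) => Err(Conflict::from_values(a.0.into_iter().chain(b.0))),
        }
    }
}

/// Hypothetical rule for leaf values: equal values join, others conflict.
fn agree_or_conflict<V: PartialEq>(a: V, b: V) -> JoinResult<V> {
    if a == b {
        Ok(a)
    } else {
        Err(Conflict(vec![a, b]))
    }
}

impl PartialJoin for u64 {
    fn try_join(self, other: Self) -> JoinResult<Self> {
        agree_or_conflict(self, other)
    }
}

impl PartialJoin for String {
    fn try_join(self, other: Self) -> JoinResult<Self> {
        agree_or_conflict(self, other)
    }
}

/// Product type whose fields are already lifted into `JoinResult`.
#[derive(Debug)]
pub struct Record {
    pub amount: JoinResult<u64>,
    pub label: JoinResult<String>,
}

impl Record {
    pub fn new(amount: u64, label: &str) -> Self {
        Record { amount: amount.wrap(), label: label.to_string().wrap() }
    }
}

/// Every field is merged; a conflict in one field does not stop the others.
impl Join for Record {
    fn join(self, other: Self) -> Self {
        Record {
            amount: self.amount.join(other.amount),
            label: self.label.join(other.label),
        }
    }
}
```

With this shape, joining three records that disagree on both fields reports both conflicts in a single pass, because no field's merge short-circuits the rest.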

assert_partial_join_laws! is a crate-internal macro (gated behind
prop-tests feature, scoped via #[macro_use]). Given clean-value and
result strategies, it generates try_join law tests, JoinResult law tests
(via assert_join_laws!), and a wrap roundtrip test.

## Rationale for Conflict as flat, order-preserving & order-insensitive

Conflicts represent a formal completion of the partial semilattice `V :
PartialJoin`. If `a` and `b` are conflicts, commutativity requires that
`a.join(b) == b.join(a)`, and idempotence requires that `a.join(a) ==
a`. This is somewhat at odds with keeping track of where each
conflicting value originated from.

The main purpose of preserving the order is to allow provenance to be
tracked.

If `a` and `b` are both conflict free and `let c = a.join(b)` produces a
conflict in some field, then `c.some_conflicted_field.len() == 2`, where
the first value is from `a` and the second is from `b`. This makes
reporting the conflict as an error with clear diagnostics easier, without
requiring that provenance be tracked by some kind of surrogate ID.

This does not generalize to n > 2, because if
`a.join(b).join(c).some_conflicted_field.len() == 2`, the values could
originate from `(a, b)`, `(b, c)`, or `(a, c)`.

This compromise keeps the interface and implementation simple. It allows
provenance to be tracked in a straightforward way, as long as that is
done one pair of values at a time, and it imposes no additional burden
on users who do not care about provenance (for instance when computing
something like `vs.reduce(|a, b| a.join(b))`).

### Alternatives considered

Several alternative approaches were tried, of which the compromise of
making `Conflict` just a thin wrapper around Vec seemed the best.

#### HashSet or BTreeSet based

This alternative is very close to what is implemented. The differences
are that with a Vec the order is preserved, and the implementations of
equality and `join` have quadratic complexity. We expect `n` to be very
small, so this shouldn't make a difference in practice.

Using a lookup based set requires `V : Hash` or `V : Ord` which the
current `Conflict` does not require (unfortunately adding it later would
be semver breaking, as would be changing the return value from `iter()`
or the associated `IntoIter` type of the `IntoIterator` impl).

#### Recursive data type

The following definition could in principle shadow the `join` structure:

```rs
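enum Conflict<V> {
    // A single conflicting value (leaf).
    Value(V),
    // The two operands of one `join` call, each possibly a nested conflict.
    Pair([Box<Conflict<V>>; 2])
}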
```

In this case, if `a.join(b).join(c.join(d))` has 4 conflicting values,
they would take the form `Pair([ Pair([ Value(x), Value(y) ]), Pair([
Value(z), Value(w) ]) ])`, which is arguably more informative.
Unfortunately this is still imprecise because if `a.join(b).join(c)` has
a binary conflict `Pair([ Value(x), Value(y) ])`, it's still ambiguous
in the same way.

In order for this approach to be workable it has to shadow the syntax
tree of the join operation for the Ok branch too, in which case this
entire abstraction would only compute the transpose, going from e.g.
a list of structs with values to a struct of lists of values, without
reducing any of the complexity unless there are no conflicts anywhere.

The purpose of these abstractions is to reduce the problem of merging two
or more compound values to a series of simpler problems: merging
two or more elements of a simpler type. Tracking provenance with perfect
fidelity means that if there is any conflict the structure is not
simplified at all.

#### n-ary join

The final option considered was defining join not as a binary operation
but n-ary. This is no different than 

### Associated error type or generics

The above alternatives imply a "one size fits all" approach. However,
PartialJoin could have an Error type, where `JoinResult<V> = Result<V,
<V as PartialJoin>::Error>`.

Ostensibly this would allow some choice, but with associated types the
choice is fixed per implementation of the trait and so would not afford
users the choice of whether to opt out of provenance tracking for
simpler errors or opt in and deal with the added complexity.

Making the error type fully generic would make that possible, at the
cost of even more complexity and syntactic overhead. However, no
generality would be gained for this additional complexity.

Thinking of Conflict<V> as just "deferred arguments for a join" (i.e. a
formal product), any arbitrary merge operation can be expressed by just
taking those arguments.

More formally, Conflict<V> is the free semilattice (sets under union)
over V. Since every semilattice is a quotient of a free semilattice,
there is no operation that can be expressed by setting Error to some
type that merges V's according to some rules (e.g. taking the max of
integers) that can't also be expressed by simply processing the conflict
after the fact.
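
As an illustration of processing the conflict after the fact, the sketch below folds the deferred arguments with an arbitrary total merge such as integer max. The `Conflict::new` constructor and the `resolve_with` function are hypothetical names introduced only for this example; `iter()` is mentioned earlier in this message, but its signature here is an assumption.

```rust
/// Deferred arguments to a join that did not exist (assumed `Vec`-backed).
#[derive(Debug, Clone)]
pub struct Conflict<V>(Vec<V>);

impl<V> Conflict<V> {
    /// Hypothetical constructor, for illustration only.
    pub fn new(values: Vec<V>) -> Self {
        Conflict(values)
    }

    /// Iterates over the conflicting values in stored order.
    pub fn iter(&self) -> std::slice::Iter<'_, V> {
        self.0.iter()
    }
}

pub type JoinResult<V> = Result<V, Conflict<V>>;

/// Resolves a result with any total merge supplied by the caller.
/// Clean values pass through; conflicts are folded pairwise.
/// Returns `None` only for an empty conflict. For an order-independent
/// answer, `merge` itself should be commutative and associative.
pub fn resolve_with<V: Clone>(
    result: JoinResult<V>,
    merge: impl Fn(V, V) -> V,
) -> Option<V> {
    match result {
        Ok(v) => Some(v),
        Err(conflict) => conflict.iter().cloned().reduce(merge),
    }
}
```

For example, `resolve_with(result, u64::max)` would recover the "take the max of integers" rule without that rule being baked into an error type.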

### Conclusion

For these reasons, making Conflict a thin wrapper around `Vec` seems
like the best compromise: it has the same expressive power as the
alternatives but a simpler interface, and it makes provenance tracking
possible and even relatively straightforward without forcing it on all
users.
The no-coverage-report special case hid real coverage build failures
unintentionally.