Join and PartialJoin traits#20
Conversation
43b2c19 to
e295fb3
Compare
bc1cindy
left a comment
There was a problem hiding this comment.
looks good, utACK
accumulating conflicts into a set instead of picking a value arbitrarily is the right call
arminsabouri
left a comment
There was a problem hiding this comment.
PartialJoin and Join definitions looked sane. Conflicts are still tripping me up.
What is the flow? if there is a conflict between two PSBTs and a third needs to be partially joined is it:
- Join all three in one operation and deal with the conflicts?
- Join the first two, resolve conflicts / or rejects. If Ok() then join the third?
The latter makes more sense to me. Curious to know your thoughts
dc777b0 to
d8edb43
Compare
|
Re-ack required, IntoIterator &Conflict implementation was a brain fart, not sure why my brain expected it to magically implement .iter() on the target type (but now that that exists, its implementation is in terms of .iter()) I also reordered some of the impl blocks, conslidated two of them by changing my mind for the 40th time about whether Conflict should require |
d8edb43 to
76f775e
Compare
arminsabouri
left a comment
There was a problem hiding this comment.
re-ACK
Had a question about the use of wrap() in a unit test. Shouldn't be a blocker.
89ed490 to
2a8d621
Compare
2a8d621 to
b050ba6
Compare
Join: infallible binary merge (semilattice operation). Implementors must satisfy idempotent, commutative, and associative laws. JoinMut: in-place variant; blanket impl provides Join automatically. assert_join_laws! is a crate-internal macro (gated behind prop-tests feature, scoped via #[macro_use]). Given an arbitrary strategy, it generates three proptests verifying the semilattice laws for any Join + Clone + PartialEq + Debug type. Used by downstream commits to validate collection and domain-type Join implementations.
PartialJoin<V>::try_join returns JoinResult<V>, either Ok(v), where v
the least upper bound, or a Err(Conflict<V>) where the conflict contains
values that could not be joined.
Conflict<V> is a multiset with set-equality semantics (order-independent).
It implements JoinMut to merge conflict sets (union of distinct values).
JoinResult<V> itself implements Join:
(Ok(a), Ok(b)) => a.try_join(b) (delegate to PartialJoin on V)
(Ok(v), Err(c)) => Err(c ∪ {v}) (absorb into conflict)
(Err(c), Ok(v)) => Err(c ∪ {v}) (absorb into conflict)
(Err(a), Err(b)) => Err(a ∪ b) (join conflict sets)
Containers or product types that wrap their fields in JoinResult<V> can
implement Join recursively to compute the field-by-field merge without
early exit.
assert_partial_join_laws! is a crate-internal macro (gated behind
prop-tests feature, scoped via #[macro_use]). Given clean-value and
result strategies, it generates try_join law tests, JoinResult law tests
(via assert_join_laws!), and a wrap roundtrip test.
## Rationale for Conflict as flat, order preserving & order insensitive
Conflicts represent a formal completion of the partial semilattice `V :
PartialJoin`. If `a` and `b` are conflicts, commutativity requires that
`a.join(b) == b.join(a)`, and idempotence requires that `a.join(a) ==
a`. This is somewhat at odds with keeping track of where each
conflicting value originated from.
The main purpose of preserving the order is to allow provenance to be
tracked.
If `a` and `b` are both conflict free, and `let c = a.join(b)`, then
`c.conflicted_field.len() == 2` and the first value is from `a` whereas
the second is from `b`, which makes reporting this as an error with
clear diagnostics easier, without requiring the provenance be tracked by
some kind of surrogate ID.
This does not generalize to n > 2, because if
`a.join(b).join(c).some_conflicted_field.len() == 2`, the values could
originate from `(a, b)`, `(b, c)`, or `(a, c)`.
This compromise keeps the interface and implementation simple, allows
provenance to be tracked as long as it's done one pair of values at a
time in a straightforward way, but imposes no additional burdens on
users that do not care about provenance (for instance if computing
something like `vs.reduce(|a, b| a.join(b))`)
### Alternatives considered
Several alternative approaches were tried, of which the compromise of
making `Conflict` just a thin wrapper around Vec seemed the best.
#### HashSet or BTreeSet based
This alternative is very close to what is implemented. The differences
are that with a Vec, the order is preserved, the implementation of
equality and `join` has quadratic complexity. We expect `n` to be very
small so this shouldn't make a difference in practice.
Using a lookup based set requires `V : Hash` or `V : Ord` which the
current `Conflict` does not require (unfortunately adding it later would
be semver breaking, as would be changing the return value from `iter()`
or the associated `IntoIter` type of the `IntoIterator` impl).
#### Recursive data type
The following definition could in principle shadow the `join` structure:
```rs
enum Conflict<V> {
Value(V),
Pair([Box<Conflict>; 2])
}
```
In this case, if `a.join(b).join(c.join(d))` has 4 conflicting values,
they would take the form `Pair([ Pair([ Value(x), Value(y) ]), Pair([
Value(z), Value(w) ]) ])`, which is arguably more informative.
Unfortunately this is still imprecise because if `a.join(b).join(c)` has
a binary conflict `Pair([ Value(x), Value(y) ])`, it's still ambiguous
in the same way.
In order for this approach to be workable it has to shadow the syntax
tree of the join operation for the Ok branch too, in which case this
entire abstraction kind of only computing the transpose, going from e.g.
a list of structs with values, to a struct of lists of values, but not
reducing any of the complexity unless there are no conflicts anywhere.
The purpose of these abstractions is to take the problem of merging two
or more compound values into the a series of simpler problems, merging
two or more elements of a simpler type. Tracking provenance with perfect
fidelity means that if there is any conflict the structure is not
simplified at all.
#### n-ary join
The final option considered was defining join not as a binary operation
but n-ary. This is no different than
### Associated error type or generics
The above alternatives imply a "one size fits all" approach. However,
PartialJoin could have an Error type, where `JoinResult<V> = Result<V,
<V as PartialJoin>::Error>`.
Ostensibly this would allow some choice, but with associated types the
choice is fixed per implementation of the trait and so would not afford
users the choice of whether to opt out of provenance tracking for
simpler errors or opt in and deal with the added complexity.
Making the error type fully generic would make that possible with even
more complexity and syntactic overhead. However, no generality would be
gained for this additional complexity.
Thinking of Conflict<V> as just "deferred arguments for a join" (i.e. a
formal product), any arbitrary merge operation can be expressed by just
taking those arguments.
More formally, Conflict<V> is the free semilattice (sets under union)
over V. Since every semilattice is a quotient of the free semilattice,
so there is no operation that can be expressed by setting Error to some
type that merges V's according to some rules (e.g. taking the max of
integers) that can't be expressed by simply processing the conflict
after the fact.
### Conclusion
For these reasons, making Conflict a thin wrapper around `Vec` seems
like the best compromise: has the same expressive power but results in a
simpler interface than all the alternatives, and makes provenance
tracking possible and even relatively straightforward without forcing it
on all users.
The no-coverage-report special case hid real coverage build failures unintentionally.
b050ba6 to
1345567
Compare

Introduces
JoinandPartialJoin, traits for combining values that respect commutativity, associativity, and idempotence.This allows collections of values to be merged, tracking conflicts between them.
Join::joinis total (infallible),PartialJoin::try_joinreturns aJoinResult.JoinResultimplementsJoin.PartialJoin::wrapwrapsselfinOkinjecting or "lifting" it into theJoinResultspace, which is essentially a completion of join by way of a formal product: theConflictstruct just accumulates arguments tojoinwhen one does not exist in the partial definition.Lifting to the result type is more convenient since a total join can be used to reduce items in any order and obtain the least upper bound of all of the inputs, which in this case may be a conflict.