resource manager trait and impl #4409

Open
elnosh wants to merge 13 commits into lightningdevkit:main from elnosh:resource-mgr
Conversation

Contributor

@elnosh elnosh commented Feb 10, 2026

Part of #4384

This PR introduces a ResourceManager trait and DefaultResourceManager implementation of that trait which is based on the proposed mitigation in lightning/bolts#1280.

It only covers the standalone implementation of the mitigation. I have done some testing with integrating it into the ChannelManager but that can be done separately. As mentioned in the issue, the resource manager trait defines these 4 methods to be called from the channel manager:

  • add_channel
  • remove_channel
  • add_htlc
  • resolve_htlc

Integrating into the ChannelManager

  • The ResourceManager is intended to be internal to the ChannelManager rather than users instantiating their own and passing it to a ChannelManager constructor.

  • add/remove_channel should be called when channels are opened/closed.

  • add_htlc: When processing HTLCs, the channel manager would call add_htlc, which returns a ForwardingOutcome telling it whether to forward or fail the HTLC, along with the accountable signal to use if the HTLC is forwarded. For the initial "read-only" mode, the channel manager would log the result but not actually fail the HTLC, even if it was told to do so. More specifically, I think it would be called when processing forward_htlcs, before we queue the add_htlc to the outgoing channel:

    if let Err((reason, msg)) = optimal_channel.queue_add_htlc(

  • resolve_htlc: Used to report the resolution of an HTLC back to the ResourceManager. It will be used to release bucket resources and update reputation/revenue values internally.
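
A minimal sketch of the four-method surface described above, to make the call pattern concrete. Only the four method names and `ForwardingOutcome` come from this PR; every argument list, the `SlotCounter` toy implementation, and `should_forward_read_only` are hypothetical stand-ins, and the real trait may take different parameters.

```rust
use std::collections::HashMap;

/// Hypothetical outcome type; the PR's `ForwardingOutcome` may carry different data.
#[derive(Debug, PartialEq)]
enum ForwardingOutcome {
    /// Forward the HTLC, setting this accountable signal on the outgoing add.
    Forward { accountable: bool },
    /// Fail the HTLC back.
    Fail,
}

/// The four calls named in the description. All argument lists are assumptions.
trait ResourceManager {
    fn add_channel(&mut self, scid: u64, max_accepted_htlcs: u16) -> Result<(), ()>;
    fn remove_channel(&mut self, scid: u64) -> Result<(), ()>;
    fn add_htlc(
        &mut self, incoming_scid: u64, incoming_accountable: bool,
    ) -> Result<ForwardingOutcome, ()>;
    fn resolve_htlc(&mut self, incoming_scid: u64) -> Result<(), ()>;
}

/// Toy implementation: counts in-flight HTLCs per incoming channel, nothing more.
struct SlotCounter {
    /// SCID -> (slots in use, slot limit)
    channels: HashMap<u64, (u16, u16)>,
}

impl ResourceManager for SlotCounter {
    fn add_channel(&mut self, scid: u64, max_accepted_htlcs: u16) -> Result<(), ()> {
        if self.channels.contains_key(&scid) {
            return Err(());
        }
        self.channels.insert(scid, (0, max_accepted_htlcs));
        Ok(())
    }

    fn remove_channel(&mut self, scid: u64) -> Result<(), ()> {
        self.channels.remove(&scid).map(|_| ()).ok_or(())
    }

    fn add_htlc(
        &mut self, incoming_scid: u64, incoming_accountable: bool,
    ) -> Result<ForwardingOutcome, ()> {
        let (used, limit) = self.channels.get_mut(&incoming_scid).ok_or(())?;
        if *used >= *limit {
            return Ok(ForwardingOutcome::Fail);
        }
        *used += 1;
        Ok(ForwardingOutcome::Forward { accountable: incoming_accountable })
    }

    fn resolve_htlc(&mut self, incoming_scid: u64) -> Result<(), ()> {
        let (used, _) = self.channels.get_mut(&incoming_scid).ok_or(())?;
        // Resolving with nothing in flight is reported as an error.
        *used = used.checked_sub(1).ok_or(())?;
        Ok(())
    }
}

/// "Read-only" mode: log a `Fail` decision but forward anyway.
fn should_forward_read_only(outcome: &ForwardingOutcome) -> bool {
    if *outcome == ForwardingOutcome::Fail {
        eprintln!("resource manager would have failed this HTLC");
    }
    true
}
```

In this sketch the caller always forwards in read-only mode and only uses the outcome for logging, which mirrors the rollout plan in the description.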

This could have more tests but opening early to get thoughts on design if possible

cc @carlaKC

ldk-reviews-bot commented Feb 10, 2026

👋 Thanks for assigning @carlaKC as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

codecov bot commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 93.77028% with 96 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.29%. Comparing base (94d1e5e) to head (41d3ba9).
⚠️ Report is 315 commits behind head on main.

Files with missing lines | Patch % | Lines
lightning/src/ln/resource_manager.rs | 93.77% | 65 Missing and 31 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4409      +/-   ##
==========================================
+ Coverage   86.03%   86.29%   +0.26%     
==========================================
  Files         156      161       +5     
  Lines      103091   109078    +5987     
  Branches   103091   109078    +5987     
==========================================
+ Hits        88690    94134    +5444     
- Misses      11891    12286     +395     
- Partials     2510     2658     +148     
Flag Coverage Δ
tests 86.29% <93.77%> (+0.26%) ⬆️


@carlaKC carlaKC self-requested a review February 11, 2026 07:04
Contributor

@carlaKC carlaKC left a comment

Really great job on this! Done an overly-specific first review round for something that's in draft because I've taken a look at previous versions of this code before when we wrote simulations. Also haven't looked at the tests in detail yet, but coverage is looking ✨ great ✨ .

I think tracking slot usage in GeneralBucket with a single source of truth is worth a look; it seems like it could clean up a few places where we need to do two hashmap lookups one after the other.

In the interest of one day fuzzing this, I think it could also use some validation that enforces our protocol assumptions (eg, number of slots <= 483).

@ldk-reviews-bot

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

Contributor Author

elnosh commented Feb 16, 2026

think I have addressed most of the comments code-wise. Still need to add some requested comments/docs changes.

Contributor Author

elnosh commented Feb 17, 2026

pushed more fixups addressing requests for adding docs/comments, lmk if those look good

Comment on lines +20 to +28
/// Tracks the occupancy of HTLC slots in the bucket.
slots_occupied: Vec<bool>,

/// SCID -> (slots assigned, salt)
/// Maps short channel IDs to an array of tuples with the slots that the channel is allowed
/// to use and the current usage state for each slot. It also stores the salt used to
/// generate the slots for the channel. This is used to deterministically generate the
/// slots for each channel on restarts.
channels_slots: HashMap<u64, (Vec<(u16, bool)>, [u8; 32])>,
Contributor

this shouldn't accidentally double-assign them.

Yeah it shouldn't (provided we don't have bugs), but tracking the same information (whether a slot is occupied) in multiple places is a design that allows for inconsistency / the possibility of bugs. If we have a single source of truth, we move from "shouldn't double assign" to "can't double assign".

Gave it a shot here, lmk what you think!
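
To illustrate the "can't double assign" point, here is a minimal sketch in which occupancy lives only in `slots_occupied` and the per-channel map stores just the slot indices a channel may use. The field names follow the snippet above; the method names (`assign`, `claim_slot`, `release_slot`) and the dropped salt are assumptions made for brevity, not the code that was pushed.

```rust
use std::collections::HashMap;

struct GeneralBucket {
    /// The only place occupancy is recorded: `true` means the slot holds an HTLC.
    slots_occupied: Vec<bool>,
    /// SCID -> slot indices the channel may use (no per-slot usage flag here).
    channels_slots: HashMap<u64, Vec<u16>>,
}

impl GeneralBucket {
    fn new(total_slots: u16) -> Self {
        GeneralBucket {
            slots_occupied: vec![false; total_slots as usize],
            channels_slots: HashMap::new(),
        }
    }

    /// Registers the slots a channel may use, rejecting out-of-range indices.
    fn assign(&mut self, scid: u64, slots: Vec<u16>) -> Result<(), ()> {
        if slots.iter().any(|&s| s as usize >= self.slots_occupied.len()) {
            return Err(());
        }
        self.channels_slots.insert(scid, slots);
        Ok(())
    }

    /// Claims the first free slot assigned to `scid`, if any.
    fn claim_slot(&mut self, scid: u64) -> Option<u16> {
        let assigned = self.channels_slots.get(&scid)?;
        let slot = *assigned.iter().find(|&&s| !self.slots_occupied[s as usize])?;
        self.slots_occupied[slot as usize] = true;
        Some(slot)
    }

    /// Frees a slot; releasing a slot that is not occupied is an error.
    fn release_slot(&mut self, slot: u16) -> Result<(), ()> {
        let occupied = self.slots_occupied.get_mut(slot as usize).ok_or(())?;
        if !*occupied {
            return Err(());
        }
        *occupied = false;
        Ok(())
    }
}
```

Because two channels that share a slot index both consult the same `slots_occupied` entry, there is no second copy of the flag that could drift out of sync.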

@TheBlueMatt
Collaborator

First of all not sure why all your commit messages are line-wrapped at 40 chars, but you can use like 60 or 70 lol.

Collaborator

@TheBlueMatt TheBlueMatt left a comment

A few comments, I think the design is fine, but startup resync may be annoying.

}
}

/// Tracks an average value over multiple rolling windows to smooth out volatility.
Collaborator

I'm kinda confused by this struct. First of all, the docs here are wrong - we aren't tracking "multiple windows" we're tracking a rolling average over one window of window * window_count. The only difference between this and DecayingAverage is it tries to compensate for if we don't have enough data to actually go back window_count * window. Why shouldn't we just have DecayingAverage do that instead of having a separate struct here?

Contributor Author

I think it makes sense to keep separate because the use of DecayingAverage for reputation differs from AggregatedWindowAverage when tracking revenue. For reputation, we want the DecayingAverage over the full window (24 weeks). For revenue, using AggregatedWindowAverage, we track the decaying average over the same window (24 weeks) but divide by window_count because we want the revenue for 2 weeks.

Contributor

I agree that we want to track two different things here:

  • Reputation (as DecayingAverage): we want shocks to reflect, so that we can quickly react to a change in attacker behavior
  • Revenue (as AggregatedWindowAverage): we want to smooth shocks to track our peer's average revenue in two weeks over window_count periods.

But ran some numbers and it does look like we're penalizing old data a bit too much with this approach, as mentioned below.

struct DecayingAverage {
value: i64,
last_updated_unix_secs: u64,
window: Duration,
Collaborator

You don't actually use window (only decay_rate) so we can drop it here.

Contributor Author

Contributor

Seems okay to me to just write the decay_rate directly. We'd only need the window if we wanted to change the way that we calculate it, and that seems unlikely?

// We are not concerned with the rounding precision loss for this value because it is
// negligible when dealing with a long rolling average.
Ok((self.aggregated_revenue_decaying.value_at_timestamp(timestamp_unix_secs)? as f64
/ window_divisor)
Collaborator

I don't buy this? Let's say our windows_tracked is 4 and we have some data for the last 3 windows. On average, those 3 windows' worth of data will have been multiplied by 0.62175 (https://www.wolframalpha.com/input?i=%28integral+from+0+to+3+%280.5+%5E+0.5%29+%5E+x%29+%2F+3) but then we divide it by three. Whereas if we only have data for a single window, that data will be multiplied by, on average, 0.845111 (https://www.wolframalpha.com/input?i=%28integral+from+0+to+1+%280.5+%5E+0.5%29+%5E+x%29+%2F+1), and then we'll divide by one. We have to factor in the decrease in the data from the decay as well as just the increased amount of data here.
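
The two figures in the comment above can be reproduced with the closed form of that integral, (1/n) ∫₀ⁿ rˣ dx = (1 − rⁿ) / (n · (−ln r)), where r = 0.5^0.5 is the per-window decay factor used in the linked formulas. The function name and parameters below are hypothetical; this is a sketch for checking the arithmetic, not code from the PR.

```rust
/// Average multiplier applied to data spread uniformly over `windows` windows,
/// when the tracked value halves every `half_life_windows` windows.
fn average_decay_factor(windows: f64, half_life_windows: f64) -> f64 {
    // Per-window decay factor, e.g. 0.5^(1/2) when the half-life is two windows.
    let rate = 0.5_f64.powf(1.0 / half_life_windows);
    // (1/n) * integral from 0 to n of rate^x dx = (1 - rate^n) / (n * -ln(rate))
    (1.0 - rate.powf(windows)) / (windows * -rate.ln())
}
```

With a half-life of two windows, `average_decay_factor(3.0, 2.0)` is roughly 0.62175 and `average_decay_factor(1.0, 2.0)` is roughly 0.845111. Dividing by the number of windows alone therefore understates the per-window value more as more history accumulates.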

Comment on lines +20 to +28
/// Tracks the occupancy of HTLC slots in the bucket.
slots_occupied: Vec<bool>,

/// SCID -> (slots assigned, salt)
/// Maps short channel IDs to an array of tuples with the slots that the channel is allowed
/// to use and the current usage state for each slot. It also stores the salt used to
/// generate the slots for the channel. This is used to deterministically generate the
/// slots for each channel on restarts.
channels_slots: HashMap<u64, (Vec<(u16, bool)>, [u8; 32])>,
Collaborator

Does the protection algorithm break if slots are allocated probabilistically? We could reduce implementation complexity a good bit if we just drop channel_slots entirely and generate the list of slots the channel can occupy any time we need it and allow two channels to occupy the same slot (presumably leading to some extra HTLC failures in that case?). This feels very much like a bloom filter problem where we should be able to reduce FPs somehow, though maybe it isn't quite the same because we actually do want conflicts to be "common".
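
A rough sketch of what "generate the list of slots any time we need it" could look like: the slots are derived deterministically from the SCID and a salt, so nothing per-channel needs to be stored. The function name and parameters are hypothetical, and `DefaultHasher` is only a stand-in for a real keyed PRF (the PR later mentions ChaCha); its output is not guaranteed to be stable across Rust releases, so it would not be suitable for anything that must survive an upgrade.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derives `per_channel_slots` distinct slot indices in `0..total_slots`
/// from `(salt, scid)`. The same inputs always give the same slots within
/// one build of the program.
fn slots_for_channel(
    scid: u64, salt: &[u8; 32], total_slots: u16, per_channel_slots: u16,
) -> Vec<u16> {
    assert!(per_channel_slots <= total_slots);
    let mut slots = Vec::with_capacity(per_channel_slots as usize);
    let mut counter: u64 = 0;
    while slots.len() < per_channel_slots as usize {
        // Hash (salt, scid, counter) and map the result onto a slot index.
        let mut hasher = DefaultHasher::new();
        salt.hash(&mut hasher);
        scid.hash(&mut hasher);
        counter.hash(&mut hasher);
        let slot = (hasher.finish() % total_slots as u64) as u16;
        // Skip repeats so a channel never lists the same slot twice.
        if !slots.contains(&slot) {
            slots.push(slot);
        }
        counter += 1;
    }
    slots
}
```

Under this scheme two channels may derive overlapping slots, which is the "conflicts are common" behavior the comment asks about; occupancy would still be tracked once, per slot.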

}
}

impl Readable for DefaultResourceManager {
Collaborator

Hmmmmmmmmmmmmmmmmmmm. Reconciliation on startup is gonna be tricky here. What happens if we accept an HTLC then restart and actually it never made it to disk in the ChannelMonitor? Theoretically this can be persisted as a part of ChannelManager and it should be consistent-ish, but Val is hard at work making it so that we don't have to persist ChannelManager at all.

Instead, I wonder how easy we can make it to rebuild this from HTLC information. It would require some additional integration into "LDK core" but hopefully not much. If we have some HTLCSlotUsage struct that we return from add_htlc in the ForwardingOutcome::Forward case, we could presumably shove that into the HTLCSource (as the slots are "on" the inbound channel) and rebuild the resource manager very cheaply.

Contributor Author

What happens if we accept an HTLC then restart and actually it never made it to disk in the ChannelMonitor? Theoretically this can be persisted as a part of ChannelManager and it should be consistent-ish, but Val is hard at work making it so that we don't have to persist ChannelManager at all.

hmmmm yeah I thought about that but was operating under the assumption that by persisting along with the ChannelManager it should stay consistent.

In a world where we don't persist the ChannelManager, I explored your suggestion to rebuild the resource manager from HTLC data we have on startup and came up with the approach here: elnosh@cdd0bf8. With some caveats, I think we can replay HTLCs by calling add_htlc on the ResourceManager, so we would only need general HTLC information and would not need to shove bucket/resourcemanager-specific information into HTLCSource. We would basically need this HTLC info on startup. I added 2 helper methods in channel.rs, and the replay in the ChannelManager could look like this: https://github.com/elnosh/rust-lightning/blob/cdd0bf80cb200d370995c4f859645c0a54b3a798/lightning/src/ln/channelmanager.rs#L19303-L19366

With this, I was able to restart a node with pending HTLCs and replayed them fine in the resource manager using Channel data. The only field I would need to add to HTLCSource is incoming_accountable

The caveat is that reputation and in-flight-risk when replaying the HTLCs might be somewhat (slightly) different if the shutdown time was long because the current timestamp is different.

Another approach would be to store the specific bucket usage in the HTLCSource so we replay HTLCs and add them directly to the bucket they were before shutdown. I went with previous approach mentioned since I think that will be less intrusive in the channel manager and would require less resourcemanager-specific information to leak into the channel manager. Let me know what you think

Collaborator

My only question there is what the performance cost is. If we have 500 channels and have to replay a hundred HTLCs per channel how bad does it get?

Contributor Author

@elnosh elnosh Feb 25, 2026

I'd have to run it but, indeed, it is not optimal because for each outbound HTLC in each channel it needs to look up the inbound HTLC on the incoming channel. It could store the missing fields in the HTLCSource as well to avoid the inbound HTLC lookup.

Contributor Author

did alternative approach in 9094319

Implements a decaying average over a rolling window. It will be
used in upcoming commits by the resource manager to track
reputation and revenue of channels.
Contributor Author

elnosh commented Mar 3, 2026

I have pushed changes for the majority of the comments from the last round - diff here.

The most notable things are:

  • added a PendingHTLCReplay to be passed from upstream by the ChannelManager to replay pending HTLCs on startup instead of writing them in the ResourceManager
  • Do not double-track HTLC slot occupancy in general bucket and only track them in slots_occupied.
  • Use ChaCha instead of sha256 for slot generation in general bucket
  • Added more test cases

@elnosh elnosh marked this pull request as ready for review March 3, 2026 14:38
@valentinewallace valentinewallace requested review from carlaKC and removed request for valentinewallace March 3, 2026 14:40
@ldk-reviews-bot

🔔 1st Reminder

Hey @carlaKC! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

🔔 2nd Reminder

Hey @carlaKC! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

🔔 3rd Reminder

Hey @carlaKC! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

Contributor

@carlaKC carlaKC left a comment

Didn't review tests yet, main comment is about how we handle replays on restart (+ saving needing to persist a few things).

Comment on lines +146 to +147
// TODO: could return the slots already assigned instead of erroring.
Entry::Occupied(_) => Err(()),
Contributor

Meant that assign_slots_for_channel doesn't need &self at all - we can just pass in our_scid + per_channel_slots, return the slots/salt we're adding and then have the caller be responsible for adding these values to self.channel_slots.

Saves us a double lookup, because we're looking up the entry in the caller (to see if we need to assign_slots_for_channel) and then looking it up again here.
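
A small sketch of that shape, assuming the caller owns the map: `assign_slots_for_channel` becomes a free function that returns the slots and salt, and the caller does a single `entry` lookup. The slot choice and the all-zero salt are placeholders, and the `total_slots` parameter, `ChannelSlots` alias, and `add_channel` helper are hypothetical additions needed to make the example self-contained.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// SCID -> (slots assigned, salt), as in the field under review.
type ChannelSlots = HashMap<u64, (Vec<u16>, [u8; 32])>;

/// Needs no access to the bucket: it only computes what should be stored.
/// Placeholder logic: consecutive slots starting at `our_scid % total_slots`.
fn assign_slots_for_channel(
    our_scid: u64, per_channel_slots: u16, total_slots: u16,
) -> (Vec<u16>, [u8; 32]) {
    assert!(total_slots > 0 && per_channel_slots <= total_slots);
    let salt = [0u8; 32]; // placeholder; a real salt would be random
    let start = our_scid % total_slots as u64;
    let slots = (0..per_channel_slots as u64)
        .map(|i| ((start + i) % total_slots as u64) as u16)
        .collect();
    (slots, salt)
}

/// Caller side: one lookup decides between "already present" and "insert".
fn add_channel(
    channels_slots: &mut ChannelSlots, scid: u64, per_channel_slots: u16, total_slots: u16,
) -> Result<(), ()> {
    match channels_slots.entry(scid) {
        Entry::Occupied(_) => Err(()),
        Entry::Vacant(entry) => {
            entry.insert(assign_slots_for_channel(scid, per_channel_slots, total_slots));
            Ok(())
        },
    }
}
```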

Comment on lines +221 to +222
self.slots_used += 1;
self.liquidity_used += htlc_amount_msat;
Contributor

nit: debug_assert that we never go over our _allocated values?

Contributor Author

isn't that caught by the resources_available check above this?

.map_err(|_| DecodeError::InvalidValue)?,
);
}
Ok(forwarding_outcomes)
Contributor

Discussed offline: it seems possible that we end up in a state where we replay a pending htlc (which we know we previously forwarded) and get a ForwardingOutcome::Fail because the decaying averages we're tracking have changed since we first called add_htlc.

This will be fine in "read only" mode, but once we're running for real we'll need a way to handle this (because we won't have HTLCs in our internal state, and resolve_htlc will be called for an unknown htlc).

Seems like a reasonable solution to track these in a failed_replays map and have some special case handling for them. A really nice side effect of this is that we no longer need to persist GeneralBucket + ResourceManagerConfig (previously, we had to persist these to make sure we have the same bucket sizes so that everything we replay will fit in them - but if we accept that this isn't the world we live in already, then this issue is already handled).

Practically in terms of our defense, failing to re-add htlcs to our state means that we're a bit more forgiving on restarts (since each failed_replay isn't in our internal state). Since we're just shooting for readonly now, seems reasonable to live with this (log it to understand how much it happens) and improve as necessary.
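
A minimal sketch of the failed_replays idea described above. `HtlcRef`, `ForwardingOutcome`, `failed_replays`, `replay_pending_htlcs`, `add_htlc`, and `resolve_htlc` are names from this PR, but the fields, signatures, and the `capacity` check (a stand-in for the real bucket and reputation logic) are assumptions, and the `Mutex` used in the PR is omitted here by taking `&mut self`.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct HtlcRef {
    incoming_scid: u64,
    htlc_id: u64,
}

#[derive(PartialEq, Debug)]
enum ForwardingOutcome {
    Forward,
    Fail,
}

struct Manager {
    /// Stand-in for the bucket/reputation checks: max HTLCs we will track.
    capacity: usize,
    pending: HashSet<HtlcRef>,
    /// HTLCs that were forwarded before restart but failed when replayed.
    failed_replays: HashSet<HtlcRef>,
}

impl Manager {
    fn add_htlc(&mut self, htlc: HtlcRef) -> ForwardingOutcome {
        if self.pending.len() >= self.capacity {
            return ForwardingOutcome::Fail;
        }
        self.pending.insert(htlc);
        ForwardingOutcome::Forward
    }

    /// Replays HTLCs known to be in flight; remembers those that no longer fit.
    fn replay_pending_htlcs(&mut self, htlcs: &[HtlcRef]) {
        for htlc in htlcs {
            if self.add_htlc(*htlc) == ForwardingOutcome::Fail {
                self.failed_replays.insert(*htlc);
            }
        }
    }

    /// A failed replay resolves as a no-op; a truly unknown HTLC is an error.
    fn resolve_htlc(&mut self, htlc: HtlcRef) -> Result<(), ()> {
        if self.failed_replays.remove(&htlc) {
            return Ok(());
        }
        if self.pending.remove(&htlc) {
            Ok(())
        } else {
            Err(())
        }
    }
}
```

The size of `failed_replays` after startup is also a natural quantity to log when judging how often this edge case actually occurs.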

The Channel struct introduced here has the core information that
will be used by the resource manager to make forwarding decisions
on HTLCs:

- Reputation that this channel has accrued as an outgoing link
in HTLC forwards.

- Revenue (forwarding fees) that the channel has earned us as an
incoming link.

- Pending HTLCs this channel is currently holding as an outgoing link.

- Bucket resources that are currently in use in general, congestion
and protected.
Collaborator

ldk-claude-review-bot commented Mar 23, 2026

Re-review Summary for PR #4409 (Pass 4)

Two new inline comments posted on issues not covered by prior reviews:

New inline comments

  • resource_manager.rs:671-676 — resolution_period = Duration::ZERO causes f64 division by zero producing infinity, silently corrupting all reputation/risk calculations and permanently disabling the protected bucket
  • resource_manager.rs:367-369 — bucket_allocations potential u16/u64 underflow wrapping when f64 rounding causes general_slots + congestion_slots > max_accepted_htlcs

Prior inline comments (still open, 11 total)

  • resource_manager.rs:149 — Division by zero when slots_allocated is 0
  • resource_manager.rs:166 — Division by zero when per_slot_msat is 0
  • resource_manager.rs:258 — Nonce only uses upper 4 of 8 bytes of incoming SCID
  • resource_manager.rs:263 — max_attempts may be insufficient for slot assignment
  • resource_manager.rs:528 — Division by zero when congestion_bucket.slots_allocated is 0
  • resource_manager.rs:610 — Channel::read deserialization lacks validation
  • resource_manager.rs:666 — u32 overflow in htlc_in_flight_risk
  • resource_manager.rs:700 — DefaultResourceManager doesn't implement the ResourceManager trait
  • resource_manager.rs:716 — Silent u8 truncation of revenue_window_weeks_avg
  • resource_manager.rs:923 — Partial state corruption in resolve_htlc
  • resource_manager.rs:1073 — DecayingAverage timestamp ratchet blocks forwarding after clock skew
  • resource_manager.rs:1129 — Integer division by zero when avg_weeks is 0

Cross-cutting concerns

  • No ResourceManagerConfig validation: All config fields are pub with no validation anywhere. resolution_period = 0, reputation_multiplier = 0, or general_allocation_pct + congestion_allocation_pct >= 100 all cause silent corruption or panics. A validated constructor or a validation method on ResourceManagerConfig would address this entire class of issues.
  • Division-by-zero family (6+ instances): All stem from missing minimum-value validation on config parameters, channel parameters, and deserialization paths.
  • No runtime check that incoming_channel_id != outgoing_channel_id in add_htlc — only guarded by a debug_assert stripped in release builds.
  • Clock-skew sensitivity: The mutable-on-read value_at_timestamp pattern creates a one-way timestamp ratchet across all reputation and revenue tracking.
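
One possible shape for the validated constructor suggested in the first bullet, as a sketch only. The four field names are taken from the summary above; their types, the private visibility, the `ConfigError` enum, and the exact set of checks are assumptions.

```rust
use std::time::Duration;

/// Hypothetical error type for rejected configurations.
#[derive(Debug, PartialEq)]
enum ConfigError {
    ZeroResolutionPeriod,
    ZeroReputationMultiplier,
    AllocationTooLarge,
}

/// Fields are private here so that `new` is the only way to build a config.
#[derive(Debug)]
struct ResourceManagerConfig {
    resolution_period: Duration,
    reputation_multiplier: u8,
    general_allocation_pct: u8,
    congestion_allocation_pct: u8,
}

impl ResourceManagerConfig {
    fn new(
        resolution_period: Duration, reputation_multiplier: u8, general_allocation_pct: u8,
        congestion_allocation_pct: u8,
    ) -> Result<Self, ConfigError> {
        // A zero period would later be used as a divisor.
        if resolution_period.is_zero() {
            return Err(ConfigError::ZeroResolutionPeriod);
        }
        if reputation_multiplier == 0 {
            return Err(ConfigError::ZeroReputationMultiplier);
        }
        // Widen before adding so the sum itself cannot overflow a u8.
        if general_allocation_pct as u16 + congestion_allocation_pct as u16 >= 100 {
            return Err(ConfigError::AllocationTooLarge);
        }
        Ok(ResourceManagerConfig {
            resolution_period,
            reputation_multiplier,
            general_allocation_pct,
            congestion_allocation_pct,
        })
    }
}
```

Rejecting bad values once at construction would let the arithmetic elsewhere assume non-zero divisors, which covers most of the division-by-zero family in one place.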

Contributor Author

elnosh commented Mar 23, 2026

pushed changes addressing comments from the last review. The changes to point out are to DecayingAverage and AggregatedWindowAverage, addressing this comment: #4409 (comment). I have added the test test_aggregated_window_average showing how the aggregate average approximates the real value.

Comment on lines +1039 to +1047
fn value_at_timestamp(&mut self, timestamp_unix_secs: u64) -> Result<i64, ()> {
if timestamp_unix_secs < self.last_updated_unix_secs {
return Err(());
}

let elapsed_secs = (timestamp_unix_secs - self.last_updated_unix_secs) as f64;
let decay_rate = 0.5_f64.powf(elapsed_secs / self.half_life);
self.value = (self.value as f64 * decay_rate).round() as i64;
self.last_updated_unix_secs = timestamp_unix_secs;
Collaborator

Issue: value_at_timestamp mutates last_updated_unix_secs on every call, making it a one-way ratchet. If any caller passes a timestamp slightly ahead of the current time (e.g., due to clock skew, NTP adjustment, or VM migration), all subsequent calls with the "correct" time will return Err(()) until the wall clock catches up.

In resolve_htlc, this manifests as: one call with resolved_at slightly in the future poisons the channel's outgoing_reputation.last_updated_unix_secs. All subsequent add_htlc calls fail (line 768 calls value_at_timestamp(added_at) which returns Err), effectively blocking all HTLC forwarding through this outgoing channel until the system clock reaches the poisoned timestamp.

Consider clamping to max(timestamp, last_updated) instead of returning Err, or using monotonic timestamps internally. Alternatively, at minimum, document that callers must guarantee strictly non-decreasing timestamps to avoid bricking a channel's forwarding.

Contributor

Consider clamping to max(timestamp, last_updated)

This seems fine to me, it'll just mean that we think our HTLC is held for a second or two more than we expect which isn't critical.
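
A sketch of the clamping variant discussed above, based on the snippet under review. The struct fields follow that snippet; the infallible `i64` return type, the `new` constructor, and the `add_value` helper are assumptions, since the original returns `Result<i64, ()>`.

```rust
struct DecayingAverage {
    value: i64,
    last_updated_unix_secs: u64,
    /// Half-life in seconds.
    half_life: f64,
}

impl DecayingAverage {
    fn new(start_timestamp_unix_secs: u64, half_life_secs: f64) -> Self {
        DecayingAverage {
            value: 0,
            last_updated_unix_secs: start_timestamp_unix_secs,
            half_life: half_life_secs,
        }
    }

    /// Decays the value to `timestamp_unix_secs`. A timestamp older than the
    /// last update is clamped to the last update instead of returning an error,
    /// so a single timestamp from the future cannot block later calls.
    fn value_at_timestamp(&mut self, timestamp_unix_secs: u64) -> i64 {
        let timestamp = timestamp_unix_secs.max(self.last_updated_unix_secs);
        let elapsed_secs = (timestamp - self.last_updated_unix_secs) as f64;
        let decay_rate = 0.5_f64.powf(elapsed_secs / self.half_life);
        self.value = (self.value as f64 * decay_rate).round() as i64;
        self.last_updated_unix_secs = timestamp;
        self.value
    }

    /// Decays to `timestamp_unix_secs`, then adds `value`.
    fn add_value(&mut self, value: i64, timestamp_unix_secs: u64) -> i64 {
        self.value_at_timestamp(timestamp_unix_secs);
        self.value = self.value.saturating_add(value);
        self.value
    }
}
```

With a 100-second half-life, adding 1000 at t=1000 and reading at t=1100 gives 500; a subsequent read at t=1090 also gives 500 rather than an error, because the stale timestamp is treated as "no time elapsed".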

@carlaKC carlaKC self-requested a review March 24, 2026 13:36
elnosh added 5 commits March 24, 2026 11:12
f
- fix comment and remove unnecessary channel id
- add sanity checks when creating channel
Introduces the DefaultResourceManager struct. The core methods
that will be used to inform the HTLC forward decisions are
add/resolve_htlc.

- add_htlc: Based on resource availability and reputation, it
evaluates whether to forward or fail the HTLC.

- resolve_htlc: Releases the bucket resources used from a HTLC
previously added and updates the channel's reputation based on HTLC
fees and resolution times.
Adds write and read implementations to persist the
DefaultResourceManager.
DecayingAverage {
value: 0,
last_updated_unix_secs: start_timestamp_unix_secs,
window,
Contributor

Window is now unused?

Contributor Author

Not used in the methods but I'm writing it to reconstruct the half-life later when reading it back. I think we agreed on this to not write a f64

}
}

#[cfg(test)]
Contributor

Re-add a test covering expected values for the decaying average (get values from Clara)?

struct AggregatedWindowAverage {
start_timestamp_unix_secs: u64,
window_count: u8,
avg_weeks: u8,
Contributor

nit: var rename not really an improvement to me

Perhaps bucket_duration and aggregate_duration


impl AggregatedWindowAverage {
- fn new(window: Duration, window_count: u8, start_timestamp_unix_secs: u64) -> Self {
+ fn new(avg_weeks: u8, window_weeks: u8, start_timestamp_unix_secs: u64) -> Self {
Contributor

This API doesn't really make sense to me? When you come to this struct, I'd assume that statement in your mind is "I want to track a smoothed average for 2 weeks periods, and I'd like to look at the last 6 periods". Here you have to do some of that calculation yourself to get window_weeks, rather than just provide the multiplier?

You could provide a window_weeks that isn't divisible by avg_weeks which doesn't really make sense.

Contributor Author

I'd assume that statement in your mind is "I want to track a smoothed average for 2 weeks periods, and I'd like to look at the last 6 periods".

felt to me the API was allowing this. Here if you wanted a 2 week average over the last 6 periods, you'd pass avg_weeks = 2 and window_weeks = 12 which seemed fine given that this is based on the configurations we set ourselves. But you mean change the window_weeks to be the multiplier instead? That makes sense

Here you have to do some of that calculation yourself to get window_weeks, rather than just provide the multiplier?

the calculation is just converting weeks to seconds to get a Duration for the DecayingAverage, which we'll need to do even if we pass a multiplier. Unless we pass a Duration directly, but then this can happen: "You could provide a window_weeks that isn't divisible by avg_weeks which doesn't really make sense."
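
For comparison, a sketch of the multiplier-style constructor being discussed: the caller states "average over `avg_weeks`, looking back `window_count` periods" and the total tracked duration is derived, so a non-divisible combination cannot be expressed. The field set, the `tracked_window` name, and the `Result` return are assumptions, and the start timestamp is omitted for brevity.

```rust
use std::time::Duration;

const SECS_PER_WEEK: u64 = 7 * 24 * 60 * 60;

struct AggregatedWindowAverage {
    /// Length of one averaging period, in weeks (e.g. 2).
    avg_weeks: u8,
    /// How many such periods are tracked (e.g. 6).
    window_count: u8,
    /// Derived: total duration tracked by the underlying decaying average.
    tracked_window: Duration,
}

impl AggregatedWindowAverage {
    fn new(avg_weeks: u8, window_count: u8) -> Result<Self, ()> {
        // Zero in either position would later be used as a divisor.
        if avg_weeks == 0 || window_count == 0 {
            return Err(());
        }
        // u8 * u8 always fits in u64, so this multiplication cannot overflow.
        let total_weeks = avg_weeks as u64 * window_count as u64;
        Ok(AggregatedWindowAverage {
            avg_weeks,
            window_count,
            tracked_window: Duration::from_secs(total_weeks * SECS_PER_WEEK),
        })
    }
}
```

Here `AggregatedWindowAverage::new(2, 6)` tracks 12 weeks in total, matching the "2 week average over the last 6 periods" example in the thread.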

return Err(());
}

if max_accepted_htlcs < 12 || max_in_flight_sat < 1000 {
Contributor

Needs an explanation, also is there a constant we can use(/add) for the minimum channel size?

Contributor Author

ok, will add explanation. Yeah I tried looking for a constant for the 1000 value we use in channel.rs as well but didn't find one.

if max_accepted_htlcs > 483
|| (max_htlc_value_in_flight_msat / 1000) >= TOTAL_BITCOIN_SUPPLY_SATOSHIS
{
let max_in_flight_sat = max_htlc_value_in_flight_msat / 1000;
Contributor

nit: check in msat, division has rounding issues

/// Tracks HTLCs that returned [`ForwardingOutcome::Fail`] during [`Self::replay_pending_htlcs`].
/// When [`Self::resolve_htlc`] is called for one of these, it is silently ignored instead of
/// returning an error.
failed_replays: Mutex<HashSet<HtlcRef>>,
Contributor

nit: explain why this could happen

.map_err(|_| DecodeError::InvalidValue)?;

if outcome == ForwardingOutcome::Fail {
failed_replays.insert(HtlcRef {
Contributor

Log this? This is supposed to catch a few edge cases, but if we're hitting it for the majority of our HTLCs we should know.

Contributor Author

yeah. I left logging for the follow-up when integrating into the channel manager in #4468

Comment on lines +576 to +577
(1, self.max_htlc_value_in_flight_msat, required),
(3, self.max_accepted_htlcs, required),
Contributor

Kinda annoying that we need to persist these - any chance we can provide a map of our known channel constraints as a read arg?

Contributor Author

I think we'll already have the FundedChannels when reading the resource manager so should be doable.

6 participants