
Conversation

@joshuabrink

Add a custom Ktor Darwin HTTP engine to the repository with bounded channel backpressure to prevent out-of-memory crashes when processing large sync payloads on iOS/macOS.

Problem

NSURLSession delivers data faster than it can be processed, causing unbounded buffering in the upstream Ktor Darwin engine's channel. This results in multi-GB memory spikes and OOM crashes on large sync operations.

Solution

Implemented a custom Darwin client fork with a bounded channel (capacity: 64) and backpressure handling via runBlocking. This naturally throttles NSURLSession's data delivery rate to match processing speed, eliminating memory spikes.

Changes

  • Added internal/ktor-client-darwin/ with modified DarwinTaskHandler.kt
  • Implements bounded channel backpressure for large payload handling
  • Maintains full compatibility with existing PowerSync Kotlin SDK code
  • Reduces memory footprint during large syncs by preventing unbounded buffering
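A minimal sketch of the idea, assuming a simplified stand-in for the forked DarwinTaskHandler (names like BoundedBodyHandler are illustrative, not the actual fork code): the NSURLSession delegate callback pushes each chunk into a bounded channel via runBlocking, so the delegate thread stalls whenever the consumer falls behind, which in turn slows NSURLSession's delivery.

import io.ktor.utils.io.*
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel

// Sketch only: stands in for the forked DarwinTaskHandler's body handling.
internal class BoundedBodyHandler(
    parentScope: CoroutineScope,
    capacity: Int = 64, // chunk capacity discussed in this PR
) {
    // Bounded channel: once `capacity` chunks are buffered, send() suspends.
    private val chunks = Channel<ByteArray>(capacity)

    // Consumer side: drains chunks into a ByteReadChannel at processing speed.
    val body: ByteReadChannel = parentScope.writer {
        for (chunk in chunks) {
            channel.writeFully(chunk)
        }
    }.channel

    // Producer side, called from URLSession:dataTask:didReceiveData:.
    // NSURLSession invokes the delegate on its own queue; blocking that
    // thread here is what throttles further delivery.
    fun didReceive(bytes: ByteArray) {
        runBlocking { chunks.send(bytes) }
    }

    // Called when the response completes; closing the channel lets the
    // writer coroutine finish so the body channel reaches EOF.
    fun complete() {
        chunks.close()
    }
}

The channel capacity bounds how many undelivered chunks can sit in memory at once, which is the knob discussed in the review comments below.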

Contributor

@rkistner rkistner left a comment


Just some general comments:

  1. Since most of the code is a fork from the ktor-client-darwin library, we need to include a copy of their license.
  2. There is quite a lot of code here from the fork. Is there an easy way to view the difference? And how would we keep it up to date with the upstream version in the future? (Can we get the fix merged upstream?)
  3. The bounded channel has a capacity of "64". What does this mean in practice? Does this translate to a practical limit in MB?

Contributor

@simolus3 simolus3 left a comment


The biggest question is whether this actually works. I've explored a different approach that calls suspend() on the NSURLSession download task when the channel is full, and the framework implementation just completely ignored that. So it's not obvious to me whether blocking the receive thread is enough to implement backpressure (since most of the OS APIs are asynchronous).
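Roughly, the idea there was something like this (hypothetical sketch, not the exact code I tried):

import platform.Foundation.NSURLSessionDataTask

// Hypothetical sketch: pause the OS-level task when the buffer fills up,
// instead of blocking the delegate thread. In my tests the framework
// appeared to ignore suspend() for data already in flight.
class TaskThrottle(private val task: NSURLSessionDataTask) {
    fun onBufferFull() {
        task.suspend()   // ask NSURLSession to stop delivering data
    }

    fun onBufferDrained() {
        task.resume()    // allow delivery to continue
    }
}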

Steven has created an aggressive repro for this, can you check whether the client fork fixes that? You can start the server, then run the iOS app in a simulator. At least in my attempts, memory usage was growing unbounded for about 40% of all app starts (with my approach, it looks like there may be a race condition in the underlying HTTP client implementation from the OS).

Comment on lines 11 to 12
1. **NSURLSession accumulates all response chunks** into a single `NSData` object
2. **Ktor converts the entire NSData to ByteArray** - allocating another full copy
Contributor


This isn't what actually happens - if it were the case, streaming sync would be completely broken. We still see individual lines being emitted by ktor, it's just that backpressure doesn't reach the server.

Author


Apologies, ignore this file. It was a holdover from my initial investigation into how sessions were handled, and it is very wrong indeed.

* Copyright 2014-2019 JetBrains s.r.o and contributors. Use of this source code is governed by the Apache 2.0 license.
*/

package io.ktor.client.engine.darwin
Contributor


To avoid linker issues if someone also depends on the upstream package in their app, we should give all of these files a different package name (e.g. com.powersync.internal.client_fork.darwin)
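Something along these lines at the top of each forked file, instead of the upstream declaration quoted above (the exact package name is just a suggestion):

// was: package io.ktor.client.engine.darwin
package com.powersync.internal.client_fork.darwin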

@joshuabrink
Author

joshuabrink commented Dec 10, 2025

  1. Since most of the code is a fork from the ktor-client-darwin library, we need to include a copy of their license.

I did keep all the license headers in the files, but you are right; I've included it as a standalone license as well.

  2. There is quite a lot of code here from the fork. Is there an easy way to view the difference? And how would we keep it up to date with the upstream version in the future? (Can we get the fix merged upstream?)

An upstream fix would have been ideal. This is being tracked in this issue, but I think the idea was to ship this as a temporary fix until the fix is merged upstream.

  3. The bounded channel has a capacity of "64". What does this mean in practice? Does this translate to a practical limit in MB?

I tested channel capacities of 8, 64, and 256. The idea was to initially use trySend as a non-blocking send until the channel gets saturated, then fall back to runBlocking (see the sketch below), just to improve speed for small payloads. Using a higher capacity did show increased memory usage (8-chunk capacity ~800 MB, 64-chunk capacity ~1.6 GB, 256-chunk capacity ~2.5 GB).

A single-chunk capacity had the lowest memory usage (~600 MB) and showed similar performance (operations/s) in my hacky benchmarks, even for short bursts, so I'm not sure whether my theory or my testing is flawed. But if the goal is to lower memory usage, a single-chunk capacity makes sense.
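For reference, the hybrid send I benchmarked was along these lines (sketch only; the function name and the SendChannel parameter are illustrative, not the actual fork code):

import kotlinx.coroutines.channels.SendChannel
import kotlinx.coroutines.runBlocking

// Sketch of the hybrid send used for the capacity comparison above.
fun sendWithBackpressure(chunks: SendChannel<ByteArray>, bytes: ByteArray) {
    // Fast path: non-blocking while the bounded channel still has spare
    // capacity, so small payloads never pay the runBlocking cost.
    if (chunks.trySend(bytes).isSuccess) return

    // Slow path: the channel is saturated, so block NSURLSession's delegate
    // thread until the consumer frees a slot; this is the backpressure.
    runBlocking { chunks.send(bytes) }
}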

@joshuabrink
Author

joshuabrink commented Dec 10, 2025

Steven has created an aggressive repro for this, can you check whether the client fork fixes that? You can start the server, then run the iOS app in a simulator. At least in my attempts, memory usage was growing unbounded for about 40% of all app starts (with my approach, it looks like there may be a race condition in the underlying HTTP client implementation from the OS).

Using a channel capacity of 1:

  • Tested it on Steven's repro; we see stable memory usage of ~80 MB (ran it for about 3 minutes with no change)
  • Also tested the Swift SDK directly (using the reproduction of the reported issue) and we see roughly 700 MB of stable memory usage during the initial sync (it gets up to 900 MB intermittently but soon drops back down). Most of the additional memory seems to be PowerSync overhead:
 433.27 MB  74.2%  65420  kfun:com.powersync.sqlite.Statement#step(){}kotlin.Boolean
 433.27 MB  74.2%  65420   sqlite3_step
 433.27 MB  74.2%  65420    sqlite3VdbeExec
 427.04 MB  73.1%  50405     powersync_core::operations_vtab::update::h057c31bbad73a7d0
 427.04 MB  73.1%  50405      powersync_core::operations::insert_operation::h5aafae70b4795e73
 427.04 MB  73.1%  50405       powersync_core::sync::operations::insert_bucket_operations::h2edc58eeb588e78b
 393.53 MB  67.3%  29768        sqlite3_capi::capi::step::h009b6dd0ba7a12c7
 393.53 MB  67.3%  29768         sqlite3_step
 393.53 MB  67.3%  29768          sqlite3VdbeExec
 348.84 MB  59.7%  3721            btreeBeginTrans
 348.84 MB  59.7%  3721             getPageNormal
 348.84 MB  59.7%  3721              pcache1FetchStage2
 348.84 MB  59.7%  3721               sqlite3Malloc

@cahofmeyr
Contributor

Also tested the Swift SDK directly (using the reproduction of the reported issue) and we see roughly 700 MB of stable memory usage during the initial sync (it gets up to 900 MB intermittently but soon drops back down). Most of the additional memory seems to be PowerSync overhead:

@joshuabrink Do I understand correctly that this means 700-900 MB vs. multiple GB in the reproduction of the originally reported issue?

@joshuabrink
Author

@cahofmeyr Yes. The only caveat is performance. I've done a couple of comparisons of the operations/s of this PR vs. the original implementation vs. the csqlite implementation.

Taking into account that my benchmarks are not 100% accurate, this runBlocking PR is still orders of magnitude slower than both. I do have these benchmarks available if anyone wants to take a look.
