-
Notifications
You must be signed in to change notification settings - Fork 80
handling agent memory leak #166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
ojarjur
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix!
There are some formatting issues that need to be addressed, but also please update the PR description to give background on what the issue is (connections leaking of the close message gets dropped) and how this PR addresses it.
Thanks!
The ba-proxy-agent currently experiences increasing memory consumption,
leading to daily or weekly restarts. Previous mitigation attempts were
insufficient, and the issue has been isolated to high memory usage on
the agent connection side.
This CL mitigates the issue by enforcing a timeout on idle connections.
Since the root cause remains elusive, forcefully closing unused
connections prevents memory accumulation from persistent links.
Implementation details:
* Introduced `lastActivityTime` property to agent connections, which
updates upon usage.
* Added a background routine to monitor connection activity.
* Configured the routine to explicitly close connections that remain
idle for more than 30 seconds.
| // The server messages channel has been closed. | ||
| return nil, fmt.Errorf("attempt to read a server message from a closed websocket connection") | ||
| } | ||
| conn.updateActivity() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also needs to be called at the beginning of this method (i.e. on line 258).
Otherwise, a connection that the client is continuously polling on, but where the server has not yet responded, will be closed as inactive.
| return msgs, nil | ||
| } | ||
| } | ||
| case <-time.After(time.Second * 20): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timeout here needs to be the minimum of 20 seconds and some amount less than the configured idle timeout (e.g. half of the idle timeout).
That way, a polling request from an actively connected client will timeout and be re-run before the activity tracking logic decides that the connection is idle.
No description provided.