feat: introduce serial link recovery #123

Open
sansmoraxz wants to merge 26 commits into grid-x:master from sansmoraxz:feat/serial-recover

Conversation

@sansmoraxz
Contributor

@sansmoraxz sansmoraxz commented Dec 4, 2025

Attempt protection against shoddy wires in RTU connection.

Fixes: #122

@sansmoraxz

This comment was marked as resolved.

@sansmoraxz

This comment was marked as resolved.

@sansmoraxz sansmoraxz marked this pull request as ready for review December 7, 2025 05:37
Contributor

@alexjoedt alexjoedt left a comment

Hey, sorry for the late review! Really like the direction here.

I left a few comments, mostly around defaults and some small robustness things. Hope they're helpful, let me know what you think!

Comment thread serial.go
Comment thread rtuclient.go Outdated
Comment thread rtuclient.go Outdated
Comment thread rtuclient.go Outdated
Comment thread rtuclient.go
Comment thread rtuclient.go Outdated
Comment thread rtu_transport_test.go Outdated
Comment thread rtu_transport_test.go
@sansmoraxz
Contributor Author

sansmoraxz commented Mar 27, 2026

Also, this has the unexpected side effect of writing to currently open terminals.

modbus/Makefile

Lines 12 to 16 in 79617db

test:
	diagslave -m tcp -p 5020 & diagslave -m enc -p 5021 & go test -run TCP -v $(shell glide nv)
	socat -d -d pty,raw,echo=0 pty,raw,echo=0 & diagslave -m rtu /dev/pts/1 & go test -run RTU -v $(shell glide nv)
	socat -d -d pty,raw,echo=0 pty,raw,echo=0 & diagslave -m ascii /dev/pts/3 & go test -run ASCII -v $(shell glide nv)
	go test -v -count=1 github.com/grid-x/modbus/cmd/modbus-cli

IMO, instead of hardcoded pts terminals, it would perhaps be better to use a temp file that is trapped and deleted automatically once the tests have run.
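
One way this could look, as a rough sketch rather than anything in the PR: socat's `link=` option on a `pty` address creates a symlink to the allocated pty at a path of your choosing, so the tests would not need to guess `/dev/pts/N` numbers. The helper `socatArgs` and the file names `client` and `server` are hypothetical; the program only builds and prints the command line, it does not start socat.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// socatArgs builds the arguments for a socat pty pair whose ends are
// exposed as symlinks inside dir (hypothetical helper, not part of the PR).
func socatArgs(dir string) (args []string, clientPath, serverPath string) {
	clientPath = filepath.Join(dir, "client")
	serverPath = filepath.Join(dir, "server")
	args = []string{
		"-d", "-d",
		"pty,raw,echo=0,link=" + clientPath,
		"pty,raw,echo=0,link=" + serverPath,
	}
	return args, clientPath, serverPath
}

func main() {
	// A throwaway directory that is removed when the program exits normally.
	dir, err := os.MkdirTemp("", "modbus-pty-")
	if err != nil {
		fmt.Println("mkdir temp:", err)
		return
	}
	defer os.RemoveAll(dir)

	args, client, server := socatArgs(dir)
	fmt.Println("socat", strings.Join(args, " "))
	fmt.Println("test side:", client)
	fmt.Println("slave side:", server)
}
```

Inside a Go test, `t.TempDir()` would give the same cleanup without the explicit `defer`.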

@sansmoraxz sansmoraxz requested a review from alexjoedt April 2, 2026 09:32
Comment thread rtuclient.go Outdated
// Send the request
mb.logf("modbus: send % x\n", aduRequest)
if _, err = mb.port.Write(aduRequest); err != nil {
connDeadline := time.Now().Add(mb.Timeout)
Contributor

The connDeadline is captured once before the retry loop. After a successful reconnect() the remaining time on that deadline may be close to zero, causing readIncrementally (or readASCII) to time out immediately on the very next iteration and return context.DeadlineExceeded, which shouldRecover won't handle. This effectively makes link recovery useless under any real latency.

Please move connDeadline := time.Now().Add(mb.Timeout) inside the for loop (or reset it after a successful reconnect), consistent with how tcpclient.go refreshes the deadline via SetDeadline on each iteration. The same issue exists in both rtuclient.go and asciiclient.go.
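
To illustrate the suggestion, here is a minimal sketch with a fake clock. It is not the PR's code: `sendWithRetry`, `attemptFn`, `errLinkDown`, and the `maxAttempts` bound are all hypothetical stand-ins, and `errLinkDown` merely plays the role of an error that `shouldRecover` would accept. The point is only that the deadline is computed inside the loop, so time spent reconnecting does not eat into the next attempt's budget.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errLinkDown stands in for an error the real shouldRecover would accept.
var errLinkDown = errors.New("link down")

// attemptFn is a hypothetical single send/receive attempt bounded by deadline.
type attemptFn func(deadline time.Time) error

// sendWithRetry computes a fresh deadline on every iteration.
// maxAttempts is an assumed bound so the loop cannot run forever.
func sendWithRetry(timeout time.Duration, maxAttempts int, now func() time.Time,
	attempt attemptFn, reconnect func() error) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		deadline := now().Add(timeout) // refreshed per attempt
		if err = attempt(deadline); err == nil {
			return nil
		}
		if !errors.Is(err, errLinkDown) {
			return err // not recoverable
		}
		if rerr := reconnect(); rerr != nil {
			return rerr
		}
	}
	return err
}

// remainingOnSecondAttempt simulates a reconnect that takes 900ms of a 1s
// timeout and reports the budget the second attempt gets.
func remainingOnSecondAttempt() time.Duration {
	clock := time.Unix(0, 0)
	now := func() time.Time { return clock }
	var remaining time.Duration
	calls := 0
	attempt := func(deadline time.Time) error {
		calls++
		if calls == 1 {
			return errLinkDown
		}
		remaining = deadline.Sub(now())
		return nil
	}
	reconnect := func() error {
		clock = clock.Add(900 * time.Millisecond) // slow reconnect
		return nil
	}
	if err := sendWithRetry(time.Second, 3, now, attempt, reconnect); err != nil {
		return 0
	}
	return remaining
}

func main() {
	fmt.Println("budget on second attempt:", remainingOnSecondAttempt())
}
```

With the deadline captured once before the loop, the second attempt in this simulation would have only 100ms left instead of the full timeout.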

Contributor Author

Wait wouldn't this apply to linkRecoveryDeadline too?

Contributor Author

Or there's a risk of infinite block

Contributor Author

@sansmoraxz sansmoraxz Apr 8, 2026

Yeah deadlock seems likely with incremental linkRecoveryDeadline.

We should probably mention somewhere that timeout is per read/write attempt instead of across the entire cycle as I was previously assuming.

The Timeout comes from the serial package:

https://github.com/grid-x/serial/blob/ad4f461b8ed5860433b53666ddd6514cc83e0a95/serial.go#L30-L31

Contributor Author

@alexjoedt where should it be documented?

Contributor

On the field itself?
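
A field-level doc comment along these lines might be enough. This is only a sketch: `Config` here is a hypothetical stand-in rather than the actual struct in grid-x/serial, and `cycleBudget` is an illustrative helper that assumes every blocking read or write call gets its own `Timeout`.

```go
package main

import (
	"fmt"
	"time"
)

// Config is a hypothetical stand-in for the serial settings struct.
type Config struct {
	// Timeout bounds a single read or write attempt on the port.
	// It is not a budget for the whole request/response cycle: a cycle
	// that retries after link recovery can take several multiples of it.
	Timeout time.Duration
}

// cycleBudget returns the worst-case time spent blocked in I/O for a cycle
// that makes ioCalls read/write calls, ignoring time spent reconnecting.
func cycleBudget(c Config, ioCalls int) time.Duration {
	return time.Duration(ioCalls) * c.Timeout
}

func main() {
	c := Config{Timeout: 500 * time.Millisecond}
	fmt.Println("worst case for 4 I/O calls:", cycleBudget(c, 4))
}
```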

Comment thread serial.go Outdated
Comment thread asciiclient.go Outdated
@alexjoedt alexjoedt added the enhancement New feature or request label Apr 17, 2026
Comment thread ascii_transport_test.go Outdated
Comment thread serial.go
Comment thread serial.go
Comment on lines +104 to +112
mb.logf("modbus: connection reset, reconnecting")
if cerr := mb.close(); cerr != nil {
    mb.logf("modbus: error closing connection: %v", cerr)
    return cerr
}
if cerr := mb.connect(ctx); cerr != nil {
    mb.logf("modbus: error reconnecting: %v", cerr)
    return cerr
}
Contributor

When a USB-serial adapter is physically unplugged mid-transfer, port.Close() can
itself return an error (e.g. ENODEV on Linux). The current code:

if cerr := mb.close(); cerr != nil {
    mb.logf("modbus: error closing connection: %v", cerr)
    return cerr   // <-- bails here
}

…returns immediately without calling connect(). Note however that close() does
set mb.port = nil even on error, so a subsequent connect() call would attempt to
reopen the device — exactly what recovery is supposed to do.

Returning the close error defeats the whole purpose of link recovery in the most common
real-world scenario (hot-unplug of USB-serial dongles).

What do you think about logging the close error and continuing on to connect?

if cerr := mb.close(); cerr != nil {
    mb.logf("modbus: error closing connection: %v", cerr)
    // mb.port is nil after close(); still attempt to reopen.
}
if cerr := mb.connect(ctx); cerr != nil {
    mb.logf("modbus: error reconnecting: %v", cerr)
    return cerr
}

Contributor Author

@sansmoraxz sansmoraxz Apr 17, 2026

Returning the close error defeats the whole purpose of link recovery in the most common
real-world scenario (hot-unplug of USB-serial dongles).

Good point. Although I am not familiar with any program that does this. Most often they treat it as final error state.

I am a bit confused here, let's say close failed.

if cerr := mb.close(); cerr != nil {
    mb.logf("modbus: error closing connection: %v", cerr)
    // mb.port is nil after close(); still attempt to reopen.
}
if cerr := mb.connect(ctx); cerr != nil { // <- may fail due to timing
    mb.logf("modbus: error reconnecting: %v", cerr)
    return cerr
}

we keep spinning?

Contributor Author

OK, keeping the spin for both close and connect failures.
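
For reference, a sketch of what "keep spinning" could look like when bounded by an overall recovery deadline. None of this is taken from the PR: `recoverLink`, its parameters, and the fixed backoff are hypothetical, and the clock and sleep are injected so the behaviour can be simulated without a real port. Close errors are logged and ignored; connect errors are retried until the deadline would be exceeded.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// recoverLink retries close+connect until connect succeeds or the next
// retry would start at or after the overall deadline (hypothetical helper).
func recoverLink(deadline time.Time, backoff time.Duration,
	now func() time.Time, sleep func(time.Duration),
	closePort, connect func() error,
	logf func(string, ...interface{})) error {
	for {
		if err := closePort(); err != nil {
			// Logged but not fatal: the device may simply be gone.
			logf("modbus: error closing connection: %v", err)
		}
		err := connect()
		if err == nil {
			return nil
		}
		logf("modbus: error reconnecting: %v", err)
		if !now().Add(backoff).Before(deadline) {
			return fmt.Errorf("link recovery gave up: %w", err)
		}
		sleep(backoff)
	}
}

// simulate runs recoverLink against a fake clock with a 1s deadline and a
// 250ms backoff. connect succeeds on attempt succeedOn (0 means never).
func simulate(succeedOn int) (attempts int, err error) {
	clock := time.Unix(0, 0)
	now := func() time.Time { return clock }
	sleep := func(d time.Duration) { clock = clock.Add(d) }
	closePort := func() error { return errors.New("no such device") }
	connect := func() error {
		attempts++
		if attempts == succeedOn {
			return nil
		}
		return errors.New("device not ready")
	}
	logf := func(string, ...interface{}) {}
	err = recoverLink(clock.Add(time.Second), 250*time.Millisecond,
		now, sleep, closePort, connect, logf)
	return attempts, err
}

func main() {
	n, err := simulate(3)
	fmt.Println("replugged in time:", n, err)
	n, err = simulate(0)
	fmt.Println("never replugged:", n, err)
}
```

The deadline check before sleeping is what keeps the spin from turning into the infinite block discussed earlier in the thread.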

Comment thread rtuclient.go Outdated
Comment thread asciiclient.go Outdated
Comment thread rtu_transport_test.go Outdated
Comment thread ascii_transport_test.go
Comment thread serial.go
@sansmoraxz sansmoraxz requested a review from alexjoedt April 17, 2026 12:55

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feature: Auto recovery for serial connections

2 participants