feat: introduce serial link recovery #123
alexjoedt
left a comment
Hey, sorry for the late review! Really like the direction here.
I left a few comments, mostly around defaults and some small robustness things. Hope they're helpful, let me know what you think!
Also, this has the surprising side effect of writing to currently open terminals (lines 12 to 16 in 79617db). IMO, instead of pts terminals, it would be better to use a temp file that is trapped and deleted automatically once the tests have run.
// Send the request
mb.logf("modbus: send % x\n", aduRequest)
if _, err = mb.port.Write(aduRequest); err != nil {
connDeadline := time.Now().Add(mb.Timeout)
The connDeadline is captured once before the retry loop. After a successful reconnect() the remaining time on that deadline may be close to zero, causing readIncrementally (or readASCII) to time out immediately on the very next iteration and return context.DeadlineExceeded, which shouldRecover won't handle. This effectively makes link recovery useless under any real latency.
Please move connDeadline := time.Now().Add(mb.Timeout) inside the for loop (or reset it after a successful reconnect), consistent with how tcpclient.go refreshes the deadline via SetDeadline on each iteration. The same issue exists in both rtuclient.go and asciiclient.go.
Wait, wouldn't this apply to linkRecoveryDeadline too?
Or there's a risk of an infinite block.
Yeah, a deadlock seems likely if linkRecoveryDeadline is also refreshed on every iteration.
We should probably mention somewhere that timeout is per read/write attempt instead of across the entire cycle as I was previously assuming.
The Timeout comes from the serial package:
https://github.com/grid-x/serial/blob/ad4f461b8ed5860433b53666ddd6514cc83e0a95/serial.go#L30-L31
@alexjoedt where should it be documented?
mb.logf("modbus: connection reset, reconnecting")
if cerr := mb.close(); cerr != nil {
	mb.logf("modbus: error closing connection: %v", cerr)
	return cerr
}
if cerr := mb.connect(ctx); cerr != nil {
	mb.logf("modbus: error reconnecting: %v", cerr)
	return cerr
}
When a USB-serial adapter is physically unplugged mid-transfer, port.Close() can
itself return an error (e.g. ENODEV on Linux). The current code:
if cerr := mb.close(); cerr != nil {
mb.logf("modbus: error closing connection: %v", cerr)
return cerr // <-- bails here
}
…returns immediately without calling connect(). Note, however, that close() does
set mb.port = nil even on error, so a subsequent connect() call would attempt to
reopen the device — exactly what recovery is supposed to do.
Returning the close error defeats the whole purpose of link recovery in the most common
real-world scenario (hot-unplug of USB-serial dongles).
What do you think about logging the close error and continuing on to connect?
if cerr := mb.close(); cerr != nil {
mb.logf("modbus: error closing connection: %v", cerr)
// mb.port is nil after close(); still attempt to reopen.
}
if cerr := mb.connect(ctx); cerr != nil {
mb.logf("modbus: error reconnecting: %v", cerr)
return cerr
}
Returning the close error defeats the whole purpose of link recovery in the most common
real-world scenario (hot-unplug of USB-serial dongles).
Good point, although I am not familiar with any program that does this. Most of them treat it as a final error state.
I am a bit confused here, though. Let's say close failed:
if cerr := mb.close(); cerr != nil {
mb.logf("modbus: error closing connection: %v", cerr)
// mb.port is nil after close(); still attempt to reopen.
}
if cerr := mb.connect(ctx); cerr != nil { // <- may fail due to timing
mb.logf("modbus: error reconnecting: %v", cerr)
return cerr
}
Do we keep spinning?
OK, we keep spinning in both cases, whether close or connect fails.
Attempt protection against shoddy wires in RTU connection.
Fixes: #122