Skip to content

IoT Edge hub loses connections/direct methods after error while re authenticating devices #7487

@BGbakker

Description

@BGbakker

Describe the bug
Periodically the edge hub runs a task to re authenticate the connected clients. While doing this for a device an error occurs (seems to be due to a message being send). The re authentication continues and finishes for this device, but an other device has lost it's connection (or direct method registrations) until the next re authentication (or forever, it doesn't always reconnect). This device was not part of the devices that were re authenticated, it can be multiple devices that lose the connection.
Trying to invoke a direct method of that device results in a '404' or '404103'. Invoking direct methods on the same edge but a different module is still working.
The missing connection is unnoticed by the IoT hub.

To Reproduce
We have been unable to reproduce this issue at will. However this is not an isolated incident. We did notice the following

  • Issue does not occur on our test systems (same setup, lower system load).
  • Issue returns after restart of the edge hub (hours/days later).
  • Issue does not return (for a longer period of time) after deleting and recreating the container.

Expected behavior
Connections of device A and B do not influence each other.

Screenshots
Image
Image shows the keep alive request we do to the downstream devices, just for illustrative purposes. Note that the time in the screenshot is UTC+1 and the logs are UTC

Logs

<6> 2025-11-13 08:43:47.918 +00:00 [INF] - Entering periodic task to reauthenticate connected clients
<6> 2025-11-13 08:43:51.035 +00:00 [INF] - Error authenticating token for Device_xxx1 because the token is expired or could not be parsed
<6> 2025-11-13 08:43:51.035 +00:00 [INF] - Credentials for client Device_xxx1 are not valid.
<4> 2025-11-13 08:43:51.035 +00:00 [WRN] - Reauthenticating client Device_xxx1 failed, removing client connection
<6> 2025-11-13 08:43:51.035 +00:00 [INF] - Unable to re-authenticate Device_xxx1, dropping client connection.
<6> 2025-11-13 08:43:51.035 +00:00 [INF] - Removing device connection for device Device_xxx1 with removeCloudConnection flag 'True'.
<4> 2025-11-13 08:43:51.036 +00:00 [WRN] - "Closing connection for device: Device_xxx1, Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnectionException: Connection closed for device Device_xxx1., "
<6> 2025-11-13 08:43:51.036 +00:00 [INF] - Disposing MessagingServiceClient for device Id Device_xxx1 because of exception - Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnectionException: Connection closed for device Device_xxx1.
<6> 2025-11-13 08:43:51.036 +00:00 [INF] - Closing device proxy for device Id Device_xxx1
<6> 2025-11-13 08:43:51.041 +00:00 [INF] - Closing receiver in cloud proxy 2fd82568-4156-44db-86a9-dba8369a7c77 for Device_xxx1
<6> 2025-11-13 08:43:51.039 +00:00 [INF] - Retrying cloud proxy operation SendMessageBatchAsync for Device_xxx1.
Microsoft.Azure.Devices.Client.Exceptions.IotHubCommunicationException: Amqp resource is disconnected.
at Microsoft.Azure.Devices.Client.Transport.AmqpIoT.AmqpIoTSendingLink.SendAmqpMessageAsync(AmqpMessage amqpMessage, TimeSpan timeout)
at Microsoft.Azure.Devices.Client.Transport.AmqpIoT.AmqpIoTSendingLink.SendMessagesAsync(IEnumerable1 messages, TimeSpan timeout) at Microsoft.Azure.Devices.Client.Transport.AmqpIoT.AmqpUnit.SendMessagesAsync(IEnumerable1 messages, TimeSpan timeout)
at Microsoft.Azure.Devices.Client.Transport.AmqpIoT.AmqpUnit.SendEventsAsync(IEnumerable1 messages, TimeSpan timeout) at Microsoft.Azure.Devices.Client.Transport.Amqp.AmqpTransportHandler.SendEventAsync(IEnumerable1 messages, CancellationToken cancellationToken)
at Microsoft.Azure.Devices.Client.Transport.ErrorDelegatingHandler.<>c__DisplayClass27_0.<b__0>d.MoveNext()
--- End of stack trace from previous location ---
at Microsoft.Azure.Devices.Client.Transport.ErrorDelegatingHandler.ExecuteWithErrorHandlingAsync[T](Func1 asyncOperation) at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<>c__DisplayClass22_0.<<SendEventAsync>b__0>d.MoveNext() --- End of stack trace from previous location --- at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.SendEventAsync(IEnumerable1 messages, CancellationToken cancellationToken)
at Microsoft.Azure.Devices.Client.InternalClient.SendEventBatchAsync(IEnumerable1 messages) at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient.<>c__DisplayClass30_0.<<InvokeFunc>b__0>d.MoveNext() in /mnt/vss/_work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/ConnectivityAwareClient.cs:line 196 --- End of stack trace from previous location --- at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient.InvokeFunc[T](Func1 func, String operation, Boolean useForConnectivityCheck) in /mnt/vss/_work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/ConnectivityAwareClient.cs:line 189
at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient.InvokeFunc[T](Func1 func, String operation, Boolean useForConnectivityCheck) in /mnt/vss/_work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/ConnectivityAwareClient.cs:line 189 at Microsoft.Azure.Devices.Edge.Util.TaskEx.TimeoutAfter(Task task, TimeSpan timeout, Action action) in /mnt/vss/_work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/TaskEx.cs:line 160 at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudProxy.SendMessageBatchAsync(IEnumerable1 inputMessages) in /mnt/vss/_work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudProxy.cs:line 207
at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudProxy.SendMessageBatchAsync(IEnumerable1 inputMessages) in /mnt/vss/_work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudProxy.cs:line 207 at Microsoft.Azure.Devices.Edge.Hub.Core.Cloud.RetryingCloudProxy.<>c__DisplayClass24_0.<<ExecuteOperation>b__0>d.MoveNext() in /mnt/vss/_work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.Core/cloud/RetryingCloudProxy.cs:line 65 --- End of stack trace from previous location --- at Microsoft.Azure.Devices.Edge.Hub.Core.Cloud.RetryingCloudProxy.ExecuteOperation[T](Func2 func, String operation) in /mnt/vss/_work/1/s/edge-hub/core/src/Microsoft.Azure.Devices.Edge.Hub.Core/cloud/RetryingCloudProxy.cs:line 76
<6> 2025-11-13 08:43:51.045 +00:00 [INF] - Client DataSetWriterV2_uatcac1986a0c292074adb3b77cea27102326e65919 in device scope authenticated locally.
<6> 2025-11-13 08:43:51.045 +00:00 [INF] - Cloud message receiver stopped for Device_xxx1 in cloud proxy 2fd82568-4156-44db-86a9-dba8369a7c77
<6> 2025-11-13 08:43:51.045 +00:00 [INF] - Closed cloud proxy 2fd82568-4156-44db-86a9-dba8369a7c77 for Device_xxx1
<6> 2025-11-13 08:43:51.045 +00:00 [INF] - Device connection removed for device Device_xxx1
<6> 2025-11-13 08:43:51.051 +00:00 [INF] - Attempting to connect to IoT Hub for client Device_xxx1 via AMQP...
<6> 2025-11-13 08:43:51.084 +00:00 [INF] - Cloud connection for Device_xxx1 is True
<6> 2025-11-13 08:43:51.084 +00:00 [INF] - Connection status for Device_xxx1 changed to ConnectionEstablished
<6> 2025-11-13 08:43:51.084 +00:00 [INF] - Client Device_xxx1 connected to cloud, processing existing subscriptions.
<6> 2025-11-13 08:43:51.084 +00:00 [INF] - Created cloud proxy for client Device_xxx1 via AMQP, with client operation timeout 20 seconds.
<6> 2025-11-13 08:43:51.084 +00:00 [INF] - Initialized cloud proxy 811aee47-32b0-4add-80c6-a6237ac5b9a6 for Device_xxx1
<6> 2025-11-13 08:43:51.084 +00:00 [INF] - Created cloud connection for client Device_xxx1

<6> 2025-11-13 08:43:51.241 +00:00 [INF] - Client Device_xxx1 in device scope authenticated locally.
<6> 2025-11-13 08:43:51.242 +00:00 [INF] - Successfully generated identity for clientId Device_xxx1 and username Edge_Device_1/Device_xxx1/?api-version=2020-09-30&DeviceClientType=.NET%2F1.41.3%20%28.NET%206.0.16%3B%20Linux%204.15.0-213-generic%20%23224-Ubuntu%20SMP%20Mon%20Jun%2019%2013%3A30%3A12%20UTC%202023%3B%20X64%29%20Module
<6> 2025-11-13 08:43:51.243 +00:00 [INF] - "ClientAuthenticated, Device_xxx1, 6df4e298"
<6> 2025-11-13 08:43:51.244 +00:00 [INF] - New device connection for device Device_xxx1
<6> 2025-11-13 08:43:51.244 +00:00 [INF] - Client Device_xxx1 connected to edgeHub, processing existing subscriptions.
<6> 2025-11-13 08:43:51.247 +00:00 [INF] - Bind device proxy for device Device_xxx1
<6> 2025-11-13 08:43:51.247 +00:00 [INF] - Binding message channel for device Id Device_xxx1
.
.
.
<6> 2025-11-13 09:43:47.917 +00:00 [INF] - Entering periodic task to reauthenticate connected clients
<6> 2025-11-13 09:43:47.929 +00:00 [INF] - Error authenticating token for Device_xxx2 because the token is expired or could not be parsed
<6> 2025-11-13 09:43:47.929 +00:00 [INF] - Credentials for client Device_xxx2 are not valid.
<4> 2025-11-13 09:43:47.930 +00:00 [WRN] - Reauthenticating client Device_xxx2 failed, removing client connection
<6> 2025-11-13 09:43:47.930 +00:00 [INF] - Unable to re-authenticate Device_xxx2, dropping client connection.
<6> 2025-11-13 09:43:47.930 +00:00 [INF] - Removing device connection for device Device_xxx2 with removeCloudConnection flag 'True'.
<4> 2025-11-13 09:43:47.930 +00:00 [WRN] - "Closing connection for device: Device_xxx2, Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnectionException: Connection closed for device Device_xxx2., "
<6> 2025-11-13 09:43:47.930 +00:00 [INF] - Disposing MessagingServiceClient for device Id Device_xxx2 because of exception - Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnectionException: Connection closed for device Device_xxx2.
<6> 2025-11-13 09:43:47.930 +00:00 [INF] - Closing device proxy for device Id Device_xxx2
<6> 2025-11-13 09:43:47.946 +00:00 [INF] - Closing receiver in cloud proxy 3b695435-47f5-46bf-9cf9-c2970b4f5901 for Device_xxx2
<6> 2025-11-13 09:43:47.950 +00:00 [INF] - Cloud message receiver stopped for Device_xxx2 in cloud proxy 3b695435-47f5-46bf-9cf9-c2970b4f5901
<6> 2025-11-13 09:43:47.950 +00:00 [INF] - Closed cloud proxy 3b695435-47f5-46bf-9cf9-c2970b4f5901 for Device_xxx2
<6> 2025-11-13 09:43:47.950 +00:00 [INF] - Device connection removed for device Device_xxx2
<6> 2025-11-13 09:43:48.747 +00:00 [INF] - Client Device_xxx2 in device scope authenticated locally.
<6> 2025-11-13 09:43:48.747 +00:00 [INF] - Successfully generated identity for clientId Device_xxx2 and username Edge_Device_1/Device_xxx2/?api-version=2020-09-30&DeviceClientType=.NET%2F1.41.3%20%28.NET%206.0.16%3B%20Linux%204.15.0-213-generic%20%23224-Ubuntu%20SMP%20Mon%20Jun%2019%2013%3A30%3A12%20UTC%202023%3B%20X64%29%20OpcTwin_2.8.6.2
<6> 2025-11-13 09:43:48.747 +00:00 [INF] - "ClientAuthenticated, Device_xxx2, 6f97095b"
<6> 2025-11-13 09:43:48.747 +00:00 [INF] - New device connection for device Device_xxx2
<6> 2025-11-13 09:43:48.747 +00:00 [INF] - Client Device_xxx2 connected to edgeHub, processing existing subscriptions.
<6> 2025-11-13 09:43:48.747 +00:00 [INF] - Bind device proxy for device Device_xxx2
<6> 2025-11-13 09:43:48.747 +00:00 [INF] - Binding message channel for device Id Device_xxx2
<6> 2025-11-13 09:43:48.748 +00:00 [INF] - Attempting to connect to IoT Hub for client Device_xxx2 via AMQP...
<6> 2025-11-13 09:43:48.748 +00:00 [INF] - Processing pending subscriptions for Device_xxx2
<6> 2025-11-13 09:43:48.755 +00:00 [INF] - Processing subscriptions TwinResponse, Methods, DesiredPropertyUpdates, C2D for client Device_xxx2.
<6> 2025-11-13 09:43:48.755 +00:00 [INF] - Set subscriptions from session state for Device_xxx2
<6> 2025-11-13 09:43:48.886 +00:00 [INF] - Cloud connection for Device_xxx2 is True
<6> 2025-11-13 09:43:48.886 +00:00 [INF] - Connection status for Device_xxx2 changed to ConnectionEstablished
<6> 2025-11-13 09:43:48.886 +00:00 [INF] - Client Device_xxx2 connected to cloud, processing existing subscriptions.
<6> 2025-11-13 09:43:48.886 +00:00 [INF] - Skipping Device_xxx2 for subscription processing, as it is currently being processed.
<6> 2025-11-13 09:43:48.886 +00:00 [INF] - Created cloud proxy for client Device_xxx2 via AMQP, with client operation timeout 20 seconds.
<6> 2025-11-13 09:43:48.886 +00:00 [INF] - Initialized cloud proxy 2a8a247b-fce0-479a-8e9d-b846403783c0 for Device_xxx2
<6> 2025-11-13 09:43:48.886 +00:00 [INF] - Created cloud connection for client Device_xxx2
<6> 2025-11-13 09:43:48.887 +00:00 [INF] - Start listening for C2D messages for Device_xxx2

System info

  • Ubuntu 22.04
  • IoT edge deamon 1.4 & 1.5
  • Edge hub version 1.4.18 & 1.4.43

Additional context
Issue is noticed over multiple edge devices used in multiple solutions.
Logs have edited to not display the actual device names and lines for other devices have been removed to make it a bit more compact.

Let us know if you have any questions.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions