Description
Maybe this is more a RHEL/dracut bug, but I'm not really sure how to find the root cause of this problem.
Using CentOS 7.6 (3.10.0-957.1.3.el7.x86_64, nbd 3.14) and booting with an nbd root filesystem and the nbd client options -p -t10, the nbd-client fails to reconnect after a network hiccup. If the nbd-client is restarted at any point after boot (i.e. nbd-client -d /dev/nbd0 && nbd-client ... -p -t10 /dev/nbd0), the newly started nbd-client recovers just fine from network failures.
By adding -nofork and redirecting the nbd-client stderr output, I was able to capture the following output of an nbd-client started from the initramfs:
CentOS Linux 7 (Core)
Kernel 3.10.0-957.1.3.el7.x86_64 on an x86_64
localhost login: [ 58.382947] fuse init (API version 7.22)
[ 151.366971] block nbd0: Receive control failed (result -104)
[ 151.370055] block nbd0: shutting down socket
[ 151.372247] block nbd0: queue cleared
[ 151.373853] nbd,3371: Kernel call returned: 104 Reconnecting
[ 151.395901] Error: Socket failed: Connection refused
[ 151.395901] Exiting.
[ 151.949925] e1000: ens33 NIC Link is Down
[ 152.397085] Reconnecting
[ 157.989809] e1000: ens33 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 157.992670] IPv6: ADDRCONF(NETDEV_CHANGE): ens33: link becomes ready
[ 158.007023] block nbd0: Attempted send on closed socket
[ 158.008705] blk_update_request: I/O error, dev nbd0, sector 18518264
[ 158.010371] XFS (nbd0): metadata I/O error: block 0x11a90f8 ("xfs_trans_read_buf_map") error 5 numblks 8
[ 158.012997] block nbd0: Attempted send on closed socket
[ 158.014379] blk_update_request: I/O error, dev nbd0, sector 55372432
[ 158.016040] XFS (nbd0): metadata I/O error: block 0x34cea90 ("xfs_trans_read_buf_map") error 5 numblks 8
[ 158.041987] block nbd0: Attempted send on closed socket
[ 158.043606] blk_update_request: I/O error, dev nbd0, sector 18518264
[ 158.045309] XFS (nbd0): metadata I/O error: block 0x11a90f8 ("xfs_trans_read_buf_map") error 5 numblks 8
[ 158.048266] block nbd0: Attempted send on closed socket
[ 158.049547] blk_update_request: I/O error, dev nbd0, sector 55372432
[ 158.051324] XFS (nbd0): metadata I/O error: block 0x34cea90 ("xfs_trans_read_buf_map") error 5 numblks 8
[ 158.118247] block nbd0: Attempted send on closed socket
[ 158.119665] blk_update_request: I/O error, dev nbd0, sector 197568
[ 158.121388] XFS (nbd0): metadata I/O error: block 0x303c0 ("xfs_trans_read_buf_map") error 5 numblks 32
[ 158.123793] XFS (nbd0): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
[ 158.125818] block nbd0: Attempted send on closed socket
[ 158.127059] blk_update_request: I/O error, dev nbd0, sector 55372432
[ 158.133341] XFS (nbd0): metadata I/O error: block 0x34cea90 ("xfs_trans_read_buf_map") error 5 numblks 8
[ 158.133786] block nbd0: Attempted send on closed socket
[ 158.133788] blk_update_request: I/O error, dev nbd0, sector 55694096
[ 158.133793] block nbd0: Attempted send on closed socket
[ 158.133793] blk_update_request: I/O error, dev nbd0, sector 55694096
[ 158.133821] block nbd0: Attempted send on closed socket
[ 158.133821] blk_update_request: I/O error, dev nbd0, sector 55694096
[ 159.419970] Error: Socket failed: Connection refused
[ 159.419970] Exiting.
[ 160.422105] Reconnecting
[ 160.424333] Error: Socket failed: Connection refused
[ 160.424333] Exiting.
[ 161.428520] Reconnecting
[ 161.431024] Error: Socket failed: Connection refused
[ 161.431024] Exiting.
[ 162.433949] Reconnecting
[ 162.436331] Error: Socket failed: Connection refused
[ 162.436331] Exiting.
[ 163.438283] Reconnecting
[ 163.441123] Error: Socket failed: Connection refused
[ 163.441123] Exiting.
[ 164.443784] Reconnecting
[ 164.446080] Error: Socket failed: Connection refused
[ 164.446080] Exiting.
[ 165.447956] Reconnecting
[ 165.450821] Error: Socket failed: Connection refused
[ 165.450821] Exiting.
[ 166.454441] Reconnecting
[ 166.457184] Error: Socket failed: Connection refused
[ 166.457184] Exiting.
[ 167.460289] Reconnecting
[ 167.462858] Error: Socket failed: Connection refused
[ 167.462858] Exiting.
[ 168.464403] Reconnecting
[ 168.466996] Error: Socket failed: Connection refused
[ 168.466996] Exiting.
[ 169.469852] Reconnecting
[ 169.472117] Error: Socket failed: Connection refused
[ 169.472117] Exiting.
[ 170.479696] Reconnecting
[ 170.482395] Error: Socket failed: Connection refused
[ 170.482395] Exiting.
[ 171.492696] Reconnecting
[ 171.494984] Error: Socket failed: Connection refused
[ 171.494984] Exiting.
[ 172.505718] Reconnecting
[ 172.508425] Error: Socket failed: Connection refused
[ 172.508425] Exiting.
[ 173.518706] Reconnecting
[ 173.521055] Error: Socket failed: Connection refused
[ 173.521055] Exiting.
[ 174.531709] Reconnecting
[ 174.534078] Error: Socket failed: Connection refused
[ 174.534078] Exiting.
[ 175.544887] Reconnecting
[ 175.547474] Error: Socket failed: Connection refused
[ 175.547474] Exiting.
[ 176.557703] Reconnecting
[ 176.559988] Error: Socket failed: Connection refused
[ 176.559988] Exiting.
[ 177.570708] Reconnecting
[ 177.574317] Error: Socket failed: Connection refused
[ 177.574317] Exiting.
[ 178.583621] Reconnecting
[ 178.587565] Error: Socket failed: Connection refused
[ 178.587565] Exiting.
[ 179.596929] Reconnecting
[ 179.600292] Error: Socket failed: Connection refused
[ 179.600292] Exiting.
[ 180.609744] Reconnecting
[ 180.613501] Error: Socket failed: Connection refused
[ 180.613501] Exiting.
[ 181.622742] Reconnecting
[ 181.624653] Error: Socket failed: Connection refused
[ 181.624653] Exiting.
[ 182.635739] Reconnecting
[ 182.639482] Error: Socket failed: Connection refused
[ 182.639482] Exiting.
[ 183.648633] Reconnecting
[ 183.650941] Error: Socket failed: Connection refused
[ 183.650941] Exiting.
[ 184.661574] Reconnecting
[ 185.674957] Error: Socket failed: Connection refused
[ 185.674957] Exiting
[ 186.095634] e1000: ens33 NIC Link is Down
[ 186.687704] Reconnecting
[ 190.117042] e1000: ens33 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 193.714773] Error: Cannot open NBD: No such file or directory
[ 193.714773] Exiting.
So it fails in https://github.com/NetworkBlockDevice/nbd/blob/master/nbd-client.c#L1317. I also tried to reproduce this with newer kernels; there, however, the nbd-client was always stopped once the initramfs finished executing, so the situation was actually much worse.
I think we'll try to use iSCSI for our root devices now (especially as the nbd behaviour is not particularly nice when a connection drops: we get lots of I/O errors until the connection is restored, which means most running programs will crash), but I still wanted to file this report as it may help others.
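Since the console capture above is long and repetitive, a small script can condense it into the number of reconnect attempts and the distinct client errors. The following is only a minimal sketch, not part of nbd: the function name `summarize` and the assumption that every relevant line has the form `[ <seconds>] <message>` are mine, and the sample is a short excerpt of the log above.

```python
import re
from collections import Counter

# Matches console lines such as "[ 151.395901] Error: ...".
# Assumption: every line of interest carries this kernel-style timestamp prefix.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def summarize(log_text):
    """Count reconnect attempts and client errors in a console capture."""
    reconnects = 0
    errors = Counter()
    last_error = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip lines without a timestamp (banner, login prompt)
        timestamp, message = float(match.group(1)), match.group(2)
        if message.endswith("Reconnecting"):
            reconnects += 1
        elif message.startswith("Error: "):
            text = message[len("Error: "):]
            errors[text] += 1
            last_error = (timestamp, text)
    return {
        "reconnects": reconnects,
        "errors": dict(errors),
        "last_error": last_error,
    }


# Short excerpt of the capture above, used only to exercise the function.
sample = """\
[ 151.373853] nbd,3371: Kernel call returned: 104 Reconnecting
[ 151.395901] Error: Socket failed: Connection refused
[ 151.395901] Exiting.
[ 152.397085] Reconnecting
[ 193.714773] Error: Cannot open NBD: No such file or directory
[ 193.714773] Exiting.
"""

summary = summarize(sample)
print(summary)
```

Run against the full capture, the `last_error` entry is the interesting one: it shows the point at which the error changes from "Connection refused" to "Cannot open NBD".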