@prudo1 (Collaborator) commented Dec 4, 2024

Hi everybody,

This PR started out as a cleanup of ssh dumps using curl, but along the way it picked up a few other cleanups to support the kdumpctl test changes made by @liutgnu in PR #53. The rationale behind using curl is that it has a much cleaner UI than the home-grown mix of ssh and scp. In addition, it supports a wide range of other protocols, which could be leveraged to extend support to dumping via e.g. http in the future.

I can imagine that the switch to curl could be controversial, so I'm posting this as an RFC for now.

Checking for the Opalcore currently requires duplicate code. Simplify it
by always checking if an Opalcore exists at the beginning of the script.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
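To illustrate the idea, here is a minimal sketch of deciding on the core file once at the start of the script. It assumes the OPAL MPIPL core is exposed at /sys/firmware/opal/mpipl/core and that later dump steps read from a single $CORE variable; the helper name select_core and the $CORE variable are hypothetical and not necessarily what the script uses.

```shell
# Assumed location of the OPAL MPIPL core; overridable for testing.
OPALCORE="${OPALCORE:-/sys/firmware/opal/mpipl/core}"

# Hypothetical helper: decide once, at the start of the script, which
# core file every later dump step should read.
select_core() {
    if [ -f "$OPALCORE" ]; then
        CORE="$OPALCORE"
    else
        CORE=/proc/vmcore
    fi
}

select_core
```

With the decision made up front, the individual dump functions only need to read "$CORE" instead of repeating the Opalcore check.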
Move the ssh host to a global variable to make it available for new
functions introduced in following commits.

While at it, rename $SSH_KEY_LOCATION to $SSH_KEY to shorten the name and
be consistent with the naming scheme of the other variables.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
In dump_ssh the _ssh_opts need to be word split, which triggers
ShellCheck's SC2086 warning. This warning is currently disabled for all
calls to ssh/scp. However, the ShellCheck wiki [1] suggests that the
solution for POSIX-compatible code is to make the call in a separate
function. This also makes disabling SC2029 unnecessary.

One big benefit of this solution is that calls to ssh/scp can now also
be made outside of dump_ssh without redefining _ssh_opts.

One small downside is that the options passed to ssh/scp must now be
maintained in each new function separately. Compared to the benefit
described above this is a small price to pay.

Note: currently the ssh/scp option -q (quiet mode) is used inconsistently.
With this commit it will always be set.

[1] https://www.shellcheck.net/wiki/SC2086

Signed-off-by: Philipp Rudo <prudo@redhat.com>
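A minimal sketch of the separate-function approach, assuming the $SSH_HOST and $SSH_KEY globals from the previous commit hold the target and key path (the default values below are placeholders). The `-o StrictHostKeyChecking=yes` and `"$SSH_HOST" "$@"` parts mirror the review excerpt of this PR; `-i` and `-o BatchMode=yes` are illustrative assumptions. Because every option is spelled out literally inside the function, no unquoted $_ssh_opts has to be word split at the call site.

```shell
# Placeholder globals; in kdump they are set while parsing kdump.conf.
SSH_HOST="${SSH_HOST:-root@192.0.2.1}"
SSH_KEY="${SSH_KEY:-/root/.ssh/kdump_id_rsa}"

# Wrapper around ssh: all options are written out literally, so the
# caller's arguments are passed through unchanged via "$@".
_ssh() {
    ssh -q \
        -i "$SSH_KEY" \
        -o BatchMode=yes \
        -o StrictHostKeyChecking=yes \
        "$SSH_HOST" "$@"
}

# Usage from any function, not only dump_ssh:
# _ssh mkdir -p /var/crash/example
```

The price mentioned above is visible here: a matching wrapper for scp would have to repeat these options.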
The mount point for file system dumps is currently passed as an argument
to dump_fs. But there are only two possible values for the mount point.
It's either $NEWROOT for failure_action dump_to_rootfs or whatever the
user provided in kdump.conf. Thus, instead of passing the mount point as
an argument, use a global variable, update it when parsing the config,
and use it in dump_fs. Reuse $NEWROOT for this.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Both dump_fs and dump_ssh currently define their own, almost identical,
directories they write the dump to. This not only adds duplicate code
but also prevents any code outside of dump_{fs,ssh} from accessing the
dump directory. Thus use a global definition for said directory. Reuse
the already existing $KDUMP_PATH for that.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
This commit contains two major changes for dumps via ssh. One is the
switch from scp to sftp as the protocol to transfer files. Using sftp has
two big advantages:

  1. It allows a wide variety of file operations, e.g. renaming files,
     on the remote host.
  2. It can work with files of unknown size, i.e. read from a pipe.

sftp is fully supported by OpenSSH; in fact, since OpenSSH 9.0 (released
April 2022) the 'scp' command uses sftp internally by default.

The other big change is to use curl rather than calling ssh/scp/sftp
directly. This is mainly because curl provides a cleaner user interface
for now. But curl also supports a wide variety of protocols and thus
could be used to extend support to dumping via e.g. http in the future.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
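A minimal sketch of how both sftp advantages can be used through curl, assuming the target is addressed as an sftp:// URL built from $SSH_HOST and $KDUMP_PATH (placeholder values below). `-T -` uploads from stdin, so the size does not need to be known in advance, and `-Q "-rename ..."` is a post-transfer sftp quote command that renames the file on the remote host. The `.incomplete` suffix and the helper name save_stream_sftp are hypothetical and not necessarily what this PR does.

```shell
# Placeholder values; in kdump they would come from kdump.conf.
SSH_HOST="${SSH_HOST:-root@192.0.2.1}"
SSH_KEY="${SSH_KEY:-/root/.ssh/kdump_id_rsa}"
KDUMP_PATH="${KDUMP_PATH:-/var/crash/example}"

# Upload stdin to $KDUMP_PATH/<name>.incomplete over sftp. The quote
# command prefixed with '-' runs only after a successful transfer and
# renames the file to its final name on the remote host.
save_stream_sftp() {
    _name="$1"
    curl --no-progress-meter \
        --key "$SSH_KEY" \
        --ftp-create-dirs \
        -T - \
        -Q "-rename $KDUMP_PATH/$_name.incomplete $KDUMP_PATH/$_name" \
        "sftp://$SSH_HOST$KDUMP_PATH/$_name.incomplete"
}

# Illustrative usage: stream a flattened dump straight to the server.
# makedumpfile -F -l -d 31 /proc/vmcore | save_stream_sftp vmcore.flat
```

Uploading under a temporary name and renaming afterwards means an interrupted transfer never leaves a file that looks complete.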
When fetching the test status from the target, a missing status file is
always considered a 'failure'. So there is no need to explicitly set
the status to 'failure' in the initrd. This allows simplifying the code
a bit. For example, we can now assume that the directory for $KDUMP_PATH
always exists (otherwise dump_{fs,ssh} would have returned with an
error).

Signed-off-by: Philipp Rudo <prudo@redhat.com>
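A minimal sketch of the "missing means failure" rule on the side that fetches the status, assuming the status is a single word stored in a file; the function name read_test_status and the file layout are hypothetical.

```shell
# Print the test status stored in the given file. If the file is
# missing or unreadable, report 'failure', so the initrd never has to
# write 'failure' explicitly.
read_test_status() {
    _status=$(cat "$1" 2>/dev/null) || _status=failure
    echo "$_status"
}
```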
@prudo1 force-pushed the features/curl/main branch from 77987c3 to e6e421a on December 5, 2024 12:58
@prudo1 (Collaborator Author) commented Dec 5, 2024

Rebased onto current main since PR #53 was merged. Updated the description accordingly. No code changes made.

@coiby (Member) commented Jan 13, 2025

Hi @prudo1,

I like the current approach, as it also significantly simplifies the code!

I noticed one problem when testing the code. Since this PR makes scp and ssh obsolete, I removed the dependency on the ssh-client dracut module:

diff --git a/dracut/99kdumpbase/module-setup.sh b/dracut/99kdumpbase/module-setup.sh
index 2a61a4ce..ae782d59 100755
--- a/dracut/99kdumpbase/module-setup.sh
+++ b/dracut/99kdumpbase/module-setup.sh
@@ -49,7 +49,7 @@ depends() {
     fi
 
     if is_ssh_dump_target; then
-        _dep="$_dep ssh-client"
+        _dep="$_dep network"
     fi
 
     if is_lvm2_thinp_dump_target; then

Unfortunately, the ssh dumping test then fails:

[    3.588282] kdump[542]: saving to 192.168.122.114:/var/crash/192.168.122.140-2025-01-13-07:02:28
[    3.595213] kdump[547]: saving vmcore-dmesg.txt to 192.168.122.114:/var/crash/192.168.122.140-2025-01-13-07:02:28
[    3.631297] kdump[550]: failed to save vmcore-dmesg.txt, exitcode 60
[    3.634540] kdump[552]: saving vmcore
[    3.639194] kdump[556]: saving vmcore.flat to 192.168.122.114:/var/crash/192.168.122.140-2025-01-13-07:02:28
[    3.671413] kdump[559]: failed to save vmcore.flat, exitcode 60
Excluding unnecessary pages                       : [100.0 %] \                  
[    3.676208] kdump.sh[553]: Can't write the dump file(STDOUT). Broken pipe
[    3.677757] kdump.sh[553]: makedumpfile Failed.
[    3.680441] kdump[561]: saving vmcore failed
[    3.693409] kdump[566]: saving the /run/initramfs/kexec-dmesg.log to 192.168.122.114:/var/crash/192.168.122.140-2025-01-13-07:02:28//
[    3.696867] kdump[568]: saving kexec-dmesg.log to 192.168.122.114:/var/crash/192.168.122.140-2025-01-13-07:02:28
[    3.724288] kdump[571]: failed to save kexec-dmesg.log, exitcode 60
[    3.726297] systemd[1]: kdump-capture.service: Main process exited, code=exited, status=1/FAILURE

Note the test works without removing the dependency on the ssh-client dracut module. Do you know why?

I'm also curious to ask why you think switching to curl is controversial. Do you foresee any potential issues? It seems using curl only brings benefits, and curl is always available on RHEL/Fedora.

@prudo1 (Collaborator Author) commented Jan 17, 2025

Hi @coiby,

> Note the test works without removing the dependency on the ssh-client dracut module. Do you know why?

I'm not entirely sure how/if curl and ssh interact with each other. But even if curl can work without the ssh binaries, it still needs all the auxiliary files (esp. ssh keys, known_hosts, and the configs) to work as expected. Those are only added to the initrd when the ssh-client module is included. That's why I didn't remove the module.

> I'm also curious to ask why you think switching to curl is controversial. Do you foresee any potential issues? It seems using curl only brings benefits, and curl is always available on RHEL/Fedora.

It's mainly because the code for dumping via ssh is so old and 'well tested'. People might be reluctant to change it when there is no benefit other than having nicer code (at least for the moment; when/if we implement support for http dumping it will be different). Plus, I'm not very proficient when it comes to networking, so I'm not sure which subtle problems such a change could cause on all the different network setups our customers have.

@coiby (Member) commented Jan 21, 2025

> I'm not entirely sure how/if curl and ssh interact with each other. But even if curl can work without the ssh binaries, it still needs all the auxiliary files (esp. ssh keys, known_hosts, and the configs) to work as expected. Those are only added to the initrd when the ssh-client module is included. That's why I didn't remove the module.

Thanks for the clarification! Indeed, the ssh keys and known_hosts need to be installed for curl to work. Exit code 60 means curl can't verify the legitimacy of the server because the known_hosts file is missing. If we don't call curl with --silent, the following error message is shown:

Warning: Couldn't find a known_hosts file
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (60) SSL peer certificate or SSH remote key was not OK
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
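Since curl reports this condition through its exit status, the script could turn exit code 60 into a more specific hint even when curl's own output is suppressed. A minimal sketch; the helper name run_curl and the message wording are hypothetical, and plain echo stands in for kdump's own logging functions.

```shell
# Run curl with the given arguments and translate its exit status into
# a log message. Exit code 60 is what curl returned above when the
# remote host key could not be verified.
run_curl() {
    _ret=0
    curl "$@" || _ret=$?
    if [ "$_ret" -eq 60 ]; then
        echo "curl could not verify the remote host key, is known_hosts available?" >&2
    elif [ "$_ret" -ne 0 ]; then
        echo "curl failed with exit code $_ret" >&2
    fi
    return "$_ret"
}
```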

> It's mainly because the code for dumping via ssh is so old and 'well tested'. People might be reluctant to change it when there is no benefit other than having nicer code (at least for the moment; when/if we implement support for http dumping it will be different).

I thought this change is internal and doesn't require any change from users. Or am I missing something?

> Plus, I'm not very proficient when it comes to networking, so I'm not sure which subtle problems such a change could cause on all the different network setups our customers have.

ssh/curl depend on a working network, but setting up the kdump network is not done by ssh/curl. So I think this concern can be safely dismissed.

@prudo1 (Collaborator Author) commented Jan 22, 2025

Hi @coiby,

> I thought this change is internal and doesn't require any change from users. Or am I missing something?

The change is purely internal. There shouldn't be any visible change for users (besides a few slightly different error/info messages).

> ssh/curl depend on a working network, but setting up the kdump network is not done by ssh/curl. So I think this concern can be safely dismissed.

True, but there could still be firewall settings that prevent a connection to the server from being set up, even with an otherwise working network connection. But as scp uses sftp internally as well, that shouldn't be a problem. So I'm probably overthinking it.

Anyway, I'm dropping the RFC. Feel free to merge if you like.

@prudo1 changed the title from "[RFC] use curl for ssh dumps" to "Use curl for ssh dumps" on Jan 22, 2025
-o StrictHostKeyChecking=yes \
"$SSH_HOST" "$@"
_curl() {
curl --silent \
Member

I don't know why the ssh/scp option -q was used before. But I think it's better to drop --silent in case curl does fail in some corner cases.

Collaborator Author

Hi @coiby,
yeah, what confused me most was the inconsistent use of -q...
You are right that --silent should be replaced here, as it also masks error messages. But I don't think we should simply drop it. If we did, the script's output would be quite cluttered, as every file transfer would print its progress. This is quite annoying when the progress output of curl and makedumpfile is interleaved. That's why I think replacing --silent with --no-progress-meter makes more sense here.
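A minimal sketch of the wrapper with --silent replaced as proposed, assuming the key path is held in $SSH_KEY; passing it via --key is an illustrative assumption, since the review excerpt only shows the first line of _curl. --no-progress-meter requires curl 7.67.0 or newer; on older versions `--silent --show-error` (`-sS`) is the usual way to hide the progress meter while keeping error messages.

```shell
# Hide curl's per-transfer progress meter (which would otherwise
# interleave with makedumpfile's output) but keep error messages.
_curl() {
    curl --no-progress-meter \
        --key "$SSH_KEY" \
        "$@"
}
```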

derror "Unknown test status $_status"
return 1
;;
esac
Member

Hi @liutgnu, does this commit look good to you? I'm not sure if checking for "Unknown test status" is necessary.

Collaborator Author

Hi @coiby,
the "Unknown test status" was only a safeguard against typos within the script, i.e. to make it obvious when the function got called with an improper $_status.
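A minimal sketch of such a safeguard, assuming the valid statuses are 'success' and 'failure' and that the status is written to a file. derror is kdump's logging helper and is replaced here by a stand-in so the example runs on its own; set_test_status is a hypothetical name.

```shell
# Stand-in for kdump's derror logging helper.
derror() { echo "kdump: $*" >&2; }

# Record the test status. The default branch catches typos in the
# caller (e.g. "sucess") instead of silently writing a bogus status.
set_test_status() {
    _status="$1"
    _file="$2"
    case "$_status" in
        success | failure)
            echo "$_status" > "$_file"
            ;;
        *)
            derror "Unknown test status $_status"
            return 1
            ;;
    esac
}
```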

@daveyoung (Contributor) commented Feb 7, 2025

Hi @coiby @prudo1, can you invite some networking people to have a look? Mainly to see if there are any corner cases around curl/ssh differences; there was a report here, although the original post was deleted: https://www.reddit.com/r/linux4noobs/comments/dwsugz/curl_much_slower_over_ssh/ and curl/curl#9336. Also, is there any difference in the installed initramfs size with the change? The runtime memory footprint also needs consideration. More testing and review would be better before making the change formally. For testing, I'd suggest testing on large-memory systems, e.g. over 1T, comparing the saving time, and, as another case, measuring and comparing the memory use.

@coiby (Member) commented Feb 7, 2025

Hi @jacekmigacz, I see you maintain RHEL's curl. We want to switch from ssh/scp to curl because of its cleaner interface and its rich features. Do you think it is a good idea? Or do you have any concern in mind? Thanks!

@jacekmigacz

Hi @coiby! I went through the PR and I agree that curl(1) would simplify your scripts.
Since what you are doing is non-interactive by nature and there is no tunneling or port forwarding involved, you should be good to go.

@coiby (Member) commented Feb 20, 2025

> Hi @coiby! I went through the PR and I agree that curl(1) would simplify your scripts. Since what you are doing is non-interactive by nature and there is no tunneling or port forwarding involved, you should be good to go.

Thanks for your confirmation!

@prudo1 (Collaborator Author) commented Mar 4, 2025

Hi @daveyoung,

> Hi @coiby @prudo1, can you invite some networking people to have a look? Mainly to see if there are any corner cases around curl/ssh differences; there was a report here, although the original post was deleted: https://www.reddit.com/r/linux4noobs/comments/dwsugz/curl_much_slower_over_ssh/ and curl/curl#9336. Also, is there any difference in the installed initramfs size with the change? The runtime memory footprint also needs consideration. More testing and review would be better before making the change formally. For testing, I'd suggest testing on large-memory systems, e.g. over 1T, comparing the saving time, and, as another case, measuring and comparing the memory use.

The initrd gets slightly bigger, by almost exactly 1MB, due to adding libcurl and the curl binary. For the runtime memory usage and transfer time I need to run some additional tests.
