* [PATCH] Only allow to set crash_kexec_post_notifiers on boot time
@ 2020-09-18 3:25 ` Dave Young
  0 siblings, 0 replies; 40+ messages in thread
From: Dave Young @ 2020-09-18 3:25 UTC (permalink / raw)
  To: Andrew Morton, bhe, Eric Biederman, linux-kernel, kexec

crash_kexec_post_notifiers enables running various panic notifier
before kdump kernel booting. This increases risks of kdump failure.
It is well documented in kernel-parameters.txt. We do not suggest
people to enable it together with kdump unless he/she is really sure.
This is also not suggested to be enabled by default when users are
not aware in distributions.

But unfortunately it is enabled by default in systemd, see below
discussions in a systemd report, we can not convince systemd to change
it:
https://github.com/systemd/systemd/issues/16661

Actually we have got reports about kdump kernel hangs in both s390x
and powerpcle cases caused by the systemd change, also some x86 cases
could also be caused by the same (although that is in Hyper-V code
instead of systemd, that need to be addressed separately).

Thus to avoid the auto enablement here just disable the param writable
permission in sysfs.

Signed-off-by: Dave Young <dyoung@redhat.com>
---
 kernel/panic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index aef8872ba843..bea44fc4eb3b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -695,7 +695,7 @@ core_param(panic, panic_timeout, int, 0644);
 core_param(panic_print, panic_print, ulong, 0644);
 core_param(pause_on_oops, pause_on_oops, int, 0644);
 core_param(panic_on_warn, panic_on_warn, int, 0644);
-core_param(crash_kexec_post_notifiers, crash_kexec_post_notifiers, bool, 0644);
+core_param(crash_kexec_post_notifiers, crash_kexec_post_notifiers, bool, 0444);
 
 static int __init oops_setup(char *s)
 {
-- 
2.26.2

^ permalink raw reply related [flat|nested] 40+ messages in thread
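The practical effect of changing the last core_param() argument from 0644 to 0444 can be checked from user space. The following is a minimal sketch, not part of the patch. It assumes the parameter is exposed at /sys/module/kernel/parameters/crash_kexec_post_notifiers; the macro CKPN_PATH and the helpers mode_is_writable() and param_runtime_writable() are hypothetical names introduced here for illustration.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Assumed sysfs location of the parameter (core_param entries live under "kernel"). */
#define CKPN_PATH "/sys/module/kernel/parameters/crash_kexec_post_notifiers"

/* True if any write bit (owner, group or other) is set in the mode. */
bool mode_is_writable(mode_t mode)
{
    return (mode & (S_IWUSR | S_IWGRP | S_IWOTH)) != 0;
}

/*
 * Returns 1 if the file at `path` has a write bit set, 0 if it is
 * read-only, and -1 if it cannot be inspected (for example, it does
 * not exist on this system).
 */
int param_runtime_writable(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return mode_is_writable(st.st_mode) ? 1 : 0;
}
```

Calling param_runtime_writable(CKPN_PATH) would be expected to return 1 on a kernel with the 0644 permission and 0 on a kernel with the patch applied, in which case the value can only be chosen on the kernel command line.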
* Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time
  2020-09-18 3:25 ` Dave Young
@ 2020-09-19 0:47 ` Andrew Morton
  -1 siblings, 0 replies; 40+ messages in thread
From: Andrew Morton @ 2020-09-19 0:47 UTC (permalink / raw)
  To: Dave Young; +Cc: bhe, Eric Biederman, linux-kernel, kexec

On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <dyoung@redhat.com> wrote:

> crash_kexec_post_notifiers enables running various panic notifier
> before kdump kernel booting. This increases risks of kdump failure.
> It is well documented in kernel-parameters.txt. We do not suggest
> people to enable it together with kdump unless he/she is really sure.
> This is also not suggested to be enabled by default when users are
> not aware in distributions.
>
> But unfortunately it is enabled by default in systemd, see below
> discussions in a systemd report, we can not convince systemd to change
> it:
> https://github.com/systemd/systemd/issues/16661
>
> Actually we have got reports about kdump kernel hangs in both s390x
> and powerpcle cases caused by the systemd change, also some x86 cases
> could also be caused by the same (although that is in Hyper-V code
> instead of systemd, that need to be addressed separately).
>
> Thus to avoid the auto enablement here just disable the param writable
> permission in sysfs.
>

Well. I don't think this is at all a desirable way of resolving a
disagreement with the systemd developers

At the above github address I'm seeing "ryncsn added a commit to
ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
enable crash_kexec_post_notifiers by default". So didn't that address
the issue?

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time
  2020-09-19 0:47 ` Andrew Morton
@ 2020-09-19 7:26 ` Dave Young
  -1 siblings, 0 replies; 40+ messages in thread
From: Dave Young @ 2020-09-19 7:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: bhe, Eric Biederman, linux-kernel, kexec

On 09/18/20 at 05:47pm, Andrew Morton wrote:
> On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <dyoung@redhat.com> wrote:
>
> > crash_kexec_post_notifiers enables running various panic notifier
> > before kdump kernel booting. This increases risks of kdump failure.
> > It is well documented in kernel-parameters.txt. We do not suggest
> > people to enable it together with kdump unless he/she is really sure.
> > This is also not suggested to be enabled by default when users are
> > not aware in distributions.
> >
> > But unfortunately it is enabled by default in systemd, see below
> > discussions in a systemd report, we can not convince systemd to change
> > it:
> > https://github.com/systemd/systemd/issues/16661
> >
> > Actually we have got reports about kdump kernel hangs in both s390x
> > and powerpcle cases caused by the systemd change, also some x86 cases
> > could also be caused by the same (although that is in Hyper-V code
> > instead of systemd, that need to be addressed separately).
> >
> > Thus to avoid the auto enablement here just disable the param writable
> > permission in sysfs.
> >
>
> Well. I don't think this is at all a desirable way of resolving a
> disagreement with the systemd developers
>
> At the above github address I'm seeing "ryncsn added a commit to
> ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
> enable crash_kexec_post_notifiers by default". So didn't that address
> the issue?
>

I hope that commit can be merged in systemd, but we are really not
optimistic about that. The discussion there is clear, but we have not
received a response since Aug 6.

BTW, Kairui sent the systemd pull request 15 days ago; the new update
added some comments.

Thanks
Dave

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time
  2020-09-19 0:47 ` Andrew Morton
@ 2020-09-21 20:18 ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2020-09-21 20:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Young, bhe, Eric Biederman, linux-kernel, kexec,
	Eric DeVolder, Boris Ostrovsky

On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote:
> On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <dyoung@redhat.com> wrote:
>
> > crash_kexec_post_notifiers enables running various panic notifier
> > before kdump kernel booting. This increases risks of kdump failure.
> > It is well documented in kernel-parameters.txt. We do not suggest
> > people to enable it together with kdump unless he/she is really sure.
> > This is also not suggested to be enabled by default when users are
> > not aware in distributions.
> >
> > But unfortunately it is enabled by default in systemd, see below
> > discussions in a systemd report, we can not convince systemd to change
> > it:
> > https://github.com/systemd/systemd/issues/16661
> >
> > Actually we have got reports about kdump kernel hangs in both s390x
> > and powerpcle cases caused by the systemd change, also some x86 cases
> > could also be caused by the same (although that is in Hyper-V code
> > instead of systemd, that need to be addressed separately).

Perhaps it may be better to fix the issues on s390x and PowerPC as well?

> >
> > Thus to avoid the auto enablement here just disable the param writable
> > permission in sysfs.
> >
>
> Well. I don't think this is at all a desirable way of resolving a
> disagreement with the systemd developers
>
> At the above github address I'm seeing "ryncsn added a commit to
> ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
> enable crash_kexec_post_notifiers by default". So didn't that address
> the issue?

It does in systemd, but there is a strong interest in making this on
by default.

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time
  2020-09-21 20:18 ` Konrad Rzeszutek Wilk
@ 2020-09-22 1:45 ` Eric W. Biederman
  -1 siblings, 0 replies; 40+ messages in thread
From: Eric W. Biederman @ 2020-09-22 1:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Morton, Dave Young, bhe, linux-kernel, kexec,
	Eric DeVolder, Boris Ostrovsky

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote:
>> On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <dyoung@redhat.com> wrote:
>>
>> > crash_kexec_post_notifiers enables running various panic notifier
>> > before kdump kernel booting. This increases risks of kdump failure.
>> > It is well documented in kernel-parameters.txt. We do not suggest
>> > people to enable it together with kdump unless he/she is really sure.
>> > This is also not suggested to be enabled by default when users are
>> > not aware in distributions.
>> >
>> > But unfortunately it is enabled by default in systemd, see below
>> > discussions in a systemd report, we can not convince systemd to change
>> > it:
>> > https://github.com/systemd/systemd/issues/16661
>> >
>> > Actually we have got reports about kdump kernel hangs in both s390x
>> > and powerpcle cases caused by the systemd change, also some x86 cases
>> > could also be caused by the same (although that is in Hyper-V code
>> > instead of systemd, that need to be addressed separately).
>
> Perhaps it may be better to fix the issues on s390x and PowerPC as well?
>
>> >
>> > Thus to avoid the auto enablement here just disable the param writable
>> > permission in sysfs.
>> >
>>
>> Well. I don't think this is at all a desirable way of resolving a
>> disagreement with the systemd developers
>>
>> At the above github address I'm seeing "ryncsn added a commit to
>> ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
>> enable crash_kexec_post_notifiers by default". So didn't that address
>> the issue?
>
> It does in systemd, but there is a strong interest in making this on
> by default.

There is also a strong interest in removing this code entirely from the
kernel.

This failure is a case in point.

I think I am at my "I told you so" point. This is what all of the testing
over all the years has said. Leaving functionality to the peculiarities
of firmware when you don't have to, and can actually control what is
going on, doesn't work.

Eric

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time
  2020-09-22 1:45 ` Eric W. Biederman
@ 2020-09-23 2:43 ` Dave Young
  -1 siblings, 0 replies; 40+ messages in thread
From: Dave Young @ 2020-09-23 2:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, bhe, linux-kernel, kexec,
	Eric DeVolder, Boris Ostrovsky, Tianyu Lani, Michael Kelley,
	Wei Liu, Masami Hiramatsu, HATAYAMA Daisuke

+ more people who may care about this param

On 09/21/20 at 08:45pm, Eric W. Biederman wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
>
> > On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote:
> >> On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <dyoung@redhat.com> wrote:
> >>
> >> > crash_kexec_post_notifiers enables running various panic notifier
> >> > before kdump kernel booting. This increases risks of kdump failure.
> >> > It is well documented in kernel-parameters.txt. We do not suggest
> >> > people to enable it together with kdump unless he/she is really sure.
> >> > This is also not suggested to be enabled by default when users are
> >> > not aware in distributions.
> >> >
> >> > But unfortunately it is enabled by default in systemd, see below
> >> > discussions in a systemd report, we can not convince systemd to change
> >> > it:
> >> > https://github.com/systemd/systemd/issues/16661
> >> >
> >> > Actually we have got reports about kdump kernel hangs in both s390x
> >> > and powerpcle cases caused by the systemd change, also some x86 cases
> >> > could also be caused by the same (although that is in Hyper-V code
> >> > instead of systemd, that need to be addressed separately).
> >
> > Perhaps it may be better to fix the issues on s390x and PowerPC as well?
> >
> >> >
> >> > Thus to avoid the auto enablement here just disable the param writable
> >> > permission in sysfs.
> >> >
> >>
> >> Well. I don't think this is at all a desirable way of resolving a
> >> disagreement with the systemd developers
> >>
> >> At the above github address I'm seeing "ryncsn added a commit to
> >> ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
> >> enable crash_kexec_post_notifiers by default". So didn't that address
> >> the issue?
> >
> > It does in systemd, but there is a strong interest in making this on
> > by default.
>
> There is also a strong interest in removing this code entirely from the
> kernel.

Added the Hyper-V people and the people who created the param; it was
introduced by the commit below. I also want to remove it if possible,
let's see what people think, but the least we should do is disable the
auto setting in both systemd and the kernel:

commit f06e5153f4ae2e2f3b0300f0e260e40cb7fefd45
Author: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Date:   Fri Jun 6 14:37:07 2014 -0700

    kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers

    Add a "crash_kexec_post_notifiers" boot option to run kdump after
    running panic_notifiers and dump kmsg. This can help rare situations
    where kdump fails because of unstable crashed kernel or hardware failure
    (memory corruption on critical data/code), or the 2nd kernel is already
    broken by the 1st kernel (it's a broken behavior, but who can guarantee
    that the "crashed" kernel works correctly?).

    Usage: add "crash_kexec_post_notifiers" to kernel boot option.

    Note that this actually increases risks of the failure of kdump. This
    option should be set only if you worry about the rare case of kdump
    failure rather than increasing the chance of success.

>
> This failure is a case in point.
>
> I think I am at my I told you so point. This is what all of the testing
> over all the years has said. Leaving functionality to the peculiarities
> of firmware when you don't have to, and can actually control what is
> going on doesn't work.
>
> Eric
>

Thanks
Dave

^ permalink raw reply [flat|nested] 40+ messages in thread
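The ordering that the quoted commit describes can be modeled in a few lines of user-space C. This is a simplified sketch, not the kernel's panic() implementation: it assumes a crash kernel is loaded (so the jump into it never returns), and the enum panic_step and the function panic_order() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* The three steps of interest in the panic path (illustrative names). */
enum panic_step { STEP_KDUMP, STEP_NOTIFIERS, STEP_KMSG_DUMP };

/*
 * Write the sequence of steps into `out` and return how many there are.
 * Assumes a crash kernel is loaded, so nothing runs after STEP_KDUMP.
 */
size_t panic_order(bool post_notifiers, enum panic_step out[3])
{
    size_t n = 0;

    if (!post_notifiers) {
        /* Default: jump to the crash kernel first; notifiers never run. */
        out[n++] = STEP_KDUMP;
        return n;
    }

    /* crash_kexec_post_notifiers set: notifiers and kmsg dump go first. */
    out[n++] = STEP_NOTIFIERS;
    out[n++] = STEP_KMSG_DUMP;
    out[n++] = STEP_KDUMP;
    return n;
}
```

In this model every notifier sits between the crash and the dump when the option is set, which is why a hang in any one of them (as in the s390x and powerpcle reports above) prevents kdump from starting.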
* Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time 2020-09-23 2:43 ` Dave Young @ 2020-09-23 15:48 ` Konrad Rzeszutek Wilk -1 siblings, 0 replies; 40+ messages in thread From: Konrad Rzeszutek Wilk @ 2020-09-23 15:48 UTC (permalink / raw) To: Dave Young Cc: Eric W. Biederman, Andrew Morton, bhe, linux-kernel, kexec, Eric DeVolder, Boris Ostrovsky, Tianyu Lani, Michael Kelley, Wei Liu, Masami Hiramatsu, HATAYAMA Daisuke On Wed, Sep 23, 2020 at 10:43:29AM +0800, Dave Young wrote: > + more people who may care about this param Paarty time!! (See below, didn't snip any comments) > On 09/21/20 at 08:45pm, Eric W. Biederman wrote: > > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes: > > > > > On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote: > > >> On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <dyoung@redhat.com> wrote: > > >> > > >> > crash_kexec_post_notifiers enables running various panic notifier > > >> > before kdump kernel booting. This increases risks of kdump failure. > > >> > It is well documented in kernel-parameters.txt. We do not suggest > > >> > people to enable it together with kdump unless he/she is really sure. > > >> > This is also not suggested to be enabled by default when users are > > >> > not aware in distributions. > > >> > > > >> > But unfortunately it is enabled by default in systemd, see below > > >> > discussions in a systemd report, we can not convince systemd to change > > >> > it: > > >> > https://github.com/systemd/systemd/issues/16661 > > >> > > > >> > Actually we have got reports about kdump kernel hangs in both s390x > > >> > and powerpcle cases caused by the systemd change, also some x86 cases > > >> > could also be caused by the same (although that is in Hyper-V code > > >> > instead of systemd, that need to be addressed separately). > > > > > > Perhaps it may be better to fix the issus on s390x and PowerPC as well? 
> > > > > >> > > > >> > Thus to avoid the auto enablement here just disable the param writable > > >> > permission in sysfs. > > >> > > > >> > > >> Well. I don't think this is at all a desirable way of resolving a > > >> disagreement with the systemd developers > > >> > > >> At the above github address I'm seeing "ryncsn added a commit to > > >> ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't > > >> enable crash_kexec_post_notifiers by default". So didn't that address > > >> the issue? > > > > > > It does in systemd, but there is a strong interest in making this on > > > by default. > > > > There is also a strong interest in removing this code entirely from the > > kernel. > > Added Hyper-V people and people who created the param, it is below > commit, I also want to remove it if possible, let's see how people > think, but the least way should be to disable the auto setting in both systemd > and kernel: > > commit f06e5153f4ae2e2f3b0300f0e260e40cb7fefd45 > Author: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> > Date: Fri Jun 6 14:37:07 2014 -0700 > > kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers > > Add a "crash_kexec_post_notifiers" boot option to run kdump after > running panic_notifiers and dump kmsg. This can help rare situations > where kdump fails because of unstable crashed kernel or hardware failure > (memory corruption on critical data/code), or the 2nd kernel is already > broken by the 1st kernel (it's a broken behavior, but who can guarantee > that the "crashed" kernel works correctly?). > > Usage: add "crash_kexec_post_notifiers" to kernel boot option. > > Note that this actually increases risks of the failure of kdump. This > option should be set only if you worry about the rare case of kdump > failure rather than increasing the chance of success. 
If this is such a risky knob that leads to bugs where folks are backing away from it with disgust on their faces - then perhaps the only way to go about this is - limit the exposure to known working situations on firmware that we can control? That is, enable only a subset of post notifiers which determine if they are OK running if the conditions are blessed? I think that would satisfy the conditions where you have to deal with unsavory bugs that end up on your plate - and aren't fun because there is no way of fixing them - but at the same time allowing multiple ways to save the crash? Please don't take away something that is quite useful in the field. Can we hammer out something that will remove your pain points? > > > > > This failure is a case in point. > > > > I think I am at my I told you so point. This is what all of the testing > > over all the years has said. Leaving functionality to the peculiarities > > of firmware when you don't have to, and can actually control what is > > going on doesn't work. > > > > Eric > > > > > > Thanks > Dave > ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time 2020-09-23 15:48 ` Konrad Rzeszutek Wilk @ 2020-09-24 16:15 ` Michael Kelley -1 siblings, 0 replies; 40+ messages in thread From: Michael Kelley @ 2020-09-24 16:15 UTC (permalink / raw) To: Konrad Rzeszutek Wilk, Dave Young Cc: Eric W. Biederman, Andrew Morton, bhe, linux-kernel, kexec, Eric DeVolder, Boris Ostrovsky, Tianyu Lan, Wei Liu, Masami Hiramatsu, HATAYAMA Daisuke From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Sent: Wednesday, September 23, 2020 8:48 AM > > On Wed, Sep 23, 2020 at 10:43:29AM +0800, Dave Young wrote: > > + more people who may care about this param > > Paarty time!! > > (See below, didn't snip any comments) > > On 09/21/20 at 08:45pm, Eric W. Biederman wrote: > > > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes: > > > > > > > On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote: > > > >> On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <dyoung@redhat.com> wrote: > > > >> > > > >> > crash_kexec_post_notifiers enables running various panic notifier > > > >> > before kdump kernel booting. This increases risks of kdump failure. > > > >> > It is well documented in kernel-parameters.txt. We do not suggest > > > >> > people to enable it together with kdump unless he/she is really sure. > > > >> > This is also not suggested to be enabled by default when users are > > > >> > not aware in distributions. 
> > > >> > > > > >> > But unfortunately it is enabled by default in systemd, see below > > > >> > discussions in a systemd report, we can not convince systemd to change > > > >> > it: > > > >> > https://github.com/systemd/systemd/issues/16661 > > > >> > > > > >> > Actually we have got reports about kdump kernel hangs in both s390x > > > >> > and powerpcle cases caused by the systemd change, also some x86 cases > > > >> > could also be caused by the same (although that is in Hyper-V code > > > >> > instead of systemd, that need to be addressed separately). > > > > > > > > Perhaps it may be better to fix the issus on s390x and PowerPC as well? > > > > > > > >> > > > > >> > Thus to avoid the auto enablement here just disable the param writable > > > >> > permission in sysfs. > > > >> > > > > >> > > > >> Well. I don't think this is at all a desirable way of resolving a > > > >> disagreement with the systemd developers > > > >> > > > >> At the above github address I'm seeing "ryncsn added a commit to > > > >> ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't > > > >> enable crash_kexec_post_notifiers by default". So didn't that address > > > >> the issue? > > > > > > > > It does in systemd, but there is a strong interest in making this on > > > > by default. > > > > > > There is also a strong interest in removing this code entirely from the > > > kernel. > > > > Added Hyper-V people and people who created the param, it is below > > commit, I also want to remove it if possible, let's see how people > > think, but the least way should be to disable the auto setting in both systemd > > and kernel: Hyper-V uses a notifier to inform the host system that a Linux VM has panic'ed.
Informing the host is particularly important in a public cloud such as Azure so that the cloud software can alert the customer, and can track cloud-wide reliability statistics. Whether a kdump is taken is controlled entirely by the customer and how he configures the VM, and we want the host to be informed either way. Michael > > > > commit f06e5153f4ae2e2f3b0300f0e260e40cb7fefd45 > > Author: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> > > Date: Fri Jun 6 14:37:07 2014 -0700 > > > > kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after > panic_notifers > > > > Add a "crash_kexec_post_notifiers" boot option to run kdump after > > running panic_notifiers and dump kmsg. This can help rare situations > > where kdump fails because of unstable crashed kernel or hardware failure > > (memory corruption on critical data/code), or the 2nd kernel is already > > broken by the 1st kernel (it's a broken behavior, but who can guarantee > > that the "crashed" kernel works correctly?). > > > > Usage: add "crash_kexec_post_notifiers" to kernel boot option. > > > > Note that this actually increases risks of the failure of kdump. This > > option should be set only if you worry about the rare case of kdump > > failure rather than increasing the chance of success. > > > If this is such risky knob that leads to bugs where folks are backing away > from with disgust in their faces - then perhaps the only way to go about > this is - limit the exposure to known working situations on firmware > that we can control? > > That is enable only a subset of post notifiers which determine if they > are OK running if the conditions are blessed? > > I think that would satisfy the conditions where you have to to deal with unsavory > bugs that end up on your plate - and aren't fun because there is no > way to fixing it - but at the same time allowing multiple ways to save the crash? > > Please don't take away something that is quite useful in the field. 
Can we > hammer out something that will remove your pain points? > > > > > > > > This failure is a case in point. > > > > > > I think I am at my I told you so point. This is what all of the testing > > > over all the years has said. Leaving functionality to the peculiarities > > > of firmware when you don't have to, and can actually control what is > > > going on doesn't work. > > > > > > Eric > > > > > > > > > > Thanks > > Dave > > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time 2020-09-24 16:15 ` Michael Kelley @ 2020-09-24 16:25 ` Eric W. Biederman -1 siblings, 0 replies; 40+ messages in thread From: Eric W. Biederman @ 2020-09-24 16:25 UTC (permalink / raw) To: Michael Kelley Cc: Konrad Rzeszutek Wilk, Dave Young, Andrew Morton, bhe, linux-kernel, kexec, Eric DeVolder, Boris Ostrovsky, Tianyu Lan, Wei Liu, Masami Hiramatsu, HATAYAMA Daisuke Michael Kelley <mikelley@microsoft.com> writes: > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Sent: Wednesday, September 23, 2020 8:48 AM >> >> On Wed, Sep 23, 2020 at 10:43:29AM +0800, Dave Young wrote: >> > + more people who may care about this param >> >> Paarty time!! >> >> (See below, didn't snip any comments) >> > On 09/21/20 at 08:45pm, Eric W. Biederman wrote: >> > > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes: >> > > >> > > > On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote: >> > > >> On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <dyoung@redhat.com> wrote: >> > > >> >> > > >> > crash_kexec_post_notifiers enables running various panic notifier >> > > >> > before kdump kernel booting. This increases risks of kdump failure. >> > > >> > It is well documented in kernel-parameters.txt. We do not suggest >> > > >> > people to enable it together with kdump unless he/she is really sure. >> > > >> > This is also not suggested to be enabled by default when users are >> > > >> > not aware in distributions. 
>> > > >> > >> > > >> > But unfortunately it is enabled by default in systemd, see below >> > > >> > discussions in a systemd report, we can not convince systemd to change >> > > >> > it: >> > > >> > https://github.com/systemd/systemd/issues/16661 >> > > >> > >> > > >> > Actually we have got reports about kdump kernel hangs in both s390x >> > > >> > and powerpcle cases caused by the systemd change, also some x86 cases >> > > >> > could also be caused by the same (although that is in Hyper-V code >> > > >> > instead of systemd, that need to be addressed separately). >> > > > >> > > > Perhaps it may be better to fix the issus on s390x and PowerPC as well? >> > > > >> > > >> > >> > > >> > Thus to avoid the auto enablement here just disable the param writable >> > > >> > permission in sysfs. >> > > >> > >> > > >> >> > > >> Well. I don't think this is at all a desirable way of resolving a >> > > >> disagreement with the systemd developers >> > > >> >> > > >> At the above github address I'm seeing "ryncsn added a commit to >> > > >> ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't >> > > >> enable crash_kexec_post_notifiers by default". So didn't that address >> > > >> the issue? >> > > > >> > > > It does in systemd, but there is a strong interest in making this on >> > > > by default. >> > > >> > > There is also a strong interest in removing this code entirely from the >> > > kernel.
>> > >> > Added Hyper-V people and people who created the param, it is below >> > commit, I also want to remove it if possible, let's see how people >> > think, but the least way should be to disable the auto setting in both systemd >> > and kernel: > > Hyper-V uses a notifier to inform the host system that a Linux VM has > panic'ed. Informing the host is particularly important in a public cloud > such as Azure so that the cloud software can alert the customer, and can > track cloud-wide reliability statistics. Whether a kdump is taken is controlled > entirely by the customer and how he configures the VM, and we want > the host to be informed either way. Why? Why does the host care? Especially if the VM continues executing into a kdump kernel? Further like I have mentioned everytime something like this has come up a call on the kexec on panic code path should be a direct call (That can be audited) not something hidden in a notifier call chain (which can not). Eric ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time 2020-09-24 16:25 ` Eric W. Biederman @ 2020-09-24 16:43 ` Michael Kelley -1 siblings, 0 replies; 40+ messages in thread From: Michael Kelley @ 2020-09-24 16:43 UTC (permalink / raw) To: Eric W. Biederman Cc: Konrad Rzeszutek Wilk, Dave Young, Andrew Morton, bhe, linux-kernel, kexec, Eric DeVolder, Boris Ostrovsky, Tianyu Lan, Wei Liu, Masami Hiramatsu, HATAYAMA Daisuke From: Eric W. Biederman <ebiederm@xmission.com> Sent: Thursday, September 24, 2020 9:26 AM > > Michael Kelley <mikelley@microsoft.com> writes: > > >> > > >> > Added Hyper-V people and people who created the param, it is below > >> > commit, I also want to remove it if possible, let's see how people > >> > think, but the least way should be to disable the auto setting in both systemd > >> > and kernel: > > > > Hyper-V uses a notifier to inform the host system that a Linux VM has > > panic'ed. Informing the host is particularly important in a public cloud > > such as Azure so that the cloud software can alert the customer, and can > > track cloud-wide reliability statistics. Whether a kdump is taken is controlled > > entirely by the customer and how he configures the VM, and we want > > the host to be informed either way. > > Why? > > Why does the host care? > Especially if the VM continues executing into a kdump kernel? The host itself doesn't care. But the host is a convenient out-of-band channel for recording that a panic has occurred and to collect basic data about the panic. This out-of-band channel is then used to notify the end customer that his VM has panic'ed. Sure, the customer should be running his own monitoring software, but customers don't always do what they should. Equally important, the out-of-band channel allows the cloud infrastructure software to notice trends, such as that the rate of Linux panics has increased, and that perhaps there is a cloud problem that should be investigated. 
> > Further like I have mentioned everytime something like this has come up > a call on the kexec on panic code path should be a direct call (That can > be audited) not something hidden in a notifier call chain (which can not). > The use case I describe has no particular requirement that it be implemented via the notifier call chain. If there's a better way to run some out-of-band notification code on all Linux panics regardless of whether a kdump is taken, we're open to such an alternative. Michael ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time
  2020-09-24 16:43 ` Michael Kelley
@ 2020-09-24 17:16 ` boris.ostrovsky
From: boris.ostrovsky @ 2020-09-24 17:16 UTC
To: Michael Kelley, Eric W. Biederman
Cc: Konrad Rzeszutek Wilk, Dave Young, Andrew Morton, bhe, linux-kernel, kexec, Eric DeVolder, Tianyu Lan, Wei Liu, Masami Hiramatsu, HATAYAMA Daisuke

On 9/24/20 12:43 PM, Michael Kelley wrote:
> From: Eric W. Biederman <ebiederm@xmission.com> Sent: Thursday, September 24, 2020 9:26 AM
>> Michael Kelley <mikelley@microsoft.com> writes:
>>
>>>>> Added Hyper-V people and people who created the param, it is below
>>>>> commit, I also want to remove it if possible, let's see how people
>>>>> think, but the least way should be to disable the auto setting in both systemd
>>>>> and kernel:
>>> Hyper-V uses a notifier to inform the host system that a Linux VM has
>>> panic'ed. Informing the host is particularly important in a public cloud
>>> such as Azure so that the cloud software can alert the customer, and can
>>> track cloud-wide reliability statistics. Whether a kdump is taken is controlled
>>> entirely by the customer and how he configures the VM, and we want
>>> the host to be informed either way.
>> Why?
>>
>> Why does the host care?
>> Especially if the VM continues executing into a kdump kernel?
> The host itself doesn't care. But the host is a convenient out-of-band
> channel for recording that a panic has occurred and to collect basic data
> about the panic. This out-of-band channel is then used to notify the end
> customer that his VM has panic'ed. Sure, the customer should be running
> his own monitoring software, but customers don't always do what they
> should. Equally important, the out-of-band channel allows the cloud
> infrastructure software to notice trends, such as that the rate of Linux
> panics has increased, and that perhaps there is a cloud problem that
> should be investigated.

In many cases (especially in cloud environments) your dump device is remote (e.g. iSCSI) and kdump sometimes (often?) gets stuck because of connectivity issues (which could be the cause of the panic in the first place). So it is quite desirable to inform the infrastructure that the VM is on its way out without waiting for kdump to complete.

>
>> Further like I have mentioned everytime something like this has come up
>> a call on the kexec on panic code path should be a direct call (That can
>> be audited) not something hidden in a notifier call chain (which can not).
>>

We btw already have a direct call from panic() to kmsg_dump() which is indirectly controlled by crash_kexec_post_notifiers, and it would also be preferable to be able to call it before kdump as well.

-boris

> The use case I describe has no particular requirement that it be
> implemented via the notifier call chain. If there's a better way to run
> some out-of-band notification code on all Linux panics regardless of
> whether a kdump is taken, we're open to such an alternative.
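The ordering discussed in the message above (kmsg_dump() runs after the panic notifiers, and crash_kexec_post_notifiers decides whether crash_kexec runs before or after both) can be sketched as a small userspace model. This is only an illustration under stated assumptions: model_panic(), step() and the step names are hypothetical and are not the kernel's functions, and the model ignores CPU shutdown and every other detail of the real panic() path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Userspace model only: nothing here is kernel code. */
#define TRACE_MAX 128
static char trace[TRACE_MAX];

/* Append "name;" to the trace so the call order can be inspected. */
static void step(const char *name)
{
	size_t used = strlen(trace);

	assert(used + strlen(name) + 2 <= TRACE_MAX);
	snprintf(trace + used, TRACE_MAX - used, "%s;", name);
}

/*
 * Returns the sequence of steps taken for a given value of the
 * crash_kexec_post_notifiers flag and for whether a kdump kernel is loaded.
 */
static const char *model_panic(bool post_notifiers, bool kdump_loaded)
{
	trace[0] = '\0';

	/* Default: jump to the kdump kernel first; nothing else runs. */
	if (!post_notifiers && kdump_loaded) {
		step("crash_kexec");
		return trace;
	}

	/* Otherwise the notifier chain runs, then the direct kmsg_dump() call. */
	step("notifiers");
	step("kmsg_dump");

	/* With the flag set, kdump is attempted only after both. */
	if (post_notifiers && kdump_loaded) {
		step("crash_kexec");
		return trace;
	}

	step("reboot_or_halt");
	return trace;
}
```

Calling model_panic(true, true) yields the notifiers, then kmsg_dump, then crash_kexec, while model_panic(false, true) yields crash_kexec alone, which is why kmsg_dump() output before kdump currently depends on the flag.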
* Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time
  2020-09-24 17:16 ` boris.ostrovsky
@ 2020-09-25 3:05 ` Dave Young
From: Dave Young @ 2020-09-25 3:05 UTC
To: boris.ostrovsky
Cc: Michael Kelley, Eric W. Biederman, Konrad Rzeszutek Wilk, Andrew Morton, bhe, linux-kernel, kexec, Eric DeVolder, Tianyu Lan, Wei Liu, Masami Hiramatsu, HATAYAMA Daisuke

Hi,

On 09/24/20 at 01:16pm, boris.ostrovsky@oracle.com wrote:
>
> On 9/24/20 12:43 PM, Michael Kelley wrote:
> > From: Eric W. Biederman <ebiederm@xmission.com> Sent: Thursday, September 24, 2020 9:26 AM
> >> Michael Kelley <mikelley@microsoft.com> writes:
> >>
> >>>>> Added Hyper-V people and people who created the param, it is below
> >>>>> commit, I also want to remove it if possible, let's see how people
> >>>>> think, but the least way should be to disable the auto setting in both systemd
> >>>>> and kernel:
> >>> Hyper-V uses a notifier to inform the host system that a Linux VM has
> >>> panic'ed. Informing the host is particularly important in a public cloud
> >>> such as Azure so that the cloud software can alert the customer, and can
> >>> track cloud-wide reliability statistics. Whether a kdump is taken is controlled
> >>> entirely by the customer and how he configures the VM, and we want
> >>> the host to be informed either way.
> >> Why?
> >>
> >> Why does the host care?
> >> Especially if the VM continues executing into a kdump kernel?
> > The host itself doesn't care. But the host is a convenient out-of-band
> > channel for recording that a panic has occurred and to collect basic data
> > about the panic. This out-of-band channel is then used to notify the end
> > customer that his VM has panic'ed. Sure, the customer should be running
> > his own monitoring software, but customers don't always do what they
> > should. Equally important, the out-of-band channel allows the cloud
> > infrastructure software to notice trends, such as that the rate of Linux
> > panics has increased, and that perhaps there is a cloud problem that
> > should be investigated.
>
>
> In many cases (especially in cloud environments) your dump device is remote (e.g. iSCSI) and kdump sometimes (often?) gets stuck because of connectivity issues (which could be the cause of the panic in the first place). So it is quite desirable to inform the infrastructure that the VM is on its way out without waiting for kdump to complete.

That can probably be done in kdump kernel if it is really needed. Say
informing host that panic happened and a kdump kernel is running.

But I think to set crash_kexec_post_notifiers by default is still bad.

>
> >
> >> Further like I have mentioned everytime something like this has come up
> >> a call on the kexec on panic code path should be a direct call (That can
> >> be audited) not something hidden in a notifier call chain (which can not).
> >>
>
> We btw already have a direct call from panic() to kmsg_dump() which is indirectly controlled by crash_kexec_post_notifiers, and it would also be preferable to be able to call it before kdump as well.

Right, that is the same thing we are talking about.

Thanks
Dave
* Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time
  2020-09-25 3:05 ` Dave Young
@ 2020-09-25 14:56 ` Konrad Rzeszutek Wilk
From: Konrad Rzeszutek Wilk @ 2020-09-25 14:56 UTC
To: Dave Young
Cc: boris.ostrovsky, Michael Kelley, Eric W. Biederman, Andrew Morton, bhe, linux-kernel, kexec, Eric DeVolder, Tianyu Lan, Wei Liu, Masami Hiramatsu, HATAYAMA Daisuke

On Fri, Sep 25, 2020 at 11:05:58AM +0800, Dave Young wrote:
> Hi,
>
> On 09/24/20 at 01:16pm, boris.ostrovsky@oracle.com wrote:
> >
> > On 9/24/20 12:43 PM, Michael Kelley wrote:
> > > From: Eric W. Biederman <ebiederm@xmission.com> Sent: Thursday, September 24, 2020 9:26 AM
> > >> Michael Kelley <mikelley@microsoft.com> writes:
> > >>
> > >>>>> Added Hyper-V people and people who created the param, it is below
> > >>>>> commit, I also want to remove it if possible, let's see how people
> > >>>>> think, but the least way should be to disable the auto setting in both systemd
> > >>>>> and kernel:
> > >>> Hyper-V uses a notifier to inform the host system that a Linux VM has
> > >>> panic'ed. Informing the host is particularly important in a public cloud
> > >>> such as Azure so that the cloud software can alert the customer, and can
> > >>> track cloud-wide reliability statistics. Whether a kdump is taken is controlled
> > >>> entirely by the customer and how he configures the VM, and we want
> > >>> the host to be informed either way.
> > >> Why?
> > >>
> > >> Why does the host care?
> > >> Especially if the VM continues executing into a kdump kernel?
> > > The host itself doesn't care. But the host is a convenient out-of-band
> > > channel for recording that a panic has occurred and to collect basic data
> > > about the panic. This out-of-band channel is then used to notify the end
> > > customer that his VM has panic'ed. Sure, the customer should be running
> > > his own monitoring software, but customers don't always do what they
> > > should. Equally important, the out-of-band channel allows the cloud
> > > infrastructure software to notice trends, such as that the rate of Linux
> > > panics has increased, and that perhaps there is a cloud problem that
> > > should be investigated.
> >
> >
> > In many cases (especially in cloud environments) your dump device is remote (e.g. iSCSI) and kdump sometimes (often?) gets stuck because of connectivity issues (which could be the cause of the panic in the first place). So it is quite desirable to inform the infrastructure that the VM is on its way out without waiting for kdump to complete.
>
> That can probably be done in kdump kernel if it is really needed. Say
> informing host that panic happened and a kdump kernel is running.

If kdump kernel gets to that point. Sometimes (sadly) it ends up being
misconfigured and it chokes up - and hence having multiple ways to emit
the crash information before running kdump kernel is a life-saver.

>
> But I think to set crash_kexec_post_notifiers by default is still bad.

Because of the way it is run today I presume? If there was some
safe/unsafe policy that should work right? I would think that the
safe ones that work properly all the time are:

 - HyperV CRASH_MSRs,
 - KVM PVPANIC_[PANIC,CRASHLOAD] push button knob,
 - pstore EFI variables
 - Dumping in memory,

And then some that depend on firmware version (aka BIOS, and vendor) are:
 - ACPI ERST,

And then the unsafe:
 - s390, PowerPC (I don't actually know what they are but that
   was Dave's primary motivator).

> >
> > >
> > >> Further like I have mentioned everytime something like this has come up
> > >> a call on the kexec on panic code path should be a direct call (That can
> > >> be audited) not something hidden in a notifier call chain (which can not).
> > >>
> >
> > We btw already have a direct call from panic() to kmsg_dump() which is indirectly controlled by crash_kexec_post_notifiers, and it would also be preferable to be able to call it before kdump as well.
>
> Right, that is the same thing we are talking about.
>
> Thanks
> Dave
>
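The safe/unsafe policy floated in the message above could take roughly the following shape. This is a sketch of the idea only, written as plain userspace C: the enum, the table, the entry names and may_run_before_kdump() are all hypothetical, no such classification exists in the kernel, and the class assigned to each entry simply mirrors the list as proposed in the message rather than any verified property of those mechanisms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classes, following the three groups proposed above. */
enum pre_kdump_class {
	PRE_KDUMP_SAFE,		/* proposed as always working */
	PRE_KDUMP_FIRMWARE,	/* depends on BIOS/vendor firmware behaviour */
	PRE_KDUMP_UNSAFE,	/* may prevent kdump from ever starting */
};

struct panic_action {
	const char *name;	/* illustrative label, not a kernel symbol */
	enum pre_kdump_class class;
};

static const struct panic_action actions[] = {
	{ "hyperv_crash_msrs",	PRE_KDUMP_SAFE },
	{ "kvm_pvpanic",	PRE_KDUMP_SAFE },
	{ "pstore_efi_vars",	PRE_KDUMP_SAFE },
	{ "dump_in_memory",	PRE_KDUMP_SAFE },
	{ "acpi_erst",		PRE_KDUMP_FIRMWARE },
	{ "s390",		PRE_KDUMP_UNSAFE },
	{ "powerpc",		PRE_KDUMP_UNSAFE },
};

/*
 * Decide whether an action may run before the kdump kernel is started.
 * Unknown actions are refused, so the default stays conservative.
 */
static bool may_run_before_kdump(const char *name, bool trust_firmware)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(actions) / sizeof(actions[0]); i++) {
		if (strcmp(actions[i].name, name) != 0)
			continue;
		if (actions[i].class == PRE_KDUMP_SAFE)
			return true;
		if (actions[i].class == PRE_KDUMP_FIRMWARE)
			return trust_firmware;
		return false;
	}
	return false;
}
```

Refusing unknown names keeps the kdump-first behaviour as the default, so only actions that someone has explicitly classified would run ahead of kdump.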
* Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time
  2020-09-25 14:56 ` Konrad Rzeszutek Wilk
@ 2020-09-27 2:51 ` Dave Young
From: Dave Young @ 2020-09-27 2:51 UTC
To: Konrad Rzeszutek Wilk
Cc: boris.ostrovsky, Michael Kelley, Eric W. Biederman, Andrew Morton, bhe, linux-kernel, kexec, Eric DeVolder, Tianyu Lan, Wei Liu, Masami Hiramatsu, HATAYAMA Daisuke

Hi,

On 09/25/20 at 10:56am, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 25, 2020 at 11:05:58AM +0800, Dave Young wrote:
> > Hi,
> >
> > On 09/24/20 at 01:16pm, boris.ostrovsky@oracle.com wrote:
> > >
> > > On 9/24/20 12:43 PM, Michael Kelley wrote:
> > > > From: Eric W. Biederman <ebiederm@xmission.com> Sent: Thursday, September 24, 2020 9:26 AM
> > > >> Michael Kelley <mikelley@microsoft.com> writes:
> > > >>
> > > >>>>> Added Hyper-V people and people who created the param, it is below
> > > >>>>> commit, I also want to remove it if possible, let's see how people
> > > >>>>> think, but the least way should be to disable the auto setting in both systemd
> > > >>>>> and kernel:
> > > >>> Hyper-V uses a notifier to inform the host system that a Linux VM has
> > > >>> panic'ed. Informing the host is particularly important in a public cloud
> > > >>> such as Azure so that the cloud software can alert the customer, and can
> > > >>> track cloud-wide reliability statistics. Whether a kdump is taken is controlled
> > > >>> entirely by the customer and how he configures the VM, and we want
> > > >>> the host to be informed either way.
> > > >> Why?
> > > >>
> > > >> Why does the host care?
> > > >> Especially if the VM continues executing into a kdump kernel?
> > > > The host itself doesn't care. But the host is a convenient out-of-band
> > > > channel for recording that a panic has occurred and to collect basic data
> > > > about the panic. This out-of-band channel is then used to notify the end
> > > > customer that his VM has panic'ed. Sure, the customer should be running
> > > > his own monitoring software, but customers don't always do what they
> > > > should. Equally important, the out-of-band channel allows the cloud
> > > > infrastructure software to notice trends, such as that the rate of Linux
> > > > panics has increased, and that perhaps there is a cloud problem that
> > > > should be investigated.
> > >
> > >
> > > In many cases (especially in cloud environments) your dump device is remote (e.g. iSCSI) and kdump sometimes (often?) gets stuck because of connectivity issues (which could be the cause of the panic in the first place). So it is quite desirable to inform the infrastructure that the VM is on its way out without waiting for kdump to complete.
> >
> > That can probably be done in kdump kernel if it is really needed. Say
> > informing host that panic happened and a kdump kernel is running.
>
> If kdump kernel gets to that point. Sometimes (sadly) it ends up being
> misconfigured and it chokes up - and hence having multiple ways to emit
> the crash information before running kdump kernel is a life-saver.

If it is done in the kernel boot phase before pid 1 comes up, then things
should be good enough, specifically for the kdump kernel of kvm/hyper-v guests.

> >
> > But I think to set crash_kexec_post_notifiers by default is still bad.
>
> Because of the way it is run today I presume? If there was some
> safe/unsafe policy that should work right? I would think that the
> safe ones that work properly all the time are:
>
>  - HyperV CRASH_MSRs,
>  - KVM PVPANIC_[PANIC,CRASHLOAD] push button knob,
>  - pstore EFI variables
>  - Dumping in memory,
>
> And then some that depend on firmware version (aka BIOS, and vendor) are:
>  - ACPI ERST,
>
> And then the unsafe:
>  - s390, PowerPC (I don't actually know what they are but that
>    was Dave's primary motivator).

As I said, we also got reports of kdump kernel hangs with Hyper-V with
crash_kexec_post_notifiers enabled.

EFI pstore also depends on the EFI runtime, which is in firmware, and we can
not ensure it works well after a panic has happened. Ditto for other pstore
backends; we prefer not to run them before kdump. But as I said, I'm not
saying they are not useful; people can use them by their own choice.

As for the virtual machine panic events, maybe it is ok to add some other
hooks instead of the notifiers. But frankly I still feel it is better to
do it in the kdump kernel boot path, since kdump works well for virt in our
experience.

> > >
> > > >
> > > >> Further like I have mentioned everytime something like this has come up
> > > >> a call on the kexec on panic code path should be a direct call (That can
> > > >> be audited) not something hidden in a notifier call chain (which can not).
> > > >>
> > >
> > > We btw already have a direct call from panic() to kmsg_dump() which is indirectly controlled by crash_kexec_post_notifiers, and it would also be preferable to be able to call it before kdump as well.
> >
> > Right, that is the same thing we are talking about.
> >
> > Thanks
> > Dave
> >
>

Thanks
Dave
* Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time
  2020-09-25 14:56 ` Konrad Rzeszutek Wilk
@ 2020-09-29 13:36   ` Philipp Rudo
From: Philipp Rudo @ 2020-09-29 13:36 UTC
To: Konrad Rzeszutek Wilk
Cc: Dave Young, Wei Liu, Tianyu Lan, bhe, kexec, linux-kernel, Michael Kelley, HATAYAMA Daisuke, Eric W. Biederman, Masami Hiramatsu, boris.ostrovsky, Eric DeVolder, Andrew Morton

Hi,

On Fri, 25 Sep 2020 10:56:25 -0400
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> On Fri, Sep 25, 2020 at 11:05:58AM +0800, Dave Young wrote:
> > Hi,
> >
> > On 09/24/20 at 01:16pm, boris.ostrovsky@oracle.com wrote:
> > >
> > > On 9/24/20 12:43 PM, Michael Kelley wrote:
> > > > From: Eric W. Biederman <ebiederm@xmission.com> Sent: Thursday, September 24, 2020 9:26 AM
> > > >> Michael Kelley <mikelley@microsoft.com> writes:
> > > >>
> > > >>>>> Added Hyper-V people and people who created the param, it is below
> > > >>>>> commit, I also want to remove it if possible, let's see how people
> > > >>>>> think, but the least way should be to disable the auto setting in both systemd
> > > >>>>> and kernel:
> > > >>> Hyper-V uses a notifier to inform the host system that a Linux VM has
> > > >>> panic'ed. Informing the host is particularly important in a public cloud
> > > >>> such as Azure so that the cloud software can alert the customer, and can
> > > >>> track cloud-wide reliability statistics. Whether a kdump is taken is controlled
> > > >>> entirely by the customer and how he configures the VM, and we want
> > > >>> the host to be informed either way.
> > > >> Why?
> > > >>
> > > >> Why does the host care?
> > > >> Especially if the VM continues executing into a kdump kernel?
> > > > The host itself doesn't care. But the host is a convenient out-of-band
> > > > channel for recording that a panic has occurred and to collect basic data
> > > > about the panic. This out-of-band channel is then used to notify the end
> > > > customer that his VM has panic'ed. Sure, the customer should be running
> > > > his own monitoring software, but customers don't always do what they
> > > > should. Equally important, the out-of-band channel allows the cloud
> > > > infrastructure software to notice trends, such as that the rate of Linux
> > > > panics has increased, and that perhaps there is a cloud problem that
> > > > should be investigated.
> > >
> > > In many cases (especially in cloud environments) your dump device is remote (e.g. iscsi) and kdump sometimes (often?) gets stuck because of connectivity issues (which could be the cause of the panic in the first place). So it is quite desirable to inform the infrastructure that the VM is on its way out without waiting for kdump to complete.
> >
> > That can probably be done in kdump kernel if it is really needed. Say
> > informing host that panic happened and a kdump kernel is running.
>
> If kdump kernel gets to that point. Sometimes (sadly) it ends up being
> misconfigured and it chokes up - and hence having multiple ways to emit
> the crash information before running kdump kernel is a life-saver.
>
> > But I think to set crash_kexec_post_notifiers by default is still bad.
>
> Because of the way it is run today I presume? If there was some
> safe/unsafe policy that should work right? I would think that the
> safe ones that work properly all the time are:
>
>  - HyperV CRASH_MSRs,
>  - KVM PVPANIC_[PANIC,CRASHLOAD] push button knob,
>  - pstore EFI variables
>  - Dumping in memory,
>
> And then some that depend on firmware version (aka BIOS, and vendor) are:
>  - ACPI ERST,
>
> And then the unsafe:
>  - s390, PowerPC (I don't actually know what they are but that
>    was Dave's primary motivator).

That won't work on s390. Let me emphasize that the problem on s390 is not the
notifiers themselves but the fact that they are called before crash_kexec.

On s390 we have multiple dump methods besides kdump. We use a panic notifier to
trigger these dump methods from the panicking kernel. The problem is that these
dump methods are less powerful than kdump, so we only want to use them as a
fallback, i.e. only when either kdump wasn't configured or loading of the crash
kernel failed for whatever reason. That's why (plus historic reasons) our
notifier stops the machine when it is called and none of the methods is
configured, which means that the second crash_kexec is never reached.

Long story short, the problem on s390 is caused by the two hunks in
kernel/panic.c:panic from f06e5153f4ae ("kernel/panic.c: add
"crash_kexec_post_notifiers" option for kdump after panic_notifers").

Besides the problems on s390, I support Dave and think that setting
crash_kexec_post_notifiers by default is wrong. We should keep in mind that
we are in a panic situation. This means that the kernel is in a state where it
doesn't trust itself anymore. So we should keep the code that is run to the
bare minimum, as we cannot rely on it to work properly.

Thanks
Philipp

> > > >> Further like I have mentioned every time something like this has come up
> > > >> a call on the kexec on panic code path should be a direct call (That can
> > > >> be audited) not something hidden in a notifier call chain (which can not).
> > > >>
> > >
> > > We btw already have a direct call from panic() to kmsg_dump() which is indirectly controlled by crash_kexec_post_notifiers, and it would also be preferable to be able to call it before kdump as well.
> >
> > Right, that is the same thing we are talking about.
> >
> > Thanks
> > Dave
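The ordering problem Philipp describes can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the kernel's code: `model_panic`, `struct panic_config` and the outcome names are hypothetical, and the real panic() does considerably more (stopping other CPUs, kmsg_dump(), and so on). The model only captures the order in which crash_kexec and the panic notifiers are reached.

```c
#include <stdbool.h>

/* Possible results of the modelled panic path (names are illustrative). */
enum panic_outcome {
    OUTCOME_KDUMP,           /* the crash kernel was started */
    OUTCOME_MACHINE_STOPPED, /* a notifier halted the machine first */
    OUTCOME_NO_DUMP          /* no crash kernel loaded, nothing halted */
};

struct panic_config {
    bool crash_kexec_post_notifiers; /* the parameter under discussion */
    bool crash_kernel_loaded;        /* kdump configured and image loaded */
    bool notifier_stops_machine;     /* s390-like notifier behaviour */
};

/* Stand-in for crash_kexec(): it only "succeeds" if a crash kernel is loaded. */
static bool model_crash_kexec(const struct panic_config *cfg)
{
    return cfg->crash_kernel_loaded;
}

/* Stand-in for the notifier chain; true means a notifier halted the machine. */
static bool model_run_notifiers(const struct panic_config *cfg)
{
    return cfg->notifier_stops_machine;
}

enum panic_outcome model_panic(const struct panic_config *cfg)
{
    /* Default order: kdump is attempted before any notifier runs. */
    if (!cfg->crash_kexec_post_notifiers && model_crash_kexec(cfg))
        return OUTCOME_KDUMP;

    /* A notifier that stops the machine ends the path right here. */
    if (model_run_notifiers(cfg))
        return OUTCOME_MACHINE_STOPPED;

    /* With the parameter set, kdump is only attempted after the notifiers. */
    if (cfg->crash_kexec_post_notifiers && model_crash_kexec(cfg))
        return OUTCOME_KDUMP;

    return OUTCOME_NO_DUMP;
}
```

With a crash kernel loaded and a notifier that stops the machine, the model reaches kdump when the parameter is off and never reaches it when the parameter is on, which is the s390 failure described above.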
* Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time
  2020-09-29 13:36 ` Philipp Rudo
@ 2020-09-29 19:10   ` boris.ostrovsky
From: boris.ostrovsky @ 2020-09-29 19:10 UTC
To: Philipp Rudo, Konrad Rzeszutek Wilk
Cc: Dave Young, Wei Liu, Tianyu Lan, bhe, kexec, linux-kernel, Michael Kelley, HATAYAMA Daisuke, Eric W. Biederman, Masami Hiramatsu, Eric DeVolder, Andrew Morton, lennart

+Lennart

On 9/29/20 9:36 AM, Philipp Rudo wrote:
> Hi,
>
> On Fri, 25 Sep 2020 10:56:25 -0400
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>
>> On Fri, Sep 25, 2020 at 11:05:58AM +0800, Dave Young wrote:
>>> Hi,
>>>
>>> On 09/24/20 at 01:16pm, boris.ostrovsky@oracle.com wrote:
>>>> On 9/24/20 12:43 PM, Michael Kelley wrote:
>>>>> From: Eric W. Biederman <ebiederm@xmission.com> Sent: Thursday, September 24, 2020 9:26 AM
>>>>>> Michael Kelley <mikelley@microsoft.com> writes:
>>>>>>
>>>>>>>>> Added Hyper-V people and people who created the param, it is below
>>>>>>>>> commit, I also want to remove it if possible, let's see how people
>>>>>>>>> think, but the least way should be to disable the auto setting in both systemd
>>>>>>>>> and kernel:
>>>>>>> Hyper-V uses a notifier to inform the host system that a Linux VM has
>>>>>>> panic'ed. Informing the host is particularly important in a public cloud
>>>>>>> such as Azure so that the cloud software can alert the customer, and can
>>>>>>> track cloud-wide reliability statistics. Whether a kdump is taken is controlled
>>>>>>> entirely by the customer and how he configures the VM, and we want
>>>>>>> the host to be informed either way.
>>>>>> Why?
>>>>>>
>>>>>> Why does the host care?
>>>>>> Especially if the VM continues executing into a kdump kernel?
>>>>> The host itself doesn't care. But the host is a convenient out-of-band
>>>>> channel for recording that a panic has occurred and to collect basic data
>>>>> about the panic. This out-of-band channel is then used to notify the end
>>>>> customer that his VM has panic'ed. Sure, the customer should be running
>>>>> his own monitoring software, but customers don't always do what they
>>>>> should. Equally important, the out-of-band channel allows the cloud
>>>>> infrastructure software to notice trends, such as that the rate of Linux
>>>>> panics has increased, and that perhaps there is a cloud problem that
>>>>> should be investigated.
>>>>
>>>> In many cases (especially in cloud environments) your dump device is remote (e.g. iscsi) and kdump sometimes (often?) gets stuck because of connectivity issues (which could be the cause of the panic in the first place). So it is quite desirable to inform the infrastructure that the VM is on its way out without waiting for kdump to complete.
>>> That can probably be done in kdump kernel if it is really needed. Say
>>> informing host that panic happened and a kdump kernel is running.
>> If kdump kernel gets to that point. Sometimes (sadly) it ends up being
>> misconfigured and it chokes up - and hence having multiple ways to emit
>> the crash information before running kdump kernel is a life-saver.
>>
>>> But I think to set crash_kexec_post_notifiers by default is still bad.
>> Because of the way it is run today I presume? If there was some
>> safe/unsafe policy that should work right? I would think that the
>> safe ones that work properly all the time are:
>>
>>  - HyperV CRASH_MSRs,
>>  - KVM PVPANIC_[PANIC,CRASHLOAD] push button knob,
>>  - pstore EFI variables
>>  - Dumping in memory,
>>
>> And then some that depend on firmware version (aka BIOS, and vendor) are:
>>  - ACPI ERST,
>>
>> And then the unsafe:
>>  - s390, PowerPC (I don't actually know what they are but that
>>    was Dave's primary motivator).
> That won't work on s390. Let me emphasize that the problem on s390 is not the
> notifiers themselves but the fact that they are called before crash_kexec.
>
> On s390 we have multiple dump methods besides kdump. We use a panic notifier to
> trigger these dump methods from the panicking kernel. The problem is that these
> dump methods are less powerful than kdump, so we only want to use them as a
> fallback, i.e. only when either kdump wasn't configured or loading of the crash
> kernel failed for whatever reason. That's why (plus historic reasons) our
> notifier stops the machine when it is called and none of the methods is
> configured, which means that the second crash_kexec is never reached.
>
> Long story short, the problem on s390 is caused by the two hunks in
> kernel/panic.c:panic from f06e5153f4ae ("kernel/panic.c: add
> "crash_kexec_post_notifiers" option for kdump after panic_notifers").
>
> Besides the problems on s390, I support Dave and think that setting
> crash_kexec_post_notifiers by default is wrong. We should keep in mind that
> we are in a panic situation. This means that the kernel is in a state where it
> doesn't trust itself anymore. So we should keep the code that is run to the
> bare minimum, as we cannot rely on it to work properly.

There is a pending patch to revert the notifiers' default in systemd:

https://github.com/systemd/systemd/pull/16950

If this change goes through, then Dave's patch will be unnecessary.

-boris

>
> Thanks
> Philipp
>
>>>>>> Further like I have mentioned every time something like this has come up
>>>>>> a call on the kexec on panic code path should be a direct call (That can
>>>>>> be audited) not something hidden in a notifier call chain (which can not).
>>>>>>
>>>> We btw already have a direct call from panic() to kmsg_dump() which is indirectly controlled by crash_kexec_post_notifiers, and it would also be preferable to be able to call it before kdump as well.
>>> Right, that is the same thing we are talking about.
>>>
>>> Thanks
>>> Dave
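For context, Dave's patch changes the permission argument of the core_param from 0644 to 0444, so the parameter file stays readable but can no longer be written at runtime. A minimal user-space sketch of how that difference could be observed follows. The helper name `param_is_writable` is hypothetical, and the path mentioned afterwards is an assumption based on where core parameters are normally exposed in sysfs.

```c
#include <sys/stat.h>

/*
 * Report whether any write permission bit is set on the given file.
 * Returns 1 if writable by someone, 0 if read-only, -1 if stat() fails.
 * This inspects only the mode bits; it does not attempt a write.
 */
int param_is_writable(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    return (st.st_mode & (S_IWUSR | S_IWGRP | S_IWOTH)) ? 1 : 0;
}
```

Called on /sys/module/kernel/parameters/crash_kexec_post_notifiers (assumed location), this would be expected to return 1 on a kernel without the patch and 0 with it, in which case the value could only be chosen on the kernel command line.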
* Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time
  2020-09-21 20:18 ` Konrad Rzeszutek Wilk
@ 2020-09-22 10:58   ` Philipp Rudo
From: Philipp Rudo @ 2020-09-22 10:58 UTC
To: Konrad Rzeszutek Wilk
Cc: Andrew Morton, bhe, kexec, linux-kernel, Eric Biederman, Boris Ostrovsky, Eric DeVolder, Dave Young

Hi Konrad,

On Mon, 21 Sep 2020 16:18:12 -0400
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote:
> > On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <dyoung@redhat.com> wrote:
> >
> > > crash_kexec_post_notifiers enables running various panic notifier
> > > before kdump kernel booting. This increases risks of kdump failure.
> > > It is well documented in kernel-parameters.txt. We do not suggest
> > > people to enable it together with kdump unless he/she is really sure.
> > > This is also not suggested to be enabled by default when users are
> > > not aware in distributions.
> > >
> > > But unfortunately it is enabled by default in systemd, see below
> > > discussions in a systemd report, we can not convince systemd to change
> > > it:
> > > https://github.com/systemd/systemd/issues/16661
> > >
> > > Actually we have got reports about kdump kernel hangs in both s390x
> > > and powerpcle cases caused by the systemd change, also some x86 cases
> > > could also be caused by the same (although that is in Hyper-V code
> > > instead of systemd, that need to be addressed separately).
>
> Perhaps it may be better to fix the issues on s390x and PowerPC as well?

There's little s390 can fix. We use the panic_notifier_list to start other
dumpers in case kdump isn't configured or failed. This behavior was introduced
in 2006, long before crash_kexec_post_notifiers was introduced. So I suggest
that crash_kexec_post_notifiers is fixed instead.

> > >
> > > Thus to avoid the auto enablement here just disable the param writable
> > > permission in sysfs.
> > >
> >
> > Well. I don't think this is at all a desirable way of resolving a
> > disagreement with the systemd developers
> >
> > At the above github address I'm seeing "ryncsn added a commit to
> > ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
> > enable crash_kexec_post_notifiers by default". So didn't that address
> > the issue?
>
> It does in systemd, but there is a strong interest in making this on by default.

AFAIK pstore requires UEFI to work. So what's the point of enabling it on
non-UEFI systems?

Thanks
Philipp
* Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time
  2020-09-22 10:58 ` Philipp Rudo
@ 2020-09-22 14:50   ` boris.ostrovsky
From: boris.ostrovsky @ 2020-09-22 14:50 UTC
To: Philipp Rudo, Konrad Rzeszutek Wilk
Cc: Andrew Morton, bhe, kexec, linux-kernel, Eric Biederman, Eric DeVolder, Dave Young

On 9/22/20 6:58 AM, Philipp Rudo wrote:
>
> AFAIK pstore requires UEFI to work. So what's the point of enabling it on
> non-UEFI systems?

I don't think UEFI is required; ERST can specify its own backend. That, in fact, can be quite useful in virtualization scenarios (especially in cases of direct boot, when there is no OVMF).

-boris
* Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time
  2020-09-22 14:50 ` boris.ostrovsky
@ 2020-09-22 17:04   ` Guilherme G. Piccoli
From: Guilherme G. Piccoli @ 2020-09-22 17:04 UTC
To: boris.ostrovsky
Cc: Philipp Rudo, Konrad Rzeszutek Wilk, Baoquan He, kexec mailing list, LKML, Eric Biederman, Andrew Morton, Eric DeVolder, Dave Young

On Tue, Sep 22, 2020 at 11:53 AM <boris.ostrovsky@oracle.com> wrote:
>
> On 9/22/20 6:58 AM, Philipp Rudo wrote:
> >
> > AFAIK pstore requires UEFI to work. So what's the point of enabling it on
> > non-UEFI systems?
>
> I don't think UEFI is required; ERST can specify its own backend. That, in fact, can be quite useful in virtualization scenarios (especially in cases of direct boot, when there is no OVMF).
>
> -boris

There is the ramoops backend too - I was able to collect a dmesg in a cloud
provider using that!
* Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time
  2020-09-21 20:18 ` Konrad Rzeszutek Wilk
@ 2020-09-23  2:25   ` Dave Young
From: Dave Young @ 2020-09-23 2:25 UTC
To: Konrad Rzeszutek Wilk
Cc: Andrew Morton, bhe, Eric Biederman, linux-kernel, kexec, Eric DeVolder, Boris Ostrovsky

On 09/21/20 at 04:18pm, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote:
> > On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <dyoung@redhat.com> wrote:
> >
> > > crash_kexec_post_notifiers enables running various panic notifier
> > > before kdump kernel booting. This increases risks of kdump failure.
> > > It is well documented in kernel-parameters.txt. We do not suggest
> > > people to enable it together with kdump unless he/she is really sure.
> > > This is also not suggested to be enabled by default when users are
> > > not aware in distributions.
> > >
> > > But unfortunately it is enabled by default in systemd, see below
> > > discussions in a systemd report, we can not convince systemd to change
> > > it:
> > > https://github.com/systemd/systemd/issues/16661
> > >
> > > Actually we have got reports about kdump kernel hangs in both s390x
> > > and powerpcle cases caused by the systemd change, also some x86 cases
> > > could also be caused by the same (although that is in Hyper-V code
> > > instead of systemd, that need to be addressed separately).
>
> Perhaps it may be better to fix the issues on s390x and PowerPC as well?
>
> > >
> > > Thus to avoid the auto enablement here just disable the param writable
> > > permission in sysfs.
> > >
> >
> > Well. I don't think this is at all a desirable way of resolving a
> > disagreement with the systemd developers
> >
> > At the above github address I'm seeing "ryncsn added a commit to
> > ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
> > enable crash_kexec_post_notifiers by default". So didn't that address
> > the issue?
>
> It does in systemd, but there is a strong interest in making this on by default.

I understand there could be such interest, but we have to keep in mind
that any extra work done after a system crash can make kdump unreliable.

I do not object to people using pstore, but I do object to enabling the
notifiers by default.

BTW, crash notifiers are not limited to pstore; there are quite a lot of
other pieces, like the LED trigger etc.

Thanks
Dave