From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Dave Young <dyoung@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	bhe@redhat.com, linux-kernel@vger.kernel.org,
	kexec@lists.infradead.org,
	Eric DeVolder <eric.devolder@oracle.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Tianyu Lani <Tianyu.Lan@microsoft.com>,
	Michael Kelley <mikelley@microsoft.com>,
	Wei Liu <wei.liu@kernel.org>,
	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
	HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Subject: Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time
Date: Wed, 23 Sep 2020 11:48:25 -0400
Message-ID: <20200923154825.GC7635@char.us.oracle.com>
In-Reply-To: <20200923024329.GB3642@dhcp-128-65.nay.redhat.com>

On Wed, Sep 23, 2020 at 10:43:29AM +0800, Dave Young wrote:
> + more people who may care about this param 

Paarty time!!

(See below, didn't snip any comments)
> On 09/21/20 at 08:45pm, Eric W. Biederman wrote:
> > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> > 
> > > On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote:
> > >> On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <dyoung@redhat.com> wrote:
> > >> 
> > >> > crash_kexec_post_notifiers enables running various panic notifier
> > >> > before kdump kernel booting. This increases risks of kdump failure.
> > >> > It is well documented in kernel-parameters.txt. We do not suggest
> > >> > people to enable it together with kdump unless he/she is really sure.
> > >> > This is also not suggested to be enabled by default when users are
> > >> > not aware in distributions.
> > >> > 
> > >> > But unfortunately it is enabled by default in systemd, see below
> > >> > discussions in a systemd report, we can not convince systemd to change
> > >> > it:
> > >> > https://github.com/systemd/systemd/issues/16661
> > >> > 
> > >> > Actually we have got reports about kdump kernel hangs in both s390x
> > >> > and powerpcle cases caused by the systemd change,  also some x86 cases
> > >> > could also be caused by the same (although that is in Hyper-V code
> > >> > instead of systemd, that need to be addressed separately).
> > >
> > > Perhaps it may be better to fix the issus on s390x and PowerPC as well?
> > >
> > >> > 
> > >> > Thus to avoid the auto enablement here just disable the param writable
> > >> > permission in sysfs.
> > >> > 
> > >> 
> > >> Well.  I don't think this is at all a desirable way of resolving a
> > >> disagreement with the systemd developers
> > >> 
> > >> At the above github address I'm seeing "ryncsn added a commit to
> > >> ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
> > >> enable crash_kexec_post_notifiers by default".  So didn't that address
> > >> the issue?
> > >
> > > It does in systemd, but there is a strong interest in making this on
> > > by default.
> > 
> > There is also a strong interest in removing this code entirely from the
> > kernel.
> 
> Added Hyper-V people and people who created the param, it is below
> commit, I also want to remove it if possible, let's see how people
> think, but the least way should be to disable the auto setting in both systemd
> and kernel:
> 
>     commit f06e5153f4ae2e2f3b0300f0e260e40cb7fefd45
>     Author: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
>     Date:   Fri Jun 6 14:37:07 2014 -0700
>     
>         kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers
>     
>         Add a "crash_kexec_post_notifiers" boot option to run kdump after
>         running panic_notifiers and dump kmsg.  This can help rare situations
>         where kdump fails because of unstable crashed kernel or hardware failure
>         (memory corruption on critical data/code), or the 2nd kernel is already
>         broken by the 1st kernel (it's a broken behavior, but who can guarantee
>         that the "crashed" kernel works correctly?).
>     
>         Usage: add "crash_kexec_post_notifiers" to kernel boot option.
>     
>         Note that this actually increases risks of the failure of kdump.  This
>         option should be set only if you worry about the rare case of kdump
>         failure rather than increasing the chance of success.


If this is such a risky knob that it leads to bugs folks are backing away
from with disgust on their faces - then perhaps the only way to go about
this is to limit the exposure to known working situations on firmware
that we can control?

That is, enable only a subset of post notifiers, each of which determines
whether it is OK to run because the conditions are blessed?

I think that would satisfy the conditions where you have to deal with unsavory
bugs that end up on your plate - and aren't fun because there is no
way of fixing them - but at the same time allow multiple ways to save the crash?
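
To make the "blessed subset" idea a bit more concrete, here is a rough
user-space sketch in C. It is not kernel code and not an existing API:
struct post_notifier, conditions_blessed(), run_blessed_notifiers() and
the two stand-in notifiers are all hypothetical names made up for
illustration. The only point is that each notifier carries its own check
and is skipped unless that check passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor: a post-panic notifier plus its own self-check. */
struct post_notifier {
	const char *name;
	bool (*conditions_blessed)(void); /* true only on known-working setups */
	void (*run)(void);
};

/* Counters so the effect of each stand-in notifier can be observed. */
static int log_saved;
static int firmware_poked;

/* Stand-in checks; a real one would inspect platform/firmware state. */
static bool known_good_platform(void) { return true; }
static bool unknown_firmware(void) { return false; }

/* Stand-in notifier bodies. */
static void save_log(void) { log_saved++; }
static void poke_firmware(void) { firmware_poked++; }

/* Example table (assumption): one blessed notifier, one that is not. */
static const struct post_notifier example_notifiers[] = {
	{ "save_log", known_good_platform, save_log },
	{ "poke_firmware", unknown_firmware, poke_firmware },
};

/*
 * Run only the notifiers that declare themselves safe and return how
 * many ran. A notifier without a check is skipped (the cautious default).
 */
static int run_blessed_notifiers(const struct post_notifier *list, size_t n)
{
	int ran = 0;

	assert(list != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (list[i].conditions_blessed == NULL ||
		    !list[i].conditions_blessed())
			continue;
		list[i].run();
		ran++;
	}
	return ran;
}
```

Calling run_blessed_notifiers(example_notifiers, 2) on the table above
runs save_log and skips poke_firmware; treating a missing check as "not
blessed" keeps the default on the cautious side.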

Please don't take away something that is quite useful in the field. Can we
hammer out something that will remove your pain points?
> 
> > 
> > This failure is a case in point.
> > 
> > I think I am at my I told you so point.  This is what all of the testing
> > over all the years has said.  Leaving functionality to the peculiarities
> > of firmware when you don't have to, and can actually control what is
> > going on doesn't work.
> > 
> > Eric
> > 
> > 
> 
> Thanks
> Dave
> 
