linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Query about kdump_msg hook into crash_kexec()
@ 2011-01-31 22:59 Vivek Goyal
  2011-02-01  7:19 ` Américo Wang
  2011-02-03  0:55 ` Query about kdump_msg hook into crash_kexec() KOSAKI Motohiro
  0 siblings, 2 replies; 41+ messages in thread
From: Vivek Goyal @ 2011-01-31 22:59 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: Eric W. Biederman, linux kernel mailing list, Jarod Wilson

Hi,

I noticed following commit which hooks into crash_kexec() for calling
kmsg_dump().

I think it is not a very good idea to pass control to modules after
crash_kexec() has been called. Because modules can try to take locks
or try to do some other operations which we really should not be doing
now and fail kdump also. The whole design of kdump is built on the
fact that in crashing kernel we do minimal thing and try to make 
transition to second kernel robust. Now with this hook, kmsg dumper
breaks that assumption.

Anyway, if an image is loaded and we have setup to capture dump also
why do we need to save kmsg with the help of an helper. I am assuming
this is more of a debugging aid if we have no other way to capture the
kernel log buffer. So if somebody has setup kdump to capture the
vmcore, why to call another handler which tries to save part of the
vmcore (kmsg) separately.

Thanks
Vivek

commit 0f4bd46ec252887f44f1f065b41867cac8f70dfb
Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date:   Tue Dec 22 03:15:43 2009 +0000

    kmsg_dump: Dump on crash_kexec as well
    
    crash_kexec gets called before kmsg_dump(KMSG_DUMP_OOPS) if
    panic_on_oops is set, so the kernel log buffer is not stored
    for this case.
    
    This patch adds a KMSG_DUMP_KEXEC dump type which gets called
    when crash_kexec() is invoked. To avoid getting double dumps,
    the old KMSG_DUMP_PANIC is moved below crash_kexec(). The
    mtdoops driver is modified to handle KMSG_DUMP_KEXEC in the
    same way as a panic.
    
    Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Acked-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
    Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-01-31 22:59 Query about kdump_msg hook into crash_kexec() Vivek Goyal
@ 2011-02-01  7:19 ` Américo Wang
  2011-02-01  7:33   ` Eric W. Biederman
  2011-02-03  0:55 ` Query about kdump_msg hook into crash_kexec() KOSAKI Motohiro
  1 sibling, 1 reply; 41+ messages in thread
From: Américo Wang @ 2011-02-01  7:19 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: kosaki.motohiro, Eric W. Biederman, linux kernel mailing list,
	Jarod Wilson

On Mon, Jan 31, 2011 at 05:59:39PM -0500, Vivek Goyal wrote:
>
>Anyway, if an image is loaded and we have setup to capture dump also
>why do we need to save kmsg with the help of an helper. I am assuming
>this is more of a debugging aid if we have no other way to capture the
>kernel log buffer. So if somebody has setup kdump to capture the
>vmcore, why to call another handler which tries to save part of the
>vmcore (kmsg) separately.
>

Well, kmsg dumper is not only for panic/oops, it can also save kernel
messages when the system is reboot/halt.

Yeah, it is true that vmcore contains log_buf, but I think the users
of ramoops/mtdoops are mainly those who don't have kdump configured
in their kernels, they are cheaper than kdump, but less powerful.

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-02-01  7:19 ` Américo Wang
@ 2011-02-01  7:33   ` Eric W. Biederman
  2011-02-01  7:38     ` Américo Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Eric W. Biederman @ 2011-02-01  7:33 UTC (permalink / raw)
  To: Américo Wang
  Cc: Vivek Goyal, kosaki.motohiro, linux kernel mailing list, Jarod Wilson

Américo Wang <xiyou.wangcong@gmail.com> writes:

> On Mon, Jan 31, 2011 at 05:59:39PM -0500, Vivek Goyal wrote:
>>
>>Anyway, if an image is loaded and we have setup to capture dump also
>>why do we need to save kmsg with the help of an helper. I am assuming
>>this is more of a debugging aid if we have no other way to capture the
>>kernel log buffer. So if somebody has setup kdump to capture the
>>vmcore, why to call another handler which tries to save part of the
>>vmcore (kmsg) separately.
>>
>
> Well, kmsg dumper is not only for panic/oops, it can also save kernel
> messages when the system is reboot/halt.
>
> Yeah, it is true that vmcore contains log_buf, but I think the users
> of ramoops/mtdoops are mainly those who don't have kdump configured
> in their kernels, they are cheaper than kdump, but less powerful

The issue is the inane call inside crash_kexec.

It requires both a kexec kernel to be loaded and it requires you to be
crashing.  Given that when I audited the kmsg_dump handlers they really
weren't safe in a crash dump scenario we should just remove it.

Eric

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-02-01  7:33   ` Eric W. Biederman
@ 2011-02-01  7:38     ` Américo Wang
  2011-02-01  8:13       ` [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec()) Américo Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Américo Wang @ 2011-02-01  7:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Américo Wang, Vivek Goyal, kosaki.motohiro,
	linux kernel mailing list, Jarod Wilson

On Mon, Jan 31, 2011 at 11:33:15PM -0800, Eric W. Biederman wrote:
>Américo Wang <xiyou.wangcong@gmail.com> writes:
>
>> On Mon, Jan 31, 2011 at 05:59:39PM -0500, Vivek Goyal wrote:
>>>
>>>Anyway, if an image is loaded and we have setup to capture dump also
>>>why do we need to save kmsg with the help of an helper. I am assuming
>>>this is more of a debugging aid if we have no other way to capture the
>>>kernel log buffer. So if somebody has setup kdump to capture the
>>>vmcore, why to call another handler which tries to save part of the
>>>vmcore (kmsg) separately.
>>>
>>
>> Well, kmsg dumper is not only for panic/oops, it can also save kernel
>> messages when the system is reboot/halt.
>>
>> Yeah, it is true that vmcore contains log_buf, but I think the users
>> of ramoops/mtdoops are mainly those who don't have kdump configured
>> in their kernels, they are cheaper than kdump, but less powerful
>
>The issue is the inane call inside crash_kexec.
>
>It requires both a kexec kernel to be loaded and it requires you to be
>crashing.  Given that when I audited the kmsg_dump handlers they really
>weren't safe in a crash dump scenario we should just remove it.
>

Probably, I think we need to get rid of KMSG_DUMP_KEXEC.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-02-01  7:38     ` Américo Wang
@ 2011-02-01  8:13       ` Américo Wang
  2011-02-01 15:28         ` Vivek Goyal
  2011-02-03  0:59         ` KOSAKI Motohiro
  0 siblings, 2 replies; 41+ messages in thread
From: Américo Wang @ 2011-02-01  8:13 UTC (permalink / raw)
  To: Américo Wang
  Cc: Eric W. Biederman, Vivek Goyal, kosaki.motohiro,
	linux kernel mailing list, Jarod Wilson, akpm

On Tue, Feb 01, 2011 at 03:38:53PM +0800, Américo Wang wrote:
>On Mon, Jan 31, 2011 at 11:33:15PM -0800, Eric W. Biederman wrote:
>>The issue is the inane call inside crash_kexec.
>>
>>It requires both a kexec kernel to be loaded and it requires you to be
>>crashing.  Given that when I audited the kmsg_dump handlers they really
>>weren't safe in a crash dump scenario we should just remove it.
>>
>
>Probably, I think we need to get rid of KMSG_DUMP_KEXEC.
>

Here we go.

--------->

KMSG_DUMP_KEXEC is useless because we already save kernel messages
inside /proc/vmcore, and it is unsafe to allow modules to do
other stuffs in a crash dump scenario.

Reported-by: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>


---
diff --git a/drivers/char/ramoops.c b/drivers/char/ramoops.c
index 1a9f5f6..8653ba4 100644
--- a/drivers/char/ramoops.c
+++ b/drivers/char/ramoops.c
@@ -69,8 +69,7 @@ static void ramoops_do_dump(struct kmsg_dumper *dumper,
 	struct timeval timestamp;
 
 	if (reason != KMSG_DUMP_OOPS &&
-	    reason != KMSG_DUMP_PANIC &&
-	    reason != KMSG_DUMP_KEXEC)
+	    reason != KMSG_DUMP_PANIC)
 		return;
 
 	/* Only dump oopses if dump_oops is set */
diff --git a/drivers/mtd/mtdoops.c b/drivers/mtd/mtdoops.c
index e3e40f4..56eac4e 100644
--- a/drivers/mtd/mtdoops.c
+++ b/drivers/mtd/mtdoops.c
@@ -308,8 +308,7 @@ static void mtdoops_do_dump(struct kmsg_dumper *dumper,
 	char *dst;
 
 	if (reason != KMSG_DUMP_OOPS &&
-	    reason != KMSG_DUMP_PANIC &&
-	    reason != KMSG_DUMP_KEXEC)
+	    reason != KMSG_DUMP_PANIC)
 		return;
 
 	/* Only dump oopses if dump_oops is set */
diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index 2a0d7d6..05fa2a9 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -17,7 +17,6 @@
 enum kmsg_dump_reason {
 	KMSG_DUMP_OOPS,
 	KMSG_DUMP_PANIC,
-	KMSG_DUMP_KEXEC,
 	KMSG_DUMP_RESTART,
 	KMSG_DUMP_HALT,
 	KMSG_DUMP_POWEROFF,
diff --git a/kernel/kexec.c b/kernel/kexec.c
index ec19b92..3ac0218 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -32,7 +32,6 @@
 #include <linux/console.h>
 #include <linux/vmalloc.h>
 #include <linux/swap.h>
-#include <linux/kmsg_dump.h>
 
 #include <asm/page.h>
 #include <asm/uaccess.h>
@@ -1078,8 +1077,6 @@ void crash_kexec(struct pt_regs *regs)
 		if (kexec_crash_image) {
 			struct pt_regs fixed_regs;
 
-			kmsg_dump(KMSG_DUMP_KEXEC);
-
 			crash_setup_regs(&fixed_regs, regs);
 			crash_save_vmcoreinfo();
 			machine_crash_shutdown(&fixed_regs);

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-02-01  8:13       ` [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec()) Américo Wang
@ 2011-02-01 15:28         ` Vivek Goyal
  2011-02-01 16:06           ` Jarod Wilson
  2011-02-03  0:59         ` KOSAKI Motohiro
  1 sibling, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2011-02-01 15:28 UTC (permalink / raw)
  To: Américo Wang
  Cc: Eric W. Biederman, kosaki.motohiro, linux kernel mailing list,
	Jarod Wilson, akpm, Simon Kagstrom, David Woodhouse

On Tue, Feb 01, 2011 at 04:13:13PM +0800, Américo Wang wrote:
> On Tue, Feb 01, 2011 at 03:38:53PM +0800, Américo Wang wrote:
> >On Mon, Jan 31, 2011 at 11:33:15PM -0800, Eric W. Biederman wrote:
> >>The issue is the inane call inside crash_kexec.
> >>
> >>It requires both a kexec kernel to be loaded and it requires you to be
> >>crashing.  Given that when I audited the kmsg_dump handlers they really
> >>weren't safe in a crash dump scenario we should just remove it.
> >>
> >
> >Probably, I think we need to get rid of KMSG_DUMP_KEXEC.
> >
> 
> Here we go.
> 
> --------->
> 
> KMSG_DUMP_KEXEC is useless because we already save kernel messages
> inside /proc/vmcore, and it is unsafe to allow modules to do
> other stuffs in a crash dump scenario.
> 

I think this is right thing to do. First of all kmsg_dump() call inside
crash_kexec() does not make any sense. Secondly, if kdump kernel is
loaded, we anyway are going to capture log buffers in vmcore and there
should not be any need to try to save messages twice.

Now one can argue that what if kdump does not capture the dump. I think
then right solution (though painful one) is to try to make kdump as
reliable as possible instead of trying to come up with backup mechanisms
to save logs in case kdump fails.

Acked-by: Vivek Goyal <vgoyal@redhat.com>

Thanks
Vivek

> Reported-by: Vivek Goyal <vgoyal@redhat.com>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
> 
> 
> ---
> diff --git a/drivers/char/ramoops.c b/drivers/char/ramoops.c
> index 1a9f5f6..8653ba4 100644
> --- a/drivers/char/ramoops.c
> +++ b/drivers/char/ramoops.c
> @@ -69,8 +69,7 @@ static void ramoops_do_dump(struct kmsg_dumper *dumper,
>  	struct timeval timestamp;
>  
>  	if (reason != KMSG_DUMP_OOPS &&
> -	    reason != KMSG_DUMP_PANIC &&
> -	    reason != KMSG_DUMP_KEXEC)
> +	    reason != KMSG_DUMP_PANIC)
>  		return;
>  
>  	/* Only dump oopses if dump_oops is set */
> diff --git a/drivers/mtd/mtdoops.c b/drivers/mtd/mtdoops.c
> index e3e40f4..56eac4e 100644
> --- a/drivers/mtd/mtdoops.c
> +++ b/drivers/mtd/mtdoops.c
> @@ -308,8 +308,7 @@ static void mtdoops_do_dump(struct kmsg_dumper *dumper,
>  	char *dst;
>  
>  	if (reason != KMSG_DUMP_OOPS &&
> -	    reason != KMSG_DUMP_PANIC &&
> -	    reason != KMSG_DUMP_KEXEC)
> +	    reason != KMSG_DUMP_PANIC)
>  		return;
>  
>  	/* Only dump oopses if dump_oops is set */
> diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
> index 2a0d7d6..05fa2a9 100644
> --- a/include/linux/kmsg_dump.h
> +++ b/include/linux/kmsg_dump.h
> @@ -17,7 +17,6 @@
>  enum kmsg_dump_reason {
>  	KMSG_DUMP_OOPS,
>  	KMSG_DUMP_PANIC,
> -	KMSG_DUMP_KEXEC,
>  	KMSG_DUMP_RESTART,
>  	KMSG_DUMP_HALT,
>  	KMSG_DUMP_POWEROFF,
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index ec19b92..3ac0218 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -32,7 +32,6 @@
>  #include <linux/console.h>
>  #include <linux/vmalloc.h>
>  #include <linux/swap.h>
> -#include <linux/kmsg_dump.h>
>  
>  #include <asm/page.h>
>  #include <asm/uaccess.h>
> @@ -1078,8 +1077,6 @@ void crash_kexec(struct pt_regs *regs)
>  		if (kexec_crash_image) {
>  			struct pt_regs fixed_regs;
>  
> -			kmsg_dump(KMSG_DUMP_KEXEC);
> -
>  			crash_setup_regs(&fixed_regs, regs);
>  			crash_save_vmcoreinfo();
>  			machine_crash_shutdown(&fixed_regs);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-02-01 15:28         ` Vivek Goyal
@ 2011-02-01 16:06           ` Jarod Wilson
  0 siblings, 0 replies; 41+ messages in thread
From: Jarod Wilson @ 2011-02-01 16:06 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Américo Wang, Eric W. Biederman, kosaki.motohiro,
	linux kernel mailing list, akpm, Simon Kagstrom, David Woodhouse

On Tue, Feb 01, 2011 at 10:28:45AM -0500, Vivek Goyal wrote:
> On Tue, Feb 01, 2011 at 04:13:13PM +0800, Américo Wang wrote:
> > On Tue, Feb 01, 2011 at 03:38:53PM +0800, Américo Wang wrote:
> > >On Mon, Jan 31, 2011 at 11:33:15PM -0800, Eric W. Biederman wrote:
> > >>The issue is the inane call inside crash_kexec.
> > >>
> > >>It requires both a kexec kernel to be loaded and it requires you to be
> > >>crashing.  Given that when I audited the kmsg_dump handlers they really
> > >>weren't safe in a crash dump scenario we should just remove it.
> > >>
> > >
> > >Probably, I think we need to get rid of KMSG_DUMP_KEXEC.
> > >
> > 
> > Here we go.
> > 
> > --------->
> > 
> > KMSG_DUMP_KEXEC is useless because we already save kernel messages
> > inside /proc/vmcore, and it is unsafe to allow modules to do
> > other stuffs in a crash dump scenario.
> > 
> 
> I think this is right thing to do. First of all kmsg_dump() call inside
> crash_kexec() does not make any sense. Secondly, if kdump kernel is
> loaded, we anyway are going to capture log buffers in vmcore and there
> should not be any need to try to save messages twice.
> 
> Now one can argue that what if kdump does not capture the dump. I think
> then right solution (though painful one) is to try to make kdump as
> reliable as possible instead of trying to come up with backup mechanisms
> to save logs in case kdump fails.

I'm of the same mind, kmsg_dump is fine by itself, but wedging it in to
duplicate something kdump already does on the grounds that kdump might
fail is Doing It Wrong. If one has concerns about kdump failing, put
effort into solidifying kdump.

Acked-by: Jarod Wilson <jarod@redhat.com>

-- 
Jarod Wilson
jarod@redhat.com


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-01-31 22:59 Query about kdump_msg hook into crash_kexec() Vivek Goyal
  2011-02-01  7:19 ` Américo Wang
@ 2011-02-03  0:55 ` KOSAKI Motohiro
  2011-02-03  2:05   ` Vivek Goyal
  1 sibling, 1 reply; 41+ messages in thread
From: KOSAKI Motohiro @ 2011-02-03  0:55 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: kosaki.motohiro, Eric W. Biederman, linux kernel mailing list,
	Jarod Wilson

Hi

> Hi,
> 
> I noticed following commit which hooks into crash_kexec() for calling
> kmsg_dump().
> 
> I think it is not a very good idea to pass control to modules after
> crash_kexec() has been called. Because modules can try to take locks
> or try to do some other operations which we really should not be doing
> now and fail kdump also. The whole design of kdump is built on the
> fact that in crashing kernel we do minimal thing and try to make 
> transition to second kernel robust. Now with this hook, kmsg dumper
> breaks that assumption.

I guess you talked about some foolish can shoot their own foot. if so,
Yes. Any kernel module can make kernel panic or more disaster result.


> Anyway, if an image is loaded and we have setup to capture dump also
> why do we need to save kmsg with the help of an helper. I am assuming
> this is more of a debugging aid if we have no other way to capture the
> kernel log buffer. So if somebody has setup kdump to capture the
> vmcore, why to call another handler which tries to save part of the
> vmcore (kmsg) separately.

No.

kmsg_dump() is desingned for embedded. and some embedded devices uses
kexec for non dumping purpose. (Have you seen your embedded devices 
show "Now storing dump image.." message?)

Anyway, you can feel free to avoid to use ksmg_dump().





^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-02-01  8:13       ` [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec()) Américo Wang
  2011-02-01 15:28         ` Vivek Goyal
@ 2011-02-03  0:59         ` KOSAKI Motohiro
  2011-02-03  2:07           ` Vivek Goyal
  1 sibling, 1 reply; 41+ messages in thread
From: KOSAKI Motohiro @ 2011-02-03  0:59 UTC (permalink / raw)
  To: Americo Wang
  Cc: kosaki.motohiro, Eric W. Biederman, Vivek Goyal,
	linux kernel mailing list, Jarod Wilson, akpm

> On Tue, Feb 01, 2011 at 03:38:53PM +0800, Américo Wang wrote:
> >On Mon, Jan 31, 2011 at 11:33:15PM -0800, Eric W. Biederman wrote:
> >>The issue is the inane call inside crash_kexec.
> >>
> >>It requires both a kexec kernel to be loaded and it requires you to be
> >>crashing.  Given that when I audited the kmsg_dump handlers they really
> >>weren't safe in a crash dump scenario we should just remove it.
> >>
> >
> >Probably, I think we need to get rid of KMSG_DUMP_KEXEC.
> >
> 
> Here we go.
> 
> --------->
> 
> KMSG_DUMP_KEXEC is useless because we already save kernel messages
> inside /proc/vmcore, and it is unsafe to allow modules to do
> other stuffs in a crash dump scenario.
> 
> Reported-by: Vivek Goyal <vgoyal@redhat.com>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>

I wrote why this is no good idea by another mail. Please see it.
Anyway you have a right to don't use this feature.

Nack.



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-02-03  0:55 ` Query about kdump_msg hook into crash_kexec() KOSAKI Motohiro
@ 2011-02-03  2:05   ` Vivek Goyal
  2011-02-03  4:52     ` KOSAKI Motohiro
  2011-02-03 18:38     ` Seiji Aguchi
  0 siblings, 2 replies; 41+ messages in thread
From: Vivek Goyal @ 2011-02-03  2:05 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Eric W. Biederman, linux kernel mailing list, Jarod Wilson

On Thu, Feb 03, 2011 at 09:55:41AM +0900, KOSAKI Motohiro wrote:
> Hi
> 
> > Hi,
> > 
> > I noticed following commit which hooks into crash_kexec() for calling
> > kmsg_dump().
> > 
> > I think it is not a very good idea to pass control to modules after
> > crash_kexec() has been called. Because modules can try to take locks
> > or try to do some other operations which we really should not be doing
> > now and fail kdump also. The whole design of kdump is built on the
> > fact that in crashing kernel we do minimal thing and try to make 
> > transition to second kernel robust. Now with this hook, kmsg dumper
> > breaks that assumption.
> 
> I guess you talked about some foolish can shoot their own foot. if so,
> Yes. Any kernel module can make kernel panic or more disaster result.

Yes, the difference is that once a fool shoots his foot, kernel tries
to take a meaningful action to figure out what went wrong. Like displayig
an oops backtrace or like dumping a core (if kdump is configured) so
that one can figure out who was the fool and what did who do.

Now think give the control to two fools. First fool shoots his foot
and then kernel transfers the control to another fool which completely
screws up the situation and one can not save the core.

> 
> 
> > Anyway, if an image is loaded and we have setup to capture dump also
> > why do we need to save kmsg with the help of an helper. I am assuming
> > this is more of a debugging aid if we have no other way to capture the
> > kernel log buffer. So if somebody has setup kdump to capture the
> > vmcore, why to call another handler which tries to save part of the
> > vmcore (kmsg) separately.
> 
> No.
> 
> kmsg_dump() is desingned for embedded.

Great. And I like the idea of trying to save some useful information 
to non volatile RAM or flash or something like that.

> kexec for non dumping purpose. (Have you seen your embedded devices 
> show "Now storing dump image.." message?)

No I have not seen. Can you explain a bit more that apart from kernel
dump, what are the other purposes of kdump. 

> 
> Anyway, you can feel free to avoid to use ksmg_dump().

Yes, that is one more way but this information is not even exported to
user space to figure out if there are any registerd users of kmsg_dump.

Seconly there are two more important things.

- Why do you need a notification from inside crash_kexec(). IOW, what
  is the usage of KMSG_DUMP_KEXEC.

- One can anyway call kmsg_dump() outside crash_kexec() before it so
  that kmsg_dump notification will go out before kdump gets the control.
  What I am arguing here is that it is not necessarily a very good idea
  because external modules can try to do any amount of unsafe actions
  once we export the hook.

  Doing this is still fine if kdump is not configured as anyway syste would
  have rebooted. But if kdump is configured, then we are just reducing
  the reliability of the operation by passing the control in the hands
  of unaudited code and trusting it when kernel data structures are
  corrupt.

  So to me, sending out kmsg_dump notifications are perfectly fine when
  kdump is not configured. But if it is configured, then it probably is
  not a good idea. Anyway, if you have configured the system to capture
  the full dump, why do you also need kmsg_dump. And if you are happy
  with kmsg_dump() then you don't need kdump. So these both seem to be
  mutually exclusive anyway.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-02-03  0:59         ` KOSAKI Motohiro
@ 2011-02-03  2:07           ` Vivek Goyal
  2011-02-03  4:53             ` KOSAKI Motohiro
  0 siblings, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2011-02-03  2:07 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Americo Wang, Eric W. Biederman, linux kernel mailing list,
	Jarod Wilson, akpm

On Thu, Feb 03, 2011 at 09:59:04AM +0900, KOSAKI Motohiro wrote:
> > On Tue, Feb 01, 2011 at 03:38:53PM +0800, Américo Wang wrote:
> > >On Mon, Jan 31, 2011 at 11:33:15PM -0800, Eric W. Biederman wrote:
> > >>The issue is the inane call inside crash_kexec.
> > >>
> > >>It requires both a kexec kernel to be loaded and it requires you to be
> > >>crashing.  Given that when I audited the kmsg_dump handlers they really
> > >>weren't safe in a crash dump scenario we should just remove it.
> > >>
> > >
> > >Probably, I think we need to get rid of KMSG_DUMP_KEXEC.
> > >
> > 
> > Here we go.
> > 
> > --------->
> > 
> > KMSG_DUMP_KEXEC is useless because we already save kernel messages
> > inside /proc/vmcore, and it is unsafe to allow modules to do
> > other stuffs in a crash dump scenario.
> > 
> > Reported-by: Vivek Goyal <vgoyal@redhat.com>
> > Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> > Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
> 
> I wrote why this is no good idea by another mail. Please see it.
> Anyway you have a right to don't use this feature.
> 

But you have not explained that why do you need to hook into crash_kexec()
and you have also not explained why do you need to send out kdump_msg()
notification if kdump is configured.

Some detailed explanation here would help.

Thanks
Vivek

> Nack.
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-02-03  2:05   ` Vivek Goyal
@ 2011-02-03  4:52     ` KOSAKI Motohiro
  2011-02-03  5:20       ` KOSAKI Motohiro
  2011-02-04 14:58       ` Vivek Goyal
  2011-02-03 18:38     ` Seiji Aguchi
  1 sibling, 2 replies; 41+ messages in thread
From: KOSAKI Motohiro @ 2011-02-03  4:52 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: kosaki.motohiro, Eric W. Biederman, linux kernel mailing list,
	Jarod Wilson

> On Thu, Feb 03, 2011 at 09:55:41AM +0900, KOSAKI Motohiro wrote:
> > Hi
> > 
> > > Hi,
> > > 
> > > I noticed following commit which hooks into crash_kexec() for calling
> > > kmsg_dump().
> > > 
> > > I think it is not a very good idea to pass control to modules after
> > > crash_kexec() has been called. Because modules can try to take locks
> > > or try to do some other operations which we really should not be doing
> > > now and fail kdump also. The whole design of kdump is built on the
> > > fact that in crashing kernel we do minimal thing and try to make 
> > > transition to second kernel robust. Now with this hook, kmsg dumper
> > > breaks that assumption.
> > 
> > I guess you talked about some foolish can shoot their own foot. if so,
> > Yes. Any kernel module can make kernel panic or more disaster result.
> 
> Yes, the difference is that once a fool shoots his foot, kernel tries
> to take a meaningful action to figure out what went wrong. Like displayig
> an oops backtrace or like dumping a core (if kdump is configured) so
> that one can figure out who was the fool and what did who do.
> 
> Now think give the control to two fools. First fool shoots his foot
> and then kernel transfers the control to another fool which completely
> screws up the situation and one can not save the core.

If you really want to full control, you should disable CONFIG_MODULES,
kprobes, ftrace and perf. We have a lot of kernel capturing way already.
So, only one feature diabling don't solve anything. Alternatively, 
I can imagine to improve security modules and audit loaded kernel 
module (and other injection code) more carefully.

So, I'm curious why do you hate so much a part of them and not all of them.



> > > Anyway, if an image is loaded and we have setup to capture dump also
> > > why do we need to save kmsg with the help of an helper. I am assuming
> > > this is more of a debugging aid if we have no other way to capture the
> > > kernel log buffer. So if somebody has setup kdump to capture the
> > > vmcore, why to call another handler which tries to save part of the
> > > vmcore (kmsg) separately.
> > 
> > No.
> > 
> > kmsg_dump() is desingned for embedded.
> 
> Great. And I like the idea of trying to save some useful information 
> to non volatile RAM or flash or something like that.

Yeah, thanks.

> 
> > kexec for non dumping purpose. (Have you seen your embedded devices 
> > show "Now storing dump image.." message?)
> 
> No I have not seen. Can you explain a bit more that apart from kernel
> dump, what are the other purposes of kdump. 
> 
> > 
> > Anyway, you can feel free to avoid to use ksmg_dump().
> 
> Yes, that is one more way but this information is not even exported to
> user space to figure out if there are any registerd users of kmsg_dump.
> 
> Seconly there are two more important things.
> 
> - Why do you need a notification from inside crash_kexec(). IOW, what
>   is the usage of KMSG_DUMP_KEXEC.

AFAIK, kexec is used sneak rebooting way when the system face unexpected
scenario on some devices. (Some embedded is running very long time, then 
it can't avoid memory bit corruption. all of reset is a last resort. 
and a vendor gather logs at periodically checkback).

The main purpose of to introduce KMSG_DUMP_KEXEC is to be separate it
from KMSG_DUMP_PANIC. At kmsg_dump() initial patch, KMSG_DUMP_PANIC 
is always called both kdump is configured or not. But it's no good idea
the same log is to be appeared when both kexec was successed and failured.
Moreover someone don't want any log at kexec phase. They only want logs
when real panic (ie kexec failure) route. Then, I've separated it to two.
Two separated argument can solve above both requreiment.


> - One can anyway call kmsg_dump() outside crash_kexec() before it so
>   that kmsg_dump notification will go out before kdump gets the control.
>   What I am arguing here is that it is not necessarily a very good idea
>   because external modules can try to do any amount of unsafe actions
>   once we export the hook.

I wrote why I don't think I added new risk. (shortly, It can be a lot of
another way)
Can you please tell me your view of my point? I'm afraid I haven't 
understand your worry. So, I hope to understand it before making 
alternative fixing patch.

>   Doing this is still fine if kdump is not configured as anyway syste would
>   have rebooted. But if kdump is configured, then we are just reducing
>   the reliability of the operation by passing the control in the hands
>   of unaudited code and trusting it when kernel data structures are
>   corrupt.

At minimum, I'm fully agree we need reliable kdump. I only put a doubtness
this removing is a real solution or not.

>   So to me, sending out kmsg_dump notifications are perfectly fine when
>   kdump is not configured. But if it is configured, then it probably is
>   not a good idea. Anyway, if you have configured the system to capture
>   the full dump, why do you also need kmsg_dump. And if you are happy
>   with kmsg_dump() then you don't need kdump. So these both seem to be
>   mutually exclusive anyway.

Honestly, I haven't heared anyone are using both at the same time. But
I can guess some reason. 1) If the system is very big box, kdump is
really slooooooow operation. example Some stock exchange system have half
terabytes memory and it mean dump delivery need to hald days at worse. But
market should open just 9:00 at next day. So, summry information (eg log and
trace information) spoiling is important thing. 2) Two sequence crash (ie
crash kdump reboot-start next-crash-before-finish-reboot) can override former
dump image. Usually admin _guess_ the reason of two are same and report boss so.
But unfortunatelly customers at high end area often refuse a _guess_ report.

Or, it's for business competition reason. As far as I heared, IBM and HP 
UNI*X system can save the logs both dump and special flash device. 

PS: FWIW, Hitach folks have usage idea for their enterprise purpose, but 
    unfortunately I don't know its detail. I hope anyone tell us it.




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-02-03  2:07           ` Vivek Goyal
@ 2011-02-03  4:53             ` KOSAKI Motohiro
  2011-05-26 20:10               ` Andrew Morton
  0 siblings, 1 reply; 41+ messages in thread
From: KOSAKI Motohiro @ 2011-02-03  4:53 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: kosaki.motohiro, Americo Wang, Eric W. Biederman,
	linux kernel mailing list, Jarod Wilson, akpm

> > I wrote why this is no good idea by another mail. Please see it.
> > Anyway you have a right to don't use this feature.
> > 
> 
> But you have not explained that why do you need to hook into crash_kexec()
> and you have also not explained why do you need to send out kdump_msg()
> notification if kdump is configured.
> 
> Some detailed explanation here would help.

Hi,

I send it you now :)




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-02-03  4:52     ` KOSAKI Motohiro
@ 2011-02-03  5:20       ` KOSAKI Motohiro
  2011-02-04 15:00         ` Vivek Goyal
  2011-02-04 14:58       ` Vivek Goyal
  1 sibling, 1 reply; 41+ messages in thread
From: KOSAKI Motohiro @ 2011-02-03  5:20 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: kosaki.motohiro, Vivek Goyal, Eric W. Biederman,
	linux kernel mailing list, Jarod Wilson

> AFAIK, kexec is used sneak rebooting way when the system face unexpected
> scenario on some devices. (Some embedded is running very long time, then 
> it can't avoid memory bit corruption. all of reset is a last resort. 
> and a vendor gather logs at periodically checkback).
> 
> The main purpose of to introduce KMSG_DUMP_KEXEC is to be separate it
> from KMSG_DUMP_PANIC. At kmsg_dump() initial patch, KMSG_DUMP_PANIC 
> is always called both kdump is configured or not. But it's no good idea
> the same log is to be appeared when both kexec was successed and failured.
> Moreover someone don't want any log at kexec phase. They only want logs
> when real panic (ie kexec failure) route. Then, I've separated it to two.
> Two separated argument can solve above both requreiment.

A bit additional explanation, An original patch have kmsg_dump(KMSG_DUMP_PANIC)
callsite at following point. I didn't think it makes either embedded or 
kdump folks happiness. Thus I moved it after crash_kexec().


---------------------------------------------------------------------
@@ -74,6 +75,7 @@ NORET_TYPE void panic(const char * fmt, ...)
        dump_stack();
 #endif

+       kmsg_dump(KMSG_DUMP_PANIC);
        /*
         * If we have crashed and we have a crash kernel loaded let it handle
         * everything else.
         * Do we want to call this before we try to display a message?
         */
        crash_kexec(NULL);
---------------------------------------------------------------------




^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: Query about kdump_msg hook into crash_kexec()
  2011-02-03  2:05   ` Vivek Goyal
  2011-02-03  4:52     ` KOSAKI Motohiro
@ 2011-02-03 18:38     ` Seiji Aguchi
  2011-02-03 21:13       ` Eric W. Biederman
  1 sibling, 1 reply; 41+ messages in thread
From: Seiji Aguchi @ 2011-02-03 18:38 UTC (permalink / raw)
  To: Vivek Goyal, KOSAKI Motohiro
  Cc: Eric W. Biederman, linux kernel mailing list, Jarod Wilson

Hi,

>PS: FWIW, Hitach folks have usage idea for their enterprise purpose, but 
>    unfortunately I don't know its detail. I hope anyone tell us it.

I explain the usage of kmsg_dump(KMSG_DUMP_KEXEC) in enterprise area.

[Background]
In our support service experience, we always need to detect root cause 
of OS panic.
So, customers in enterprise area never forgive us if kdump fails and 
we can't detect the root cause of panic due to lack of materials for 
investigation.

>- Why do you need a notification from inside crash_kexec(). IOW, what
>  is the usage of KMSG_DUMP_KEXEC.


The usage of kdump(KMSG_DUMP_KEXEC) in enterprise area is getting 
useful information for investigating kernel crash in case kdump
kernel doesn't boot.

Kdump kernel may not start booting because there is a sha256 checksum
verified over the kdump kernel before it starts booting.
This means kdump kernel may fail even if there is no bug in kdump and
we can't get any information for detecting root cause of kernel crash.

As I mentioned in [Background], We must avoid lack of materials for 
investigation.
So, kdump(KMSG_DUMP_KEXEC) is very important feature in enterprise
area.

>- One can anyway call kmsg_dump() outside crash_kexec() before it so
>  that kmsg_dump notification will go out before kdump gets the control.
>  What I am arguing here is that it is not necessarily a very good idea
>  because external modules can try to do any amount of unsafe actions
>  once we export the hook.
>

kmsg_dump() is a feature for specific servers capable NVRAM or flash memory.
So, it should be provided as a option.

By providing notification, linux kernel is able to support
this feature in following different kinds of servers flexibly.

- equipped with some NVRAMs or flash memories
- equipped with a NVRAM or flash memory

Also, linux kernel is able to support servers which don't have NVRAM/flash memory

Seiji

>-----Original Message-----
>From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Vivek Goyal
>Sent: Wednesday, February 02, 2011 9:05 PM
>To: KOSAKI Motohiro
>Cc: Eric W. Biederman; linux kernel mailing list; Jarod Wilson
>Subject: Re: Query about kdump_msg hook into crash_kexec()
>
>On Thu, Feb 03, 2011 at 09:55:41AM +0900, KOSAKI Motohiro wrote:
>> Hi
>>
>> > Hi,
>> >
>> > I noticed following commit which hooks into crash_kexec() for calling
>> > kmsg_dump().
>> >
>> > I think it is not a very good idea to pass control to modules after
>> > crash_kexec() has been called. Because modules can try to take locks
>> > or try to do some other operations which we really should not be doing
>> > now and fail kdump also. The whole design of kdump is built on the
>> > fact that in crashing kernel we do minimal thing and try to make
>> > transition to second kernel robust. Now with this hook, kmsg dumper
>> > breaks that assumption.
>>
>> I guess you talked about some foolish can shoot their own foot. if so,
>> Yes. Any kernel module can make kernel panic or more disaster result.
>
>Yes, the difference is that once a fool shoots his foot, kernel tries
>to take a meaningful action to figure out what went wrong. Like displayig
>an oops backtrace or like dumping a core (if kdump is configured) so
>that one can figure out who was the fool and what did who do.
>
>Now think give the control to two fools. First fool shoots his foot
>and then kernel transfers the control to another fool which completely
>screws up the situation and one can not save the core.
>
>>
>>
>> > Anyway, if an image is loaded and we have setup to capture dump also
>> > why do we need to save kmsg with the help of an helper. I am assuming
>> > this is more of a debugging aid if we have no other way to capture the
>> > kernel log buffer. So if somebody has setup kdump to capture the
>> > vmcore, why to call another handler which tries to save part of the
>> > vmcore (kmsg) separately.
>>
>> No.
>>
>> kmsg_dump() is desingned for embedded.
>
>Great. And I like the idea of trying to save some useful information
>to non volatile RAM or flash or something like that.
>
>> kexec for non dumping purpose. (Have you seen your embedded devices
>> show "Now storing dump image.." message?)
>
>No I have not seen. Can you explain a bit more that apart from kernel
>dump, what are the other purposes of kdump.
>
>>
>> Anyway, you can feel free to avoid to use ksmg_dump().
>
>Yes, that is one more way but this information is not even exported to
>user space to figure out if there are any registerd users of kmsg_dump.
>
>Seconly there are two more important things.
>
>- Why do you need a notification from inside crash_kexec(). IOW, what
>  is the usage of KMSG_DUMP_KEXEC.
>
>- One can anyway call kmsg_dump() outside crash_kexec() before it so
>  that kmsg_dump notification will go out before kdump gets the control.
>  What I am arguing here is that it is not necessarily a very good idea
>  because external modules can try to do any amount of unsafe actions
>  once we export the hook.
>
>  Doing this is still fine if kdump is not configured as anyway syste would
>  have rebooted. But if kdump is configured, then we are just reducing
>  the reliability of the operation by passing the control in the hands
>  of unaudited code and trusting it when kernel data structures are
>  corrupt.
>
>  So to me, sending out kmsg_dump notifications are perfectly fine when
>  kdump is not configured. But if it is configured, then it probably is
>  not a good idea. Anyway, if you have configured the system to capture
>  the full dump, why do you also need kmsg_dump. And if you are happy
>  with kmsg_dump() then you don't need kdump. So these both seem to be
>  mutually exclusive anyway.
>
>Thanks
>Vivek
>--
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-02-03 18:38     ` Seiji Aguchi
@ 2011-02-03 21:13       ` Eric W. Biederman
  2011-02-03 22:08         ` Seiji Aguchi
  0 siblings, 1 reply; 41+ messages in thread
From: Eric W. Biederman @ 2011-02-03 21:13 UTC (permalink / raw)
  To: Seiji Aguchi
  Cc: Vivek Goyal, KOSAKI Motohiro, linux kernel mailing list, Jarod Wilson

Seiji Aguchi <seiji.aguchi@hds.com> writes:

> Hi,
>
>>PS: FWIW, Hitach folks have usage idea for their enterprise purpose, but 
>>    unfortunately I don't know its detail. I hope anyone tell us it.
>
> I explain the usage of kmsg_dump(KMSG_DUMP_KEXEC) in enterprise area.
>
> [Background]
> In our support service experience, we always need to detect root cause 
> of OS panic.
> So, customers in enterprise area never forgive us if kdump fails and 
> we can't detect the root cause of panic due to lack of materials for 
> investigation.
>
>>- Why do you need a notification from inside crash_kexec(). IOW, what
>>  is the usage of KMSG_DUMP_KEXEC.
>
>
> The usage of kdump(KMSG_DUMP_KEXEC) in enterprise area is getting 
> useful information for investigating kernel crash in case kdump
> kernel doesn't boot.
>
> Kdump kernel may not start booting because there is a sha256 checksum
> verified over the kdump kernel before it starts booting.
> This means kdump kernel may fail even if there is no bug in kdump and
> we can't get any information for detecting root cause of kernel crash

Sure it is theoretically possible that the sha256 checksum gets
corrupted (I have never seen it happen or heard reports of it
happening).  It is a feature that if someone has corrupted your code the
code doesn't try and run anyway and corrupt anything else.

That you are arguing against have such a feature in the code you use to
write to persistent storage is scary.

> As I mentioned in [Background], We must avoid lack of materials for 
> investigation.
> So, kdump(KMSG_DUMP_KEXEC) is very important feature in enterprise
> area.

That sounds wonderful, but it doesn't jive with the
code. kmsg_dump(KMSG_DUMP_KEXEC) when I read through it was simply not
written to be robust when most of the kernel is suspect.  Making it in
appropriate for use on the crash_kexec path.  I do not believe kmsg_dump
has seen any testing in kernel failure scenarios.

There is this huge assumption that kmsg_dump is more reliable than
crash_kexec, from my review of the code kmsg_dump is simply not safe in
the context of a broken kernel.  The kmsg_dump code last I looked code
won't work if called with interrupts disabled.

Furthermore kmsg_dump(KMSG_DUMP_KEXEC) is only useful for debugging
crash_kexec.  Which has it backwards as it is kmsg_dump that needs the
debugging.

You just argued that it is better to corrupt the target of your
kmsg_dump in the event of a kernel failure instead of to fail silently.

I don't want that unreliable code that wants to corrupt my jffs
partition anywhere near my machines.

Eric

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: Query about kdump_msg hook into crash_kexec()
  2011-02-03 21:13       ` Eric W. Biederman
@ 2011-02-03 22:08         ` Seiji Aguchi
  2011-02-04  2:24           ` Américo Wang
  2011-02-08 16:46           ` Vivek Goyal
  0 siblings, 2 replies; 41+ messages in thread
From: Seiji Aguchi @ 2011-02-03 22:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Vivek Goyal, KOSAKI Motohiro, linux kernel mailing list, Jarod Wilson

Hi Eric,

Thank you for your prompt reply.

I would like to consider "Needs in enterprise area" and "Implementation of kmsg_dump()" separately.

(1) Needs in enterprise area
  In case of kdump failure, we would like to store kernel buffer to NVRAM/flush memory
  for detecting root cause of kernel crash.

(2) Implementation of kmsg_dump 
  You suggest to review/test cording of kmsg_dump() more.

What do you think about (1)?
Is it acceptable for you?

Seiji

>-----Original Message-----
>From: Eric W. Biederman [mailto:ebiederm@xmission.com]
>Sent: Thursday, February 03, 2011 4:13 PM
>To: Seiji Aguchi
>Cc: Vivek Goyal; KOSAKI Motohiro; linux kernel mailing list; Jarod Wilson
>Subject: Re: Query about kdump_msg hook into crash_kexec()
>
>Seiji Aguchi <seiji.aguchi@hds.com> writes:
>
>> Hi,
>>
>>>PS: FWIW, Hitach folks have usage idea for their enterprise purpose, but
>>>    unfortunately I don't know its detail. I hope anyone tell us it.
>>
>> I explain the usage of kmsg_dump(KMSG_DUMP_KEXEC) in enterprise area.
>>
>> [Background]
>> In our support service experience, we always need to detect root cause
>> of OS panic.
>> So, customers in enterprise area never forgive us if kdump fails and
>> we can't detect the root cause of panic due to lack of materials for
>> investigation.
>>
>>>- Why do you need a notification from inside crash_kexec(). IOW, what
>>>  is the usage of KMSG_DUMP_KEXEC.
>>
>>
>> The usage of kdump(KMSG_DUMP_KEXEC) in enterprise area is getting
>> useful information for investigating kernel crash in case kdump
>> kernel doesn't boot.
>>
>> Kdump kernel may not start booting because there is a sha256 checksum
>> verified over the kdump kernel before it starts booting.
>> This means kdump kernel may fail even if there is no bug in kdump and
>> we can't get any information for detecting root cause of kernel crash
>
>Sure it is theoretically possible that the sha256 checksum gets
>corrupted (I have never seen it happen or heard reports of it
>happening).  It is a feature that if someone has corrupted your code the
>code doesn't try and run anyway and corrupt anything else.
>
>That you are arguing against have such a feature in the code you use to
>write to persistent storage is scary.
>
>> As I mentioned in [Background], We must avoid lack of materials for
>> investigation.
>> So, kdump(KMSG_DUMP_KEXEC) is very important feature in enterprise
>> area.
>
>That sounds wonderful, but it doesn't jive with the
>code. kmsg_dump(KMSG_DUMP_KEXEC) when I read through it was simply not
>written to be robust when most of the kernel is suspect.  Making it in
>appropriate for use on the crash_kexec path.  I do not believe kmsg_dump
>has seen any testing in kernel failure scenarios.
>
>There is this huge assumption that kmsg_dump is more reliable than
>crash_kexec, from my review of the code kmsg_dump is simply not safe in
>the context of a broken kernel.  The kmsg_dump code last I looked code
>won't work if called with interrupts disabled.
>
>Furthermore kmsg_dump(KMSG_DUMP_KEXEC) is only useful for debugging
>crash_kexec.  Which has it backwards as it is kmsg_dump that needs the
>debugging.
>
>You just argued that it is better to corrupt the target of your
>kmsg_dump in the event of a kernel failure instead of to fail silently.
>
>I don't want that unreliable code that wants to corrupt my jffs
>partition anywhere near my machines.
>
>Eric

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-02-03 22:08         ` Seiji Aguchi
@ 2011-02-04  2:24           ` Américo Wang
  2011-02-04  2:50             ` Vivek Goyal
  2011-02-08 16:46           ` Vivek Goyal
  1 sibling, 1 reply; 41+ messages in thread
From: Américo Wang @ 2011-02-04  2:24 UTC (permalink / raw)
  To: Seiji Aguchi
  Cc: Eric W. Biederman, Vivek Goyal, KOSAKI Motohiro,
	linux kernel mailing list, Jarod Wilson

On Thu, Feb 03, 2011 at 05:08:01PM -0500, Seiji Aguchi wrote:
>Hi Eric,
>
>Thank you for your prompt reply.
>
>I would like to consider "Needs in enterprise area" and "Implementation of kmsg_dump()" separately.
>
>(1) Needs in enterprise area
>  In case of kdump failure, we would like to store kernel buffer to NVRAM/flush memory
>  for detecting root cause of kernel crash.
>
>(2) Implementation of kmsg_dump 
>  You suggest to review/test cording of kmsg_dump() more.
>
>What do you think about (1)?
>Is it acceptable for you?
>

For "in case of kdump fails", we can move
KMSG_DUMP_OOPS/KMSG_DUMP_PANIC before calling crash_kexec(),
so you can capture the log before kdump happens.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-02-04  2:24           ` Américo Wang
@ 2011-02-04  2:50             ` Vivek Goyal
  2011-02-04  3:28               ` Américo Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2011-02-04  2:50 UTC (permalink / raw)
  To: Américo Wang
  Cc: Seiji Aguchi, Eric W. Biederman, KOSAKI Motohiro,
	linux kernel mailing list, Jarod Wilson

On Fri, Feb 04, 2011 at 10:24:27AM +0800, Américo Wang wrote:
> On Thu, Feb 03, 2011 at 05:08:01PM -0500, Seiji Aguchi wrote:
> >Hi Eric,
> >
> >Thank you for your prompt reply.
> >
> >I would like to consider "Needs in enterprise area" and "Implementation of kmsg_dump()" separately.
> >
> >(1) Needs in enterprise area
> >  In case of kdump failure, we would like to store kernel buffer to NVRAM/flush memory
> >  for detecting root cause of kernel crash.
> >
> >(2) Implementation of kmsg_dump 
> >  You suggest to review/test cording of kmsg_dump() more.
> >
> >What do you think about (1)?
> >Is it acceptable for you?
> >
> 
> For "in case of kdump fails", we can move
> KMSG_DUMP_OOPS/KMSG_DUMP_PANIC before calling crash_kexec(),
> so you can capture the log before kdump happens.

True. If the idea is to capture the logs before kdump starts saving
core, then kdump_msg() call, can be put before crash_kexec() call and
there is no need to hook into crash_kexec().

Secondly now the question of whether kdump_msg() call should be before
crash_kexec() or not because modules might want to do lots of unreliable
things, I am now split on that. Especially because of the fact that if
somebody wants it probably can use kprobe to hook into crash_kexec()
or panic() or something like that to execute its code before everything
else.

The other reason I am little split on this because in the past I have seen
kdump fail many a times either because of chipset issues or because of driver
issues etc. So though I don't like it and I want drivers to be fixed
to take care of booting in kdump environment, things not necessarily
worked as well as we expected them to and it is not hard to meet a kdump
failure.

Hence though I do not prefer it but I will not stand in the way of kdump_msg()
being called before crash_kexec() gets the control.

But we should atleast remove the hook from crash_kexec() and get rid
of additional config option (KMSG_DUMP_KEXEC).

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-02-04  2:50             ` Vivek Goyal
@ 2011-02-04  3:28               ` Américo Wang
  2011-02-04  6:40                 ` KOSAKI Motohiro
  0 siblings, 1 reply; 41+ messages in thread
From: Américo Wang @ 2011-02-04  3:28 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Américo Wang, Seiji Aguchi, Eric W. Biederman,
	KOSAKI Motohiro, linux kernel mailing list, Jarod Wilson

On Thu, Feb 03, 2011 at 09:50:42PM -0500, Vivek Goyal wrote:
>On Fri, Feb 04, 2011 at 10:24:27AM +0800, Américo Wang wrote:
>> On Thu, Feb 03, 2011 at 05:08:01PM -0500, Seiji Aguchi wrote:
>> >Hi Eric,
>> >
>> >Thank you for your prompt reply.
>> >
>> >I would like to consider "Needs in enterprise area" and "Implementation of kmsg_dump()" separately.
>> >
>> >(1) Needs in enterprise area
>> >  In case of kdump failure, we would like to store kernel buffer to NVRAM/flush memory
>> >  for detecting root cause of kernel crash.
>> >
>> >(2) Implementation of kmsg_dump 
>> >  You suggest to review/test cording of kmsg_dump() more.
>> >
>> >What do you think about (1)?
>> >Is it acceptable for you?
>> >
>> 
>> For "in case of kdump fails", we can move
>> KMSG_DUMP_OOPS/KMSG_DUMP_PANIC before calling crash_kexec(),
>> so you can capture the log before kdump happens.
>
>True. If the idea is to capture the logs before kdump starts saving
>core, then kdump_msg() call, can be put before crash_kexec() call and
>there is no need to hook into crash_kexec().
>

Totally agreed.

>Secondly now the question of whether kdump_msg() call should be before
>crash_kexec() or not because modules might want to do lots of unreliable
>things, I am now split on that. Especially because of the fact that if
>somebody wants it probably can use kprobe to hook into crash_kexec()
>or panic() or something like that to execute its code before everything
>else.

If kprobe is the reason, then probably we can remove lots of other
kernel API's like kmsg dumper.


>
>The other reason I am little split on this because in the past I have seen
>kdump fail many a times either because of chipset issues or because of driver
>issues etc. So though I don't like it and I want drivers to be fixed
>to take care of booting in kdump environment, things not necessarily
>worked as well as we expected them to and it is not hard to meet a kdump
>failure.
>
>Hence though I do not prefer it but I will not stand in the way of kdump_msg()
>being called before crash_kexec() gets the control.
>

The point is that we don't have to choose one of them (kmsg dumper or
kdump), we can choose to have them both. Like Seiji said, they always
want the kernel messages to be saved. Duplicates are much better than
nothing.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-02-04  3:28               ` Américo Wang
@ 2011-02-04  6:40                 ` KOSAKI Motohiro
  0 siblings, 0 replies; 41+ messages in thread
From: KOSAKI Motohiro @ 2011-02-04  6:40 UTC (permalink / raw)
  To: Americo Wang
  Cc: kosaki.motohiro, Vivek Goyal, Seiji Aguchi, Eric W. Biederman,
	linux kernel mailing list, Jarod Wilson

> >Secondly now the question of whether kdump_msg() call should be before
> >crash_kexec() or not because modules might want to do lots of unreliable
> >things, I am now split on that. Especially because of the fact that if
> >somebody wants it probably can use kprobe to hook into crash_kexec()
> >or panic() or something like that to execute its code before everything
> >else.
> 
> If kprobe is the reason, then probably we can remove lots of other
> kernel API's like kmsg dumper.

Really?

On enterprise, distro kernel has CONFIG_KPROBE=y. It has both pros and cons, 
pros) kprobe can provide alternative hooking way therefore kmsg dump is not 
must.  cons) even if we remove kmsg dump, we can't get full kernel control. 
kprobe can inject unrealiable code. It's two sides of the same coin. But, 
many embedded don't have kprobe feature, therefore kprobe can't be 
alternative of kmsg dumper.




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-02-03  4:52     ` KOSAKI Motohiro
  2011-02-03  5:20       ` KOSAKI Motohiro
@ 2011-02-04 14:58       ` Vivek Goyal
  1 sibling, 0 replies; 41+ messages in thread
From: Vivek Goyal @ 2011-02-04 14:58 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Eric W. Biederman, linux kernel mailing list, Jarod Wilson

On Thu, Feb 03, 2011 at 01:52:01PM +0900, KOSAKI Motohiro wrote:

[..]
> > Seconly there are two more important things.
> > 
> > - Why do you need a notification from inside crash_kexec(). IOW, what
> >   is the usage of KMSG_DUMP_KEXEC.
> 
> AFAIK, kexec is used sneak rebooting way when the system face unexpected
> scenario on some devices. (Some embedded is running very long time, then 
> it can't avoid memory bit corruption. all of reset is a last resort. 
> and a vendor gather logs at periodically checkback).
> 
> The main purpose of to introduce KMSG_DUMP_KEXEC is to be separate it
> from KMSG_DUMP_PANIC. At kmsg_dump() initial patch, KMSG_DUMP_PANIC 
> is always called both kdump is configured or not. But it's no good idea
> the same log is to be appeared when both kexec was successed and failured.
> Moreover someone don't want any log at kexec phase. They only want logs
> when real panic (ie kexec failure) route. Then, I've separated it to two.
> Two separated argument can solve above both requreiment.

This point is not very clear to me. Let me add some details.

- crash_kexec() path is not taken during regular kexec boot. It is only
  taken when first kernel has crashed and we want to boot in kdump
  kernel

- If kdump fails, most likely you will either be hung or panic in second
  kernel or something like that.

- Current code looks as follows.

        if (mutex_trylock(&kexec_mutex)) {
                if (kexec_crash_image) {
                        struct pt_regs fixed_regs;

                        kmsg_dump(KMSG_DUMP_KEXEC);

                        crash_setup_regs(&fixed_regs, regs);
                        crash_save_vmcoreinfo();
                        machine_crash_shutdown(&fixed_regs);
                        machine_kexec(kexec_crash_image);
                }
                mutex_unlock(&kexec_mutex);

If a kdump kernel is not loaded you will call kmsg_dump(PANIC) and if
a kdump kernel is loaded you will call kmsg_dump(KEXEC). This is
irrespective of the fact whether kdump succeeds or fails.

- ramoops driver is not differentiating between KMSG_DUMP_KEXEC or
  KMSG_DUMP_PANIC. Similiarly mtdoops driver is not differentiating
  between two reasons. IOW, to me there does not seem to be any difference
  between events KMSG_DUMP_KEXEC and KMSG_DUMP_PANIC.

So we should simply get rid of extra event KMSG_DUMP_KEXEC?

> 
> 
> > - One can anyway call kmsg_dump() outside crash_kexec() before it so
> >   that kmsg_dump notification will go out before kdump gets the control.
> >   What I am arguing here is that it is not necessarily a very good idea
> >   because external modules can try to do any amount of unsafe actions
> >   once we export the hook.
> 
> I wrote why I don't think I added new risk. (shortly, It can be a lot of
> another way)
> Can you please tell me your view of my point? I'm afraid I haven't 
> understand your worry. So, I hope to understand it before making 
> alternative fixing patch.

Once you provide a hook to modules they can try doing anything. As long
as modules keep it really simple, it probably is fine. But what if
they start doing some writes to regular file system/devices in an 
attempt to save kernel logs or even some more data and that can lead
to hard lockups or further corruption and possibly resulting in kdump
not even getting a chance to run.

> 
> >   Doing this is still fine if kdump is not configured as anyway syste would
> >   have rebooted. But if kdump is configured, then we are just reducing
> >   the reliability of the operation by passing the control in the hands
> >   of unaudited code and trusting it when kernel data structures are
> >   corrupt.
> 
> At minimum, I'm fully agree we need reliable kdump. I only put a doubtness
> this removing is a real solution or not.
> 
> >   So to me, sending out kmsg_dump notifications are perfectly fine when
> >   kdump is not configured. But if it is configured, then it probably is
> >   not a good idea. Anyway, if you have configured the system to capture
> >   the full dump, why do you also need kmsg_dump. And if you are happy
> >   with kmsg_dump() then you don't need kdump. So these both seem to be
> >   mutually exclusive anyway.
> 
> Honestly, I haven't heared anyone are using both at the same time. But
> I can guess some reason. 1) If the system is very big box, kdump is
> really slooooooow operation. example Some stock exchange system have half
> terabytes memory and it mean dump delivery need to hald days at worse. But
> market should open just 9:00 at next day. So, summry information (eg log and
> trace information) spoiling is important thing. 2) Two sequence crash (ie
> crash kdump reboot-start next-crash-before-finish-reboot) can override former
> dump image. Usually admin _guess_ the reason of two are same and report boss so.
> But unfortunatelly customers at high end area often refuse a _guess_ report.
> 
> Or, it's for business competition reason. As far as I heared, IBM and HP 
> UNI*X system can save the logs both dump and special flash device. 
> 
> PS: FWIW, Hitach folks have usage idea for their enterprise purpose, but 
>     unfortunately I don't know its detail. I hope anyone tell us it.

If we keep the kmsg dump driver simple I guess things will be just fine in
most of the cases. So as I said in other mail, I am fine with kmsg
notifications being sent before crash_kexec(). Atleast we can audit in
kernel driver code and bigger worry is vendor specific outside the tree
modules. But I think we can live with that and down the line possibly
provide a tunable to change this behavior if it becomes an issue.

So until and unlesss we have a very clear reason of differentiating
between KEXEC and PANIC event, lets get rid of it. So far I have not
been able to understand what's the difference between two.

Thanks
Vivek 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-02-03  5:20       ` KOSAKI Motohiro
@ 2011-02-04 15:00         ` Vivek Goyal
  2011-03-08  1:31           ` KOSAKI Motohiro
  0 siblings, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2011-02-04 15:00 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Eric W. Biederman, linux kernel mailing list, Jarod Wilson

On Thu, Feb 03, 2011 at 02:20:53PM +0900, KOSAKI Motohiro wrote:
> > AFAIK, kexec is used sneak rebooting way when the system face unexpected
> > scenario on some devices. (Some embedded is running very long time, then 
> > it can't avoid memory bit corruption. all of reset is a last resort. 
> > and a vendor gather logs at periodically checkback).
> > 
> > The main purpose of to introduce KMSG_DUMP_KEXEC is to be separate it
> > from KMSG_DUMP_PANIC. At kmsg_dump() initial patch, KMSG_DUMP_PANIC 
> > is always called both kdump is configured or not. But it's no good idea
> > the same log is to be appeared when both kexec was successed and failured.
> > Moreover someone don't want any log at kexec phase. They only want logs
> > when real panic (ie kexec failure) route. Then, I've separated it to two.
> > Two separated argument can solve above both requreiment.
> 
> A bit additional explanation, An original patch have kmsg_dump(KMSG_DUMP_PANIC)
> callsite at following point. I didn't think it makes either embedded or 
> kdump folks happiness. Thus I moved it after crash_kexec().
> 
> 
> ---------------------------------------------------------------------
> @@ -74,6 +75,7 @@ NORET_TYPE void panic(const char * fmt, ...)
>         dump_stack();
>  #endif
> 
> +       kmsg_dump(KMSG_DUMP_PANIC);
>         /*
>          * If we have crashed and we have a crash kernel loaded let it handle
>          * everything else.
>          * Do we want to call this before we try to display a message?
>          */
>         crash_kexec(NULL);
> ---------------------------------------------------------------------

And I think to compensate for that somebody introduced additional
kmsg_dump(KEXEC) call inside crash_kexec() and put it under CONFIG
option so that one can change the behavior based on config options.

I think this makes the logic somewhat twisted and an unnecessary call
inside crash_kexec(). So until and unless there is a strong reason we
can get rid of KEXEC event and move kmsg_dump call before crash_kexec()
for now and see how does it go, IMHO.

Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-02-03 22:08         ` Seiji Aguchi
  2011-02-04  2:24           ` Américo Wang
@ 2011-02-08 16:46           ` Vivek Goyal
  2011-02-08 17:35             ` Eric W. Biederman
  1 sibling, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2011-02-08 16:46 UTC (permalink / raw)
  To: Seiji Aguchi
  Cc: Eric W. Biederman, KOSAKI Motohiro, linux kernel mailing list,
	Jarod Wilson

On Thu, Feb 03, 2011 at 05:08:01PM -0500, Seiji Aguchi wrote:
> Hi Eric,
> 
> Thank you for your prompt reply.
> 
> I would like to consider "Needs in enterprise area" and "Implementation of kmsg_dump()" separately.
> 
> (1) Needs in enterprise area
>   In case of kdump failure, we would like to store kernel buffer to NVRAM/flush memory
>   for detecting root cause of kernel crash.
> 
> (2) Implementation of kmsg_dump 
>   You suggest to review/test cording of kmsg_dump() more.
> 
> What do you think about (1)?
> Is it acceptable for you?

Ok, I am just trying to think loud about this problem and see if something
fruitful comes out which paves the way forward.

- So ideally we would like kdump_msg() to be called after crash_kexec() so
  that any unaudited (third party modules), unreliable calls do not 
  compromise the realiability of kdump operation.

  But hitachi folks seems to be wanting to save atleast kernel buffers
  somwhere in the NVRAM etc because they think that kdump can be
  unreliable and we might not capture any information after the crash. So
  they kind of want two mechanisms in place. One is light weight which
  tries to save kernel buffers in NVRAM and then one heavy weight one
  which tries to save the entire/filtered kernel core.

  Personally I am not too excited about the idea but I guess I can live
  with it. We can try to audit atleast in kernel module and for external
  modules we don't have much control and live with the fact that if
  modules screw up, we don't capture the dump.

 Those who don't want this behavior can do three things.

	- Disable kdump_msg() at compile time.
	- Do not load any module which registers for kdump_msg()
	- Implement a /proc tunable which allows controlling this
	  behavior.

- Ok, having said why do we want it, comes the question of how to  
  do it so that it works reasonably well.

  - There seems to be on common requirement of kmsg_dump() and kdump()
    and that is stop other cpus reliably (use nmi if possible). Can
    we try to share this code between kmsg_dump and crash_kexec(). So
    something like as follows.

	- panic happens
	- Do all the activities related to printing panic string and
	  stack dump.
	- Stop other cpus.
		- This can be probably be done with the equivalent of
		  machine_crash_shutdown() function. In fact this function
		  can probably be broken down in two parts. First part
	  	  does shutdown_prepare() where all other cpus are shot
		  down and second part can do the actual disabling of
		  LAPIC/IOAPIC and saving cpu registers etc.

		if (mutex_trylock(some_shutdown_mutex)) {
			/* setp regs, fix vmcoreinfo etc */
			crash_kexec_prepare();
			machine_shutdown_prepare();
			kdump_msg();	
			crash_kexec_execute()
			/* Also call panic_notifier_list here ? */
		}

crash_kexec_prepare () {
		crash_setup_regs(&fixed_regs, regs);
		crash_save_vmcoreinfo();
}

crash_kexec_execute() {
			/* Shutdown lapic/ioapic, save this cpu register etc */
			machine_shutdown();
			machine_kexec()
}

So basically we break down machine_shutdown() function in two parts
and start sharing common part between kdump_msg(), crash_kexec and
possibly panic_notifiers. 

If kdump is not configured, then after executing kdump_msg() and panic
notifiers, we should either be sitting in tight loop with interrupt
enabled for somebody to press Ctrl-boot or reboot system upon lapse
of panic_timeout().

Eric, does it make sense to you?

Thanks
Vivek

    

> 
> Seiji
> 
> >-----Original Message-----
> >From: Eric W. Biederman [mailto:ebiederm@xmission.com]
> >Sent: Thursday, February 03, 2011 4:13 PM
> >To: Seiji Aguchi
> >Cc: Vivek Goyal; KOSAKI Motohiro; linux kernel mailing list; Jarod Wilson
> >Subject: Re: Query about kdump_msg hook into crash_kexec()
> >
> >Seiji Aguchi <seiji.aguchi@hds.com> writes:
> >
> >> Hi,
> >>
> >>>PS: FWIW, Hitach folks have usage idea for their enterprise purpose, but
> >>>    unfortunately I don't know its detail. I hope anyone tell us it.
> >>
> >> I explain the usage of kmsg_dump(KMSG_DUMP_KEXEC) in enterprise area.
> >>
> >> [Background]
> >> In our support service experience, we always need to detect root cause
> >> of OS panic.
> >> So, customers in enterprise area never forgive us if kdump fails and
> >> we can't detect the root cause of panic due to lack of materials for
> >> investigation.
> >>
> >>>- Why do you need a notification from inside crash_kexec(). IOW, what
> >>>  is the usage of KMSG_DUMP_KEXEC.
> >>
> >>
> >> The usage of kdump(KMSG_DUMP_KEXEC) in enterprise area is getting
> >> useful information for investigating kernel crash in case kdump
> >> kernel doesn't boot.
> >>
> >> Kdump kernel may not start booting because there is a sha256 checksum
> >> verified over the kdump kernel before it starts booting.
> >> This means kdump kernel may fail even if there is no bug in kdump and
> >> we can't get any information for detecting root cause of kernel crash
> >
> >Sure it is theoretically possible that the sha256 checksum gets
> >corrupted (I have never seen it happen or heard reports of it
> >happening).  It is a feature that if someone has corrupted your code the
> >code doesn't try and run anyway and corrupt anything else.
> >
> >That you are arguing against have such a feature in the code you use to
> >write to persistent storage is scary.
> >
> >> As I mentioned in [Background], We must avoid lack of materials for
> >> investigation.
> >> So, kdump(KMSG_DUMP_KEXEC) is very important feature in enterprise
> >> area.
> >
> >That sounds wonderful, but it doesn't jive with the
> >code. kmsg_dump(KMSG_DUMP_KEXEC) when I read through it was simply not
> >written to be robust when most of the kernel is suspect.  Making it in
> >appropriate for use on the crash_kexec path.  I do not believe kmsg_dump
> >has seen any testing in kernel failure scenarios.
> >
> >There is this huge assumption that kmsg_dump is more reliable than
> >crash_kexec, from my review of the code kmsg_dump is simply not safe in
> >the context of a broken kernel.  The kmsg_dump code last I looked code
> >won't work if called with interrupts disabled.
> >
> >Furthermore kmsg_dump(KMSG_DUMP_KEXEC) is only useful for debugging
> >crash_kexec.  Which has it backwards as it is kmsg_dump that needs the
> >debugging.
> >
> >You just argued that it is better to corrupt the target of your
> >kmsg_dump in the event of a kernel failure instead of to fail silently.
> >
> >I don't want that unreliable code that wants to corrupt my jffs
> >partition anywhere near my machines.
> >
> >Eric

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-02-08 16:46           ` Vivek Goyal
@ 2011-02-08 17:35             ` Eric W. Biederman
  2011-02-08 19:27               ` Vivek Goyal
  0 siblings, 1 reply; 41+ messages in thread
From: Eric W. Biederman @ 2011-02-08 17:35 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Seiji Aguchi, KOSAKI Motohiro, linux kernel mailing list, Jarod Wilson

Vivek Goyal <vgoyal@redhat.com> writes:

> On Thu, Feb 03, 2011 at 05:08:01PM -0500, Seiji Aguchi wrote:
>> Hi Eric,
>> 
>> Thank you for your prompt reply.
>> 
>> I would like to consider "Needs in enterprise area" and "Implementation of kmsg_dump()" separately.
>> 
>> (1) Needs in enterprise area
>>   In case of kdump failure, we would like to store kernel buffer to NVRAM/flush memory
>>   for detecting root cause of kernel crash.
>> 
>> (2) Implementation of kmsg_dump 
>>   You suggest to review/test cording of kmsg_dump() more.
>> 
>> What do you think about (1)?
>> Is it acceptable for you?
>
> Ok, I am just trying to think loud about this problem and see if something
> fruitful comes out which paves the way forward.
>
> - So ideally we would like kdump_msg() to be called after crash_kexec() so
>   that any unaudited (third party modules), unreliable calls do not 
>   compromise the realiability of kdump operation.
>
>   But hitachi folks seems to be wanting to save atleast kernel buffers
>   somwhere in the NVRAM etc because they think that kdump can be
>   unreliable and we might not capture any information after the crash. So
>   they kind of want two mechanisms in place. One is light weight which
>   tries to save kernel buffers in NVRAM and then one heavy weight one
>   which tries to save the entire/filtered kernel core.
>
>   Personally I am not too excited about the idea but I guess I can live
>   with it. We can try to audit atleast in kernel module and for external
>   modules we don't have much control and live with the fact that if
>   modules screw up, we don't capture the dump.
>
>  Those who don't want this behavior can do three things.
>
> 	- Disable kdump_msg() at compile time.
> 	- Do not load any module which registers for kdump_msg()
> 	- Implement a /proc tunable which allows controlling this
> 	  behavior.
>
> - Ok, having said why do we want it, comes the question of how to  
>   do it so that it works reasonably well.
>
>   - There seems to be on common requirement of kmsg_dump() and kdump()
>     and that is stop other cpus reliably (use nmi if possible). Can
>     we try to share this code between kmsg_dump and crash_kexec(). So
>     something like as follows.
>
> 	- panic happens
> 	- Do all the activities related to printing panic string and
> 	  stack dump.
> 	- Stop other cpus.
> 		- This can be probably be done with the equivalent of
> 		  machine_crash_shutdown() function. In fact this function
> 		  can probably be broken down in two parts. First part
> 	  	  does shutdown_prepare() where all other cpus are shot
> 		  down and second part can do the actual disabling of
> 		  LAPIC/IOAPIC and saving cpu registers etc.
>
> 		if (mutex_trylock(some_shutdown_mutex)) {
> 			/* setp regs, fix vmcoreinfo etc */
> 			crash_kexec_prepare();
> 			machine_shutdown_prepare();
> 			kdump_msg();	
> 			crash_kexec_execute()
> 			/* Also call panic_notifier_list here ? */
> 		}
>
> crash_kexec_prepare () {
> 		crash_setup_regs(&fixed_regs, regs);
> 		crash_save_vmcoreinfo();
> }
>
> crash_kexec_execute() {
> 			/* Shutdown lapic/ioapic, save this cpu register etc */
> 			machine_shutdown();
> 			machine_kexec()
> }
>
> So basically we break down machine_shutdown() function in two parts
> and start sharing common part between kdump_msg(), crash_kexec and
> possibly panic_notifiers. 
>
> If kdump is not configured, then after executing kdump_msg() and panic
> notifiers, we should either be sitting in tight loop with interrupt
> enabled for somebody to press Ctrl-boot or reboot system upon lapse
> of panic_timeout().
>
> Eric, does it make sense to you?

kexec on panic doesn't strictly require that we stop other cpus.

What makes sense to me at this point is for someone on the kmsg_dump
side to make a strong case that the code actually works in a crash dump
scenario.  We have lots of experience over the years that says a design
like kmsg_dump is attractive but turns out to be a unreliable piece of
junk that fails just when you need it.  Because developers only test
the case when the kernel is happy and because people share code with
the regular path drivers, and that code assumes things are happy.

I forget exactly why but last I looked.
local_irq_disable()
kmsg_dump()
local_irq_enalbe()

Was a recipe for disaster, and you have be at least that good to even
have a chance of working in a crash dump scenario.

In part I am puzzled why the kmsg dump doesn't just use the printk
interface.  Strangely enough printk works in the event of a crash and
has been shown to be reliable over the years.

Eric

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-02-08 17:35             ` Eric W. Biederman
@ 2011-02-08 19:27               ` Vivek Goyal
  2011-02-08 19:58                 ` Eric W. Biederman
  0 siblings, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2011-02-08 19:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Seiji Aguchi, KOSAKI Motohiro, linux kernel mailing list, Jarod Wilson

On Tue, Feb 08, 2011 at 09:35:15AM -0800, Eric W. Biederman wrote:
> Vivek Goyal <vgoyal@redhat.com> writes:
> 
> > On Thu, Feb 03, 2011 at 05:08:01PM -0500, Seiji Aguchi wrote:
> >> Hi Eric,
> >> 
> >> Thank you for your prompt reply.
> >> 
> >> I would like to consider "Needs in enterprise area" and "Implementation of kmsg_dump()" separately.
> >> 
> >> (1) Needs in enterprise area
> >>   In case of kdump failure, we would like to store kernel buffer to NVRAM/flush memory
> >>   for detecting root cause of kernel crash.
> >> 
> >> (2) Implementation of kmsg_dump 
> >>   You suggest to review/test cording of kmsg_dump() more.
> >> 
> >> What do you think about (1)?
> >> Is it acceptable for you?
> >
> > Ok, I am just trying to think loud about this problem and see if something
> > fruitful comes out which paves the way forward.
> >
> > - So ideally we would like kdump_msg() to be called after crash_kexec() so
> >   that any unaudited (third party modules), unreliable calls do not 
> >   compromise the realiability of kdump operation.
> >
> >   But hitachi folks seems to be wanting to save atleast kernel buffers
> >   somwhere in the NVRAM etc because they think that kdump can be
> >   unreliable and we might not capture any information after the crash. So
> >   they kind of want two mechanisms in place. One is light weight which
> >   tries to save kernel buffers in NVRAM and then one heavy weight one
> >   which tries to save the entire/filtered kernel core.
> >
> >   Personally I am not too excited about the idea but I guess I can live
> >   with it. We can try to audit atleast in kernel module and for external
> >   modules we don't have much control and live with the fact that if
> >   modules screw up, we don't capture the dump.
> >
> >  Those who don't want this behavior can do three things.
> >
> > 	- Disable kdump_msg() at compile time.
> > 	- Do not load any module which registers for kdump_msg()
> > 	- Implement a /proc tunable which allows controlling this
> > 	  behavior.
> >
> > - Ok, having said why do we want it, comes the question of how to  
> >   do it so that it works reasonably well.
> >
> >   - There seems to be on common requirement of kmsg_dump() and kdump()
> >     and that is stop other cpus reliably (use nmi if possible). Can
> >     we try to share this code between kmsg_dump and crash_kexec(). So
> >     something like as follows.
> >
> > 	- panic happens
> > 	- Do all the activities related to printing panic string and
> > 	  stack dump.
> > 	- Stop other cpus.
> > 		- This can be probably be done with the equivalent of
> > 		  machine_crash_shutdown() function. In fact this function
> > 		  can probably be broken down in two parts. First part
> > 	  	  does shutdown_prepare() where all other cpus are shot
> > 		  down and second part can do the actual disabling of
> > 		  LAPIC/IOAPIC and saving cpu registers etc.
> >
> > 		if (mutex_trylock(some_shutdown_mutex)) {
> > 			/* setp regs, fix vmcoreinfo etc */
> > 			crash_kexec_prepare();
> > 			machine_shutdown_prepare();
> > 			kdump_msg();	
> > 			crash_kexec_execute()
> > 			/* Also call panic_notifier_list here ? */
> > 		}
> >
> > crash_kexec_prepare () {
> > 		crash_setup_regs(&fixed_regs, regs);
> > 		crash_save_vmcoreinfo();
> > }
> >
> > crash_kexec_execute() {
> > 			/* Shutdown lapic/ioapic, save this cpu register etc */
> > 			machine_shutdown();
> > 			machine_kexec()
> > }
> >
> > So basically we break down machine_shutdown() function in two parts
> > and start sharing common part between kdump_msg(), crash_kexec and
> > possibly panic_notifiers. 
> >
> > If kdump is not configured, then after executing kdump_msg() and panic
> > notifiers, we should either be sitting in tight loop with interrupt
> > enabled for somebody to press Ctrl-boot or reboot system upon lapse
> > of panic_timeout().
> >
> > Eric, does it make sense to you?
> 
> kexec on panic doesn't strictly require that we stop other cpus.

Yes but it is desirable.

- We don't want cpus to be scribbling on old memory and possibly on
  new kernel's memory also in case of corrupted pointer and crash
  the freshly booted kernel (New kerenl's memory is mapped in old
  kernel)

- We don't want other cpus to complete panic() and jump to BIOS or
  lead to some kind of triple fault and reset the system etc.

So that would suggest to be robust, stopping other cpus is required.

On a side note, kdump_msg() hook is present in emergency_reboot() too.
So if these paths are not properly designed, then system might not
even reboot automatically even if panic_timeout() has been specified.

> 
> What makes sense to me at this point is for someone on the kmsg_dump
> side to make a strong case that the code actually works in a crash dump
> scenario.  We have lots of experience over the years that says a design
> like kmsg_dump is attractive but turns out to be a unreliable piece of
> junk that fails just when you need it.  Because developers only test
> the case when the kernel is happy and because people share code with
> the regular path drivers, and that code assumes things are happy.
> 
> I forget exactly why but last I looked.
> local_irq_disable()
> kmsg_dump()
> local_irq_enalbe()

I agree that kdump_msg() code should be able to work with interrupts
disabled, atleast. 

There seem to be two pieces to it. One is generic call which calls
all the dumpers and then respective dumpers.

Looking at generic dumping call, I can't think why does it need interrupts
to be enabled. There is one spin_lock() and then rcu_read_lock(). That's
it.

Regarding mtdoops, it is hard to tell. There is lot of code. But the good
thing is that they have introduced a separate write path for panic context.
That way atleast one can do special casing in panic path to avoid
taking locks and not be dependent on interrupts.

ramoops seems to be simple. It seems to be just memcpy() except
do_gettimeofday(). I noticed that in the past you raised concern about usage
of do_gettimeofday(), but I am not sure what is the concern here (silly
question i guess).

So to me, ramoops seems to be simple memcpy (atleast in principle) and
mtdoops has a dedicated path for handling panic context. So atleast
it is fixable for possible issues. Generic code seems harmless to me
at this point of time.
 
> 
> Was a recipe for disaster, and you have be at least that good to even
> have a chance of working in a crash dump scenario.
> 
> In part I am puzzled why the kmsg dump doesn't just use the printk
> interface.  Strangely enough printk works in the event of a crash and
> has been shown to be reliable over the years.

Are you suggesting implementing these things as console driver and
register with printk as console? Sounds interesting. I think one of
the issues they probably will find that they don't want to log everything.
They want to do selective logging. Not sure how would they get this
info.

In some cases like emergency_restart(), there are no printk() and they
just consider it one of the events to dump the kernel buffer contents.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-02-08 19:27               ` Vivek Goyal
@ 2011-02-08 19:58                 ` Eric W. Biederman
  0 siblings, 0 replies; 41+ messages in thread
From: Eric W. Biederman @ 2011-02-08 19:58 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Seiji Aguchi, KOSAKI Motohiro, linux kernel mailing list, Jarod Wilson

Vivek Goyal <vgoyal@redhat.com> writes:

> On Tue, Feb 08, 2011 at 09:35:15AM -0800, Eric W. Biederman wrote:
>> Vivek Goyal <vgoyal@redhat.com> writes:
>> 
>> > On Thu, Feb 03, 2011 at 05:08:01PM -0500, Seiji Aguchi wrote:
>> >> Hi Eric,
>> >> 
>> >> Thank you for your prompt reply.
>> >> 
>> >> I would like to consider "Needs in enterprise area" and "Implementation of kmsg_dump()" separately.
>> >> 
>> >> (1) Needs in enterprise area
>> >>   In case of kdump failure, we would like to store kernel buffer to NVRAM/flush memory
>> >>   for detecting root cause of kernel crash.
>> >> 
>> >> (2) Implementation of kmsg_dump 
>> >>   You suggest to review/test cording of kmsg_dump() more.
>> >> 
>> >> What do you think about (1)?
>> >> Is it acceptable for you?
>> >
>> > Ok, I am just trying to think loud about this problem and see if something
>> > fruitful comes out which paves the way forward.
>> >
>> > - So ideally we would like kdump_msg() to be called after crash_kexec() so
>> >   that any unaudited (third party modules), unreliable calls do not 
>> >   compromise the realiability of kdump operation.
>> >
>> >   But hitachi folks seems to be wanting to save atleast kernel buffers
>> >   somwhere in the NVRAM etc because they think that kdump can be
>> >   unreliable and we might not capture any information after the crash. So
>> >   they kind of want two mechanisms in place. One is light weight which
>> >   tries to save kernel buffers in NVRAM and then one heavy weight one
>> >   which tries to save the entire/filtered kernel core.
>> >
>> >   Personally I am not too excited about the idea but I guess I can live
>> >   with it. We can try to audit atleast in kernel module and for external
>> >   modules we don't have much control and live with the fact that if
>> >   modules screw up, we don't capture the dump.
>> >
>> >  Those who don't want this behavior can do three things.
>> >
>> > 	- Disable kdump_msg() at compile time.
>> > 	- Do not load any module which registers for kdump_msg()
>> > 	- Implement a /proc tunable which allows controlling this
>> > 	  behavior.
>> >
>> > - Ok, having said why do we want it, comes the question of how to  
>> >   do it so that it works reasonably well.
>> >
>> >   - There seems to be on common requirement of kmsg_dump() and kdump()
>> >     and that is stop other cpus reliably (use nmi if possible). Can
>> >     we try to share this code between kmsg_dump and crash_kexec(). So
>> >     something like as follows.
>> >
>> > 	- panic happens
>> > 	- Do all the activities related to printing panic string and
>> > 	  stack dump.
>> > 	- Stop other cpus.
>> > 		- This can be probably be done with the equivalent of
>> > 		  machine_crash_shutdown() function. In fact this function
>> > 		  can probably be broken down in two parts. First part
>> > 	  	  does shutdown_prepare() where all other cpus are shot
>> > 		  down and second part can do the actual disabling of
>> > 		  LAPIC/IOAPIC and saving cpu registers etc.
>> >
>> > 		if (mutex_trylock(some_shutdown_mutex)) {
>> > 			/* setp regs, fix vmcoreinfo etc */
>> > 			crash_kexec_prepare();
>> > 			machine_shutdown_prepare();
>> > 			kdump_msg();	
>> > 			crash_kexec_execute()
>> > 			/* Also call panic_notifier_list here ? */
>> > 		}
>> >
>> > crash_kexec_prepare () {
>> > 		crash_setup_regs(&fixed_regs, regs);
>> > 		crash_save_vmcoreinfo();
>> > }
>> >
>> > crash_kexec_execute() {
>> > 			/* Shutdown lapic/ioapic, save this cpu register etc */
>> > 			machine_shutdown();
>> > 			machine_kexec()
>> > }
>> >
>> > So basically we break down machine_shutdown() function in two parts
>> > and start sharing common part between kdump_msg(), crash_kexec and
>> > possibly panic_notifiers. 
>> >
>> > If kdump is not configured, then after executing kdump_msg() and panic
>> > notifiers, we should either be sitting in tight loop with interrupt
>> > enabled for somebody to press Ctrl-boot or reboot system upon lapse
>> > of panic_timeout().
>> >
>> > Eric, does it make sense to you?
>> 
>> kexec on panic doesn't strictly require that we stop other cpus.
>
> Yes but it is desirable.
>
> - We don't want cpus to be scribbling on old memory and possibly on
>   new kernel's memory also in case of corrupted pointer and crash
>   the freshly booted kernel (New kerenl's memory is mapped in old
>   kernel)
>
> - We don't want other cpus to complete panic() and jump to BIOS or
>   lead to some kind of triple fault and reset the system etc.
>
> So that would suggest to be robust, stopping other cpus is required.
>
> On a side note, kdump_msg() hook is present in emergency_reboot() too.
> So if these paths are not properly designed, then system might not
> even reboot automatically even if panic_timeout() has been specified.

Ouch! Ouch! Ouch!

emergency_restart is just supposed to be the restart bits, so
that it always works!  We wouldn't need to use emergency restart
if we could trust the normal hardware shutdown code.  Sigh.

I guess this is another issue I have with the design of kmsg_dump.
Instead of being a good citizen like everyone else and placing their
hooks explicitly where they can be seen.  The kmsg_dump hooks were
buried in other routines changing the semantics of those routines.

What makes this scary is the modified routines are deliberately simple
so they are robust and can be audited and now on those code paths we
have the hooks that are so big everyone gives up before reading and
understanding all of the code involved.

>> What makes sense to me at this point is for someone on the kmsg_dump
>> side to make a strong case that the code actually works in a crash dump
>> scenario.  We have lots of experience over the years that says a design
>> like kmsg_dump is attractive but turns out to be a unreliable piece of
>> junk that fails just when you need it.  Because developers only test
>> the case when the kernel is happy and because people share code with
>> the regular path drivers, and that code assumes things are happy.
>> 
>> I forget exactly why but last I looked.
>> local_irq_disable()
>> kmsg_dump()
>> local_irq_enalbe()
>
> I agree that kdump_msg() code should be able to work with interrupts
> disabled, atleast. 
>
> There seem to be two pieces to it. One is generic call which calls
> all the dumpers and then respective dumpers.
>
> Looking at generic dumping call, I can't think why does it need interrupts
> to be enabled. There is one spin_lock() and then rcu_read_lock(). That's
> it.
>
> Regarding mtdoops, it is hard to tell. There is lot of code. But the good
> thing is that they have introduced a separate write path for panic context.
> That way atleast one can do special casing in panic path to avoid
> taking locks and not be dependent on interrupts.

I think that is part of my problem.   There is a lot of code so it is
really hard to audit, and really hard to trust.  I'm still leary with
the amount of code on the kexec on panic code path.

> ramoops seems to be simple. It seems to be just memcpy() except
> do_gettimeofday(). I noticed that in the past you raised concern about usage
> of do_gettimeofday(), but I am not sure what is the concern here (silly
> question i guess).
>
> So to me, ramoops seems to be simple memcpy (atleast in principle) and
> mtdoops has a dedicated path for handling panic context. So atleast
> it is fixable for possible issues. Generic code seems harmless to me
> at this point of time.
>  
>> 
>> Was a recipe for disaster, and you have be at least that good to even
>> have a chance of working in a crash dump scenario.
>> 
>> In part I am puzzled why the kmsg dump doesn't just use the printk
>> interface.  Strangely enough printk works in the event of a crash and
>> has been shown to be reliable over the years.
>
> Are you suggesting implementing these things as console driver and
> register with printk as console? Sounds interesting. I think one of
> the issues they probably will find that they don't want to log everything.
> They want to do selective logging. Not sure how would they get this
> info.
>
> In some cases like emergency_restart(), there are no printk() and they
> just consider it one of the events to dump the kernel buffer contents.

Sorry.  I am tired and I'm really not ready to deal with yet another
mechanism like this, that is over designed and was flaky the first time
I looked at it.  That flakiness combined with the authors not stepping
forward and rushing off to fix things or telling me they have fixed
things, means that I don't have any confidence in the mechanism.

Vivek if you want to step forward and be one of the people taking care
of kmsg_dump, then this conversation might have a point.

What I know for certain is that kmsg_dump and kexec on panic are
used in pretty exclusive ways, so getting the kmsg_dump code out
of crash_kexec does not hurt anyone today.

Right now I can't even get 2.6.38-rc4 to compile a reasonably sized
program, so I see a lot worse regressions to deal with than something
I can just avoid for now.

Eric

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Query about kdump_msg hook into crash_kexec()
  2011-02-04 15:00         ` Vivek Goyal
@ 2011-03-08  1:31           ` KOSAKI Motohiro
  0 siblings, 0 replies; 41+ messages in thread
From: KOSAKI Motohiro @ 2011-03-08  1:31 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: kosaki.motohiro, Eric W. Biederman, linux kernel mailing list,
	Jarod Wilson

I'm sorry I've missed this mail long time.

> > ---------------------------------------------------------------------
> > @@ -74,6 +75,7 @@ NORET_TYPE void panic(const char * fmt, ...)
> >         dump_stack();
> >  #endif
> > 
> > +       kmsg_dump(KMSG_DUMP_PANIC);
> >         /*
> >          * If we have crashed and we have a crash kernel loaded let it handle
> >          * everything else.
> >          * Do we want to call this before we try to display a message?
> >          */
> >         crash_kexec(NULL);
> > ---------------------------------------------------------------------
> 
> And I think to compensate for that somebody introduced additional
> kmsg_dump(KEXEC) call inside crash_kexec() and put it under CONFIG
> option so that one can change the behavior based on config options.
> 
> I think this makes the logic somewhat twisted and an unnecessary call
> inside crash_kexec(). So until and unless there is a strong reason we
> can get rid of KEXEC event and move kmsg_dump call before crash_kexec()
> for now and see how does it go, IMHO.

I think I can agree your proposal. But could you please explain why do
you think kmsg _before_ kdump and kmsg _in_ kdump are so different?
I think it is only C level difference. CPU don't care C function and
anyway the kernel call kmsg_dump() because invoke second kernel even 
if you proposal applied.

It is only curious. I'm not against your proposal.

Thanks.




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-02-03  4:53             ` KOSAKI Motohiro
@ 2011-05-26 20:10               ` Andrew Morton
  2011-05-28  1:43                 ` Eric W. Biederman
                                   ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Andrew Morton @ 2011-05-26 20:10 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Vivek Goyal, Americo Wang, Eric W. Biederman,
	linux kernel mailing list, Jarod Wilson

On Thu,  3 Feb 2011 13:53:01 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> > > I wrote why this is no good idea by another mail. Please see it.
> > > Anyway you have a right to don't use this feature.
> > > 
> > 
> > But you have not explained that why do you need to hook into crash_kexec()
> > and you have also not explained why do you need to send out kdump_msg()
> > notification if kdump is configured.
> > 
> > Some detailed explanation here would help.
> 
> Hi,
> 
> I send it you now :)
> 

What happened with this?  kexec-remove-kmsg_dump_kexec.patch has two acks
and one unexplained nack :(

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-05-26 20:10               ` Andrew Morton
@ 2011-05-28  1:43                 ` Eric W. Biederman
  2011-05-30  7:30                   ` Américo Wang
  2011-05-30  5:13                 ` KOSAKI Motohiro
  2011-05-31 20:58                 ` Seiji Aguchi
  2 siblings, 1 reply; 41+ messages in thread
From: Eric W. Biederman @ 2011-05-28  1:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, Vivek Goyal, Americo Wang,
	linux kernel mailing list, Jarod Wilson

Andrew Morton <akpm@linux-foundation.org> writes:

> On Thu,  3 Feb 2011 13:53:01 +0900 (JST)
> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>
>> > > I wrote why this is no good idea by another mail. Please see it.
>> > > Anyway you have a right to don't use this feature.
>> > > 
>> > 
>> > But you have not explained that why do you need to hook into crash_kexec()
>> > and you have also not explained why do you need to send out kdump_msg()
>> > notification if kdump is configured.
>> > 
>> > Some detailed explanation here would help.
>> 
>> Hi,
>> 
>> I send it you now :)
>> 
>
> What happened with this?  kexec-remove-kmsg_dump_kexec.patch has two acks
> and one unexplained nack :(

As I recall the nack was based on a theoretical use case, and a cleanup
of kmsg_dump to make it more robust which I don't believe has happened,
instead of something real.

My feel is that we should remove kmsg_dump and if a real use case comes
up reconsider it at that time.

I don't think anyone cares too strongly at the moment because the
features are not expected to be used in conjunction with each other, nor
even expected to be compiled into the same kernel.  However given that
they are not used to be used in conjunction with each other a call into
kmsg_dump from crash_kexec is really just cluttering the code for no
benefit to anyone.

I do believe kmsg_dump suffers from all of the same problems lkdtm
suffered from long ago which is it only works in a working kernel,
and it is unlikely to tell you anything on system failure.

Eric


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-05-26 20:10               ` Andrew Morton
  2011-05-28  1:43                 ` Eric W. Biederman
@ 2011-05-30  5:13                 ` KOSAKI Motohiro
  2011-05-31 21:51                   ` Vivek Goyal
  2011-05-31 20:58                 ` Seiji Aguchi
  2 siblings, 1 reply; 41+ messages in thread
From: KOSAKI Motohiro @ 2011-05-30  5:13 UTC (permalink / raw)
  To: akpm; +Cc: vgoyal, xiyou.wangcong, ebiederm, linux-kernel, jwilson

(2011/05/27 5:10), Andrew Morton wrote:
> On Thu,  3 Feb 2011 13:53:01 +0900 (JST)
> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> 
>>>> I wrote why this is no good idea by another mail. Please see it.
>>>> Anyway you have a right to don't use this feature.
>>>>
>>>
>>> But you have not explained that why do you need to hook into crash_kexec()
>>> and you have also not explained why do you need to send out kdump_msg()
>>> notification if kdump is configured.
>>>
>>> Some detailed explanation here would help.
>>
>> Hi,
>>
>> I send it you now :)
>>
> 
> What happened with this?  kexec-remove-kmsg_dump_kexec.patch has two acks
> and one unexplained nack :(

http://groups.google.com/group/linux.kernel/browse_thread/thread/1084f406573d76ac/ee19e34b45f83536?lnk=raot&pli=1

At last mail, Vivek proposed move kms_dump() instead remove. and I asked following question and
I've got no response. I'm still waiting his.


> I'm sorry I've missed this mail long time. 
> 
>> > --------------------------------------------------------------------- 
>> > @@ -74,6 +75,7 @@ NORET_TYPE void panic(const char * fmt, ...) 
>> >         dump_stack(); 
>> >  #endif 
>> > +       kmsg_dump(KMSG_DUMP_PANIC); 
>> >         /* 
>> >          * If we have crashed and we have a crash kernel loaded let it handle 
>> >          * everything else. 
>> >          * Do we want to call this before we try to display a message? 
>> >          */ 
>> >         crash_kexec(NULL); 
>> > --------------------------------------------------------------------- 
>> And I think to compensate for that somebody introduced additional 
>> kmsg_dump(KEXEC) call inside crash_kexec() and put it under CONFIG 
>> option so that one can change the behavior based on config options. 
>> I think this makes the logic somewhat twisted and an unnecessary call 
>> inside crash_kexec(). So until and unless there is a strong reason we 
>> can get rid of KEXEC event and move kmsg_dump call before crash_kexec() 
>> for now and see how does it go, IMHO. 
> 
> 
> I think I can agree your proposal. But could you please explain why do 
> you think kmsg _before_ kdump and kmsg _in_ kdump are so different? 
> I think it is only C level difference. CPU don't care C function and 
> anyway the kernel call kmsg_dump() because invoke second kernel even 
> if you proposal applied. 
> It is only curious. I'm not against your proposal. 
> Thanks. 







^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-05-28  1:43                 ` Eric W. Biederman
@ 2011-05-30  7:30                   ` Américo Wang
  0 siblings, 0 replies; 41+ messages in thread
From: Américo Wang @ 2011-05-30  7:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, KOSAKI Motohiro, Vivek Goyal,
	linux kernel mailing list, Jarod Wilson

On Sat, May 28, 2011 at 9:43 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Andrew Morton <akpm@linux-foundation.org> writes:
>
>> On Thu,  3 Feb 2011 13:53:01 +0900 (JST)
>> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>>
>>> > > I wrote why this is no good idea by another mail. Please see it.
>>> > > Anyway you have a right to don't use this feature.
>>> > >
>>> >
>>> > But you have not explained that why do you need to hook into crash_kexec()
>>> > and you have also not explained why do you need to send out kdump_msg()
>>> > notification if kdump is configured.
>>> >
>>> > Some detailed explanation here would help.
>>>
>>> Hi,
>>>
>>> I send it you now :)
>>>
>>
>> What happened with this?  kexec-remove-kmsg_dump_kexec.patch has two acks
>> and one unexplained nack :(
>
> As I recall the nack was based on a theoretical use case, and a cleanup
> of kmsg_dump to make it more robust which I don't believe has happened,
> instead of something real.
>
> My feel is that we should remove kmsg_dump and if a real use case comes
> up reconsider it at that time.
>
> I don't think anyone cares too strongly at the moment because the
> features are not expected to be used in conjunction with each other, nor
> even expected to be compiled into the same kernel.  However given that
> they are not used to be used in conjunction with each other a call into
> kmsg_dump from crash_kexec is really just cluttering the code for no
> benefit to anyone.

Seiji once proposed NVRAM which uses kmsg_dump, he wants
to have another way to store panic information when kdump doesn't
work which is _not_ rare in the real world.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-05-26 20:10               ` Andrew Morton
  2011-05-28  1:43                 ` Eric W. Biederman
  2011-05-30  5:13                 ` KOSAKI Motohiro
@ 2011-05-31 20:58                 ` Seiji Aguchi
  2011-05-31 21:37                   ` Vivek Goyal
  2 siblings, 1 reply; 41+ messages in thread
From: Seiji Aguchi @ 2011-05-31 20:58 UTC (permalink / raw)
  To: Andrew Morton, KOSAKI Motohiro
  Cc: Vivek Goyal, Americo Wang, Eric W. Biederman,
	linux kernel mailing list, Jarod Wilson

Hi,

As Americo said, sometimes kdump doesn't work in the real world.

So, I would like to keep discussing implementation of kmsg_dump(KMSG_DUMP_KEXEC)
rather than just removing it.

Seiji

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-05-31 20:58                 ` Seiji Aguchi
@ 2011-05-31 21:37                   ` Vivek Goyal
  2011-05-31 22:24                     ` Seiji Aguchi
  0 siblings, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2011-05-31 21:37 UTC (permalink / raw)
  To: Seiji Aguchi
  Cc: Andrew Morton, KOSAKI Motohiro, Americo Wang, Eric W. Biederman,
	linux kernel mailing list, Jarod Wilson

On Tue, May 31, 2011 at 04:58:00PM -0400, Seiji Aguchi wrote:
> Hi,
> 
> As Americo said, sometimes kdump doesn't work in the real world.
> 
> So, I would like to keep discussing implementation of kmsg_dump(KMSG_DUMP_KEXEC)
> rather than just removing it.

What are you using kmsg_dump() for? Using mtdoops, ramoops or something
else?  Is it working reliably for you?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-05-30  5:13                 ` KOSAKI Motohiro
@ 2011-05-31 21:51                   ` Vivek Goyal
  2011-06-09 11:00                     ` KOSAKI Motohiro
  0 siblings, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2011-05-31 21:51 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: akpm, xiyou.wangcong, ebiederm, linux-kernel, jwilson

On Mon, May 30, 2011 at 02:13:33PM +0900, KOSAKI Motohiro wrote:
> (2011/05/27 5:10), Andrew Morton wrote:
> > On Thu,  3 Feb 2011 13:53:01 +0900 (JST)
> > KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> > 
> >>>> I wrote why this is no good idea by another mail. Please see it.
> >>>> Anyway you have a right to don't use this feature.
> >>>>
> >>>
> >>> But you have not explained that why do you need to hook into crash_kexec()
> >>> and you have also not explained why do you need to send out kdump_msg()
> >>> notification if kdump is configured.
> >>>
> >>> Some detailed explanation here would help.
> >>
> >> Hi,
> >>
> >> I send it you now :)
> >>
> > 
> > What happened with this?  kexec-remove-kmsg_dump_kexec.patch has two acks
> > and one unexplained nack :(
> 
> http://groups.google.com/group/linux.kernel/browse_thread/thread/1084f406573d76ac/ee19e34b45f83536?lnk=raot&pli=1
> 
> At last mail, Vivek proposed move kms_dump() instead remove. and I asked following question and
> I've got no response. I'm still waiting his.
> 
> 
> > I'm sorry I've missed this mail long time. 
> > 
> >> > --------------------------------------------------------------------- 
> >> > @@ -74,6 +75,7 @@ NORET_TYPE void panic(const char * fmt, ...) 
> >> >         dump_stack(); 
> >> >  #endif 
> >> > +       kmsg_dump(KMSG_DUMP_PANIC); 
> >> >         /* 
> >> >          * If we have crashed and we have a crash kernel loaded let it handle 
> >> >          * everything else. 
> >> >          * Do we want to call this before we try to display a message? 
> >> >          */ 
> >> >         crash_kexec(NULL); 
> >> > --------------------------------------------------------------------- 
> >> And I think to compensate for that somebody introduced additional 
> >> kmsg_dump(KEXEC) call inside crash_kexec() and put it under CONFIG 
> >> option so that one can change the behavior based on config options. 
> >> I think this makes the logic somewhat twisted and an unnecessary call 
> >> inside crash_kexec(). So until and unless there is a strong reason we 
> >> can get rid of KEXEC event and move kmsg_dump call before crash_kexec() 
> >> for now and see how does it go, IMHO. 
> > 
> > 
> > I think I can agree your proposal. But could you please explain why do 
> > you think kmsg _before_ kdump and kmsg _in_ kdump are so different? 
> > I think it is only C level difference. CPU don't care C function and 
> > anyway the kernel call kmsg_dump() because invoke second kernel even 
> > if you proposal applied. 
> > It is only curious. I'm not against your proposal. 
> > Thanks. 

Few reasons.

- There is no correlation between crash_kexec() and kdump_msg(). What
  you are creating is equivalent of panic notifiers and calling those
  notifiers before dump happened. So calling it inside of crash_kexec()
  does not make much sense from code point of view.

- Why does somebody need to keep track of event KMSG_DUMP_KEXEC?

- There is one kernel CONFIG option introduce which looks completely
  superfluous.

My general take on the whole issue.

- In general I think exporting a hook to module so that they can do
  anything before crash is a bad idea. Now this can be overloaded to
  do things like sending crash notifications in clustered environement
  where we recommend doing it in second kernel.

- Even if we really have to do it, there seemed to be two concern
  areas.

	- Reliability of kdump_msg() generic infrastructure and its
  	  capability in terms of handling races with other cpus and
	  NMIs.

	- Reliability of module which is getting the callback from
	  kdump_msg().

 I think in one of the mails I was discussing that common infrastructure
 between kdump and kmsg_dump() can be put in a separate function, like
 stopping all cpus etc to avoid races in generic infrastrucutre and
 then first we can all kmsg_dump() and then crash_kexec().

 But this still does not provide us any protection against modules getting
 control after crash and possiblly worsen the situation.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-05-31 21:37                   ` Vivek Goyal
@ 2011-05-31 22:24                     ` Seiji Aguchi
  2011-06-02  3:26                       ` Eric W. Biederman
  0 siblings, 1 reply; 41+ messages in thread
From: Seiji Aguchi @ 2011-05-31 22:24 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Andrew Morton, KOSAKI Motohiro, Americo Wang, Eric W. Biederman,
	linux kernel mailing list, Jarod Wilson

Hi,

>What are you using kmsg_dump() for? Using mtdoops, ramoops or something
>else?  Is it working reliably for you?

I plan to use kmsg_dump() for set_variable service of UEFI.
I proposed a prototype patch this month and will improve it.
(kmsg_dump is used inside pstore.)

https://lkml.org/lkml/2011/5/10/340

Seiji


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-05-31 22:24                     ` Seiji Aguchi
@ 2011-06-02  3:26                       ` Eric W. Biederman
  2011-06-08  0:00                         ` Andrew Morton
  2011-06-09 11:15                         ` KOSAKI Motohiro
  0 siblings, 2 replies; 41+ messages in thread
From: Eric W. Biederman @ 2011-06-02  3:26 UTC (permalink / raw)
  To: Seiji Aguchi
  Cc: Vivek Goyal, Andrew Morton, KOSAKI Motohiro, Americo Wang,
	linux kernel mailing list, Jarod Wilson

Seiji Aguchi <seiji.aguchi@hds.com> writes:

> Hi,
>
>>What are you using kmsg_dump() for? Using mtdoops, ramoops or something
>>else?  Is it working reliably for you?
>
> I plan to use kmsg_dump() for set_variable service of UEFI.
> I proposed a prototype patch this month and will improve it.
> (kmsg_dump is used inside pstore.)
>
> https://lkml.org/lkml/2011/5/10/340

Shudder.  Firmware calls in the crash path.

If that is the use, we need to remove the kmsg_dump(KMSG_DUMP_KEXEC)
hook from crash_kexec yesterday.  It is leading to some really ludicrous
suggestions that are on the way from making kexec on panic unreliable
and useless.

There will always be EFI implementations where that will not work and
there will be no way we can fix those.

There is a long history of people trying to do things in a crashing
kernel, things that simply do not work when the system is in a bad
state.  kmsg_dump() when I reviewed the code had significant
implementation problems for being called from interrupt handlers
and the like.

To introduce a different solution for capturing information when a
kernel crashes we need to see numbers that in a large number of
situations that the mechanism you are proposing is more reliable and/or
more maintainable than the current kexec on panic implementation.

The best work I know of on the reliability of the current situation
is "Evaluating Linux Kernel Crash Dumping Mechanisms", by Fernando Luis Vazquez Cao.
http://www.linuxsymposium.org/archives/OLS/Reprints-2006/cao-reprint.pdf


Now it does happen to be a fact that our efi support in linux is
so buggy kexec does not work let alone kexec on panic (if the target
kernel has any efi support).   But our efi support being buggy is not
a reason to add more ways to fail when we have a kernel with efi
support.  It is an argument to remove our excessive use of EFI
calls.

So let's just remove the ridiculous kmsg_dump(KMSG_DUMP_KEXEC) hook from
crash_kexec and remove any temptation for abuses like wanting to use
kmsg_dump() on anything but a deeply embedded system where there simply
is not enough memory for 2 kernels.

Eric

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-06-02  3:26                       ` Eric W. Biederman
@ 2011-06-08  0:00                         ` Andrew Morton
  2011-06-09 11:15                         ` KOSAKI Motohiro
  1 sibling, 0 replies; 41+ messages in thread
From: Andrew Morton @ 2011-06-08  0:00 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Seiji Aguchi, Vivek Goyal, KOSAKI Motohiro, Americo Wang,
	linux kernel mailing list, Jarod Wilson

On Wed, 01 Jun 2011 20:26:20 -0700
ebiederm@xmission.com (Eric W. Biederman) wrote:

> >>What are you using kmsg_dump() for? Using mtdoops, ramoops or something
> >>else?  Is it working reliably for you?
> >
> > I plan to use kmsg_dump() for set_variable service of UEFI.
> > I proposed a prototype patch this month and will improve it.
> > (kmsg_dump is used inside pstore.)
> >
> > https://lkml.org/lkml/2011/5/10/340
> 
> Shudder.  Firmware calls in the crash path.
> 
> If that is the use, we need to remove the kmsg_dump(KMSG_DUMP_KEXEC)
> hook from crash_kexec yesterday.  It is leading to some really ludicrous
> suggestions that are on the way from making kexec on panic unreliable
> and useless.
> 
> There will always be EFI implementations where that will not work and
> there will be no way we can fix those.
> 
> There is a long history of people trying to do things in a crashing
> kernel, things that simply do not work when the system is in a bad
> state.  kmsg_dump() when I reviewed the code had significant
> implementation problems for being called from interrupt handlers
> and the like.
> 
> To introduce a different solution for capturing information when a
> kernel crashes we need to see numbers that in a large number of
> situations that the mechanism you are proposing is more reliable and/or
> more maintainable than the current kexec on panic implementation.
> 
> The best work I know of on the reliability of the current situation
> is "Evaluating Linux Kernel Crash Dumping Mechanisms", by Fernando Luis Vazquez Cao.
> http://www.linuxsymposium.org/archives/OLS/Reprints-2006/cao-reprint.pdf
> 
> 
> Now it does happen to be a fact that our efi support in linux is
> so buggy kexec does not work let alone kexec on panic (if the target
> kernel has any efi support).   But our efi support being buggy is not
> a reason to add more ways to fail when we have a kernel with efi
> support.  It is an argument to remove our excessive use of EFI
> calls.
> 
> So let's just remove the ridiculous kmsg_dump(KMSG_DUMP_KEXEC) hook from
> crash_kexec and remove any temptation for abuses like wanting to use
> kmsg_dump() on anything but a deeply embedded system where there simply
> is not enough memory for 2 kernels.
> 

So am I allowed to merge kexec-remove-kmsg_dump_kexec.patch yet?


From: WANG Cong <xiyou.wangcong@gmail.com>

KMSG_DUMP_KEXEC is useless because we already save kernel messages inside
/proc/vmcore, and it is unsafe to allow modules to do other stuffs in a
crash dump scenario.

[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/powerpc/platforms/pseries/nvram.c |    1 -
 drivers/char/ramoops.c                 |    3 +--
 drivers/mtd/mtdoops.c                  |    3 +--
 include/linux/kmsg_dump.h              |    1 -
 kernel/kexec.c                         |    3 ---
 5 files changed, 2 insertions(+), 9 deletions(-)

diff -puN drivers/char/ramoops.c~kexec-remove-kmsg_dump_kexec drivers/char/ramoops.c
--- a/drivers/char/ramoops.c~kexec-remove-kmsg_dump_kexec
+++ a/drivers/char/ramoops.c
@@ -69,8 +69,7 @@ static void ramoops_do_dump(struct kmsg_
 	struct timeval timestamp;
 
 	if (reason != KMSG_DUMP_OOPS &&
-	    reason != KMSG_DUMP_PANIC &&
-	    reason != KMSG_DUMP_KEXEC)
+	    reason != KMSG_DUMP_PANIC)
 		return;
 
 	/* Only dump oopses if dump_oops is set */
diff -puN drivers/mtd/mtdoops.c~kexec-remove-kmsg_dump_kexec drivers/mtd/mtdoops.c
--- a/drivers/mtd/mtdoops.c~kexec-remove-kmsg_dump_kexec
+++ a/drivers/mtd/mtdoops.c
@@ -308,8 +308,7 @@ static void mtdoops_do_dump(struct kmsg_
 	char *dst;
 
 	if (reason != KMSG_DUMP_OOPS &&
-	    reason != KMSG_DUMP_PANIC &&
-	    reason != KMSG_DUMP_KEXEC)
+	    reason != KMSG_DUMP_PANIC)
 		return;
 
 	/* Only dump oopses if dump_oops is set */
diff -puN include/linux/kmsg_dump.h~kexec-remove-kmsg_dump_kexec include/linux/kmsg_dump.h
--- a/include/linux/kmsg_dump.h~kexec-remove-kmsg_dump_kexec
+++ a/include/linux/kmsg_dump.h
@@ -18,7 +18,6 @@
 enum kmsg_dump_reason {
 	KMSG_DUMP_OOPS,
 	KMSG_DUMP_PANIC,
-	KMSG_DUMP_KEXEC,
 	KMSG_DUMP_RESTART,
 	KMSG_DUMP_HALT,
 	KMSG_DUMP_POWEROFF,
diff -puN kernel/kexec.c~kexec-remove-kmsg_dump_kexec kernel/kexec.c
--- a/kernel/kexec.c~kexec-remove-kmsg_dump_kexec
+++ a/kernel/kexec.c
@@ -32,7 +32,6 @@
 #include <linux/console.h>
 #include <linux/vmalloc.h>
 #include <linux/swap.h>
-#include <linux/kmsg_dump.h>
 #include <linux/syscore_ops.h>
 
 #include <asm/page.h>
@@ -1079,8 +1078,6 @@ void crash_kexec(struct pt_regs *regs)
 		if (kexec_crash_image) {
 			struct pt_regs fixed_regs;
 
-			kmsg_dump(KMSG_DUMP_KEXEC);
-
 			crash_setup_regs(&fixed_regs, regs);
 			crash_save_vmcoreinfo();
 			machine_crash_shutdown(&fixed_regs);
diff -puN arch/powerpc/platforms/pseries/nvram.c~kexec-remove-kmsg_dump_kexec arch/powerpc/platforms/pseries/nvram.c
--- a/arch/powerpc/platforms/pseries/nvram.c~kexec-remove-kmsg_dump_kexec
+++ a/arch/powerpc/platforms/pseries/nvram.c
@@ -490,7 +490,6 @@ static void oops_to_nvram(struct kmsg_du
 		/* These are almost always orderly shutdowns. */
 		return;
 	case KMSG_DUMP_OOPS:
-	case KMSG_DUMP_KEXEC:
 		break;
 	case KMSG_DUMP_PANIC:
 		panicking = true;
_


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-05-31 21:51                   ` Vivek Goyal
@ 2011-06-09 11:00                     ` KOSAKI Motohiro
  2011-06-14 22:13                       ` Vivek Goyal
  0 siblings, 1 reply; 41+ messages in thread
From: KOSAKI Motohiro @ 2011-06-09 11:00 UTC (permalink / raw)
  To: vgoyal
  Cc: kosaki.motohiro, akpm, xiyou.wangcong, ebiederm, linux-kernel,
	jwilson, seiji.aguchi

Hi

Sorry for the delay. I had got stuck LinuxCon Japan and piled up plenty
paper works.

>>> I think I can agree your proposal. But could you please explain why do 
>>> you think kmsg _before_ kdump and kmsg _in_ kdump are so different? 
>>> I think it is only C level difference. CPU don't care C function and 
>>> anyway the kernel call kmsg_dump() because invoke second kernel even 
>>> if you proposal applied. 
>>> It is only curious. I'm not against your proposal. 
>>> Thanks. 
> 
> Few reasons.
> 
> - There is no correlation between crash_kexec() and kdump_msg(). What
>   you are creating is equivalent of panic notifiers and calling those
>   notifiers before dump happened. So calling it inside of crash_kexec()
>   does not make much sense from code point of view.

Thank you for the replay. I got you _think_ no makes sense, but I haven't
explain what you talk about the code of "code point of view".
If you read the code, you can understand kdump_msg() and panic_notifiers
are not same point.


> - Why does somebody need to keep track of event KMSG_DUMP_KEXEC?

I believe I answered already at last thread.

http://groups.google.com/group/linux.kernel/browse_thread/thread/1084f406573d76ac/daa1a08804385089?q=kexec%3A+remove+KMSG_DUMP_KEXEC&lnk=ol&


> - There is one kernel CONFIG option introduce which looks completely
>   superfluous.

What you mean "superfluous"? We already have billion kernel CONFIG.
Is it also bad?

> My general take on the whole issue.
> 
> - In general I think exporting a hook to module so that they can do
>   anything before crash is a bad idea. Now this can be overloaded to
>   do things like sending crash notifications in clustered environement
>   where we recommend doing it in second kernel.

??
It's not my issue and I haven't talked about such thing. I guess you
confuse I and Aguch-san or someone else.

> 
> - Even if we really have to do it, there seemed to be two concern
>   areas.
> 
> 	- Reliability of kdump_msg() generic infrastructure and its
>   	  capability in terms of handling races with other cpus and
> 	  NMIs.
> 
> 	- Reliability of module which is getting the callback from
> 	  kdump_msg().

Indeed. I thought Aguch-san said he promised he work on improve them.
However it doesn't happen yet. Okay, I


>  I think in one of the mails I was discussing that common infrastructure
>  between kdump and kmsg_dump() can be put in a separate function, like
>  stopping all cpus etc to avoid races in generic infrastrucutre and
>  then first we can all kmsg_dump() and then crash_kexec().

Nice idea! Yes. I didn't think enterprise folks start to use this feature
and it now happen.
If nobody are working on this, I'll do it.


>  But this still does not provide us any protection against modules getting
>  control after crash and possiblly worsen the situation.

I think this doesn't big matter. If module author hope to get hook, they
can use kprobe in nowadays. I don't think we can make perfect kprobe protection.
I think I wrote this at last thread too.

Probably most reliability stupid module detect way is, watching lkml and revewing
kmsg_dump() user conteniously. However, if you strongly worry about this issue,
I can agree we make tiny foolish module protection. (but I don't have concrete
idea yet)




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-06-02  3:26                       ` Eric W. Biederman
  2011-06-08  0:00                         ` Andrew Morton
@ 2011-06-09 11:15                         ` KOSAKI Motohiro
  1 sibling, 0 replies; 41+ messages in thread
From: KOSAKI Motohiro @ 2011-06-09 11:15 UTC (permalink / raw)
  To: ebiederm
  Cc: seiji.aguchi, vgoyal, akpm, xiyou.wangcong, linux-kernel, jwilson

Hi

> Seiji Aguchi <seiji.aguchi@hds.com> writes:
> 
>> Hi,
>>
>>> What are you using kmsg_dump() for? Using mtdoops, ramoops or something
>>> else?  Is it working reliably for you?
>>
>> I plan to use kmsg_dump() for set_variable service of UEFI.
>> I proposed a prototype patch this month and will improve it.
>> (kmsg_dump is used inside pstore.)
>>
>> https://lkml.org/lkml/2011/5/10/340
> 
> Shudder.  Firmware calls in the crash path.
> 
> If that is the use, we need to remove the kmsg_dump(KMSG_DUMP_KEXEC)
> hook from crash_kexec yesterday.  It is leading to some really ludicrous
> suggestions that are on the way from making kexec on panic unreliable
> and useless.

Do you have concrete example?

If you only talked about theoretical issue, probably making boot parameter
is enough and reasonable way.


> There will always be EFI implementations where that will not work and
> there will be no way we can fix those.
> 
> There is a long history of people trying to do things in a crashing
> kernel, things that simply do not work when the system is in a bad
> state.  kmsg_dump() when I reviewed the code had significant
> implementation problems for being called from interrupt handlers
> and the like.
> 
> To introduce a different solution for capturing information when a
> kernel crashes we need to see numbers that in a large number of
> situations that the mechanism you are proposing is more reliable and/or
> more maintainable than the current kexec on panic implementation.
> 
> The best work I know of on the reliability of the current situation
> is "Evaluating Linux Kernel Crash Dumping Mechanisms", by Fernando Luis Vazquez Cao.
> http://www.linuxsymposium.org/archives/OLS/Reprints-2006/cao-reprint.pdf

Every reliability improvement idea is welcome! This also improve embedded too.


> Now it does happen to be a fact that our efi support in linux is
> so buggy kexec does not work let alone kexec on panic (if the target
> kernel has any efi support).   But our efi support being buggy is not
> a reason to add more ways to fail when we have a kernel with efi
> support.  It is an argument to remove our excessive use of EFI
> calls.

Which part is buggy? As far as I know, IBM, HP and Fujitsu have EFI supported
server product and it works.

So, if you are suffered from buggy efi, I think the best way is to make config
option and you disable it and Aguch-san enable it.


> So let's just remove the ridiculous kmsg_dump(KMSG_DUMP_KEXEC) hook from
> crash_kexec and remove any temptation for abuses like wanting to use
> kmsg_dump() on anything but a deeply embedded system where there simply
> is not enough memory for 2 kernels.

This sentence has two misunderstands. 1) modern embedded (e.g. Android) has
lots memory rather than 10 years past PC 2) they often don't need full feature
second kernel. they often don't need full crash dump.

Thanks.




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec())
  2011-06-09 11:00                     ` KOSAKI Motohiro
@ 2011-06-14 22:13                       ` Vivek Goyal
  0 siblings, 0 replies; 41+ messages in thread
From: Vivek Goyal @ 2011-06-14 22:13 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: akpm, xiyou.wangcong, ebiederm, linux-kernel, jwilson, seiji.aguchi

On Thu, Jun 09, 2011 at 08:00:08PM +0900, KOSAKI Motohiro wrote:
> Hi
> 
> Sorry for the delay. I had got stuck LinuxCon Japan and piled up plenty
> paper works.
> 
> >>> I think I can agree your proposal. But could you please explain why do 
> >>> you think kmsg _before_ kdump and kmsg _in_ kdump are so different? 
> >>> I think it is only C level difference. CPU don't care C function and 
> >>> anyway the kernel call kmsg_dump() because invoke second kernel even 
> >>> if you proposal applied. 
> >>> It is only curious. I'm not against your proposal. 
> >>> Thanks. 
> > 
> > Few reasons.
> > 
> > - There is no correlation between crash_kexec() and kdump_msg(). What
> >   you are creating is equivalent of panic notifiers and calling those
> >   notifiers before dump happened. So calling it inside of crash_kexec()
> >   does not make much sense from code point of view.
> 
> Thank you for the replay. I got you _think_ no makes sense, but I haven't
> explain what you talk about the code of "code point of view".
> If you read the code, you can understand kdump_msg() and panic_notifiers
> are not same point.
> 
> 
> > - Why does somebody need to keep track of event KMSG_DUMP_KEXEC?
> 
> I believe I answered already at last thread.
> 
> http://groups.google.com/group/linux.kernel/browse_thread/thread/1084f406573d76ac/daa1a08804385089?q=kexec%3A+remove+KMSG_DUMP_KEXEC&lnk=ol&
> 

Frankly speaking I never understood it. The only thing I got is that
you are saying embedded devices want to do something upon a KEXEC and
I do not understand what's that action embedded devices want to take
upon an KEXEC.

And it is not just KEXEC, I am also curious about hooks in places
like emergency_restart(). Who needs to know about it and is it
safe to do.

So if would be great if you could explain it in detail again.

> 
> > - There is one kernel CONFIG option introduce which looks completely
> >   superfluous.
> 
> What you mean "superfluous"? We already have billion kernel CONFIG.
> Is it also bad?

We have lots of config options but every config option goes through
some thought process and it is taken in if it makes sense. In this
case this additional config option did not make any sense.

> 
> > My general take on the whole issue.
> > 
> > - In general I think exporting a hook to module so that they can do
> >   anything before crash is a bad idea. Now this can be overloaded to
> >   do things like sending crash notifications in clustered environement
> >   where we recommend doing it in second kernel.
> 
> ??
> It's not my issue and I haven't talked about such thing. I guess you
> confuse I and Aguch-san or someone else.

Once you export the hook to a module, anybody can do that. In the
past people have asked for it reapeatedly. So I am just giving you
one example that what will people start doing once the hook is
there.

> 
> > 
> > - Even if we really have to do it, there seemed to be two concern
> >   areas.
> > 
> > 	- Reliability of kdump_msg() generic infrastructure and its
> >   	  capability in terms of handling races with other cpus and
> > 	  NMIs.
> > 
> > 	- Reliability of module which is getting the callback from
> > 	  kdump_msg().
> 
> Indeed. I thought Aguch-san said he promised he work on improve them.
> However it doesn't happen yet. Okay, I
> 
> 
> >  I think in one of the mails I was discussing that common infrastructure
> >  between kdump and kmsg_dump() can be put in a separate function, like
> >  stopping all cpus etc to avoid races in generic infrastrucutre and
> >  then first we can all kmsg_dump() and then crash_kexec().
> 
> Nice idea! Yes. I didn't think enterprise folks start to use this feature
> and it now happen.
> If nobody are working on this, I'll do it.

It would be great if you could work on it and make sure kdump_msg()
and crash_kexec() could share common infrastructure which takes care
of common actions like stopping cpus, taking care of NMIs and invocation
of panic() on mutliple cpus etc.

> 
> 
> >  But this still does not provide us any protection against modules getting
> >  control after crash and possiblly worsen the situation.
> 
> I think this doesn't big matter. If module author hope to get hook, they
> can use kprobe in nowadays. I don't think we can make perfect kprobe protection.
> I think I wrote this at last thread too.
> 
> Probably most reliability stupid module detect way is, watching lkml and revewing
> kmsg_dump() user conteniously. However, if you strongly worry about this issue,
> I can agree we make tiny foolish module protection. (but I don't have concrete
> idea yet)

I do worry about modules. Once the system has paniced, I personally don't
think that anbody and everybody should be able to look at that event and
take whatever actions they want to.

Having said that personally I like the idea of being able to save 
backtrace on some non volatile storage and access it over next boot
through pstore interface.

So if care is taken to make kmsg_dump() generic infrastructure fool proof
probably it is good start and then we can look into module thing later.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2011-06-14 22:14 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-31 22:59 Query about kdump_msg hook into crash_kexec() Vivek Goyal
2011-02-01  7:19 ` Américo Wang
2011-02-01  7:33   ` Eric W. Biederman
2011-02-01  7:38     ` Américo Wang
2011-02-01  8:13       ` [Patch] kexec: remove KMSG_DUMP_KEXEC (was Re: Query about kdump_msg hook into crash_kexec()) Américo Wang
2011-02-01 15:28         ` Vivek Goyal
2011-02-01 16:06           ` Jarod Wilson
2011-02-03  0:59         ` KOSAKI Motohiro
2011-02-03  2:07           ` Vivek Goyal
2011-02-03  4:53             ` KOSAKI Motohiro
2011-05-26 20:10               ` Andrew Morton
2011-05-28  1:43                 ` Eric W. Biederman
2011-05-30  7:30                   ` Américo Wang
2011-05-30  5:13                 ` KOSAKI Motohiro
2011-05-31 21:51                   ` Vivek Goyal
2011-06-09 11:00                     ` KOSAKI Motohiro
2011-06-14 22:13                       ` Vivek Goyal
2011-05-31 20:58                 ` Seiji Aguchi
2011-05-31 21:37                   ` Vivek Goyal
2011-05-31 22:24                     ` Seiji Aguchi
2011-06-02  3:26                       ` Eric W. Biederman
2011-06-08  0:00                         ` Andrew Morton
2011-06-09 11:15                         ` KOSAKI Motohiro
2011-02-03  0:55 ` Query about kdump_msg hook into crash_kexec() KOSAKI Motohiro
2011-02-03  2:05   ` Vivek Goyal
2011-02-03  4:52     ` KOSAKI Motohiro
2011-02-03  5:20       ` KOSAKI Motohiro
2011-02-04 15:00         ` Vivek Goyal
2011-03-08  1:31           ` KOSAKI Motohiro
2011-02-04 14:58       ` Vivek Goyal
2011-02-03 18:38     ` Seiji Aguchi
2011-02-03 21:13       ` Eric W. Biederman
2011-02-03 22:08         ` Seiji Aguchi
2011-02-04  2:24           ` Américo Wang
2011-02-04  2:50             ` Vivek Goyal
2011-02-04  3:28               ` Américo Wang
2011-02-04  6:40                 ` KOSAKI Motohiro
2011-02-08 16:46           ` Vivek Goyal
2011-02-08 17:35             ` Eric W. Biederman
2011-02-08 19:27               ` Vivek Goyal
2011-02-08 19:58                 ` Eric W. Biederman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).