linux-kernel.vger.kernel.org archive mirror
* [RFC Patch 0/2] Slimdump framework using CRASH_REASON - v2
@ 2011-11-21  9:54 K.Prasad
  2011-11-21 10:11 ` [RFC Patch 1/2][slimdump] Append CRASH_REASON to VMCOREINFO elf-note K.Prasad
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: K.Prasad @ 2011-11-21  9:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Vivek Goyal, Borislav Petkov, Luck, Tony, Eric W. Biederman,
	anderson, tachibana, oomichi, Valdis.Kletnieks, Nick Bowler

Hi All,
	In furtherance of the previous discussion regarding 'slimdump'
(refer: http://article.gmane.org/gmane.linux.kernel/1204967), it was
decided that,

- An entry in VMCOREINFO elf-note be added to denote the cause of crash,
  instead of creating a new elf-note.

- Upstream tools such as 'makedumpfile' and 'crash' be modified to
  recognise this string and inform the user accordingly.

Accordingly, this new version of the patchset makes the following
changes:

Changelog - version 2
-----------------------
(First version posted here:
http://article.gmane.org/gmane.linux.kernel/1198435)

- Append VMCOREINFO elf-note with a new variable CRASH_REASON whose
  value will be populated using arch_add_crash_reason() function.

- Define arch_add_crash_reason() in the x86 MCE path to return "PANIC_MCE"
  in the panic path of MCE.

- The 'makedumpfile' tool is taught to recognise the PANIC_MCE string as one
  value of CRASH_REASON for which a 'slimdump' must be captured.

- Changes to the 'crash' tool are not included; they are deferred until
  there is consensus on the kernel and makedumpfile patches.
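With this change, CRASH_REASON becomes one more KEY=VALUE line in the VMCOREINFO text, so a consumer only has to find that line. The following is a minimal userspace sketch, assuming the descriptor is newline-separated KEY=VALUE text; vmcoreinfo_get_value() is a hypothetical helper, not an existing makedumpfile function. It matches whole keys and extracts the value, rather than searching for a substring:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: find the line "KEY=VALUE" for KEY in a VMCOREINFO
 * style text of INFO_LEN bytes and copy VALUE (truncated if necessary)
 * into OUT as a NUL-terminated string.
 * Returns 1 if KEY was found, 0 otherwise.
 */
static int vmcoreinfo_get_value(const char *info, size_t info_len,
				const char *key, char *out, size_t out_len)
{
	size_t key_len = strlen(key);
	const char *p = info;
	const char *end = info + info_len;

	assert(out != NULL && out_len > 0);
	while (p < end) {
		const char *nl = memchr(p, '\n', (size_t)(end - p));
		size_t line_len = (size_t)((nl ? nl : end) - p);

		/* Whole-key match: the key must be followed directly by '='. */
		if (line_len > key_len && memcmp(p, key, key_len) == 0 &&
		    p[key_len] == '=') {
			size_t val_len = line_len - key_len - 1;

			if (val_len >= out_len)
				val_len = out_len - 1;
			memcpy(out, p + key_len + 1, val_len);
			out[val_len] = '\0';
			return 1;
		}
		if (!nl)
			break;
		p = nl + 1;
	}
	return 0;
}
```

A caller would then test the extracted value with strcmp(value, "PANIC_MCE") == 0, so a hypothetical future value such as PANIC_MCE_FOO could not be mistaken for PANIC_MCE.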

Let me know your comments on this.

Thanks,
K.Prasad



* [RFC Patch 1/2][slimdump] Append CRASH_REASON to VMCOREINFO elf-note
  2011-11-21  9:54 [RFC Patch 0/2] Slimdump framework using CRASH_REASON - v2 K.Prasad
@ 2011-11-21 10:11 ` K.Prasad
  2011-11-21 15:11   ` Vivek Goyal
  2011-11-21 15:19   ` Dave Anderson
  2011-11-21 10:14 ` [RFC Patch 2/2][slimdump][makedumpfile] Recognise PANIC_MCE crashes to generate slimdu K.Prasad
  2011-11-21 15:17 ` [RFC Patch 0/2] Slimdump framework using CRASH_REASON - v2 Vivek Goyal
  2 siblings, 2 replies; 15+ messages in thread
From: K.Prasad @ 2011-11-21 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Vivek Goyal, Borislav Petkov, Luck, Tony, Eric W. Biederman,
	anderson, tachibana, oomichi, Valdis.Kletnieks, Nick Bowler

Allow various crash paths to append the reason for the crash to the
VMCOREINFO elf-note through the field CRASH_REASON. We also make fatal
machine check exceptions append "PANIC_MCE" as the crash reason.
This string will be recognised by upstream tools like makedumpfile and
crash to generate a slimdump.

As usage of the CRASH_REASON field grows, the crash strings can be
encoded to make them easier for tools to consume.

Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |    8 ++++++++
 kernel/kexec.c                   |    6 ++++++
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 362056a..5b2cb6a 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -38,6 +38,7 @@
 #include <linux/debugfs.h>
 #include <linux/irq_work.h>
 #include <linux/export.h>
+#include <linux/kexec.h>
 
 #include <asm/processor.h>
 #include <asm/mce.h>
@@ -240,6 +241,13 @@ static atomic_t mce_paniced;
 static int fake_panic;
 static atomic_t mce_fake_paniced;
 
+char *arch_add_crash_reason(void)
+{
+	static char crash_reason[] = "PANIC_MCE";
+
+	return crash_reason;
+}
+
 /* Panic in progress. Enable interrupts and wait for final IPI */
 static void wait_for_panic(void)
 {
diff --git a/kernel/kexec.c b/kernel/kexec.c
index dc7bc08..a731693 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1080,6 +1080,11 @@ asmlinkage long compat_sys_kexec_load(unsigned long entry,
 }
 #endif
 
+__weak char *arch_add_crash_reason(void)
+{
+	return (char *)NULL;
+}
+
 void crash_kexec(struct pt_regs *regs)
 {
 	/* Take the kexec_mutex here to prevent sys_kexec_load
@@ -1411,6 +1416,7 @@ static void update_vmcoreinfo_note(void)
 void crash_save_vmcoreinfo(void)
 {
 	vmcoreinfo_append_str("CRASHTIME=%ld", get_seconds());
+	vmcoreinfo_append_str("\nCRASH_REASON=%s\n", arch_add_crash_reason());
 	update_vmcoreinfo_note();
 }
 



* [RFC Patch 2/2][slimdump][makedumpfile] Recognise PANIC_MCE crashes to generate slimdu
  2011-11-21  9:54 [RFC Patch 0/2] Slimdump framework using CRASH_REASON - v2 K.Prasad
  2011-11-21 10:11 ` [RFC Patch 1/2][slimdump] Append CRASH_REASON to VMCOREINFO elf-note K.Prasad
@ 2011-11-21 10:14 ` K.Prasad
  2011-11-21 15:17 ` [RFC Patch 0/2] Slimdump framework using CRASH_REASON - v2 Vivek Goyal
  2 siblings, 0 replies; 15+ messages in thread
From: K.Prasad @ 2011-11-21 10:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: Vivek Goyal, Borislav Petkov, Luck, Tony, Eric W. Biederman,
	anderson, tachibana, oomichi, Valdis.Kletnieks, Nick Bowler

The kernel now indicates the cause of a crash through a new field,
CRASH_REASON, in the VMCOREINFO elf-note; recognise it. For crashes
caused by PANIC_MCE, avoid capturing kernel memory and generate only a
slimdump instead.

Since a 'slimdump' is very small (containing only the ELF headers and
the elf-notes section), the resulting coredump will be in ELF format
(not the kdump-compressed format).
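For reference, is_crash_by_mce() below walks the PT_NOTE segment one note at a time. Each ELF note is an Elf64_Nhdr (n_namesz, n_descsz, n_type) followed by the name and then the descriptor, each padded to a 4-byte boundary. A minimal in-memory sketch of that walk, assuming the whole PT_NOTE segment has already been read into a buffer (find_note_desc() is a hypothetical name, not a makedumpfile function), could look like this:

```c
#include <assert.h>
#include <elf.h>
#include <string.h>

/* ELF note names and descriptors are padded to 4-byte boundaries. */
#define NOTE_ALIGN4(x) (((size_t)(x) + 3u) & ~(size_t)3u)

/*
 * Hypothetical helper: search a PT_NOTE segment already read into SEG for
 * the note called NAME. On success point *DESC at its descriptor, store
 * the descriptor size in *DESC_SIZE and return 1; return 0 if the note is
 * absent or the segment is truncated.
 */
static int find_note_desc(const unsigned char *seg, size_t seg_size,
			  const char *name, const unsigned char **desc,
			  size_t *desc_size)
{
	size_t name_len = strlen(name) + 1;	/* n_namesz includes the NUL */
	size_t off = 0;

	assert(desc != NULL && desc_size != NULL);
	while (off + sizeof(Elf64_Nhdr) <= seg_size) {
		Elf64_Nhdr nhdr;
		size_t name_off, desc_off, next;

		memcpy(&nhdr, seg + off, sizeof(nhdr));	/* no unaligned loads */
		name_off = off + sizeof(nhdr);
		desc_off = name_off + NOTE_ALIGN4(nhdr.n_namesz);
		next = desc_off + NOTE_ALIGN4(nhdr.n_descsz);
		if (next > seg_size)
			return 0;	/* note runs past the end of the segment */

		if (nhdr.n_namesz == name_len &&
		    memcmp(seg + name_off, name, name_len) == 0) {
			*desc = seg + desc_off;
			*desc_size = nhdr.n_descsz;
			return 1;
		}
		off = next;
	}
	return 0;
}
```

Working on a buffer with explicit bounds, rather than issuing further read() calls from the current file offset, makes it easier to guarantee that a later string search over the descriptor stays within *desc_size bytes.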

Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
---
 elf_info.c     |   67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 elf_info.h     |    2 +
 makedumpfile.c |   13 ++++++++++-
 makedumpfile.h |    1 +
 4 files changed, 82 insertions(+), 1 deletions(-)

diff --git a/elf_info.c b/elf_info.c
index 114dd05..a925484 100644
--- a/elf_info.c
+++ b/elf_info.c
@@ -287,6 +287,73 @@ offset_note_desc(void *note)
 	return offset;
 }
 
+#define CRASH_REASON_PANIC_MCE	"CRASH_REASON=PANIC_MCE"
+
+/*
+ * This function checks if the vmcoreinfo note has its CRASH_REASON set as
+ * PANIC_MCE. This is added if the crash is due to a hardware error and
+ * when it makes no sense to read/store the crashing kernel's memory. In
+ * such a case, only a 'slimdump' is captured.
+ */
+int
+is_crash_by_mce(void)
+{
+	int note_size, ret = FALSE;
+	off_t offset;
+	char buf[VMCOREINFO_XEN_NOTE_NAME_BYTES];
+	char note[MAX_SIZE_NHDR];
+	void *vmcoreinfo_note = NULL;
+
+	offset = offset_pt_note_memory;
+	while (offset < offset_pt_note_memory + size_pt_note_memory) {
+		if (lseek(fd_memory, offset, SEEK_SET) < 0) {
+			ERRMSG("Can't seek the dump memory(%s). %s\n",
+			    name_memory, strerror(errno));
+			return FALSE;
+		}
+		if (read(fd_memory, note, sizeof(note)) != sizeof(note)) {
+			ERRMSG("Can't read the dump memory(%s). %s\n",
+			    name_memory, strerror(errno));
+			return FALSE;
+		}
+
+		if (read(fd_memory, &buf, sizeof(buf)) != sizeof(buf)) {
+			ERRMSG("Can't read the dump memory(%s). %s\n",
+			    name_memory, strerror(errno));
+			return FALSE;
+		}
+		if (strncmp(VMCOREINFO_NOTE_NAME, buf,
+				VMCOREINFO_NOTE_NAME_BYTES)) {
+			offset += offset_next_note(note);
+			continue;
+		}
+
+		/*
+		 * Now copy VMCOREINFO_NOTE to examine its contents.
+		 * We need to parse it to check if the CRASH_REASON=PANIC_MCE.
+		 */
+		note_size = offset_next_note(note);
+
+		vmcoreinfo_note = malloc(note_size);
+		if(!vmcoreinfo_note) {
+			ERRMSG("Can't allocate memory for the vmcoreinfo note."
+				"%s\n", strerror(errno));
+			return FALSE;
+		}
+		if (read(fd_memory, vmcoreinfo_note, note_size) != note_size) {
+			ERRMSG("Can't read the dump memory(%s). %s\n",
+			    name_memory, strerror(errno));
+			goto exit;
+		}
+		if (strstr(vmcoreinfo_note, CRASH_REASON_PANIC_MCE))
+			ret = TRUE;
+		break;
+	}
+exit:
+	free(vmcoreinfo_note);
+	return ret;
+}
+
 static int
 get_pt_note_info(void)
 {
diff --git a/elf_info.h b/elf_info.h
index 4dff9c1..0437481 100644
--- a/elf_info.h
+++ b/elf_info.h
@@ -34,6 +34,8 @@ unsigned long long get_max_paddr(void);
 int get_elf64_ehdr(int fd, char *filename, Elf64_Ehdr *ehdr);
 int get_elf32_ehdr(int fd, char *filename, Elf32_Ehdr *ehdr);
 int get_elf_info(int fd, char *filename);
+int is_crash_by_mce(void);
+
 void free_elf_info(void);
 
 int is_elf64_memory(void);
diff --git a/makedumpfile.c b/makedumpfile.c
index 7b7c266..15efa90 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -4173,7 +4173,11 @@ write_elf_pages(struct cache_data *cd_header, struct cache_data *cd_page)
 		if (!get_phdr_memory(i, &load))
 			return FALSE;
 
-		if (load.p_type != PT_LOAD)
+		/*
+		 * Do not capture the kernel's memory if flag_nocoredump is
+		 * turned on. This may be dangerous to the system stability.
+		 */
+		if ((load.p_type != PT_LOAD) || (info->flag_nocoredump))
 			continue;
 
 		off_memory= load.p_offset;
@@ -5760,6 +5764,13 @@ create_dumpfile(void)
 		if (!get_elf_info(info->fd_memory, info->name_memory))
 			return FALSE;
 	}
+	/*
+	 * If the VMCOREINFO note carries CRASH_REASON=PANIC_MCE, indicate the
+	 * same through the 'flag_nocoredump' flag. The resultant slimdump will
+	 * always be in ELF format, irrespective of the user options.
+	 */
+	info->flag_nocoredump = info->flag_elf_dumpfile = is_crash_by_mce();
+
 	if (is_xen_memory()) {
 		if (!initial_xen())
 			return FALSE;
diff --git a/makedumpfile.h b/makedumpfile.h
index f0e5da8..faf1c65 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -778,6 +778,7 @@ struct DumpInfo {
 	int		flag_exclude_xen_dom;/* exclude Domain-U from xen-kdump */
 	int             flag_dmesg;          /* dump the dmesg log out of the vmcore file */
 	int		flag_nospace;	     /* the flag of "No space on device" error */
+	int		flag_nocoredump;	/* coredump not collected */
 	unsigned long	vaddr_for_vtop;      /* virtual address for debugging */
 	long		page_size;           /* size of page */
 	long		page_shift;



* Re: [RFC Patch 1/2][slimdump] Append CRASH_REASON to VMCOREINFO elf-note
  2011-11-21 10:11 ` [RFC Patch 1/2][slimdump] Append CRASH_REASON to VMCOREINFO elf-note K.Prasad
@ 2011-11-21 15:11   ` Vivek Goyal
  2011-11-23 16:14     ` K.Prasad
  2011-11-21 15:19   ` Dave Anderson
  1 sibling, 1 reply; 15+ messages in thread
From: Vivek Goyal @ 2011-11-21 15:11 UTC (permalink / raw)
  To: K.Prasad
  Cc: linux-kernel, Borislav Petkov, Luck, Tony, Eric W. Biederman,
	anderson, tachibana, oomichi, Valdis.Kletnieks, Nick Bowler

On Mon, Nov 21, 2011 at 03:41:57PM +0530, K.Prasad wrote:
> Allow various crash paths to append the reason of crash into the
> VMCOREINFO elf-note through the field CRASH_REASON. We also make the
> fatal machine check exceptions append "PANIC_MCE" as the crash reason.
> This string will be recognised by upstream tools like makedumpfile and
> crash to generate slimdump.
> 
> With increased usage of the CRASH_REASON field, the crash strings can be
> encoded for better usage.
> 
> Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |    8 ++++++++
>  kernel/kexec.c                   |    6 ++++++
>  2 files changed, 14 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 362056a..5b2cb6a 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -38,6 +38,7 @@
>  #include <linux/debugfs.h>
>  #include <linux/irq_work.h>
>  #include <linux/export.h>
> +#include <linux/kexec.h>
>  
>  #include <asm/processor.h>
>  #include <asm/mce.h>
> @@ -240,6 +241,13 @@ static atomic_t mce_paniced;
>  static int fake_panic;
>  static atomic_t mce_fake_paniced;
>  
> +char *arch_add_crash_reason(void)
> +{
> +	static char crash_reason[] = "PANIC_MCE";
> +
> +	return crash_reason;
> +}
> +
>  /* Panic in progress. Enable interrupts and wait for final IPI */
>  static void wait_for_panic(void)
>  {
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index dc7bc08..a731693 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -1080,6 +1080,11 @@ asmlinkage long compat_sys_kexec_load(unsigned long entry,
>  }
>  #endif
>  
> +__weak char *arch_add_crash_reason(void)
> +{
> +	return (char *)NULL;
> +}
> +
>  void crash_kexec(struct pt_regs *regs)
>  {
>  	/* Take the kexec_mutex here to prevent sys_kexec_load
> @@ -1411,6 +1416,7 @@ static void update_vmcoreinfo_note(void)
>  void crash_save_vmcoreinfo(void)
>  {
>  	vmcoreinfo_append_str("CRASHTIME=%ld", get_seconds());
> +	vmcoreinfo_append_str("\nCRASH_REASON=%s\n", arch_add_crash_reason());

I think we shouldn't create a CRASH_REASON= entry at all if the arch
returns a NULL string.
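With the patch as posted, the weak default returns NULL, and the kernel's vsnprintf() prints a NULL %s argument as "(null)", so every non-MCE dump would carry CRASH_REASON=(null). Below is a userspace model of the suggested change; the buffer, vmcoreinfo_append_str() and vmcoreinfo_reset() here are stand-ins for the kernel's code, and the sketch assumes the newline is moved to the end of the CRASHTIME line:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-ins for the kernel's VMCOREINFO buffer and helper. */
static char vmcoreinfo_data[4096];
static size_t vmcoreinfo_size;

static void vmcoreinfo_reset(void)
{
	memset(vmcoreinfo_data, 0, sizeof(vmcoreinfo_data));
	vmcoreinfo_size = 0;
}

static void vmcoreinfo_append_str(const char *fmt, ...)
{
	size_t room = sizeof(vmcoreinfo_data) - vmcoreinfo_size;
	va_list args;
	int r;

	assert(fmt != NULL && room > 0);
	va_start(args, fmt);
	r = vsnprintf(vmcoreinfo_data + vmcoreinfo_size, room, fmt, args);
	va_end(args);
	if (r < 0)
		return;
	/* Truncate silently once the buffer is full. */
	vmcoreinfo_size += ((size_t)r < room) ? (size_t)r : room - 1;
}

/*
 * Sketch of the suggested change: no CRASH_REASON line at all when the
 * architecture supplies no reason (REASON == NULL).
 */
static void crash_save_vmcoreinfo_sketch(long crashtime, const char *reason)
{
	vmcoreinfo_append_str("CRASHTIME=%ld\n", crashtime);
	if (reason)
		vmcoreinfo_append_str("CRASH_REASON=%s\n", reason);
}
```

Omitting the line, rather than writing a placeholder value, lets tools treat "no CRASH_REASON key" as "no special handling" without having to know about "(null)".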

Vivek


* Re: [RFC Patch 0/2] Slimdump framework using CRASH_REASON - v2
  2011-11-21  9:54 [RFC Patch 0/2] Slimdump framework using CRASH_REASON - v2 K.Prasad
  2011-11-21 10:11 ` [RFC Patch 1/2][slimdump] Append CRASH_REASON to VMCOREINFO elf-note K.Prasad
  2011-11-21 10:14 ` [RFC Patch 2/2][slimdump][makedumpfile] Recognise PANIC_MCE crashes to generate slimdu K.Prasad
@ 2011-11-21 15:17 ` Vivek Goyal
  2011-11-23 17:33   ` K.Prasad
  2 siblings, 1 reply; 15+ messages in thread
From: Vivek Goyal @ 2011-11-21 15:17 UTC (permalink / raw)
  To: K.Prasad
  Cc: linux-kernel, Borislav Petkov, Luck, Tony, Eric W. Biederman,
	anderson, tachibana, oomichi, Valdis.Kletnieks, Nick Bowler

On Mon, Nov 21, 2011 at 03:24:05PM +0530, K.Prasad wrote:
> Hi All,
> 	In furtherance of the previous discussion regarding 'slimdump'
> (refer: http://article.gmane.org/gmane.linux.kernel/1204967), it was
> decided that,
> 
> - An entry in VMCOREINFO elf-note be added to denote the cause of crash,
>   instead of creating a new elf-note.
> 
> - Upstream tools such as 'makedumpfile' and 'crash' be modified to
>   recognise this string and inform the user accordingly.
> 
> Accordingly, this new version of the patchset makes the following
> changes 
> 
> Changelog - version 2
> -----------------------
> (First version posted here:
> http://article.gmane.org/gmane.linux.kernel/1198435)
> 
> - Append VMCOREINFO elf-note with a new variable CRASH_REASON whose
>   value will be populated using arch_add_crash_reason() function.
> 
> - Define arch_add_crash_reason() in the x86 MCE path to return "PANIC_MCE"
>   in the panic path of MCE.
> 
> - 'makedumpfile' tool is taught to recognise PANIC_MCE string as one
>   value of CRASH_REASON for which 'slimdump' must be captured.

So again, what is slimdump? I mean, what information is now being captured
in the case of slimdump? Are you capturing at least the kernel message
buffers? I am assuming that any register info emitted on the console will
make it into the kernel buffers, and that should be useful to figure out
which MCE happened.

Thanks
Vivek


* Re: [RFC Patch 1/2][slimdump] Append CRASH_REASON to VMCOREINFO elf-note
  2011-11-21 10:11 ` [RFC Patch 1/2][slimdump] Append CRASH_REASON to VMCOREINFO elf-note K.Prasad
  2011-11-21 15:11   ` Vivek Goyal
@ 2011-11-21 15:19   ` Dave Anderson
  2011-11-23 17:39     ` K.Prasad
  2011-11-23 17:42     ` K.Prasad
  1 sibling, 2 replies; 15+ messages in thread
From: Dave Anderson @ 2011-11-21 15:19 UTC (permalink / raw)
  To: prasad
  Cc: Vivek Goyal, Borislav Petkov, Tony Luck, Eric W. Biederman,
	tachibana, oomichi, Valdis Kletnieks, Nick Bowler, linux-kernel



----- Original Message -----
> Allow various crash paths to append the reason of crash into the
> VMCOREINFO elf-note through the field CRASH_REASON. We also make the
> fatal machine check exceptions append "PANIC_MCE" as the crash
> reason.  This string will be recognised by upstream tools like makedumpfile and
> crash to generate slimdump.

I don't understand -- how could "various paths" append a reason?
The patch below seems to return "PANIC_MCE" for every x86 crash.
What am I missing?

Dave
 
> With increased usage of the CRASH_REASON field, the crash strings can be
> encoded for better usage.
> 
> Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |    8 ++++++++
>  kernel/kexec.c                   |    6 ++++++
>  2 files changed, 14 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c
> b/arch/x86/kernel/cpu/mcheck/mce.c
> index 362056a..5b2cb6a 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -38,6 +38,7 @@
>  #include <linux/debugfs.h>
>  #include <linux/irq_work.h>
>  #include <linux/export.h>
> +#include <linux/kexec.h>
>  
>  #include <asm/processor.h>
>  #include <asm/mce.h>
> @@ -240,6 +241,13 @@ static atomic_t mce_paniced;
>  static int fake_panic;
>  static atomic_t mce_fake_paniced;
>  
> +char *arch_add_crash_reason(void)
> +{
> +	static char crash_reason[] = "PANIC_MCE";
> +
> +	return crash_reason;
> +}
> +
>  /* Panic in progress. Enable interrupts and wait for final IPI */
>  static void wait_for_panic(void)
>  {
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index dc7bc08..a731693 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -1080,6 +1080,11 @@ asmlinkage long compat_sys_kexec_load(unsigned
> long entry,
>  }
>  #endif
>  
> +__weak char *arch_add_crash_reason(void)
> +{
> +	return (char *)NULL;
> +}
> +
>  void crash_kexec(struct pt_regs *regs)
>  {
>  	/* Take the kexec_mutex here to prevent sys_kexec_load
> @@ -1411,6 +1416,7 @@ static void update_vmcoreinfo_note(void)
>  void crash_save_vmcoreinfo(void)
>  {
>  	vmcoreinfo_append_str("CRASHTIME=%ld", get_seconds());
> +	vmcoreinfo_append_str("\nCRASH_REASON=%s\n", arch_add_crash_reason());
>  	update_vmcoreinfo_note();
>  }
>  
> 
> 


* Re: [RFC Patch 1/2][slimdump] Append CRASH_REASON to VMCOREINFO elf-note
  2011-11-21 15:11   ` Vivek Goyal
@ 2011-11-23 16:14     ` K.Prasad
  0 siblings, 0 replies; 15+ messages in thread
From: K.Prasad @ 2011-11-23 16:14 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-kernel, Borislav Petkov, Luck, Tony, Eric W. Biederman,
	anderson, tachibana, oomichi, Valdis.Kletnieks, Nick Bowler

On Mon, Nov 21, 2011 at 10:11:57AM -0500, Vivek Goyal wrote:
> On Mon, Nov 21, 2011 at 03:41:57PM +0530, K.Prasad wrote:
> > Allow various crash paths to append the reason of crash into the
> > VMCOREINFO elf-note through the field CRASH_REASON. We also make the
> > fatal machine check exceptions append "PANIC_MCE" as the crash reason.
> > This string will be recognised by upstream tools like makedumpfile and
> > crash to generate slimdump.
> > 
> > With increased usage of the CRASH_REASON field, the crash strings can be
> > encoded for better usage.
> > 

[snipped]

> > diff --git a/kernel/kexec.c b/kernel/kexec.c
> > index dc7bc08..a731693 100644
> > --- a/kernel/kexec.c
> > +++ b/kernel/kexec.c
> > @@ -1080,6 +1080,11 @@ asmlinkage long compat_sys_kexec_load(unsigned long entry,
> >  }
> >  #endif
> >  
> > +__weak char *arch_add_crash_reason(void)
> > +{
> > +	return (char *)NULL;
> > +}
> > +
> >  void crash_kexec(struct pt_regs *regs)
> >  {
> >  	/* Take the kexec_mutex here to prevent sys_kexec_load
> > @@ -1411,6 +1416,7 @@ static void update_vmcoreinfo_note(void)
> >  void crash_save_vmcoreinfo(void)
> >  {
> >  	vmcoreinfo_append_str("CRASHTIME=%ld", get_seconds());
> > +	vmcoreinfo_append_str("\nCRASH_REASON=%s\n", arch_add_crash_reason());
> 
> I think we shouldn't create a CRASH_REASON= entry at all if the arch
> returns a NULL string.
>

Yes, we could do that. I'll change the code accordingly in the next
revision of the patchset.

Thanks,
K.Prasad
 



* Re: [RFC Patch 0/2] Slimdump framework using CRASH_REASON - v2
  2011-11-21 15:17 ` [RFC Patch 0/2] Slimdump framework using CRASH_REASON - v2 Vivek Goyal
@ 2011-11-23 17:33   ` K.Prasad
  2011-11-28 14:24     ` Vivek Goyal
  0 siblings, 1 reply; 15+ messages in thread
From: K.Prasad @ 2011-11-23 17:33 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-kernel, Borislav Petkov, Luck, Tony, Eric W. Biederman,
	anderson, tachibana, oomichi, Valdis.Kletnieks, Nick Bowler

On Mon, Nov 21, 2011 at 10:17:27AM -0500, Vivek Goyal wrote:
> On Mon, Nov 21, 2011 at 03:24:05PM +0530, K.Prasad wrote:
> > Hi All,
> > 	In furtherance of the previous discussion regarding 'slimdump'
> > (refer: http://article.gmane.org/gmane.linux.kernel/1204967), it was
> > decided that,
> > 
> > - An entry in VMCOREINFO elf-note be added to denote the cause of crash,
> >   instead of creating a new elf-note.
> > 
> > - Upstream tools such as 'makedumpfile' and 'crash' be modified to
> >   recognise this string and inform the user accordingly.
> > 
> > Accordingly, this new version of the patchset makes the following
> > changes 
> > 
> > Changelog - version 2
> > -----------------------
> > (First version posted here:
> > http://article.gmane.org/gmane.linux.kernel/1198435)
> > 
> > - Append VMCOREINFO elf-note with a new variable CRASH_REASON whose
> >   value will be populated using arch_add_crash_reason() function.
> > 
> > - Define arch_add_crash_reason() in the x86 MCE path to return "PANIC_MCE"
> >   in the panic path of MCE.
> > 
> > - 'makedumpfile' tool is taught to recognise PANIC_MCE string as one
> >   value of CRASH_REASON for which 'slimdump' must be captured.
> 
> So again, what is slimdump? I mean, what information is now being captured
> in the case of slimdump? Are you capturing at least the kernel message
> buffers? I am assuming that any register info emitted on the console will
> make it into the kernel buffers, and that should be useful to figure out
> which MCE happened.
> 

The kernel message buffers can be obtained using makedumpfile's
--dump-dmesg option, but again that's risky: we can't know whether doing
so will access the faulty memory (which is why the previous method, which
placed a new elf-note in a pristine location, is much safer).

The method in this patch is quite primitive in that it tells the user
nothing more than a one-line cause of the crash. One must rely on other
sources (such as service processor/firmware/ACPI logs, or earlier
corrected-error logs) to infer the location of the bad memory. It would
have been helpful to provide a dump of related information, such as the
contents of "struct mce" (as in the very first iteration of the patch,
refer: http://article.gmane.org/gmane.linux.kernel/1146215), but that
would need some memory (possibly in the form of a new elf-note).

Thanks,
K.Prasad



* Re: [RFC Patch 1/2][slimdump] Append CRASH_REASON to VMCOREINFO elf-note
  2011-11-21 15:19   ` Dave Anderson
@ 2011-11-23 17:39     ` K.Prasad
  2011-11-28 14:26       ` Vivek Goyal
  2011-11-23 17:42     ` K.Prasad
  1 sibling, 1 reply; 15+ messages in thread
From: K.Prasad @ 2011-11-23 17:39 UTC (permalink / raw)
  To: Dave Anderson
  Cc: Vivek Goyal, Borislav Petkov, Tony Luck, Eric W. Biederman,
	tachibana, oomichi, Valdis Kletnieks, Nick Bowler, linux-kernel

On Mon, Nov 21, 2011 at 10:19:31AM -0500, Dave Anderson wrote:
> 
> 
> ----- Original Message -----
> > Allow various crash paths to append the reason of crash into the
> > VMCOREINFO elf-note through the field CRASH_REASON. We also make the
> > fatal machine check exceptions append "PANIC_MCE" as the crash
> > reason.  This string will be recognised by upstream tools like makedumpfile and
> > crash to generate slimdump.
> 
> I don't understand -- how could "various paths" append a reason?
> The patch below seems to return "PANIC_MCE" for every x86 crash.
> What am I missing?
> 
> Dave
> 

Yes, presently it can only be "PANIC_MCE" for MCE crashes in x86 (not
for every crash though).

As usage grows, we should move this code to a generic location and let
each crash path supply a string to be appended. In fact, CRASH_REASON
need not be a string at all; it could be a numeric encoding of the
various crash types, which user-space tools would then look up to get
the corresponding crash string.
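A numeric encoding implies a code-to-name table that user-space tools must carry and keep in step with the kernel. A minimal sketch of such a table, with hypothetical codes and names that are not part of any kernel ABI:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical numeric codes; nothing like this exists in the kernel ABI. */
enum crash_reason_code {
	CRASH_REASON_NONE = 0,
	CRASH_REASON_PANIC_MCE = 1,
	/* new codes may only be appended, never renumbered */
};

/* Table every user-space tool would have to ship and keep up to date. */
static const char *const crash_reason_names[] = {
	[CRASH_REASON_NONE] = "NONE",
	[CRASH_REASON_PANIC_MCE] = "PANIC_MCE",
};

#define NR_CRASH_REASONS \
	(sizeof(crash_reason_names) / sizeof(crash_reason_names[0]))

/* Code -> name; codes from a newer kernel come back as "UNKNOWN". */
static const char *crash_reason_name(unsigned int code)
{
	if (code >= NR_CRASH_REASONS || crash_reason_names[code] == NULL)
		return "UNKNOWN";
	return crash_reason_names[code];
}

/* Name -> code, or -1 if the tool does not know the name. */
static int crash_reason_lookup(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < NR_CRASH_REASONS; i++)
		if (crash_reason_names[i] &&
		    strcmp(crash_reason_names[i], name) == 0)
			return (int)i;
	return -1;
}
```

The "UNKNOWN" fallback is where version skew shows up: a tool built against an older table cannot name a code added by a newer kernel, whereas a stored string remains readable as-is.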

Thanks,
K.Prasad



* Re: [RFC Patch 1/2][slimdump] Append CRASH_REASON to VMCOREINFO elf-note
  2011-11-21 15:19   ` Dave Anderson
  2011-11-23 17:39     ` K.Prasad
@ 2011-11-23 17:42     ` K.Prasad
  2011-11-23 19:45       ` Dave Anderson
  1 sibling, 1 reply; 15+ messages in thread
From: K.Prasad @ 2011-11-23 17:42 UTC (permalink / raw)
  To: Dave Anderson
  Cc: Vivek Goyal, Borislav Petkov, Tony Luck, Eric W. Biederman,
	tachibana, oomichi, Valdis Kletnieks, Nick Bowler, linux-kernel

On Mon, Nov 21, 2011 at 10:19:31AM -0500, Dave Anderson wrote:
> 
> 
> ----- Original Message -----
> > Allow various crash paths to append the reason of crash into the
> > VMCOREINFO elf-note through the field CRASH_REASON. We also make the
> > fatal machine check exceptions append "PANIC_MCE" as the crash
> > reason.  This string will be recognised by upstream tools like makedumpfile and
> > crash to generate slimdump.
> 
> I don't understand -- how could "various paths" append a reason?
> The patch below seems to return "PANIC_MCE" for every x86 crash.
> What am I missing?
> 
> Dave
>

Yes, presently it can only be "PANIC_MCE" for MCE crashes in x86 (not
for every crash though).

As usage grows, we should move this code to a generic location and let
each crash path supply a string to be appended. In fact, CRASH_REASON
need not be a string at all; it could be a numeric encoding of the
various crash types, which user-space tools would then look up to get
the corresponding crash string.

Thanks,
K.Prasad
 



* Re: [RFC Patch 1/2][slimdump] Append CRASH_REASON to VMCOREINFO elf-note
  2011-11-23 17:42     ` K.Prasad
@ 2011-11-23 19:45       ` Dave Anderson
  2011-11-29 14:37         ` K.Prasad
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Anderson @ 2011-11-23 19:45 UTC (permalink / raw)
  To: prasad
  Cc: Vivek Goyal, Borislav Petkov, Tony Luck, Eric W. Biederman,
	tachibana, oomichi, Valdis Kletnieks, Nick Bowler, linux-kernel



----- Original Message -----
> On Mon, Nov 21, 2011 at 10:19:31AM -0500, Dave Anderson wrote:
> > 
> > 
> > ----- Original Message -----
> > > Allow various crash paths to append the reason of crash into the
> > > VMCOREINFO elf-note through the field CRASH_REASON. We also make the
> > > fatal machine check exceptions append "PANIC_MCE" as the crash
> > > reason.  This string will be recognised by upstream tools like
> > > makedumpfile and crash to generate slimdump.
> > 
> > I don't understand -- how could "various paths" append a reason?
> > The patch below seems to return "PANIC_MCE" for every x86 crash.
> > What am I missing?
> > 
> > Dave
> >
> 
> Yes, presently it can only be "PANIC_MCE" for MCE crashes in x86 (not
> for every crash though).

Why only MCE crashes?  If your arch/x86/kernel/cpu/mcheck/mce.c is 
compiled into any x86 kernel, then its arch_add_crash_reason() will 
always override the weak version in kernel/kexec.c, right? 
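Dave's point is that a strong definition anywhere in the link always replaces the __weak one, so the reported reason has to depend on what actually happened at run time rather than on which file was compiled in. One way to do that, sketched here in userspace with hypothetical names (set_crash_reason(), get_crash_reason(), crash_reason_is() and mce_panic_path() do not exist in the kernel), is a variable that is set only on the MCE panic path:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical generic variable: stays NULL unless a crash path sets it. */
static const char *crash_reason;

/* Record why we are about to panic; the first reason recorded wins. */
static void set_crash_reason(const char *reason)
{
	assert(reason != NULL);
	if (!crash_reason)
		crash_reason = reason;
}

/* What crash_save_vmcoreinfo() would consult instead of a __weak hook. */
static const char *get_crash_reason(void)
{
	return crash_reason;
}

static int crash_reason_is(const char *reason)
{
	return crash_reason != NULL && strcmp(crash_reason, reason) == 0;
}

/* Stand-in for the fatal-MCE path, e.g. just before mce_panic() panics. */
static void mce_panic_path(void)
{
	set_crash_reason("PANIC_MCE");
	/* panic() would follow here */
}
```

In the kernel, several CPUs may panic concurrently, so the plain check-then-store above would need to become an atomic update such as cmpxchg().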

Dave
 


* Re: [RFC Patch 0/2] Slimdump framework using CRASH_REASON - v2
  2011-11-23 17:33   ` K.Prasad
@ 2011-11-28 14:24     ` Vivek Goyal
  2011-11-30 17:15       ` K.Prasad
  0 siblings, 1 reply; 15+ messages in thread
From: Vivek Goyal @ 2011-11-28 14:24 UTC (permalink / raw)
  To: K.Prasad
  Cc: linux-kernel, Borislav Petkov, Luck, Tony, Eric W. Biederman,
	anderson, tachibana, oomichi, Valdis.Kletnieks, Nick Bowler

On Wed, Nov 23, 2011 at 11:03:18PM +0530, K.Prasad wrote:
> On Mon, Nov 21, 2011 at 10:17:27AM -0500, Vivek Goyal wrote:
> > On Mon, Nov 21, 2011 at 03:24:05PM +0530, K.Prasad wrote:
> > > Hi All,
> > > 	In furtherance of the previous discussion regarding 'slimdump'
> > > (refer: http://article.gmane.org/gmane.linux.kernel/1204967), it was
> > > decided that,
> > > 
> > > - An entry in VMCOREINFO elf-note be added to denote the cause of crash,
> > >   instead of creating a new elf-note.
> > > 
> > > - Upstream tools such as 'makedumpfile' and 'crash' be modified to
> > >   recognise this string and inform the user accordingly.
> > > 
> > > Accordingly, this new version of the patchset makes the following
> > > changes 
> > > 
> > > Changelog - version 2
> > > -----------------------
> > > (First version posted here:
> > > http://article.gmane.org/gmane.linux.kernel/1198435)
> > > 
> > > - Append VMCOREINFO elf-note with a new variable CRASH_REASON whose
> > >   value will be populated using arch_add_crash_reason() function.
> > > 
> > > - Define arch_add_crash_reason() in the x86 MCE path to return "PANIC_MCE"
> > >   in the panic path of MCE.
> > > 
> > > - 'makedumpfile' tool is taught to recognise PANIC_MCE string as one
> > >   value of CRASH_REASON for which 'slimdump' must be captured.
> > 
> > So again, what is slimdump? I mean, what information is now being captured
> > in the case of slimdump? Are you capturing at least the kernel message
> > buffers? I am assuming that any register info emitted on the console will
> > make it into the kernel buffers, and that should be useful to figure out
> > which MCE happened.
> > 
> 
> The kernel message buffers can be obtained using makedumpfile's
> --dump-dmesg option, but again that's risky: we can't know whether doing
> so will access the faulty memory (which is why the previous method, which
> placed a new elf-note in a pristine location, is much safer).
> 
> The method in this patch is quite primitive in that it tells the user
> nothing more than a one-line cause of the crash. One must rely on other
> sources (such as service processor/firmware/ACPI logs, or earlier
> corrected-error logs) to infer the location of the bad memory.

And how does one get at the firmware/ACPI logs? Also, many systems don't
have a service processor.

I think extracting the kernel message buffers by default in case of MCE
is reasonable. This should allow somebody to figure out some MCE-related
information.

You might want to modify makedumpfile so that it does not try to access
pages marked poisoned.
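Skipping poisoned pages amounts to a filter over page frames: test the PG_hwpoison bit in each frame's flags and leave the frame out of the dump if it is set. A minimal sketch, assuming the tool already has each frame's page->flags word; because the bit number depends on the dumped kernel's configuration, it is passed in as a parameter rather than hard-coded, and how a tool would learn it is left open. mark_readable_pages() is a hypothetical name:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical filter: given each page frame's page->flags word and the
 * PG_hwpoison bit number used by the dumped kernel, set readable[i] to 0
 * for poisoned frames and 1 otherwise. Returns the number of frames to skip.
 */
static size_t mark_readable_pages(const unsigned long *page_flags,
				  size_t nr_pages, unsigned int hwpoison_bit,
				  unsigned char *readable)
{
	size_t i, skipped = 0;

	assert(hwpoison_bit < sizeof(unsigned long) * CHAR_BIT);
	for (i = 0; i < nr_pages; i++) {
		unsigned int poisoned = (page_flags[i] >> hwpoison_bit) & 1UL;

		readable[i] = (unsigned char)!poisoned;
		skipped += poisoned;
	}
	return skipped;
}
```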

Thanks
Vivek


* Re: [RFC Patch 1/2][slimdump] Append CRASH_REASON to VMCOREINFO elf-note
  2011-11-23 17:39     ` K.Prasad
@ 2011-11-28 14:26       ` Vivek Goyal
  0 siblings, 0 replies; 15+ messages in thread
From: Vivek Goyal @ 2011-11-28 14:26 UTC (permalink / raw)
  To: K.Prasad
  Cc: Dave Anderson, Borislav Petkov, Tony Luck, Eric W. Biederman,
	tachibana, oomichi, Valdis Kletnieks, Nick Bowler, linux-kernel

On Wed, Nov 23, 2011 at 11:09:31PM +0530, K.Prasad wrote:
> On Mon, Nov 21, 2011 at 10:19:31AM -0500, Dave Anderson wrote:
> > 
> > 
> > ----- Original Message -----
> > > Allow various crash paths to append the reason of crash into the
> > > VMCOREINFO elf-note through the field CRASH_REASON. We also make the
> > > fatal machine check exceptions append "PANIC_MCE" as the crash
> > > reason.  This string will be recognised by upstream tools like makedumpfile and
> > > crash to generate slimdump.
> > 
> > I don't understand -- how could "various paths" append a reason?
> > The patch below seems to return "PANIC_MCE" for every x86 crash.
> > What am I missing?
> > 
> > Dave
> > 
> 
> Yes, presently it can only be "PANIC_MCE" for MCE crashes in x86 (not
> for every crash though).
> 
> As usage grows, we should move this code to a generic location and let
> each crash path supply a string to be appended. In fact, CRASH_REASON
> need not be a string at all; it could be a numeric encoding of the
> various crash types, which user-space tools would then look up to get
> the corresponding crash string.

Probably a string is a better idea? Where do we do the lookup to find out what
maps to what? This would require the kernel to export this info in a header,
and then comes the issue of having the same kernel version info. Or the issue
of analyzing kernel dumps on the same machine.

Storing a string will at least mean that one does not have to worry about
mapping error codes. makedumpfile will have to grep for the exact same string,
though.
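With a string, the dump tool only needs an exact text match. The following is a sketch of that check. It assumes the VMCOREINFO note has already been read into memory as its usual `KEY=value` lines. The function name is hypothetical, and this is not makedumpfile's actual parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CRASH_REASON_KEY "CRASH_REASON="

/* Hypothetical helper: scan VMCOREINFO text ("KEY=value\n" lines) and
 * return 1 if it contains a CRASH_REASON line whose value is exactly
 * 'reason', 0 otherwise. 'info' need not be NUL-terminated. */
static int vmcoreinfo_crash_reason_is(const char *info, size_t len,
				      const char *reason)
{
	const size_t key_len = strlen(CRASH_REASON_KEY);
	const size_t reason_len = strlen(reason);
	const char *p = info, *end = info + len;

	assert(info != NULL && reason != NULL);
	while (p < end) {
		const char *eol = memchr(p, '\n', (size_t)(end - p));
		size_t line_len;

		if (eol == NULL)
			eol = end;
		line_len = (size_t)(eol - p);

		/* Exact match: the key, then the full value, nothing more. */
		if (line_len == key_len + reason_len &&
		    memcmp(p, CRASH_REASON_KEY, key_len) == 0 &&
		    memcmp(p + key_len, reason, reason_len) == 0)
			return 1;

		if (eol == end)
			break;
		p = eol + 1;
	}
	return 0;
}
```

Requiring the whole line to match means a longer value such as a hypothetical `PANIC_MCE_X` is not mistaken for `PANIC_MCE`.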

Thanks
Vivek

* Re: [RFC Patch 1/2][slimdump] Append CRASH_REASON to VMCOREINFO elf-note
  2011-11-23 19:45       ` Dave Anderson
@ 2011-11-29 14:37         ` K.Prasad
  0 siblings, 0 replies; 15+ messages in thread
From: K.Prasad @ 2011-11-29 14:37 UTC (permalink / raw)
  To: Dave Anderson
  Cc: Vivek Goyal, Borislav Petkov, Tony Luck, Eric W. Biederman,
	tachibana, oomichi, Valdis Kletnieks, Nick Bowler, linux-kernel

On Wed, Nov 23, 2011 at 02:45:23PM -0500, Dave Anderson wrote:
> 
> 
> ----- Original Message -----
> > On Mon, Nov 21, 2011 at 10:19:31AM -0500, Dave Anderson wrote:
> > > 
> > > 
> > > ----- Original Message -----
> > > > Allow various crash paths to append the reason of crash into the
> > > > VMCOREINFO elf-note through the field CRASH_REASON. We also make the
> > > > fatal machine check exceptions append "PANIC_MCE" as the crash
> > > > reason.  This string will be recognised by upstream tools like
> > > > makedumpfile and crash to generate slimdump.
> > > 
> > > I don't understand -- how could "various paths" append a reason?
> > > The patch below seems to return "PANIC_MCE" for every x86 crash.
> > > What am I missing?
> > > 
> > > Dave
> > >
> > 
> > Yes, presently it can only be "PANIC_MCE" for MCE crashes in x86 (not
> > for every crash though).
> 
> Why only MCE crashes?  If your arch/x86/kernel/cpu/mcheck/mce.c is 
> compiled into any x86 kernel, then its arch_add_crash_reason() will 
> always override the weak version in kernel/kexec.c, right? 
>

(Sorry for the delayed reply... there was a multiple-day outage on my
email server.)

Yes, I think it would be better to use a variable that allows the
various crash paths of each architecture to populate the reason. (The
function approach doesn't work; we could use a function pointer that
just returns a pointer to a char array, but that is less preferable
and more convoluted.)

I'll change the code to contain a char pointer in a generic file, say
arch/x86/kernel/crash.c, which will be populated by the MCE crash path.
The code in kernel/kexec.c can just use this.
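The variable-based design above can be sketched in ordinary user-space C so that it compiles on its own. `crash_reason`, `mce_set_crash_reason()` and `append_crash_reason()` are hypothetical names. In the kernel, the append step would go into whatever code adds entries to the VMCOREINFO note at crash time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical global: NULL unless some crash path sets it. */
static const char *crash_reason;

/* Would be called only from the fatal-MCE panic path (hypothetical). */
static void mce_set_crash_reason(void)
{
	crash_reason = "PANIC_MCE";
}

/* Append "CRASH_REASON=<reason>\n" to the note buffer if a reason was
 * recorded. Returns the number of bytes written, or 0 if no reason was
 * recorded or it did not fit (in which case 'buf' may hold a truncated
 * copy). */
static size_t append_crash_reason(char *buf, size_t size)
{
	int n;

	assert(buf != NULL);
	if (crash_reason == NULL)
		return 0;
	n = snprintf(buf, size, "CRASH_REASON=%s\n", crash_reason);
	if (n < 0 || (size_t)n >= size)
		return 0;
	return (size_t)n;
}
```

Because the pointer starts out NULL, only a crash that actually went through the MCE panic path reports `PANIC_MCE`. This addresses Dave's point that a link-time override of a weak function applies to every crash on any kernel built with mce.c.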

Thanks for your comments.

--K.Prasad

* Re: [RFC Patch 0/2] Slimdump framework using CRASH_REASON - v2
  2011-11-28 14:24     ` Vivek Goyal
@ 2011-11-30 17:15       ` K.Prasad
  0 siblings, 0 replies; 15+ messages in thread
From: K.Prasad @ 2011-11-30 17:15 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-kernel, Borislav Petkov, Luck, Tony, Eric W. Biederman,
	anderson, tachibana, oomichi, Valdis.Kletnieks, Nick Bowler

On Mon, Nov 28, 2011 at 09:24:02AM -0500, Vivek Goyal wrote:
> On Wed, Nov 23, 2011 at 11:03:18PM +0530, K.Prasad wrote:
> > On Mon, Nov 21, 2011 at 10:17:27AM -0500, Vivek Goyal wrote:
> > > On Mon, Nov 21, 2011 at 03:24:05PM +0530, K.Prasad wrote:
[snipped]
> > 
> > The kernel message buffers can be obtained by using the --dump-dmesg
> > option of makedumpfile, but again that's risky. We wouldn't know if it'll
> > cause access to the faulty memory (which is why the previous method of having
> > a new elf-note in a pristine location is much safer).
> > 
> > The method in this patch is quite primitive in that it informs the user of
> > nothing more than a one-line cause of crash. One should take help from other
> > tools (such as service processor/firmware/ACPI logs, or previous corrected
> > error logs) to infer the location of bad memory.
> 
> And how does one get to the firmware/ACPI logs? Many systems don't have a service
> processor either.
> 
> I think extracting kernel buffers by default in case of MCE is reasonable.
> This should allow somebody to figure out some MCE related information.
> 
> You might want to modify makedumpfile so that it does not try to access
> pages marked poisoned.
>

I'm not sure how easy or difficult it would be to skip hw-poisoned pages
from user space (i.e. in makedumpfile). I'll start working on the relevant
changes, though, and will keep the community posted with the patches.

Thanks,
K.Prasad

Thread overview: 15+ messages
2011-11-21  9:54 [RFC Patch 0/2] Slimdump framework using CRASH_REASON - v2 K.Prasad
2011-11-21 10:11 ` [RFC Patch 1/2][slimdump] Append CRASH_REASON to VMCOREINFO elf-note K.Prasad
2011-11-21 15:11   ` Vivek Goyal
2011-11-23 16:14     ` K.Prasad
2011-11-21 15:19   ` Dave Anderson
2011-11-23 17:39     ` K.Prasad
2011-11-28 14:26       ` Vivek Goyal
2011-11-23 17:42     ` K.Prasad
2011-11-23 19:45       ` Dave Anderson
2011-11-29 14:37         ` K.Prasad
2011-11-21 10:14 ` [RFC Patch 2/2][slimdump][makedumpfile] Recognise PANIC_MCE crashes to generate slimdu K.Prasad
2011-11-21 15:17 ` [RFC Patch 0/2] Slimdump framework using CRASH_REASON - v2 Vivek Goyal
2011-11-23 17:33   ` K.Prasad
2011-11-28 14:24     ` Vivek Goyal
2011-11-30 17:15       ` K.Prasad
