* [patch 0/9] kdump: Patch series for s390 support
@ 2011-07-04 17:09 ` Michael Holzheu
  0 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-04 17:09 UTC (permalink / raw)
  To: ebiederm, vgoyal, hbabu, mahesh
  Cc: oomichi, horms, schwidefsky, heiko.carstens, kexec, linux-kernel,
	linux-s390

This patch series adds kdump support for the s390 architecture (64 bit). A few
common code changes are necessary because the s390 implementation differs from
other architectures in some points. The common code patches (1-7) in particular
should be reviewed. Patch 8 "s390: kdump backend code" contains the
s390-specific part. Patch 9 includes the necessary changes for the kexec tool.

In the following I describe the main differences of the s390 implementation:

The s390 kernel is not relocatable. Therefore the crashkernel memory is swapped
with the area [0 - crashkernel memory] before the kdump kernel is started.
Architectures other than s390 run the kdump kernel at a memory location that is
disjoint from the standard location for the kernel image and from all memory
that might be in use for I/O by the production system. The main reason for this
seems to be that these architectures have no means to clear all ongoing I/O. If
the kdump kernel reused active memory of the production system, they would run
into memory corruption issues. On s390, diagnose call 308 or a boot (IPL) can
stop all ongoing I/O. Therefore we can safely run the kdump kernel at the old
location.

On s390 we do not create page tables for the crashkernel memory and use a
memcpy_real() function to load the kdump kernel and ramdisk in the kexec_load()
system call.

On s390 we have external kdump triggers, for example stand-alone dump tools.
The address range information of crashkernel memory is stored at a well-defined
storage location that can be used by the external dump triggers to find the
kdump entry point. To export the address range for the crashkernel memory we
introduce a new mechanism that we call meminfo. It allows checksum-secured
information to be defined in memory that is accessible via an s390 ABI-defined
storage address. The following information is currently stored via meminfo:
* Crashkernel memory range
* kexec segments for kdump
* Pointer to vmcoreinfo note

Checksums for the loaded kexec segments are stored. They can be used to verify
that kdump is not corrupted. The check is done e.g. by the s390 stand-alone
dump tools via meminfo. If kdump has NOT been overwritten, the checksums are
valid and kdump is started; otherwise a full-blown s390 stand-alone dump is
created as a backup dump mechanism.
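
The check described above can be sketched roughly as follows. This is only an
illustration under stated assumptions: struct seg_desc, seg_checksum(),
segs_record() and segs_intact() are hypothetical names, and the rotate-and-add
checksum is a placeholder, since this mail does not say which checksum
algorithm or meminfo layout the series uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one loaded kexec segment (not the series' layout). */
struct seg_desc {
	const unsigned char *mem;	/* start of the loaded segment */
	size_t size;			/* length in bytes */
	uint32_t csum;			/* checksum recorded at load time */
};

/* Placeholder checksum (rotate left by one, then add the next byte). */
static uint32_t seg_checksum(const unsigned char *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum = ((sum << 1) | (sum >> 31)) + p[i];
	return sum;
}

/* Record a checksum for every segment, as a load-time hook might do. */
static void segs_record(struct seg_desc *segs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		segs[i].csum = seg_checksum(segs[i].mem, segs[i].size);
}

/* Return 1 if every segment still matches its recorded checksum, else 0. */
static int segs_intact(const struct seg_desc *segs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (seg_checksum(segs[i].mem, segs[i].size) != segs[i].csum)
			return 0;
	return 1;
}
```

A dump trigger following this scheme would start kdump only when
segs_intact() returns 1 and fall back to the stand-alone dump otherwise.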

On s390 the ELF header is created dynamically at kdump startup in the kdump
(2nd) kernel. This is possible because on s390 the memory detection and the
collection of the CPU register sets can be done in the 2nd kernel. Therefore on
s390 the ELF header is NOT prepared by the kexec tool. The address for
vmcoreinfo can be found via meminfo and is used by the kdump kernel for ELF
header initialization.

On s390 no additional kernel parameter is needed for kdump. Everything kdump
needs to know can be determined dynamically when the 2nd kernel starts.

If you agree with the approach of this patch series, how should this go
upstream?

Thanks,

Michael

* [patch 1/9] kdump: Add KEXEC_CRASH_CONTROL_MEMORY_LIMIT
  2011-07-04 17:09 ` Michael Holzheu
@ 2011-07-04 17:09   ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-04 17:09 UTC (permalink / raw)
  To: ebiederm, vgoyal, hbabu, mahesh
  Cc: oomichi, horms, schwidefsky, heiko.carstens, kexec, linux-kernel,
	linux-s390

[-- Attachment #1: 01-s390-kdump-common-control-limit.patch --]
[-- Type: text/plain, Size: 1260 bytes --]

From: Michael Holzheu <holzheu@linux.vnet.ibm.com>

On s390 there is a different KEXEC_CONTROL_MEMORY_LIMIT for the normal and
the kdump kexec case. Therefore this patch introduces a new macro
KEXEC_CRASH_CONTROL_MEMORY_LIMIT. This is set to
KEXEC_CONTROL_MEMORY_LIMIT for all architectures that do not define
KEXEC_CRASH_CONTROL_MEMORY_LIMIT.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 include/linux/kexec.h |    4 ++++
 kernel/kexec.c        |    2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -33,6 +33,10 @@
 #error KEXEC_ARCH not defined
 #endif
 
+#ifndef KEXEC_CRASH_CONTROL_MEMORY_LIMIT
+#define KEXEC_CRASH_CONTROL_MEMORY_LIMIT KEXEC_CONTROL_MEMORY_LIMIT
+#endif
+
 #define KEXEC_NOTE_HEAD_BYTES ALIGN(sizeof(struct elf_note), 4)
 #define KEXEC_CORE_NOTE_NAME "CORE"
 #define KEXEC_CORE_NOTE_NAME_BYTES ALIGN(sizeof(KEXEC_CORE_NOTE_NAME), 4)
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -498,7 +498,7 @@ static struct page *kimage_alloc_crash_c
 	while (hole_end <= crashk_res.end) {
 		unsigned long i;
 
-		if (hole_end > KEXEC_CONTROL_MEMORY_LIMIT)
+		if (hole_end > KEXEC_CRASH_CONTROL_MEMORY_LIMIT)
 			break;
 		if (hole_end > crashk_res.end)
 			break;


* [patch 2/9] kdump: Add machine_kexec_finish()
  2011-07-04 17:09 ` Michael Holzheu
@ 2011-07-04 17:09   ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-04 17:09 UTC (permalink / raw)
  To: ebiederm, vgoyal, hbabu, mahesh
  Cc: oomichi, horms, schwidefsky, heiko.carstens, kexec, linux-kernel,
	linux-s390

[-- Attachment #1: 02-s390-kdump-common-kexec_finish.patch --]
[-- Type: text/plain, Size: 1571 bytes --]

From: Michael Holzheu <holzheu@linux.vnet.ibm.com>

On s390 we create checksums for the loaded kexec segments in case of kdump.
Therefore we need an additional callback at the end of the kexec_load()
system call. This patch introduces machine_kexec_finish() with an empty
implementation for all architectures that do not need the callback.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 include/linux/kexec.h |    1 +
 kernel/kexec.c        |    8 ++++++++
 2 files changed, 9 insertions(+)

--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -115,6 +115,7 @@ struct kimage {
 /* kexec interface functions */
 extern void machine_kexec(struct kimage *image);
 extern int machine_kexec_prepare(struct kimage *image);
+extern void machine_kexec_finish(struct kimage *image, int flags);
 extern void machine_kexec_cleanup(struct kimage *image);
 extern asmlinkage long sys_kexec_load(unsigned long entry,
 					unsigned long nr_segments,
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1017,6 +1017,7 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon
 		kimage_terminate(image);
 	}
 	/* Install the new kernel, and  Uninstall the old */
+	machine_kexec_finish(image, flags);
 	image = xchg(dest_image, image);
 
 out:
@@ -1026,6 +1027,13 @@ out:
 	return result;
 }
 
+/*
+ * provide an empty default implementation here -- architecture
+ * code may override this
+ */
+void __weak machine_kexec_finish(struct kimage *image, int flags)
+{}
+
 #ifdef CONFIG_COMPAT
 asmlinkage long compat_sys_kexec_load(unsigned long entry,
 				unsigned long nr_segments,


* [patch 3/9] kdump: Make kimage_load_crash_segment() weak
  2011-07-04 17:09 ` Michael Holzheu
@ 2011-07-04 17:09   ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-04 17:09 UTC (permalink / raw)
  To: ebiederm, vgoyal, hbabu, mahesh
  Cc: oomichi, horms, schwidefsky, heiko.carstens, kexec, linux-kernel,
	linux-s390

[-- Attachment #1: 03-s390-kdump-common-load_crash_segment.patch --]
[-- Type: text/plain, Size: 1415 bytes --]

From: Michael Holzheu <holzheu@linux.vnet.ibm.com>

On s390 we do not create page tables at all for the crashkernel memory.
This requires an s390-specific version of kimage_load_crash_segment().
Therefore this patch declares this function as "__weak". The s390 version is
very simple. It just copies the kexec segment to real memory without using
page tables:

int kimage_load_crash_segment(struct kimage *image,
                              struct kexec_segment *segment)
{
        return copy_from_user_real((void *) segment->mem, segment->buf,
                                   segment->bufsz);
}

There are two main advantages of not creating page tables for the
crashkernel memory:

a) It saves memory. We have scenarios in mind where crashkernel
   memory can be very large and saving page table space is important.
b) We protect the crashkernel memory from being overwritten.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 kernel/kexec.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -842,8 +842,8 @@ out:
 	return result;
 }
 
-static int kimage_load_crash_segment(struct kimage *image,
-					struct kexec_segment *segment)
+int __weak kimage_load_crash_segment(struct kimage *image,
+				     struct kexec_segment *segment)
 {
 	/* For crash dumps kernels we simply copy the data from
 	 * user space to it's destination.


* [patch 4/9] kdump: Initialize vmcoreinfo note at startup
  2011-07-04 17:09 ` Michael Holzheu
@ 2011-07-04 17:09   ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-04 17:09 UTC (permalink / raw)
  To: ebiederm, vgoyal, hbabu, mahesh
  Cc: oomichi, horms, schwidefsky, heiko.carstens, kexec, linux-kernel,
	linux-s390

[-- Attachment #1: 04-s390-kdump-common-vmcoreinfo.patch --]
[-- Type: text/plain, Size: 1486 bytes --]

From: Michael Holzheu <holzheu@linux.vnet.ibm.com>

Currently the vmcoreinfo note is only initialized in case of kdump. On s390
it is possible to create kernel dumps with dump mechanisms other than kdump
(e.g. via hypervisor dump or stand-alone dump tools). For those dumps it
would also be desirable to include the vmcoreinfo data. To accomplish this,
this patch always initializes the vmcoreinfo ELF note, not only in case of a
(kdump) crash.
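
For readers unfamiliar with the note layout, here is a minimal user-space
sketch of how an ELF note is laid out in a word buffer: a three-word header,
then the name and the payload, each padded to a 4-byte boundary. It is not the
kernel's append_elf_note(); struct note_hdr, align4() and append_note() are
hypothetical names, and the caller is assumed to pass a zeroed buffer that is
large enough.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic ELF note header: name size, payload size, note type. */
struct note_hdr {
	uint32_t n_namesz;
	uint32_t n_descsz;
	uint32_t n_type;
};

/* Round x up to the next multiple of 4. */
static uint32_t align4(uint32_t x)
{
	return (x + 3u) & ~3u;
}

/*
 * Write one note at buf and return the position just after it.
 * Assumes buf is zero-filled so the padding bytes are well defined.
 */
static uint32_t *append_note(uint32_t *buf, const char *name, uint32_t type,
			     const void *data, uint32_t len)
{
	struct note_hdr h;

	h.n_namesz = (uint32_t)strlen(name) + 1;	/* includes the NUL */
	h.n_descsz = len;
	h.n_type = type;

	memcpy(buf, &h, sizeof(h));
	buf += sizeof(h) / sizeof(uint32_t);
	memcpy(buf, name, h.n_namesz);
	buf += align4(h.n_namesz) / sizeof(uint32_t);
	memcpy(buf, data, len);
	buf += align4(len) / sizeof(uint32_t);
	return buf;
}
```

Building the note once at startup and again at crash time, as the patch does,
simply means running such an append step over the current vmcoreinfo data each
time.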

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 kernel/kexec.c |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1388,24 +1388,23 @@ int __init parse_crashkernel(char 		 *cm
 }
 
 
-
-void crash_save_vmcoreinfo(void)
+static void update_vmcoreinfo_note(void)
 {
-	u32 *buf;
+	u32 *buf = (u32 *) vmcoreinfo_note;
 
 	if (!vmcoreinfo_size)
 		return;
-
-	vmcoreinfo_append_str("CRASHTIME=%ld", get_seconds());
-
-	buf = (u32 *)vmcoreinfo_note;
-
 	buf = append_elf_note(buf, VMCOREINFO_NOTE_NAME, 0, vmcoreinfo_data,
 			      vmcoreinfo_size);
-
 	final_note(buf);
 }
 
+void crash_save_vmcoreinfo(void)
+{
+	vmcoreinfo_append_str("CRASHTIME=%ld", get_seconds());
+	update_vmcoreinfo_note();
+}
+
 void vmcoreinfo_append_str(const char *fmt, ...)
 {
 	va_list args;
@@ -1491,6 +1490,7 @@ static int __init crash_save_vmcoreinfo_
 	VMCOREINFO_NUMBER(PG_swapcache);
 
 	arch_crash_save_vmcoreinfo();
+	update_vmcoreinfo_note();
 
 	return 0;
 }


* [patch 5/9] kdump: Allow vmcore ELF header to be created in new kernel
  2011-07-04 17:09 ` Michael Holzheu
@ 2011-07-04 17:09   ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-04 17:09 UTC (permalink / raw)
  To: ebiederm, vgoyal, hbabu, mahesh
  Cc: oomichi, horms, schwidefsky, heiko.carstens, kexec, linux-kernel,
	linux-s390

[-- Attachment #1: 05-s390-kdump-common-vmcore-newmem.patch --]
[-- Type: text/plain, Size: 3951 bytes --]

From: Michael Holzheu <holzheu@linux.vnet.ibm.com>

For s390 we create the ELF header for /proc/vmcore in the second (kdump)
kernel. Currently vmcore gets the ELF header from oldmem using the global
variable "elfcorehdr_addr". This patch introduces a new value
ELFCORE_ADDR_NEWMEM for "elfcorehdr_addr" that indicates that the ELF header
is allocated in the new kernel. In this case a new architecture function
"arch_vmcore_get_elf_hdr()" is called to obtain the address and length of the
ELF header.
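
The sentinel dispatch and the header sanity check can be sketched in user
space as shown below. This is an illustration only: EX_ADDR_NEWMEM mirrors the
value of ELFCORE_ADDR_NEWMEM from this patch, while header_in_newmem(),
classify_elf_header() and the HDR_* constants are hypothetical names. The
sketch assumes a Linux system that provides <elf.h>.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Same value as ELFCORE_ADDR_NEWMEM in the patch; the name is illustrative. */
#define EX_ADDR_NEWMEM	(-3ULL)

enum { HDR_BAD = -1, HDR_ELF32 = 32, HDR_ELF64 = 64 };

/* Return 1 if the header address is the "created in new kernel" sentinel. */
static int header_in_newmem(unsigned long long addr)
{
	return addr == EX_ADDR_NEWMEM;
}

/* Check the ELF magic and report the ELF class found in e_ident. */
static int classify_elf_header(const unsigned char *buf, size_t len)
{
	if (len < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
		return HDR_BAD;

	switch (buf[EI_CLASS]) {
	case ELFCLASS64:
		return HDR_ELF64;
	case ELFCLASS32:
		return HDR_ELF32;
	default:
		return HDR_BAD;
	}
}
```

Using a reserved address value as the selector keeps the existing
"elfcorehdr_addr" interface unchanged for architectures that still read the
header from oldmem.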

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 fs/proc/vmcore.c           |   66 ++++++++++++++++++++++++++++++++++++---------
 include/linux/crash_dump.h |    1 
 2 files changed, 55 insertions(+), 12 deletions(-)

--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -494,14 +494,10 @@ static void __init set_vmcore_list_offse
 						struct list_head *vc_list)
 {
 	loff_t vmcore_off;
-	Elf64_Ehdr *ehdr_ptr;
 	struct vmcore *m;
 
-	ehdr_ptr = (Elf64_Ehdr *)elfptr;
-
 	/* Skip Elf header and program headers. */
-	vmcore_off = sizeof(Elf64_Ehdr) +
-			(ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr);
+	vmcore_off = elfcorebuf_sz;
 
 	list_for_each_entry(m, vc_list, list) {
 		m->offset = vmcore_off;
@@ -514,14 +510,10 @@ static void __init set_vmcore_list_offse
 						struct list_head *vc_list)
 {
 	loff_t vmcore_off;
-	Elf32_Ehdr *ehdr_ptr;
 	struct vmcore *m;
 
-	ehdr_ptr = (Elf32_Ehdr *)elfptr;
-
 	/* Skip Elf header and program headers. */
-	vmcore_off = sizeof(Elf32_Ehdr) +
-			(ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr);
+	vmcore_off = elfcorebuf_sz;
 
 	list_for_each_entry(m, vc_list, list) {
 		m->offset = vmcore_off;
@@ -641,7 +633,7 @@ static int __init parse_crash_elf32_head
 	return 0;
 }
 
-static int __init parse_crash_elf_headers(void)
+static int __init parse_crash_elf_headers_oldmem(void)
 {
 	unsigned char e_ident[EI_NIDENT];
 	u64 addr;
@@ -679,6 +671,53 @@ static int __init parse_crash_elf_header
 	return 0;
 }
 
+/*
+ * provide an empty default implementation here -- architecture
+ * code may override this
+ */
+int __weak arch_vmcore_get_elf_hdr(char **elfcorebuf, size_t *elfcorebuf_sz)
+{
+	return -EOPNOTSUPP;
+}
+
+static int __init parse_crash_elf_headers_newmem(void)
+{
+	unsigned char e_ident[EI_NIDENT];
+	int rc;
+
+	rc = arch_vmcore_get_elf_hdr(&elfcorebuf, &elfcorebuf_sz);
+	if (rc)
+		return rc;
+	memcpy(e_ident, elfcorebuf, EI_NIDENT);
+	if (memcmp(e_ident, ELFMAG, SELFMAG) != 0) {
+		printk(KERN_WARNING "Warning: Core image elf header "
+		       "not found\n");
+		rc = -EINVAL;
+		goto fail;
+	}
+	if (e_ident[EI_CLASS] == ELFCLASS64) {
+		rc = process_ptload_program_headers_elf64(elfcorebuf,
+							  elfcorebuf_sz,
+							  &vmcore_list);
+		if (rc)
+			goto fail;
+		set_vmcore_list_offsets_elf64(elfcorebuf, &vmcore_list);
+		vmcore_size = get_vmcore_size_elf64(elfcorebuf);
+	} else if (e_ident[EI_CLASS] == ELFCLASS32) {
+		rc = process_ptload_program_headers_elf32(elfcorebuf,
+							  elfcorebuf_sz,
+							  &vmcore_list);
+		if (rc)
+			goto fail;
+		set_vmcore_list_offsets_elf32(elfcorebuf, &vmcore_list);
+		vmcore_size = get_vmcore_size_elf32(elfcorebuf);
+	}
+	return 0;
+fail:
+	kfree(elfcorebuf);
+	return rc;
+}
+
 /* Init function for vmcore module. */
 static int __init vmcore_init(void)
 {
@@ -687,7 +726,10 @@ static int __init vmcore_init(void)
 	/* If elfcorehdr= has been passed in cmdline, then capture the dump.*/
 	if (!(is_vmcore_usable()))
 		return rc;
-	rc = parse_crash_elf_headers();
+	if (elfcorehdr_addr == ELFCORE_ADDR_NEWMEM)
+		rc = parse_crash_elf_headers_newmem();
+	else
+		rc = parse_crash_elf_headers_oldmem();
 	if (rc) {
 		printk(KERN_WARNING "Kdump: vmcore not initialized\n");
 		return rc;
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -8,6 +8,7 @@
 
 #define ELFCORE_ADDR_MAX	(-1ULL)
 #define ELFCORE_ADDR_ERR	(-2ULL)
+#define ELFCORE_ADDR_NEWMEM	(-3ULL)
 
 extern unsigned long long elfcorehdr_addr;
 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch 5/9] kdump: Allow vmcore ELF header to be created in new kernel
@ 2011-07-04 17:09   ` Michael Holzheu
  0 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-04 17:09 UTC (permalink / raw)
  To: ebiederm, vgoyal, hbabu, mahesh
  Cc: oomichi, linux-s390, kexec, heiko.carstens, linux-kernel, horms,
	schwidefsky

[-- Attachment #1: 05-s390-kdump-common-vmcore-newmem.patch --]
[-- Type: text/plain, Size: 4095 bytes --]

From: Michael Holzheu <holzheu@linux.vnet.ibm.com>

For s390 we create the ELF header for /proc/vmcore in the second (kdump)
kernel. Currently vmcore gets the ELF header from oldmem using the global
variable "elfcorehdr_addr". This patch introduces a new value
ELFCORE_ADDR_NEWMEM for "elfcorehdr_addr" that indicates that the ELF header
is allocated in the new kernel. In this case a new architecture function
"arch_vmcore_get_elf_hdr()" is called to obtain address and length of the
ELF header.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 fs/proc/vmcore.c           |   66 ++++++++++++++++++++++++++++++++++++---------
 include/linux/crash_dump.h |    1 
 2 files changed, 55 insertions(+), 12 deletions(-)

--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -494,14 +494,10 @@ static void __init set_vmcore_list_offse
 						struct list_head *vc_list)
 {
 	loff_t vmcore_off;
-	Elf64_Ehdr *ehdr_ptr;
 	struct vmcore *m;
 
-	ehdr_ptr = (Elf64_Ehdr *)elfptr;
-
 	/* Skip Elf header and program headers. */
-	vmcore_off = sizeof(Elf64_Ehdr) +
-			(ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr);
+	vmcore_off = elfcorebuf_sz;
 
 	list_for_each_entry(m, vc_list, list) {
 		m->offset = vmcore_off;
@@ -514,14 +510,10 @@ static void __init set_vmcore_list_offse
 						struct list_head *vc_list)
 {
 	loff_t vmcore_off;
-	Elf32_Ehdr *ehdr_ptr;
 	struct vmcore *m;
 
-	ehdr_ptr = (Elf32_Ehdr *)elfptr;
-
 	/* Skip Elf header and program headers. */
-	vmcore_off = sizeof(Elf32_Ehdr) +
-			(ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr);
+	vmcore_off = elfcorebuf_sz;
 
 	list_for_each_entry(m, vc_list, list) {
 		m->offset = vmcore_off;
@@ -641,7 +633,7 @@ static int __init parse_crash_elf32_head
 	return 0;
 }
 
-static int __init parse_crash_elf_headers(void)
+static int __init parse_crash_elf_headers_oldmem(void)
 {
 	unsigned char e_ident[EI_NIDENT];
 	u64 addr;
@@ -679,6 +671,53 @@ static int __init parse_crash_elf_header
 	return 0;
 }
 
+/*
+ * provide an empty default implementation here -- architecture
+ * code may override this
+ */
+int __weak arch_vmcore_get_elf_hdr(char **elfcorebuf, size_t *elfcorebuf_sz)
+{
+	return -EOPNOTSUPP;
+}
+
+static int __init parse_crash_elf_headers_newmem(void)
+{
+	unsigned char e_ident[EI_NIDENT];
+	int rc;
+
+	rc = arch_vmcore_get_elf_hdr(&elfcorebuf, &elfcorebuf_sz);
+	if (rc)
+		return rc;
+	memcpy(e_ident, elfcorebuf, EI_NIDENT);
+	if (memcmp(e_ident, ELFMAG, SELFMAG) != 0) {
+		printk(KERN_WARNING "Warning: Core image elf header "
+		       "not found\n");
+		rc = -EINVAL;
+		goto fail;
+	}
+	if (e_ident[EI_CLASS] == ELFCLASS64) {
+		rc = process_ptload_program_headers_elf64(elfcorebuf,
+							  elfcorebuf_sz,
+							  &vmcore_list);
+		if (rc)
+			goto fail;
+		set_vmcore_list_offsets_elf64(elfcorebuf, &vmcore_list);
+		vmcore_size = get_vmcore_size_elf64(elfcorebuf);
+	} else if (e_ident[EI_CLASS] == ELFCLASS32) {
+		rc = process_ptload_program_headers_elf32(elfcorebuf,
+							  elfcorebuf_sz,
+							  &vmcore_list);
+		if (rc)
+			goto fail;
+		set_vmcore_list_offsets_elf32(elfcorebuf, &vmcore_list);
+		vmcore_size = get_vmcore_size_elf32(elfcorebuf);
+	}
+	return 0;
+fail:
+	kfree(elfcorebuf);
+	return rc;
+}
+
 /* Init function for vmcore module. */
 static int __init vmcore_init(void)
 {
@@ -687,7 +726,10 @@ static int __init vmcore_init(void)
 	/* If elfcorehdr= has been passed in cmdline, then capture the dump.*/
 	if (!(is_vmcore_usable()))
 		return rc;
-	rc = parse_crash_elf_headers();
+	if (elfcorehdr_addr == ELFCORE_ADDR_NEWMEM)
+		rc = parse_crash_elf_headers_newmem();
+	else
+		rc = parse_crash_elf_headers_oldmem();
 	if (rc) {
 		printk(KERN_WARNING "Kdump: vmcore not initialized\n");
 		return rc;
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -8,6 +8,7 @@
 
 #define ELFCORE_ADDR_MAX	(-1ULL)
 #define ELFCORE_ADDR_ERR	(-2ULL)
+#define ELFCORE_ADDR_NEWMEM	(-3ULL)
 
 extern unsigned long long elfcorehdr_addr;
 

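The dispatch added to vmcore_init() above can be modelled in a few lines
of plain C. This is only a userspace sketch: pick_hdr_source() and the
hdr_source enum are hypothetical names, and it assumes that
is_vmcore_usable() rejects the ELFCORE_ADDR_MAX and ELFCORE_ADDR_ERR
sentinels before the new ELFCORE_ADDR_NEWMEM check is reached.

```c
/* Sentinel values as defined in include/linux/crash_dump.h by this patch. */
#define ELFCORE_ADDR_MAX	(-1ULL)
#define ELFCORE_ADDR_ERR	(-2ULL)
#define ELFCORE_ADDR_NEWMEM	(-3ULL)

/* Hypothetical result type, used only in this sketch. */
enum hdr_source { HDR_NONE, HDR_OLDMEM, HDR_NEWMEM };

/* Decide where the ELF core header comes from (hypothetical helper). */
static enum hdr_source pick_hdr_source(unsigned long long elfcorehdr_addr)
{
	/* Assumption: these two sentinels mean "no usable vmcore". */
	if (elfcorehdr_addr == ELFCORE_ADDR_MAX ||
	    elfcorehdr_addr == ELFCORE_ADDR_ERR)
		return HDR_NONE;
	/* New sentinel: the header is built in the kdump kernel's memory. */
	if (elfcorehdr_addr == ELFCORE_ADDR_NEWMEM)
		return HDR_NEWMEM;
	/* Any other value is a real address inside the old kernel's memory. */
	return HDR_OLDMEM;
}
```

Using a third reserved value keeps elfcorehdr_addr as the single source
of truth, so no extra flag is needed to tell the two cases apart.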

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch 6/9] kdump: Merge set_vmcore_list_offsets_elf_32/64()
  2011-07-04 17:09 ` Michael Holzheu
@ 2011-07-04 17:09   ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-04 17:09 UTC (permalink / raw)
  To: ebiederm, vgoyal, hbabu, mahesh
  Cc: oomichi, horms, schwidefsky, heiko.carstens, kexec, linux-kernel,
	linux-s390

[-- Attachment #1: 06-s390-kdump-common-vmcore-merge-set_vmcore_list_offsets.patch --]
[-- Type: text/plain, Size: 2295 bytes --]

From: Michael Holzheu <holzheu@linux.vnet.ibm.com>

The two functions set_vmcore_list_offsets_elf32() and
set_vmcore_list_offsets_elf64() are now identical. Therefore this patch
merges them into a single set_vmcore_list_offsets() function.
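
The merge is possible because the offset calculation only needs the
header size and the chunk sizes, not the ELF class. A minimal userspace
sketch of the same running-sum logic follows. The struct chunk type and
the set_offsets() name are hypothetical, and an array stands in for the
kernel's linked list.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct vmcore list element. */
struct chunk {
	unsigned long long size;	/* bytes of memory in this chunk */
	unsigned long long offset;	/* position inside the vmcore file */
};

/*
 * Place each chunk directly after the previous one, starting right
 * after the ELF header and program headers (hdr_size bytes).
 * Returns the offset just past the last chunk.
 */
static unsigned long long set_offsets(struct chunk *chunks, size_t n,
				      unsigned long long hdr_size)
{
	unsigned long long off = hdr_size;
	size_t i;

	for (i = 0; i < n; i++) {
		chunks[i].offset = off;
		off += chunks[i].size;
	}
	return off;
}
```

With a 0x200 byte header and chunks of 0x1000 and 0x3000 bytes, the
chunks end up at offsets 0x200 and 0x1200.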

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 fs/proc/vmcore.c |   28 ++++++----------------------
 1 file changed, 6 insertions(+), 22 deletions(-)

--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -490,24 +490,8 @@ static int __init process_ptload_program
 }
 
 /* Sets offset fields of vmcore elements. */
-static void __init set_vmcore_list_offsets_elf64(char *elfptr,
-						struct list_head *vc_list)
-{
-	loff_t vmcore_off;
-	struct vmcore *m;
-
-	/* Skip Elf header and program headers. */
-	vmcore_off = elfcorebuf_sz;
-
-	list_for_each_entry(m, vc_list, list) {
-		m->offset = vmcore_off;
-		vmcore_off += m->size;
-	}
-}
-
-/* Sets offset fields of vmcore elements. */
-static void __init set_vmcore_list_offsets_elf32(char *elfptr,
-						struct list_head *vc_list)
+static void __init set_vmcore_list_offsets(char *elfptr,
+					   struct list_head *vc_list)
 {
 	loff_t vmcore_off;
 	struct vmcore *m;
@@ -573,7 +557,7 @@ static int __init parse_crash_elf64_head
 		kfree(elfcorebuf);
 		return rc;
 	}
-	set_vmcore_list_offsets_elf64(elfcorebuf, &vmcore_list);
+	set_vmcore_list_offsets(elfcorebuf, &vmcore_list);
 	return 0;
 }
 
@@ -629,7 +613,7 @@ static int __init parse_crash_elf32_head
 		kfree(elfcorebuf);
 		return rc;
 	}
-	set_vmcore_list_offsets_elf32(elfcorebuf, &vmcore_list);
+	set_vmcore_list_offsets(elfcorebuf, &vmcore_list);
 	return 0;
 }
 
@@ -701,7 +685,7 @@ static int __init parse_crash_elf_header
 							  &vmcore_list);
 		if (rc)
 			goto fail;
-		set_vmcore_list_offsets_elf64(elfcorebuf, &vmcore_list);
+		set_vmcore_list_offsets(elfcorebuf, &vmcore_list);
 		vmcore_size = get_vmcore_size_elf64(elfcorebuf);
 	} else if (e_ident[EI_CLASS] == ELFCLASS32) {
 		rc = process_ptload_program_headers_elf32(elfcorebuf,
@@ -709,7 +693,7 @@ static int __init parse_crash_elf_header
 							  &vmcore_list);
 		if (rc)
 			goto fail;
-		set_vmcore_list_offsets_elf32(elfcorebuf, &vmcore_list);
+		set_vmcore_list_offsets(elfcorebuf, &vmcore_list);
 		vmcore_size = get_vmcore_size_elf32(elfcorebuf);
 	}
 	return 0;


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch 7/9] kdump: Trigger kdump via panic notifier chain on s390
  2011-07-04 17:09 ` Michael Holzheu
@ 2011-07-04 17:09   ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-04 17:09 UTC (permalink / raw)
  To: ebiederm, vgoyal, hbabu, mahesh
  Cc: oomichi, horms, schwidefsky, heiko.carstens, kexec, linux-kernel,
	linux-s390

[-- Attachment #1: 07-s390-kdump-common-shutdown-action.patch --]
[-- Type: text/plain, Size: 974 bytes --]

From: Michael Holzheu <holzheu@linux.vnet.ibm.com>

On s390 it is possible to configure actions that are executed in case of
a kernel panic, for example to automatically trigger an s390 stand-alone
dump. These actions are called via a panic notifier. We also want to
trigger kdump via the notifier call chain. Therefore this patch disables
the direct kdump invocation in the panic() function for s390.
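
The idea of running kdump as one configured action among others can be
sketched in userspace C as follows. This is not the kernel notifier API:
register_panic_action(), run_panic_actions() and kdump_action() are
hypothetical names, and a real kdump action would not return to its
caller.

```c
#include <stddef.h>

#define MAX_ACTIONS 8

/* Hypothetical callback type for a configured panic action. */
typedef void (*panic_action_fn)(const char *msg);

static panic_action_fn actions[MAX_ACTIONS];
static size_t nr_actions;

/* Append an action to the chain; returns 0 on success, -1 if full. */
static int register_panic_action(panic_action_fn fn)
{
	if (nr_actions == MAX_ACTIONS)
		return -1;
	actions[nr_actions++] = fn;
	return 0;
}

/* Called from the panic path: run all actions in registration order. */
static void run_panic_actions(const char *msg)
{
	size_t i;

	for (i = 0; i < nr_actions; i++)
		actions[i](msg);
}

/* Stand-in for the kdump trigger; it only records that it was called. */
static int kdump_triggered;

static void kdump_action(const char *msg)
{
	(void)msg;
	kdump_triggered = 1;
}
```

Because kdump is started from the chain, the panic path itself no longer
needs a direct call, which is what the #if !defined(CONFIG_S390) guard
in the diff below expresses.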

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 kernel/panic.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -84,9 +84,12 @@ NORET_TYPE void panic(const char * fmt,
 	/*
 	 * If we have crashed and we have a crash kernel loaded let it handle
 	 * everything else.
+	 * For s390 kdump is triggered via the panic notifier call chain.
 	 * Do we want to call this before we try to display a message?
 	 */
+#if !defined(CONFIG_S390)
 	crash_kexec(NULL);
+#endif
 
 	kmsg_dump(KMSG_DUMP_PANIC);
 


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch 8/9] s390: kdump backend code
  2011-07-04 17:09 ` Michael Holzheu
@ 2011-07-04 17:09   ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-04 17:09 UTC (permalink / raw)
  To: ebiederm, vgoyal, hbabu, mahesh
  Cc: oomichi, horms, schwidefsky, heiko.carstens, kexec, linux-kernel,
	linux-s390

[-- Attachment #1: 08-s390-kdump-arch.patch --]
[-- Type: text/plain, Size: 69362 bytes --]

From: Michael Holzheu <holzheu@linux.vnet.ibm.com>

This patch provides the architecture specific part of the s390 kdump
support. This includes the following changes:
* S390 backend code for kdump/kexec framework
* New restart shutdown trigger and kdump action
* New meminfo interface to allow external kdump triggers
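
As an illustration of the backend's address handling:
copy_oldmem_page() in this patch swaps the reserved crashkernel range
with the range starting at address 0 when it reads old memory. A minimal
userspace sketch of that translation follows. The function name
swap_oldmem_addr() is hypothetical, and treating the reserved range as
half-open, [base, base + size), is an assumption of this sketch.

```c
/*
 * Translate an address as seen by the crashed kernel into the location
 * where the data lives after the swap (hypothetical helper):
 *  - [0, size) was moved to [base, base + size)
 *  - [base, base + size) was moved to [0, size)
 *  - everything else is unchanged
 */
static unsigned long swap_oldmem_addr(unsigned long src,
				      unsigned long base,
				      unsigned long size)
{
	if (src < size)
		return src + base;
	if (src >= base && src < base + size)
		return src - base;
	return src;
}
```

Applying the translation twice returns the original address, which
reflects that the two ranges were exchanged rather than copied.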

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 arch/s390/Kconfig                 |   10 
 arch/s390/include/asm/checksum.h  |   18 +
 arch/s390/include/asm/ipl.h       |    4 
 arch/s390/include/asm/kexec.h     |    3 
 arch/s390/include/asm/lowcore.h   |   62 +++++
 arch/s390/include/asm/sclp.h      |    1 
 arch/s390/include/asm/setup.h     |    5 
 arch/s390/include/asm/system.h    |    4 
 arch/s390/kernel/Makefile         |    3 
 arch/s390/kernel/asm-offsets.c    |    7 
 arch/s390/kernel/base.S           |   37 +++
 arch/s390/kernel/crash_dump.c     |   76 ++++++
 arch/s390/kernel/crash_dump_elf.c |  434 ++++++++++++++++++++++++++++++++++++++
 arch/s390/kernel/early.c          |   12 +
 arch/s390/kernel/entry.S          |   28 ++
 arch/s390/kernel/entry64.S        |   21 +
 arch/s390/kernel/head.S           |   14 +
 arch/s390/kernel/head_kdump.S     |  133 +++++++++++
 arch/s390/kernel/ipl.c            |  201 ++++++++++++++---
 arch/s390/kernel/machine_kexec.c  |  164 ++++++++++++++
 arch/s390/kernel/mem_detect.c     |   70 ++++++
 arch/s390/kernel/meminfo.c        |  132 +++++++++++
 arch/s390/kernel/reipl64.S        |   82 +++++--
 arch/s390/kernel/setup.c          |  210 ++++++++++++++++++
 arch/s390/kernel/smp.c            |   26 ++
 arch/s390/mm/maccess.c            |   83 +++++++
 arch/s390/mm/vmem.c               |    3 
 drivers/s390/char/zcore.c         |   20 -
 28 files changed, 1784 insertions(+), 79 deletions(-)

--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -567,6 +567,16 @@ config KEXEC
 	  current kernel, and to start another kernel.  It is like a reboot
 	  but is independent of hardware/microcode support.
 
+config CRASH_DUMP
+	bool "kernel crash dumps"
+	depends on 64BIT
+	help
+	  Generate crash dump after being started by kexec.
+	  Crash dump kernels are loaded in the main kernel with kexec-tools
+	  into a specially reserved region and then later executed after
+	  a crash by kdump/kexec.
+	  For more details see Documentation/kdump/kdump.txt
+
 config ZFCPDUMP
 	def_bool n
 	prompt "zfcpdump support"
--- a/arch/s390/include/asm/checksum.h
+++ b/arch/s390/include/asm/checksum.h
@@ -41,6 +41,24 @@ csum_partial(const void *buff, int len,
 }
 
 /*
+ * The same as csum_partial(), but operates on real memory
+ */
+static inline __wsum csum_partial_real(const void *buf, int len, __wsum sum)
+{
+	register unsigned long reg2 asm("2") = (unsigned long) buf;
+	register unsigned long reg3 asm("3") = (unsigned long) len;
+	unsigned long flags;
+
+	flags = __arch_local_irq_stnsm(0xf8UL);
+	asm volatile(
+		"0:	cksm	%0,%1\n"
+		"	jo	0b\n"
+		: "+d" (sum), "+d" (reg2), "+d" (reg3) : : "cc", "memory");
+	arch_local_irq_restore(flags);
+	return sum;
+}
+
+/*
  * the same as csum_partial_copy, but copies from user space.
  *
  * here even more important to align src and dst on a 32-bit (or even
--- a/arch/s390/include/asm/ipl.h
+++ b/arch/s390/include/asm/ipl.h
@@ -167,5 +167,9 @@ enum diag308_rc {
 };
 
 extern int diag308(unsigned long subcode, void *addr);
+void do_reset_diag308(void);
+void do_store_status(void);
+ssize_t crash_read_from_oldmem(void *buf, size_t count, u64 ppos, int userbuf);
+void machine_kdump(void);
 
 #endif /* _ASM_S390_IPL_H */
--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -30,6 +30,9 @@
 /* Not more than 2GB */
 #define KEXEC_CONTROL_MEMORY_LIMIT (1UL<<31)
 
+/* Maximum address we can use for the crash control pages */
+#define KEXEC_CRASH_CONTROL_MEMORY_LIMIT (-1UL)
+
 /* Allocate one page for the pdp and the second for the code */
 #define KEXEC_CONTROL_PAGE_SIZE 4096
 
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -18,6 +18,45 @@ void system_call(void);
 void pgm_check_handler(void);
 void mcck_int_handler(void);
 void io_int_handler(void);
+void psw_restart_int_handler(void);
+
+/*
+ * Meminfo types: The defined numbers are ABI and must not be changed
+ */
+enum meminfo_type {
+	MEMINFO_TYPE_IPIB	= 0,
+	MEMINFO_TYPE_VMCOREINFO	= 1,
+	MEMINFO_TYPE_KDUMP_MEM	= 2,
+	MEMINFO_TYPE_KDUMP_SEGM	= 3,
+	MEMINFO_TYPE_LAST	= 4,
+};
+
+/*
+ * Meminfo flags: The flags are ABI and must not be changed
+ */
+#define MEMINFO_FLAG_ELEM_VALID	0x00000001U
+#define MEMINFO_FLAG_ELEM_IND	0x00000002U
+#define MEMINFO_FLAG_CSUM_VALID	0x00000004U
+
+struct meminfo {
+	unsigned long	addr;
+	unsigned long	size;
+	u32		csum;
+	u32		flags;
+} __packed;
+
+extern struct meminfo meminfo_array[MEMINFO_TYPE_LAST];
+
+void meminfo_init(void);
+int meminfo_csum_check(struct meminfo *meminfo, int recursive);
+void meminfo_update(enum meminfo_type type, void *buf, unsigned long size,
+		    u32 flags);
+
+#ifdef CONFIG_CRASH_DUMP
+int meminfo_old_get(enum meminfo_type type, struct meminfo *meminfo);
+extern unsigned long oldmem_base;
+extern unsigned long oldmem_size;
+#endif
 
 #ifdef CONFIG_32BIT
 
@@ -150,7 +189,14 @@ struct _lowcore {
 	 */
 	__u32	ipib;				/* 0x0e00 */
 	__u32	ipib_checksum;			/* 0x0e04 */
-	__u8	pad_0x0e08[0x0f00-0x0e08];	/* 0x0e08 */
+
+	/* 64 bit save area */
+	__u64	save_area_64;			/* 0x0e08 */
+
+	/* meminfo root */
+	struct meminfo	meminfo;		/* 0x0e10 */
+	__u32	meminfo_csum;			/* 0x0e20 */
+	__u8	pad_0x0e24[0x0f00-0x0e24];	/* 0x0e24 */
 
 	/* Extended facility list */
 	__u64	stfle_fac_list[32];		/* 0x0f00 */
@@ -286,7 +332,19 @@ struct _lowcore {
 	 */
 	__u64	ipib;				/* 0x0e00 */
 	__u32	ipib_checksum;			/* 0x0e08 */
-	__u8	pad_0x0e0c[0x0f00-0x0e0c];	/* 0x0e0c */
+
+	/* 64 bit save area */
+	__u64	save_area_64;			/* 0x0e0c */
+
+	/* meminfo root */
+	struct meminfo meminfo;			/* 0x0e14 */
+	__u32	meminfo_csum;			/* 0x0e2c */
+
+	/* oldmem base */
+	__u64	oldmem_base;			/* 0x0e30 */
+	/* oldmem size */
+	__u64	oldmem_size;			/* 0x0e38 */
+	__u8	pad_0x0e40[0x0f00-0x0e40];	/* 0x0e40 */
 
 	/* Extended facility list */
 	__u64	stfle_fac_list[32];		/* 0x0f00 */
--- a/arch/s390/include/asm/sclp.h
+++ b/arch/s390/include/asm/sclp.h
@@ -55,4 +55,5 @@ int sclp_chp_deconfigure(struct chp_id c
 int sclp_chp_read_info(struct sclp_chp_info *info);
 void sclp_get_ipl_info(struct sclp_ipl_info *info);
 
+void _sclp_print_early(const char *);
 #endif /* _ASM_S390_SCLP_H */
--- a/arch/s390/include/asm/setup.h
+++ b/arch/s390/include/asm/setup.h
@@ -35,6 +35,8 @@
 
 #define CHUNK_READ_WRITE 0
 #define CHUNK_READ_ONLY  1
+#define CHUNK_OLDMEM     4
+#define CHUNK_CRASHK     5
 
 struct mem_chunk {
 	unsigned long addr;
@@ -48,6 +50,8 @@ extern int memory_end_set;
 extern unsigned long memory_end;
 
 void detect_memory_layout(struct mem_chunk chunk[]);
+void create_mem_hole(struct mem_chunk memory_chunk[], unsigned long addr,
+		     unsigned long size, int type);
 
 #define PRIMARY_SPACE_MODE	0
 #define ACCESS_REGISTER_MODE	1
@@ -106,6 +110,7 @@ extern unsigned int user_mode;
 #endif /* __s390x__ */
 
 #define ZFCPDUMP_HSA_SIZE	(32UL<<20)
+#define ZFCPDUMP_HSA_SIZE_MAX	(64UL<<20)
 
 /*
  * Console mode. Override with conmode=
--- a/arch/s390/include/asm/system.h
+++ b/arch/s390/include/asm/system.h
@@ -113,6 +113,10 @@ extern void pfault_fini(void);
 
 extern void cmma_init(void);
 extern int memcpy_real(void *, void *, size_t);
+extern int copy_to_user_real(void __user *dest, void *src, size_t count);
+extern int copy_from_user_real(void *dest, void __user *src, size_t count);
+extern void copy_to_absolute_zero(void *dest, void *src, size_t count);
+extern void copy_from_absolute_zero(void *dest, void *src, size_t count);
 
 #define finish_arch_switch(prev) do {					     \
 	set_fs(current->thread.mm_segment);				     \
--- a/arch/s390/kernel/Makefile
+++ b/arch/s390/kernel/Makefile
@@ -23,7 +23,7 @@ CFLAGS_sysinfo.o += -Iinclude/math-emu -
 obj-y	:=  bitmap.o traps.o time.o process.o base.o early.o setup.o vtime.o \
 	    processor.o sys_s390.o ptrace.o signal.o cpcmd.o ebcdic.o nmi.o \
 	    debug.o irq.o ipl.o dis.o diag.o mem_detect.o sclp.o vdso.o \
-	    sysinfo.o jump_label.o
+	    sysinfo.o jump_label.o meminfo.o
 
 obj-y	+= $(if $(CONFIG_64BIT),entry64.o,entry.o)
 obj-y	+= $(if $(CONFIG_64BIT),reipl64.o,reipl.o)
@@ -48,6 +48,7 @@ obj-$(CONFIG_FUNCTION_TRACER)	+= $(if $(
 obj-$(CONFIG_DYNAMIC_FTRACE)	+= ftrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
 obj-$(CONFIG_FTRACE_SYSCALLS)  += ftrace.o
+obj-$(CONFIG_CRASH_DUMP)	+= crash_dump.o crash_dump_elf.o
 
 # Kexec part
 S390_KEXEC_OBJS := machine_kexec.o crash.o
--- a/arch/s390/kernel/asm-offsets.c
+++ b/arch/s390/kernel/asm-offsets.c
@@ -142,6 +142,11 @@ int main(void)
 	DEFINE(__LC_FPREGS_SAVE_AREA, offsetof(struct _lowcore, floating_pt_save_area));
 	DEFINE(__LC_GPREGS_SAVE_AREA, offsetof(struct _lowcore, gpregs_save_area));
 	DEFINE(__LC_CREGS_SAVE_AREA, offsetof(struct _lowcore, cregs_save_area));
+	DEFINE(__LC_SAVE_AREA_64, offsetof(struct _lowcore, save_area_64));
+	DEFINE(__LC_MEMINFO, offsetof(struct _lowcore, meminfo));
+	DEFINE(__MI_TYPE_KDUMP_MEM, (MEMINFO_TYPE_KDUMP_MEM * sizeof(struct meminfo)));
+	DEFINE(__MI_ADDR, offsetof(struct meminfo, addr));
+	DEFINE(__MI_SIZE, offsetof(struct meminfo, size));
 #ifdef CONFIG_32BIT
 	DEFINE(SAVE_AREA_BASE, offsetof(struct _lowcore, extended_save_area_addr));
 #else /* CONFIG_32BIT */
@@ -153,6 +158,8 @@ int main(void)
 	DEFINE(__LC_VDSO_PER_CPU, offsetof(struct _lowcore, vdso_per_cpu_data));
 	DEFINE(__LC_SIE_HOOK, offsetof(struct _lowcore, sie_hook));
 	DEFINE(__LC_CMF_HPP, offsetof(struct _lowcore, cmf_hpp));
+	DEFINE(__LC_OLDMEM_BASE, offsetof(struct _lowcore, oldmem_base));
+	DEFINE(__LC_OLDMEM_SIZE, offsetof(struct _lowcore, oldmem_size));
 #endif /* CONFIG_32BIT */
 	return 0;
 }
--- a/arch/s390/kernel/base.S
+++ b/arch/s390/kernel/base.S
@@ -75,6 +75,43 @@ s390_base_pgm_handler_fn:
 	.quad	0
 	.previous
 
+#
+# Calls diag 308 subcode 1 and continues execution
+#
+# The following conditions must be ensured before calling this function:
+# * Prefix register = 0
+# * Lowcore protection is disabled
+#
+	.globl	do_reset_diag308
+do_reset_diag308:
+	larl	%r4,.Lctlregs		# Save control registers
+	stctg	%c0,%c15,0(%r4)
+	larl	%r4,.Lrestart_psw	# Setup restart PSW at absolute 0
+	lghi	%r3,0
+	lg	%r4,0(%r4)		# Save PSW
+	sturg	%r4,%r3			# Use sturg, because of large pages
+	lghi	%r1,1
+	diag	%r1,%r1,0x308
+.Lrestart_part2:
+	lhi	%r0,0			# Load r0 with zero
+	lhi	%r1,2			# Use mode 2 = ESAME (dump)
+	sigp	%r1,%r0,0x12		# Switch to ESAME mode
+	sam64				# Switch to 64 bit addressing mode
+	larl	%r4,.Lctlregs		# Restore control registers
+	lctlg	%c0,%c15,0(%r4)
+	br	%r14
+.align 16
+.Lrestart_psw:
+	.long	0x00080000,0x80000000 + .Lrestart_part2
+
+	.section .bss
+.align 8
+.Lctlregs:
+	.rept	16
+	.quad	0
+	.endr
+	.previous
+
 #else /* CONFIG_64BIT */
 
 	.globl	s390_base_mcck_handler
--- /dev/null
+++ b/arch/s390/kernel/crash_dump.c
@@ -0,0 +1,76 @@
+/*
+ * S390 kdump implementation
+ *
+ * Copyright IBM Corp. 2011
+ * Author(s): Michael Holzheu <holzheu@linux.vnet.ibm.com>
+ */
+
+#include <linux/crash_dump.h>
+#include <asm/lowcore.h>
+
+/*
+ * Copy one page from "oldmem"
+ *
+ * For the kdump reserved memory this function performs a swap operation:
+ *  - [kdump_base - kdump_base + kdump_size] is mapped to [0 - kdump_size].
+ *  - [0 - kdump_size] is mapped to [kdump_base - kdump_base + kdump_size]
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
+			 size_t csize, unsigned long offset, int userbuf)
+{
+	unsigned long src, kdump_base, kdump_size;
+	int rc;
+
+	if (!csize)
+		return 0;
+
+	kdump_base = oldmem_base;
+	kdump_size = oldmem_size;
+
+	src = (pfn << PAGE_SHIFT) + offset;
+	if (src < kdump_size)
+		src += kdump_base;
+	else if (src >= kdump_base &&
+		 src < kdump_base + kdump_size)
+		src -= kdump_base;
+	if (userbuf)
+		rc = copy_to_user_real((void __user *) buf, (void *) src,
+				       csize);
+	else
+		rc = memcpy_real(buf, (void *) src, csize);
+	return rc < 0 ? rc : csize;
+}
+
+/*
+ * Read memory from oldmem
+ */
+ssize_t crash_read_from_oldmem(void *buf, size_t count, u64 ppos, int userbuf)
+{
+	unsigned long pfn, offset;
+	ssize_t read = 0, tmp;
+	size_t nr_bytes;
+
+	if (!count)
+		return 0;
+
+	offset = (unsigned long)(ppos % PAGE_SIZE);
+	pfn = (unsigned long)(ppos / PAGE_SIZE);
+
+	do {
+		if (count > (PAGE_SIZE - offset))
+			nr_bytes = PAGE_SIZE - offset;
+		else
+			nr_bytes = count;
+
+		tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);
+		if (tmp < 0)
+			return tmp;
+		count -= nr_bytes;
+		buf += nr_bytes;
+		read += nr_bytes;
+		++pfn;
+		offset = 0;
+	} while (count);
+
+	return read;
+}
--- /dev/null
+++ b/arch/s390/kernel/crash_dump_elf.c
@@ -0,0 +1,434 @@
+/*
+ * S390 kdump implementation - Create ELF core header
+ *
+ * Copyright IBM Corp. 2011
+ *
+ * Author(s): Michael Holzheu <holzheu@linux.vnet.ibm.com>
+ */
+
+#define KMSG_COMPONENT "kdump"
+#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/gfp.h>
+#include <linux/slab.h>
+#include <linux/crash_dump.h>
+#include <linux/bootmem.h>
+#include <linux/elf.h>
+#include <asm/ipl.h>
+
+#define HDR_PER_CPU_SIZE	0x300
+#define HDR_PER_MEMC_SIZE	0x100
+#define HDR_BASE_SIZE		0x2000
+
+#define ROUNDUP(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
+#define PTR_ADD(x, y) (((char *) (x)) + ((unsigned long) (y)))
+#define PTR_SUB(x, y) (((char *) (x)) - ((unsigned long) (y)))
+#define PTR_DIFF(x, y) ((unsigned long)(((char *) (x)) - ((unsigned long) (y))))
+
+#ifndef ELFOSABI_SYSV
+#define ELFOSABI_SYSV 0
+#endif
+
+#ifndef EI_ABIVERSION
+#define EI_ABIVERSION 8
+#endif
+
+#ifndef NT_FPREGSET
+#define NT_FPREGSET 2
+#endif
+
+/*
+ * prstatus ELF Note
+ */
+struct nt_prstatus_64 {
+	u8	pad1[32];
+	u32	pr_pid;
+	u8	pad2[76];
+	u64	psw[2];
+	u64	gprs[16];
+	u32	acrs[16];
+	u64	orig_gpr2;
+	u32	pr_fpvalid;
+	u8	pad3[4];
+} __packed;
+
+/*
+ * fpregset ELF Note
+ */
+struct nt_fpregset_64 {
+	u32	fpc;
+	u32	pad;
+	u64	fprs[16];
+} __packed;
+
+/*
+ * prpsinfo ELF Note
+ */
+struct nt_prpsinfo_64 {
+	char	pr_state;
+	char	pr_sname;
+	char	pr_zomb;
+	char	pr_nice;
+	u64	pr_flag;
+	u32	pr_uid;
+	u32	pr_gid;
+	u32	pr_pid, pr_ppid, pr_pgrp, pr_sid;
+	char	pr_fname[16];
+	char	pr_psargs[80];
+};
+
+/*
+ * File local static data
+ */
+static struct {
+	void	*hdr;
+	u32	hdr_size;
+	int	mem_chunk_cnt;
+} l;
+
+/*
+ * Create all required memory holes
+ */
+static void create_mem_holes(struct mem_chunk chunk_array[])
+{
+	create_mem_hole(chunk_array, oldmem_base, oldmem_size, CHUNK_CRASHK);
+}
+
+/*
+ * Alloc memory and panic in case of alloc failure
+ */
+static void *zg_alloc(int len)
+{
+	void *rc;
+
+	rc = kzalloc(len, GFP_KERNEL);
+	if (!rc)
+		panic("crash_dump_elf: alloc failed");
+	return rc;
+}
+
+/*
+ * Calculate CPU count for dump
+ */
+static int cpu_cnt(void)
+{
+	int i, cpus = 0;
+
+	for (i = 0; zfcpdump_save_areas[i]; i++) {
+		if (zfcpdump_save_areas[i]->pref_reg == 0)
+			continue;
+		cpus++;
+	}
+	return cpus;
+}
+
+/*
+ * Calculate memory chunk count
+ */
+static int mem_chunk_cnt(void)
+{
+	struct mem_chunk *chunk_array, *mem_chunk;
+	int i, cnt = 0;
+
+	chunk_array = zg_alloc(MEMORY_CHUNKS * sizeof(struct mem_chunk));
+	detect_memory_layout(chunk_array);
+	create_mem_holes(chunk_array);
+	for (i = 0; i < MEMORY_CHUNKS; i++) {
+		mem_chunk = &chunk_array[i];
+		if (chunk_array[i].type != CHUNK_READ_WRITE &&
+		    chunk_array[i].type != CHUNK_READ_ONLY)
+			continue;
+		if (mem_chunk->size == 0)
+			continue;
+		cnt++;
+	}
+	kfree(chunk_array);
+	return cnt;
+}
+
+/*
+ * Initialize ELF header
+ */
+static void *ehdr_init(Elf64_Ehdr *ehdr)
+{
+	memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
+	ehdr->e_ident[EI_CLASS] = ELFCLASS64;
+	ehdr->e_ident[EI_DATA] = ELFDATA2MSB;
+	ehdr->e_ident[EI_VERSION] = EV_CURRENT;
+	ehdr->e_ident[EI_OSABI] = ELFOSABI_SYSV;
+	ehdr->e_ident[EI_ABIVERSION] = 0;
+	memset(ehdr->e_ident+EI_PAD, 0, EI_NIDENT-EI_PAD);
+	ehdr->e_type = ET_CORE;
+	ehdr->e_machine = EM_S390;
+	ehdr->e_version = EV_CURRENT;
+	ehdr->e_entry = 0;
+	ehdr->e_phoff = sizeof(Elf64_Ehdr);
+	ehdr->e_shoff = 0;
+	ehdr->e_flags = 0;
+	ehdr->e_ehsize = sizeof(Elf64_Ehdr);
+	ehdr->e_phentsize = sizeof(Elf64_Phdr);
+	ehdr->e_shentsize = 0;
+	ehdr->e_shnum = 0;
+	ehdr->e_shstrndx = 0;
+	ehdr->e_phnum = l.mem_chunk_cnt + 1;
+	return ehdr + 1;
+}
+
+/*
+ * Initialize ELF loads
+ */
+static int loads_init(Elf64_Phdr *phdr, u64 loads_offset)
+{
+	struct mem_chunk *chunk_array, *mem_chunk;
+	int i;
+
+	chunk_array = zg_alloc(MEMORY_CHUNKS * sizeof(struct mem_chunk));
+	detect_memory_layout(chunk_array);
+	create_mem_holes(chunk_array);
+	for (i = 0; i < MEMORY_CHUNKS; i++) {
+		mem_chunk = &chunk_array[i];
+		if (mem_chunk->size == 0)
+			break;
+		if (chunk_array[i].type != CHUNK_READ_WRITE &&
+		    chunk_array[i].type != CHUNK_READ_ONLY)
+			continue;
+		else
+			phdr->p_filesz = mem_chunk->size;
+		phdr->p_type = PT_LOAD;
+		phdr->p_offset = mem_chunk->addr;
+		phdr->p_vaddr = mem_chunk->addr;
+		phdr->p_paddr = mem_chunk->addr;
+		phdr->p_memsz = mem_chunk->size;
+		phdr->p_flags = PF_R | PF_W | PF_X;
+		phdr->p_align = PAGE_SIZE;
+		phdr++;
+	}
+	kfree(chunk_array);
+	return i;
+}
+
+/*
+ * Initialize ELF note
+ */
+static void *nt_init(void *buf, Elf64_Word type, void *desc, int d_len,
+		     const char *name)
+{
+	Elf64_Nhdr *note;
+	u64 len;
+
+	note = (Elf64_Nhdr *)buf;
+	note->n_namesz = strlen(name) + 1;
+	note->n_descsz = d_len;
+	note->n_type = type;
+	len = sizeof(Elf64_Nhdr);
+
+	memcpy(buf + len, name, note->n_namesz);
+	len = ROUNDUP(len + note->n_namesz, 4);
+
+	memcpy(buf + len, desc, note->n_descsz);
+	len = ROUNDUP(len + note->n_descsz, 4);
+
+	return PTR_ADD(buf, len);
+}
+
+/*
+ * Initialize prstatus note
+ */
+static void *nt_prstatus(void *ptr, struct save_area *cpu)
+{
+	struct nt_prstatus_64 nt_prstatus;
+	static int cpu_nr = 1;
+
+	memset(&nt_prstatus, 0, sizeof(nt_prstatus));
+	memcpy(&nt_prstatus.gprs, cpu->gp_regs, sizeof(cpu->gp_regs));
+	memcpy(&nt_prstatus.psw, cpu->psw, sizeof(cpu->psw));
+	memcpy(&nt_prstatus.acrs, cpu->acc_regs, sizeof(cpu->acc_regs));
+	nt_prstatus.pr_pid = cpu_nr;
+	cpu_nr++;
+
+	return nt_init(ptr, NT_PRSTATUS, &nt_prstatus, sizeof(nt_prstatus),
+			 "CORE");
+}
+
+/*
+ * Initialize fpregset (floating point) note
+ */
+static void *nt_fpregset(void *ptr, struct save_area *cpu)
+{
+	struct nt_fpregset_64 nt_fpregset;
+
+	memset(&nt_fpregset, 0, sizeof(nt_fpregset));
+	memcpy(&nt_fpregset.fpc, &cpu->fp_ctrl_reg, sizeof(cpu->fp_ctrl_reg));
+	memcpy(&nt_fpregset.fprs, &cpu->fp_regs, sizeof(cpu->fp_regs));
+
+	return nt_init(ptr, NT_FPREGSET, &nt_fpregset, sizeof(nt_fpregset),
+			 "CORE");
+}
+
+/*
+ * Initialize timer note
+ */
+static void *nt_s390_timer(void *ptr, struct save_area *cpu)
+{
+	return nt_init(ptr, NT_S390_TIMER, &cpu->timer, sizeof(cpu->timer),
+			 "LINUX");
+}
+
+/*
+ * Initialize TOD clock comparator note
+ */
+static void *nt_s390_tod_cmp(void *ptr, struct save_area *cpu)
+{
+	return nt_init(ptr, NT_S390_TODCMP, &cpu->clk_cmp,
+		       sizeof(cpu->clk_cmp), "LINUX");
+}
+
+/*
+ * Initialize TOD programmable register note
+ */
+static void *nt_s390_tod_preg(void *ptr, struct save_area *cpu)
+{
+	return nt_init(ptr, NT_S390_TODPREG, &cpu->tod_reg,
+		       sizeof(cpu->tod_reg), "LINUX");
+}
+
+/*
+ * Initialize control register note
+ */
+static void *nt_s390_ctrs(void *ptr, struct save_area *cpu)
+{
+	return nt_init(ptr, NT_S390_CTRS, &cpu->ctrl_regs,
+		       sizeof(cpu->ctrl_regs), "LINUX");
+}
+
+/*
+ * Initialize prefix register note
+ */
+static void *nt_s390_prefix(void *ptr, struct save_area *cpu)
+{
+	return nt_init(ptr, NT_S390_PREFIX, &cpu->pref_reg,
+			 sizeof(cpu->pref_reg), "LINUX");
+}
+
+/*
+ * Initialize prpsinfo note
+ */
+static void *nt_prpsinfo(void *ptr)
+{
+	struct nt_prpsinfo_64 prpsinfo;
+
+	memset(&prpsinfo, 0, sizeof(prpsinfo));
+	prpsinfo.pr_state = 0;
+	prpsinfo.pr_sname = 'R';
+	prpsinfo.pr_zomb = 0;
+	strcpy(prpsinfo.pr_fname, "vmlinux");
+
+	return nt_init(ptr, NT_PRPSINFO, &prpsinfo, sizeof(prpsinfo), "CORE");
+}
+
+/*
+ * Initialize vmcoreinfo note
+ */
+static void *nt_vmcoreinfo(void *ptr)
+{
+	struct meminfo meminfo_vmcoreinfo;
+	char note_name[11];
+	unsigned long addr;
+	char *vmcoreinfo;
+	Elf64_Nhdr note;
+
+	if (meminfo_old_get(MEMINFO_TYPE_VMCOREINFO, &meminfo_vmcoreinfo))
+		return ptr;
+	addr = meminfo_vmcoreinfo.addr;
+	memset(note_name, 0, sizeof(note_name));
+	crash_read_from_oldmem(&note, sizeof(note), addr, 0);
+	crash_read_from_oldmem(note_name, sizeof(note_name) - 1,
+			       addr + sizeof(note), 0);
+	if (strcmp(note_name, "VMCOREINFO") != 0)
+		return ptr;
+	vmcoreinfo = zg_alloc(note.n_descsz + 1);
+	crash_read_from_oldmem(vmcoreinfo, note.n_descsz, addr + 24, 0);
+	vmcoreinfo[note.n_descsz] = 0;
+
+	return nt_init(ptr, 0, vmcoreinfo, note.n_descsz, "VMCOREINFO");
+}
+
+/*
+ * Initialize notes
+ */
+static void *notes_init(Elf64_Phdr *phdr, void *ptr, u64 notes_offset)
+{
+	struct save_area *cpu;
+	void *ptr_start = ptr;
+	int i;
+
+	ptr = nt_prpsinfo(ptr);
+
+	for (i = 0; zfcpdump_save_areas[i]; i++) {
+		cpu = zfcpdump_save_areas[i];
+		if (cpu->pref_reg == 0)
+			continue;
+		ptr = nt_prstatus(ptr, cpu);
+		ptr = nt_fpregset(ptr, cpu);
+		ptr = nt_s390_timer(ptr, cpu);
+		ptr = nt_s390_tod_cmp(ptr, cpu);
+		ptr = nt_s390_tod_preg(ptr, cpu);
+		ptr = nt_s390_ctrs(ptr, cpu);
+		ptr = nt_s390_prefix(ptr, cpu);
+	}
+	ptr = nt_vmcoreinfo(ptr);
+	memset(phdr, 0, sizeof(*phdr));
+	phdr->p_type = PT_NOTE;
+	phdr->p_offset = notes_offset;
+	phdr->p_filesz = (unsigned long) PTR_SUB(ptr, ptr_start);
+	phdr->p_memsz = phdr->p_filesz;
+	return ptr;
+}
+
+/*
+ * Initialize ELF header for kdump
+ */
+static void setup_kdump_elf_hdr(void)
+{
+	Elf64_Phdr *phdr_notes, *phdr_loads;
+	u32 alloc_size;
+	u64 hdr_off;
+	void *ptr;
+
+	if (!is_kdump_kernel())
+		return;
+	l.mem_chunk_cnt = mem_chunk_cnt();
+
+	alloc_size = HDR_BASE_SIZE + cpu_cnt() * HDR_PER_CPU_SIZE +
+		l.mem_chunk_cnt * HDR_PER_MEMC_SIZE;
+	l.hdr = zg_alloc(alloc_size);
+	/* Init elf header */
+	ptr = ehdr_init(l.hdr);
+	/* Init program headers */
+	phdr_notes = ptr;
+	ptr = PTR_ADD(ptr, sizeof(Elf64_Phdr));
+	phdr_loads = ptr;
+	ptr = PTR_ADD(ptr, sizeof(Elf64_Phdr) * l.mem_chunk_cnt);
+	/* Init notes */
+	hdr_off = PTR_DIFF(ptr, l.hdr);
+	ptr = notes_init(phdr_notes, ptr, hdr_off);
+	/* Init loads */
+	hdr_off = PTR_DIFF(ptr, l.hdr);
+	loads_init(phdr_loads, hdr_off);
+	l.hdr_size = hdr_off;
+	BUG_ON(l.hdr_size > alloc_size);
+}
+
+/*
+ * Get ELF header - called from vmcore common code
+ */
+int arch_vmcore_get_elf_hdr(char **elfcorebuf, size_t *elfcorebuf_sz)
+{
+	if (!l.hdr)
+		setup_kdump_elf_hdr();
+	*elfcorebuf = l.hdr;
+	*elfcorebuf_sz = l.hdr_size;
+	return 0;
+}
--- a/arch/s390/kernel/early.c
+++ b/arch/s390/kernel/early.c
@@ -20,6 +20,7 @@
 #include <linux/pfn.h>
 #include <linux/uaccess.h>
 #include <linux/kernel.h>
+#include <linux/crash_dump.h>
 #include <asm/ebcdic.h>
 #include <asm/ipl.h>
 #include <asm/lowcore.h>
@@ -29,6 +30,7 @@
 #include <asm/sysinfo.h>
 #include <asm/cpcmd.h>
 #include <asm/sclp.h>
+#include <asm/asm-offsets.h>
 #include "entry.h"
 
 /*
@@ -453,6 +455,14 @@ static void __init setup_boot_command_li
 	append_to_cmdline(append_ipl_scpdata);
 }
 
+static void __init setup_kdump(void)
+{
+#ifdef CONFIG_CRASH_DUMP
+	if (!oldmem_base)
+		return;
+	elfcorehdr_addr = ELFCORE_ADDR_NEWMEM; /* needed for is_kdump_kernel */
+#endif
+}
 
 /*
  * Save ipl parameters, clear bss memory, initialize storage keys
@@ -460,6 +470,8 @@ static void __init setup_boot_command_li
  */
 void __init startup_init(void)
 {
+	meminfo_init();
+	setup_kdump();
 	reset_tod_clock();
 	ipl_save_parameters();
 	rescue_initrd();
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -859,6 +859,34 @@ restart_crash:
 restart_go:
 #endif
 
+#
+# PSW restart interrupt handler
+#
+	.globl psw_restart_int_handler
+psw_restart_int_handler:
+	st	%r15,__LC_SAVE_AREA_64(%r0)	# save r15
+	basr	%r15,0
+0:	l	%r15,.Lrestart_stack-0b(%r15)	# load restart stack
+	l	%r15,0(%r15)
+	ahi	%r15,-SP_SIZE			# make room for pt_regs
+	stm	%r0,%r14,SP_R0(%r15)		# store gprs %r0-%r14 to stack
+	mvc	SP_R15(4,%r15),__LC_SAVE_AREA_64(%r0)# store saved %r15 to stack
+	mvc	SP_PSW(8,%r15),__LC_RST_OLD_PSW(%r0) # store restart old psw
+	xc	__SF_BACKCHAIN(4,%r15),__SF_BACKCHAIN(%r15) # set backchain to 0
+	basr	%r14,0
+1:	l	%r14,.Ldo_restart-1b(%r14)
+	basr	%r14,%r14
+
+	basr	%r14,0				# load disabled wait PSW if
+2:	lpsw	restart_psw_crash-2b(%r14)	# do_restart returns
+.Ldo_restart:
+	.long	do_restart
+.Lrestart_stack:
+	.long	restart_stack
+	.align 8
+restart_psw_crash:
+	.long	0x000a0000,0x00000000 + restart_psw_crash
+
 	.section .kprobes.text, "ax"
 
 #ifdef CONFIG_CHECK_STACK
--- a/arch/s390/kernel/entry64.S
+++ b/arch/s390/kernel/entry64.S
@@ -862,6 +862,27 @@ restart_crash:
 restart_go:
 #endif
 
+#
+# PSW restart interrupt handler
+#
+	.globl psw_restart_int_handler
+psw_restart_int_handler:
+	stg	%r15,__LC_SAVE_AREA_64(%r0)	# save r15
+	larl	%r15,restart_stack		# load restart stack
+	lg	%r15,0(%r15)
+	aghi	%r15,-SP_SIZE			# make room for pt_regs
+	stmg	%r0,%r14,SP_R0(%r15)		# store gprs %r0-%r14 to stack
+	mvc	SP_R15(8,%r15),__LC_SAVE_AREA_64(%r0)# store saved %r15 to stack
+	mvc	SP_PSW(16,%r15),__LC_RST_OLD_PSW(%r0)# store restart old psw
+	xc	__SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15) # set backchain to 0
+	brasl	%r14,do_restart
+
+	larl	%r14,restart_psw_crash		# load disabled wait PSW if
+	lpswe	0(%r14)				# do_restart returns
+	.align 8
+restart_psw_crash:
+	.quad	0x0002000080000000,0x0000000000000000 + restart_psw_crash
+
 	.section .kprobes.text, "ax"
 
 #ifdef CONFIG_CHECK_STACK
--- a/arch/s390/kernel/head.S
+++ b/arch/s390/kernel/head.S
@@ -450,10 +450,22 @@ start:
 	.org	0x10000
 	.globl	startup
 startup:
+	j	.Lep_startup_normal
+
+#
+# kdump startup-code at 0x10008, running in 64 bit absolute addressing mode
+#
+	.org	0x10008
+	.globl	startup_kdump
+startup_kdump:
+	j	.Lep_startup_kdump
+
+.Lep_startup_normal:
 	basr	%r13,0			# get base
 .LPG0:
 	xc	0x200(256),0x200	# partially clear lowcore
 	xc	0x300(256),0x300
+	xc	0xe00(256),0xe00
 	stck	__LC_LAST_UPDATE_CLOCK
 	spt	5f-.LPG0(%r13)
 	mvc	__LC_LAST_UPDATE_TIMER(8),5f-.LPG0(%r13)
@@ -535,6 +547,8 @@ startup:
 	.align	8
 5:	.long	0x7fffffff,0xffffffff
 
+#include "head_kdump.S"
+
 #
 # params at 10400 (setup.h)
 #
--- /dev/null
+++ b/arch/s390/kernel/head_kdump.S
@@ -0,0 +1,133 @@
+/*
+ * S390 kdump lowlevel functions (new kernel)
+ *
+ * Copyright IBM Corp. 2011
+ * Author(s): Michael Holzheu <holzheu@linux.vnet.ibm.com>
+ */
+
+#define DATAMOVER_ADDR	0x4000
+#define COPY_PAGE_ADDR	0x6000
+
+#ifdef CONFIG_CRASH_DUMP
+
+#
+# kdump entry (new kernel - not yet relocated)
+#
+# Note: This code has to be position independent
+#
+
+.align 2
+.Lep_startup_kdump:
+	basr	%r13,0
+.Lbase:
+	larl	%r2,.Lbase_addr			# Check, if we have been
+	lg	%r2,0(%r2)			# already relocated:
+	clgr	%r2,%r13			#
+	jne	.Lrelocate			# No : Start data mover
+	lghi	%r2,0				# Yes: Start kdump kernel
+	brasl	%r14,startup_kdump_relocated
+
+.Lrelocate:
+	lg	%r4,__LC_MEMINFO+__MI_ADDR(%r0)	# Load meminfo base (%r4)
+
+	lgr	%r5,%r4
+	aghi	%r5,__MI_TYPE_KDUMP_MEM		# Base for kdump meminfo
+	lg	%r2,__MI_ADDR(%r5)		# Load kdump base address (%r2)
+	lg	%r3,__MI_SIZE(%r5)		# Load kdump size (%r3)
+
+	stg	%r2,__LC_OLDMEM_BASE(%r2)	# Save kdump base
+	stg	%r3,__LC_OLDMEM_SIZE(%r2)	# Save kdump size
+
+	larl	%r10,.Lcopy_start		# Source of data mover
+	lghi	%r8,DATAMOVER_ADDR		# Target of data mover
+	mvc	0(256,%r8),0(%r10)		# Copy data mover code
+
+	agr	%r8,%r2				# Copy data mover to
+	mvc	0(256,%r8),0(%r10)		# reserved mem
+
+	lghi	%r14,DATAMOVER_ADDR		# Jump to copied data mover
+	basr	%r14,%r14
+.Lbase_addr:
+	.quad	.Lbase
+
+#
+# kdump data mover code (runs at address DATAMOVER_ADDR)
+#
+# r2: kdump base address
+# r3: kdump size
+#
+.Lcopy_start:
+	basr	%r13,0				# Base
+0:
+	lgr	%r11,%r2			# Save kdump base address
+	lgr	%r12,%r2
+	agr	%r12,%r3			# Compute kdump end address
+
+	lghi	%r5,0
+	lghi	%r10,COPY_PAGE_ADDR		# Load copy page address
+1:
+	mvc	0(256,%r10),0(%r5)		# Copy old kernel to tmp
+	mvc	0(256,%r5),0(%r11)		# Copy new kernel to old
+	mvc	0(256,%r11),0(%r10)		# Copy tmp to new
+	aghi	%r11,256
+	aghi	%r5,256
+	clgr	%r11,%r12
+	jl	1b
+
+	lg	%r14,.Lstartup_kdump-0b(%r13)
+	basr	%r14,%r14			# Start relocated kernel
+.Lstartup_kdump:
+	.long	0x00000000,0x00000000 + startup_kdump_relocated
+.Lcopy_end:
+
+#
+# Startup of kdump (relocated new kernel)
+#
+.align 2
+startup_kdump_relocated:
+	basr	%r13,0
+0:	lg	%r3,__LC_OLDMEM_BASE(%r0)	# Save oldmem base
+	stg	%r3,oldmem_base-0b(%r13)
+	lg	%r3,__LC_OLDMEM_SIZE(%r0)	# Save oldmem size
+	stg	%r3,oldmem_size-0b(%r13)
+
+	mvc	0(8,%r0),.Lrestart_psw-0b(%r13)	# Setup restart PSW
+	mvc	464(16,%r0),.Lpgm_psw-0b(%r13)	# Setup pgm check PSW
+	lhi	%r1,1				# Start new kernel
+	diag	%r1,%r1,0x308			# with diag 308
+
+.Lno_diag308:					# No diag 308
+	sam31					# Switch to 31 bit addr mode
+	sr	%r1,%r1				# Erase register r1
+	sr	%r2,%r2				# Erase register r2
+	sigp	%r1,%r2,0x12			# Switch to 31 bit arch mode
+	lpsw	0				# Start new kernel...
+.align	8
+.Lrestart_psw:
+	.long	0x00080000,0x80000000 + startup
+.Lpgm_psw:
+	.quad	0x0000000180000000,0x0000000000000000 + .Lno_diag308
+	.globl	oldmem_base
+oldmem_base:
+	.quad	0x0
+	.globl	oldmem_size
+oldmem_size:
+	.quad	0x0
+
+#else
+.align 2
+.Lep_startup_kdump:
+#ifdef CONFIG_64BIT
+	larl	%r13,startup_kdump_crash
+	lpswe	0(%r13)
+.align 8
+startup_kdump_crash:
+	.quad	0x0002000080000000,0x0000000000000000 + startup_kdump_crash
+#else
+	basr	%r13,0
+0:	lpsw	startup_kdump_crash-0b(%r13)
+.align 8
+startup_kdump_crash:
+	.long	0x000a0000,0x00000000 + startup_kdump_crash
+#endif /* CONFIG_64BIT */
+#endif /* CONFIG_CRASH_DUMP */
--- a/arch/s390/kernel/ipl.c
+++ b/arch/s390/kernel/ipl.c
@@ -16,6 +16,7 @@
 #include <linux/ctype.h>
 #include <linux/fs.h>
 #include <linux/gfp.h>
+#include <linux/crash_dump.h>
 #include <asm/ipl.h>
 #include <asm/smp.h>
 #include <asm/setup.h>
@@ -26,6 +27,7 @@
 #include <asm/sclp.h>
 #include <asm/sigp.h>
 #include <asm/checksum.h>
+#include <asm/lowcore.h>
 
 #define IPL_PARM_BLOCK_VERSION 0
 
@@ -45,11 +47,13 @@
  * - halt
  * - power off
  * - reipl
+ * - restart
  */
 #define ON_PANIC_STR		"on_panic"
 #define ON_HALT_STR		"on_halt"
 #define ON_POFF_STR		"on_poff"
 #define ON_REIPL_STR		"on_reboot"
+#define ON_RESTART_STR		"on_restart"
 
 struct shutdown_action;
 struct shutdown_trigger {
@@ -66,6 +70,7 @@ struct shutdown_trigger {
 #define SHUTDOWN_ACTION_VMCMD_STR	"vmcmd"
 #define SHUTDOWN_ACTION_STOP_STR	"stop"
 #define SHUTDOWN_ACTION_DUMP_REIPL_STR	"dump_reipl"
+#define SHUTDOWN_ACTION_KDUMP_STR	"kdump"
 
 struct shutdown_action {
 	char *name;
@@ -946,6 +951,13 @@ static struct attribute_group reipl_nss_
 	.attrs = reipl_nss_attrs,
 };
 
+static void set_reipl_block_actual(struct ipl_parameter_block *reipl_block)
+{
+	meminfo_update(MEMINFO_TYPE_IPIB, reipl_block, reipl_block->hdr.len,
+		       MEMINFO_FLAG_ELEM_VALID | MEMINFO_FLAG_CSUM_VALID);
+	reipl_block_actual = reipl_block;
+}
+
 /* reipl type */
 
 static int reipl_set_type(enum ipl_type type)
@@ -961,7 +973,7 @@ static int reipl_set_type(enum ipl_type
 			reipl_method = REIPL_METHOD_CCW_VM;
 		else
 			reipl_method = REIPL_METHOD_CCW_CIO;
-		reipl_block_actual = reipl_block_ccw;
+		set_reipl_block_actual(reipl_block_ccw);
 		break;
 	case IPL_TYPE_FCP:
 		if (diag308_set_works)
@@ -970,7 +982,7 @@ static int reipl_set_type(enum ipl_type
 			reipl_method = REIPL_METHOD_FCP_RO_VM;
 		else
 			reipl_method = REIPL_METHOD_FCP_RO_DIAG;
-		reipl_block_actual = reipl_block_fcp;
+		set_reipl_block_actual(reipl_block_fcp);
 		break;
 	case IPL_TYPE_FCP_DUMP:
 		reipl_method = REIPL_METHOD_FCP_DUMP;
@@ -980,7 +992,7 @@ static int reipl_set_type(enum ipl_type
 			reipl_method = REIPL_METHOD_NSS_DIAG;
 		else
 			reipl_method = REIPL_METHOD_NSS;
-		reipl_block_actual = reipl_block_nss;
+		set_reipl_block_actual(reipl_block_nss);
 		break;
 	case IPL_TYPE_UNKNOWN:
 		reipl_method = REIPL_METHOD_DEFAULT;
@@ -1111,6 +1123,12 @@ static void reipl_block_ccw_init(struct
 static void reipl_block_ccw_fill_parms(struct ipl_parameter_block *ipb)
 {
 	/* LOADPARM */
+	/* For kdump we use IPL parameters from original system */
+	if (is_kdump_kernel()) {
+		memcpy(ipb->ipl_info.ccw.load_parm,
+		       ipl_block.ipl_info.ccw.load_parm, LOADPARM_LEN);
+		return;
+	}
 	/* check if read scp info worked and set loadparm */
 	if (sclp_ipl_info.is_valid)
 		memcpy(ipb->ipl_info.ccw.load_parm,
@@ -1495,30 +1513,10 @@ static struct shutdown_action __refdata
 
 static void dump_reipl_run(struct shutdown_trigger *trigger)
 {
-	preempt_disable();
-	/*
-	 * Bypass dynamic address translation (DAT) when storing IPL parameter
-	 * information block address and checksum into the prefix area
-	 * (corresponding to absolute addresses 0-8191).
-	 * When enhanced DAT applies and the STE format control in one,
-	 * the absolute address is formed without prefixing. In this case a
-	 * normal store (stg/st) into the prefix area would no more match to
-	 * absolute addresses 0-8191.
-	 */
-#ifdef CONFIG_64BIT
-	asm volatile("sturg %0,%1"
-		:: "a" ((unsigned long) reipl_block_actual),
-		"a" (&lowcore_ptr[smp_processor_id()]->ipib));
-#else
-	asm volatile("stura %0,%1"
-		:: "a" ((unsigned long) reipl_block_actual),
-		"a" (&lowcore_ptr[smp_processor_id()]->ipib));
-#endif
-	asm volatile("stura %0,%1"
-		:: "a" (csum_partial(reipl_block_actual,
-				     reipl_block_actual->hdr.len, 0)),
-		"a" (&lowcore_ptr[smp_processor_id()]->ipib_checksum));
-	preempt_enable();
+	u32 csum;
+
+	csum = csum_partial(reipl_block_actual, reipl_block_actual->hdr.len, 0);
+	copy_to_absolute_zero(&S390_lowcore.ipib_checksum, &csum, sizeof(csum));
 	dump_run(trigger);
 }
 
@@ -1544,17 +1542,20 @@ static char vmcmd_on_reboot[128];
 static char vmcmd_on_panic[128];
 static char vmcmd_on_halt[128];
 static char vmcmd_on_poff[128];
+static char vmcmd_on_restart[128];
 
 DEFINE_IPL_ATTR_STR_RW(vmcmd, on_reboot, "%s\n", "%s\n", vmcmd_on_reboot);
 DEFINE_IPL_ATTR_STR_RW(vmcmd, on_panic, "%s\n", "%s\n", vmcmd_on_panic);
 DEFINE_IPL_ATTR_STR_RW(vmcmd, on_halt, "%s\n", "%s\n", vmcmd_on_halt);
 DEFINE_IPL_ATTR_STR_RW(vmcmd, on_poff, "%s\n", "%s\n", vmcmd_on_poff);
+DEFINE_IPL_ATTR_STR_RW(vmcmd, on_restart, "%s\n", "%s\n", vmcmd_on_restart);
 
 static struct attribute *vmcmd_attrs[] = {
 	&sys_vmcmd_on_reboot_attr.attr,
 	&sys_vmcmd_on_panic_attr.attr,
 	&sys_vmcmd_on_halt_attr.attr,
 	&sys_vmcmd_on_poff_attr.attr,
+	&sys_vmcmd_on_restart_attr.attr,
 	NULL,
 };
 
@@ -1576,6 +1577,8 @@ static void vmcmd_run(struct shutdown_tr
 		cmd = vmcmd_on_halt;
 	else if (strcmp(trigger->name, ON_POFF_STR) == 0)
 		cmd = vmcmd_on_poff;
+	else if (strcmp(trigger->name, ON_RESTART_STR) == 0)
+		cmd = vmcmd_on_restart;
 	else
 		return;
 
@@ -1621,11 +1624,43 @@ static void stop_run(struct shutdown_tri
 static struct shutdown_action stop_action = {SHUTDOWN_ACTION_STOP_STR,
 					     stop_run, NULL};
 
+/*
+ * kdump shutdown action: Trigger kdump on shutdown.
+ */
+
+#ifdef CONFIG_CRASH_DUMP
+static int kdump_init(void)
+{
+	if (crashk_res.start == 0)
+		return -EOPNOTSUPP;
+	return 0;
+}
+
+static void kdump_run(struct shutdown_trigger *trigger)
+{
+	/*
+	 * We do not call crash_kexec(), because the image could also
+	 * be loaded externally without kexec_load(). In this case
+	 * crash_kexec() would have no effect because crash_image is not
+	 * defined.
+	 */
+	machine_kdump();
+	disabled_wait((unsigned long) __builtin_return_address(0));
+}
+
+static struct shutdown_action kdump_action = {SHUTDOWN_ACTION_KDUMP_STR,
+					     kdump_run, kdump_init};
+#endif
+
 /* action list */
 
 static struct shutdown_action *shutdown_actions_list[] = {
 	&ipl_action, &reipl_action, &dump_reipl_action, &dump_action,
-	&vmcmd_action, &stop_action};
+	&vmcmd_action, &stop_action,
+#ifdef CONFIG_CRASH_DUMP
+	&kdump_action
+#endif
+	};
 #define SHUTDOWN_ACTIONS_COUNT (sizeof(shutdown_actions_list) / sizeof(void *))
 
 /*
@@ -1707,6 +1742,34 @@ static void do_panic(void)
 	stop_run(&on_panic_trigger);
 }
 
+/* on restart */
+
+static struct shutdown_trigger on_restart_trigger = {ON_RESTART_STR,
+	&reipl_action};
+
+static ssize_t on_restart_show(struct kobject *kobj,
+			       struct kobj_attribute *attr, char *page)
+{
+	return sprintf(page, "%s\n", on_restart_trigger.action->name);
+}
+
+static ssize_t on_restart_store(struct kobject *kobj,
+				struct kobj_attribute *attr,
+				const char *buf, size_t len)
+{
+	return set_trigger(buf, &on_restart_trigger, len);
+}
+
+static struct kobj_attribute on_restart_attr =
+	__ATTR(on_restart, 0644, on_restart_show, on_restart_store);
+
+void do_restart(void)
+{
+	smp_send_stop();
+	on_restart_trigger.action->fn(&on_restart_trigger);
+	stop_run(&on_restart_trigger);
+}
+
 /* on halt */
 
 static struct shutdown_trigger on_halt_trigger = {ON_HALT_STR, &stop_action};
@@ -1767,6 +1830,16 @@ void (*_machine_power_off)(void) = do_ma
 
 static void __init shutdown_triggers_init(void)
 {
+#ifdef CONFIG_CRASH_DUMP
+	/*
+	 * We set the kdump action for panic and restart, if the kdump
+	 * reserved area is defined.
+	 */
+	if (crashk_res.start != 0) {
+		on_restart_trigger.action = &kdump_action;
+		on_panic_trigger.action = &kdump_action;
+	}
+#endif
 	shutdown_actions_kset = kset_create_and_add("shutdown_actions", NULL,
 						    firmware_kobj);
 	if (!shutdown_actions_kset)
@@ -1783,7 +1856,9 @@ static void __init shutdown_triggers_ini
 	if (sysfs_create_file(&shutdown_actions_kset->kobj,
 			      &on_poff_attr.attr))
 		goto fail;
-
+	if (sysfs_create_file(&shutdown_actions_kset->kobj,
+			      &on_restart_attr.attr))
+		goto fail;
 	return;
 fail:
 	panic("shutdown_triggers_init failed\n");
@@ -1908,6 +1983,26 @@ void __init setup_ipl(void)
 	atomic_notifier_chain_register(&panic_notifier_list, &on_panic_nb);
 }
 
+/*
+ * In case of kdump get re-IPL configuration of crashed system via meminfo
+ */
+static int __init ipl_kdump_ipib_init(void)
+{
+#ifdef CONFIG_CRASH_DUMP
+	struct meminfo meminfo_ipib;
+
+	if (!is_kdump_kernel())
+		return -EINVAL;
+	if (meminfo_old_get(MEMINFO_TYPE_IPIB, &meminfo_ipib))
+		return -EINVAL;
+	crash_read_from_oldmem(&ipl_block, sizeof(ipl_block),
+			       meminfo_ipib.addr, 0);
+	return 0;
+#else
+	return -EINVAL;
+#endif
+}
+
 void __init ipl_update_parameters(void)
 {
 	int rc;
@@ -1915,6 +2010,35 @@ void __init ipl_update_parameters(void)
 	rc = diag308(DIAG308_STORE, &ipl_block);
 	if ((rc == DIAG308_RC_OK) || (rc == DIAG308_RC_NOCONFIG))
 		diag308_set_works = 1;
+	ipl_kdump_ipib_init();
+}
+
+/*
+ * For kdump IPL we set the IPL info to the values that we get from the crashed
+ * system using the ipib meminfo pointer. Then a reboot of the kdump
+ * kernel will reboot the original system.
+ */
+static int setup_kdump_iplinfo(struct cio_iplinfo *iplinfo)
+{
+#ifdef CONFIG_CRASH_DUMP
+	if (ipl_kdump_ipib_init())
+		return -EINVAL;
+
+	if (ipl_block.hdr.pbt == DIAG308_IPL_TYPE_CCW) {
+		iplinfo->devno = ipl_block.ipl_info.ccw.devno;
+		iplinfo->is_qdio = 0;
+		return 0;
+	}
+	if (ipl_block.hdr.pbt == DIAG308_IPL_TYPE_FCP) {
+		iplinfo->devno = ipl_block.ipl_info.fcp.devno;
+		iplinfo->is_qdio = 1;
+		S390_lowcore.ipl_parmblock_ptr = (unsigned long) &ipl_block;
+		return 0;
+	}
+	return -ENODEV;
+#else
+	return -ENODEV;
+#endif
 }
 
 void __init ipl_save_parameters(void)
@@ -1922,9 +2046,13 @@ void __init ipl_save_parameters(void)
 	struct cio_iplinfo iplinfo;
 	void *src, *dst;
 
-	if (cio_get_iplinfo(&iplinfo))
-		return;
-
+	if (is_kdump_kernel()) {
+		if (setup_kdump_iplinfo(&iplinfo))
+			return;
+	} else {
+		if (cio_get_iplinfo(&iplinfo))
+			return;
+	}
 	ipl_devno = iplinfo.devno;
 	ipl_flags |= IPL_DEVNO_VALID;
 	if (!iplinfo.is_qdio)
@@ -1992,7 +2120,10 @@ void s390_reset_system(void)
 	S390_lowcore.program_new_psw.mask = psw_kernel_bits & ~PSW_MASK_MCHECK;
 	S390_lowcore.program_new_psw.addr =
 		PSW_ADDR_AMODE | (unsigned long) s390_base_pgm_handler;
-
-	do_reset_calls();
+#ifdef CONFIG_64BIT
+	if (diag308_set_works)
+		do_reset_diag308();
+	else
+#endif
+		do_reset_calls();
 }
-
--- a/arch/s390/kernel/machine_kexec.c
+++ b/arch/s390/kernel/machine_kexec.c
@@ -21,12 +21,169 @@
 #include <asm/smp.h>
 #include <asm/reset.h>
 #include <asm/ipl.h>
+#include <asm/cacheflush.h>
+#include <asm/asm-offsets.h>
+#include <asm/checksum.h>
+#include <asm/diag.h>
+#include <asm/sclp.h>
 
 typedef void (*relocate_kernel_t)(kimage_entry_t *, unsigned long);
 
 extern const unsigned char relocate_kernel[];
 extern const unsigned long long relocate_kernel_len;
 
+#ifdef CONFIG_CRASH_DUMP
+
+static struct meminfo meminfo_kdump_segments[KEXEC_SEGMENT_MAX];
+
+/*
+ * S390 version: Currently we do not support freeing crashkernel memory
+ */
+void crash_free_reserved_phys_range(unsigned long begin, unsigned long end)
+{
+	return;
+}
+
+/*
+ * S390 version: Just do real copy of segment
+ */
+int kimage_load_crash_segment(struct kimage *image,
+			      struct kexec_segment *segment)
+{
+	return copy_from_user_real((void *) segment->mem, segment->buf,
+				   segment->bufsz);
+}
+
+/*
+ * Update KDUMP_MEM meminfo and store oldmem base and size to absolute zero
+ */
+static void kdump_mem_update(void)
+{
+	unsigned long base, size;
+
+	base = crashk_res.start;
+	size = crashk_res.end - crashk_res.start + 1;
+	memcpy_real((void *) __LC_OLDMEM_BASE + base, &base, sizeof(base));
+	memcpy_real((void *) __LC_OLDMEM_SIZE + base, &size, sizeof(size));
+	meminfo_update(MEMINFO_TYPE_KDUMP_MEM, (void *) base, size,
+		       MEMINFO_FLAG_ELEM_VALID);
+}
+
+/*
+ * Clear kdump segments (kdump has been unloaded)
+ */
+static void kdump_segments_clear(void)
+{
+	memset(meminfo_kdump_segments, 0, sizeof(meminfo_kdump_segments));
+	meminfo_update(MEMINFO_TYPE_KDUMP_SEGM, NULL, 0, 0);
+	if (MACHINE_IS_VM)
+		diag10_range(PFN_DOWN(crashk_res.start),
+			     PFN_DOWN(crashk_res.end - crashk_res.start + 1));
+}
+
+/*
+ * Update kdump segments (kdump has been loaded)
+ */
+static void kdump_segments_update(struct kimage *image)
+{
+	int i, flags = MEMINFO_FLAG_ELEM_VALID | MEMINFO_FLAG_CSUM_VALID;
+
+	memset(meminfo_kdump_segments, 0, sizeof(meminfo_kdump_segments));
+
+	for (i = 0; i < image->nr_segments; i++) {
+		meminfo_kdump_segments[i].addr = image->segment[i].mem;
+		meminfo_kdump_segments[i].size = image->segment[i].memsz;
+		meminfo_kdump_segments[i].flags = flags;
+	}
+
+	meminfo_update(MEMINFO_TYPE_KDUMP_SEGM, &meminfo_kdump_segments,
+		       image->nr_segments * sizeof(struct meminfo),
+		       flags | MEMINFO_FLAG_ELEM_IND);
+}
+
+/*
+ * Finish kexec_load() and update meminfo data in case of kdump
+ */
+void machine_kexec_finish(struct kimage *image, int kexec_flags)
+{
+	if (!(kexec_flags & KEXEC_ON_CRASH))
+		return;
+	kdump_mem_update();
+	if (image)
+		kdump_segments_update(image);
+	else
+		kdump_segments_clear();
+}
+
+/*
+ * Print error message and load disabled wait PSW
+ */
+static void kdump_failed(const char *str)
+{
+	psw_t kdump_failed_psw;
+
+	kdump_failed_psw.mask = PSW_BASE_BITS | PSW_MASK_WAIT;
+	kdump_failed_psw.addr = (unsigned long) kdump_failed;
+	_sclp_print_early(str);
+	_sclp_print_early("Please use alternative dump tool");
+	__load_psw(kdump_failed_psw);
+}
+
+/*
+ * Check if kdump is loaded/valid and start it
+ */
+static void __machine_kdump(void *data)
+{
+	u32 flags = meminfo_array[MEMINFO_TYPE_KDUMP_SEGM].flags;
+	struct meminfo root;
+	psw_t kdump_psw;
+	u32 csum;
+
+	pfault_fini();
+	s390_reset_system();
+	__arch_local_irq_stnsm(0xfb); /* disable DAT */
+	do_store_status();
+
+	if (!(flags & MEMINFO_FLAG_ELEM_VALID))
+		kdump_failed("kdump failed: Kernel not loaded");
+
+	copy_from_absolute_zero(&root, &S390_lowcore.meminfo, sizeof(root));
+	copy_from_absolute_zero(&csum, &S390_lowcore.meminfo_csum,
+				sizeof(csum));
+	if (csum != csum_partial(&root, sizeof(root), 0))
+		kdump_failed("kdump failed: Invalid meminfo checksum");
+	if (meminfo_csum_check(&root, 1))
+		kdump_failed("kdump failed: Invalid checksum");
+
+	_sclp_print_early("Starting kdump");
+	kdump_psw.mask = PSW_BASE_BITS | PSW_DEFAULT_KEY;
+	kdump_psw.addr = crashk_res.start + 0x10008;
+	__load_psw(kdump_psw);
+}
+
+/*
+ * Start kdump on IPL CPU
+ */
+void machine_kdump(void)
+{
+	crash_save_vmcoreinfo();
+	smp_switch_to_ipl_cpu(__machine_kdump, NULL);
+}
+#endif
+
+/*
+ * Invalidate KDUMP_SEGM meminfo before new kdump is loaded
+ */
+static int machine_kexec_prepare_kdump(void)
+{
+#ifdef CONFIG_CRASH_DUMP
+	kdump_segments_clear();
+	return 0;
+#else
+	return -EINVAL;
+#endif
+}
+
 int machine_kexec_prepare(struct kimage *image)
 {
 	void *reboot_code_buffer;
@@ -35,6 +192,9 @@ int machine_kexec_prepare(struct kimage
 	if (ipl_flags & IPL_NSS_VALID)
 		return -ENOSYS;
 
+	if (image->type == KEXEC_TYPE_CRASH)
+		return machine_kexec_prepare_kdump();
+
 	/* We don't support anything but the default image type for now. */
 	if (image->type != KEXEC_TYPE_DEFAULT)
 		return -EINVAL;
@@ -72,6 +232,10 @@ static void __machine_kexec(void *data)
 
 void machine_kexec(struct kimage *image)
 {
+#ifdef CONFIG_CRASH_DUMP
+	if (image->type == KEXEC_TYPE_CRASH)
+		machine_kdump();
+#endif
 	tracer_disable();
 	smp_send_stop();
 	smp_switch_to_ipl_cpu(__machine_kexec, image);
--- a/arch/s390/kernel/mem_detect.c
+++ b/arch/s390/kernel/mem_detect.c
@@ -62,3 +62,73 @@ void detect_memory_layout(struct mem_chu
 	arch_local_irq_restore(flags);
 }
 EXPORT_SYMBOL(detect_memory_layout);
+
+/*
+ * Create memory hole with given address, size, and type
+ */
+void create_mem_hole(struct mem_chunk chunks[], unsigned long addr,
+		     unsigned long size, int type)
+{
+	unsigned long start, end, new_size;
+	int i;
+
+	for (i = 0; i < MEMORY_CHUNKS; i++) {
+		if (chunks[i].size == 0)
+			continue;
+		if (addr + size < chunks[i].addr)
+			continue;
+		if (addr >= chunks[i].addr + chunks[i].size)
+			continue;
+		start = max(addr, chunks[i].addr);
+		end = min(addr + size, chunks[i].addr + chunks[i].size);
+		new_size = end - start;
+		if (new_size == 0)
+			continue;
+		if (start == chunks[i].addr &&
+		    end == chunks[i].addr + chunks[i].size) {
+			/* Remove chunk */
+			chunks[i].type = type;
+		} else if (start == chunks[i].addr) {
+			/* Make chunk smaller at start */
+			if (i >= MEMORY_CHUNKS - 1)
+				panic("Unable to create memory hole");
+			memmove(&chunks[i + 1], &chunks[i],
+				sizeof(struct mem_chunk) *
+				(MEMORY_CHUNKS - (i + 1)));
+			chunks[i + 1].addr = chunks[i].addr + new_size;
+			chunks[i + 1].size = chunks[i].size - new_size;
+			chunks[i].size = new_size;
+			chunks[i].type = type;
+			i += 1;
+		} else if (end == chunks[i].addr + chunks[i].size) {
+			/* Make chunk smaller at end */
+			if (i >= MEMORY_CHUNKS - 1)
+				panic("Unable to create memory hole");
+			memmove(&chunks[i + 1], &chunks[i],
+				sizeof(struct mem_chunk) *
+				(MEMORY_CHUNKS - (i + 1)));
+			chunks[i + 1].addr = start;
+			chunks[i + 1].size = new_size;
+			chunks[i + 1].type = type;
+			chunks[i].size -= new_size;
+			i += 1;
+		} else {
+			/* Create memory hole */
+			if (i >= MEMORY_CHUNKS - 2)
+				panic("Unable to create memory hole");
+			memmove(&chunks[i + 2], &chunks[i],
+				sizeof(struct mem_chunk) *
+				(MEMORY_CHUNKS - (i + 2)));
+			chunks[i + 1].addr = addr;
+			chunks[i + 1].size = size;
+			chunks[i + 1].type = type;
+			chunks[i + 2].addr = addr + size;
+			chunks[i + 2].size =
+				chunks[i].addr + chunks[i].size - (addr + size);
+			chunks[i + 2].type = chunks[i].type;
+			chunks[i].size = addr - chunks[i].addr;
+			i += 2;
+		}
+	}
+}
+
--- /dev/null
+++ b/arch/s390/kernel/meminfo.c
@@ -0,0 +1,132 @@
+/*
+ * Store memory information for external users like stand-alone dump tools
+ *
+ * Copyright IBM Corp. 2011
+ * Author(s): Michael Holzheu <holzheu@linux.vnet.ibm.com>
+ */
+
+#include <asm/asm-offsets.h>
+#include <asm/lowcore.h>
+#include <asm/checksum.h>
+
+struct meminfo meminfo_array[MEMINFO_TYPE_LAST];
+
+static inline int meminfo_ind_cnt(struct meminfo *meminfo)
+{
+	return meminfo->size / sizeof(struct meminfo);
+}
+
+/*
+ * Recursively update meminfo checksums
+ */
+static void meminfo_csum_update(struct meminfo *meminfo)
+{
+	struct meminfo *child;
+	int i;
+
+	if (!(meminfo->flags & MEMINFO_FLAG_CSUM_VALID))
+		return;
+	if (meminfo->flags & MEMINFO_FLAG_ELEM_IND) {
+		child = (struct meminfo *) meminfo->addr;
+		for (i = 0; i < meminfo_ind_cnt(meminfo); i++) {
+			if (!(child[i].flags & MEMINFO_FLAG_ELEM_VALID))
+				continue;
+			meminfo_csum_update(&child[i]);
+		}
+	}
+	meminfo->csum = csum_partial_real((void *) meminfo->addr,
+					  meminfo->size, 0);
+}
+
+/*
+ * Verify checksum for meminfo element(s)
+ */
+int meminfo_csum_check(struct meminfo *meminfo, int recursive)
+{
+	struct meminfo *child;
+	u32 csum;
+	int i;
+
+	if (!(meminfo->flags & MEMINFO_FLAG_CSUM_VALID))
+		return 0;
+	csum = csum_partial_real((void *) meminfo->addr, meminfo->size, 0);
+	if (meminfo->csum != csum)
+		return -EINVAL;
+	if (!recursive)
+		return 0;
+	if (meminfo->flags & MEMINFO_FLAG_ELEM_IND) {
+		child = (struct meminfo *) meminfo->addr;
+		for (i = 0; i < meminfo_ind_cnt(meminfo); i++) {
+			if (!(child[i].flags & MEMINFO_FLAG_ELEM_VALID))
+				continue;
+			if (meminfo_csum_check(&child[i], 1))
+				return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Update root meminfo element and corresponding checksum
+ */
+static void meminfo_update_root(void)
+{
+	struct meminfo root;
+	u32 csum;
+
+	copy_from_absolute_zero(&root, &S390_lowcore.meminfo, sizeof(root));
+	meminfo_csum_update(&root);
+	copy_to_absolute_zero(&S390_lowcore.meminfo, &root, sizeof(root));
+	csum = csum_partial(&root, sizeof(root), 0);
+	copy_to_absolute_zero(&S390_lowcore.meminfo_csum, &csum, sizeof(csum));
+}
+
+/*
+ * Add memory info for given type
+ */
+void meminfo_update(enum meminfo_type type, void *buf, unsigned long size,
+		    u32 flags)
+{
+	struct meminfo *meminfo = &meminfo_array[type];
+
+	meminfo->addr = (unsigned long) buf;
+	meminfo->size = size;
+	meminfo->flags = flags;
+	meminfo_update_root();
+}
+
+/*
+ * Init meminfo and setup absolute zero pointer
+ */
+void __init meminfo_init(void)
+{
+	struct meminfo root;
+
+	root.addr = (unsigned long) &meminfo_array;
+	root.size = sizeof(meminfo_array);
+	root.flags = MEMINFO_FLAG_ELEM_VALID | MEMINFO_FLAG_ELEM_IND |
+		MEMINFO_FLAG_CSUM_VALID;
+	copy_to_absolute_zero(&S390_lowcore.meminfo, &root, sizeof(root));
+	meminfo_update_root();
+}
+
+#ifdef CONFIG_CRASH_DUMP
+/*
+ * Get meminfo from old kernel
+ */
+int meminfo_old_get(enum meminfo_type type, struct meminfo *meminfo)
+{
+	struct meminfo root, *meminfo_array_old;
+
+	if (!oldmem_base)
+		return -ENOENT;
+	memcpy_real(&root, (void *) oldmem_base + __LC_MEMINFO, sizeof(root));
+	if (type >= meminfo_ind_cnt(&root))
+		return -ENOENT;
+	meminfo_array_old = (struct meminfo *) (oldmem_base + root.addr);
+	memcpy_real(meminfo, &meminfo_array_old[type], sizeof(*meminfo));
+	if (!(meminfo->flags & MEMINFO_FLAG_ELEM_VALID))
+		return -ENOENT;
+	return 0;
+}
+#endif
--- a/arch/s390/kernel/reipl64.S
+++ b/arch/s390/kernel/reipl64.S
@@ -1,5 +1,5 @@
 /*
- *    Copyright IBM Corp 2000,2009
+ *    Copyright IBM Corp 2000,2011
  *    Author(s): Holger Smolinski <Holger.Smolinski@de.ibm.com>,
  *		 Denis Joseph Barrow,
  */
@@ -7,6 +7,66 @@
 #include <asm/asm-offsets.h>
 
 #
+# do_store_status
+#
+# Prerequisites to run this function:
+# - DAT mode is off
+# - Prefix register is set to zero
+# - Original prefix register is stored in "dump_prefix_page"
+# - Lowcore protection is off
+#
+	.globl	do_store_status
+do_store_status:
+	/* Save register one and load save area base */
+	stg	%r1,__LC_SAVE_AREA_64(%r0)
+	lghi	%r1,SAVE_AREA_BASE
+	/* General purpose registers */
+	stmg	%r0,%r15,__LC_GPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	lg	%r2,__LC_SAVE_AREA_64(%r0)
+	stg	%r2,__LC_GPREGS_SAVE_AREA-SAVE_AREA_BASE+8(%r1)
+	/* Control registers */
+	stctg	%c0,%c15,__LC_CREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	/* Access registers */
+	stam	%a0,%a15,__LC_AREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	/* Floating point registers */
+	std	%f0, 0x00 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f1, 0x08 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f2, 0x10 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f3, 0x18 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f4, 0x20 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f5, 0x28 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f6, 0x30 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f7, 0x38 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f8, 0x40 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f9, 0x48 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f10,0x50 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f11,0x58 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f12,0x60 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f13,0x68 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f14,0x70 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f15,0x78 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	/* Floating point control register */
+	stfpc	__LC_FP_CREG_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	/* CPU timer */
+	stpt	__LC_CPU_TIMER_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	/* Saved prefix register */
+	larl	%r2,dump_prefix_page
+	mvc	__LC_PREFIX_SAVE_AREA-SAVE_AREA_BASE(4,%r1),0(%r2)
+	/* Clock comparator - seven bytes */
+	larl	%r2,.Lclkcmp
+	stckc	0(%r2)
+	mvc	__LC_CLOCK_COMP_SAVE_AREA-SAVE_AREA_BASE + 1(7,%r1),1(%r2)
+	/* Program status word */
+	epsw	%r2,%r3
+	st	%r2,__LC_PSW_SAVE_AREA-SAVE_AREA_BASE + 0(%r1)
+	st	%r3,__LC_PSW_SAVE_AREA-SAVE_AREA_BASE + 4(%r1)
+	larl	%r2,do_store_status
+	stg	%r2,__LC_PSW_SAVE_AREA-SAVE_AREA_BASE + 8(%r1)
+	br	%r14
+.align	8
+.Lclkcmp:	.quad	0x0000000000000000
+
+#
 # do_reipl_asm
 # Parameter: r2 = schid of reipl device
 #
@@ -14,22 +74,7 @@
 		.globl	do_reipl_asm
 do_reipl_asm:	basr	%r13,0
 .Lpg0:		lpswe	.Lnewpsw-.Lpg0(%r13)
-.Lpg1:		# do store status of all registers
-
-		stg	%r1,.Lregsave-.Lpg0(%r13)
-		lghi	%r1,0x1000
-		stmg	%r0,%r15,__LC_GPREGS_SAVE_AREA-0x1000(%r1)
-		lg	%r0,.Lregsave-.Lpg0(%r13)
-		stg	%r0,__LC_GPREGS_SAVE_AREA-0x1000+8(%r1)
-		stctg	%c0,%c15,__LC_CREGS_SAVE_AREA-0x1000(%r1)
-		stam	%a0,%a15,__LC_AREGS_SAVE_AREA-0x1000(%r1)
-		lg	%r10,.Ldump_pfx-.Lpg0(%r13)
-		mvc	__LC_PREFIX_SAVE_AREA-0x1000(4,%r1),0(%r10)
-		stfpc	__LC_FP_CREG_SAVE_AREA-0x1000(%r1)
-		stckc	.Lclkcmp-.Lpg0(%r13)
-		mvc	__LC_CLOCK_COMP_SAVE_AREA-0x1000(7,%r1),.Lclkcmp-.Lpg0(%r13)
-		stpt	__LC_CPU_TIMER_SAVE_AREA-0x1000(%r1)
-		stg	%r13, __LC_PSW_SAVE_AREA-0x1000+8(%r1)
+.Lpg1:		brasl	%r14,do_store_status
 
 		lctlg	%c6,%c6,.Lall-.Lpg0(%r13)
 		lgr	%r1,%r2
@@ -66,10 +111,7 @@ do_reipl_asm:	basr	%r13,0
 		st	%r14,.Ldispsw+12-.Lpg0(%r13)
 		lpswe	.Ldispsw-.Lpg0(%r13)
 		.align	8
-.Lclkcmp:	.quad	0x0000000000000000
 .Lall:		.quad	0x00000000ff000000
-.Ldump_pfx:	.quad	dump_prefix_page
-.Lregsave:	.quad	0x0000000000000000
 		.align	16
 /*
  * These addresses have to be 31 bit otherwise
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -42,6 +42,9 @@
 #include <linux/reboot.h>
 #include <linux/topology.h>
 #include <linux/ftrace.h>
+#include <linux/kexec.h>
+#include <linux/crash_dump.h>
+#include <linux/memory.h>
 
 #include <asm/ipl.h>
 #include <asm/uaccess.h>
@@ -57,6 +60,7 @@
 #include <asm/ebcdic.h>
 #include <asm/compat.h>
 #include <asm/kvm_virtio.h>
+#include <asm/diag.h>
 
 long psw_kernel_bits	= (PSW_BASE_BITS | PSW_MASK_DAT | PSW_ASC_PRIMARY |
 			   PSW_MASK_MCHECK | PSW_DEFAULT_KEY);
@@ -346,7 +350,7 @@ setup_lowcore(void)
 	lc = __alloc_bootmem_low(LC_PAGES * PAGE_SIZE, LC_PAGES * PAGE_SIZE, 0);
 	lc->restart_psw.mask = PSW_BASE_BITS | PSW_DEFAULT_KEY;
 	lc->restart_psw.addr =
-		PSW_ADDR_AMODE | (unsigned long) restart_int_handler;
+		PSW_ADDR_AMODE | (unsigned long) psw_restart_int_handler;
 	if (user_mode != HOME_SPACE_MODE)
 		lc->restart_psw.mask |= PSW_ASC_HOME;
 	lc->external_new_psw.mask = psw_kernel_bits;
@@ -435,6 +439,9 @@ static void __init setup_resources(void)
 	for (i = 0; i < MEMORY_CHUNKS; i++) {
 		if (!memory_chunk[i].size)
 			continue;
+		if (memory_chunk[i].type == CHUNK_OLDMEM ||
+		    memory_chunk[i].type == CHUNK_CRASHK)
+			continue;
 		res = alloc_bootmem_low(sizeof(*res));
 		res->flags = IORESOURCE_BUSY | IORESOURCE_MEM;
 		switch (memory_chunk[i].type) {
@@ -479,6 +486,7 @@ static void __init setup_memory_end(void
 	unsigned long max_mem;
 	int i;
 
+
 #ifdef CONFIG_ZFCPDUMP
 	if (ipl_info.type == IPL_TYPE_FCP_DUMP) {
 		memory_end = ZFCPDUMP_HSA_SIZE;
@@ -529,6 +537,193 @@ static void __init setup_memory_end(void
 		memory_end = memory_size;
 }
 
+void *restart_stack __attribute__((__section__(".data")));
+
+/*
+ * Setup new PSW and allocate stack for PSW restart interrupt
+ */
+static void __init setup_restart_psw(void)
+{
+	psw_t psw;
+
+	restart_stack = __alloc_bootmem(ASYNC_SIZE, ASYNC_SIZE, 0);
+	restart_stack += ASYNC_SIZE;
+
+	/*
+	 * Set up restart PSW for absolute zero lowcore. This is necessary
+	 * if PSW restart is done on an offline CPU that has lowcore zero.
+	 */
+	psw.mask = PSW_BASE_BITS | PSW_DEFAULT_KEY;
+	psw.addr = PSW_ADDR_AMODE | (unsigned long) psw_restart_int_handler;
+	copy_to_absolute_zero(&S390_lowcore.restart_psw, &psw, sizeof(psw));
+}
+
+#ifdef CONFIG_CRASH_DUMP
+
+/*
+ * Find suitable location for crashkernel memory
+ */
+static unsigned long __init find_crash_base(unsigned long crash_size)
+{
+	unsigned long crash_base;
+	struct mem_chunk *chunk;
+	int i;
+
+	if (is_kdump_kernel() && (crash_size == oldmem_size))
+		return oldmem_base;
+
+	for (i = MEMORY_CHUNKS - 1; i >= 0; i--) {
+		chunk = &memory_chunk[i];
+		if (chunk->size == 0)
+			continue;
+		if (chunk->type != CHUNK_READ_WRITE)
+			continue;
+		if (chunk->size < crash_size)
+			continue;
+		crash_base = max(chunk->addr, crash_size);
+		crash_base = max(crash_base, ZFCPDUMP_HSA_SIZE_MAX);
+		crash_base = max(crash_base, (unsigned long) INITRD_START +
+				 INITRD_SIZE);
+		crash_base = PAGE_ALIGN(crash_base);
+		if (crash_base >= chunk->addr + chunk->size)
+			continue;
+		if (chunk->addr + chunk->size - crash_base < crash_size)
+			continue;
+		crash_base = chunk->addr + chunk->size - crash_size;
+		return crash_base;
+	}
+	return 0;
+}
+
+/*
+ * Check if crash_base and crash_size are valid
+ */
+static int __init verify_crash_base(unsigned long crash_base,
+				    unsigned long crash_size)
+{
+	struct mem_chunk *chunk;
+	int i;
+
+	/*
+	 * Because we do the swap to zero, we must have at least 'crash_size'
+	 * bytes of free space before crash_base
+	 */
+	if (crash_size > crash_base)
+		return -EINVAL;
+
+	/* First memory chunk must be at least crash_size */
+	if (memory_chunk[0].size < crash_size)
+		return -EINVAL;
+
+	/* Check if we fit into the respective memory chunk */
+	for (i = 0; i < MEMORY_CHUNKS; i++) {
+		chunk = &memory_chunk[i];
+		if (chunk->size == 0)
+			continue;
+		if (crash_base < chunk->addr)
+			continue;
+		if (crash_base >= chunk->addr + chunk->size)
+			continue;
+		/* we have found the memory chunk */
+		if (crash_base + crash_size > chunk->addr + chunk->size)
+			return -EINVAL;
+		return 0;
+	}
+	return -EINVAL;
+}
+
+/*
+ * Reserve kdump memory by creating a memory hole in the mem_chunk array
+ */
+static void __init reserve_kdump_bootmem(unsigned long addr, unsigned long size,
+					 int type)
+{
+	create_mem_hole(memory_chunk, addr, size, type);
+}
+
+/*
+ * When kdump is enabled, we have to ensure that no memory from
+ * the area [0 - crashkernel memory size] is set offline
+ */
+static int kdump_mem_notifier(struct notifier_block *nb,
+			      unsigned long action, void *data)
+{
+	struct memory_notify *arg = data;
+
+	if (arg->start_pfn >= PFN_DOWN(crashk_res.end - crashk_res.start + 1))
+		return NOTIFY_OK;
+	return NOTIFY_BAD;
+}
+
+static struct notifier_block kdump_mem_nb = {
+	.notifier_call = kdump_mem_notifier,
+};
+#endif
+
+/*
+ * Make sure that oldmem, where the dump is stored, is protected
+ */
+static void reserve_oldmem(void)
+{
+#ifdef CONFIG_CRASH_DUMP
+	if (!is_kdump_kernel())
+		return;
+
+	reserve_kdump_bootmem(oldmem_base, oldmem_size, CHUNK_OLDMEM);
+	reserve_kdump_bootmem(oldmem_size, memory_end - oldmem_size,
+			      CHUNK_OLDMEM);
+	if (oldmem_base + oldmem_size == real_memory_size)
+		saved_max_pfn = PFN_DOWN(oldmem_base) - 1;
+	else
+		saved_max_pfn = PFN_DOWN(real_memory_size) - 1;
+#endif
+}
+
+/*
+ * Reserve memory for kdump kernel to be loaded with kexec
+ */
+static void __init reserve_crashkernel(void)
+{
+#ifdef CONFIG_CRASH_DUMP
+	unsigned long long crash_base, crash_size;
+	int rc;
+
+	rc = parse_crashkernel(boot_command_line, memory_end, &crash_size,
+			       &crash_base);
+	if (rc || crash_size == 0)
+		return;
+	if (register_memory_notifier(&kdump_mem_nb))
+		return;
+	if (!crash_base)
+		crash_base = find_crash_base(crash_size);
+	if (!crash_base) {
+		pr_info("crashkernel reservation failed: %s\n",
+			"No suitable area found");
+		unregister_memory_notifier(&kdump_mem_nb);
+		return;
+	}
+	if (verify_crash_base(crash_base, crash_size)) {
+		pr_info("crashkernel reservation failed: %s\n",
+			"Invalid memory range specified");
+		unregister_memory_notifier(&kdump_mem_nb);
+		return;
+	}
+	if (!is_kdump_kernel() && MACHINE_IS_VM)
+		diag10_range(PFN_DOWN(crash_base), PFN_DOWN(crash_size));
+	crashk_res.start = crash_base;
+	crashk_res.end = crash_base + crash_size - 1;
+	insert_resource(&iomem_resource, &crashk_res);
+	meminfo_update(MEMINFO_TYPE_KDUMP_MEM, (void *) crash_base,
+		       crash_size, MEMINFO_FLAG_ELEM_VALID);
+	reserve_kdump_bootmem(crashk_res.start,
+			      crashk_res.end - crashk_res.start + 1,
+			      CHUNK_CRASHK);
+	pr_info("Reserving %lluMB of memory at %lluMB "
+		"for crashkernel (System RAM: %luMB)\n",
+		crash_size >> 20, crash_base >> 20, memory_end >> 20);
+#endif
+}
+
 static void __init
 setup_memory(void)
 {
@@ -559,6 +754,14 @@ setup_memory(void)
 		if (PFN_PHYS(start_pfn) + bmap_size > INITRD_START) {
 			start = PFN_PHYS(start_pfn) + bmap_size + PAGE_SIZE;
 
+#ifdef CONFIG_CRASH_DUMP
+			if (is_kdump_kernel()) {
+				/* Move initrd behind kdump oldmem */
+				if (start + INITRD_SIZE > oldmem_base &&
+				    start < oldmem_base + oldmem_size)
+					start = oldmem_base + oldmem_size;
+			}
+#endif
 			if (start + INITRD_SIZE > memory_end) {
 				pr_err("initrd extends beyond end of "
 				       "memory (0x%08lx > 0x%08lx) "
@@ -787,11 +990,16 @@ setup_arch(char **cmdline_p)
 
 	parse_early_param();
 
+	meminfo_update(MEMINFO_TYPE_VMCOREINFO, &vmcoreinfo_note,
+		       sizeof(vmcoreinfo_note), MEMINFO_FLAG_ELEM_VALID);
 	setup_ipl();
 	setup_memory_end();
 	setup_addressing_mode();
+	reserve_oldmem();
+	reserve_crashkernel();
 	setup_memory();
 	setup_resources();
+	setup_restart_psw();
 	setup_lowcore();
 
         cpu_init();
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -38,6 +38,7 @@
 #include <linux/timex.h>
 #include <linux/bootmem.h>
 #include <linux/slab.h>
+#include <linux/crash_dump.h>
 #include <asm/asm-offsets.h>
 #include <asm/ipl.h>
 #include <asm/setup.h>
@@ -281,11 +282,11 @@ void smp_ctl_clear_bit(int cr, int bit)
 }
 EXPORT_SYMBOL(smp_ctl_clear_bit);
 
-#ifdef CONFIG_ZFCPDUMP
+#if defined(CONFIG_ZFCPDUMP) || defined(CONFIG_CRASH_DUMP)
 
 static void __init smp_get_save_area(unsigned int cpu, unsigned int phy_cpu)
 {
-	if (ipl_info.type != IPL_TYPE_FCP_DUMP)
+	if (ipl_info.type != IPL_TYPE_FCP_DUMP && !is_kdump_kernel())
 		return;
 	if (cpu >= NR_CPUS) {
 		pr_warning("CPU %i exceeds the maximum %i and is excluded from "
@@ -403,6 +404,19 @@ static void __init smp_detect_cpus(void)
 	info = kmalloc(sizeof(*info), GFP_KERNEL);
 	if (!info)
 		panic("smp_detect_cpus failed to allocate memory\n");
+
+#ifdef CONFIG_CRASH_DUMP
+	if (is_kdump_kernel()) {
+		struct save_area *save_area;
+
+		save_area = kmalloc(sizeof(*save_area), GFP_KERNEL);
+		if (!save_area)
+			panic("could not allocate memory for save area\n");
+		crash_read_from_oldmem(save_area, sizeof(*save_area),
+				       SAVE_AREA_BASE, 0);
+		zfcpdump_save_areas[0] = save_area;
+	}
+#endif
 	/* Use sigp detection algorithm if sclp doesn't work. */
 	if (sclp_get_cpu_info(info)) {
 		smp_use_sigp_detection = 1;
@@ -470,6 +484,11 @@ int __cpuinit start_secondary(void *cpuv
 	ipi_call_unlock();
 	/* Switch on interrupts */
 	local_irq_enable();
+	__ctl_clear_bit(0, 28); /* Disable lowcore protection */
+	S390_lowcore.restart_psw.mask = PSW_BASE_BITS | PSW_DEFAULT_KEY;
+	S390_lowcore.restart_psw.addr =
+		PSW_ADDR_AMODE | (unsigned long) psw_restart_int_handler;
+	__ctl_set_bit(0, 28); /* Enable lowcore protection */
 	/* cpu_idle will call schedule for us */
 	cpu_idle();
 	return 0;
@@ -507,6 +526,9 @@ static int __cpuinit smp_alloc_lowcore(i
 	memset((char *)lowcore + 512, 0, sizeof(*lowcore) - 512);
 	lowcore->async_stack = async_stack + ASYNC_SIZE;
 	lowcore->panic_stack = panic_stack + PAGE_SIZE;
+	lowcore->restart_psw.mask = PSW_BASE_BITS | PSW_DEFAULT_KEY;
+	lowcore->restart_psw.addr =
+		PSW_ADDR_AMODE | (unsigned long) restart_int_handler;
 
 #ifndef CONFIG_64BIT
 	if (MACHINE_HAS_IEEE) {
--- a/arch/s390/mm/maccess.c
+++ b/arch/s390/mm/maccess.c
@@ -11,6 +11,7 @@
 #include <linux/kernel.h>
 #include <linux/types.h>
 #include <linux/errno.h>
+#include <linux/gfp.h>
 #include <asm/system.h>
 
 /*
@@ -60,6 +61,9 @@ long probe_kernel_write(void *dst, const
 	return copied < 0 ? -EFAULT : 0;
 }
 
+/*
+ * Copy memory in real mode (kernel to kernel)
+ */
 int memcpy_real(void *dest, void *src, size_t count)
 {
 	register unsigned long _dest asm("2") = (unsigned long) dest;
@@ -85,3 +89,82 @@ int memcpy_real(void *dest, void *src, s
 	arch_local_irq_restore(flags);
 	return rc;
 }
+
+/*
+ * Copy memory from kernel (real) to user (virtual)
+ */
+int copy_to_user_real(void __user *dest, void *src, size_t count)
+{
+	int offs = 0, size, rc;
+	char *buf;
+
+	buf = (char *) __get_free_page(GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+	rc = -EFAULT;
+	while (offs < count) {
+		size = min(PAGE_SIZE, count - offs);
+		if (memcpy_real(buf, src + offs, size))
+			goto out;
+		if (copy_to_user(dest + offs, buf, size))
+			goto out;
+		offs += size;
+	}
+	rc = 0;
+out:
+	free_page((unsigned long) buf);
+	return rc;
+}
+
+/*
+ * Copy memory from user (virtual) to kernel (real)
+ */
+int copy_from_user_real(void *dest, void __user *src, size_t count)
+{
+	int offs = 0, size, rc;
+	char *buf;
+
+	buf = (char *) __get_free_page(GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+	rc = -EFAULT;
+	while (offs < count) {
+		size = min(PAGE_SIZE, count - offs);
+		if (copy_from_user(buf, src + offs, size))
+			goto out;
+		if (memcpy_real(dest + offs, buf, size))
+			goto out;
+		offs += size;
+	}
+	rc = 0;
+out:
+	free_page((unsigned long) buf);
+	return rc;
+}
+
+/*
+ * Copy memory to absolute zero
+ */
+void copy_to_absolute_zero(void *dest, void *src, size_t count)
+{
+	unsigned long cr0;
+
+	BUG_ON((unsigned long) dest + count >= sizeof(struct _lowcore));
+	preempt_disable();
+	__ctl_store(cr0, 0, 0);
+	__ctl_clear_bit(0, 28); /* disable lowcore protection */
+	memcpy_real(dest + store_prefix(), src, count);
+	__ctl_load(cr0, 0, 0);
+	preempt_enable();
+}
+
+/*
+ * Copy memory from absolute zero
+ */
+void copy_from_absolute_zero(void *dest, void *src, size_t count)
+{
+	BUG_ON((unsigned long) src + count >= sizeof(struct _lowcore));
+	preempt_disable();
+	memcpy_real(dest, src + store_prefix(), count);
+	preempt_enable();
+}
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -335,6 +335,9 @@ void __init vmem_map_init(void)
 	ro_start = ((unsigned long)&_stext) & PAGE_MASK;
 	ro_end = PFN_ALIGN((unsigned long)&_eshared);
 	for (i = 0; i < MEMORY_CHUNKS && memory_chunk[i].size > 0; i++) {
+		if (memory_chunk[i].type == CHUNK_CRASHK ||
+		    memory_chunk[i].type == CHUNK_OLDMEM)
+			continue;
 		start = memory_chunk[i].addr;
 		end = memory_chunk[i].addr + memory_chunk[i].size;
 		if (start >= ro_end || end <= ro_start)
--- a/drivers/s390/char/zcore.c
+++ b/drivers/s390/char/zcore.c
@@ -142,22 +142,6 @@ static int memcpy_hsa_kernel(void *dest,
 	return memcpy_hsa(dest, src, count, TO_KERNEL);
 }
 
-static int memcpy_real_user(void __user *dest, unsigned long src, size_t count)
-{
-	static char buf[4096];
-	int offs = 0, size;
-
-	while (offs < count) {
-		size = min(sizeof(buf), count - offs);
-		if (memcpy_real(buf, (void *) src + offs, size))
-			return -EFAULT;
-		if (copy_to_user(dest + offs, buf, size))
-			return -EFAULT;
-		offs += size;
-	}
-	return 0;
-}
-
 static int __init init_cpu_info(enum arch_id arch)
 {
 	struct save_area *sa;
@@ -346,8 +330,8 @@ static ssize_t zcore_read(struct file *f
 
 	/* Copy from real mem */
 	size = count - mem_offs - hdr_count;
-	rc = memcpy_real_user(buf + hdr_count + mem_offs, mem_start + mem_offs,
-			      size);
+	rc = copy_to_user_real(buf + hdr_count + mem_offs,
+			       (void *) mem_start + mem_offs, size);
 	if (rc)
 		goto fail;
 

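A note on copy_to_user_real() and copy_from_user_real() above: both move
data through a single bounce page, so a request larger than PAGE_SIZE is
handled in several rounds. The following is a minimal user-space sketch of
that loop, not the kernel code itself. bounce_copy() and BOUNCE_SIZE are
hypothetical names, and plain memcpy() stands in for memcpy_real() and
copy_to_user().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BOUNCE_SIZE 4096	/* assumption: stands in for PAGE_SIZE */

/*
 * Hypothetical stand-in for the two-stage copy: source -> bounce
 * buffer -> destination, at most BOUNCE_SIZE bytes per round.
 * Returns the number of rounds that were needed.
 */
static size_t bounce_copy(void *dest, const void *src, size_t count)
{
	char buf[BOUNCE_SIZE];
	size_t offs = 0, size, rounds = 0;

	assert(dest && src);
	while (offs < count) {
		/* Never move more than one bounce buffer per round */
		size = count - offs;
		if (size > BOUNCE_SIZE)
			size = BOUNCE_SIZE;
		/* First stage: memcpy_real() in the patch */
		memcpy(buf, (const char *) src + offs, size);
		/* Second stage: copy_to_user() in the patch */
		memcpy((char *) dest + offs, buf, size);
		offs += size;
		rounds++;
	}
	return rounds;
}
```

Copying 10000 bytes with a 4096 byte buffer takes three rounds
(4096 + 4096 + 1808 bytes).
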


* [patch 8/9] s390: kdump backend code
@ 2011-07-04 17:09   ` Michael Holzheu
From: Michael Holzheu @ 2011-07-04 17:09 UTC (permalink / raw)
  To: ebiederm, vgoyal, hbabu, mahesh
  Cc: oomichi, linux-s390, kexec, heiko.carstens, linux-kernel, horms,
	schwidefsky

[-- Attachment #1: 08-s390-kdump-arch.patch --]
[-- Type: text/plain, Size: 69506 bytes --]

From: Michael Holzheu <holzheu@linux.vnet.ibm.com>

This patch provides the architecture-specific part of the s390 kdump
support. This includes the following changes:
* S390 backend code for kdump/kexec framework
* New restart shutdown trigger and kdump action
* New meminfo interface to allow external kdump triggers

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 arch/s390/Kconfig                 |   10 
 arch/s390/include/asm/checksum.h  |   18 +
 arch/s390/include/asm/ipl.h       |    4 
 arch/s390/include/asm/kexec.h     |    3 
 arch/s390/include/asm/lowcore.h   |   62 +++++
 arch/s390/include/asm/sclp.h      |    1 
 arch/s390/include/asm/setup.h     |    5 
 arch/s390/include/asm/system.h    |    4 
 arch/s390/kernel/Makefile         |    3 
 arch/s390/kernel/asm-offsets.c    |    7 
 arch/s390/kernel/base.S           |   37 +++
 arch/s390/kernel/crash_dump.c     |   76 ++++++
 arch/s390/kernel/crash_dump_elf.c |  434 ++++++++++++++++++++++++++++++++++++++
 arch/s390/kernel/early.c          |   12 +
 arch/s390/kernel/entry.S          |   28 ++
 arch/s390/kernel/entry64.S        |   21 +
 arch/s390/kernel/head.S           |   14 +
 arch/s390/kernel/head_kdump.S     |  133 +++++++++++
 arch/s390/kernel/ipl.c            |  201 ++++++++++++++---
 arch/s390/kernel/machine_kexec.c  |  164 ++++++++++++++
 arch/s390/kernel/mem_detect.c     |   70 ++++++
 arch/s390/kernel/meminfo.c        |  132 +++++++++++
 arch/s390/kernel/reipl64.S        |   82 +++++--
 arch/s390/kernel/setup.c          |  210 ++++++++++++++++++
 arch/s390/kernel/smp.c            |   26 ++
 arch/s390/mm/maccess.c            |   83 +++++++
 arch/s390/mm/vmem.c               |    3 
 drivers/s390/char/zcore.c         |   20 -
 28 files changed, 1784 insertions(+), 79 deletions(-)
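
The swap that copy_oldmem_page() performs can be sketched as a pure address
translation. This is a minimal user-space illustration under stated
assumptions: oldmem_translate() is a hypothetical helper, not part of the
patch, and it treats the first byte of the reserved area as belonging to
the swapped range.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: translate an address as seen by the crashed
 * (production) kernel into the address where that data lives while
 * the kdump kernel is running. base and size describe the reserved
 * crashkernel area.
 */
static uint64_t oldmem_translate(uint64_t addr, uint64_t base, uint64_t size)
{
	/* The swap only works if the two areas do not overlap */
	assert(size <= base);

	if (addr < size)			/* low area was moved up */
		return addr + base;
	if (addr >= base && addr < base + size)	/* reserved area moved down */
		return addr - base;
	return addr;				/* everything else is untouched */
}
```

With base = 0x10000000 and size = 0x08000000, address 0x1000 is read from
0x10001000, address 0x10000000 is read from 0, and addresses outside both
areas are read unchanged.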

--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -567,6 +567,16 @@ config KEXEC
 	  current kernel, and to start another kernel.  It is like a reboot
 	  but is independent of hardware/microcode support.
 
+config CRASH_DUMP
+	bool "kernel crash dumps"
+	depends on 64BIT
+	help
+	  Generate crash dump after being started by kexec.
+	  Crash dump kernels are loaded in the main kernel with kexec-tools
+	  into a specially reserved region and then later executed after
+	  a crash by kdump/kexec.
+	  For more details see Documentation/kdump/kdump.txt
+
 config ZFCPDUMP
 	def_bool n
 	prompt "zfcpdump support"
--- a/arch/s390/include/asm/checksum.h
+++ b/arch/s390/include/asm/checksum.h
@@ -41,6 +41,24 @@ csum_partial(const void *buff, int len,
 }
 
 /*
+ * The same as csum_partial(), but operates on real memory
+ */
+static inline __wsum csum_partial_real(const void *buf, int len, __wsum sum)
+{
+	register unsigned long reg2 asm("2") = (unsigned long) buf;
+	register unsigned long reg3 asm("3") = (unsigned long) len;
+	unsigned long flags;
+
+	flags = __arch_local_irq_stnsm(0xf8UL);
+	asm volatile(
+		"0:	cksm	%0,%1\n"
+		"	jo	0b\n"
+		: "+d" (sum), "+d" (reg2), "+d" (reg3) : : "cc", "memory");
+	arch_local_irq_restore(flags);
+	return sum;
+}
+
+/*
  * the same as csum_partial_copy, but copies from user space.
  *
  * here even more important to align src and dst on a 32-bit (or even
--- a/arch/s390/include/asm/ipl.h
+++ b/arch/s390/include/asm/ipl.h
@@ -167,5 +167,9 @@ enum diag308_rc {
 };
 
 extern int diag308(unsigned long subcode, void *addr);
+void do_reset_diag308(void);
+void do_store_status(void);
+ssize_t crash_read_from_oldmem(void *buf, size_t count, u64 ppos, int userbuf);
+void machine_kdump(void);
 
 #endif /* _ASM_S390_IPL_H */
--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -30,6 +30,9 @@
 /* Not more than 2GB */
 #define KEXEC_CONTROL_MEMORY_LIMIT (1UL<<31)
 
+/* Maximum address we can use for the crash control pages */
+#define KEXEC_CRASH_CONTROL_MEMORY_LIMIT (-1UL)
+
 /* Allocate one page for the pdp and the second for the code */
 #define KEXEC_CONTROL_PAGE_SIZE 4096
 
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -18,6 +18,45 @@ void system_call(void);
 void pgm_check_handler(void);
 void mcck_int_handler(void);
 void io_int_handler(void);
+void psw_restart_int_handler(void);
+
+/*
+ * Meminfo types: The defined numbers are ABI and must not be changed
+ */
+enum meminfo_type {
+	MEMINFO_TYPE_IPIB	= 0,
+	MEMINFO_TYPE_VMCOREINFO	= 1,
+	MEMINFO_TYPE_KDUMP_MEM	= 2,
+	MEMINFO_TYPE_KDUMP_SEGM	= 3,
+	MEMINFO_TYPE_LAST	= 4,
+};
+
+/*
+ * Meminfo flags: The flags are ABI and must not be changed
+ */
+#define MEMINFO_FLAG_ELEM_VALID	0x00000001U
+#define MEMINFO_FLAG_ELEM_IND	0x00000002U
+#define MEMINFO_FLAG_CSUM_VALID	0x00000004U
+
+struct meminfo {
+	unsigned long	addr;
+	unsigned long	size;
+	u32		csum;
+	u32		flags;
+} __packed;
+
+extern struct meminfo meminfo_array[MEMINFO_TYPE_LAST];
+
+void meminfo_init(void);
+int meminfo_csum_check(struct meminfo *meminfo, int recursive);
+void meminfo_update(enum meminfo_type type, void *buf, unsigned long size,
+		    u32 flags);
+
+#ifdef CONFIG_CRASH_DUMP
+int meminfo_old_get(enum meminfo_type type, struct meminfo *meminfo);
+extern unsigned long oldmem_base;
+extern unsigned long oldmem_size;
+#endif
 
 #ifdef CONFIG_32BIT
 
@@ -150,7 +189,14 @@ struct _lowcore {
 	 */
 	__u32	ipib;				/* 0x0e00 */
 	__u32	ipib_checksum;			/* 0x0e04 */
-	__u8	pad_0x0e08[0x0f00-0x0e08];	/* 0x0e08 */
+
+	/* 64 bit save area */
+	__u64	save_area_64;			/* 0x0e08 */
+
+	/* meminfo root */
+	struct meminfo	meminfo;		/* 0x0e10 */
+	__u32	meminfo_csum;			/* 0x0e20 */
+	__u8	pad_0x0e24[0x0f00-0x0e24];	/* 0x0e24 */
 
 	/* Extended facility list */
 	__u64	stfle_fac_list[32];		/* 0x0f00 */
@@ -286,7 +332,19 @@ struct _lowcore {
 	 */
 	__u64	ipib;				/* 0x0e00 */
 	__u32	ipib_checksum;			/* 0x0e08 */
-	__u8	pad_0x0e0c[0x0f00-0x0e0c];	/* 0x0e0c */
+
+	/* 64 bit save area */
+	__u64	save_area_64;			/* 0x0e0c */
+
+	/* meminfo root */
+	struct meminfo meminfo;			/* 0x0e14 */
+	__u32	meminfo_csum;			/* 0x0e2c */
+
+	/* oldmem base */
+	__u64	oldmem_base;			/* 0x0e30 */
+	/* oldmem size */
+	__u64	oldmem_size;			/* 0x0e38 */
+	__u8	pad_0x0e40[0x0f00-0x0e40];	/* 0x0e40 */
 
 	/* Extended facility list */
 	__u64	stfle_fac_list[32];		/* 0x0f00 */
--- a/arch/s390/include/asm/sclp.h
+++ b/arch/s390/include/asm/sclp.h
@@ -55,4 +55,5 @@ int sclp_chp_deconfigure(struct chp_id c
 int sclp_chp_read_info(struct sclp_chp_info *info);
 void sclp_get_ipl_info(struct sclp_ipl_info *info);
 
+void _sclp_print_early(const char *);
 #endif /* _ASM_S390_SCLP_H */
--- a/arch/s390/include/asm/setup.h
+++ b/arch/s390/include/asm/setup.h
@@ -35,6 +35,8 @@
 
 #define CHUNK_READ_WRITE 0
 #define CHUNK_READ_ONLY  1
+#define CHUNK_OLDMEM     4
+#define CHUNK_CRASHK     5
 
 struct mem_chunk {
 	unsigned long addr;
@@ -48,6 +50,8 @@ extern int memory_end_set;
 extern unsigned long memory_end;
 
 void detect_memory_layout(struct mem_chunk chunk[]);
+void create_mem_hole(struct mem_chunk memory_chunk[], unsigned long addr,
+		     unsigned long size, int type);
 
 #define PRIMARY_SPACE_MODE	0
 #define ACCESS_REGISTER_MODE	1
@@ -106,6 +110,7 @@ extern unsigned int user_mode;
 #endif /* __s390x__ */
 
 #define ZFCPDUMP_HSA_SIZE	(32UL<<20)
+#define ZFCPDUMP_HSA_SIZE_MAX	(64UL<<20)
 
 /*
  * Console mode. Override with conmode=
--- a/arch/s390/include/asm/system.h
+++ b/arch/s390/include/asm/system.h
@@ -113,6 +113,10 @@ extern void pfault_fini(void);
 
 extern void cmma_init(void);
 extern int memcpy_real(void *, void *, size_t);
+extern int copy_to_user_real(void __user *dest, void *src, size_t count);
+extern int copy_from_user_real(void *dest, void __user *src, size_t count);
+extern void copy_to_absolute_zero(void *dest, void *src, size_t count);
+extern void copy_from_absolute_zero(void *dest, void *src, size_t count);
 
 #define finish_arch_switch(prev) do {					     \
 	set_fs(current->thread.mm_segment);				     \
--- a/arch/s390/kernel/Makefile
+++ b/arch/s390/kernel/Makefile
@@ -23,7 +23,7 @@ CFLAGS_sysinfo.o += -Iinclude/math-emu -
 obj-y	:=  bitmap.o traps.o time.o process.o base.o early.o setup.o vtime.o \
 	    processor.o sys_s390.o ptrace.o signal.o cpcmd.o ebcdic.o nmi.o \
 	    debug.o irq.o ipl.o dis.o diag.o mem_detect.o sclp.o vdso.o \
-	    sysinfo.o jump_label.o
+	    sysinfo.o jump_label.o meminfo.o
 
 obj-y	+= $(if $(CONFIG_64BIT),entry64.o,entry.o)
 obj-y	+= $(if $(CONFIG_64BIT),reipl64.o,reipl.o)
@@ -48,6 +48,7 @@ obj-$(CONFIG_FUNCTION_TRACER)	+= $(if $(
 obj-$(CONFIG_DYNAMIC_FTRACE)	+= ftrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
 obj-$(CONFIG_FTRACE_SYSCALLS)  += ftrace.o
+obj-$(CONFIG_CRASH_DUMP)	+= crash_dump.o crash_dump_elf.o
 
 # Kexec part
 S390_KEXEC_OBJS := machine_kexec.o crash.o
--- a/arch/s390/kernel/asm-offsets.c
+++ b/arch/s390/kernel/asm-offsets.c
@@ -142,6 +142,11 @@ int main(void)
 	DEFINE(__LC_FPREGS_SAVE_AREA, offsetof(struct _lowcore, floating_pt_save_area));
 	DEFINE(__LC_GPREGS_SAVE_AREA, offsetof(struct _lowcore, gpregs_save_area));
 	DEFINE(__LC_CREGS_SAVE_AREA, offsetof(struct _lowcore, cregs_save_area));
+	DEFINE(__LC_SAVE_AREA_64, offsetof(struct _lowcore, save_area_64));
+	DEFINE(__LC_MEMINFO, offsetof(struct _lowcore, meminfo));
+	DEFINE(__MI_TYPE_KDUMP_MEM, (MEMINFO_TYPE_KDUMP_MEM * sizeof(struct meminfo)));
+	DEFINE(__MI_ADDR, offsetof(struct meminfo, addr));
+	DEFINE(__MI_SIZE, offsetof(struct meminfo, size));
 #ifdef CONFIG_32BIT
 	DEFINE(SAVE_AREA_BASE, offsetof(struct _lowcore, extended_save_area_addr));
 #else /* CONFIG_32BIT */
@@ -153,6 +158,8 @@ int main(void)
 	DEFINE(__LC_VDSO_PER_CPU, offsetof(struct _lowcore, vdso_per_cpu_data));
 	DEFINE(__LC_SIE_HOOK, offsetof(struct _lowcore, sie_hook));
 	DEFINE(__LC_CMF_HPP, offsetof(struct _lowcore, cmf_hpp));
+	DEFINE(__LC_OLDMEM_BASE, offsetof(struct _lowcore, oldmem_base));
+	DEFINE(__LC_OLDMEM_SIZE, offsetof(struct _lowcore, oldmem_size));
 #endif /* CONFIG_32BIT */
 	return 0;
 }
--- a/arch/s390/kernel/base.S
+++ b/arch/s390/kernel/base.S
@@ -75,6 +75,43 @@ s390_base_pgm_handler_fn:
 	.quad	0
 	.previous
 
+#
+# Calls diag 308 subcode 1 and continues execution
+#
+# The following conditions must be ensured before calling this function:
+# * Prefix register = 0
+# * Lowcore protection is disabled
+#
+	.globl	do_reset_diag308
+do_reset_diag308:
+	larl	%r4,.Lctlregs		# Save control registers
+	stctg	%c0,%c15,0(%r4)
+	larl	%r4,.Lrestart_psw	# Setup restart PSW at absolute 0
+	lghi	%r3,0
+	lg	%r4,0(%r4)		# Save PSW
+	sturg	%r4,%r3			# Use sturg, because of large pages
+	lghi	%r1,1
+	diag	%r1,%r1,0x308
+.Lrestart_part2:
+	lhi	%r0,0			# Load r0 with zero
+	lhi	%r1,2			# Use mode 2 = ESAME (dump)
+	sigp	%r1,%r0,0x12		# Switch to ESAME mode
+	sam64				# Switch to 64 bit addressing mode
+	larl	%r4,.Lctlregs		# Restore control registers
+	lctlg	%c0,%c15,0(%r4)
+	br	%r14
+.align 16
+.Lrestart_psw:
+	.long	0x00080000,0x80000000 + .Lrestart_part2
+
+	.section .bss
+.align 8
+.Lctlregs:
+	.rept	16
+	.quad	0
+	.endr
+	.previous
+
 #else /* CONFIG_64BIT */
 
 	.globl	s390_base_mcck_handler
--- /dev/null
+++ b/arch/s390/kernel/crash_dump.c
@@ -0,0 +1,76 @@
+/*
+ * S390 kdump implementation
+ *
+ * Copyright IBM Corp. 2011
+ * Author(s): Michael Holzheu <holzheu@linux.vnet.ibm.com>
+ */
+
+#include <linux/crash_dump.h>
+#include <asm/lowcore.h>
+
+/*
+ * Copy one page from "oldmem"
+ *
+ * For the kdump reserved memory this function performs a swap operation:
+ *  - [kdump_base - kdump_base + kdump_size] is mapped to [0 - kdump_size].
+ *  - [0 - kdump_size] is mapped to [kdump_base - kdump_base + kdump_size]
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
+			 size_t csize, unsigned long offset, int userbuf)
+{
+	unsigned long src, kdump_base, kdump_size;
+	int rc;
+
+	if (!csize)
+		return 0;
+
+	kdump_base = oldmem_base;
+	kdump_size = oldmem_size;
+
+	src = (pfn << PAGE_SHIFT) + offset;
+	if (src < kdump_size)
+		src += kdump_base;
+	else if (src >= kdump_base &&
+		 src < kdump_base + kdump_size)
+		src -= kdump_base;
+	if (userbuf)
+		rc = copy_to_user_real((void __user *) buf, (void *) src,
+				       csize);
+	else
+		rc = memcpy_real(buf, (void *) src, csize);
+	return rc < 0 ? rc : csize;
+}
+
+/*
+ * Read memory from oldmem
+ */
+ssize_t crash_read_from_oldmem(void *buf, size_t count, u64 ppos, int userbuf)
+{
+	unsigned long pfn, offset;
+	ssize_t read = 0, tmp;
+	size_t nr_bytes;
+
+	if (!count)
+		return 0;
+
+	offset = (unsigned long)(ppos % PAGE_SIZE);
+	pfn = (unsigned long)(ppos / PAGE_SIZE);
+
+	do {
+		if (count > (PAGE_SIZE - offset))
+			nr_bytes = PAGE_SIZE - offset;
+		else
+			nr_bytes = count;
+
+		tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);
+		if (tmp < 0)
+			return tmp;
+		count -= nr_bytes;
+		buf += nr_bytes;
+		read += nr_bytes;
+		++pfn;
+		offset = 0;
+	} while (count);
+
+	return read;
+}
--- /dev/null
+++ b/arch/s390/kernel/crash_dump_elf.c
@@ -0,0 +1,434 @@
+/*
+ * S390 kdump implementation - Create ELF core header
+ *
+ * Copyright IBM Corp. 2011
+ *
+ * Author(s): Michael Holzheu <holzheu@linux.vnet.ibm.com>
+ */
+
+#define KMSG_COMPONENT "kdump"
+#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/gfp.h>
+#include <linux/slab.h>
+#include <linux/crash_dump.h>
+#include <linux/bootmem.h>
+#include <linux/elf.h>
+#include <asm/ipl.h>
+
+#define HDR_PER_CPU_SIZE	0x300
+#define HDR_PER_MEMC_SIZE	0x100
+#define HDR_BASE_SIZE		0x2000
+
+#define ROUNDUP(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
+#define PTR_ADD(x, y) (((char *) (x)) + ((unsigned long) (y)))
+#define PTR_SUB(x, y) (((char *) (x)) - ((unsigned long) (y)))
+#define PTR_DIFF(x, y) ((unsigned long)(((char *) (x)) - ((unsigned long) (y))))
+
+#ifndef ELFOSABI_SYSV
+#define ELFOSABI_SYSV 0
+#endif
+
+#ifndef EI_ABIVERSION
+#define EI_ABIVERSION 8
+#endif
+
+#ifndef NT_FPREGSET
+#define NT_FPREGSET 2
+#endif
+
+/*
+ * prstatus ELF Note
+ */
+struct nt_prstatus_64 {
+	u8	pad1[32];
+	u32	pr_pid;
+	u8	pad2[76];
+	u64	psw[2];
+	u64	gprs[16];
+	u32	acrs[16];
+	u64	orig_gpr2;
+	u32	pr_fpvalid;
+	u8	pad3[4];
+} __packed;
+
+/*
+ * fpregset ELF Note
+ */
+struct nt_fpregset_64 {
+	u32	fpc;
+	u32	pad;
+	u64	fprs[16];
+} __packed;
+
+/*
+ * prpsinfo ELF Note
+ */
+struct nt_prpsinfo_64 {
+	char	pr_state;
+	char	pr_sname;
+	char	pr_zomb;
+	char	pr_nice;
+	u64	pr_flag;
+	u32	pr_uid;
+	u32	pr_gid;
+	u32	pr_pid, pr_ppid, pr_pgrp, pr_sid;
+	char	pr_fname[16];
+	char	pr_psargs[80];
+};
+
+/*
+ * File local static data
+ */
+static struct {
+	void	*hdr;
+	u32	hdr_size;
+	int	mem_chunk_cnt;
+} l;
+
+/*
+ * Create all required memory holes
+ */
+static void create_mem_holes(struct mem_chunk chunk_array[])
+{
+	create_mem_hole(chunk_array, oldmem_base, oldmem_size, CHUNK_CRASHK);
+}
+
+/*
+ * Alloc memory and panic in case of alloc failure
+ */
+static void *zg_alloc(int len)
+{
+	void *rc;
+
+	rc = kzalloc(len, GFP_KERNEL);
+	if (!rc)
+		panic("crash_dump_elf: alloc failed");
+	return rc;
+}
+
+/*
+ * Calculate CPU count for dump
+ */
+static int cpu_cnt(void)
+{
+	int i, cpus = 0;
+
+	for (i = 0; zfcpdump_save_areas[i]; i++) {
+		if (zfcpdump_save_areas[i]->pref_reg == 0)
+			continue;
+		cpus++;
+	}
+	return cpus;
+}
+
+/*
+ * Calculate memory chunk count
+ */
+static int mem_chunk_cnt(void)
+{
+	struct mem_chunk *chunk_array, *mem_chunk;
+	int i, cnt = 0;
+
+	chunk_array = zg_alloc(MEMORY_CHUNKS * sizeof(struct mem_chunk));
+	detect_memory_layout(chunk_array);
+	create_mem_holes(chunk_array);
+	for (i = 0; i < MEMORY_CHUNKS; i++) {
+		mem_chunk = &chunk_array[i];
+		if (chunk_array[i].type != CHUNK_READ_WRITE &&
+		    chunk_array[i].type != CHUNK_READ_ONLY)
+			continue;
+		if (mem_chunk->size == 0)
+			continue;
+		cnt++;
+	}
+	kfree(chunk_array);
+	return cnt;
+}
+
+/*
+ * Initialize ELF header
+ */
+static void *ehdr_init(Elf64_Ehdr *ehdr)
+{
+	memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
+	ehdr->e_ident[EI_CLASS] = ELFCLASS64;
+	ehdr->e_ident[EI_DATA] = ELFDATA2MSB;
+	ehdr->e_ident[EI_VERSION] = EV_CURRENT;
+	ehdr->e_ident[EI_OSABI] = ELFOSABI_SYSV;
+	ehdr->e_ident[EI_ABIVERSION] = 0;
+	memset(ehdr->e_ident+EI_PAD, 0, EI_NIDENT-EI_PAD);
+	ehdr->e_type = ET_CORE;
+	ehdr->e_machine = EM_S390;
+	ehdr->e_version = EV_CURRENT;
+	ehdr->e_entry = 0;
+	ehdr->e_phoff = sizeof(Elf64_Ehdr);
+	ehdr->e_shoff = 0;
+	ehdr->e_flags = 0;
+	ehdr->e_ehsize = sizeof(Elf64_Ehdr);
+	ehdr->e_phentsize = sizeof(Elf64_Phdr);
+	ehdr->e_shentsize = 0;
+	ehdr->e_shnum = 0;
+	ehdr->e_shstrndx = 0;
+	ehdr->e_phnum = l.mem_chunk_cnt + 1;
+	return ehdr + 1;
+}
+
+/*
+ * Initialize ELF loads
+ */
+static int loads_init(Elf64_Phdr *phdr, u64 loads_offset)
+{
+	struct mem_chunk *chunk_array, *mem_chunk;
+	int i;
+
+	chunk_array = zg_alloc(MEMORY_CHUNKS * sizeof(struct mem_chunk));
+	detect_memory_layout(chunk_array);
+	create_mem_holes(chunk_array);
+	for (i = 0; i < MEMORY_CHUNKS; i++) {
+		mem_chunk = &chunk_array[i];
+		if (mem_chunk->size == 0)
+			break;
+		if (chunk_array[i].type != CHUNK_READ_WRITE &&
+		    chunk_array[i].type != CHUNK_READ_ONLY)
+			continue;
+		/* Only read-write and read-only chunks get a PT_LOAD entry */
+		phdr->p_filesz = mem_chunk->size;
+		phdr->p_type = PT_LOAD;
+		phdr->p_offset = mem_chunk->addr;
+		phdr->p_vaddr = mem_chunk->addr;
+		phdr->p_paddr = mem_chunk->addr;
+		phdr->p_memsz = mem_chunk->size;
+		phdr->p_flags = PF_R | PF_W | PF_X;
+		phdr->p_align = PAGE_SIZE;
+		phdr++;
+	}
+	kfree(chunk_array);
+	return i;
+}
+
+/*
+ * Initialize ELF note
+ */
+static void *nt_init(void *buf, Elf64_Word type, void *desc, int d_len,
+		     const char *name)
+{
+	Elf64_Nhdr *note;
+	u64 len;
+
+	note = (Elf64_Nhdr *)buf;
+	note->n_namesz = strlen(name) + 1;
+	note->n_descsz = d_len;
+	note->n_type = type;
+	len = sizeof(Elf64_Nhdr);
+
+	memcpy(buf + len, name, note->n_namesz);
+	len = ROUNDUP(len + note->n_namesz, 4);
+
+	memcpy(buf + len, desc, note->n_descsz);
+	len = ROUNDUP(len + note->n_descsz, 4);
+
+	return PTR_ADD(buf, len);
+}
+
+/*
+ * Initialize prstatus note
+ */
+static void *nt_prstatus(void *ptr, struct save_area *cpu)
+{
+	struct nt_prstatus_64 nt_prstatus;
+	static int cpu_nr = 1;
+
+	memset(&nt_prstatus, 0, sizeof(nt_prstatus));
+	memcpy(&nt_prstatus.gprs, cpu->gp_regs, sizeof(cpu->gp_regs));
+	memcpy(&nt_prstatus.psw, cpu->psw, sizeof(cpu->psw));
+	memcpy(&nt_prstatus.acrs, cpu->acc_regs, sizeof(cpu->acc_regs));
+	nt_prstatus.pr_pid = cpu_nr;
+	cpu_nr++;
+
+	return nt_init(ptr, NT_PRSTATUS, &nt_prstatus, sizeof(nt_prstatus),
+			 "CORE");
+}
+
+/*
+ * Initialize fpregset (floating point) note
+ */
+static void *nt_fpregset(void *ptr, struct save_area *cpu)
+{
+	struct nt_fpregset_64 nt_fpregset;
+
+	memset(&nt_fpregset, 0, sizeof(nt_fpregset));
+	memcpy(&nt_fpregset.fpc, &cpu->fp_ctrl_reg, sizeof(cpu->fp_ctrl_reg));
+	memcpy(&nt_fpregset.fprs, &cpu->fp_regs, sizeof(cpu->fp_regs));
+
+	return nt_init(ptr, NT_FPREGSET, &nt_fpregset, sizeof(nt_fpregset),
+			 "CORE");
+}
+
+/*
+ * Initialize timer note
+ */
+static void *nt_s390_timer(void *ptr, struct save_area *cpu)
+{
+	return nt_init(ptr, NT_S390_TIMER, &cpu->timer, sizeof(cpu->timer),
+			 "LINUX");
+}
+
+/*
+ * Initialize TOD clock comparator note
+ */
+static void *nt_s390_tod_cmp(void *ptr, struct save_area *cpu)
+{
+	return nt_init(ptr, NT_S390_TODCMP, &cpu->clk_cmp,
+		       sizeof(cpu->clk_cmp), "LINUX");
+}
+
+/*
+ * Initialize TOD programmable register note
+ */
+static void *nt_s390_tod_preg(void *ptr, struct save_area *cpu)
+{
+	return nt_init(ptr, NT_S390_TODPREG, &cpu->tod_reg,
+		       sizeof(cpu->tod_reg), "LINUX");
+}
+
+/*
+ * Initialize control register note
+ */
+static void *nt_s390_ctrs(void *ptr, struct save_area *cpu)
+{
+	return nt_init(ptr, NT_S390_CTRS, &cpu->ctrl_regs,
+		       sizeof(cpu->ctrl_regs), "LINUX");
+}
+
+/*
+ * Initialize prefix register note
+ */
+static void *nt_s390_prefix(void *ptr, struct save_area *cpu)
+{
+	return nt_init(ptr, NT_S390_PREFIX, &cpu->pref_reg,
+			 sizeof(cpu->pref_reg), "LINUX");
+}
+
+/*
+ * Initialize prpsinfo note
+ */
+static void *nt_prpsinfo(void *ptr)
+{
+	struct nt_prpsinfo_64 prpsinfo;
+
+	memset(&prpsinfo, 0, sizeof(prpsinfo));
+	prpsinfo.pr_state = 0;
+	prpsinfo.pr_sname = 'R';
+	prpsinfo.pr_zomb = 0;
+	strcpy(prpsinfo.pr_fname, "vmlinux");
+
+	return nt_init(ptr, NT_PRPSINFO, &prpsinfo, sizeof(prpsinfo), "CORE");
+}
+
+/*
+ * Initialize vmcoreinfo note
+ */
+static void *nt_vmcoreinfo(void *ptr)
+{
+	struct meminfo meminfo_vmcoreinfo;
+	char note_name[11];
+	unsigned long addr;
+	char *vmcoreinfo;
+	Elf64_Nhdr note;
+
+	if (meminfo_old_get(MEMINFO_TYPE_VMCOREINFO, &meminfo_vmcoreinfo))
+		return ptr;
+	addr = meminfo_vmcoreinfo.addr;
+	memset(note_name, 0, sizeof(note_name));
+	crash_read_from_oldmem(&note, sizeof(note), addr, 0);
+	crash_read_from_oldmem(note_name, sizeof(note_name) - 1,
+			       addr + sizeof(note), 0);
+	if (strcmp(note_name, "VMCOREINFO") != 0)
+		return ptr;
+	vmcoreinfo = zg_alloc(note.n_descsz + 1);
+	crash_read_from_oldmem(vmcoreinfo, note.n_descsz, addr + 24, 0);
+	vmcoreinfo[note.n_descsz] = 0;
+
+	return nt_init(ptr, 0, vmcoreinfo, note.n_descsz, "VMCOREINFO");
+}
+
+/*
+ * Initialize notes
+ */
+static void *notes_init(Elf64_Phdr *phdr, void *ptr, u64 notes_offset)
+{
+	struct save_area *cpu;
+	void *ptr_start = ptr;
+	int i;
+
+	ptr = nt_prpsinfo(ptr);
+
+	for (i = 0; zfcpdump_save_areas[i]; i++) {
+		cpu = zfcpdump_save_areas[i];
+		if (cpu->pref_reg == 0)
+			continue;
+		ptr = nt_prstatus(ptr, cpu);
+		ptr = nt_fpregset(ptr, cpu);
+		ptr = nt_s390_timer(ptr, cpu);
+		ptr = nt_s390_tod_cmp(ptr, cpu);
+		ptr = nt_s390_tod_preg(ptr, cpu);
+		ptr = nt_s390_ctrs(ptr, cpu);
+		ptr = nt_s390_prefix(ptr, cpu);
+	}
+	ptr = nt_vmcoreinfo(ptr);
+	memset(phdr, 0, sizeof(*phdr));
+	phdr->p_type = PT_NOTE;
+	phdr->p_offset = notes_offset;
+	phdr->p_filesz = (unsigned long) PTR_SUB(ptr, ptr_start);
+	phdr->p_memsz = phdr->p_filesz;
+	return ptr;
+}
+
+/*
+ * Initialize ELF header for kdump
+ */
+static void setup_kdump_elf_hdr(void)
+{
+	Elf64_Phdr *phdr_notes, *phdr_loads;
+	u32 alloc_size;
+	u64 hdr_off;
+	void *ptr;
+
+	if (!is_kdump_kernel())
+		return;
+	l.mem_chunk_cnt = mem_chunk_cnt();
+
+	alloc_size = HDR_BASE_SIZE + cpu_cnt() * HDR_PER_CPU_SIZE +
+		l.mem_chunk_cnt * HDR_PER_MEMC_SIZE;
+	l.hdr = zg_alloc(alloc_size);
+	/* Init elf header */
+	ptr = ehdr_init(l.hdr);
+	/* Init program headers */
+	phdr_notes = ptr;
+	ptr = PTR_ADD(ptr, sizeof(Elf64_Phdr));
+	phdr_loads = ptr;
+	ptr = PTR_ADD(ptr, sizeof(Elf64_Phdr) * l.mem_chunk_cnt);
+	/* Init notes */
+	hdr_off = PTR_DIFF(ptr, l.hdr);
+	ptr = notes_init(phdr_notes, ptr, hdr_off);
+	/* Init loads */
+	hdr_off = PTR_DIFF(ptr, l.hdr);
+	loads_init(phdr_loads, hdr_off);
+	l.hdr_size = hdr_off;
+	BUG_ON(l.hdr_size > alloc_size);
+}
+
+/*
+ * Get ELF header - called from vmcore common code
+ */
+int arch_vmcore_get_elf_hdr(char **elfcorebuf, size_t *elfcorebuf_sz)
+{
+	if (!l.hdr)
+		setup_kdump_elf_hdr();
+	*elfcorebuf = l.hdr;
+	*elfcorebuf_sz = l.hdr_size;
+	return 0;
+}
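
Note for readers checking the note layout in crash_dump code above:
nt_init() pads both the note name and the descriptor to 4-byte boundaries,
and the literal 24 in nt_vmcoreinfo() is the 12-byte note header plus
"VMCOREINFO\0" (11 bytes) rounded up to 12. The following is a minimal
userspace sketch of that layout, not part of the patch. struct note_hdr,
round_up4(), note_desc_offset(), note_size() and note_write() are
hypothetical names introduced only for illustration; struct note_hdr is
assumed to mirror the three 32-bit words of Elf64_Nhdr.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace mirror of Elf64_Nhdr (three 32-bit words). */
struct note_hdr {
	uint32_t n_namesz;
	uint32_t n_descsz;
	uint32_t n_type;
};

/* Round x up to the next multiple of 4, as the note format requires. */
static size_t round_up4(size_t x)
{
	return (x + 3) & ~(size_t)3;
}

/* Offset of the descriptor inside a note that carries the given name. */
static size_t note_desc_offset(const char *name)
{
	return round_up4(sizeof(struct note_hdr) + strlen(name) + 1);
}

/* Total size of a note with the given name and descriptor length. */
static size_t note_size(const char *name, size_t desc_len)
{
	return round_up4(note_desc_offset(name) + desc_len);
}

/*
 * Write one note into buf and return the number of bytes used.
 * buf must have room for note_size(name, desc_len) bytes.
 */
static size_t note_write(void *buf, uint32_t type, const char *name,
			 const void *desc, size_t desc_len)
{
	struct note_hdr hdr;
	size_t total = note_size(name, desc_len);

	memset(buf, 0, total);		/* zero the padding bytes */
	hdr.n_namesz = (uint32_t)(strlen(name) + 1);
	hdr.n_descsz = (uint32_t)desc_len;
	hdr.n_type = type;
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy((char *)buf + sizeof(hdr), name, hdr.n_namesz);
	memcpy((char *)buf + note_desc_offset(name), desc, desc_len);
	return total;
}
```

With this sketch, note_desc_offset("VMCOREINFO") evaluates to 24, which
matches the offset used when reading the vmcoreinfo payload from oldmem.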
--- a/arch/s390/kernel/early.c
+++ b/arch/s390/kernel/early.c
@@ -20,6 +20,7 @@
 #include <linux/pfn.h>
 #include <linux/uaccess.h>
 #include <linux/kernel.h>
+#include <linux/crash_dump.h>
 #include <asm/ebcdic.h>
 #include <asm/ipl.h>
 #include <asm/lowcore.h>
@@ -29,6 +30,7 @@
 #include <asm/sysinfo.h>
 #include <asm/cpcmd.h>
 #include <asm/sclp.h>
+#include <asm/asm-offsets.h>
 #include "entry.h"
 
 /*
@@ -453,6 +455,14 @@ static void __init setup_boot_command_li
 	append_to_cmdline(append_ipl_scpdata);
 }
 
+static void __init setup_kdump(void)
+{
+#ifdef CONFIG_CRASH_DUMP
+	if (!oldmem_base)
+		return;
+	elfcorehdr_addr = ELFCORE_ADDR_NEWMEM; /* needed for is_kdump_kernel */
+#endif
+}
 
 /*
  * Save ipl parameters, clear bss memory, initialize storage keys
@@ -460,6 +470,8 @@ static void __init setup_boot_command_li
  */
 void __init startup_init(void)
 {
+	meminfo_init();
+	setup_kdump();
 	reset_tod_clock();
 	ipl_save_parameters();
 	rescue_initrd();
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -859,6 +859,34 @@ restart_crash:
 restart_go:
 #endif
 
+#
+# PSW restart interrupt handler
+#
+	.globl psw_restart_int_handler
+psw_restart_int_handler:
+	st	%r15,__LC_SAVE_AREA_64(%r0)	# save r15
+	basr	%r15,0
+0:	l	%r15,.Lrestart_stack-0b(%r15)	# load restart stack
+	l	%r15,0(%r15)
+	ahi	%r15,-SP_SIZE			# make room for pt_regs
+	stm	%r0,%r14,SP_R0(%r15)		# store gprs %r0-%r14 to stack
+	mvc	SP_R15(4,%r15),__LC_SAVE_AREA_64(%r0)# store saved %r15 to stack
+	mvc	SP_PSW(8,%r15),__LC_RST_OLD_PSW(%r0) # store restart old psw
+	xc	__SF_BACKCHAIN(4,%r15),__SF_BACKCHAIN(%r15) # set backchain to 0
+	basr	%r14,0
+1:	l	%r14,.Ldo_restart-1b(%r14)
+	basr	%r14,%r14
+
+	basr	%r14,0				# load disabled wait PSW if
+2:	lpsw	restart_psw_crash-2b(%r14)	# do_restart returns
+.Ldo_restart:
+	.long	do_restart
+.Lrestart_stack:
+	.long	restart_stack
+	.align 8
+restart_psw_crash:
+	.long	0x000a0000,0x00000000 + restart_psw_crash
+
 	.section .kprobes.text, "ax"
 
 #ifdef CONFIG_CHECK_STACK
--- a/arch/s390/kernel/entry64.S
+++ b/arch/s390/kernel/entry64.S
@@ -862,6 +862,27 @@ restart_crash:
 restart_go:
 #endif
 
+#
+# PSW restart interrupt handler
+#
+	.globl psw_restart_int_handler
+psw_restart_int_handler:
+	stg	%r15,__LC_SAVE_AREA_64(%r0)	# save r15
+	larl	%r15,restart_stack		# load restart stack
+	lg	%r15,0(%r15)
+	aghi	%r15,-SP_SIZE			# make room for pt_regs
+	stmg	%r0,%r14,SP_R0(%r15)		# store gprs %r0-%r14 to stack
+	mvc	SP_R15(8,%r15),__LC_SAVE_AREA_64(%r0)# store saved %r15 to stack
+	mvc	SP_PSW(16,%r15),__LC_RST_OLD_PSW(%r0)# store restart old psw
+	xc	__SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15) # set backchain to 0
+	brasl	%r14,do_restart
+
+	larl	%r14,restart_psw_crash		# load disabled wait PSW if
+	lpswe	0(%r14)				# do_restart returns
+	.align 8
+restart_psw_crash:
+	.quad	0x0002000080000000,0x0000000000000000 + restart_psw_crash
+
 	.section .kprobes.text, "ax"
 
 #ifdef CONFIG_CHECK_STACK
--- a/arch/s390/kernel/head.S
+++ b/arch/s390/kernel/head.S
@@ -450,10 +450,22 @@ start:
 	.org	0x10000
 	.globl	startup
 startup:
+	j	.Lep_startup_normal
+
+#
+# kdump startup-code at 0x10008, running in 64 bit absolute addressing mode
+#
+	.org	0x10008
+	.globl	startup_kdump
+startup_kdump:
+	j	.Lep_startup_kdump
+
+.Lep_startup_normal:
 	basr	%r13,0			# get base
 .LPG0:
 	xc	0x200(256),0x200	# partially clear lowcore
 	xc	0x300(256),0x300
+	xc	0xe00(256),0xe00
 	stck	__LC_LAST_UPDATE_CLOCK
 	spt	5f-.LPG0(%r13)
 	mvc	__LC_LAST_UPDATE_TIMER(8),5f-.LPG0(%r13)
@@ -535,6 +547,8 @@ startup:
 	.align	8
 5:	.long	0x7fffffff,0xffffffff
 
+#include "head_kdump.S"
+
 #
 # params at 10400 (setup.h)
 #
--- /dev/null
+++ b/arch/s390/kernel/head_kdump.S
@@ -0,0 +1,133 @@
+/*
+ * S390 kdump lowlevel functions (new kernel)
+ *
+ * Copyright IBM Corp. 2011
+ * Author(s): Michael Holzheu <holzheu@linux.vnet.ibm.com>
+ */
+
+#define DATAMOVER_ADDR	0x4000
+#define COPY_PAGE_ADDR	0x6000
+
+#ifdef CONFIG_CRASH_DUMP
+
+#
+# kdump entry (new kernel - not yet relocated)
+#
+# Note: This code has to be position independent
+#
+
+.align 2
+.Lep_startup_kdump:
+	basr	%r13,0
+.Lbase:
+	larl	%r2,.Lbase_addr			# Check whether we have
+	lg	%r2,0(%r2)			# already been relocated:
+	clgr	%r2,%r13			#
+	jne	.Lrelocate			# No : Start data mover
+	lghi	%r2,0				# Yes: Start kdump kernel
+	brasl	%r14,startup_kdump_relocated
+
+.Lrelocate:
+	lg	%r4,__LC_MEMINFO+__MI_ADDR(%r0)	# Load meminfo base (%r4)
+
+	lgr	%r5,%r4
+	aghi	%r5,__MI_TYPE_KDUMP_MEM		# Base for kdump meminfo
+	lg	%r2,__MI_ADDR(%r5)		# Load kdump base address (%r2)
+	lg	%r3,__MI_SIZE(%r5)		# Load kdump size (%r3)
+
+	stg	%r2,__LC_OLDMEM_BASE(%r2)	# Save kdump base
+	stg	%r3,__LC_OLDMEM_SIZE(%r2)	# Save kdump size
+
+	larl	%r10,.Lcopy_start		# Source of data mover
+	lghi	%r8,DATAMOVER_ADDR		# Target of data mover
+	mvc	0(256,%r8),0(%r10)		# Copy data mover code
+
+	agr	%r8,%r2				# Copy data mover to
+	mvc	0(256,%r8),0(%r10)		# reserved mem
+
+	lghi	%r14,DATAMOVER_ADDR		# Jump to copied data mover
+	basr	%r14,%r14
+.Lbase_addr:
+	.quad	.Lbase
+
+#
+# kdump data mover code (runs at address DATAMOVER_ADDR)
+#
+# r2: kdump base address
+# r3: kdump size
+#
+.Lcopy_start:
+	basr	%r13,0				# Base
+0:
+	lgr	%r11,%r2			# Save kdump base address
+	lgr	%r12,%r2
+	agr	%r12,%r3			# Compute kdump end address
+
+	lghi	%r5,0
+	lghi	%r10,COPY_PAGE_ADDR		# Load copy page address
+1:
+	mvc	0(256,%r10),0(%r5)		# Copy old kernel to tmp
+	mvc	0(256,%r5),0(%r11)		# Copy new kernel to old
+	mvc	0(256,%r11),0(%r10)		# Copy tmp to new
+	aghi	%r11,256
+	aghi	%r5,256
+	clgr	%r11,%r12
+	jl	1b
+
+	lg	%r14,.Lstartup_kdump-0b(%r13)
+	basr	%r14,%r14			# Start relocated kernel
+.Lstartup_kdump:
+	.long	0x00000000,0x00000000 + startup_kdump_relocated
+.Lcopy_end:
+
+#
+# Startup of kdump (relocated new kernel)
+#
+.align 2
+startup_kdump_relocated:
+	basr	%r13,0
+0:	lg	%r3,__LC_OLDMEM_BASE(%r0)	# Save oldmem base
+	stg	%r3,oldmem_base-0b(%r13)
+	lg	%r3,__LC_OLDMEM_SIZE(%r0)	# Save oldmem size
+	stg	%r3,oldmem_size-0b(%r13)
+
+	mvc	0(8,%r0),.Lrestart_psw-0b(%r13)	# Setup restart PSW
+	mvc	464(16,%r0),.Lpgm_psw-0b(%r13)	# Setup pgm check PSW
+	lhi	%r1,1				# Start new kernel
+	diag	%r1,%r1,0x308			# with diag 308
+
+.Lno_diag308:					# No diag 308
+	sam31					# Switch to 31 bit addr mode
+	sr	%r1,%r1				# Erase register r1
+	sr	%r2,%r2				# Erase register r2
+	sigp	%r1,%r2,0x12			# Switch to 31 bit arch mode
+	lpsw	0				# Start new kernel...
+.align	8
+.Lrestart_psw:
+	.long	0x00080000,0x80000000 + startup
+.Lpgm_psw:
+	.quad	0x0000000180000000,0x0000000000000000 + .Lno_diag308
+	.globl	oldmem_base
+oldmem_base:
+	.quad	0x0
+	.globl	oldmem_size
+oldmem_size:
+	.quad	0x0
+
+#else
+.align 2
+.Lep_startup_kdump:
+#ifdef CONFIG_64BIT
+	larl	%r13,startup_kdump_crash
+	lpswe	0(%r13)
+.align 8
+startup_kdump_crash:
+	.quad	0x0002000080000000,0x0000000000000000 + startup_kdump_crash
+#else
+	basr	%r13,0
+0:	lpsw	startup_kdump_crash-0b(%r13)
+.align 8
+startup_kdump_crash:
+	.long	0x000a0000,0x00000000 + startup_kdump_crash
+#endif /* CONFIG_64BIT */
+#endif /* CONFIG_CRASH_DUMP */
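
Note on the data mover in head_kdump.S above: the loop at label 1 swaps
the memory at [0, kdump size) with the crashkernel area in 256-byte
steps, using a temporary block. The following is a minimal userspace
sketch of that swap in C, not part of the patch. swap_regions() and
BLOCK_SIZE are hypothetical names, and the sketch assumes a temporary
buffer that lies outside both regions and a size that is a multiple of
256; the assembly instead uses the fixed address COPY_PAGE_ADDR.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 256	/* one mvc instruction moves at most 256 bytes */

/*
 * Swap the first `size` bytes of `low` (standing in for the production
 * kernel at address 0) with `crash` (standing in for the reserved
 * crashkernel area), one block at a time.
 * Assumptions: size is a multiple of BLOCK_SIZE and the two regions
 * do not overlap.
 */
static void swap_regions(unsigned char *low, unsigned char *crash, size_t size)
{
	unsigned char tmp[BLOCK_SIZE];	/* separate temporary block */
	size_t off;

	for (off = 0; off < size; off += BLOCK_SIZE) {
		memcpy(tmp, low + off, BLOCK_SIZE);		/* old -> tmp */
		memcpy(low + off, crash + off, BLOCK_SIZE);	/* new -> old */
		memcpy(crash + off, tmp, BLOCK_SIZE);		/* tmp -> new */
	}
}
```

After the swap the kdump kernel sits at the standard location and the
crashed kernel's low memory is preserved in the crashkernel area, which
is the layout the cover letter describes.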
--- a/arch/s390/kernel/ipl.c
+++ b/arch/s390/kernel/ipl.c
@@ -16,6 +16,7 @@
 #include <linux/ctype.h>
 #include <linux/fs.h>
 #include <linux/gfp.h>
+#include <linux/crash_dump.h>
 #include <asm/ipl.h>
 #include <asm/smp.h>
 #include <asm/setup.h>
@@ -26,6 +27,7 @@
 #include <asm/sclp.h>
 #include <asm/sigp.h>
 #include <asm/checksum.h>
+#include <asm/lowcore.h>
 
 #define IPL_PARM_BLOCK_VERSION 0
 
@@ -45,11 +47,13 @@
  * - halt
  * - power off
  * - reipl
+ * - restart
  */
 #define ON_PANIC_STR		"on_panic"
 #define ON_HALT_STR		"on_halt"
 #define ON_POFF_STR		"on_poff"
 #define ON_REIPL_STR		"on_reboot"
+#define ON_RESTART_STR		"on_restart"
 
 struct shutdown_action;
 struct shutdown_trigger {
@@ -66,6 +70,7 @@ struct shutdown_trigger {
 #define SHUTDOWN_ACTION_VMCMD_STR	"vmcmd"
 #define SHUTDOWN_ACTION_STOP_STR	"stop"
 #define SHUTDOWN_ACTION_DUMP_REIPL_STR	"dump_reipl"
+#define SHUTDOWN_ACTION_KDUMP_STR	"kdump"
 
 struct shutdown_action {
 	char *name;
@@ -946,6 +951,13 @@ static struct attribute_group reipl_nss_
 	.attrs = reipl_nss_attrs,
 };
 
+static void set_reipl_block_actual(struct ipl_parameter_block *reipl_block)
+{
+	meminfo_update(MEMINFO_TYPE_IPIB, reipl_block, reipl_block->hdr.len,
+		       MEMINFO_FLAG_ELEM_VALID | MEMINFO_FLAG_CSUM_VALID);
+	reipl_block_actual = reipl_block;
+}
+
 /* reipl type */
 
 static int reipl_set_type(enum ipl_type type)
@@ -961,7 +973,7 @@ static int reipl_set_type(enum ipl_type
 			reipl_method = REIPL_METHOD_CCW_VM;
 		else
 			reipl_method = REIPL_METHOD_CCW_CIO;
-		reipl_block_actual = reipl_block_ccw;
+		set_reipl_block_actual(reipl_block_ccw);
 		break;
 	case IPL_TYPE_FCP:
 		if (diag308_set_works)
@@ -970,7 +982,7 @@ static int reipl_set_type(enum ipl_type
 			reipl_method = REIPL_METHOD_FCP_RO_VM;
 		else
 			reipl_method = REIPL_METHOD_FCP_RO_DIAG;
-		reipl_block_actual = reipl_block_fcp;
+		set_reipl_block_actual(reipl_block_fcp);
 		break;
 	case IPL_TYPE_FCP_DUMP:
 		reipl_method = REIPL_METHOD_FCP_DUMP;
@@ -980,7 +992,7 @@ static int reipl_set_type(enum ipl_type
 			reipl_method = REIPL_METHOD_NSS_DIAG;
 		else
 			reipl_method = REIPL_METHOD_NSS;
-		reipl_block_actual = reipl_block_nss;
+		set_reipl_block_actual(reipl_block_nss);
 		break;
 	case IPL_TYPE_UNKNOWN:
 		reipl_method = REIPL_METHOD_DEFAULT;
@@ -1111,6 +1123,12 @@ static void reipl_block_ccw_init(struct
 static void reipl_block_ccw_fill_parms(struct ipl_parameter_block *ipb)
 {
 	/* LOADPARM */
+	/* For kdump we use IPL parameters from original system */
+	if (is_kdump_kernel()) {
+		memcpy(ipb->ipl_info.ccw.load_parm,
+		       ipl_block.ipl_info.ccw.load_parm, LOADPARM_LEN);
+		return;
+	}
 	/* check if read scp info worked and set loadparm */
 	if (sclp_ipl_info.is_valid)
 		memcpy(ipb->ipl_info.ccw.load_parm,
@@ -1495,30 +1513,10 @@ static struct shutdown_action __refdata
 
 static void dump_reipl_run(struct shutdown_trigger *trigger)
 {
-	preempt_disable();
-	/*
-	 * Bypass dynamic address translation (DAT) when storing IPL parameter
-	 * information block address and checksum into the prefix area
-	 * (corresponding to absolute addresses 0-8191).
-	 * When enhanced DAT applies and the STE format control in one,
-	 * the absolute address is formed without prefixing. In this case a
-	 * normal store (stg/st) into the prefix area would no more match to
-	 * absolute addresses 0-8191.
-	 */
-#ifdef CONFIG_64BIT
-	asm volatile("sturg %0,%1"
-		:: "a" ((unsigned long) reipl_block_actual),
-		"a" (&lowcore_ptr[smp_processor_id()]->ipib));
-#else
-	asm volatile("stura %0,%1"
-		:: "a" ((unsigned long) reipl_block_actual),
-		"a" (&lowcore_ptr[smp_processor_id()]->ipib));
-#endif
-	asm volatile("stura %0,%1"
-		:: "a" (csum_partial(reipl_block_actual,
-				     reipl_block_actual->hdr.len, 0)),
-		"a" (&lowcore_ptr[smp_processor_id()]->ipib_checksum));
-	preempt_enable();
+	u32 csum;
+
+	csum = csum_partial(reipl_block_actual, reipl_block_actual->hdr.len, 0);
+	copy_to_absolute_zero(&S390_lowcore.ipib_checksum, &csum, sizeof(csum));
 	dump_run(trigger);
 }
 
@@ -1544,17 +1542,20 @@ static char vmcmd_on_reboot[128];
 static char vmcmd_on_panic[128];
 static char vmcmd_on_halt[128];
 static char vmcmd_on_poff[128];
+static char vmcmd_on_restart[128];
 
 DEFINE_IPL_ATTR_STR_RW(vmcmd, on_reboot, "%s\n", "%s\n", vmcmd_on_reboot);
 DEFINE_IPL_ATTR_STR_RW(vmcmd, on_panic, "%s\n", "%s\n", vmcmd_on_panic);
 DEFINE_IPL_ATTR_STR_RW(vmcmd, on_halt, "%s\n", "%s\n", vmcmd_on_halt);
 DEFINE_IPL_ATTR_STR_RW(vmcmd, on_poff, "%s\n", "%s\n", vmcmd_on_poff);
+DEFINE_IPL_ATTR_STR_RW(vmcmd, on_restart, "%s\n", "%s\n", vmcmd_on_restart);
 
 static struct attribute *vmcmd_attrs[] = {
 	&sys_vmcmd_on_reboot_attr.attr,
 	&sys_vmcmd_on_panic_attr.attr,
 	&sys_vmcmd_on_halt_attr.attr,
 	&sys_vmcmd_on_poff_attr.attr,
+	&sys_vmcmd_on_restart_attr.attr,
 	NULL,
 };
 
@@ -1576,6 +1577,8 @@ static void vmcmd_run(struct shutdown_tr
 		cmd = vmcmd_on_halt;
 	else if (strcmp(trigger->name, ON_POFF_STR) == 0)
 		cmd = vmcmd_on_poff;
+	else if (strcmp(trigger->name, ON_RESTART_STR) == 0)
+		cmd = vmcmd_on_restart;
 	else
 		return;
 
@@ -1621,11 +1624,43 @@ static void stop_run(struct shutdown_tri
 static struct shutdown_action stop_action = {SHUTDOWN_ACTION_STOP_STR,
 					     stop_run, NULL};
 
+/*
+ * kdump shutdown action: Trigger kdump on shutdown.
+ */
+
+#ifdef CONFIG_CRASH_DUMP
+static int kdump_init(void)
+{
+	if (crashk_res.start == 0)
+		return -EOPNOTSUPP;
+	return 0;
+}
+
+static void kdump_run(struct shutdown_trigger *trigger)
+{
+	/*
+	 * We do not call crash_kexec(), because the image could also
+	 * be loaded externally without kexec_load(). In this case
+	 * crash_kexec() would have no effect because crash_image is not
+	 * defined.
+	 */
+	machine_kdump();
+	disabled_wait((unsigned long) __builtin_return_address(0));
+}
+
+static struct shutdown_action kdump_action = {SHUTDOWN_ACTION_KDUMP_STR,
+					     kdump_run, kdump_init};
+#endif
+
 /* action list */
 
 static struct shutdown_action *shutdown_actions_list[] = {
 	&ipl_action, &reipl_action, &dump_reipl_action, &dump_action,
-	&vmcmd_action, &stop_action};
+	&vmcmd_action, &stop_action,
+#ifdef CONFIG_CRASH_DUMP
+	&kdump_action
+#endif
+	};
 #define SHUTDOWN_ACTIONS_COUNT (sizeof(shutdown_actions_list) / sizeof(void *))
 
 /*
@@ -1707,6 +1742,34 @@ static void do_panic(void)
 	stop_run(&on_panic_trigger);
 }
 
+/* on restart */
+
+static struct shutdown_trigger on_restart_trigger = {ON_RESTART_STR,
+	&reipl_action};
+
+static ssize_t on_restart_show(struct kobject *kobj,
+			       struct kobj_attribute *attr, char *page)
+{
+	return sprintf(page, "%s\n", on_restart_trigger.action->name);
+}
+
+static ssize_t on_restart_store(struct kobject *kobj,
+				struct kobj_attribute *attr,
+				const char *buf, size_t len)
+{
+	return set_trigger(buf, &on_restart_trigger, len);
+}
+
+static struct kobj_attribute on_restart_attr =
+	__ATTR(on_restart, 0644, on_restart_show, on_restart_store);
+
+void do_restart(void)
+{
+	smp_send_stop();
+	on_restart_trigger.action->fn(&on_restart_trigger);
+	stop_run(&on_restart_trigger);
+}
+
 /* on halt */
 
 static struct shutdown_trigger on_halt_trigger = {ON_HALT_STR, &stop_action};
@@ -1767,6 +1830,16 @@ void (*_machine_power_off)(void) = do_ma
 
 static void __init shutdown_triggers_init(void)
 {
+#ifdef CONFIG_CRASH_DUMP
+	/*
+	 * We set the kdump action for panic and restart, if the kdump
+	 * reserved area is defined.
+	 */
+	if (crashk_res.start != 0) {
+		on_restart_trigger.action = &kdump_action;
+		on_panic_trigger.action = &kdump_action;
+	}
+#endif
 	shutdown_actions_kset = kset_create_and_add("shutdown_actions", NULL,
 						    firmware_kobj);
 	if (!shutdown_actions_kset)
@@ -1783,7 +1856,9 @@ static void __init shutdown_triggers_ini
 	if (sysfs_create_file(&shutdown_actions_kset->kobj,
 			      &on_poff_attr.attr))
 		goto fail;
-
+	if (sysfs_create_file(&shutdown_actions_kset->kobj,
+			      &on_restart_attr.attr))
+		goto fail;
 	return;
 fail:
 	panic("shutdown_triggers_init failed\n");
@@ -1908,6 +1983,26 @@ void __init setup_ipl(void)
 	atomic_notifier_chain_register(&panic_notifier_list, &on_panic_nb);
 }
 
+/*
+ * In case of kdump get re-IPL configuration of crashed system via meminfo
+ */
+static int __init ipl_kdump_ipib_init(void)
+{
+#ifdef CONFIG_CRASH_DUMP
+	struct meminfo meminfo_ipib;
+
+	if (!is_kdump_kernel())
+		return -EINVAL;
+	if (meminfo_old_get(MEMINFO_TYPE_IPIB, &meminfo_ipib))
+		return -EINVAL;
+	crash_read_from_oldmem(&ipl_block, sizeof(ipl_block),
+			       meminfo_ipib.addr, 0);
+	return 0;
+#else
+	return -EINVAL;
+#endif
+}
+
 void __init ipl_update_parameters(void)
 {
 	int rc;
@@ -1915,6 +2010,35 @@ void __init ipl_update_parameters(void)
 	rc = diag308(DIAG308_STORE, &ipl_block);
 	if ((rc == DIAG308_RC_OK) || (rc == DIAG308_RC_NOCONFIG))
 		diag308_set_works = 1;
+	ipl_kdump_ipib_init();
+}
+
+/*
+ * For kdump IPL we set the IPL info to the values that we get from the crashed
+ * system using the ipib meminfo pointer. Then a reboot of the kdump
+ * kernel will reboot the original system.
+ */
+static int setup_kdump_iplinfo(struct cio_iplinfo *iplinfo)
+{
+#ifdef CONFIG_CRASH_DUMP
+	if (ipl_kdump_ipib_init())
+		return -EINVAL;
+
+	if (ipl_block.hdr.pbt == DIAG308_IPL_TYPE_CCW) {
+		iplinfo->devno = ipl_block.ipl_info.ccw.devno;
+		iplinfo->is_qdio = 0;
+		return 0;
+	}
+	if (ipl_block.hdr.pbt == DIAG308_IPL_TYPE_FCP) {
+		iplinfo->devno = ipl_block.ipl_info.fcp.devno;
+		iplinfo->is_qdio = 1;
+		S390_lowcore.ipl_parmblock_ptr = (unsigned long) &ipl_block;
+		return 0;
+	}
+	return -ENODEV;
+#else
+	return -ENODEV;
+#endif
 }
 
 void __init ipl_save_parameters(void)
@@ -1922,9 +2046,13 @@ void __init ipl_save_parameters(void)
 	struct cio_iplinfo iplinfo;
 	void *src, *dst;
 
-	if (cio_get_iplinfo(&iplinfo))
-		return;
-
+	if (is_kdump_kernel()) {
+		if (setup_kdump_iplinfo(&iplinfo))
+			return;
+	} else {
+		if (cio_get_iplinfo(&iplinfo))
+			return;
+	}
 	ipl_devno = iplinfo.devno;
 	ipl_flags |= IPL_DEVNO_VALID;
 	if (!iplinfo.is_qdio)
@@ -1992,7 +2120,10 @@ void s390_reset_system(void)
 	S390_lowcore.program_new_psw.mask = psw_kernel_bits & ~PSW_MASK_MCHECK;
 	S390_lowcore.program_new_psw.addr =
 		PSW_ADDR_AMODE | (unsigned long) s390_base_pgm_handler;
-
-	do_reset_calls();
+#ifdef CONFIG_64BIT
+	if (diag308_set_works)
+		do_reset_diag308();
+	else
+#endif
+		do_reset_calls();
 }
-
--- a/arch/s390/kernel/machine_kexec.c
+++ b/arch/s390/kernel/machine_kexec.c
@@ -21,12 +21,169 @@
 #include <asm/smp.h>
 #include <asm/reset.h>
 #include <asm/ipl.h>
+#include <asm/cacheflush.h>
+#include <asm/asm-offsets.h>
+#include <asm/checksum.h>
+#include <asm/diag.h>
+#include <asm/sclp.h>
 
 typedef void (*relocate_kernel_t)(kimage_entry_t *, unsigned long);
 
 extern const unsigned char relocate_kernel[];
 extern const unsigned long long relocate_kernel_len;
 
+#ifdef CONFIG_CRASH_DUMP
+
+static struct meminfo meminfo_kdump_segments[KEXEC_SEGMENT_MAX];
+
+/*
+ * S390 version: Currently we do not support freeing crashkernel memory
+ */
+void crash_free_reserved_phys_range(unsigned long begin, unsigned long end)
+{
+	return;
+}
+
+/*
+ * S390 version: Just do real copy of segment
+ */
+int kimage_load_crash_segment(struct kimage *image,
+			      struct kexec_segment *segment)
+{
+	return copy_from_user_real((void *) segment->mem, segment->buf,
+				   segment->bufsz);
+}
+
+/*
+ * Update KDUMP_MEM meminfo and store oldmem base and size to absolute zero
+ */
+static void kdump_mem_update(void)
+{
+	unsigned long base, size;
+
+	base = crashk_res.start;
+	size = crashk_res.end - crashk_res.start + 1;
+	memcpy_real((void *) __LC_OLDMEM_BASE + base, &base, sizeof(base));
+	memcpy_real((void *) __LC_OLDMEM_SIZE + base, &size, sizeof(size));
+	meminfo_update(MEMINFO_TYPE_KDUMP_MEM, (void *) base, size,
+		       MEMINFO_FLAG_ELEM_VALID);
+}
+
+/*
+ * Clear kdump segments (kdump has been unloaded)
+ */
+static void kdump_segments_clear(void)
+{
+	memset(meminfo_kdump_segments, 0, sizeof(meminfo_kdump_segments));
+	meminfo_update(MEMINFO_TYPE_KDUMP_SEGM, NULL, 0, 0);
+	if (MACHINE_IS_VM)
+		diag10_range(PFN_DOWN(crashk_res.start),
+			     PFN_DOWN(crashk_res.end - crashk_res.start + 1));
+}
+
+/*
+ * Update kdump segments (kdump has been loaded)
+ */
+static void kdump_segments_update(struct kimage *image)
+{
+	int i, flags = MEMINFO_FLAG_ELEM_VALID | MEMINFO_FLAG_CSUM_VALID;
+
+	memset(meminfo_kdump_segments, 0, sizeof(meminfo_kdump_segments));
+
+	for (i = 0; i < image->nr_segments; i++) {
+		meminfo_kdump_segments[i].addr = image->segment[i].mem;
+		meminfo_kdump_segments[i].size = image->segment[i].memsz;
+		meminfo_kdump_segments[i].flags = flags;
+	}
+
+	meminfo_update(MEMINFO_TYPE_KDUMP_SEGM, &meminfo_kdump_segments,
+		       image->nr_segments * sizeof(struct meminfo),
+		       flags | MEMINFO_FLAG_ELEM_IND);
+}
+
+/*
+ * Finish kexec_load() and update meminfo data in case of kdump
+ */
+void machine_kexec_finish(struct kimage *image, int kexec_flags)
+{
+	if (!(kexec_flags & KEXEC_ON_CRASH))
+		return;
+	kdump_mem_update();
+	if (image)
+		kdump_segments_update(image);
+	else
+		kdump_segments_clear();
+}
+
+/*
+ * Print error message and load disabled wait PSW
+ */
+static void kdump_failed(const char *str)
+{
+	psw_t kdump_failed_psw;
+
+	kdump_failed_psw.mask = PSW_BASE_BITS | PSW_MASK_WAIT;
+	kdump_failed_psw.addr = (unsigned long) kdump_failed;
+	_sclp_print_early(str);
+	_sclp_print_early("Please use alternative dump tool");
+	__load_psw(kdump_failed_psw);
+}
+
+/*
+ * Check if kdump is loaded/valid and start it
+ */
+static void __machine_kdump(void *data)
+{
+	u32 flags = meminfo_array[MEMINFO_TYPE_KDUMP_SEGM].flags;
+	struct meminfo root;
+	psw_t kdump_psw;
+	u32 csum;
+
+	pfault_fini();
+	s390_reset_system();
+	__arch_local_irq_stnsm(0xfb); /* disable DAT */
+	do_store_status();
+
+	if (!(flags & MEMINFO_FLAG_ELEM_VALID))
+		kdump_failed("kdump failed: Kernel not loaded");
+
+	copy_from_absolute_zero(&root, &S390_lowcore.meminfo, sizeof(root));
+	copy_from_absolute_zero(&csum, &S390_lowcore.meminfo_csum,
+				sizeof(csum));
+	if (csum != csum_partial(&root, sizeof(root), 0))
+		kdump_failed("kdump failed: Invalid meminfo checksum");
+	if (meminfo_csum_check(&root, 1))
+		kdump_failed("kdump failed: Invalid checksum");
+
+	_sclp_print_early("Starting kdump");
+	kdump_psw.mask = PSW_BASE_BITS | PSW_DEFAULT_KEY;
+	kdump_psw.addr = crashk_res.start + 0x10008;
+	__load_psw(kdump_psw);
+}
+
+/*
+ * Start kdump on IPL CPU
+ */
+void machine_kdump(void)
+{
+	crash_save_vmcoreinfo();
+	smp_switch_to_ipl_cpu(__machine_kdump, NULL);
+}
+#endif
+
+/*
+ * Invalidate KDUMP_SEGM meminfo before new kdump is loaded
+ */
+static int machine_kexec_prepare_kdump(void)
+{
+#ifdef CONFIG_CRASH_DUMP
+	kdump_segments_clear();
+	return 0;
+#else
+	return -EINVAL;
+#endif
+}
+
 int machine_kexec_prepare(struct kimage *image)
 {
 	void *reboot_code_buffer;
@@ -35,6 +192,9 @@ int machine_kexec_prepare(struct kimage
 	if (ipl_flags & IPL_NSS_VALID)
 		return -ENOSYS;
 
+	if (image->type == KEXEC_TYPE_CRASH)
+		return machine_kexec_prepare_kdump();
+
 	/* We don't support anything but the default image type for now. */
 	if (image->type != KEXEC_TYPE_DEFAULT)
 		return -EINVAL;
@@ -72,6 +232,10 @@ static void __machine_kexec(void *data)
 
 void machine_kexec(struct kimage *image)
 {
+#ifdef CONFIG_CRASH_DUMP
+	if (image->type == KEXEC_TYPE_CRASH)
+		machine_kdump();
+#endif
 	tracer_disable();
 	smp_send_stop();
 	smp_switch_to_ipl_cpu(__machine_kexec, image);
--- a/arch/s390/kernel/mem_detect.c
+++ b/arch/s390/kernel/mem_detect.c
@@ -62,3 +62,73 @@ void detect_memory_layout(struct mem_chu
 	arch_local_irq_restore(flags);
 }
 EXPORT_SYMBOL(detect_memory_layout);
+
+/*
+ * Create memory hole with given address, size, and type
+ */
+void create_mem_hole(struct mem_chunk chunks[], unsigned long addr,
+		     unsigned long size, int type)
+{
+	unsigned long start, end, new_size;
+	int i;
+
+	for (i = 0; i < MEMORY_CHUNKS; i++) {
+		if (chunks[i].size == 0)
+			continue;
+		if (addr + size < chunks[i].addr)
+			continue;
+		if (addr >= chunks[i].addr + chunks[i].size)
+			continue;
+		start = max(addr, chunks[i].addr);
+		end = min(addr + size, chunks[i].addr + chunks[i].size);
+		new_size = end - start;
+		if (new_size == 0)
+			continue;
+		if (start == chunks[i].addr &&
+		    end == chunks[i].addr + chunks[i].size) {
+			/* Remove chunk */
+			chunks[i].type = type;
+		} else if (start == chunks[i].addr) {
+			/* Make chunk smaller at start */
+			if (i >= MEMORY_CHUNKS - 1)
+				panic("Unable to create memory hole");
+			memmove(&chunks[i + 1], &chunks[i],
+				sizeof(struct mem_chunk) *
+				(MEMORY_CHUNKS - (i + 1)));
+			chunks[i + 1].addr = chunks[i].addr + new_size;
+			chunks[i + 1].size = chunks[i].size - new_size;
+			chunks[i].size = new_size;
+			chunks[i].type = type;
+			i += 1;
+		} else if (end == chunks[i].addr + chunks[i].size) {
+			/* Make chunk smaller at end */
+			if (i >= MEMORY_CHUNKS - 1)
+				panic("Unable to create memory hole");
+			memmove(&chunks[i + 1], &chunks[i],
+				sizeof(struct mem_chunk) *
+				(MEMORY_CHUNKS - (i + 1)));
+			chunks[i + 1].addr = start;
+			chunks[i + 1].size = new_size;
+			chunks[i + 1].type = type;
+			chunks[i].size -= new_size;
+			i += 1;
+		} else {
+			/* Create memory hole */
+			if (i >= MEMORY_CHUNKS - 2)
+				panic("Unable to create memory hole");
+			memmove(&chunks[i + 2], &chunks[i],
+				sizeof(struct mem_chunk) *
+				(MEMORY_CHUNKS - (i + 2)));
+			chunks[i + 1].addr = addr;
+			chunks[i + 1].size = size;
+			chunks[i + 1].type = type;
+			chunks[i + 2].addr = addr + size;
+			chunks[i + 2].size =
+				chunks[i].addr + chunks[i].size - (addr + size);
+			chunks[i + 2].type = chunks[i].type;
+			chunks[i].size = addr - chunks[i].addr;
+			i += 2;
+		}
+	}
+}
+
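
Note on create_mem_hole() above: a hole can cover a whole chunk, cut off
its start, cut off its end, or split it into three pieces. The following
is a minimal userspace sketch of the same splitting rule, not part of the
patch. It deliberately differs from the patch by writing to a second
array instead of shifting entries in place with memmove(). struct chunk,
emit(), punch_hole(), MAX_CHUNKS, CHUNK_RAM and CHUNK_HOLE are
hypothetical names used only for illustration.

```c
#include <stddef.h>

#define MAX_CHUNKS 8

enum { CHUNK_RAM = 0, CHUNK_HOLE = 1 };

struct chunk {
	unsigned long addr;
	unsigned long size;
	int type;
};

/*
 * Append a chunk to out[] unless it is empty.
 * Returns the new element count, or -1 if out[] is full.
 */
static int emit(struct chunk out[], int n, unsigned long addr,
		unsigned long size, int type)
{
	if (size == 0)
		return n;
	if (n >= MAX_CHUNKS)
		return -1;
	out[n].addr = addr;
	out[n].size = size;
	out[n].type = type;
	return n + 1;
}

/*
 * Mark [addr, addr + size) with the given type in in[0..cnt) and write
 * the resulting chunk list to out[].
 * Returns the number of chunks written, or -1 on overflow.
 */
static int punch_hole(const struct chunk in[], int cnt, struct chunk out[],
		      unsigned long addr, unsigned long size, int type)
{
	unsigned long c_end, start, end;
	int i, n = 0;

	for (i = 0; i < cnt && n >= 0; i++) {
		c_end = in[i].addr + in[i].size;
		start = addr > in[i].addr ? addr : in[i].addr;
		end = addr + size < c_end ? addr + size : c_end;
		if (start >= end) {
			/* No overlap: copy the chunk unchanged */
			n = emit(out, n, in[i].addr, in[i].size, in[i].type);
			continue;
		}
		/* Part before the hole, the hole itself, part after it */
		n = emit(out, n, in[i].addr, start - in[i].addr, in[i].type);
		if (n >= 0)
			n = emit(out, n, start, end - start, type);
		if (n >= 0)
			n = emit(out, n, end, c_end - end, in[i].type);
	}
	return n;
}
```

Because empty pieces are dropped in emit(), the four cases of the in-place
version collapse into one code path here, at the cost of a second array.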
--- /dev/null
+++ b/arch/s390/kernel/meminfo.c
@@ -0,0 +1,132 @@
+/*
+ * Store memory information for external users like stand-alone dump tools
+ *
+ * Copyright IBM Corp. 2011
+ * Author(s): Michael Holzheu <holzheu@linux.vnet.ibm.com>
+ */
+
+#include <asm/asm-offsets.h>
+#include <asm/lowcore.h>
+#include <asm/checksum.h>
+
+struct meminfo meminfo_array[MEMINFO_TYPE_LAST];
+
+static inline int meminfo_ind_cnt(struct meminfo *meminfo)
+{
+	return meminfo->size / sizeof(struct meminfo);
+}
+
+/*
+ * Recursively update meminfo checksums
+ */
+static void meminfo_csum_update(struct meminfo *meminfo)
+{
+	struct meminfo *child;
+	int i;
+
+	if (!(meminfo->flags & MEMINFO_FLAG_CSUM_VALID))
+		return;
+	if (meminfo->flags & MEMINFO_FLAG_ELEM_IND) {
+		child = (struct meminfo *) meminfo->addr;
+		for (i = 0; i < meminfo_ind_cnt(meminfo); i++) {
+			if (!(child[i].flags & MEMINFO_FLAG_ELEM_VALID))
+				continue;
+			meminfo_csum_update(&child[i]);
+		}
+	}
+	meminfo->csum = csum_partial_real((void *) meminfo->addr,
+					  meminfo->size, 0);
+}
+
+/*
+ * Verify checksum for meminfo element(s)
+ */
+int meminfo_csum_check(struct meminfo *meminfo, int recursive)
+{
+	struct meminfo *child;
+	u32 csum;
+	int i;
+
+	if (!(meminfo->flags & MEMINFO_FLAG_CSUM_VALID))
+		return 0;
+	csum = csum_partial_real((void *) meminfo->addr, meminfo->size, 0);
+	if (meminfo->csum != csum)
+		return -EINVAL;
+	if (!recursive)
+		return 0;
+	if (meminfo->flags & MEMINFO_FLAG_ELEM_IND) {
+		child = (struct meminfo *) meminfo->addr;
+		for (i = 0; i < meminfo_ind_cnt(meminfo); i++) {
+			if (!(child[i].flags & MEMINFO_FLAG_ELEM_VALID))
+				continue;
+			if (meminfo_csum_check(&child[i], 1))
+				return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Update root meminfo element and corresponding checksum
+ */
+static void meminfo_update_root(void)
+{
+	struct meminfo root;
+	u32 csum;
+
+	copy_from_absolute_zero(&root, &S390_lowcore.meminfo, sizeof(root));
+	meminfo_csum_update(&root);
+	copy_to_absolute_zero(&S390_lowcore.meminfo, &root, sizeof(root));
+	csum = csum_partial(&root, sizeof(root), 0);
+	copy_to_absolute_zero(&S390_lowcore.meminfo_csum, &csum, sizeof(csum));
+}
+
+/*
+ * Add memory info for given type
+ */
+void meminfo_update(enum meminfo_type type, void *buf, unsigned long size,
+		    u32 flags)
+{
+	struct meminfo *meminfo = &meminfo_array[type];
+
+	meminfo->addr = (unsigned long) buf;
+	meminfo->size = size;
+	meminfo->flags = flags;
+	meminfo_update_root();
+}
+
+/*
+ * Init meminfo and setup absolute zero pointer
+ */
+void __init meminfo_init(void)
+{
+	struct meminfo root;
+
+	root.addr = (unsigned long) &meminfo_array;
+	root.size = sizeof(meminfo_array);
+	root.flags = MEMINFO_FLAG_ELEM_VALID | MEMINFO_FLAG_ELEM_IND |
+		MEMINFO_FLAG_CSUM_VALID;
+	copy_to_absolute_zero(&S390_lowcore.meminfo, &root, sizeof(root));
+	meminfo_update_root();
+}
+
+#ifdef CONFIG_CRASH_DUMP
+/*
+ * Get meminfo from old kernel
+ */
+int meminfo_old_get(enum meminfo_type type, struct meminfo *meminfo)
+{
+	struct meminfo root, *meminfo_array_old;
+
+	if (!oldmem_base)
+		return -ENOENT;
+	memcpy_real(&root, (void *) oldmem_base + __LC_MEMINFO, sizeof(root));
+	if (type >= meminfo_ind_cnt(&root))
+		return -ENOENT;
+	meminfo_array_old = (struct meminfo *) (oldmem_base + root.addr);
+	memcpy_real(meminfo, &meminfo_array_old[type], sizeof(*meminfo));
+	if (!(meminfo->flags & MEMINFO_FLAG_ELEM_VALID))
+		return -ENOENT;
+	return 0;
+}
+#endif
--- a/arch/s390/kernel/reipl64.S
+++ b/arch/s390/kernel/reipl64.S
@@ -1,5 +1,5 @@
 /*
- *    Copyright IBM Corp 2000,2009
+ *    Copyright IBM Corp 2000,2011
  *    Author(s): Holger Smolinski <Holger.Smolinski@de.ibm.com>,
  *		 Denis Joseph Barrow,
  */
@@ -7,6 +7,66 @@
 #include <asm/asm-offsets.h>
 
 #
+# do_store_status
+#
+# Prerequisites to run this function:
+# - DAT mode is off
+# - Prefix register is set to zero
+# - Original prefix register is stored in "dump_prefix_page"
+# - Lowcore protection is off
+#
+	.globl	do_store_status
+do_store_status:
+	/* Save register one and load save area base */
+	stg	%r1,__LC_SAVE_AREA_64(%r0)
+	lghi	%r1,SAVE_AREA_BASE
+	/* General purpose registers */
+	stmg	%r0,%r15,__LC_GPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	lg	%r2,__LC_SAVE_AREA_64(%r0)
+	stg	%r2,__LC_GPREGS_SAVE_AREA-SAVE_AREA_BASE+8(%r1)
+	/* Control registers */
+	stctg	%c0,%c15,__LC_CREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	/* Access registers */
+	stam	%a0,%a15,__LC_AREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	/* Floating point registers */
+	std	%f0, 0x00 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f1, 0x08 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f2, 0x10 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f3, 0x18 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f4, 0x20 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f5, 0x28 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f6, 0x30 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f7, 0x38 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f8, 0x40 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f9, 0x48 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f10,0x50 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f11,0x58 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f12,0x60 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f13,0x68 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f14,0x70 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	std	%f15,0x78 + __LC_FPREGS_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	/* Floating point control register */
+	stfpc	__LC_FP_CREG_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	/* CPU timer */
+	stpt	__LC_CPU_TIMER_SAVE_AREA-SAVE_AREA_BASE(%r1)
+	/* Saved prefix register */
+	larl	%r2,dump_prefix_page
+	mvc	__LC_PREFIX_SAVE_AREA-SAVE_AREA_BASE(4,%r1),0(%r2)
+	/* Clock comparator - seven bytes */
+	larl	%r2,.Lclkcmp
+	stckc	0(%r2)
+	mvc	__LC_CLOCK_COMP_SAVE_AREA-SAVE_AREA_BASE + 1(7,%r1),1(%r2)
+	/* Program status word */
+	epsw	%r2,%r3
+	st	%r2,__LC_PSW_SAVE_AREA-SAVE_AREA_BASE + 0(%r1)
+	st	%r3,__LC_PSW_SAVE_AREA-SAVE_AREA_BASE + 4(%r1)
+	larl	%r2,do_store_status
+	stg	%r2,__LC_PSW_SAVE_AREA-SAVE_AREA_BASE + 8(%r1)
+	br	%r14
+.align	8
+.Lclkcmp:	.quad	0x0000000000000000
+
+#
 # do_reipl_asm
 # Parameter: r2 = schid of reipl device
 #
@@ -14,22 +74,7 @@
 		.globl	do_reipl_asm
 do_reipl_asm:	basr	%r13,0
 .Lpg0:		lpswe	.Lnewpsw-.Lpg0(%r13)
-.Lpg1:		# do store status of all registers
-
-		stg	%r1,.Lregsave-.Lpg0(%r13)
-		lghi	%r1,0x1000
-		stmg	%r0,%r15,__LC_GPREGS_SAVE_AREA-0x1000(%r1)
-		lg	%r0,.Lregsave-.Lpg0(%r13)
-		stg	%r0,__LC_GPREGS_SAVE_AREA-0x1000+8(%r1)
-		stctg	%c0,%c15,__LC_CREGS_SAVE_AREA-0x1000(%r1)
-		stam	%a0,%a15,__LC_AREGS_SAVE_AREA-0x1000(%r1)
-		lg	%r10,.Ldump_pfx-.Lpg0(%r13)
-		mvc	__LC_PREFIX_SAVE_AREA-0x1000(4,%r1),0(%r10)
-		stfpc	__LC_FP_CREG_SAVE_AREA-0x1000(%r1)
-		stckc	.Lclkcmp-.Lpg0(%r13)
-		mvc	__LC_CLOCK_COMP_SAVE_AREA-0x1000(7,%r1),.Lclkcmp-.Lpg0(%r13)
-		stpt	__LC_CPU_TIMER_SAVE_AREA-0x1000(%r1)
-		stg	%r13, __LC_PSW_SAVE_AREA-0x1000+8(%r1)
+.Lpg1:		brasl	%r14,do_store_status
 
 		lctlg	%c6,%c6,.Lall-.Lpg0(%r13)
 		lgr	%r1,%r2
@@ -66,10 +111,7 @@ do_reipl_asm:	basr	%r13,0
 		st	%r14,.Ldispsw+12-.Lpg0(%r13)
 		lpswe	.Ldispsw-.Lpg0(%r13)
 		.align	8
-.Lclkcmp:	.quad	0x0000000000000000
 .Lall:		.quad	0x00000000ff000000
-.Ldump_pfx:	.quad	dump_prefix_page
-.Lregsave:	.quad	0x0000000000000000
 		.align	16
 /*
  * These addresses have to be 31 bit otherwise
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -42,6 +42,9 @@
 #include <linux/reboot.h>
 #include <linux/topology.h>
 #include <linux/ftrace.h>
+#include <linux/kexec.h>
+#include <linux/crash_dump.h>
+#include <linux/memory.h>
 
 #include <asm/ipl.h>
 #include <asm/uaccess.h>
@@ -57,6 +60,7 @@
 #include <asm/ebcdic.h>
 #include <asm/compat.h>
 #include <asm/kvm_virtio.h>
+#include <asm/diag.h>
 
 long psw_kernel_bits	= (PSW_BASE_BITS | PSW_MASK_DAT | PSW_ASC_PRIMARY |
 			   PSW_MASK_MCHECK | PSW_DEFAULT_KEY);
@@ -346,7 +350,7 @@ setup_lowcore(void)
 	lc = __alloc_bootmem_low(LC_PAGES * PAGE_SIZE, LC_PAGES * PAGE_SIZE, 0);
 	lc->restart_psw.mask = PSW_BASE_BITS | PSW_DEFAULT_KEY;
 	lc->restart_psw.addr =
-		PSW_ADDR_AMODE | (unsigned long) restart_int_handler;
+		PSW_ADDR_AMODE | (unsigned long) psw_restart_int_handler;
 	if (user_mode != HOME_SPACE_MODE)
 		lc->restart_psw.mask |= PSW_ASC_HOME;
 	lc->external_new_psw.mask = psw_kernel_bits;
@@ -435,6 +439,9 @@ static void __init setup_resources(void)
 	for (i = 0; i < MEMORY_CHUNKS; i++) {
 		if (!memory_chunk[i].size)
 			continue;
+		if (memory_chunk[i].type == CHUNK_OLDMEM ||
+		    memory_chunk[i].type == CHUNK_CRASHK)
+			continue;
 		res = alloc_bootmem_low(sizeof(*res));
 		res->flags = IORESOURCE_BUSY | IORESOURCE_MEM;
 		switch (memory_chunk[i].type) {
@@ -479,6 +486,7 @@ static void __init setup_memory_end(void
 	unsigned long max_mem;
 	int i;
 
+
 #ifdef CONFIG_ZFCPDUMP
 	if (ipl_info.type == IPL_TYPE_FCP_DUMP) {
 		memory_end = ZFCPDUMP_HSA_SIZE;
@@ -529,6 +537,193 @@ static void __init setup_memory_end(void
 		memory_end = memory_size;
 }
 
+void *restart_stack __attribute__((__section__(".data")));
+
+/*
+ * Setup new PSW and allocate stack for PSW restart interrupt
+ */
+static void __init setup_restart_psw(void)
+{
+	psw_t psw;
+
+	restart_stack = __alloc_bootmem(ASYNC_SIZE, ASYNC_SIZE, 0);
+	restart_stack += ASYNC_SIZE;
+
+	/*
+	 * Setup restart PSW for absolute zero lowcore. This is necessary
+	 * if PSW restart is done on an offline CPU that has lowcore zero
+	 */
+	psw.mask = PSW_BASE_BITS | PSW_DEFAULT_KEY;
+	psw.addr = PSW_ADDR_AMODE | (unsigned long) psw_restart_int_handler;
+	copy_to_absolute_zero(&S390_lowcore.restart_psw, &psw, sizeof(psw));
+}
+
+#ifdef CONFIG_CRASH_DUMP
+
+/*
+ * Find suitable location for crashkernel memory
+ */
+static unsigned long __init find_crash_base(unsigned long crash_size)
+{
+	unsigned long crash_base;
+	struct mem_chunk *chunk;
+	int i;
+
+	if (is_kdump_kernel() && (crash_size == oldmem_size))
+		return oldmem_base;
+
+	for (i = MEMORY_CHUNKS - 1; i >= 0; i--) {
+		chunk = &memory_chunk[i];
+		if (chunk->size == 0)
+			continue;
+		if (chunk->type != CHUNK_READ_WRITE)
+			continue;
+		if (chunk->size < crash_size)
+			continue;
+		crash_base = max(chunk->addr, crash_size);
+		crash_base = max(crash_base, ZFCPDUMP_HSA_SIZE_MAX);
+		crash_base = max(crash_base, (unsigned long) INITRD_START +
+				 INITRD_SIZE);
+		crash_base = PAGE_ALIGN(crash_base);
+		if (crash_base >= chunk->addr + chunk->size)
+			continue;
+		if (chunk->addr + chunk->size - crash_base < crash_size)
+			continue;
+		crash_base = chunk->addr + chunk->size - crash_size;
+		return crash_base;
+	}
+	return 0;
+}
+
+/*
+ * Check if crash_base and crash_size are valid
+ */
+static int __init verify_crash_base(unsigned long crash_base,
+				    unsigned long crash_size)
+{
+	struct mem_chunk *chunk;
+	int i;
+
+	/*
+	 * Because we do the swap to zero, we must have at least 'crash_size'
+	 * bytes free space before crash_base
+	 */
+	if (crash_size > crash_base)
+		return -EINVAL;
+
+	/* First memory chunk must be at least crash_size */
+	if (memory_chunk[0].size < crash_size)
+		return -EINVAL;
+
+	/* Check if we fit into the respective memory chunk */
+	for (i = 0; i < MEMORY_CHUNKS; i++) {
+		chunk = &memory_chunk[i];
+		if (chunk->size == 0)
+			continue;
+		if (crash_base < chunk->addr)
+			continue;
+		if (crash_base >= chunk->addr + chunk->size)
+			continue;
+		/* we have found the memory chunk */
+		if (crash_base + crash_size > chunk->addr + chunk->size)
+			return -EINVAL;
+		return 0;
+	}
+	return -EINVAL;
+}
+
+/*
+ * Reserve kdump memory by creating a memory hole in the mem_chunk array
+ */
+static void __init reserve_kdump_bootmem(unsigned long addr, unsigned long size,
+					 int type)
+{
+	create_mem_hole(memory_chunk, addr, size, type);
+}
+
+/*
+ * When kdump is enabled, we have to ensure that no memory from
+ * the area [0 - crashkernel memory size] is set offline
+ */
+static int kdump_mem_notifier(struct notifier_block *nb,
+			      unsigned long action, void *data)
+{
+	struct memory_notify *arg = data;
+
+	if (arg->start_pfn >= PFN_DOWN(crashk_res.end - crashk_res.start + 1))
+		return NOTIFY_OK;
+	return NOTIFY_BAD;
+}
+
+static struct notifier_block kdump_mem_nb = {
+	.notifier_call = kdump_mem_notifier,
+};
+#endif
+
+/*
+ * Make sure that oldmem, where the dump is stored, is protected
+ */
+static void reserve_oldmem(void)
+{
+#ifdef CONFIG_CRASH_DUMP
+	if (!is_kdump_kernel())
+		return;
+
+	reserve_kdump_bootmem(oldmem_base, oldmem_size, CHUNK_OLDMEM);
+	reserve_kdump_bootmem(oldmem_size, memory_end - oldmem_size,
+			      CHUNK_OLDMEM);
+	if (oldmem_base + oldmem_size == real_memory_size)
+		saved_max_pfn = PFN_DOWN(oldmem_base) - 1;
+	else
+		saved_max_pfn = PFN_DOWN(real_memory_size) - 1;
+#endif
+}
+
+/*
+ * Reserve memory for kdump kernel to be loaded with kexec
+ */
+static void __init reserve_crashkernel(void)
+{
+#ifdef CONFIG_CRASH_DUMP
+	unsigned long long crash_base, crash_size;
+	int rc;
+
+	rc = parse_crashkernel(boot_command_line, memory_end, &crash_size,
+			       &crash_base);
+	if (rc || crash_size == 0)
+		return;
+	if (register_memory_notifier(&kdump_mem_nb))
+		return;
+	if (!crash_base)
+		crash_base = find_crash_base(crash_size);
+	if (!crash_base) {
+		pr_info("crashkernel reservation failed: %s\n",
+			"No suitable area found");
+		unregister_memory_notifier(&kdump_mem_nb);
+		return;
+	}
+	if (verify_crash_base(crash_base, crash_size)) {
+		pr_info("crashkernel reservation failed: %s\n",
+			"Invalid memory range specified");
+		unregister_memory_notifier(&kdump_mem_nb);
+		return;
+	}
+	if (!is_kdump_kernel() && MACHINE_IS_VM)
+		diag10_range(PFN_DOWN(crash_base), PFN_DOWN(crash_size));
+	crashk_res.start = crash_base;
+	crashk_res.end = crash_base + crash_size - 1;
+	insert_resource(&iomem_resource, &crashk_res);
+	meminfo_update(MEMINFO_TYPE_KDUMP_MEM, (void *) crash_base,
+		       crash_size, MEMINFO_FLAG_ELEM_VALID);
+	reserve_kdump_bootmem(crashk_res.start,
+			      crashk_res.end - crashk_res.start + 1,
+			      CHUNK_CRASHK);
+	pr_info("Reserving %lluMB of memory at %lluMB "
+		"for crashkernel (System RAM: %luMB)\n",
+		crash_size >> 20, crash_base >> 20, memory_end >> 20);
+#endif
+}
+
 static void __init
 setup_memory(void)
 {
@@ -559,6 +754,14 @@ setup_memory(void)
 		if (PFN_PHYS(start_pfn) + bmap_size > INITRD_START) {
 			start = PFN_PHYS(start_pfn) + bmap_size + PAGE_SIZE;
 
+#ifdef CONFIG_CRASH_DUMP
+			if (is_kdump_kernel()) {
+				/* Move initrd behind kdump oldmem */
+				if (start + INITRD_SIZE > oldmem_base &&
+				    start < oldmem_base + oldmem_size)
+					start = oldmem_base + oldmem_size;
+			}
+#endif
 			if (start + INITRD_SIZE > memory_end) {
 				pr_err("initrd extends beyond end of "
 				       "memory (0x%08lx > 0x%08lx) "
@@ -787,11 +990,16 @@ setup_arch(char **cmdline_p)
 
 	parse_early_param();
 
+	meminfo_update(MEMINFO_TYPE_VMCOREINFO, &vmcoreinfo_note,
+		       sizeof(vmcoreinfo_note), MEMINFO_FLAG_ELEM_VALID);
 	setup_ipl();
 	setup_memory_end();
 	setup_addressing_mode();
+	reserve_oldmem();
+	reserve_crashkernel();
 	setup_memory();
 	setup_resources();
+	setup_restart_psw();
 	setup_lowcore();
 
         cpu_init();
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -38,6 +38,7 @@
 #include <linux/timex.h>
 #include <linux/bootmem.h>
 #include <linux/slab.h>
+#include <linux/crash_dump.h>
 #include <asm/asm-offsets.h>
 #include <asm/ipl.h>
 #include <asm/setup.h>
@@ -281,11 +282,11 @@ void smp_ctl_clear_bit(int cr, int bit)
 }
 EXPORT_SYMBOL(smp_ctl_clear_bit);
 
-#ifdef CONFIG_ZFCPDUMP
+#if defined(CONFIG_ZFCPDUMP) || defined(CONFIG_CRASH_DUMP)
 
 static void __init smp_get_save_area(unsigned int cpu, unsigned int phy_cpu)
 {
-	if (ipl_info.type != IPL_TYPE_FCP_DUMP)
+	if (ipl_info.type != IPL_TYPE_FCP_DUMP && !is_kdump_kernel())
 		return;
 	if (cpu >= NR_CPUS) {
 		pr_warning("CPU %i exceeds the maximum %i and is excluded from "
@@ -403,6 +404,19 @@ static void __init smp_detect_cpus(void)
 	info = kmalloc(sizeof(*info), GFP_KERNEL);
 	if (!info)
 		panic("smp_detect_cpus failed to allocate memory\n");
+
+#ifdef CONFIG_CRASH_DUMP
+	if (is_kdump_kernel()) {
+		struct save_area *save_area;
+
+		save_area = kmalloc(sizeof(*save_area), GFP_KERNEL);
+		if (!save_area)
+			panic("could not allocate memory for save area\n");
+		crash_read_from_oldmem(save_area, sizeof(*save_area),
+				       SAVE_AREA_BASE, 0);
+		zfcpdump_save_areas[0] = save_area;
+	}
+#endif
 	/* Use sigp detection algorithm if sclp doesn't work. */
 	if (sclp_get_cpu_info(info)) {
 		smp_use_sigp_detection = 1;
@@ -470,6 +484,11 @@ int __cpuinit start_secondary(void *cpuv
 	ipi_call_unlock();
 	/* Switch on interrupts */
 	local_irq_enable();
+	__ctl_clear_bit(0, 28); /* Disable lowcore protection */
+	S390_lowcore.restart_psw.mask = PSW_BASE_BITS | PSW_DEFAULT_KEY;
+	S390_lowcore.restart_psw.addr =
+		PSW_ADDR_AMODE | (unsigned long) psw_restart_int_handler;
+	__ctl_set_bit(0, 28); /* Enable lowcore protection */
 	/* cpu_idle will call schedule for us */
 	cpu_idle();
 	return 0;
@@ -507,6 +526,9 @@ static int __cpuinit smp_alloc_lowcore(i
 	memset((char *)lowcore + 512, 0, sizeof(*lowcore) - 512);
 	lowcore->async_stack = async_stack + ASYNC_SIZE;
 	lowcore->panic_stack = panic_stack + PAGE_SIZE;
+	lowcore->restart_psw.mask = PSW_BASE_BITS | PSW_DEFAULT_KEY;
+	lowcore->restart_psw.addr =
+		PSW_ADDR_AMODE | (unsigned long) restart_int_handler;
 
 #ifndef CONFIG_64BIT
 	if (MACHINE_HAS_IEEE) {
--- a/arch/s390/mm/maccess.c
+++ b/arch/s390/mm/maccess.c
@@ -11,6 +11,7 @@
 #include <linux/kernel.h>
 #include <linux/types.h>
 #include <linux/errno.h>
+#include <linux/gfp.h>
 #include <asm/system.h>
 
 /*
@@ -60,6 +61,9 @@ long probe_kernel_write(void *dst, const
 	return copied < 0 ? -EFAULT : 0;
 }
 
+/*
+ * Copy memory in real mode (kernel to kernel)
+ */
 int memcpy_real(void *dest, void *src, size_t count)
 {
 	register unsigned long _dest asm("2") = (unsigned long) dest;
@@ -85,3 +89,82 @@ int memcpy_real(void *dest, void *src, s
 	arch_local_irq_restore(flags);
 	return rc;
 }
+
+/*
+ * Copy memory from kernel (real) to user (virtual)
+ */
+int copy_to_user_real(void __user *dest, void *src, size_t count)
+{
+	int offs = 0, size, rc;
+	char *buf;
+
+	buf = (char *) __get_free_page(GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+	rc = -EFAULT;
+	while (offs < count) {
+		size = min(PAGE_SIZE, count - offs);
+		if (memcpy_real(buf, src + offs, size))
+			goto out;
+		if (copy_to_user(dest + offs, buf, size))
+			goto out;
+		offs += size;
+	}
+	rc = 0;
+out:
+	free_page((unsigned long) buf);
+	return rc;
+}
+
+/*
+ * Copy memory from user (virtual) to kernel (real)
+ */
+int copy_from_user_real(void *dest, void __user *src, size_t count)
+{
+	int offs = 0, size, rc;
+	char *buf;
+
+	buf = (char *) __get_free_page(GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+	rc = -EFAULT;
+	while (offs < count) {
+		size = min(PAGE_SIZE, count - offs);
+		if (copy_from_user(buf, src + offs, size))
+			goto out;
+		if (memcpy_real(dest + offs, buf, size))
+			goto out;
+		offs += size;
+	}
+	rc = 0;
+out:
+	free_page((unsigned long) buf);
+	return rc;
+}
+
+/*
+ * Copy memory to absolute zero
+ */
+void copy_to_absolute_zero(void *dest, void *src, size_t count)
+{
+	unsigned long cr0;
+
+	BUG_ON((unsigned long) dest + count >= sizeof(struct _lowcore));
+	preempt_disable();
+	__ctl_store(cr0, 0, 0);
+	__ctl_clear_bit(0, 28); /* disable lowcore protection */
+	memcpy_real(dest + store_prefix(), src, count);
+	__ctl_load(cr0, 0, 0);
+	preempt_enable();
+}
+
+/*
+ * Copy memory from absolute zero
+ */
+void copy_from_absolute_zero(void *dest, void *src, size_t count)
+{
+	BUG_ON((unsigned long) src + count >= sizeof(struct _lowcore));
+	preempt_disable();
+	memcpy_real(dest, src + store_prefix(), count);
+	preempt_enable();
+}
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -335,6 +335,9 @@ void __init vmem_map_init(void)
 	ro_start = ((unsigned long)&_stext) & PAGE_MASK;
 	ro_end = PFN_ALIGN((unsigned long)&_eshared);
 	for (i = 0; i < MEMORY_CHUNKS && memory_chunk[i].size > 0; i++) {
+		if (memory_chunk[i].type == CHUNK_CRASHK ||
+		    memory_chunk[i].type == CHUNK_OLDMEM)
+			continue;
 		start = memory_chunk[i].addr;
 		end = memory_chunk[i].addr + memory_chunk[i].size;
 		if (start >= ro_end || end <= ro_start)
--- a/drivers/s390/char/zcore.c
+++ b/drivers/s390/char/zcore.c
@@ -142,22 +142,6 @@ static int memcpy_hsa_kernel(void *dest,
 	return memcpy_hsa(dest, src, count, TO_KERNEL);
 }
 
-static int memcpy_real_user(void __user *dest, unsigned long src, size_t count)
-{
-	static char buf[4096];
-	int offs = 0, size;
-
-	while (offs < count) {
-		size = min(sizeof(buf), count - offs);
-		if (memcpy_real(buf, (void *) src + offs, size))
-			return -EFAULT;
-		if (copy_to_user(dest + offs, buf, size))
-			return -EFAULT;
-		offs += size;
-	}
-	return 0;
-}
-
 static int __init init_cpu_info(enum arch_id arch)
 {
 	struct save_area *sa;
@@ -346,8 +330,8 @@ static ssize_t zcore_read(struct file *f
 
 	/* Copy from real mem */
 	size = count - mem_offs - hdr_count;
-	rc = memcpy_real_user(buf + hdr_count + mem_offs, mem_start + mem_offs,
-			      size);
+	rc = copy_to_user_real(buf + hdr_count + mem_offs,
+			       (void *) mem_start + mem_offs, size);
 	if (rc)
 		goto fail;
 

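The placement policy that find_crash_base() and verify_crash_base() in the patch above appear to aim for can be sketched in user space as follows. This is only an illustration under stated assumptions: the crashkernel area is placed at the top of the highest chunk that can hold it, and its base must be at least crash_size so that the area can later be swapped with [0 - crash_size]. The names struct chunk, pick_crash_base and min_base are hypothetical and do not appear in the patch; page alignment and the chunk type check are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct mem_chunk */
struct chunk {
	unsigned long addr;
	unsigned long size;
};

/*
 * Return a base address for a crashkernel area of crash_size bytes,
 * or 0 if no chunk is suitable. Chunks are scanned from the highest
 * index down and the area is placed at the end of the first fit.
 * min_base models further lower bounds (for example an initrd end).
 */
static unsigned long pick_crash_base(const struct chunk *chunks, size_t n,
				     unsigned long crash_size,
				     unsigned long min_base)
{
	size_t i;

	assert(crash_size > 0);
	for (i = n; i-- > 0;) {
		unsigned long base;

		if (chunks[i].size < crash_size)
			continue;
		base = chunks[i].addr + chunks[i].size - crash_size;
		/* The swap to zero needs crash_size bytes below base */
		if (base < crash_size)
			continue;
		if (base < min_base)
			continue;
		return base;
	}
	return 0;
}
```

With a single 1 GB chunk starting at zero and a 128 MB request, the sketch returns 0x38000000. A 128 MB chunk cannot host a 128 MB area because nothing would be left below it for the swap.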

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 112+ messages in thread

* [patch 9/9] kexec-tools: Add s390 kdump support
  2011-07-04 17:09 ` Michael Holzheu
@ 2011-07-04 17:09   ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-04 17:09 UTC (permalink / raw)
  To: ebiederm, vgoyal, hbabu, mahesh
  Cc: oomichi, horms, schwidefsky, heiko.carstens, kexec, linux-kernel,
	linux-s390

[-- Attachment #1: kexec-tools-s390-kdump.patch --]
[-- Type: text/plain, Size: 3964 bytes --]

From: Michael Holzheu <holzheu@linux.vnet.ibm.com>

This patch adds kdump support for s390 to the kexec tool and enables the
"--load-panic" option. When loading the kdump kernel and ramdisk we add the
address of the crashkernel memory to the normal load address.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 kexec/arch/s390/kexec-image.c |   16 +++++++++++++---
 kexec/arch/s390/kexec-s390.c  |   24 +++++++++++++++++++++---
 2 files changed, 34 insertions(+), 6 deletions(-)

--- a/kexec/arch/s390/kexec-image.c
+++ b/kexec/arch/s390/kexec-image.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 #include <getopt.h>
 #include "../../kexec.h"
+#include "../../kexec-syscall.h"
 #include "kexec-s390.h"
 #include <arch/options.h>
 
@@ -32,6 +33,7 @@ image_s390_load(int argc, char **argv, c
 	int command_line_len;
 	off_t ramdisk_len;
 	unsigned int ramdisk_origin;
+	uint64_t crash_base, crash_end;
 	int opt;
 
 	static const struct option options[] =
@@ -47,6 +49,7 @@ image_s390_load(int argc, char **argv, c
 	command_line = NULL;
 	ramdisk_len = 0;
 	ramdisk_origin = 0;
+	crash_base = 0;
 
 	while ((opt = getopt_long(argc,argv,short_options,options,0)) != -1) {
 		switch(opt) {
@@ -71,10 +74,16 @@ image_s390_load(int argc, char **argv, c
 			return -1;
 		}
 	}
+	if (info->kexec_flags & KEXEC_ON_CRASH) {
+		if (parse_iomem_single("Crash kernel\n", &crash_base,
+				       &crash_end))
+			return -1;
+	}
 
 	/* Add kernel segment */
 	add_segment(info, kernel_buf + IMAGE_READ_OFFSET,
-		    kernel_size - IMAGE_READ_OFFSET, IMAGE_READ_OFFSET,
+		    kernel_size - IMAGE_READ_OFFSET,
+		    crash_base + IMAGE_READ_OFFSET,
 		    kernel_size - IMAGE_READ_OFFSET);
 
 	/* We do want to change the kernel image */
@@ -88,7 +97,8 @@ image_s390_load(int argc, char **argv, c
 			return -1;
 		}
 		ramdisk_origin = RAMDISK_ORIGIN_ADDR;
-		add_segment(info, rd_buffer, ramdisk_len, RAMDISK_ORIGIN_ADDR, ramdisk_len);
+		add_segment(info, rd_buffer, ramdisk_len,
+			    crash_base + RAMDISK_ORIGIN_ADDR, ramdisk_len);
 	}
 	
 	/* Register the ramdisk in the kernel. */
@@ -111,7 +121,7 @@ image_s390_load(int argc, char **argv, c
 		memcpy(krnl_buffer + COMMAND_LINE_OFFS, command_line, strlen(command_line));
 	}
 
-	info->entry = (void *) IMAGE_READ_OFFSET;
+	info->entry = (void *) crash_base + IMAGE_READ_OFFSET;
 
 	return 0;
 }
--- a/kexec/arch/s390/kexec-s390.c
+++ b/kexec/arch/s390/kexec-s390.c
@@ -37,8 +37,9 @@ static struct memory_range memory_range[
  */
 
 int get_memory_ranges(struct memory_range **range, int *ranges,
-		      unsigned long UNUSED(flags))
+		      unsigned long flags)
 {
+	char crash_kernel[] = "Crash kernel\n";
 	char sys_ram[] = "System RAM\n";
 	const char *iomem = proc_iomem();
 	FILE *fp;
@@ -62,7 +63,9 @@ int get_memory_ranges(struct memory_rang
 
 		sscanf(line,"%Lx-%Lx : %n", &start, &end, &cons);
 		str = line+cons;
-		if(memcmp(str,sys_ram,strlen(sys_ram)) == 0) {
+		if((memcmp(str,sys_ram,strlen(sys_ram)) == 0) ||
+		   ((flags & KEXEC_ON_CRASH) &&
+		   memcmp(str,crash_kernel,strlen(crash_kernel)) == 0)) {
 			memory_range[current_range].start = start;
 			memory_range[current_range].end = end;
 			memory_range[current_range].type = RANGE_RAM;
@@ -76,6 +79,18 @@ int get_memory_ranges(struct memory_rang
 	*range = memory_range;
 	*ranges = current_range;
 
+	if ((flags & KEXEC_ON_CRASH) && !(flags & KEXEC_PRESERVE_CONTEXT)) {
+		uint64_t start, end;
+
+		if (parse_iomem_single("Crash kernel\n", &start, &end)) {
+			fprintf(stderr, "parse_iomem_single failed.\n");
+			return -1;
+		}
+		if (start > mem_min)
+			mem_min = start;
+		if (end < mem_max)
+			mem_max = end;
+	}
 	return 0;
 }
 
@@ -112,5 +127,8 @@ void arch_update_purgatory(struct kexec_
 
 int is_crashkernel_mem_reserved(void)
 {
-	return 0; /* kdump is not supported on this platform (yet) */
+	uint64_t start, end;
+
+	return parse_iomem_single("Crash kernel\n", &start, &end) == 0 ?
+		(start != end) : 0;
 }
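
The load address arithmetic described in the commit message above (the crashkernel base is added to the normal load address) can be sketched as a small stand-alone program fragment. This is not the kexec-tools implementation: parse_crash_line and kdump_load_addr are hypothetical helpers, and the only assumption taken from the patch is that /proc/iomem lines have the form "start-end : name" and that the reserved area is named "Crash kernel".

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/iomem style line. Returns 0 and fills start/end if
 * the line describes the "Crash kernel" resource, -1 otherwise.
 */
static int parse_crash_line(const char *line, uint64_t *start, uint64_t *end)
{
	static const char name[] = "Crash kernel";
	unsigned long long s, e;
	int cons = 0;

	/* %n records how many characters were consumed before the name */
	if (sscanf(line, "%llx-%llx : %n", &s, &e, &cons) < 2 || cons == 0)
		return -1;
	if (strncmp(line + cons, name, strlen(name)) != 0)
		return -1;
	*start = s;
	*end = e;
	return 0;
}

/* Shift a normal (non-kdump) load address into the crashkernel area */
static uint64_t kdump_load_addr(uint64_t crash_base, uint64_t normal_addr)
{
	assert(normal_addr <= UINT64_MAX - crash_base);
	return crash_base + normal_addr;
}
```

For a normal kexec load crash_base stays zero, so the same arithmetic yields the unmodified address; this matches how the patch initializes crash_base to 0 and only fills it in when KEXEC_ON_CRASH is set.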


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-04 17:09 ` Michael Holzheu
@ 2011-07-05 20:26   ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-05 20:26 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: ebiederm, hbabu, mahesh, oomichi, horms, schwidefsky,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Mon, Jul 04, 2011 at 07:09:22PM +0200, Michael Holzheu wrote:
> This patch series adds kdump support for the s390 architecture (64 bit). There
> are a few common code changes necessary because the s390 implementation is
> different to other architectures in some points. Especially these common code
> patches (1-7) should be reviewed. Patch 8 "s390: kdump backend code" contains
> the s390 specific part. Patch 9 includes the necessary changes for the kexec
> tool.
> 
> In the following I describe the main differences of the s390 implementation:
> 
> The s390 kernel is not relocatable therefore the crashkernel memory is swapped
> with the area [0 - crashkernel memory] before the kdump kernel is started.
> Architectures other than s390 run the kdump kernel at a memory location that is
> disjunct to the standard location for the kernel image and to all memory that
> might be in use for I/O by the production system. The main reason for this
> seems to be that these architectures do not have a means to clear all ongoing
> I/O. If active memory of the production system is reused by the kdump kernel
> they run into memory corruption issues. On s390 with diagnose call 308 or boot
> (IPL) there is the possibility to stop all ongoing I/O. Therefore we can safely
> run the kdump kernel at the old location.
> 
> On s390 we do not create page tables for the crashkernel memory and use a
> memcpy_real() function to load the kdump kernel and ramdisk in kexec_load()
> system call.
> 
> On s390 we have external kdump triggers. For example stand-alone dump tools.
> The address range information of crashkernel memory is stored at a well defined
> storage location that can be used by the external dump triggers to find the
> kdump entry point. To export the address range for the crashkernel memory we
> introduce a new mechanism that we call meminfo. This allows to define checksum
> secured information in memory that is accessible via an s390 ABI defined
> storage address. The following information is currently stored via meminfo:
> * Crashkernel memory range
> * kexec segments for kdump
> * Pointer to vmcoreinfo note

I don't understand what stand-alone dump tools are, or why the existing
mechanism of preparing ELF headers to describe all the above info
and just passing the address of the header on the kernel command line
(crashkernel=) will not work for s390. Introducing an entirely new
infrastructure for communicating the same information does not
sound too exciting.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-05 20:26   ` Vivek Goyal
@ 2011-07-06  9:24     ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-06  9:24 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: ebiederm, hbabu, mahesh, oomichi, horms, schwidefsky,
	heiko.carstens, kexec, linux-kernel, linux-s390

Hello Vivek,

On Tue, 2011-07-05 at 16:26 -0400, Vivek Goyal wrote:
> On Mon, Jul 04, 2011 at 07:09:22PM +0200, Michael Holzheu wrote:

[snip]

> I don't understand what is stand-alone dump tools and 

S390 stand-alone dump tools are independent mini operating systems that
are installed on disks or tapes. When a dump should be created, these
stand-alone dump tools are booted. All that they do is to write the dump
(current memory plus the CPU registers) to the disk/tape device.

The advantage compared to kdump is that since they are freshly loaded
into memory they can't be overwritten in memory. Another advantage is
that since it is different code, it is much less likely that the dump
tool will run into the same problem as the previously crashed kernel.
Also the boot process ensures that the hardware is in an initialized
state. And last but not least, with the stand-alone dump tools you can
dump early kernel problems, which is not possible using kdump because
you can't dump before the kdump kernel has been loaded with kexec.

Those were more or less the arguments why we did not support kdump in
the past.

In order to increase dump reliability with kdump, we have now implemented a
two-stage approach. The stand-alone dump tools first use the checksums in
meminfo to check whether kdump is valid. If kdump is loaded and healthy, it is
started. Otherwise the stand-alone dump tools create a full-blown
stand-alone dump.
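
The two-stage decision described above can be sketched in C as follows. This
is only an illustration: struct kdump_desc, csum32() and choose_dump_action()
are hypothetical names, and the additive checksum is a stand-in for whatever
algorithm the meminfo code in the patch series really uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor; the real meminfo layout is defined by the patches. */
struct kdump_desc {
	const uint8_t *image;		/* loaded kdump kernel and ramdisk */
	size_t image_len;		/* length of the loaded image in bytes */
	uint32_t image_csum;		/* checksum recorded at load time */
};

enum dump_action {
	START_KDUMP,			/* stage 1: kdump is loaded and healthy */
	FULL_STANDALONE_DUMP,		/* stage 2: fall back to a full dump */
};

/* Simple additive checksum, a stand-in for the real algorithm. */
static uint32_t csum32(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	while (len--)
		sum += *buf++;
	return sum;
}

/* Start kdump only if an image is present and its checksum still matches. */
enum dump_action choose_dump_action(const struct kdump_desc *d)
{
	if (!d || !d->image || d->image_len == 0)
		return FULL_STANDALONE_DUMP;
	if (csum32(d->image, d->image_len) != d->image_csum)
		return FULL_STANDALONE_DUMP;
	return START_KDUMP;
}
```

The point of the sketch is that the check runs in freshly loaded code, so a
corrupted kdump image leads to the fallback path rather than to no dump at all.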

With this approach we still keep our s390 dump reliability and gain the
great kdump features, e.g. distributor installer support, dump filtering
with makedumpfile, etc.

> why the existing
> mechanism of preparing ELF headers to describe all the above info
> and just passing the address of header on kernel command line
> (crashkernel=) will not work for s390. Introducing an entirely new
> infrastructure for communicating the same information does not
> sound too exciting.

We need the meminfo interface anyway for the two stage approach. The
stand-alone dump tools have to find and verify the kdump kernel in order
to start it. Therefore the interface is there and can be used. Also
creating the ELF header in the 2nd kernel is more flexible and easier
IMHO:
* You do not have to care about memory or CPU hotplug.
* You do not have to preallocate CPU crash notes etc.
* It works independently from the tool/mechanism that loads the kdump
kernel into memory. E.g. we have the idea to load the kdump kernel at
boot time into the crashkernel memory (not via the kexec_load system
call). That would solve the main kdump problems: The kdump kernel can't
be overwritten by I/O and also early kernel problems could then be
dumped using kdump.
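
As a rough illustration of what "creating the ELF header in the 2nd kernel"
involves, the following C sketch fills an ELF64 core header with one PT_LOAD
program header per memory range. It is not the code from the patch series:
struct mem_range and build_core_hdr() are hypothetical names, the identity
mapping of virtual to physical addresses is an assumption, and the PT_NOTE
segment for CPU notes and vmcoreinfo is left out. On other architectures
kexec-tools prepares a comparable header in the first kernel and passes its
address with the elfcorehdr= parameter.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one physical memory range. */
struct mem_range {
	uint64_t start;
	uint64_t size;
};

/*
 * Fill an ELF64 core header plus one PT_LOAD program header per range.
 * The caller is assumed to place ph[] directly behind *eh in memory.
 * p_offset is left at 0 for the consumer of the header to fill in.
 */
void build_core_hdr(Elf64_Ehdr *eh, Elf64_Phdr *ph,
		    const struct mem_range *r, uint16_t n)
{
	uint16_t i;

	memset(eh, 0, sizeof(*eh));
	memcpy(eh->e_ident, ELFMAG, SELFMAG);
	eh->e_ident[EI_CLASS] = ELFCLASS64;
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	eh->e_ident[EI_DATA] = ELFDATA2MSB;	/* the case on s390 */
#else
	eh->e_ident[EI_DATA] = ELFDATA2LSB;
#endif
	eh->e_ident[EI_VERSION] = EV_CURRENT;
	eh->e_type = ET_CORE;
	eh->e_machine = EM_S390;
	eh->e_version = EV_CURRENT;
	eh->e_phoff = sizeof(Elf64_Ehdr);
	eh->e_ehsize = sizeof(Elf64_Ehdr);
	eh->e_phentsize = sizeof(Elf64_Phdr);
	eh->e_phnum = n;

	for (i = 0; i < n; i++) {
		memset(&ph[i], 0, sizeof(ph[i]));
		ph[i].p_type = PT_LOAD;
		ph[i].p_flags = PF_R | PF_W | PF_X;
		ph[i].p_paddr = r[i].start;
		ph[i].p_vaddr = r[i].start;	/* assumption: identity mapping */
		ph[i].p_filesz = r[i].size;
		ph[i].p_memsz = r[i].size;
	}
}
```

Because the ranges are read at dump time, a header built this way reflects the
memory layout after any hotplug events without reloading the kdump kernel.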

What do you think?

Michael


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-06  9:24     ` Michael Holzheu
@ 2011-07-07 19:33       ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-07 19:33 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: ebiederm, hbabu, mahesh, oomichi, horms, schwidefsky,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Wed, Jul 06, 2011 at 11:24:47AM +0200, Michael Holzheu wrote:
> Hello Vivek,
> 
> On Tue, 2011-07-05 at 16:26 -0400, Vivek Goyal wrote:
> > On Mon, Jul 04, 2011 at 07:09:22PM +0200, Michael Holzheu wrote:
> 
> [snip]
> 
> > I don't understand what is stand-alone dump tools and 
> 
> S390 stand-alone dump tools are independent mini operating systems that
> are installed on disks or tapes. When a dump should be created, these
> stand-alone dump tools are booted. All that they do is to write the dump
> (current memory plus the CPU registers) to the disk/tape device.
> 
> The advantage compared to kdump is that since they are freshly loaded
> into memory they can't be overwritten in memory.

> Another advantage is
> that since it is different code, it is much less likely that the dump
> tool will run into the same problem than the previously crashed kernel.

I think in practice this is not really a problem. If your kernel
is not stable enough to even boot and copy a file, then most likely
it has not even been deployed. The very fact that a kernel has been
up and running verifies that it is a stable kernel for that machine
and is capable of capturing the dump.

> Also the boot process ensures that the hardware is in a initialized
> state.

Who makes sure that the hardware is in an initialized state? The kdump kernel,
the stand-alone kernel, or the BIOS?

> And last but not least, with the stand-alone dump tools you can
> dump early kernel problems which is not possible using kdump, because
> you can't dump before the kdump kernel has been loaded with kexec.
> 

That is one limitation but again if your kernel can't even boot,
it is not ready to ship and it is more of a development issue and
there are other ways to debug problems. So I would not worry too
much about it.

On a side note, a few months back there were folks who were trying
to enhance bootloaders to be able to prepare a basic environment so
that a kdump kernel can boot even if the first kernel fails early
during boot.

> That were more or less the arguments, why we did not support kdump in
> the past.
> 
> In order to increase dump reliability with kdump, we now implemented a
> two stage approach. The stand-alone dump tools first check via meminfo,
> if kdump is valid using checksums. If kdump is loaded and healthy it is
> started. Otherwise the stand-alone dump tools create a full-blown
> stand-alone dump.

The kexec-tools purgatory code also verifies the checksum of the loaded kernel
and other information, and the next kernel boots only if nothing
has been corrupted in the first kernel. So these additional meminfo structures
and the need for checksums sound unnecessary. I think what you do need is
some way to invoke a second hook (the s390-specific stand-alone kernel)
in case the primary kernel is corrupted.

> 
> With this approach we still keep our s390 dump reliability and gain the
> great kdump features, e.g. distributor installer support, dump filtering
> with makedumpfile, etc.
> 
> > why the existing
> > mechanism of preparing ELF headers to describe all the above info
> > and just passing the address of header on kernel command line
> > (crashkernel=) will not work for s390. Introducing an entirely new
> > infrastructure for communicating the same information does not
> > sound too exciting.
> 
> We need the meminfo interface anyway for the two stage approach. The
> stand-alone dump tools have to find and verify the kdump kernel in order
> to start it.

kexec-tools does this verification already. We verify the checksum of
all the loaded information in the reserved area. So why introduce this
meminfo interface?

> Therefore the interface is there and can be used. Also
> creating the ELF header in the 2nd kernel is more flexible and easier
> IMHO:
> * You do not have to care about memory or CPU hotplug.

Reloading the kernel upon memory or CPU hotplug should be trivial. This
does not justify moving away from the standard ELF interface and creating
a new one.

> * You do not have to preallocate CPU crash notes etc.

It's a small per-CPU area. It looks like you will create meminfo
areas otherwise.

> * It works independently from the tool/mechanism that loads the kdump
> kernel into memory. E.g. we have the idea to load the kdump kernel at
> boot time into the crashkernel memory (not via the kexec_load system
> call). That would solve the main kdump problems: The kdump kernel can't
> be overwritten by I/O and also early kernel problems could then be
> dumped using kdump.

Can you give more details on how exactly it works? I know very little about
the s390 dump mechanism.

When do you load the kdump kernel and who does it?

Who gets control first after a crash?

To me it looked like you regularly load the kdump kernel and if that
is corrupted then somehow you boot the stand-alone kernel. So corruption
of the kdump kernel should not be an issue for you.

Do you load the kdump kernel from some tape/storage after a system crash? Where
does the bootloader live and how do you make sure it is not corrupted and
the associated device is in good condition?

To me we should not create an arch-specific way of passing information
between kernels. The stand-alone kernel should be able to parse the
ELF headers, which contain all the relevant info. They have already
been checksum verified.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-07 19:33       ` Vivek Goyal
@ 2011-07-08  9:01         ` Martin Schwidefsky
  -1 siblings, 0 replies; 112+ messages in thread
From: Martin Schwidefsky @ 2011-07-08  9:01 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Michael Holzheu, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Thu, 7 Jul 2011 15:33:21 -0400
Vivek Goyal <vgoyal@redhat.com> wrote:

> On Wed, Jul 06, 2011 at 11:24:47AM +0200, Michael Holzheu wrote:
> > Hello Vivek,
> > 
> > On Tue, 2011-07-05 at 16:26 -0400, Vivek Goyal wrote:
> > > On Mon, Jul 04, 2011 at 07:09:22PM +0200, Michael Holzheu wrote:
> > 
> > [snip]
> > 
> > > I don't understand what is stand-alone dump tools and 
> > 
> > S390 stand-alone dump tools are independent mini operating systems that
> > are installed on disks or tapes. When a dump should be created, these
> > stand-alone dump tools are booted. All that they do is to write the dump
> > (current memory plus the CPU registers) to the disk/tape device.
> > 
> > The advantage compared to kdump is that since they are freshly loaded
> > into memory they can't be overwritten in memory.
> 
> > Another advantage is
> > that since it is different code, it is much less likely that the dump
> > tool will run into the same problem than the previously crashed kernel.
> 
> I think in practice this is not really a problem. If your kernel
> is not stable enough to even boot and copy a file, then most likely
> it has not even been deployed. The very fact that a kernel has been
> up and running verifies that it is a stable kernel for that machine
> and is capable of capturing the dump.

Yes, this is a theoretical consideration. In practice the kdump kernel will
work if it has not been corrupted.
 
> > Also the boot process ensures that the hardware is in a initialized
> > state.
> 
> Who makes sure that hardware is in initialized state? Kdump kernel,
> stand alone kernel or BIOS.

The machine does that on IPL. Call it the BIOS, although we use different
names for all that code that runs below the OS.

> > And last but not least, with the stand-alone dump tools you can
> > dump early kernel problems which is not possible using kdump, because
> > you can't dump before the kdump kernel has been loaded with kexec.
> > 
> 
> That is one limitation but again if your kernel can't even boot,
> it is not ready to ship and it is more of a development issue and
> there are other ways to debug problems. So I would not worry too
> much about it.
> 
> On a side note, few months back there were folks who were trying
> to enhance bootloaders to be able to prepare basic environment so
> that a kdump kernel can boot even in the event of early first
> kernel boot.

Well, here it is not only about the kernel code. The IPL could be
prevented by a setup problem as well. And if you can not get the system
to boot far enough to load the kdump kernel you are bust.
 
> > That were more or less the arguments, why we did not support kdump in
> > the past.
> > 
> > In order to increase dump reliability with kdump, we now implemented a
> > two stage approach. The stand-alone dump tools first check via meminfo,
> > if kdump is valid using checksums. If kdump is loaded and healthy it is
> > started. Otherwise the stand-alone dump tools create a full-blown
> > stand-alone dump.
> 
> kexec-tools purgatory code also checks the checksum of loaded kernel
> and other information and next kernel boot starts only if nothing
> has been corrupted in first kernel. So this additional meminfo structures
> and need of checksums sounds unnecessary. I think what you do need is
> that somehow invoking second hook (s390 specific stand alone kernel)
> in case primary kernel is corrupted.

Yes, but what do you do if the checksum tells you that the kexec kernel
has been compromised? If the independent stand-alone dumper does the
check it can fall back to the "dump-all" case.

> > 
> > With this approach we still keep our s390 dump reliability and gain the
> > great kdump features, e.g. distributor installer support, dump filtering
> > with makedumpfile, etc.
> > 
> > > why the existing
> > > mechanism of preparing ELF headers to describe all the above info
> > > and just passing the address of header on kernel command line
> > > (crashkernel=) will not work for s390. Introducing an entirely new
> > > infrastructure for communicating the same information does not
> > > sound too exciting.
> > 
> > We need the meminfo interface anyway for the two stage approach. The
> > stand-alone dump tools have to find and verify the kdump kernel in order
> > to start it.
> 
> kexec-tools does this verification already. We verify the checksum of
> all the loaded information in reserved area. So why introduce this
> meminfo interface.

Again, what do you do if the verification fails? Fail to dump the borked
system? Imho not a good option.

> > Therefore the interface is there and can be used. Also
> > creating the ELF header in the 2nd kernel is more flexible and easier
> > IMHO:
> > * You do not have to care about memory or CPU hotplug.
> 
> Reloading the kernel upon memory or cpu hotplug should be trivial. This
> does not justify to move away from standard ELF interface and creation
> of a new one.

We do not move away from the ELF interface, we just create the ELF headers
at a different time, no?

> > * You do not have to preallocate CPU crash notes etc.
> 
> Its a small per cpu area. Looks like otherwise you will create meminfo
> areas otherwise.

Probably doesn't matter.

> > * It works independently from the tool/mechanism that loads the kdump
> > kernel into memory. E.g. we have the idea to load the kdump kernel at
> > boot time into the crashkernel memory (not via the kexec_load system
> > call). That would solve the main kdump problems: The kdump kernel can't
> > be overwritten by I/O and also early kernel problems could then be
> > dumped using kdump.
> 
> Can you give more details how exactly it works. I know very little about
> s390 dump mechanism.

Before we started working on kdump the only way to get a dump was to boot
a stand-alone dumper. That is a small piece of assembler code that is
loaded into the first 64KB of memory (which is reserved for this kind of
thing). This assembler code will then write everything to the dump device.
This works very reliably (which is of utmost importance to us) but has the
problem that it will be awfully slow for large memory sizes.
 
> When do you load kdump kernel and who does it?

If the crashed kernel is still operational enough to call panic it can
cause an IPL to the stand-alone dump tool (or do a reset of the I/O
subsystem and directly call kdump with the new code if the checksums
turn out ok).
If the crashed kernel is totally bust then the administrator has to do
a manual IPL from the disk where the stand-alone dumper has been installed.
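
The cases just described can be summarized in a small C sketch. The enum and
function names are hypothetical and only restate the decision in this mail;
they are not taken from the patch series.

```c
#include <stdbool.h>

enum dump_path {
	DIRECT_KDUMP,		/* reset the I/O subsystem, then call kdump */
	IPL_STANDALONE_DUMP,	/* kernel triggers an IPL of the dump tool */
	MANUAL_IPL,		/* administrator IPLs the dump disk by hand */
};

/*
 * kernel_can_panic: the crashed kernel is still able to call panic.
 * kdump_csum_ok:    the checksums over the loaded kdump kernel match.
 */
enum dump_path pick_dump_path(bool kernel_can_panic, bool kdump_csum_ok)
{
	if (!kernel_can_panic)
		return MANUAL_IPL;
	return kdump_csum_ok ? DIRECT_KDUMP : IPL_STANDALONE_DUMP;
}
```

In the manual case the checksum result plays no role at this point, because
the freshly booted stand-alone dumper performs its own check of the kdump
kernel before deciding how to dump.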
 
> Who gets the control first after crash?

Depends. If the kernel can recognize the crash as such it can proceed to
execute the configured "on_panic" shutdown action. If the kernel is bust
the code loaded by the next IPL gets control. This can be a "normal" boot
or a stand-alone dumper.

> To me it looked like that you regularly load kdump kernel and if that
> is corrupted then somehow you boot standalone kernel. So corruption
> of kdump kernel should not be a issue for you.

It is the other way round. We load the standalone dumper, then check if
the kdump kernel looks good. Only if all the checksums turn out ok we
jump to the purgatory code from the standalone dump code.

> Do you load kdump kernel from some tape/storage after system crash. Where
> does bootloader lies and how do you make sure it is not corrupted and
> associated device is in good condition.

The bootloader sits on the boot disk / tape. If you are able to boot from
that device then it is reasonable to assume that the device is in good
condition. To get a corrupted bootloader you'd need a stray I/O to that
device. The stand-alone dumper sits on its own disk / tape which is not in
use for normal operation. Very unlikely that this device will get hit.
 
> To me we should not create a arch specific way of passing information
> between kernels. Stand alone kernel should be able to parse the
> ELF headers which contains all the relevant info. They have already
> been checksum verified.

Ok, so this seems to be the main point of discussion. When to create the
ELF headers and how to pass all the required information from the crashed
system to the kdump kernel.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
@ 2011-07-08  9:01         ` Martin Schwidefsky
  0 siblings, 0 replies; 112+ messages in thread
From: Martin Schwidefsky @ 2011-07-08  9:01 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: oomichi, linux-s390, mahesh, heiko.carstens, linux-kernel, hbabu,
	horms, ebiederm, Michael Holzheu, kexec

On Thu, 7 Jul 2011 15:33:21 -0400
Vivek Goyal <vgoyal@redhat.com> wrote:

> On Wed, Jul 06, 2011 at 11:24:47AM +0200, Michael Holzheu wrote:
> > Hello Vivek,
> > 
> > On Tue, 2011-07-05 at 16:26 -0400, Vivek Goyal wrote:
> > > On Mon, Jul 04, 2011 at 07:09:22PM +0200, Michael Holzheu wrote:
> > 
> > [snip]
> > 
> > > I don't understand what is stand-alone dump tools and 
> > 
> > S390 stand-alone dump tools are independent mini operating systems that
> > are installed on disks or tapes. When a dump should be created, these
> > stand-alone dump tools are booted. All that they do is to write the dump
> > (current memory plus the CPU registers) to the disk/tape device.
> > 
> > The advantage compared to kdump is that since they are freshly loaded
> > into memory they can't be overwritten in memory.
> 
> > Another advantage is
> > that since it is different code, it is much less likely that the dump
> > tool will run into the same problem than the previously crashed kernel.
> 
> I think in practice this is not really a problem. If your kernel
> is not stable enough to even boot and copy a file, then most likely
> it has not even been deployed. The very fact that a kernel has been
> up and running verifies that it is a stable kernel for that machine
> and is capable of capturing the dump.

Yes, this is a theoretical consideration. In practice the kdump kernel will
work if it has not been corrupted.
 
> > Also the boot process ensures that the hardware is in an initialized
> > state.
> 
> Who makes sure that hardware is in initialized state? Kdump kernel,
> stand alone kernel or BIOS.

The machine does that on IPL. Call it the BIOS, although we use different
names for all that code that runs below the OS.

> > And last but not least, with the stand-alone dump tools you can
> > dump early kernel problems which is not possible using kdump, because
> > you can't dump before the kdump kernel has been loaded with kexec.
> > 
> 
> That is one limitation but again if your kernel can't even boot,
> it is not ready to ship and it is more of a development issue and
> there are other ways to debug problems. So I would not worry too
> much about it.
> 
> On a side note, a few months back there were folks who were trying
> to enhance bootloaders to be able to prepare basic environment so
> that a kdump kernel can boot even in the event of early first
> kernel boot.

Well, here it is not only about the kernel code. The IPL could be
prevented by a setup problem as well. And if you can not get the system
to boot far enough to load the kdump kernel you are bust.
 
> > Those were more or less the arguments why we did not support kdump in
> > the past.
> > 
> > In order to increase dump reliability with kdump, we now implemented a
> > two stage approach. The stand-alone dump tools first check via meminfo,
> > if kdump is valid using checksums. If kdump is loaded and healthy it is
> > started. Otherwise the stand-alone dump tools create a full-blown
> > stand-alone dump.
> 
> kexec-tools purgatory code also checks the checksum of loaded kernel
> and other information and next kernel boot starts only if nothing
> has been corrupted in first kernel. So this additional meminfo structures
> and need of checksums sounds unnecessary. I think what you do need is
> that somehow invoking second hook (s390 specific stand alone kernel)
> in case primary kernel is corrupted.

Yes, but what do you do if the checksum tells you that the kexec kernel
has been compromised? If the independent stand-alone dumper does the
check it can fall back to the "dump-all" case.

> > 
> > With this approach we still keep our s390 dump reliability and gain the
> > great kdump features, e.g. distributor installer support, dump filtering
> > with makedumpfile, etc.
> > 
> > > why the existing
> > > mechanism of preparing ELF headers to describe all the above info
> > > and just passing the address of header on kernel command line
> > > (crashkernel=) will not work for s390. Introducing an entirely new
> > > infrastructure for communicating the same information does not
> > > sound too exciting.
> > 
> > We need the meminfo interface anyway for the two stage approach. The
> > stand-alone dump tools have to find and verify the kdump kernel in order
> > to start it.
> 
> kexec-tools does this verification already. We verify the checksum of
> all the loaded information in reserved area. So why introduce this
> meminfo interface.

Again, what do you do if the verification fails? Fail to dump the borked
system? Imho not a good option.

> > Therefore the interface is there and can be used. Also
> > creating the ELF header in the 2nd kernel is more flexible and easier
> > IMHO:
> > * You do not have to care about memory or CPU hotplug.
> 
> Reloading the kernel upon memory or cpu hotplug should be trivial. This
> does not justify to move away from standard ELF interface and creation
> of a new one.

We do not move away from the ELF interface, we just create the ELF headers
at a different time, no?

> > * You do not have to preallocate CPU crash notes etc.
> 
> It's a small per-CPU area. Looks like you will create meminfo
> areas otherwise.

Probably doesn't matter.

> > * It works independently from the tool/mechanism that loads the kdump
> > kernel into memory. E.g. we have the idea to load the kdump kernel at
> > boot time into the crashkernel memory (not via the kexec_load system
> > call). That would solve the main kdump problems: The kdump kernel can't
> > be overwritten by I/O and also early kernel problems could then be
> > dumped using kdump.
> 
> Can you give more details how exactly it works. I know very little about
> s390 dump mechanism.

Before we started working on kdump the only way to get a dump was to boot
a stand-alone dumper. That is a small piece of assembler code that is
loaded into the first 64KB of memory (which is reserved for this kind of
thing). This assembler code will then write everything to the dump device.
This works very reliably (which is of utmost importance to us) but has the
problem that it will be awfully slow for large memory sizes.
 
> When do you load kdump kernel and who does it?

If the crashed kernel is still operational enough to call panic it can
cause an IPL to the stand-alone dump tool (or do a reset of the I/O
subsystem and directly call kdump with the new code if the checksums
turn out ok).
If the crashed kernel is totally bust then the administrator has to do
a manual IPL from the disk where the stand-alone dumper has been installed.
 
> Who gets the control first after crash?

Depends. If the kernel can recognize the crash as such it can proceed to
execute the configured "on_panic" shutdown action. If the kernel is bust
the code loaded by the next IPL gets control. This can be a "normal" boot
or a stand-alone dumper.

> To me it looked like that you regularly load kdump kernel and if that
> is corrupted then somehow you boot standalone kernel. So corruption
> of kdump kernel should not be an issue for you.

It is the other way round. We load the standalone dumper, then check if
the kdump kernel looks good. Only if all the checksums turn out ok we
jump to the purgatory code from the standalone dump code.
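As a rough illustration of that order of events (not the actual stand-alone
dumper, which is assembler code), the decision can be modelled in a few lines
of Python. The names `LoadedSegment`, `kdump_is_intact` and
`choose_dump_method` are made up for this sketch, and CRC-32 only stands in
for whatever checksum the real tools use.

```python
import zlib
from typing import NamedTuple, Sequence


class LoadedSegment(NamedTuple):
    """One piece of the pre-loaded kdump kernel (hypothetical layout)."""
    offset: int    # start of the segment inside the memory image
    size: int      # length in bytes
    checksum: int  # value recorded when the segment was loaded


def kdump_is_intact(memory: bytes, segments: Sequence[LoadedSegment]) -> bool:
    """Return True only if kdump was loaded and every checksum still matches."""
    if not segments:
        return False  # no kdump kernel was ever loaded
    for seg in segments:
        data = memory[seg.offset:seg.offset + seg.size]
        # A short read means the segment does not fit into memory at all.
        if len(data) != seg.size or zlib.crc32(data) != seg.checksum:
            return False
    return True


def choose_dump_method(memory: bytes, segments: Sequence[LoadedSegment]) -> str:
    """Stage 1 runs in the freshly loaded dumper; stage 2 is kdump or fallback."""
    if kdump_is_intact(memory, segments):
        return "kdump"
    return "stand-alone dump"
```

The point of the model is only that the verification runs in freshly loaded
code, so a failed check still leaves a working path to a full dump.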

> Do you load kdump kernel from some tape/storage after system crash. Where
> does bootloader lie and how do you make sure it is not corrupted and
> associated device is in good condition.

The bootloader sits on the boot disk / tape. If you are able to boot from
that device then it is reasonable to assume that the device is in good
condition. To get a corrupted bootloader you'd need a stray I/O to that
device. The stand-alone dumper sits on its own disk / tape which is not in
use for normal operation. Very unlikely that this device will get hit.
 
> To me we should not create an arch-specific way of passing information
> between kernels. Stand alone kernel should be able to parse the
> ELF headers which contains all the relevant info. They have already
> been checksum verified.

Ok, so this seems to be the main point of discussion. When to create the
ELF headers and how to pass all the required information from the crashed
system to the kdump kernel.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-07 19:33       ` Vivek Goyal
@ 2011-07-08 13:04         ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-08 13:04 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: oomichi, linux-s390, mahesh, heiko.carstens, linux-kernel, hbabu,
	horms, ebiederm, schwidefsky, kexec

Hello Vivek,

On Thu, 2011-07-07 at 15:33 -0400, Vivek Goyal wrote:
> > Another advantage is
> > that since it is different code, it is much less likely that the dump
> > tool will run into the same problem as the previously crashed kernel.
> 
> I think in practice this is not really a problem. If your kernel
> is not stable enough to even boot and copy a file, then most likely
> it has not even been deployed. The very fact that a kernel has been
> up and running verifies that it is a stable kernel for that machine
> and is capable of capturing the dump.

I don't want to argue about probabilities. Even if we gain only a
little more reliability this is important for us. Don't forget that we
write software for mainframes. We accept that the last 0.1 percent of
reliability can be very expensive compared to the first 99.9 percent.

[snip]

> > And last but not least, with the stand-alone dump tools you can
> > dump early kernel problems which is not possible using kdump, because
> > you can't dump before the kdump kernel has been loaded with kexec.
> > 
> 
> That is one limitation but again if your kernel can't even boot,
> it is not ready to ship and it is more of a development issue and
> there are other ways to debug problems. So I would not worry too
> much about it.

We worry about that. See the comment above regarding the 100 percent.

> On a side note, a few months back there were folks who were trying
> to enhance bootloaders to be able to prepare basic environment so
> that a kdump kernel can boot even in the event of early first
> kernel boot.

This is one more argument to create the ELF header in the 2nd kernel.
With our approach loading the kdump kernel at boot time is almost
trivial.

Example (e.g. crashkernel=xxxM@256M):

1. The boot loader loads standard kernel and kdump kernel into memory.
The kdump kernel is loaded into crashkernel memory at 256M. No more
setup (e.g. creating ELF headers) is necessary.
2. We could add a kernel parameter "kexec_load=<segm addr>,<segm
size>, ..." that does an internal kexec_load(). After this kernel
parameter is processed, kdump is armed.

What do you think?
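To make the proposal a little more concrete, here is a small Python model of
how such parameters might be parsed and sanity-checked. Everything in it is an
assumption for illustration: "kexec_load=" is only an idea in this mail, the
"..." in its format is not spelled out (the sketch guesses that it repeats
addr,size pairs), and the 128M size is invented because "xxxM" is left open.

```python
from typing import List, NamedTuple, Tuple

_UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


class Segment(NamedTuple):
    addr: int
    size: int


def parse_size(text: str) -> int:
    """Parse '256M', '64K' or a plain number (decimal or 0x-prefixed hex)."""
    text = text.strip()
    suffix = text[-1:].upper()
    if suffix in _UNITS:
        return int(text[:-1], 0) * _UNITS[suffix]
    return int(text, 0)


def parse_crashkernel(value: str) -> Tuple[int, int]:
    """Parse 'size@offset' and return (base, size). The offset is required here."""
    size_text, _, base_text = value.partition("@")
    return parse_size(base_text), parse_size(size_text)


def parse_kexec_load(value: str) -> List[Segment]:
    """Parse the assumed 'addr,size,addr,size,...' layout into segments."""
    fields = [parse_size(field) for field in value.split(",")]
    if len(fields) % 2:
        raise ValueError("expected addr,size pairs")
    return [Segment(fields[i], fields[i + 1]) for i in range(0, len(fields), 2)]


def inside_crashkernel(seg: Segment, base: int, size: int) -> bool:
    """A segment is acceptable only if it lies fully in the reserved area."""
    return base <= seg.addr and seg.addr + seg.size <= base + size


base, size = parse_crashkernel("128M@256M")
segments = parse_kexec_load("0x10000000,0x400000,0x10800000,0x100000")
armed = all(inside_crashkernel(seg, base, size) for seg in segments)
```

With those inputs every segment falls inside the reserved range, so the model
would consider kdump armed without any further setup such as ELF headers.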

> > Those were more or less the arguments why we did not support kdump in
> > the past.
> > 
> > In order to increase dump reliability with kdump, we now implemented a
> > two stage approach. The stand-alone dump tools first check via meminfo,
> > if kdump is valid using checksums. If kdump is loaded and healthy it is
> > started. Otherwise the stand-alone dump tools create a full-blown
> > stand-alone dump.
> 
> kexec-tools purgatory code also checks the checksum of loaded kernel
> and other information and next kernel boot starts only if nothing
> has been corrupted in first kernel. 

Can you point me to the code where this is done and from where in the
kernel that code is called? Currently with our implementation we do not
use any purgatory code from kexec tools.

> So this additional meminfo structures
> and need of checksums sounds unnecessary. I think what you do need is
> that somehow invoking second hook (s390 specific stand alone kernel)
> in case primary kernel is corrupted.
> > 
> > With this approach we still keep our s390 dump reliability and gain the
> > great kdump features, e.g. distributor installer support, dump filtering
> > with makedumpfile, etc.
> > 
> > > why the existing
> > > mechanism of preparing ELF headers to describe all the above info
> > > and just passing the address of header on kernel command line
> > > (crashkernel=) will not work for s390. Introducing an entirely new
> > > infrastructure for communicating the same information does not
> > > sound too exciting.
> > 
> > We need the meminfo interface anyway for the two stage approach. The
> > stand-alone dump tools have to find and verify the kdump kernel in order
> > to start it.
> 
> kexec-tools does this verification already. We verify the checksum of
> all the loaded information in reserved area. So why introduce this
> meminfo interface.

Ok, where is this done and when?

> > Therefore the interface is there and can be used. Also
> > creating the ELF header in the 2nd kernel is more flexible and easier
> > IMHO:
> > * You do not have to care about memory or CPU hotplug.
> 
> Reloading the kernel upon memory or cpu hotplug should be trivial. This
> does not justify to move away from standard ELF interface and creation
> of a new one.
> 
> > * You do not have to preallocate CPU crash notes etc.
> 
> It's a small per-CPU area. Looks like you will create meminfo
> areas otherwise.
> 
> > * It works independently from the tool/mechanism that loads the kdump
> > kernel into memory. E.g. we have the idea to load the kdump kernel at
> > boot time into the crashkernel memory (not via the kexec_load system
> > call). That would solve the main kdump problems: The kdump kernel can't
> > be overwritten by I/O and also early kernel problems could then be
> > dumped using kdump.
> 
> Can you give more details how exactly it works. I know very little about
> s390 dump mechanism.

Maybe I confused you here. What I wanted to describe is the following
idea:
1. The running production kernel starts with "crashkernel=" and reserves
memory for kdump. No kdump is loaded with kexec.
2. The system crashes
3. To create the dump, a prepared dump disk is booted. The boot loader
loads the kdump kernel into crashkernel memory.
4. The boot loader starts kdump kernel on s390 with entry point
<crashkernel base> + 0x10008
5. The kdump kernel creates ELF header etc...

So this is simple for the boot loader code because no preparation steps
like creating the ELF header are required. This is similar to the scenario
of pre-loading the kdump kernel together with the standard kernel at
startup that I described above.
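For the crashkernel=xxxM@256M example, the entry-point arithmetic from step 4
works out as follows. This is just the addition written down in Python; the
function name is made up, and only the 0x10008 offset and the 256M base come
from this mail.

```python
KDUMP_ENTRY_OFFSET = 0x10008  # offset from the crashkernel base, as given above


def kdump_entry_point(crashkernel_base: int) -> int:
    """Address the boot loader would jump to in order to start the kdump kernel."""
    return crashkernel_base + KDUMP_ENTRY_OFFSET


crashkernel_base = 256 * 1024 * 1024          # 256M = 0x10000000
entry = kdump_entry_point(crashkernel_base)   # 0x10000000 + 0x10008
print(hex(entry))                             # prints 0x10010008
```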

> 
> When do you load kdump kernel and who does it?

Currently we load the kdump kernel with kexec like it is done on all
other architectures. The other options I described above are currently
just ideas that we have for the future.

> Who gets the control first after crash?
> 
> To me it looked like that you regularly load kdump kernel and if that
> is corrupted then somehow you boot standalone kernel. So corruption
> of kdump kernel should not be an issue for you.

As Martin already said: It can be the other way round. The stand-alone
dump tool gets first control. We trust this code because it is freshly
loaded and has a different code base. This code verifies the kdump setup
and jumps into the pre-loaded kdump (crashkernel base + 0x10008) if
everything is ok. Otherwise it creates a traditional s390 dump.

> 
> Do you load kdump kernel from some tape/storage after system crash. Where
> does bootloader lie and how do you make sure it is not corrupted and
> associated device is in good condition.
> 
> To me we should not create an arch-specific way of passing information
> between kernels.

I agree that a common code solution would be better.

Michael


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-07 19:33       ` Vivek Goyal
                         ` (2 preceding siblings ...)
  (?)
@ 2011-07-08 14:02       ` Michael Holzheu
  2011-07-11 14:07           ` Vivek Goyal
  -1 siblings, 1 reply; 112+ messages in thread
From: Michael Holzheu @ 2011-07-08 14:02 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: ebiederm, hbabu, mahesh, oomichi, horms, schwidefsky,
	heiko.carstens, kexec, linux-kernel, linux-s390

[-- Attachment #1: Type: text/plain, Size: 5769 bytes --]

Hello Vivek,

I attached a document where the s390 port is described in more detail.
Perhaps this helps you to understand what we want and what we are doing. If
not - just delete it :-)

Michael

On Thu, 2011-07-07 at 15:33 -0400, Vivek Goyal wrote:
> On Wed, Jul 06, 2011 at 11:24:47AM +0200, Michael Holzheu wrote:
> > Hello Vivek,
> > 
> > On Tue, 2011-07-05 at 16:26 -0400, Vivek Goyal wrote:
> > > On Mon, Jul 04, 2011 at 07:09:22PM +0200, Michael Holzheu wrote:
> > 
> > [snip]
> > 
> > > I don't understand what is stand-alone dump tools and 
> > 
> > S390 stand-alone dump tools are independent mini operating systems that
> > are installed on disks or tapes. When a dump should be created, these
> > stand-alone dump tools are booted. All that they do is to write the dump
> > (current memory plus the CPU registers) to the disk/tape device.
> > 
> > The advantage compared to kdump is that since they are freshly loaded
> > into memory they can't be overwritten in memory.
> 
> > Another advantage is
> > that since it is different code, it is much less likely that the dump
> > tool will run into the same problem as the previously crashed kernel.
> 
> I think in practice this is not really a problem. If your kernel
> is not stable enough to even boot and copy a file, then most likely
> it has not even been deployed. The very fact that a kernel has been
> up and running verifies that it is a stable kernel for that machine
> and is capable of capturing the dump.
> 
> > Also the boot process ensures that the hardware is in an initialized
> > state.
> 
> Who makes sure that hardware is in initialized state? Kdump kernel,
> stand alone kernel or BIOS.
> 
> > And last but not least, with the stand-alone dump tools you can
> > dump early kernel problems which is not possible using kdump, because
> > you can't dump before the kdump kernel has been loaded with kexec.
> > 
> 
> That is one limitation but again if your kernel can't even boot,
> it is not ready to ship and it is more of a development issue and
> there are other ways to debug problems. So I would not worry too
> much about it.
> 
> On a side note, a few months back there were folks who were trying
> to enhance bootloaders to be able to prepare basic environment so
> that a kdump kernel can boot even in the event of early first
> kernel boot.
> 
> > Those were more or less the arguments why we did not support kdump in
> > the past.
> > 
> > In order to increase dump reliability with kdump, we now implemented a
> > two stage approach. The stand-alone dump tools first check via meminfo,
> > if kdump is valid using checksums. If kdump is loaded and healthy it is
> > started. Otherwise the stand-alone dump tools create a full-blown
> > stand-alone dump.
> 
> kexec-tools purgatory code also checks the checksum of loaded kernel
> and other information and next kernel boot starts only if nothing
> has been corrupted in first kernel. So this additional meminfo structures
> and need of checksums sounds unnecessary. I think what you do need is
> that somehow invoking second hook (s390 specific stand alone kernel)
> in case primary kernel is corrupted.
> 
> > 
> > With this approach we still keep our s390 dump reliability and gain the
> > great kdump features, e.g. distributor installer support, dump filtering
> > with makedumpfile, etc.
> > 
> > > why the existing
> > > mechanism of preparing ELF headers to describe all the above info
> > > and just passing the address of header on kernel command line
> > > (crashkernel=) will not work for s390. Introducing an entirely new
> > > infrastructure for communicating the same information does not
> > > sound too exciting.
> > 
> > We need the meminfo interface anyway for the two stage approach. The
> > stand-alone dump tools have to find and verify the kdump kernel in order
> > to start it.
> 
> kexec-tools does this verification already. We verify the checksum of
> all the loaded information in reserved area. So why introduce this
> meminfo interface.
> 
> > Therefore the interface is there and can be used. Also
> > creating the ELF header in the 2nd kernel is more flexible and easier
> > IMHO:
> > * You do not have to care about memory or CPU hotplug.
> 
> Reloading the kernel upon memory or cpu hotplug should be trivial. This
> does not justify to move away from standard ELF interface and creation
> of a new one.
> 
> > * You do not have to preallocate CPU crash notes etc.
> 
> It's a small per-CPU area. Looks like you will create meminfo
> areas otherwise.
> 
> > * It works independently from the tool/mechanism that loads the kdump
> > kernel into memory. E.g. we have the idea to load the kdump kernel at
> > boot time into the crashkernel memory (not via the kexec_load system
> > call). That would solve the main kdump problems: The kdump kernel can't
> > be overwritten by I/O and also early kernel problems could then be
> > dumped using kdump.
> 
> Can you give more details how exactly it works. I know very little about
> s390 dump mechanism.
> 
> When do you load kdump kernel and who does it?
> 
> Who gets the control first after crash?
> 
> To me it looked like that you regularly load kdump kernel and if that
> is corrupted then somehow you boot standalone kernel. So corruption
> of kdump kernel should not be an issue for you.
> 
> Do you load kdump kernel from some tape/storage after system crash. Where
> does bootloader lie and how do you make sure it is not corrupted and
> associated device is in good condition.
> 
> To me we should not create an arch-specific way of passing information
> between kernels. Stand alone kernel should be able to parse the
> ELF headers which contains all the relevant info. They have already
> been checksum verified.
> 
> Thanks
> Vivek


[-- Attachment #2: kdump_s390_port.pdf --]
[-- Type: application/pdf, Size: 108392 bytes --]

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-07 19:33       ` Vivek Goyal
                         ` (3 preceding siblings ...)
  (?)
@ 2011-07-09 17:58       ` Valdis.Kletnieks
  2011-07-12 13:52           ` Vivek Goyal
  -1 siblings, 1 reply; 112+ messages in thread
From: Valdis.Kletnieks @ 2011-07-09 17:58 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Michael Holzheu, ebiederm, hbabu, mahesh, oomichi, horms,
	schwidefsky, heiko.carstens, kexec, linux-kernel, linux-s390

[-- Attachment #1: Type: text/plain, Size: 2763 bytes --]

On Thu, 07 Jul 2011 15:33:21 EDT, Vivek Goyal said:
> On Wed, Jul 06, 2011 at 11:24:47AM +0200, Michael Holzheu wrote:

> > S390 stand-alone dump tools are independent mini operating systems that
> > are installed on disks or tapes. When a dump should be created, these
> > stand-alone dump tools are booted. All that they do is to write the dump
> > (current memory plus the CPU registers) to the disk/tape device.
> > 
> > The advantage compared to kdump is that since they are freshly loaded
> > into memory they can't be overwritten in memory.
> 
> > Another advantage is
> > that since it is different code, it is much less likely that the dump
> > tool will run into the same problem as the previously crashed kernel.
> 
> I think in practice this is not really a problem. If your kernel
> is not stable enough to even boot and copy a file, then most likely
> it has not even been deployed. The very fact that a kernel has been
> up and running verifies that it is a stable kernel for that machine
> and is capable of capturing the dump.

Vivek: I used to do VM/XA on S/390 boxes for a living, and that's *not* where
Michael is coming from.

What the standalone dump code does is take a system that may have the moral
equivalent of 256 separate PCI buses, several hundred disks all visible in
multipath configurations, dozens of other devices, and as long as you can find
*one* console and *one* tape/disk drive that works, you can capture a dump.

More than once in my career, I got into a situation where the production system
would hang - and booting off another disk that contained an older copy with
maybe a few less patches would *also* hang.  VM/XA would simply *not run*.
Booting the standalone dump utility (which shared zero code with VM/XA, and did
*much* less initialization of I/O devices not needed for the actual dump) would
work just fine.  This would get me a dump that would show that we had a
(usually) hardware issue - either we were tripping over an erratum that *no*
released version of VM/XA had a workaround for, or outright defective hardware.

For the same efficiency reasons that Linux doesn't do a lot of checking for
"can never happen" cases, VM/XA doesn't check some things. So when busted
hardware would present logically impossible combinations of status bits (for
instance, "device still connected" but "I/O bus disconnected"), Bad Things
would happen.  Booting a tiny dump program that never even *tried* to look at
the bad bits posted by the miscreant hardware would allow you to get the info
you needed to debug it.

*THAT* is the use case -  when you have one customer out there in East Podunk
who is consistently managing to hang their system so hard you can't get enough
info out of it to figure out what's broken.



[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-08 14:02       ` Michael Holzheu
@ 2011-07-11 14:07           ` Vivek Goyal
  0 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-11 14:07 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: ebiederm, hbabu, mahesh, oomichi, horms, schwidefsky,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Fri, Jul 08, 2011 at 04:02:18PM +0200, Michael Holzheu wrote:
> Hello Vivek,
> 
> I attached a document where the s390 port is described in more detail.
> Perhaps this helps you to understand what want and what we are doing. If
> not - just delete it :-)
> 

Michael,

Thanks for the documentation. I have gone through it quickly and there
are some parts I am still missing.

On x86, after the kernel crash we jump to purgatory code which does the
checksum verification of all the loaded segments and if everything is
fine, it jumps to kdump kernel's entry point and second kernel boots.
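
For illustration only, the verify-then-jump idea described above can be
sketched as follows. This is a toy model, not the kexec-tools purgatory
code: the function names, the flat bytearray standing in for memory, the
segment layout and the use of SHA-256 as the hash are all assumptions made
for the example.

```python
import hashlib

def digest_segments(memory, segments):
    """Hash every (start, length) segment of `memory` into one SHA-256 digest."""
    h = hashlib.sha256()
    for start, length in segments:
        h.update(memory[start:start + length])
    return h.digest()

def purgatory_verify(memory, segments, expected_digest):
    """Return True only if the loaded segments still match the load-time digest."""
    return digest_segments(memory, segments) == expected_digest

# Load time: place two hypothetical segments and record their digest.
memory = bytearray(64)
memory[8:16] = b"KERNEL!!"
memory[32:40] = b"INITRD!!"
segments = [(8, 8), (32, 8)]
expected = digest_segments(memory, segments)

# Crash time: only jump to the second kernel if nothing was overwritten.
ok_to_boot = purgatory_verify(memory, segments, expected)
```

A stray write inside any listed segment changes the digest, so the check
fails; writes outside the listed segments are not noticed, which is the
intended scope of such a check.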

On s390, it looks like control goes to some other piece of code after
purgatory (one which does the IPL?), and that code decides whether to
start the kdump kernel or launch the stand-alone kernel as a backup plan?

If yes, is the code which does the IPL also loaded in crashkernel memory
as part of the kdump kernel? If not, how does kexec-tools come to know
where to jump after doing the checksum on the loaded kernel?

Thanks
Vivek


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-08  9:01         ` Martin Schwidefsky
@ 2011-07-11 14:42           ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-11 14:42 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Michael Holzheu, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Fri, Jul 08, 2011 at 11:01:21AM +0200, Martin Schwidefsky wrote:

[..]
> > 
> > kexec-tools purgatory code also checks the checksum of loaded kernel
> > and other information and next kernel boot starts only if nothing
> > has been corrupted in first kernel. So this additional meminfo strucutres
> > and need of checksums sounds unnecessary. I think what you do need is
> > that somehow invoking second hook (s390 specific stand alone kernel)
> > in case primary kernel is corrupted.
> 
> Yes, but what do you do if the checksum tells you that the kexec kernel
> has been compromised? If the independent stand-alone dumper does the
> check it can fall back to the "dump-all" case.

So this independent dumper (which decides whether to continue booting
the kdump kernel or the stand-alone dumper) is loaded where? On x86,
everything is loaded in crashkernel memory and at run time we update
purgatory with the entry point of the kernel.

I guess you could write s390-specific purgatory code where you do
the checksum on the loaded kdump kernel and, if it is corrupted, you
jump to booting the stand-alone kernel instead.

BTW, you seem to have the capability of doing an IPL of the stand-alone
kernel from disk/tape after a kernel crash. If yes, then why not IPL the
regular Linux kernel in case its copy in memory is corrupted?

What happens if kdump kernel is not corrupted and later it fails to boot
due to some platform issue or device driver issue etc? I am assuming
that dump capture will fail. If yes, then backup mechanism is designed
only to protect against kdump kernel's corruption while loaded in
memory?

In Michael's doc, I noticed he talked about unmapping the crashkernel
memory from the kernel's page tables. That should protect against stray
writes by the kernel, but he mentioned the possibility of a device being
able to DMA to said memory region. I am wondering whether it is possible
to program the IOMMU in such a way that any DMA attempt to said memory
region fails. If yes, then I guess the corruption problem will be solved
without having to worry about creating a backup plan with the stand-alone
kernel, and one can just focus on making the kdump kernel work.

> 
> > > 
> > > With this approach we still keep our s390 dump reliability and gain the
> > > great kdump features, e.g. distributor installer support, dump filtering
> > > with makedumpfile, etc.

So reliability only comes from the fact that the stand-alone kernel is
booted from disk? So as long as the kdump kernel is not corrupted, it is
as reliable as the stand-alone kernel?

How many times in practice have we run into kdump kernel corruption
issues? Will unmapping from kernel page tables and doing something at
the IOMMU level not take care of that issue?

> > > 
> > > > why the existing
> > > > mechanism of preparing ELF headers to describe all the above info
> > > > and just passing the address of header on kernel commnad line
> > > > (crashkernel=) will not work for s390. Introducing an entirely new
> > > > infrastructure for communicating the same information does not
> > > > sound too exciting.
> > > 
> > > We need the meminfo interface anyway for the two stage approach. The
> > > stand-alone dump tools have to find and verify the kdump kernel in order
> > > to start it.

kexec-tools purgatory code already has the checksum logic, so you don't
have to redo that in the stand-alone tools. I think you probably need an
s390-specific purgatory which jumps to IPLing the stand-alone kernel if
the kdump kernel is corrupted, instead of rebooting or spinning
infinitely in a loop.

> > 
> > kexec-tools does this verification already. We verify the checksum of
> > all the loaded information in reserved area. So why introduce this
> > meminfo interface.
> 
> Again, what do you do if the verification fails? Fail to dump the borked
> system? Imho not a good option.

On regular systems we did not have any backup plan so, IIRC, we spin in
an infinite loop.

If one can do something about it, fine. But this takes me back to the
original question: instead of creating a backup plan, why not IPL the
kdump kernel from disk/tape the way you do for stand-alone kernels?

> 
> > > Therefore the interface is there and can be used. Also
> > > creating the ELF header in the 2nd kernel is more flexible and easier
> > > IMHO:
> > > * You do not have to care about memory or CPU hotplug.
> > 
> > Reloading the kernel upon memory or cpu hotplug should be trivial. This
> > does not justify to move away from standard ELF interface and creation
> > of a new one.
> 
> We do not move away from the ELF interface, we just create the ELF headers
> at a different time, no?

The existing kernel already provides a way to communicate relevant
information about the first kernel to the new kernel/binary, and that is
through ELF. You are moving away from that and creating one more
interface, meminfo, to get all the info about the first kernel. What's
wrong with continuing to parse ELF to get all the needed info? Is there
any piece of information missing which you require?

> 
> > > * You do not have to preallocate CPU crash notes etc.
> > 
> > Its a small per cpu area. Looks like otherwise you will create meminfo
> > areas otherwise.
> 
> Probably doesn't matter.
> 
> > > * It works independently from the tool/mechanism that loads the kdump
> > > kernel into memory. E.g. we have the idea to load the kdump kernel at
> > > boot time into the crashkernel memory (not via the kexec_load system
> > > call). That would solve the main kdump problems: The kdump kernel can't
> > > be overwritten by I/O and also early kernel problems could then be
> > > dumped using kdump.

So it looks like you are loading two kernels at a time: the primary
kernel, and the other kernel in the crashkernel memory area. But that
would solve only the early crash dump problem and not the corruption
problem?

I think we are trying to solve multiple problems in one go. We want
the regular capability to boot a kdump kernel and also to solve the
problem of early boot crashes.

Why not solve the bigger problem in the first step (that is, capturing
a filtered dump of big RAM systems fast) and do the integration with
regular kexec-tools (create ELF headers etc.) and s390-specific purgatory
code?

Once all this is done, you can look at how to capture early kernel
crashes (if that turns out to be a real problem).

> > 
> > Can you give more details how exactly it works. I know very little about
> > s390 dump mechanism.
> 
> Before we started working on kdump the only way to get a dump is to boot
> a stand-alone dumper. That is a small piece of assembler code that is
> loaded into the first 64KB of memory (which is reserved for these kind of
> things). This assembler code will then write everything to the dump device.
> This works very reliable (which is of utmost importance to us) but has the
> problem that it will be awfully slow for large memory sizes.

When is this assembler code loaded into memory, who loads it, and how do
we make sure this code is not corrupted?

I got the part about being slow because you have to write specific
drivers for saving the dump and you don't have filtering capability. On
today's big memory systems it makes sense to reuse kdump's capability
to use the first kernel's drivers and do the filtering in user space.

>  
> > When do you load kdump kernel and who does it?
> 
> If the crashed kernel is still operational enough to call panic it can
> cause an IPL to the stand-alone dump tool (or do a reset of the I/O
> subsystem and directly call kdump with the new code if the checksums
> turn out ok).
> If the crashed kernel is totally bust then the administrator has to do
> a manual IPL from the disk where the stand-alone dumper has been installed.
>  
> > Who gets the control first after crash?
> 
> Depends. If the kernel can recognize the crash as such it can proceed to
> execute the configured "on_panic" shutdown action. If the kernel is bust
> the code loaded by the next IPL gets control. This can be a "normal" boot
> or a stand-alone dumper.
> 
> > To me it looked like that you regularly load kdump kernel and if that
> > is corrupted then somehow you boot standalone kernel. So corruption
> > of kdump kernel should not be a issue for you.
> 
> It is the other way round. We load the standalone dumper, then check if
> the kdump kernel looks good. Only if all the checksums turn out ok we
> jump to the purgatory code from the standalone dump code.

Ok. So again, why not reuse the checksum capability of kexec-tools, and
instead of looping infinitely jump to the stand-alone tools + IPL etc.?
I understand this will require tighter integration with kexec-tools
and use of the ELF header mechanism, and will not cover early kernel
crashes.

> 
> > Do you load kdump kenrel from some tape/storage after system crash. Where
> > does bootloader lies and how do you make sure it is not corrupted and
> > associated device is in good condition.
> 
> The bootloader sits on the boot disk / tape. If you are able to boot from
> that device then it is reasonable to assume that the device is in good
> condition. To get a corrupted bootloader you'd need a stray I/O to that
> device. The stand-alone dumper sits on its own disk / tape which is not in
> use for normal operation. Very unlikely that this device will get hit.
>  
> > To me we should not create a arch specific way of passing information
> > between kernels. Stand alone kernel should be able to parse the
> > ELF headers which contains all the relevant info. They have already
> > been checksum verified.
> 
> Ok, so this seems to be the main point of discussion. When to create the
> ELF headers and how to pass all the required information from the crashed
> system to the kdump kernel.

To me we seem to be diverging a lot from the existing kdump+kexec-tools
mechanism just to solve the case of early crash dumping. If we break
the problem down into two parts and do things the kexec-tools way (with
a backup path of booting the stand-alone kernel if the kdump kernel is
corrupted), things might be better.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-11 14:07           ` Vivek Goyal
@ 2011-07-11 15:06             ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-11 15:06 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: ebiederm, hbabu, mahesh, oomichi, horms, schwidefsky,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Mon, 2011-07-11 at 10:07 -0400, Vivek Goyal wrote:
> On Fri, Jul 08, 2011 at 04:02:18PM +0200, Michael Holzheu wrote:
> > Hello Vivek,
> > 
> > I attached a document where the s390 port is described in more detail.
> > Perhaps this helps you to understand what want and what we are doing. If
> > not - just delete it :-)
> > 
> 
> Michael,
> 
> Thanks for the documentation. I have gone through it quickly and there
> are some parts I am still missing.
> 
> On x86, after the kernel crash we jump to purgatory code which does the
> checksum verification of all the loaded segments and if everything is
> fine, it jumps to kdump kernel's entry point and second kernel boots.
> 
> On s390, looks like after purgatory, control is going to some other
> piece of code (which does IPL?), and that code decides whether to
> start kdump kernel or launch stand alone kernel as backup plan?
>
> If yes, is that code whic does the IPL, also loaded in crashkernel memory
> as part of kdump kernel? If not, how does kexec-tools come to know
> where to jump after doing checksum on loaded kernel?

On s390 we currently do not use the kexec-tools purgatory code at all.
Therefore this checksum check is not done.

We have defined a new kernel entry point for kdump at <crashk base> +
0x10008. When this entry point is used, the first instructions swap the
area [0, crashk size] with [crashk base, crashk base + crashk size]. The
information about crashk base and size is taken from meminfo. After the
swap the Linux kernel is started with the information that we are in
kdump mode and the information about crashk base and size (which is
oldmem base and size now).
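
As a rough illustration of the swap described above, here is a toy model
that treats memory as a flat bytearray. It is a sketch only: the real
entry code runs as the first instructions of the kdump kernel, and the
function names and the overlap check are assumptions made for the example.
Only the 0x10008 offset is taken from the text.

```python
KDUMP_ENTRY_OFFSET = 0x10008  # offset of the kdump entry point quoted above

def kdump_entry(crashk_base):
    """Address of the kdump entry point for a given crashkernel base."""
    return crashk_base + KDUMP_ENTRY_OFFSET

def swap_with_low_memory(memory, crashk_base, crashk_size):
    """Exchange [0, size) with [base, base + size) in place.

    Returns (oldmem_base, oldmem_size): after the swap, the production
    kernel's low memory lives where the crashkernel area used to be.
    """
    if crashk_base < crashk_size:
        raise ValueError("crashkernel area overlaps low memory")
    if crashk_base + crashk_size > len(memory):
        raise ValueError("crashkernel area exceeds memory")
    low = bytes(memory[:crashk_size])  # copy before overwriting
    memory[:crashk_size] = memory[crashk_base:crashk_base + crashk_size]
    memory[crashk_base:crashk_base + crashk_size] = low
    return crashk_base, crashk_size
```

With a 12-byte "memory" of b"PRODxxxxKDMP", base 8 and size 4, the call
leaves the kdump image at address 0 and the production data at the old
crashkernel location.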

There are different ways in which the kdump entry point can be called
on s390, depending on the settings (/sys/firmware) for "panic" and
"restart" (something like NMI on Intel). If you do not want the more
reliable two-stage approach with the stand-alone dump tools, it is also
possible for the entry point to be called directly via machine_kexec().
In this case the crashed kernel does the s390-specific checksum test and
then jumps to <crashk base> + 0x10008.

The more reliable solution with our stand-alone dump tools works as
follows:
1. Stand-alone dump disk or tape is IPLed (booted). This can be done
either automatically via the panic() kernel code path or manually by the
operator of the (virtual) machine.
2. The stand-alone dump tool (mini OS) is loaded at address 0x2000,
below 0x10000. This area is always free and is not used by Linux on s390.
3. The stand-alone dump tool finds the crashkernel memory via meminfo,
which is located via a pointer at address 0xe14.
4. The stand-alone dump tool verifies the kdump checksums and jumps to
<crashk base> + 0x10008 if everything is ok.
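
The decision made in steps 3 and 4 can be sketched roughly as below. This
is only a model: the real meminfo layout and checksum algorithm are not
described in this mail, so the dict keys, the use of SHA-256 and the
function name are hypothetical. The "full_dump" fallback corresponds to
the "dump-all" case mentioned earlier in the thread.

```python
import hashlib

KDUMP_ENTRY_OFFSET = 0x10008  # entry point offset quoted above

def choose_dump_path(memory, meminfo):
    """Return ("kdump", entry_address) or ("full_dump", None).

    `meminfo` is a hypothetical dict with "base", "size" and "digest"
    keys describing the crashkernel area and its expected checksum.
    """
    base, size = meminfo["base"], meminfo["size"]
    image = bytes(memory[base:base + size])
    # Start kdump only if the whole image is present and unmodified.
    if len(image) == size and hashlib.sha256(image).digest() == meminfo["digest"]:
        return ("kdump", base + KDUMP_ENTRY_OFFSET)
    # Otherwise fall back to dumping all memory with the stand-alone tool.
    return ("full_dump", None)
```

Keeping the check in the stand-alone tool means the fallback is still
available when the kdump image has been overwritten.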

Michael








^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-08 13:04         ` Michael Holzheu
@ 2011-07-11 15:36           ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-11 15:36 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: oomichi, linux-s390, mahesh, heiko.carstens, linux-kernel, hbabu,
	horms, ebiederm, schwidefsky, kexec

On Fri, Jul 08, 2011 at 03:04:03PM +0200, Michael Holzheu wrote:
> Hello Vivek,
> 
> On Thu, 2011-07-07 at 15:33 -0400, Vivek Goyal wrote:
> > > Another advantage is
> > > that since it is different code, it is much less likely that the dump
> > > tool will run into the same problem than the previously crashed kernel.
> > 
> > I think in practice this is not really a problem. If your kernel
> > is not stable enough to even boot and copy a file, then most likely
> > it has not even been deployed. The very fact that a kernel has been
> > up and running verifies that it is a stable kernel for that machine
> > and is capable of capturing the dump.
> 
> I don't want to argue, about probabilities. Even if we gain only a
> little more reliability this is important for us. Don't forget that we
> write software for mainframes. We accept that the last 0.1 percent of
> reliability can be very expensive compared to the first 99.9 percent.
> 
> [snip]
> 
> > > And last but not least, with the stand-alone dump tools you can
> > > dump early kernel problems which is not possible using kdump, because
> > > you can't dump before the kdump kernel has been loaded with kexec.
> > > 
> > 
> > That is one limitation but again if your kernel can't even boot,
> > it is not ready to ship and it is more of a development issue and
> > there are other ways to debug problems. So I would not worry too
> > much about it.
> 
> We worry about that. See the comment above regarding the 100 percent.
> 
> > On a side note, few months back there were folks who were trying
> > to enhance bootloaders to be able to prepare basic environment so
> > that a kdump kernel can boot even in the event of early first
> > kernel boot.
> 
> This is one more argument to create the ELF header in the 2nd kernel.
> With our approach loading the kdump kernel at boot time is almost
> trivial.

I think ELF header is just the way of passing some required information
from first kernel to second kernel. In second kernel, we anyway prepare
fresh headers for /proc/vmcore.

So in your mechanism if you don't need any info from second kernel it
is fine to not use ELF. But if you do need, then it makes sense to
use existing mechanism instead of creating a new one (seems to be
meminfo in your case).

I think at the end of the day it would not matter much whether kexec-tools
created those headers or boot loader did. But there are advantages to
doing things in kexec-tools.

- A user space is fully booted and it provides scope for enhancements and
  intelligent things.

	- Depending on dump target a user can filter out some of the
	  modules from kdump ramdisk and reduce the size of memory
	  required. With a pure bootloader approach, I guess one will
	  do the change, generate a new initrd and then reboot the
	  system.

	  With kexec-tools it is just a matter of regenerating initrd
	  and reloading the kernel using kexec system call.

	  So we avoid extra reboot.

This is just one of the arguments. I think the key question here is
whether the bootloader should do whatever kexec-tools is doing, in order
to serve the case of early crash.

IMHO, I am not too concerned about early crash at this point of time for
the simple reason that you can't even deploy a kernel which can't boot.
This is a developer environment issue and not a customer deployment
scenario. But other people of course might have different requirements.

So to cater to those requirements, I think it is fine that bootloader
does what kexec-tools is doing. Load kdump kernel, tell first kernel
about it, load purgatory (which enables transition between two kernels,
does checksums, sets up right page tables etc). Looks like s390
wants to take this path, I guess it is fine as long as it is clear
from the patches.

> 
> Example (e.g. crashkernel=xxxM@256M):
> 
> 1. The boot loader loads standard kernel and kdump kernel into memory.
> The kdump kernel is loaded into crashkernel memory to 256M. No more
> setup (e.g. creating ELF headers) is necessary.
> 2. We could add a kernel parameter "kexec_load=<segm addr>,<segm
> size>, ..." that does an internal kexec_load(). After this kernel
> parameter is processed, kdump is armed.

I think I am not worried about kexec_load() as such. I am just trying
to understand the theme of the patchset and a mixed approach of using
kexec-tools as well as using boot loader is confusing me. 

I am still trying to figure out what is short term plan and what is
long term and whether you are going for kexec-tools as bootloader
for loading kdump kernel approach or s390 boot loader loading second
kernel approach.

> 
> What do you think?
> 
> > > That were more or less the arguments, why we did not support kdump in
> > > the past.
> > > 
> > > In order to increase dump reliability with kdump, we now implemented a
> > > two stage approach. The stand-alone dump tools first check via meminfo,
> > > if kdump is valid using checksums. If kdump is loaded and healthy it is
> > > started. Otherwise the stand-alone dump tools create a full-blown
> > > stand-alone dump.
> > 
> > kexec-tools purgatory code also checks the checksum of loaded kernel
> > and other information and next kernel boot starts only if nothing
> > has been corrupted in first kernel. 
> 
> Can you point me to the code where this is done and from where in the
> kernel that code is called? Currently with our implementation we do not
> use any purgatory code from kexec tools.

kexec-tools/purgatory/purgatory.c (verify_sha256_digest()).

> > and need of checksums sounds unnecessary. I think what you do need is
> > that somehow invoking second hook (s390 specific stand alone kernel)
> > in case primary kernel is corrupted.
> > > 
> > > With this approach we still keep our s390 dump reliability and gain the
> > > great kdump features, e.g. distributor installer support, dump filtering
> > > with makedumpfile, etc.
> > > 
> > > > why the existing
> > > > mechanism of preparing ELF headers to describe all the above info
> > > > and just passing the address of header on kernel command line
> > > > (crashkernel=) will not work for s390. Introducing an entirely new
> > > > infrastructure for communicating the same information does not
> > > > sound too exciting.
> > > 
> > > We need the meminfo interface anyway for the two stage approach. The
> > > stand-alone dump tools have to find and verify the kdump kernel in order
> > > to start it.
> > 
> > kexec-tools does this verification already. We verify the checksum of
> > all the loaded information in reserved area. So why introduce this
> > meminfo interface.
> 
> Ok, where is this done and when?

kexec-tools prepares a binary shim (we call it purgatory) which is loaded
into the kernel using the kexec system call. After a system crash, control is
passed to this purgatory, which verifies the checksums of all the loaded
segments and jumps to the entry point of the second kernel.

verify_sha256_digest() is the function which does all the verification
and loops forever if checksums don't match.

> 
> > > Therefore the interface is there and can be used. Also
> > > creating the ELF header in the 2nd kernel is more flexible and easier
> > > IMHO:
> > > * You do not have to care about memory or CPU hotplug.
> > 
> > Reloading the kernel upon memory or cpu hotplug should be trivial. This
> > does not justify to move away from standard ELF interface and creation
> > of a new one.
> > 
> > > * You do not have to preallocate CPU crash notes etc.
> > 
> > Its a small per cpu area. Looks like otherwise you will create meminfo
> > areas otherwise.
> > 
> > > * It works independently from the tool/mechanism that loads the kdump
> > > kernel into memory. E.g. we have the idea to load the kdump kernel at
> > > boot time into the crashkernel memory (not via the kexec_load system
> > > call). That would solve the main kdump problems: The kdump kernel can't
> > > be overwritten by I/O and also early kernel problems could then be
> > > dumped using kdump.
> > 
> > Can you give more details how exactly it works. I know very little about
> > s390 dump mechanism.
> 
> Maybe I confused you here. What I wanted to describe is the following
> idea:
> 1. The running production kernel starts with "crashkernel=" and reserves
> memory for kdump. No kdump is loaded with kexec.
> 2. The system crashes
> 3. To create the dump, a prepared dump disk is booted. The boot loader
> loads the kdump kernel into crashkernel memory.
> 4. The boot loader starts kdump kernel on s390 with entry point
> <crashkernel base> + 0x10008
> 5. The kdump kernel creates ELF header etc...
> 
> So this is simple for the boot loader code because no preparation steps
> like creating the ELF header are required. This is similar to scenario
> of pre-loading the kdump kernel together with the standard kernel at
> startup that I described above.
> 
> > 
> > When do you load kdump kernel and who does it?
> 
> Currently we load the kdump kernel with kexec like it is done on all
> other architectures. The other options I described above are currently
> just ideas that we have for the future.

So bootloader doing everything is a future idea and for the time being we
still use kexec_load() for loading the kernel? If yes, then we can stop
worrying about the early crash kernel case till you implement the future idea?

In fact, if the kdump kernel is not loaded, your existing mechanism of
IPLing stand alone tools should work as is without any modifications,
shouldn't it? This does not provide you filtering capability in early
crash but does retain the ability to capture dumps.

> 
> > Who gets the control first after crash?
> > 
> > To me it looked like that you regularly load kdump kernel and if that
> > is corrupted then somehow you boot standalone kernel. So corruption
> > of kdump kernel should not be a issue for you.
> 
> As Martin already said: It can be the other way round. The stand-alone
> dump tool gets first control. We trust this code because it is freshly
> loaded and has a different code base.

I am not sure having a different code base means more reliability or
less reliability. It might also mean less tested and less reliable
code. But anyway, I will not get into that debate as things have
been working for you.

> This code verifies the kdump setup
> and jumps into the pre-loaded kdump (crashkernel base + 0x10008) if
> everything is ok. Otherwise it creates a traditional s390 dump.

Ok, so the code which does the verification and takes the decision of
either booting kdump kernel or stand alone kernel is part of dump tools?
Is it loaded fresh into memory after crash and who does that?

If you are going for a kexec-tools based approach, then as I said in the
previous mail, it looks like you can just create an s390 specific purgatory
and just reuse the infrastructure for checksum verification. You
just need to do a little enhancement so that if the kdump kernel is
corrupted, you jump to the code which loads the s390 stand alone kernel
instead of looping forever.
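
A minimal sketch of that decision, under stated assumptions: struct segment,
choose_next_stage() and the enum values are hypothetical names, and the
checksum is a 32 bit FNV-1a stand-in used only to keep the example short.
The purgatory in kexec-tools uses SHA-256 (verify_sha256_digest()), not this
function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in checksum (FNV-1a, 32 bit); kexec-tools purgatory uses SHA-256. */
uint32_t toy_checksum(const uint8_t *buf, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return h;
}

/* Hypothetical description of one segment loaded into crashkernel memory. */
struct segment {
	const uint8_t *start;
	size_t len;
	uint32_t expected;	/* checksum recorded at load time */
};

enum next_stage { BOOT_KDUMP, BOOT_STANDALONE_DUMPER };

/* Pick kdump only if something is loaded and every segment still matches. */
enum next_stage choose_next_stage(const struct segment *segs, size_t n)
{
	assert(segs || n == 0);
	if (n == 0)
		return BOOT_STANDALONE_DUMPER;	/* nothing loaded */
	for (size_t i = 0; i < n; i++) {
		if (toy_checksum(segs[i].start, segs[i].len) != segs[i].expected)
			return BOOT_STANDALONE_DUMPER;	/* corrupted: fall back */
	}
	return BOOT_KDUMP;
}
```

The point of the sketch is only the control flow: a checksum mismatch selects
the fallback path rather than spinning.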

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-11 14:42           ` Vivek Goyal
@ 2011-07-11 15:56             ` Martin Schwidefsky
  -1 siblings, 0 replies; 112+ messages in thread
From: Martin Schwidefsky @ 2011-07-11 15:56 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Michael Holzheu, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Mon, 11 Jul 2011 10:42:55 -0400
Vivek Goyal <vgoyal@redhat.com> wrote:

> On Fri, Jul 08, 2011 at 11:01:21AM +0200, Martin Schwidefsky wrote:
> 
> [..]
> > > 
> > > kexec-tools purgatory code also checks the checksum of loaded kernel
> > > and other information and next kernel boot starts only if nothing
> > > has been corrupted in first kernel. So this additional meminfo structures
> > > and need of checksums sounds unnecessary. I think what you do need is
> > > that somehow invoking second hook (s390 specific stand alone kernel)
> > > in case primary kernel is corrupted.
> > 
> > Yes, but what do you do if the checksum tells you that the kexec kernel
> > has been compromised? If the independent stand-alone dumper does the
> > check it can fall back to the "dump-all" case.
> 
> So this independent dump (which takes the decision whether to continue
> to boot kdump kernel or stand alone dumper) is loaded where?  On x86,
> every thing is loaded in crashkernel memory and at run time we update
> purgatory with entry point of kernel.

The dasd stand-alone dumper is loaded into the first 64KB of memory that
is specifically left unused for a tool like that. It is our "if all breaks
use that one" dumper. It is written in assembler and is really small,
currently its size is 8KB.

> I guess you could write s390 specific purgatory code where you do
> the checksum on loaded kdump kernel and if it corrupted, then you
> can continue to jump to boot stand alone kernel.

Basically yes; in the current implementation it is a pre-purgatory piece
of code that is included in the dumper. After the checksums turn out
ok it branches to the purgatory code.

> BTW, you seem to have capability of doing IPL of stand alone kernel
> from disk/tape after kernel crash. If yes, then why not IPL the
> regular linux kernel in case its copy in memory is corrupted.

We played with the idea to load the kdump kernel into its designated
area and make the startup code recognize this situation. It then swaps
the memory starting at zero with the kdump memory area. We need to do
that because on s390 all kernels start at zero (and that is not easy
to change). The trouble here is that we would have to set up a boot
disk with the exact kdump area memory address for each system. 
This is a) error prone and b) our customers are used to having a single
dump device for all their servers on a single system.
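
To picture the swap: the startup code would exchange the memory starting at
zero with the kdump memory area. The sketch below models memory as a plain
byte array; swap_regions() and its parameters are hypothetical and say
nothing about how the real startup code would be written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Exchange [0, len) with [crash_base, crash_base + len) inside mem.
 * The two ranges must not overlap, so crash_base >= len is required.
 */
void swap_regions(uint8_t *mem, size_t crash_base, size_t len)
{
	assert(mem);
	assert(crash_base >= len);
	for (size_t i = 0; i < len; i++) {
		uint8_t tmp = mem[i];

		mem[i] = mem[crash_base + i];
		mem[crash_base + i] = tmp;
	}
}
```

Note that crash_base is the value that would have to be configured per
system, which is the setup problem described above.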

> What happens if kdump kernel is not corrupted and later it fails to boot
> due to some platform issue or device driver issue etc? I am assuming
> that dump capture will fail. If yes, then backup mechanism is designed
> only to protect against kdump kernel's corruption while loaded in
> memory?

The kdump kernel is limited in its functionality. Yes it is bigger than
the stand-alone dumper but it is still very small compared to a production
system. As in 256 MB of memory, one disk, probably a single network
connection. The likelihood of a failure is indeed bigger compared to the
stand-alone dumper, you basically trade a bit of reliability for advanced
functions (like dump to network) and speed thanks to filtering. With the
checksum the reliability should be really good though.

> In Michael's doc, I noticed he talked about unmapping the crashkernel
> memory so that kernel. That should protect against kernel but he
> mentioned about the possibility of device being able to DMA to said
> memory region. I am wondering that is it possible to program IOMMU
> in such a way that any DMA attempt to said memory region fails. If
> yes, then I guess corruption problem will be solved without one
> being worried about creating a backup plan for stand alone kernel and
> one can just focus on making kdump kernel work.

We unmap the crashkernel from the kernel address space to make it harder
to corrupt the kdump kernel with a wild pointer. The only way the
crashkernel can then go bad is via DMA. I/O addresses are absolute, so with a
bad address you can overwrite any piece of memory. That's why we want that
check-summing mechanism before passing control to anything that has been
in memory at the time of the crash.

> > 
> > > > 
> > > > With this approach we still keep our s390 dump reliability and gain the
> > > > great kdump features, e.g. distributor installer support, dump filtering
> > > > with makedumpfile, etc.
> 
> So reliability only comes from the fact that stand alone kernel is booted
> from the disk? So as long as kdump kernel is not corrupted, it is as
> reliable as stand alone kernel?

Yes, reliability comes from a fresh IPL/boot. It resets everything to a sane
state and we can then collect the memory content. As long as you don't get
fancy with kdump (e.g. with dump to network), a checksum verified kdump
kernel should be close to the stand-alone dumper as far as reliability is
concerned.

> How many a time in practice we have run into kdump kernel corruption
> issues? Will unmapping from kernel page tables and doing something at
> IOMMU level not take care of that issue?

Well as we do not use kdump at customer sites yet we do not have a lot of
practice with it. But we did have a few real cases with broken I/O going
on a rampage.

> > > > 
> > > > > why the existing
> > > > > mechanism of preparing ELF headers to describe all the above info
> > > > > and just passing the address of header on kernel command line
> > > > > (crashkernel=) will not work for s390. Introducing an entirely new
> > > > > infrastructure for communicating the same information does not
> > > > > sound too exciting.
> > > > 
> > > > We need the meminfo interface anyway for the two stage approach. The
> > > > stand-alone dump tools have to find and verify the kdump kernel in order
> > > > to start it.
> 
> kexec-tools purgatory code already has the checksum logic. So you don't
> have to redo that in stand alone tools. I think you probably need an
> s390 specific purgatory and jump to IPLing stand alone kernel if kdump
> kernel is corrupted instead of rebooting back or spinning infinitely
> in the loop.

I can not quite follow you here. The purgatory code is part of the kdump kernel,
no? When we trigger a dump with the stand-alone tools we will start executing
code in the assembler function of those stand-alone tools. We can not trust
the kdump kernel yet, not without doing the checksums first.

> > > 
> > > kexec-tools does this verification already. We verify the checksum of
> > > all the loaded information in reserved area. So why introduce this
> > > meminfo interface.
> > 
> > Again, what do you do if the verification fails? Fail to dump the borked
> > system? Imho not a good option.
> 
> On regular systems we did not have any backup plan so IIRC, we spin in
> infinite loop. 

Even worse, going to an infinite loop is VERY bad. One of the things we will
do after the checksum of the kdump kernel turned out ok is to write to some
field in the kdump kernel to invalidate the checksum. If we crash again the
stand-alone dumper will find the checksum to be bad the second time around.
No infinite loop here.
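
A small sketch of that one-shot behaviour, with heavy simplification: the
thread only says that some field in the kdump kernel is written so the
checksum no longer matches. Here that is modelled by overwriting a stored
checksum in a hypothetical struct kdump_state; the names and the layout are
assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one pre-loaded kdump kernel. */
struct kdump_state {
	uint32_t stored;	/* checksum recorded when kdump was loaded */
	uint32_t computed;	/* checksum recomputed by the stand-alone dumper */
};

/*
 * Return true if kdump may be started. On success the stored checksum is
 * invalidated, so a second crash finds a mismatch and falls back to the
 * stand-alone dump instead of retrying kdump.
 */
bool try_start_kdump(struct kdump_state *st)
{
	assert(st);
	if (st->stored != st->computed)
		return false;
	st->stored = ~st->computed;	/* can never equal computed again */
	return true;
}
```

The first call succeeds for a healthy kdump; any later call with the same
state fails, which is what rules out the loop.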

> If one can do something about it, fine. But this again takes me back to
> original question, then instead of creating backup plan, why not IPL
> the kdump kernel from disk/tape the way you do for stand alone kernels.

As outlined above it is basically a setup issue.

> > 
> > > > Therefore the interface is there and can be used. Also
> > > > creating the ELF header in the 2nd kernel is more flexible and easier
> > > > IMHO:
> > > > * You do not have to care about memory or CPU hotplug.
> > > 
> > > Reloading the kernel upon memory or cpu hotplug should be trivial. This
> > > does not justify to move away from standard ELF interface and creation
> > > of a new one.
> > 
> > We do not move away from the ELF interface, we just create the ELF headers
> > at a different time, no?
> 
> Existing kernel already provides a way to communicate relevant information
> to new kernel/binary about the first kernel and that is through ELF. You
> are moving away from that and creating one more interface, meminfo to
> get all the info about first kernel. What's wrong with continue parsing
> ELF to get all the needed info. Is there any piece of information missing
> which you require?

I'll have to discuss this with Michael once more. 
 
> > 
> > > > * You do not have to preallocate CPU crash notes etc.
> > > 
> > > Its a small per cpu area. Looks like otherwise you will create meminfo
> > > areas otherwise.
> > 
> > Probably doesn't matter.
> > 
> > > > * It works independently from the tool/mechanism that loads the kdump
> > > > kernel into memory. E.g. we have the idea to load the kdump kernel at
> > > > boot time into the crashkernel memory (not via the kexec_load system
> > > > call). That would solve the main kdump problems: The kdump kernel can't
> > > > be overwritten by I/O and also early kernel problems could then be
> > > > dumped using kdump.
> 
> So looks like you are loading two kernels at a time. One primary kernel
> and other kernel in crashkernel memory area. But that would solve only
> early crash dump problem and not the corruption problem?

That would solve both the early crash dump problem and the I/O corruption
problem, if we knew where we can load the kdump kernel.
Remember: one stand-alone dump disk for all the servers in a server farm.
They might have different kdump memory area addresses.

> I think we are trying to solve multiple problems at one go. We want
> the regular capability to boot a kdump kernel and also solve the problem
> of early boot crash.
> 
> Why not solve the bigger problem in first step (and that is capturing
> filtered dump of big RAM systems fast) and do the integration with
> regular kexec-tools (create ELF headers etc) and s390 specific purgatory
> code. 

Consider that problem solved. kdump support for s390 is just around the
corner.

> Once all this is done, then you can look at how to capture early 
> kernel crashes (if it turns out to be a real problem).

The patches to solve the early kernel crash / I/O corruption are on the table.
It is just the order of the patches in the set we are talking about, no?

> > > 
> > > Can you give more details how exactly it works. I know very little about
> > > s390 dump mechanism.
> > 
> > Before we started working on kdump the only way to get a dump was to boot
> > a stand-alone dumper. That is a small piece of assembler code that is
> > loaded into the first 64KB of memory (which is reserved for this kind of
> > thing). This assembler code will then write everything to the dump device.
> > This works very reliably (which is of utmost importance to us) but has the
> > problem that it will be awfully slow for large memory sizes.
> 
> When and who loads this assembler code into memory and how do we make
> sure this code is not corrupted.

A fresh IPL / boot does that.

> I got the part about being slow because you have to write specific
> drivers for saving dump and you don't have filtering capability. In
> today's big memory systems it makes sense to reuse kdump's capability
> to use first kernel's drivers and filtering in user space.

That is exactly what we are trying to achieve.
 
> >  
> > > When do you load kdump kernel and who does it?
> > 
> > If the crashed kernel is still operational enough to call panic it can
> > cause an IPL to the stand-alone dump tool (or do a reset of the I/O
> > subsystem and directly call kdump with the new code if the checksums
> > turn out ok).
> > If the crashed kernel is totally bust then the administrator has to do
> > a manual IPL from the disk where the stand-alone dumper has been installed.
> >  
> > > Who gets the control first after crash?
> > 
> > Depends. If the kernel can recognize the crash as such it can proceed to
> > execute the configured "on_panic" shutdown action. If the kernel is bust
> > the code loaded by the next IPL gets control. This can be a "normal" boot
> > or a stand-alone dumper.
> > 
> > > To me it looked like you regularly load the kdump kernel and if that
> > > is corrupted then somehow you boot the standalone kernel. So corruption
> > > of the kdump kernel should not be an issue for you.
> > 
> > It is the other way round. We load the standalone dumper, then check if
> > the kdump kernel looks good. Only if all the checksums turn out ok we
> > jump to the purgatory code from the standalone dump code.
> 
> Ok. So again why not reuse the checksum capability of kexec-tools and
> instead of infinite looping you can jump to stand alone tools + IPL etc.
> I understand this will require a tighter integration with kexec-tools
> and using ELF header mechanism and will not cover the early kernel
> crashes.

Imho the checksum of kexec-tools is in the wrong place.

> > 
> > > Do you load the kdump kernel from some tape/storage after system crash? Where
> > > does the bootloader lie and how do you make sure it is not corrupted and
> > > the associated device is in good condition?
> > 
> > The bootloader sits on the boot disk / tape. If you are able to boot from
> > that device then it is reasonable to assume that the device is in good
> > condition. To get a corrupted bootloader you'd need a stray I/O to that
> > device. The stand-alone dumper sits on its own disk / tape which is not in
> > use for normal operation. Very unlikely that this device will get hit.
> >  
> > > To me we should not create an arch-specific way of passing information
> > > between kernels. Stand alone kernel should be able to parse the
> > > ELF headers which contain all the relevant info. They have already
> > > been checksum verified.
> > 
> > Ok, so this seems to be the main point of discussion. When to create the
> > ELF headers and how to pass all the required information from the crashed
> > system to the kdump kernel.
> 
> To me we seem to be diverging a lot from the existing kdump+kexec-tools
> mechanism just to solve the case of early crash dumping. If we break
> the problem down into two parts and do things the kexec-tools way (with a
> backup path of booting stand alone kernel if kdump kernel is corrupted),
> things might be better.

The "backup path of booting stand alone kernel" would result in passing
the control twice, once from the stand-alone dumper to the kexec purgatory
(after the purgatory checksum has been verified), then doing more checks 
in the kdump kernel, only to return to the stand-alone dumper if some check
fails. Does not really sound enticing to me.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-09 17:58       ` Valdis.Kletnieks
@ 2011-07-12 13:52           ` Vivek Goyal
  0 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-12 13:52 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Michael Holzheu, ebiederm, hbabu, mahesh, oomichi, horms,
	schwidefsky, heiko.carstens, kexec, linux-kernel, linux-s390

On Sat, Jul 09, 2011 at 01:58:19PM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Thu, 07 Jul 2011 15:33:21 EDT, Vivek Goyal said:
> > On Wed, Jul 06, 2011 at 11:24:47AM +0200, Michael Holzheu wrote:
> 
> > > S390 stand-alone dump tools are independent mini operating systems that
> > > are installed on disks or tapes. When a dump should be created, these
> > > stand-alone dump tools are booted. All that they do is to write the dump
> > > (current memory plus the CPU registers) to the disk/tape device.
> > > 
> > > The advantage compared to kdump is that since they are freshly loaded
> > > into memory they can't be overwritten in memory.
> > 
> > > Another advantage is
> > > that since it is different code, it is much less likely that the dump
> > > tool will run into the same problem as the previously crashed kernel.
> > 
> > I think in practice this is not really a problem. If your kernel
> > is not stable enough to even boot and copy a file, then most likely
> > it has not even been deployed. The very fact that a kernel has been
> > up and running verifies that it is a stable kernel for that machine
> > and is capable of capturing the dump.
> 
> Vivek: I used to do VM/XA on S/390 boxes for a living, and that's *not* where
> Michael is coming from.
> 
> What the standalone dump code does is take a system that may have the moral
> equivalent of 256 separate PCI buses, several hundred disks all visible in
> multipath configurations, dozens of other devices, and as long as you can find
> *one* console and *one* tape/disk drive that works, you can capture a dump.

IIUC, capturing a dump in a virtualized environment is much easier as
software is not completely dead and the hypervisor is still running. For
example, qemu can easily capture a memory snapshot of the VM once it
is hung, reliably in all situations. The issue becomes manageability with
filtering across various kernel versions and operating systems inside the VM.
Hence kdump for linux is being deployed even in virtualized environments.

I guess using stand alone dump tools is very similar to qemu dump in terms
of reliability but lacks filtering capabilities and is limited to specific
devices. That way qemu is much more powerful.

> 
> More than once in my career, I got into a situation where the production system
> would hang - and booting off another disk that contained an older copy with
> maybe a few less patches would *also* hang.  VM/XA would simply *not run*.
> Booting the standalone dump utility (which shared zero code with VM/XA, and did
> *much* less initialization of I/O devices not needed for the actual dump) would
> work just fine.  This would get me a dump that would show that we had a
> (usually) hardware issue - either we were tripping over an errata that *no*
> released version of VM/XA had a workaround for, or outright defective hardware.

Can we not achieve almost the equivalent of it by loading only very selective
modules in the second kernel?

If not, one can always use qemu-kvm dump capability with kvm hypervisor if
kdump does not work. It will be a manual operation though like s390 stand
alone dump utility.

So the point is that I am fine with stand alone dump utility capturing
the dump. Just keep it as backup plan if kdump does not work. Also for
early crashes kdump will not work and stand alone dump utility will be
the primary plan to capture the dump.

In the above example, are you saying that your production kernel, which used
to boot in the past on the same system, does not even boot now (because of
some bad hardware state)?

> 
> For the same efficiency reasons that Linux doesn't do a lot of checking for
> "can never happen" cases, VM/XA doesn't check some things. So when busted
> hardware would present logically impossible combinations of status bits (for
> instance, "device still connected" but "I/O bus disconnected"), Bad Things
> would happen.  Booting a tiny dump program that never even *tried* to look at
> the bad bits posted by the miscreant hardware would allow you to get the info
> you needed to debug it.

Ok, maybe. I am not saying don't use the stand alone dump utility for
severe hardware issues. I am just saying that a closer integration with
the kexec infrastructure, as on other architectures, would be better. We
probably do not require any common code changes except a custom purgatory for
s390 to IPL stand alone utilities.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-11 15:36           ` Vivek Goyal
@ 2011-07-12 17:29             ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-12 17:29 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: oomichi, linux-s390, mahesh, heiko.carstens, linux-kernel, hbabu,
	horms, ebiederm, schwidefsky, kexec

Hello Vivek,

On Mon, 2011-07-11 at 11:36 -0400, Vivek Goyal wrote:
> > > On a side note, few months back there were folks who were trying
> > > to enhance bootloaders to be able to prepare basic environment so
> > > that a kdump kernel can boot even in the event of early first
> > > kernel boot.
> > 
> > This is one more argument to create the ELF header in the 2nd kernel.
> > With our approach loading the kdump kernel at boot time is almost
> > trivial.
> 
> I think ELF header is just the way of passing some required information
> from first kernel to second kernel. In second kernel, we anyway prepare
> fresh headers for /proc/vmcore.
> 
> So in your mechanism if you don't need any info from second kernel it
> is fine to not use ELF. But if you do need, then it makes sense to
> use existing mechanism instead of creating a new one (seems to be
> meminfo in your case).

Ok fine. Let's concentrate on the information that we have to pass from
the old to the new kernel. We have two ways to start the dump mechanism
to consider. First, direct call to kdump from the crashed system and
second, the detour via the stand-alone dump. In both cases we need the
following information from the old kernel:
* Pointer to vmcoreinfo
* Pointer to reboot (re-IPL) information (s390 specific)
* Boot CPU registers

The vmcoreinfo pointer is required for creating the vmcoreinfo ELF note
that is used afterwards by tools like makedumpfile.

The reboot information is required to ensure that a reboot of the kdump
kernel will restart the original production system.

The boot CPU registers are needed for the ELF CPU note of the IPL CPU. 

CPU registers of non-boot CPUs and the memory layout can be determined
in the 2nd kernel on s390.
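
For illustration only, the three items listed above could be pictured as one
small fixed-layout record. The sketch below is Python rather than kernel code.
The name HandoffInfo, the field order, and the restriction to the 16 general
purpose registers are assumptions made for the example; this is not the layout
used by the patches, and a real register save area holds more than the GPRs.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

# Hypothetical layout: two 64-bit addresses followed by sixteen 64-bit
# general purpose registers, stored big-endian (s390 is big-endian).
_FMT = ">QQ16Q"


@dataclass(frozen=True)
class HandoffInfo:
    """Illustrative record of what the old kernel hands to the new one."""

    vmcoreinfo_addr: int             # pointer to vmcoreinfo
    reipl_info_addr: int             # pointer to re-IPL information
    boot_cpu_gprs: Tuple[int, ...]   # boot CPU general purpose registers

    def pack(self) -> bytes:
        """Serialize the record into its fixed-size binary form."""
        if len(self.boot_cpu_gprs) != 16:
            raise ValueError("expected 16 general purpose registers")
        return struct.pack(
            _FMT, self.vmcoreinfo_addr, self.reipl_info_addr, *self.boot_cpu_gprs
        )

    @classmethod
    def unpack(cls, raw: bytes) -> "HandoffInfo":
        """Rebuild the record from the bytes produced by pack()."""
        fields = struct.unpack(_FMT, raw)
        return cls(fields[0], fields[1], tuple(fields[2:]))
```

Whatever the real layout is, the point is the same: the record is small and
fixed, so the second kernel can read it without help from the first.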

Now let's see how we can transfer that information for the two cases we
have:

Case 1: Direct call via panic()

More or less we could do it the same way as on x86. The kexec tool
prepares the ELF header with ELF notes for vmcoreinfo, s390 reboot
information, ELF loads for the memory areas, and the containers for the
CPU notes. Panic writes the CPU registers to the prepared location and
jumps to purgatory code. Purgatory code starts the loaded kdump kernel with
"elfcorehdr=" parameter.

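The last step of case 1, handing the ELF header address to the kdump kernel on
its command line, can be sketched as follows. This is an illustration in
Python, not purgatory code. The helper name with_elfcorehdr is hypothetical,
and passing only the physical address in hex is an assumption for the example.

```python
def with_elfcorehdr(cmdline: str, elfcorehdr_addr: int) -> str:
    """Return cmdline with exactly one elfcorehdr= parameter appended.

    Any elfcorehdr= token already present is dropped first, so the kdump
    kernel sees only the address of the freshly prepared ELF header.
    """
    kept = [tok for tok in cmdline.split() if not tok.startswith("elfcorehdr=")]
    kept.append(f"elfcorehdr={elfcorehdr_addr:#x}")
    return " ".join(kept)
```
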
Case 2: Indirect call via stand-alone dump

When the stand-alone dump is started, it knows nothing about the crashed
system. We need to pass at least the address of the kdump entry point
and the address of the ELF header at a well defined location in order to
start kdump from the stand-alone dump tool. So I think we still need
something like meminfo.

To convert case 2 to the ELF header approach, we now would need to do
something like the following in the stand-alone tools code:
* Verify that kdump kernel is present.
* Save all non-boot CPU registers and then copy the registers of all
CPUs to the prepared ELF Notes. To do that the tools need to parse the
ELF header and to find the location of the required ELF notes.
* Call purgatory entry point.
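
To give an idea of the parsing step, here is a minimal sketch in C (the
real stand-alone tools are written in assembler, so this is for
illustration only). It assumes a 64-bit ELF header in a flat buffer and
uses the standard <elf.h> definitions. The function name
find_note_offset is hypothetical and not part of any existing tool.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the offset of the first PT_NOTE segment described by the ELF
 * header at hdr, or 0 if the header is invalid or has no note segment.
 */
static Elf64_Off find_note_offset(const unsigned char *hdr, size_t len)
{
	Elf64_Ehdr eh;
	Elf64_Phdr ph;
	size_t off;
	unsigned int i;

	if (len < sizeof(eh))
		return 0;
	memcpy(&eh, hdr, sizeof(eh));
	if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
		return 0;
	if (eh.e_phentsize != sizeof(ph) || eh.e_phoff > len)
		return 0;

	for (i = 0; i < eh.e_phnum; i++) {
		off = (size_t)eh.e_phoff + i * sizeof(ph);
		if (len - off < sizeof(ph))
			return 0;	/* program header table is truncated */
		memcpy(&ph, hdr + off, sizeof(ph));
		if (ph.p_type == PT_NOTE)
			return ph.p_offset;
	}
	return 0;
}
```

Even this simple walk needs bounds checks on every step, because the
header lives in memory of the crashed system and cannot be trusted.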

We cannot trust anything in memory including the purgatory code. To
verify that the purgatory code is unmodified, we need the address and
the length of purgatory together with the checksum.

The s390 reboot information is *already* stored at a well defined
location that is used today by the stand-alone dump tools to reboot the
production system after dump (independent from kdump). This information
is protected by a checksum as well and is needed for the backup case
reboot, if we do not have a pre-loaded kdump or the purgatory checksum
fails.

In the following I describe the changes that (I think) I have to do, if
we switch to the ELF header communication.

1st kernel (crashed production system)
--------------------------------------
* Add information about kdump/purgatory entry point, address of ELF
header, purgatory start, length and checksum at some well defined
address so that stand-alone dump tools can find it.
* Communicate re-IPL block via ELF header:
  - Either new ELF note: Add /sys/kernel/s390_reboot_info with
    address of re-IPL info block 
  - Or perhaps add re-IPL block pointer to vmcoreinfo
* Fill CPU registers into ELF notes at crash time and call purgatory
* If purgatory returns, stop machine.

kexec tools
-----------
* Create and load ELF header + purgatory
* Create new ELF NOTE for s390 re-IPL info. Maybe not required,
  if we use vmcoreinfo.
* Change purgatory code:
  - Checksum failed: Return to caller instead of looping?
  - Checksum ok: jump to crashk base + 0x10008 and start kdump

2nd kernel (kdump)
------------------
* Prevent ELF header memory from being overwritten (how do we get the
ELF header size?)
* Parse ELF header and/or vmcoreinfo to get s390 re-IPL info

Stand-alone dump tools:
-----------------------
* Find ELF header, purgatory start/length/checksum, and kdump entry
point (meminfo?)
* Verify the purgatory.
* Parse ELF header and find location of pre-allocated ELF notes to store
CPU register sets.
* Jump to purgatory.
* If purgatory returns, write stand-alone dump.

Is that something that you had in mind? IMHO this does not eliminate the
need for something like meminfo. Also we have to consider that the
stand-alone dump tools are written in assembler and it is always hard to
add complex code here.

But perhaps I just can't see the forest for the trees and you have a
better idea?

Michael


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-11 15:56             ` Martin Schwidefsky
@ 2011-07-13 16:02               ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-13 16:02 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Michael Holzheu, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Mon, Jul 11, 2011 at 05:56:26PM +0200, Martin Schwidefsky wrote:

[..]
> > kexec-tools purgatory code already has the checksum logic. So you don't
> > have to redo that in stand alone tools. I think you probably need an
> > s390 specific purgatory and jump to IPLing stand alone kernel if kdump
> > kernel is corrupted instead of rebooting back or spinning infinitely
> > in the loop.
> 
> I can not quite follow you here. The purgatory code is part of the kdump kernel,
> no? When we trigger a dump with the stand-alone tools we will start executing
> code in the assembler function of that stand-alone tools. We can not trust
> the kdump kernel yet, not without doing the checksums first.

Purgatory is another piece of binary code which is loaded along with kdump
kernel in reserved memory area. So yes, there is a chance that this code
itself gets corrupted.

So in case of stand alone dump, you save the calculated checksum of
kdump kernel on disk and not in memory? And then calculate the checksum
of memory image of kdump kernel and decide whether kdump kernel is
corrupted or not?

If yes, this sounds more reliable as checksum of kernel is stored on
some disk/tape.

[..]
> > Ok. So again why not reuse the checksum capability of kexec-tools and
> > instead of infinite looping you can jump to stand alone tools + IPL etc.
> > I understand this will require a tighter integration with kexec-tools
> > and using ELF header mechanism and will not cover the early kernel
> > crashes.
> 
> Imho the checksum of kexec-tools is in the wrong place.

Because you think that stored checksum can get corrupted?

[..]
> > To me we seem to be diverging a lot from existing kdump+kexec-tools
> > mechanism just to solve the case of early crash dumping. If we break
> > down the problem in two parts and do things the kexec-tools way (with a
> > backup path of booting stand alone kernel if kdump kernel is corrupted),
> > things might be better.
> 
> The "backup path of booting stand alone kernel" would result in passing
> the control twice, once from the stand-alone dumper to the kexec purgatory
> (after the purgatory checksum has been verified), then doing more checks 
> in the kdump kernel, only to return to the stand-alone dumper if some check
> fails. Does not really sound enticing to me.

What I am suggesting is that stand alone dumper gets control only if
kdump kernel is corrupted.

So the sequence would be the following:

Kernel Crash ---> purgatory --> either kdump kernel/IPL stand alone tools

The only drawback here seems to be that we assume that the purgatory code
and the pre-calculated checksum have not been corrupted. The big advantage
is that s390 kdump support looks very similar to other arches, and
understanding and supporting kdump across architectures becomes easy.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-13 16:02               ` Vivek Goyal
@ 2011-07-13 16:46                 ` Martin Schwidefsky
  -1 siblings, 0 replies; 112+ messages in thread
From: Martin Schwidefsky @ 2011-07-13 16:46 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Michael Holzheu, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Wed, 13 Jul 2011 12:02:39 -0400
Vivek Goyal <vgoyal@redhat.com> wrote:

> On Mon, Jul 11, 2011 at 05:56:26PM +0200, Martin Schwidefsky wrote:
> 
> [..]
> > > kexec-tools purgatory code already has the checksum logic. So you don't
> > > have to redo that in stand alone tools. I think you probably need an
> > > s390 specific purgatory and jump to IPLing stand alone kernel if kdump
> > > kernel is corrupted instead of rebooting back or spinning infinitely
> > > in the loop.
> > 
> > I can not quite follow you here. The purgatory code is part of the kdump kernel,
> > no? When we trigger a dump with the stand-alone tools we will start executing
> > code in the assembler function of that stand-alone tools. We can not trust
> > the kdump kernel yet, not without doing the checksums first.
> 
> Purgatory is another piece of binary code which is loaded along with kdump
> kernel in reserved memory area. So yes, there is a chance that this code
> itself gets corrupted.

Yes, that is one of the possible failure scenarios.
 
> So in case of stand alone dump, you save the calculated checksum of
> kdump kernel on disk and not in memory? And then calculate the checksum
> of memory image of kdump kernel and decide whether kdump kernel is
> corrupted or not?
> 
> If yes, this sounds more reliable as checksum of kernel is stored on
> some disk/tape.

No, the checksum for the purgatory code is stored in memory. If the purgatory
code is corrupted, the checksum would have to be corrupted in a very specific
way as well for the verification to miss it. The likelihood of that happening
is very low, but if it does we still have a fallback plan: before we branch to
the purgatory code we invalidate the checksum. If the purgatory code has been
corrupted although the checksum told us that it is fine, the machine will crash
again. If we then start the stand-alone dump tool again, it will create a
full dump. But mind you, that second IPL of the stand-alone dump tool is only
required in a very, very rare case.
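
A minimal sketch of that fallback logic in C, for illustration only. The
names purgatory_info, simple_csum and dump_tool_decide are hypothetical,
and the byte-sum checksum is just a placeholder for whatever algorithm
is really used.

```c
#include <stddef.h>
#include <stdint.h>

enum dump_action { START_KDUMP, WRITE_FULL_DUMP };

/* Hypothetical record describing the loaded purgatory code. */
struct purgatory_info {
	const uint8_t *start;	/* address of purgatory */
	size_t len;		/* length of purgatory */
	uint32_t csum;		/* stored checksum */
};

/* Placeholder checksum: a plain 32-bit byte sum, not the real algorithm. */
static uint32_t simple_csum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum += p[i];
	return sum;
}

/* Decide what the stand-alone dump tool does after it has been IPLed. */
static enum dump_action dump_tool_decide(struct purgatory_info *pi)
{
	if (simple_csum(pi->start, pi->len) != pi->csum)
		return WRITE_FULL_DUMP;
	/*
	 * Invalidate the stored checksum before branching to purgatory.
	 * If the machine crashes again, a second IPL of the dump tool
	 * sees a mismatch and writes the full dump instead.
	 */
	pi->csum = ~pi->csum;
	return START_KDUMP;
}
```

Calling dump_tool_decide() twice on an intact purgatory models the rare
case: the first call selects kdump, the second one the full dump.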

> [..]
> > > Ok. So again why not reuse the checksum capability of kexec-tools and
> > > instead of infinite looping you can jump to stand alone tools + IPL etc.
> > > I understand this will require a tighter integration with kexec-tools
> > > and using ELF header mechanism and will not cover the early kernel
> > > crashes.
> > 
> > Imho the checksum of kexec-tools is in the wrong place.
> 
> Because you think that stored checksum can get corrupted?

No, what I meant is that the code that verifies the checksum has to be part
of the stand-alone dump tool and not the purgatory code.

> [..]
> > > To me we seem to be diverging a lot from existing kdump+kexec-tools
> > > mechanism just to solve the case of early crash dumping. If we break
> > > down the problem in two parts and do things the kexec-tools way (with a
> > > backup path of booting stand alone kernel if kdump kernel is corrupted),
> > > things might be better.
> > 
> > The "backup path of booting stand alone kernel" would result in passing
> > the control twice, once from the stand-alone dumper to the kexec purgatory
> > (after the purgatory checksum has been verified), then doing more checks 
> > in the kdump kernel, only to return to the stand-alone dumper if some check
> > fails. Does not really sound enticing to me.
> 
> What I am suggesting is that stand alone dumper gets control only if
> kdump kernel is corrupted.
> 
> So following sequence.
> 
> Kernel Crash ---> purgatory --> either kdump kernel/IPL stand alone tools
>
> Here only drawback seems to be that we assume that purgatory code and
> pre-calculated checksum have not been corrupted. The big advantage is
> that s390 kdump support looks very similar to other arches and
> understanding and supporting kdump across architectures becomes easy.

My problem with that is the following: how do we get from the "Kernel Crash"
step to the purgatory code? It does work for "normal" panics, but it fails
miserably for a hard crash that does not even get as far as panic. That is
why we insist on a second possible order of things:

Kernel Crash --> IPL of stand-alone dump tool --> branch to kdump if the
checksums turn out ok. 

If the kernel called panic itself and branched to the purgatory code but the
checksum turned out to be bad we just stop there. Then the operator has to
do a manual IPL of the stand-alone dump tool to get the dump.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-13 16:46                 ` Martin Schwidefsky
@ 2011-07-13 16:59                   ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-13 16:59 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Vivek Goyal, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Wed, 2011-07-13 at 18:46 +0200, Martin Schwidefsky wrote:
> On Wed, 13 Jul 2011 12:02:39 -0400
> Vivek Goyal <vgoyal@redhat.com> wrote:
> > So in case of stand alone dump, you save the calculated checksum of
> > kdump kernel on disk and not in memory? And then calculate the checksum
> > of memory image of kdump kernel and decide whether kdump kernel is
> > corrupted or not?
> > 
> > If yes, this sounds more reliable as checksum of kernel is stored on
> > some disk/tape.
> 
> No, the checksum for the purgatory code is stored in memory. If the purgatory
> code is corrupted you would have to corrupt the checksum in a very specific
> way as well to make it fail. 

Currently we store the checksums for the loaded *kexec segments* in
memory at the end of kexec_load(). The stand-alone dump tools also
calculate the checksums for all segments and compare them with the
stored checksums. The dump tools can do that because we have meminfos
for all segments. A meminfo element contains:
* address of memory chunk
* size of memory chunk
* checksum of memory chunk (calculated at the end of kexec_load())
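
As a rough illustration (not the actual s390 code), a meminfo element and
the comparison done by the dump tools could look like the C sketch below.
The struct layout, the names meminfo_elem, simple_csum and segments_ok,
and the additive checksum are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of one meminfo element (names are illustrative). */
struct meminfo_elem {
	const uint8_t *addr;	/* address of memory chunk */
	size_t size;		/* size of memory chunk */
	uint32_t csum;		/* checksum stored at the end of kexec_load() */
};

/* Placeholder checksum: a plain 32-bit byte sum, not the real algorithm. */
static uint32_t simple_csum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum += p[i];
	return sum;
}

/* Return 1 if every segment still matches its stored checksum, else 0. */
static int segments_ok(const struct meminfo_elem *mi, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (simple_csum(mi[i].addr, mi[i].size) != mi[i].csum)
			return 0;
	return 1;
}
```

The dump tool would only branch to kdump if segments_ok() reports that
all loaded kexec segments are unmodified.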

Michael


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-13 16:59                   ` Michael Holzheu
@ 2011-07-13 17:19                     ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-13 17:19 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Wed, Jul 13, 2011 at 06:59:50PM +0200, Michael Holzheu wrote:
> On Wed, 2011-07-13 at 18:46 +0200, Martin Schwidefsky wrote:
> > On Wed, 13 Jul 2011 12:02:39 -0400
> > Vivek Goyal <vgoyal@redhat.com> wrote:
> > > So in case of stand alone dump, you save the calculated checksum of
> > > kdump kernel on disk and not in memory? And then calculate the checksum
> > > of memory image of kdump kernel and decide whether kdump kernel is
> > > corrupted or not?
> > > 
> > > If yes, this sounds more reliable as checksum of kernel is stored on
> > > some disk/tape.
> > 
> > No, the checksum for the purgatory code is stored in memory. If the purgatory
> > code is corrupted you would have to corrupt the checksum in a very specific
> > way as well to make it fail. 
> 
> Currently we store the checksums for the loaded *kexec segments* in
> memory at the end of kexec_load(). The stand-alone dump tools also
> calculate the checksums for all segments and compare them with the
> stored checksums. The dump tools can do that because we have meminfos
> for all segments. A meminfo element contains:
> * address of memory chunk
> * size of memory chunk
> * checksum of memory chunk (calculated at the end of kexec_load())

So this does not seem to be very different from what kexec and purgatory
are doing on x86. So why not simply reuse that and, if the checksum fails,
jump to the stand alone kernel? Why duplicate all the checksum logic
in the stand alone dump tools?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-13 16:46                 ` Martin Schwidefsky
@ 2011-07-13 20:00                   ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-13 20:00 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Michael Holzheu, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Wed, Jul 13, 2011 at 06:46:11PM +0200, Martin Schwidefsky wrote:

[..]
> > What I am suggesting is that stand alone dumper gets control only if
> > kdump kernel is corrupted.
> > 
> > So following sequence.
> > 
> > Kernel Crash ---> purgatory --> either kdump kernel/IPL stand alone tools
> > 
> > Here only drawback seems to be that we assume that purgatory code and
> > pre-calculated checksum has not been corrupted. The big advantage is
> > that s390 kdump support looks very similar to other arches and
> > understanding and supporting kdump across architectures becomes easy.
> 
> My problem with that is the following: how do we get from the "Kernel Crash"
> step to the purgatory code? It does work for "normal" panics, but it fails
> miserably for a hard crash that does not even get as far as panic. That is
> why we insist on a possible second order of things:

What is a hard crash? How does that happen, and what do x86 and s390
do in that case?

Though I don't have the details, your argument seems to be that on s390
we are always guaranteed to be able to IPL the stand alone
tools code irrespective of the system state, hence it is relatively
safer to do the checks in the stand alone tools instead of purgatory, where
the code is in memory.

If due to a hard hang the code cannot even make it to purgatory, where would
it go? Can't we IPL the stand alone tool then?

So we first try to take the purgatory path, which does the checksum and is
consistent with other architectures. If that does not work in case
of a hard hang, you always have the option of IPLing the stand alone tool
later manually.

This will also get rid of the requirement of passing all the segment and
checksum info to the stand alone tool with the help of meminfo (that's
another sore point).

Bottom line, even if you can't make it to purgatory reliably, you always
have the option of capturing the dump manually using stand alone tools. We
don't have to mix up the kdump and stand alone mechanisms. If kdump fails, we
just need the capability to still capture the dump using stand alone
tools manually. I think that will make things simpler even for stand alone
tools.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-13 20:00                   ` Vivek Goyal
@ 2011-07-14  7:18                     ` Martin Schwidefsky
  -1 siblings, 0 replies; 112+ messages in thread
From: Martin Schwidefsky @ 2011-07-14  7:18 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Michael Holzheu, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Wed, 13 Jul 2011 16:00:04 -0400
Vivek Goyal <vgoyal@redhat.com> wrote:

> On Wed, Jul 13, 2011 at 06:46:11PM +0200, Martin Schwidefsky wrote:
> 
> [..]
> > > What I am suggesting is that stand alone dumper gets control only if
> > > kdump kernel is corrupted.
> > > 
> > > So following sequence.
> > > 
> > > Kernel Crash ---> purgatory --> either kdump kernel/IPL stand alone tools
> > > 
> > > Here only drawback seems to be that we assume that purgatory code and
> > > pre-calculated checksum has not been corrupted. The big advantage is
> > > that s390 kdump support looks very similar to other arches and
> > > understanding and supporting kdump across architectures becomes easy.
> > 
> > My problem with that is the following: how do we get from the "Kernel Crash"
> > step to the purgatory code? It does work for "normal" panics, but it fails
> > miserably for a hard crash that does not even get as far as panic. That is
> > why we insist on a possible second order of things:
> 
> What is hard crash? How does that happen and what does x86 and s390
> do in that case?

E.g. an endless loop with interrupts disabled. To get out of this situation
we will IPL/boot a new system. That is either the production system itself
or the stand-alone dump tool. 
 
> Though I don't have details but your argument seems to be that in s390
> we are always guaranteed that we will jump to IPLing the stand alone
> tools code irrespective of the system state hence it is relatively
> safer to do checks in stand alone tools instead of purgatory where
> code is in memory.

Now you got it. That is the crux of the argument.

> If due to hard hang, code can not even make to purgatory, where would
> it go? Can't we do IPLing of stand alone tool then. 

It doesn't go anywhere. Basically the system is manually stopped and
restarted. But on s390 we can still get to all the required information
to generate a dump. That is one of the major differences to x86, if
you have to do a restart the registers on x86 will be gone, no?
 
> So we first try to take purgatory path which does the checksum and is
> consistent with other architectures. If that does not work in case
> of hard hang, you always have the option of IPLing the stand alone tool
> later manually.

How are we suddenly on the purgatory path again? The code that gets
control in case of a hard crash + IPL is the stand-alone dump tool,
not the purgatory code. The first thing we want to do is to check if
the purgatory is still fine, that is do a checksum. If we have the
infrastructure in place to do one checksum then we can easily do the
other checksums as well.
 
> This will also get rid of requirement passing all the segment and checksum
> info to stand alone tool with the help of meminfo (That's another sore
> point). 

No, it doesn't. We will still need to do the checksum for the purgatory
code and we already have the re-ipl information which won't go away.
 
> Bottom line, even if you can't make to purgatory reliably, you always
> have the option of capturing dump manually using stand alone tools. We
> don't have to mix up kdump and stand alone mechanism. If kdump fails, we
> just need to have capability to still capture the dump using stand alone
> tools manually. I think that will make things simpler even for stand alone
> tools.

If we decide not to mix kdump and stand-alone dump then we lose something.
Consider a hard crash where the kdump segments are still intact. What our
customers do in that case is to start the stand-alone dump utility. Without
a way to find and verify the kdump setup we would have to do a full dump.
Which will take its time if the memory size is big. See?
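
As a rough sketch of that decision, assuming the stand-alone dump tool has
just been IPLed: the function name, the return strings, and the `verify`
callback are hypothetical and only model the two outcomes described above.

```python
from typing import Callable, Optional


def choose_dump_method(kdump_setup: Optional[object],
                       verify: Callable[[object], bool]) -> str:
    """Decide what a freshly IPLed stand-alone dump tool does next.

    kdump_setup: whatever the tool found at the well-defined storage
                 location (None if no kdump kernel was loaded).
    verify:      checksum test over the loaded kdump segments.
    """
    if kdump_setup is not None and verify(kdump_setup):
        return "start-kdump"  # segments intact: let kdump write the dump
    return "full-dump"        # nothing usable: dump all of memory
```

Without the information needed to find and verify the kdump setup, the first
branch cannot be taken, and every hard crash ends in the full dump.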

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-14  7:18                     ` Martin Schwidefsky
@ 2011-07-14 17:55                       ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-14 17:55 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Michael Holzheu, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Thu, Jul 14, 2011 at 09:18:00AM +0200, Martin Schwidefsky wrote:
> On Wed, 13 Jul 2011 16:00:04 -0400
> Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > On Wed, Jul 13, 2011 at 06:46:11PM +0200, Martin Schwidefsky wrote:
> > 
> > [..]
> > > > What I am suggesting is that stand alone dumper gets control only if
> > > > kdump kernel is corrupted.
> > > > 
> > > > So following sequence.
> > > > 
> > > > Kernel Crash ---> purgatory --> either kdump kernel/IPL stand alone tools
> > > > 
> > > > Here only drawback seems to be that we assume that purgatory code and
> > > > pre-calculated checksum has not been corrupted. The big advantage is
> > > > that s390 kdump support looks very similar to other arches and
> > > > understanding and supporting kdump across architectures becomes easy.
> > > 
> > > My problem with that is the following: how do we get from the "Kernel Crash"
> > > step to the purgatory code? It does work for "normal" panics, but it fails
> > > miserably for a hard crash that does not even get as far as panic. That is
> > > why we insist on a possible second order of things:
> > 
> > What is hard crash? How does that happen and what does x86 and s390
> > do in that case?
> 
> E.g. an endless loop with interrupts disabled. To get out of this situation
> we will IPL/boot a new system. That is either the production system itself
> or the stand-alone dump tool. 

NMI hardware lockup detection will work in this situation and will lead
to kdump trigger.

>  
> > Though I don't have details but your argument seems to be that in s390
> > we are always guaranteed that we will jump to IPLing the stand alone
> > tools code irrespective of the system state hence it is relatively
> > safer to do checks in stand alone tools instead of purgatory where
> > code is in memory.
> 
> Now you got it. That is the crux of the argument.
> 
> > If due to hard hang, code can not even make to purgatory, where would
> > it go? Can't we do IPLing of stand alone tool then. 
> 
> It doesn't go anywhere. Basically the system is manually stopped and
> restarted. But on s390 we can still get to all the required information
> to generate a dump. That is one of the major differences to x86, if
> you have to do a restart the registers on x86 will be gone, no?
>  
> > So we first try to take purgatory path which does the checksum and is
> > consistent with other architectures. If that does not work in case
> > of hard hang, you always have the option of IPLing the stand alone tool
> > later manually.
> 
> How are we suddenly on the purgatory path again? The code that gets
> control in case of a hard crash + IPL is the stand-alone dump tool,
> not the purgatory code.

I think that's the biggest point of contention. From the start of the
discussion you have had this hardcoded requirement that the moment panic()
happens you jump to some IPL code, and that's what I am questioning. Why
can't you execute some more code after panic() (purgatory) before
you jump to the IPL code (only if you have to)?

> The first thing we want to do is to check if
> the purgatory is still fine, that is do a checksum. If we have the
> infrastructure in place to do one checksum then we can easily do the
> other checksums as well.

Some piece of code you have to assume is fine. Are you not already
assuming that the IPL code you have in the first 64K bytes is fine and that
nobody has overwritten it? Are you not assuming that the hook in panic()
(I think you are calling it a shutdown trigger) is fine so that it
can help you jump to the right place?

>  
> > This will also get rid of requirement passing all the segment and checksum
> > info to stand alone tool with the help of meminfo (That's another sore
> > point). 
> 
> No, it doesn't. We will still need to do the checksum for the purgatory
> code and we already have the re-ipl information which won't go away.

It is a very small piece of code. Just as you assume that your 8KB of
IPL code is fine, I think we have to make the same assumption here
as well.

>  
> > Bottom line, even if you can't make to purgatory reliably, you always
> > have the option of capturing dump manually using stand alone tools. We
> > don't have to mix up kdump and stand alone mechanism. If kdump fails, we
> > just need to have capability to still capture the dump using stand alone
> > tools manually. I think that will make things simpler even for stand alone
> > tools.
> 
> If we decide not to mix kdump and stand-alone dump then we lose something.
> Consider a hard crash where the kdump segments are still intact. What our
> customers do in that case is to start the stand-alone dump utility. Without
> a way to find and verify the kdump setup we would have to do a full dump.
> Which will take its time if the memory size is big. See?

This is a really, really rare corner case where purgatory went bad. And even
in this corner case you capture the dump; it is just not filtered.

I really don't understand why, to address this corner case, you would
complicate the general kexec infrastructure and introduce new
interfaces like meminfo.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-14 17:55                       ` Vivek Goyal
@ 2011-07-14 18:05                         ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-14 18:05 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Michael Holzheu, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Thu, Jul 14, 2011 at 01:55:32PM -0400, Vivek Goyal wrote:

[..]
> > > So we first try to take purgatory path which does the checksum and is
> > > consistent with other architectures. If that does not work in case
> > > of hard hang, you always have the option of IPLing the stand alone tool
> > > later manually.
> > 
> > How are we suddenly on the purgatory path again? The code that gets
> > control in case of a hard crash + IPL is the stand-alone dump tool,
> > not the purgatory code.
> 
> I think that's the biggest contention point. From the start of discussion
> you have this hardcoded requirement that the moment panic() happens
> you are jumping to some IPL code and that's what I am questioning. Why
> can't you execute some more code after panic() (purgatory), before
> you jump to IPL code (only if you have to). 
> 

In your parlance of shutdown actions, I think it is equivalent to saying
that "kdump" is a shutdown action, which means that for specific
trigger points we will execute the "crash_kexec()" function, which will try
to capture the dump.

In user space I think one can modify the kexec-tools infrastructure a
bit so that one is able to define an entry point to use in case the checksum
of a loaded segment fails. When you are loading the kdump kernel, you can
define that entry point. (And this would be a jump to IPL etc.)
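
A minimal sketch of that idea, modelled in Python for illustration only.
The three callbacks are hypothetical stand-ins: kexec-tools has no such
fallback entry point today, which is exactly the modification being
suggested.

```python
from typing import Callable


def purgatory(segments_ok: Callable[[], bool],
              kdump_entry: Callable[[], str],
              fallback_entry: Callable[[], str]) -> str:
    """Run after crash_kexec(): verify the segments, then pick an entry."""
    if segments_ok():
        return kdump_entry()
    # Checksum mismatch: use the entry point registered at load time,
    # for example one that IPLs the stand-alone dump tool.
    return fallback_entry()
```

With this shape the checksum logic stays in one place, and the only
architecture-specific part is what the fallback entry point does.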

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-14 17:55                       ` Vivek Goyal
@ 2011-07-15 13:56                         ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-15 13:56 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

Hello Vivek,

On Thu, 2011-07-14 at 13:55 -0400, Vivek Goyal wrote:

[snip]

> > The first thing we want to do is to check if
> > the purgatory is still fine, that is do a checksum. If we have the
> > infrastructure in place to do one checksum then we can easily do the
> > other checksums as well.
> 
> Some piece of code you have to assume is fine. Are you not already
> assuming that IPL code you have in first 64K bytes is fine and
> nobody has overwritten it.

We can assume that the IPL dump code is fine, because it is freshly
loaded into memory. Only when the disk is somehow corrupted do we have a
problem.

> Are you not assuming that hook in panic()
> (I think you are calling it shutdown trigger) is fine so that it
> can help you jump to right place.

Yes, that is correct for automatic dump in case of panic(). The panic()
path can fail.

But there are two other options where really *no* code that was in
memory, when the system crashed, is used for the dump process or
verification of kdump:
1) Manual IPL/boot of stand-alone dump by the operator via the virtual
guest console
2) Automatic IPL/boot of stand-alone dump by our z/VM hypervisor
watchdog

Michael


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-15 13:56                         ` Michael Holzheu
@ 2011-07-15 14:18                           ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-15 14:18 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Fri, Jul 15, 2011 at 03:56:21PM +0200, Michael Holzheu wrote:
> Hello Vivek,
> 
> On Thu, 2011-07-14 at 13:55 -0400, Vivek Goyal wrote:
> 
> [snip]
> 
> > > The first thing we want to do is to check if
> > > the purgatory is still fine, that is do a checksum. If we have the
> > > infrastructure in place to do one checksum then we can easily do the
> > > other checksums as well.
> > 
> > Some piece of code you have to assume is fine. Are you not already
> > assuming that IPL code you have in first 64K bytes is fine and nobody
> > has overwritten it?
> 
> We can assume that the IPL dump code is fine, because it is freshly
> loaded into memory. Only when the disk is somehow corrupted we have a
> problem.
> 
> > Are you not assuming that hook in panic()
> > (I think you are calling it shutdown trigger) is fine so that it
> > can help you jump to right place.
> 
> Yes, that is correct for automatic dump in case of panic(). The panic()
> path can fail.
> 
> But there are two other options where really *no* code that was in
> memory, when the system crashed, is used for the dump process or
> verification of kdump:
> 1) Manual IPL/boot of stand-alone dump by the operator via the virtual
> guest console
> 2) Automatic IPL/boot of stand-alone dump by our z/VM hypervisor
> watchdog

Hi Michael,

Ok. So IIUC, purgatory code corruption is equivalent to panic() code
corruption, and in that case the above two options will help an admin
capture the dump.

That's precisely the point I am trying to make: stand-alone dump tools
still remain the backup mechanism when kdump fails. Kdump can fail
either because the checksum of the loaded kernel is bad or because the
purgatory code itself got corrupted. In the first case, purgatory itself
can jump to a location that IPLs the dump tools, and in the second case
the above two options come into the picture (manual dump via operator or
hypervisor-watchdog-initiated IPL).

If we go down this path, it should simplify the design a lot. The dump
tools don't have to know anything about the kdump kernel and there is no
need to pass any information.

And in common case kdump should be able to capture the dump and filter
it. Only in extreme corner cases, we need to trigger this dump tool
mechanism and capture full memory dump.

How about doing it that way? This should not require many changes in
common kexec code. It will require some changes in kexec-tools though,
as you will have to create a mechanism for purgatory to jump to in
case the kdump kernel checksum fails.
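
A minimal sketch of that purgatory fallback decision, under stated
assumptions: struct segment, enum next_step and purgatory_decide() are
hypothetical names, and the additive checksum only stands in for the
SHA-256 digest that kexec-tools purgatory verifies. How the IPL itself
would be triggered is left open here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one loaded kexec segment. */
struct segment {
	const uint8_t *start;   /* where the segment was loaded */
	size_t len;             /* segment length in bytes */
	uint32_t expected_sum;  /* checksum recorded at load time */
};

/* What purgatory should do next. */
enum next_step {
	START_KDUMP,    /* all segments intact: enter the kdump kernel */
	IPL_DUMP_TOOL,  /* corruption found: fall back to the dump tool */
};

/* Simple stand-in checksum; not the digest used by real purgatory. */
static uint32_t checksum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	while (len--)
		sum = sum * 31 + *p++;
	return sum;
}

/* Verify every segment; any mismatch selects the fallback path. */
static enum next_step purgatory_decide(const struct segment *segs, size_t n)
{
	size_t i;

	assert(segs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (checksum(segs[i].start, segs[i].len) != segs[i].expected_sum)
			return IPL_DUMP_TOOL;
	}
	return START_KDUMP;
}
```

On START_KDUMP the caller would enter the kdump kernel; on IPL_DUMP_TOOL
it would either IPL the stand-alone dump tool or stop and wait for the
operator or the hypervisor watchdog.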

Thanks
Vivek 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-14 18:05                         ` Vivek Goyal
@ 2011-07-15 14:21                           ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-15 14:21 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Thu, 2011-07-14 at 14:05 -0400, Vivek Goyal wrote:
> On Thu, Jul 14, 2011 at 01:55:32PM -0400, Vivek Goyal wrote:
> 
> [..]
> > > > So we first try to take purgatory path which does the checksum and is
> > > > consistent with other architectures. If that does not work in case
> > > > of hard hang, you always have the option of IPLing the stand alone tool
> > > > later manually.
> > > 
> > > How are we suddenly on the purgatory path again? The code that gets
> > > control in case of a hard crash + IPL is the stand-alone dump tool,
> > > not the purgatory code.
> > 
> > I think that's the biggest contention point. From the start of discussion
> > you have this hardcoded requirement that the moment panic() happens
> > you are jumping to some IPL code and that's what I am questioning. Why
> > can't you execute some more code after panic() (purgatory), before
> > you jump to IPL code (only if you have to). 
> > 
> 
> In your parlance of shutdown actions, I think it is equivalent to saying
> that "kdump" is a shutdown action, and what that means is that for specific
> trigger points we will execute "crash_kexec()" function which will try
> to capture the dump.
> 
> In user space I think one can modify the kexec-tools infrastructure a
> bit so that one is able to define an entry point in case checksum of
> loaded segment fails. Once you are loading kdump kernel, you can define
> that entry point. (And this would be jump to IPL etc.).

You mean to jump back into the crashed kernel code in case the kdump
checksum failed?

In the meantime I was looking a bit more into the kexec code to find
out, what we would have to do, if we use the preallocated ELF header as
you want us to do. With our current solution, we do not have to reserve
any special areas for the kdump kernel. Now we have to reserve the ELF
header. So what are the options?

The x86 implementation uses a kernel parameter "memmap=exactmap" to do
that.

On ia64 - if I understood the code correctly - they seem to pass a kdump
segment "EFI_memmap" to the kdump kernel that contains information about
all loaded kexec segments. With this segment they can find out the size
of the ELF header segment in the kdump kernel and then do the memory
reservation at boot time. Is that correct?

Michael



^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-15 14:21                           ` Michael Holzheu
@ 2011-07-15 14:38                             ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-15 14:38 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Fri, Jul 15, 2011 at 04:21:57PM +0200, Michael Holzheu wrote:
> On Thu, 2011-07-14 at 14:05 -0400, Vivek Goyal wrote:
> > On Thu, Jul 14, 2011 at 01:55:32PM -0400, Vivek Goyal wrote:
> > 
> > [..]
> > > > > So we first try to take purgatory path which does the checksum and is
> > > > > consistent with other architectures. If that does not work in case
> > > > > of hard hang, you always have the option of IPLing the stand alone tool
> > > > > later manually.
> > > > 
> > > > How are we suddenly on the purgatory path again? The code that gets
> > > > control in case of a hard crash + IPL is the stand-alone dump tool,
> > > > not the purgatory code.
> > > 
> > > I think that's the biggest contention point. From the start of discussion
> > > you have this hardcoded requirement that the moment panic() happens
> > > you are jumping to some IPL code and that's what I am questioning. Why
> > > can't you execute some more code after panic() (purgatory), before
> > > you jump to IPL code (only if you have to). 
> > > 
> > 
> > In your parlance of shutdown actions, I think it is equivalent to saying
> > that "kdump" is a shutdown action, and what that means is that for specific
> > trigger points we will execute "crash_kexec()" function which will try
> > to capture the dump.
> > 
> > In user space I think one can modify the kexec-tools infrastructure a
> > bit so that one is able to define an entry point in case checksum of
> > loaded segment fails. Once you are loading kdump kernel, you can define
> > that entry point. (And this would be jump to IPL etc.).
> 
> You mean to jump back into the crashed kernel code in case the kdump
> checksum failed?

No. I meant jump to entry point so that one can IPL the dump tools. I
am not sure how you initiate the IPL after panic. Similar thing needs
to be done here. If it is as simple as jumping to some location in
low memory, then purgatory should be able to do that. I think we
shall have to figure out the details here.

Basically I am saying that purgatory detected that kdump kernel is
corrupted. In x86_64 we spin in infinite loop as we don't have a
backup plan. But s390 has a backup plan of being able to IPL dump
tools.

Or in first step we can keep it even simpler. We can spin in infinite
loop and wait for either hypervisor watchdog to kick in for automatic
IPL or wait for operator intervention. That would simplify it even
further. 

> 
> In the meantime I was looking a bit more into the kexec code to find
> out, what we would have to do, if we use the preallocated ELF header as
> you want us to do. With our current solution, we do not have to reserve
> any special areas for the kdump kernel. Now we have to reserve the ELF
> header. So what are the options?

ELF headers go into same memory area as kdump kernel. Anyway you are
going to reserve memory for kdump kernel and ELF headers will go
right there.

Once you swap the kernel I think ELF headers continue to remain in
original location. Or maybe you can move ELF headers too depending
on what turns out to be easier.

> 
> The x86 implementation uses a kernel parameter "memmap=exactmap" to do
> that.

It tells second kernel to use a memory map defined on command line.
Kexec-tools prepares this memory map with the help of memmap= options. This
is to limit the memory second kernel uses to boot into so that it does not
overwrite any piece of memory used by first kernel.
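
For illustration, a command line of that kind could be assembled roughly
as follows. This is a sketch and not the kexec-tools code:
struct mem_range_kb and build_crash_cmdline() are made-up names, and the
ranges and the ELF header address are whatever the caller has worked out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One RAM range the second kernel may use, in KiB (hypothetical type). */
struct mem_range_kb {
	unsigned long start_kb;
	unsigned long size_kb;
};

/*
 * Write "memmap=exactmap", one "memmap=<size>K@<start>K" per range and
 * "elfcorehdr=<addr>K" into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
static int build_crash_cmdline(char *buf, size_t buflen,
			       const struct mem_range_kb *ranges, size_t n,
			       unsigned long elfcorehdr_kb)
{
	size_t used, i;
	int w;

	assert(buf != NULL && (ranges != NULL || n == 0));

	w = snprintf(buf, buflen, "memmap=exactmap");
	if (w < 0 || (size_t)w >= buflen)
		return -1;
	used = (size_t)w;

	for (i = 0; i < n; i++) {
		w = snprintf(buf + used, buflen - used, " memmap=%luK@%luK",
			     ranges[i].size_kb, ranges[i].start_kb);
		if (w < 0 || (size_t)w >= buflen - used)
			return -1;
		used += (size_t)w;
	}

	w = snprintf(buf + used, buflen - used, " elfcorehdr=%luK",
		     elfcorehdr_kb);
	if (w < 0 || (size_t)w >= buflen - used)
		return -1;
	return 0;
}
```

The second kernel then only sees the listed ranges as usable RAM, which
is what keeps it away from memory of the first kernel.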

In your case I think you shall have to do a little more so that second
kernel also sees some of the lower memory areas so that later swapping
of kernel can be done.

> 
> On ia64 - if I understood the code correctly - they seem to pass a kdump
> segment "EFI_memmap" to the kdump kernel that contains information about
> all loaded kexec segments. With this segment they can find out the size
> of the ELF header segment in the kdump kernel and then do the memory
> reservation at boot time. Is that correct?

Sorry, I don't know the details of IA64. Maybe somebody else on the list
can pitch in with some clarifications here.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-15 14:38                             ` Vivek Goyal
@ 2011-07-15 15:43                               ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-15 15:43 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

Hello Vivek,

On Fri, 2011-07-15 at 10:38 -0400, Vivek Goyal wrote:
> > > In user space I think one can modify the kexec-tools infrastructure a
> > > bit so that one is able to define an entry point in case checksum of
> > > loaded segment fails. Once you are loading kdump kernel, you can define
> > > that entry point. (And this would be jump to IPL etc.).
> > 
> > You mean to jump back into the crashed kernel code in case the kdump
> > checksum failed?
> 
> No. I meant jump to entry point so that one can IPL the dump tools. I
> am not sure how you initiate the IPL after panic. Similar thing needs
> to be done here. If it is as simple as jumping to some location in
> low memory, then purgatory should be able to do that. I think we
> shall have to figure out the details here.

We have a machine instruction to IPL a dump tool from a device. The
parameters (e.g. device number, or WWPN/LUN for SCSI devices) are
currently configured via an s390 sysfs interface and a config file in /etc.
In theory we could read the sysfs files or the config file from the
kexec tool and patch the parameters into the purgatory code. The user
would then have to restart kexec each time the configuration is
changed.
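
A rough sketch of that patching step, with everything in it assumed for
illustration: struct dump_ipl_params, parse_hex64() and patch_purgatory()
are hypothetical, the layout is not an existing ABI, and the strings are
taken to be hex values as they would be read from sysfs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parameter block that purgatory would read before the IPL. */
struct dump_ipl_params {
	uint32_t devno;  /* device number of the dump device */
	uint64_t wwpn;   /* SCSI only: world wide port name */
	uint64_t lun;    /* SCSI only: logical unit number */
};

/*
 * Parse a hex string such as "0x500507630300c562", optionally followed
 * by a single newline. Returns 0 on success, -1 if malformed.
 */
static int parse_hex64(const char *s, uint64_t *out)
{
	char *end;
	unsigned long long v;

	assert(s != NULL && out != NULL);
	v = strtoull(s, &end, 16);
	if (end == s)
		return -1;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -1;
	*out = v;
	return 0;
}

/*
 * Copy the parameters into the purgatory image at the given offset.
 * Returns 0 on success, -1 if the block does not fit.
 */
static int patch_purgatory(uint8_t *image, size_t image_len, size_t offset,
			   const struct dump_ipl_params *p)
{
	assert(image != NULL && p != NULL);
	if (offset > image_len || image_len - offset < sizeof(*p))
		return -1;
	memcpy(image + offset, p, sizeof(*p));
	return 0;
}
```

Because the values are copied at load time, a later change to the dump
device is not picked up until kexec is run again, which is the restart
mentioned above.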

> Basically I am saying that purgatory detected that kdump kernel is
> corrupted. In x86_64 we spin in infinite loop as we don't have a
> backup plan. But s390 has a backup plan of being able to IPL dump
> tools.
> 
> Or in first step we can keep it even simpler. We can spin in infinite
> loop

Looping is probably not a good option in a hypervisor environment like
we have it on s390. At least we should load a disabled wait PSW.

> and wait for either hypervisor watchdog to kick in for automatic
> IPL or wait for operator intervention.
> That would simplify it even
> further. 
> > 
> > In the meantime I was looking a bit more into the kexec code to find
> > out, what we would have to do, if we use the preallocated ELF header as
> > you want us to do. With our current solution, we do not have to reserve
> > any special areas for the kdump kernel. Now we have to reserve the ELF
> > header. So what are the options?
> 
> ELF headers go into same memory area as kdump kernel.

sure

> Anyway you are
> going to reserve memory for kdump kernel and ELF headers will go
> right there.
> Once you swap the kernel I think ELF headers continue to remain in
> original location. Or maybe you can move ELF headers too depending
> on what turns out to be easier.
> 
> > 
> > The x86 implementation uses a kernel parameter "memmap=exactmap" to do
> > that.
> 
> It tells second kernel to use a memory map defined on command line.
> Kexec-tools prepares this memory map with the help of memmap= options. This
> is to limit the memory second kernel uses to boot into so that it does not
> overwrite any piece of memory used by first kernel.

And to reserve the ELF header that is prepared by kexec tools, no?

> In your case I think you shall have to do a little more so that second
> kernel also sees some of the lower memory areas so that later swapping
> of kernel can be done.

After the swap the ELF header is contained in the same memory as the
kdump kernel. When the kdump kernel starts, the ELF header has to be
protected from being overwritten (like the kernel and ramdisk). I get the
address from the "elfcorehdr=" kernel parameter. How will I get the size?
Looking at the ia64 and x86 implementations I have the feeling there are
different mechanisms available to do that.
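
One possible way to get the size, sketched under assumptions: the kernel
parameter documentation describes an optional size prefix,
elfcorehdr=[size[KMG]@]offset[KMG]. If kexec tools passed the size that
way, both values could be parsed from the command line. parse_size() and
parse_elfcorehdr() below are hypothetical user-space stand-ins (in the
kernel, memparse() would do the number parsing), and whether s390 should
use this form is not settled here.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a number with an optional K/M/G suffix, similar to memparse(). */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	if (*end == s)
		return 0;       /* no digits: leave *end at the start */

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;       /* fall through */
	case 'M': case 'm':
		v <<= 10;       /* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Parse the value part of "elfcorehdr=[size@]addr".
 * *size is set to 0 when no size prefix is present.
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_elfcorehdr(const char *val, unsigned long long *addr,
			    unsigned long long *size)
{
	char *end;
	unsigned long long first;

	assert(val != NULL && addr != NULL && size != NULL);

	first = parse_size(val, &end);
	if (end == val)
		return -1;

	if (*end == '@') {
		const char *rest = end + 1;

		*size = first;
		*addr = parse_size(rest, &end);
		if (end == rest)
			return -1;
	} else {
		*size = 0;
		*addr = first;
	}
	return *end == '\0' ? 0 : -1;
}
```

With a size of 0 (no prefix given) the kdump kernel would still need
another source for the length, for example a boot parameter block as on
ia64.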

> 
> > 
> > On ia64 - if I understood the code correctly - they seem to pass a kdump
> > segment "EFI_memmap" to the kdump kernel that contains information about
> > all loaded kexec segments. With this segment they can find out the size
> > of the ELF header segment in the kdump kernel and then do the memory
> > reservation at boot time. Is that correct?
> 
> Sorry, I don't know the details of IA64. Maybe somebody else on the list
> can pitch in with some clarifications here.

To me it looks like a mechanism where a block of information is
prepared by kexec tools and a pointer to that block is passed somehow to
the second kernel. I would assume that the definition of this block is
ia64 kernel ABI. 

See kernel:
* arch/ia64/kernel/setup.c: reserve_elfcorehdr()
* arch/ia64/kernel/head.S: ia64_boot_param

kexec tools:
* kexec/arch/ia64/kexec-elf-ia64.c: efi_memmap_buf
* purgatory/arch/ia64/entry.S: __boot_param_base

Michael



^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-15 15:43                               ` Michael Holzheu
@ 2011-07-18 12:31                                 ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-18 12:31 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Fri, Jul 15, 2011 at 05:43:23PM +0200, Michael Holzheu wrote:
> Hello Vivek,
> 
> On Fri, 2011-07-15 at 10:38 -0400, Vivek Goyal wrote:
> > > > In user space I think one can modify the kexec-tools infrastructure a
> > > > bit so that one is able to define an entry point in case checksum of
> > > > loaded segment fails. Once you are loading kdump kernel, you can define
> > > > that entry point. (And this would be jump to IPL etc.).
> > > 
> > > You mean to jump back into the crashed kernel code in case the kdump
> > > checksum failed?
> > 
> > No. I meant jump to entry point so that one can IPL the dump tools. I
> > am not sure how to initiate the IPL after panic. Similar thing needs
> > to be done here. If it is as simple as jumping to some location in
> > low memory, then purgatory should be able to do that. I think we
> > shall have to figure out the details here.
> 
> We have a machine instruction to IPL a dump tool from a device. The
> parameters (e.g. device number, or WWPN/LUN for SCSI devices) are
> currently configured via a s390 sysfs interface and an etc config file.
> In theory we could read the sysfs files or the config file from the
> kexec tool and patch the parameters into the purgatory code. The user
> would then have to restart kexec each time when the configuration is
> changed.

I think reading the WWPN/LUN of the SCSI device from /sys and patching
purgatory makes sense. I think restarting kexec-tools on device set/change
should not be a big problem. There are already many events now when a user
is supposed to do that.

> 
> > Basically I am saying that purgatory detected that kdump kernel is
> > corrupted. In x86_64 we spin in infinite loop as we don't have a
> > backup plan. But s390 has a backup plan of being able to IPL dump
> > tools.
> > 
> > Or in first step we can keep it even simpler. We can spin in infinite
> > loop
> 
> Looping is probably not a good option in a hypervisor environment like
> we have it on s390. At least we should load a disabled wait PSW.

What is "disabled wait PSW"?

> 
> > and wait for either hypervisor watchdog to kick in for automatic
> > IPL or wait for operator intervention.
> > That would simplify it even
> > further. 
> > > 
> > > In the meantime I was looking a bit more into the kexec code to find
> > > out, what we would have to do, if we use the preallocated ELF header as
> > > you want us to do. With our actual solution, we do not have to reserve
> > > any special areas for the kdump kernel. Now we have to reserve the ELF
> > > header. So what are the options?
> > 
> > ELF headers go into same memory area as kdump kernel.
> 
> sure
> 
> > Anyway you are
> > going to reserve memory for kdump kernel and ELF headers will go
> > right there.
> > Once you swap the kernel I think ELF headers continue to remain in
> > original location. Or may be you can move ELF headers too depending
> > on what turns out to be easier.
> > 
> > > 
> > > The x86 implementation uses a kernel parameter "memmap=exactmap" to do
> > > that.
> > 
> > It tells second kernel to use a memory map defined on command line.
> > Kexec-tools prepares this memory map with the help of memmap= options. This
> > is to limit the memory second kernel uses to boot into so that it does not
> > overwrite any piece of memory used by first kernel.
> 
> And to reserve the ELF header that is prepared by kexec tools, no?

Kind of. It just tells second kernel what memory can be used for boot. ELF
headers prepared by kexec-tools are part of that memory so that second
kernel can map that memory and can read the ELF headers and figure
out the layout of memory as seen by first kernel. These headers also
save the cpu state and a bunch of kernel config options.

> 
> > In your case I think you shall have to do little more so that second
> > kernel also sees some of the lower memory areas so that later swapping
> > of kernel can be done.
> 
> After the swap the ELF header is contained in the same memory as the
> kdump kernel. When the kdump kernel starts, the ELF header has to be
> saved from being overwritten (as kernel and ramdisk). I get the address
> from the "elfcorehdr=" kernel parameter. How will I get the size?

By parsing the ELF header. It will give you information about how many
program headers and notes are there, their sizes and locations etc.
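
As a rough illustration (not taken from kexec-tools or the kernel), the
size of the ELF header plus its program header table can be derived from
e_phoff, e_phnum and e_phentsize. The function name elfcorehdr_size() is
made up for this sketch, and it assumes the header is in host byte order
and that notes and memory contents are stored elsewhere:

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the number of bytes covered by the ELF
 * header and its program header table, or 0 if the buffer does not look
 * like a 64-bit ELF header. Assumes host byte order.
 */
size_t elfcorehdr_size(const void *buf, size_t buf_len)
{
	Elf64_Ehdr ehdr;

	if (buf_len < sizeof(ehdr))
		return 0;
	/* Copy out to avoid unaligned access on the caller's buffer. */
	memcpy(&ehdr, buf, sizeof(ehdr));

	if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
	    ehdr.e_ident[EI_CLASS] != ELFCLASS64)
		return 0;

	/* The program header table ends at e_phoff + e_phnum * e_phentsize. */
	return (size_t)ehdr.e_phoff +
	       (size_t)ehdr.e_phnum * ehdr.e_phentsize;
}
```

With three program headers directly behind the ELF header this gives
sizeof(Elf64_Ehdr) + 3 * sizeof(Elf64_Phdr).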

When kexec-tools loads ELF headers, it knows what's the total size of
ELF headers and it removes that chunk of memory from the memory map
passed to second kernel with memmap= options. IOW, some memory out
of reserved region is not usable by second kernel because we have
stored information in that memory. Kdump kernel maps that memory and
gets to read the ELF headers.

So you shall have to do something similar where you need to tell second
kernel what memory areas it can use for boot and remove ELF header
memory area from the map.
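
A minimal sketch of how such a command line could be put together,
assuming one reserved region with the ELF header area cut out of it.
build_memmap_cmdline() and its parameters are hypothetical and this is
not the kexec-tools code; only the memmap=exactmap and memmap=nnK@ssK
syntax is the documented kernel parameter format:

```c
#include <stdint.h>
#include <stdio.h>

/* Append " memmap=<size>K@<start>K" for [start, end) if it is non-empty. */
static int append_range(char *out, size_t out_len, int pos,
			uint64_t start, uint64_t end)
{
	size_t off = (size_t)pos < out_len ? (size_t)pos : out_len;

	if (end <= start)
		return pos;
	return pos + snprintf(out + off, out_len - off, " memmap=%lluK@%lluK",
			      (unsigned long long)((end - start) >> 10),
			      (unsigned long long)(start >> 10));
}

/*
 * Describe [region_start, region_end) minus the hole [hole_start, hole_end).
 * Assumptions: byte addresses, 1 KiB aligned, hole inside the region.
 * Returns the snprintf-style length of the full string.
 */
int build_memmap_cmdline(char *out, size_t out_len,
			 uint64_t region_start, uint64_t region_end,
			 uint64_t hole_start, uint64_t hole_end)
{
	int pos = snprintf(out, out_len, "memmap=exactmap");

	pos = append_range(out, out_len, pos, region_start, hole_start);
	pos = append_range(out, out_len, pos, hole_end, region_end);
	return pos;
}
```

The hole would be the ELF header area, so the second kernel never
treats that memory as usable RAM.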

 
> Looking at the ia64 and x86 implementations I have the feeling there are
> different mechanism available to do that.
> 
> > 
> > > 
> > > On ia64 - if I understood the code correctly - they seem to pass a kdump
> > > segment "EFI_memmap" to the kdump kernel that contains information about
> > > all loaded kexec segments. With this segment they can find out the size
> > > of the ELF header segment in the kdump kernel and then do the memory
> > > reservation at boot time. Is that correct?
> > 
> > Sorry, I don't know the details of IA64. May be somebody else on the list
> > can pitch in with some clarifications here.
> 
> For me it looks like a mechanism where a block of information is
> prepared by kexec tools and a pointer to that block is passed somehow to
> the second kernel. I would assume that the definition of this block is
> ia64 kernel ABI. 

It is possible. Even in x86, we prepare a block of information, one
4K page and fill lots of x86 boot protocol information.

Look at:

kexec-tools/include/x86/x86-linux.h
kexec-tools/kexec/arch/i386/x86-linux-setup.c

The header above also contains the e820 memory map. We fill in that map
for normal kexec (fastboot, not kdump) as well, and that's how the second
kernel comes to know about the memory map of the system.

I think one could possibly truncate the same map for kdump kernel to
tell second kernel about the memory to use. But IIRC, original memory
map is also used to determine max_pfn present in first kernel so that
in second kernel we don't try to map a memory beyond that and access
it, etc. Hence it was decided to leave it that way and pass the memory
map for second kernel on command line. 

So it's possible that IA64 is preparing a boot protocol specific
block and passing all the relevant information in that block instead
of making use of the command line.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-14 17:55                       ` Vivek Goyal
@ 2011-07-18 13:57                         ` Martin Schwidefsky
  -1 siblings, 0 replies; 112+ messages in thread
From: Martin Schwidefsky @ 2011-07-18 13:57 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Michael Holzheu, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Thu, 14 Jul 2011 13:55:32 -0400
Vivek Goyal <vgoyal@redhat.com> wrote:

> On Thu, Jul 14, 2011 at 09:18:00AM +0200, Martin Schwidefsky wrote:
> > On Wed, 13 Jul 2011 16:00:04 -0400
> > Vivek Goyal <vgoyal@redhat.com> wrote:
> > 
> > > On Wed, Jul 13, 2011 at 06:46:11PM +0200, Martin Schwidefsky wrote:
> > > 
> > > [..]
> > > > > What I am suggesting is that stand alone dumper gets control only if
> > > > > kdump kernel is corrupted.
> > > > > 
> > > > > So following sequence.
> > > > > 
> > > > > Kernel Crash ---> purgatory --> either kdump kernel/IPL stand alone tools
> > > > > 
> > > > > Here only drawback seems to be that we assume that purgatory code and
> > > > > pre-calculated checksum has not been corrupted. The big advantage is
> > > > > that s390 kdump support looks very similar to other arches and
> > > > > understanding and supporting kdump across architectures becomes easy.
> > > > 
> > > > My problem with that is the following: how do we get from the "Kernel Crash"
> > > > step to the purgatory code? It does work for "normal" panics, but it fails
> > > > miserably for a hard crash that does not even get as far as panic. That is
> > > > why we insist on a possible second order of things:
> > > 
> > > What is hard crash? How does that happen and what does x86 and s390
> > > do in that case?
> > 
> > E.g. an endless loop with interrupts disabled. To get out of this situation
> > we will IPL/boot a new system. That is either the production system itself
> > or the stand-alone dump tool. 
> 
> NMI hardware lockup detection will work in this situation and will lead
> to kdump trigger.

Ok, that reduces the problem to the code that is executed as a result of the
nmi interrupt. Only if that code got corrupted will it fail. Should be pretty
safe.
 
> >  
> > > Though I don't have details but your argument seems to be that in s390
> > > we are always guaranteed that we will jump to IPLing the stand alone
> > > tools code irrespective of the system state hence it is relatively
> > > safer to do checks in stand alone tools instead of purgatory where
> > > code is in memory.
> > 
> > Now you got it. That is the crux of the argument.
> > 
> > > If due to hard hang, code can not even make to purgatory, where would
> > > it go? Can't we do IPLing of stand alone tool then. 
> > 
> > It doesn't go anywhere. Basically the system is manually stopped and
> > restarted. But on s390 we can still get to all the required information
> > to generate a dump. That is one of the major differences to x86, if
> > you have to do a restart the registers on x86 will be gone, no?
> >  
> > > So we first try to take purgatory path which does the checksum and is
> > > consistent with other architectures. If that does not work in case
> > > of hard hang, you always have the option of IPLing the stand alone tool
> > > later manually.
> > 
> > How are we suddenly on the purgatory path again? The code that gets
> > control in case of a hard crash + IPL is the stand-alone dump tool,
> > not the purgatory code.
> 
> I think that's the biggest contention point. From the start of discussion
> you have this hardcoded requirement that the moment panic() happens
> you are jumping to some IPL code and that's what I am questioning. Why
> can't you execute some more code after panic() (purgatory), before
> you jump to IPL code (only if you have to). 

No, if panic() happens and the code on the panic path is fine we do whatever
is configured as a panic action. For the kdump panic action this can be a
branch to the purgatory code.
The hardcoded requirement we have is a different one: if the automatic panic
action fails for some reason, then we still want to be able to get a dump,
preferably a kdump if the kdump kernel is still fine.

> > The first thing we want to do is to check if
> > the purgatory is still fine, that is do a checksum. If we have the
> > infrastructure in place to do one checksum then we can easily do the
> > other checksums as well.
> 
> Some piece of code you have to assume is fine. Are you not already
> assuming that IPL code you have in first 64K bytes is fine and
> nobody has overwritten it. Are you not assuming that hook in panic()
> can help you jump to right place.

There is no IPL code in the first 64K bytes at the time the production system
went bad. It is loaded by the IPL of the stand-alone dump tool. An IPL
always loads the code from a "safe" place before it gets executed.

> >  
> > > This will also get rid of requirement passing all the segment and checksum
> > > info to stand alone tool with the help of meminfo (That's another sore
> > > point). 
> > 
> > No, it doesn't. We will still need to do the checksum for the purgatory
> > code and we already have the re-ipl information which won't go away.
> 
> It is a very small piece of code. The way you assume that your 8KB of
> IPL code is fine, I think we shall have to have this assumption here
> also.

That 8KB of IPL code has been freshly loaded from disk, you can not really
compare that to a setup where the purgatory code has been lying in memory
for almost the complete lifetime of the production system.

> >  
> > > Bottom line, even if you can't make to purgatory reliably, you always
> > > have the option of capturing dump manually using stand alone tools. We
> > > don't have to mix up kdump and stand alone mechanism. If kdump fails, we
> > > just need to have capability to still capture the dump using stand alone
> > > tools manually. I think that will make things simpler even for stand alone
> > > tools.
> > 
> > If we decide not to mix kdump and stand-alone dump then we lose something.
> > Consider a hard crash where the kdump segments are still intact. What our
> > customers do in that case is to start the stand-alone dump utility. Without
> > a way to find and verify the kdump setup we would have to do a full dump.
> > Which will take its time if the memory size is big. See?
> 
> This is a really, really corner case where purgatory went bad. And even in
> corner case you capture the dump just that it is not filtered.

I beg to differ here. It is not only a problem if the purgatory code went
bad. It is a simple rule we follow on s390: if the system is unresponsive
IPL the stand-alone dumper. The new thing we are discussing here is that
we really want to have the benefits of the kdump mechanism in this case
as well, not only in the case of an automatic dump via panic().
 
> I really don't understand that to address the corner case why would
> you complicate the general kexec infrastructure and introduce new
> interfaces like meminfo.

Is it really such a complication to the general kexec infrastructure?
All we want is to know where the segments for the kdump kernel are to
be able to verify them and the entry point for the kdump kernel.
It is not like we are proposing the meminfo interface for all kdump
users. That is just for s390.
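
To illustrate the kind of check meant here, a minimal sketch: the
stand-alone dump tool walks a list of kdump segments, verifies each one,
and only then branches to kdump; otherwise it falls back to a full dump.
The struct layout, all names and the toy checksum are assumptions for
illustration. This is not the meminfo format from the patch series, and
a real tool would use a proper digest instead of toy_checksum():

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one loaded kdump segment. */
struct kdump_segment {
	const uint8_t *addr;   /* where the segment lives in memory */
	size_t len;            /* segment length in bytes */
	uint32_t expected_sum; /* checksum recorded at load time */
};

enum dump_action { DUMP_KDUMP, DUMP_STANDALONE };

/* Toy stand-in for a real digest; only good enough for the example. */
uint32_t toy_checksum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum = sum * 31u + p[i];
	return sum;
}

/* Use kdump only if there are segments and every one still verifies. */
enum dump_action choose_dump_action(const struct kdump_segment *segs,
				    size_t nsegs)
{
	size_t i;

	if (nsegs == 0)
		return DUMP_STANDALONE;
	for (i = 0; i < nsegs; i++) {
		if (toy_checksum(segs[i].addr, segs[i].len) !=
		    segs[i].expected_sum)
			return DUMP_STANDALONE;
	}
	return DUMP_KDUMP;
}
```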

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-18 12:31                                 ` Vivek Goyal
@ 2011-07-18 14:00                                   ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-18 14:00 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

Hello Vivek,

On Mon, 2011-07-18 at 08:31 -0400, Vivek Goyal wrote:
> On Fri, Jul 15, 2011 at 05:43:23PM +0200, Michael Holzheu wrote:
> > > Or in first step we can keep it even simpler. We can spin in infinite
> > > loop
> > 
> > Looping is probably not a good option in a hypervisor environment like
> > we have it on s390. At least we should load a disabled wait PSW.
> 
> What is "disabled wait PSW"?

This is a PSW where interrupts are disabled and the wait bit is on. This
ensures that the virtual CPU is stopped and does not consume any CPU
time.
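
Expressed as bit masks, a rough sketch of what that means for the 64-bit
PSW mask. The constants follow the z/Architecture PSW bit positions
(bit 0 is the most significant bit); the two helper functions are made up
for this example and are not kernel interfaces:

```c
#include <stdbool.h>
#include <stdint.h>

#define PSW_MASK_IO     0x0200000000000000ULL /* bit 6: I/O interrupts */
#define PSW_MASK_EXT    0x0100000000000000ULL /* bit 7: external interrupts */
#define PSW_MASK_MCHECK 0x0004000000000000ULL /* bit 13: machine checks */
#define PSW_MASK_WAIT   0x0002000000000000ULL /* bit 14: wait state */
#define PSW_MASK_EA     0x0000000100000000ULL /* bit 31: extended addressing */
#define PSW_MASK_BA     0x0000000080000000ULL /* bit 32: basic addressing */

/* Hypothetical helper: 64-bit addressing, wait bit on, all interrupts off. */
uint64_t make_disabled_wait_mask(void)
{
	return PSW_MASK_WAIT | PSW_MASK_EA | PSW_MASK_BA;
}

/* Hypothetical helper: true if the CPU would wait and never be woken up. */
bool psw_is_disabled_wait(uint64_t mask)
{
	uint64_t irq_bits = PSW_MASK_IO | PSW_MASK_EXT | PSW_MASK_MCHECK;

	return (mask & PSW_MASK_WAIT) && !(mask & irq_bits);
}
```

The address part of the PSW is not modeled here; it is commonly used to
carry a code that tells the operator why the CPU stopped.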
 
> > > In your case I think you shall have to do little more so that second
> > > kernel also sees some of the lower memory areas so that later swapping
> > > of kernel can be done.
> > 
> > After the swap the ELF header is contained in the same memory as the
> > kdump kernel. When the kdump kernel starts, the ELF header has to be
> > saved from being overwritten (as kernel and ramdisk). I get the address
> > from the "elfcorehdr=" kernel parameter. How will I get the size?
> 
> By parsing the ELF header. It will give you information about how many
> program headers and notes are there, their sizes and locations etc.

The only thing we need is the size of the preallocated header that is in
kdump memory. All other architectures seem to pass this information
somehow with different mechanisms to the kdump kernel (memmap kernel
parameter, boot parameters, etc.). Why should *we* parse the ELF header?

> When kexec-tools loads ELF headers, it knows what's the total size of
> ELF headers and it removes that chunk of memory from the memory map
> passed to second kernel with memmap= options. IOW, some memory out
> of reserved region is not usable by second kernel because we have
> stored information in that memory. Kdump kernel maps that memory and
> gets to read the ELF headers.
> 
> So you shall have to do something similar where you need to tell second
> kernel what memory areas it can use for boot and remove ELF header
> memory area from the map.

So if we do that, why should we parse the ELF header?

> > Looking at the ia64 and x86 implementations I have the feeling there are
> > different mechanism available to do that.
> > 
> > > 
> > > > 
> > > > On ia64 - if I understood the code correctly - they seem to pass a kdump
> > > > segment "EFI_memmap" to the kdump kernel that contains information about
> > > > all loaded kexec segments. With this segment they can find out the size
> > > > of the ELF header segment in the kdump kernel and then do the memory
> > > > reservation at boot time. Is that correct?
> > > 
> > > Sorry, I don't know the details of IA64. Maybe somebody else on the list
> > > can pitch in with some clarifications here.
> > 
> > For me it looks like a mechanism where a block of information is
> > prepared by kexec tools and a pointer to that block is passed somehow to
> > the second kernel. I would assume that the definition of this block is
> > ia64 kernel ABI. 
> 
> It is possible. Even in x86, we prepare a block of information, one
> 4K page and fill lots of x86 boot protocol information.
> 
> Look at.
> 
> kexec-tools/include/x86/x86-linux.h
> kexec-tools/kexec/arch/i386/x86-linux-setup.c
> 
> Above header information contains information about e820 memory map also
> and we fill that map info for normal kexec (fastboot, not kdump) also and
> that's how second kernel comes to know about memory map of system.
> 
> I think one could possibly truncate the same map for kdump kernel to
> tell second kernel about the memory to use. But IIRC, original memory
> map is also used to determine max_pfn present in first kernel so that
> in second kernel we don't try to map a memory beyond that and access
> it, etc. Hence it was decided to leave it that way and pass the memory
> map for second kernel on command line. 
> 
> So it's possible that IA64 is preparing a boot protocol specific
> block and passing all the relevant information in that block instead
> of making use of the command line.

Just to come back to your initial argument against our meminfo
approach: It looks like there are already other mechanisms besides
the ELF header and kernel parameters to pass information to the kdump
kernel. Where is the conceptual difference from our meminfo interface?

Michael


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-18 14:00                                   ` Michael Holzheu
@ 2011-07-18 14:19                                     ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-18 14:19 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Mon, Jul 18, 2011 at 04:00:41PM +0200, Michael Holzheu wrote:
> Hello Vivek,
> 
> On Mon, 2011-07-18 at 08:31 -0400, Vivek Goyal wrote:
> > On Fri, Jul 15, 2011 at 05:43:23PM +0200, Michael Holzheu wrote:
> > > > Or in first step we can keep it even simpler. We can spin in infinite
> > > > loop
> > > 
> > > Looping is probably not a good option in a hypervisor environment like
> > > we have it on s390. At least we should load a disabled wait PSW.
> > 
> > What is "disabled wait PSW"?
> 
> This is a PSW where interrupts are disabled and the wait bit is on. This
> ensures that the virtual CPU is stopped and does not consume any CPU
> time.
>  
> > > > In your case I think you shall have to do little more so that second
> > > > kernel also sees some of the lower memory areas so that later swapping
> > > > of kernel can be done.
> > > 
> > > After the swap the ELF header is contained in the same memory as the
> > > kdump kernel. When the kdump kernel starts, the ELF header has to be
> > > saved from being overwritten (as kernel and ramdisk). I get the address
> > > from the "elfcorehdr=" kernel parameter. How will I get the size?
> > 
> > By parsing the ELF header. It will give you information about how many
> > program headers and notes are there, their sizes and locations etc.
> 
> The only thing we need is the size of the preallocated header that is in
> kdump memory. All other architectures seem to pass this information
> somehow with different mechanisms to the kdump kernel (memmap kernel
> parameter, boot parameters, etc.). Why should *we* parse the ELF header?

ELF headers and memmap parameters communicate two different pieces
of information to the second kernel.

- memmap tells what memory the second kernel can use to boot.
- ELF headers tell what memory areas the first kernel was using and,
  from that information, how to construct the ELF headers for the
  /proc/vmcore interface in the second kernel. On x86, the ELF headers
  also communicate where the saved CPU state of the first kernel is.

Arch-independent code in the kdump kernel (fs/proc/vmcore.c) parses those
ELF headers to export /proc/vmcore. So if you set up the headers right
you get that arch-independent code for free without any changes to generic
code.

*Why should you not try to use what is available already?*

> 
> > When kexec-tools loads ELF headers, it knows what's the total size of
> > ELF headers and it removes that chunk of memory from the memory map
> > passed to second kernel with memmap= options. IOW, some memory out
> > of reserved region is not usable by second kernel because we have
> > stored information in that memory. Kdump kernel maps that memory and
> > gets to read the ELF headers.
> > 
> > So you shall have to do something similar where you need to tell second
> > kernel what memory areas it can use for boot and remove ELF header
> > memory area from the map.
> 
> So if we do that, why should we parse the ELF header?

To know three things.

- Memory areas being used by the first kernel.
- CPU states at the time of the crash of the first kernel.
- Some config options exported by the first kernel with the help of ELF notes.

fs/proc/vmcore.c already does it for you. You just need to make sure that
you tell it the following.

- Where to find the headers in memory (elfcorehdr=)
- A way to map that memory and access contents.
- Make sure these headers are not overwritten by newly booted kernel.
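
To illustrate the "parse the ELF header" suggestion, here is a minimal
user-space sketch, assuming a 64-bit ELF core header and the glibc <elf.h>
definitions. The name elfcorehdr_extent() is hypothetical; it is not taken
from fs/proc/vmcore.c or kexec-tools. It only computes where the program
header table ends. The buffer preallocated by kexec-tools may well be larger,
and the note data referenced by PT_NOTE entries lives elsewhere.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: return the number of bytes from the start of an
 * ELF64 header to the end of its program header table, or 0 if the buffer
 * does not look like an ELF64 header.
 */
static uint64_t elfcorehdr_extent(const Elf64_Ehdr *ehdr)
{
	uint64_t end;

	assert(ehdr != NULL);

	/* Check the magic bytes and the 64-bit class before trusting fields. */
	if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
		return 0;
	if (ehdr->e_ident[EI_CLASS] != ELFCLASS64)
		return 0;

	/* End of the program header table: offset + count * entry size. */
	end = ehdr->e_phoff + (uint64_t)ehdr->e_phnum * ehdr->e_phentsize;

	/* The ELF header itself always has to be covered. */
	return end < sizeof(*ehdr) ? sizeof(*ehdr) : end;
}
```

With the program headers placed directly behind the ELF header, three
program headers give an extent of sizeof(Elf64_Ehdr) + 3 * sizeof(Elf64_Phdr).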

[..]
> > It is possible. Even in x86, we prepare a block of information, one
> > 4K page and fill lots of x86 boot protocol information.
> > 
> > Look at.
> > 
> > kexec-tools/include/x86/x86-linux.h
> > kexec-tools/kexec/arch/i386/x86-linux-setup.c
> > 
> > Above header information contains information about e820 memory map also
> > and we fill that map info for normal kexec (fastboot, not kdump) also and
> > that's how second kernel comes to know about memory map of system.
> > 
> > I think one could possibly truncate the same map for kdump kernel to
> > tell second kernel about the memory to use. But IIRC, original memory
> > map is also used to determine max_pfn present in first kernel so that
> > in second kernel we don't try to map a memory beyond that and access
> > it, etc. Hence it was decided to leave it that way and pass the memory
> > map for second kernel on command line. 
> > 
> > So it's possible that IA64 is preparing a boot protocol specific
> > block and passing all the relevant information in that block instead
> > of making use of the command line.
> 
> Just to come back to your initial argumentation against our meminfo
> approach: It looks like there are already other mechanisms besides
> the ELF header and kernel parameters to pass information to the kdump
> kernel. Where is the conceptual difference to our meminfo interface?

That's a well-defined boot loader/kernel protocol on x86. kexec-tools
is just another boot loader and it uses that block to fill in the information
a normal boot loader would.

So if you have an s390-specific boot loader/kernel protocol and you extend
that, I think that should still be fine. Just keep the code in kexec-tools
for filling in the information which s390-specific code can parse. In
that case we should not require any generic changes to either kexec-tools
or kernel code. All the protocol-specific details should be well hidden
in arch-specific code.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-18 14:19                                     ` Vivek Goyal
@ 2011-07-18 14:44                                       ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-18 14:44 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Mon, 2011-07-18 at 10:19 -0400, Vivek Goyal wrote:
> > > By parsing the ELF header. It will give you information about how many
> > > program headers and notes are there, their sizes and locations etc.
> > 
> > The only thing we need is the size of the preallocated header that is in
> > kdump memory. All other architectures seem to pass this information
> > somehow with different mechanisms to the kdump kernel (memmap kernel
> > parameter, boot parameters, etc.). Why should *we* parse the ELF header?
> 
> ELF headers and memmap parameters are communicating two different pieces
> of information to second kernel.
> 
> - memmap tells what memory second kernel can use to boot.
> - ELF headers tell what memory areas first kernel was using and using
>   that information how to construct ELF headers for /proc/vmcore interface
>   in second kernel. On x86, ELF headers also communicate where the saved
>   cpu state is for the first kernel.
> 
> Arch independent code in kdump kernel (fs/proc/vmcore.c) is parsing those
> ELF headers to export /proc/vmcore. So if you set up the headers right
> you get that arch independent code for free without any changes to generic
> code.

Vivek, I know all these things. So, we (s390) do *not* have to parse the
ELF header. We only have to ensure that the ELF header prepared by kexec
stays reserved until /proc/vmcore parses it. All the ELF notes for CPUs,
etc. should automatically be reserved, because they are allocated in
oldmem by the old crashed kernel.

All I was asking is how best to pass the information "size of the
preallocated ELF header" to the 2nd kernel for reserving the header. We
currently do not have the memmap kernel parameter.

> > 
> > > When kexec-tools loads ELF headers, it knows what's the total size of
> > > ELF headers and it removes that chunk of memory from the memory map
> > > passed to second kernel with memmap= options. IOW, some memory out
> > > of reserved region is not usable by second kernel because we have
> > > stored information in that memory. Kdump kernel maps that memory and
> > > gets to read the ELF headers.
> > > 
> > > So you shall have to do something similar where you need to tell second
> > > kernel what memory areas it can use for boot and remove ELF header
> > > memory area from the map.
> > 
> > So if we do that, why should we parse the ELF header?
> 
> To know three things.
> 
> - Memory areas being used by first kernel.
> - Cpu states at the time of crash of first kernel.
> - Some config options exported by first kernel with the help of ELF notes.

sure

> fs/proc/vmcore.c already does it for you. You just need to make sure that
> you tell it following.
> 
> - Where to find the headers in memory (elfcorehdr=)
> - A way to map that memory and access contents.

sure

> - Make sure these headers are not overwritten by newly booted kernel.

And that was my question: What is the best way to do that? E.g. we could
pass a 2nd kernel parameter "elfcorehdr_size", implement an s390 boot
parameter or implement the memmap kernel parameter.

Currently the s390 kernel knows only two objects when booting: kernel
and ramdisk. Now with the ELF header we would have a third one.

Michael


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-18 14:44                                       ` Michael Holzheu
@ 2011-07-18 15:25                                         ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-18 15:25 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Mon, Jul 18, 2011 at 04:44:13PM +0200, Michael Holzheu wrote:
> On Mon, 2011-07-18 at 10:19 -0400, Vivek Goyal wrote:
> > > > By parsing the ELF header. It will give you information about how many
> > > > program headers and notes are there, their sizes and locations etc.
> > > 
> > > The only thing we need is the size of the preallocated header that is in
> > > kdump memory. All other architectures seem to pass this information
> > > somehow with different mechanisms to the kdump kernel (memmap kernel
> > > parameter, boot parameters, etc.). Why should *we* parse the ELF header?
> > 
> > ELF headers and memmap parameters are communicating two different pieces
> > of information to second kernel.
> > 
> > - memmap tells what memory second kernel can use to boot.
> > - ELF headers tell what memory areas first kernel was using and using
> >   that information how to construct ELF headers for /proc/vmcore interface
> >   in second kernel. On x86, ELF headers also communicate where the saved
> >   cpu state is for the first kernel.
> > 
> > Arch independent code in kdump kernel (fs/proc/vmcore.c) is parsing those
> > ELF headers to export /proc/vmcore. So if you set up the headers right
> > you get that arch independent code for free without any changes to generic
> > code.
> 
> Vivek, I know all these things. So, we (s390) do *not* have to parse the
> ELF header. We only have to ensure the kexec prepared ELF header is
> reserved until the /proc/vmcore parses it. All the ELF notes for CPUs,
> etc. should automatically be reserved, because they are allocated in
> oldmem by the old crashed kernel.
> 

[..]
> All what I was asking is how we pass best the information "size of the
> preallocated ELF header" to the 2nd kernel for reserving the header. We
> currently do not have the memmap kernel parameter.

In theory you could come up with another command line option which
also tells the size of the header segment. But having a generic
mechanism to provide the memory map to the second kernel will be more
useful. The reason being that apart from ELF headers there might be more
segments/memory areas which you need to exclude from the view of the
second kernel.

For example, the backup region on x86. This is a region of 640K in the
reserved area where we copy the contents of the first 640K of memory. In
the past it looked like even though we have a relocatable kernel, it still
needed some memory in low memory regions. So we copy the contents of the
first 640K into the backup area in the reserved memory region and exclude
that memory from the memory the kdump kernel can use (again using memmap=
options).
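
The kind of exclusion described above can be pictured with a small
user-space sketch. This is only an illustration under assumptions: struct
range and exclude_range() are hypothetical names, not kexec-tools or kernel
code. It removes one area (for example the ELF header segment or a backup
region) from a usable range and returns the remaining pieces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open physical address range [start, end). Hypothetical type. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Remove "hole" from "usable". Writes the remaining pieces (at most two)
 * to out[] and returns how many were written.
 */
static size_t exclude_range(struct range usable, struct range hole,
			    struct range out[2])
{
	size_t n = 0;

	assert(usable.start <= usable.end && hole.start <= hole.end);

	/* No overlap: the usable range is returned unchanged. */
	if (hole.end <= usable.start || hole.start >= usable.end) {
		out[n++] = usable;
		return n;
	}
	/* Piece below the hole, if any. */
	if (hole.start > usable.start)
		out[n++] = (struct range){ usable.start, hole.start };
	/* Piece above the hole, if any. */
	if (hole.end < usable.end)
		out[n++] = (struct range){ hole.end, usable.end };
	return n;
}
```

A hole at the very end of the usable range leaves one piece, a hole in the
middle leaves two, and each remaining piece would correspond to one entry in
whatever memory map is handed to the second kernel.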

How do you pass the memory map to the kernel on s390? Isn't there a way
to modify that? That would be easiest I think.

If you have only 1 memory area to exclude, probably you can get away
with implementing an elfcorehdrsize parameter. But this will be highly
arch specific and works only if there is one memory area you want to
exclude.

Or for s390, implement a new parameter excludemem=x@y where you
tell the kernel not to use the specified memory area, and kexec-tools
should be able to put the right command line options for the second kernel.
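
A rough user-space sketch of how such a parameter value could be parsed
follows. excludemem= is only a proposal here and does not exist; the sketch
assumes x is a size and y a start address, each with an optional K/M/G
suffix, by analogy with the memmap=nn@ss form. parse_size(),
parse_excludemem() and struct excluded_area are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type: one memory area the kernel must not use. */
struct excluded_area {
	uint64_t start;
	uint64_t size;
};

/* Parse a number with an optional K/M/G suffix; *end is set past it. */
static uint64_t parse_size(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;	/* fall through */
	case 'M': case 'm':
		v <<= 10;	/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* Parse "size@start". Returns 0 on success, -1 on malformed input. */
static int parse_excludemem(const char *arg, struct excluded_area *out)
{
	char *p;
	const char *q;
	uint64_t size, start;

	assert(arg != NULL && out != NULL);

	size = parse_size(arg, &p);
	if (p == arg || *p != '@')
		return -1;

	q = p + 1;
	start = parse_size(q, &p);
	if (p == q || *p != '\0')
		return -1;

	out->size = size;
	out->start = start;
	return 0;
}
```

For example, the string "64K@0x7f00000" would yield a size of 65536 bytes at
start address 0x7f00000, while a string without '@' is rejected.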

> 
> > > 
> > > > When kexec-tools loads ELF headers, it knows what's the total size of
> > > > ELF headers and it removes that chunk of memory from the memory map
> > > > passed to second kernel with memmap= options. IOW, some memory out
> > > > of reserved region is not usable by second kernel because we have
> > > > stored information in that memory. Kdump kernel maps that memory and
> > > > gets to read the ELF headers.
> > > > 
> > > > So you shall have to do something similar where you need to tell second
> > > > kernel what memory areas it can use for boot and remove ELF header
> > > > memory area from the map.
> > > 
> > > So if we do that, why should we parse the ELF header?
> > 
> > To know three things.
> > 
> > - Memory areas being used by first kernel.
> > - Cpu states at the time of crash of first kernel.
> > - Some config options exported by first kernel with the help of ELF notes.
> 
> sure
> 
> > fs/proc/vmcore.c already does it for you. You just need to make sure that
> > you tell it following.
> > 
> > - Where to find the headers in memory (elfcorehdr=)
> > - A way to map that memory and access contents.
> 
> sure
> 
> > - Make sure these headers are not overwritten by newly booted kernel.
> 
> And that was my question: What is the best way to do that. E.g. we could
> pass a 2nd kernel parameter "elfcorehdr_size", implement s390 boot
> parameter or implement the memmap kernel parameter.

You could do that but I think a more generic parameter will make more
sense.

- Either something along the lines of memmap=
- Or excludemem=x@y
- Or modify the memory map in the s390-specific boot protocol block etc.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
@ 2011-07-18 15:25                                         ` Vivek Goyal
  0 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-18 15:25 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: oomichi, linux-s390, mahesh, heiko.carstens, linux-kernel, hbabu,
	horms, ebiederm, Martin Schwidefsky, kexec

On Mon, Jul 18, 2011 at 04:44:13PM +0200, Michael Holzheu wrote:
> On Mon, 2011-07-18 at 10:19 -0400, Vivek Goyal wrote:
> > > > By parsing the ELF header. It will give you information about how many
> > > > program headers and notes are there, their sizes and locations etc.
> > > 
> > > The only thing we need is the size of the preallocated header that is in
> > > kdump memory. All other architectures seem to pass this information
> > > somehow with different mechanisms to the kdump kernel (memmap kernel
> > > parameter, boot parameters, etc.). Why should *we* parse the ELF header?
> > 
> > ELF headers and memmap parameters are communicating two different pieces
> > > of information to second kernel.
> > > 
> > > - memmap tells what memory second kernel can use to boot.
> > > - ELF headers tell what memory areas first kernel was using and using
> > >   that information how to construct ELF headers for /proc/vmcore interface
> > >   in second kernel. On x86, ELF headers also communicate where the saved
> > >   cpu state is for the first kernel.
> > > 
> > > Arch independent code in kdump kernel (fs/proc/vmcore.c) is parsing those
> > ELF headers to export /proc/vmcore. So if you set up the headers right
> > you get that arch independent code for free without any changes to generic
> > code.
> 
> Vivek, I know all these things. So, we (s390) do *not* have to parse the
> ELF header. We only have to ensure the kexec prepared ELF header is
> reserved until the /proc/vmcore parses it. All the ELF notes for CPUs,
> etc. should automatically be reserved, because they are allocated in
> oldmem by the old crashed kernel.
> 

[..]
> All what I was asking is how we pass best the information "size of the
> preallocated ELF header" to the 2nd kernel for reserving the header. We
> currently do not have the memmap kernel parameter.

In theory you could come up with another command line option to pass
which also tells size of header segment. But having a generic
mechanism to provide memory map to second kernel will be more useful.
The reason being that apart from ELF headers there might be more
segments/memory areas which you need to exclude from the view of second
kernel.
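
As a rough illustration of what "size of header segment" means here, the
span of a kexec-prepared ELF core header can be read from the header
itself. This is a minimal sketch, assuming an ELF64 header whose program
headers follow it contiguously; summarize_elfcorehdr is a hypothetical
helper for illustration, not a function from the kernel or kexec-tools:

```python
import struct

PT_LOAD, PT_NOTE = 1, 4


def summarize_elfcorehdr(buf):
    """Return (header_bytes, note_segments, load_segments) for an ELF64 header.

    header_bytes assumes the program header table ends the header area.
    """
    if buf[:4] != b"\x7fELF" or buf[4] != 2:      # EI_CLASS 2 = ELFCLASS64
        raise ValueError("not an ELF64 header")
    order = "<" if buf[5] == 1 else ">"            # EI_DATA 1 = little-endian
    ehdr = struct.unpack_from(order + "16sHHIQQQIHHHHHH", buf, 0)
    e_phoff, e_phentsize, e_phnum = ehdr[5], ehdr[9], ehdr[10]
    notes, loads = [], []
    for i in range(e_phnum):
        (p_type, _flags, p_offset, _vaddr,
         p_paddr, p_filesz, p_memsz, _align) = struct.unpack_from(
            order + "IIQQQQQQ", buf, e_phoff + i * e_phentsize)
        if p_type == PT_NOTE:
            notes.append((p_offset, p_filesz))     # where the notes live
        elif p_type == PT_LOAD:
            loads.append((p_paddr, p_memsz))       # memory used by first kernel
    return e_phoff + e_phnum * e_phentsize, notes, loads
```

The first return value is the number of bytes that would have to stay
reserved for the header itself; the note and load lists only point at
memory elsewhere.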

For example, the backup region on x86. This is a region of 640K in the
reserved area where we copy the contents of the first 640K of memory. In
the past it looked like that even though we have a relocatable kernel, it
still needed some memory in low memory regions. So we copy the contents of
the first 640K into the backup area in the reserved memory region and
exclude that memory from the memory the kdump kernel can use (again using
memmap= options).

How do you pass memory map to kernel in s390? Isn't there a way
to modify that? That would be easiest I think.

If you have only 1 memory area to exclude, probably you can get away
with implementing an elfcorehdrsize parameter. But this will be highly
arch specific and works only if there is one memory area you want to
exclude.

Or for s390, implement a new parameter excludemem=x@y where you
tell the kernel not to use the specified memory area, and kexec-tools
should be able to put the right command line options for the second kernel.
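
To make the excludemem=x@y idea concrete, here is a minimal sketch. It
assumes that x is a size and y a start address, following the memmap=
convention, and that the memory map is a list of (start, end) ranges with
an exclusive end. The parameter does not exist; the function names are
made up for illustration:

```python
def parse_size(text):
    """Parse a number with an optional K/M/G suffix, e.g. '64K' or '0x1000'."""
    units = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
    text = text.strip()
    if text and text[-1].upper() in units:
        return int(text[:-1], 0) * units[text[-1].upper()]
    return int(text, 0)


def parse_excludemem(arg):
    """Split 'size@start' into a (start, end) pair; end is exclusive."""
    size_text, start_text = arg.split("@", 1)
    start = parse_size(start_text)
    return start, start + parse_size(size_text)


def exclude_range(memory_map, hole):
    """Return memory_map with the hole cut out of every usable range."""
    hole_start, hole_end = hole
    result = []
    for start, end in memory_map:
        if hole_end <= start or end <= hole_start:
            result.append((start, end))          # no overlap, keep as is
            continue
        if start < hole_start:
            result.append((start, hole_start))   # part below the hole
        if hole_end < end:
            result.append((hole_end, end))       # part above the hole
    return result
```

Because the hole is cut out of a list of ranges, the same code handles
more than one excluded area: call exclude_range once per excludemem=
option.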

> 
> > > 
> > > > When kexec-tools loads ELF headers, it knows what's the total size of
> > > > ELF headers and it removes that chunk of memory from the memory map
> > > > passed to second kernel with memmap= options. IOW, some memory out
> > > > of reserved region is not usable by second kernel because we have
> > > > stored information in that memory. Kdump kernel maps that memory and
> > > > gets to read the ELF headers.
> > > > 
> > > > So you shall have to do something similar where you need to tell second
> > > > kernel what memory areas it can use for boot and remove ELF header
> > > > memory area from the map.
> > > 
> > > So if we do that, why should we parse the ELF header?
> > 
> > To know three things.
> > 
> > - Memory areas being used by first kernel.
> > - Cpu states at the time of crash of first kernel.
> > - Some config options exported by first kernel with the help of ELF notes.
> 
> sure
> 
> > fs/proc/vmcore.c already does it for you. You just need to make sure that
> > you tell it following.
> > 
> > - Where to find the headers in memory (elfcorehdr=)
> > - A way to map that memory and access contents.
> 
> sure
> 
> > - Make sure these headers are not overwritten by newly booted kernel.
> 
> And that was my question: What is the best way to do that. E.g. we could
> pass a 2nd kernel parameter "elfcorehdr_size", implement s390 boot
> parameter or implement the memmap kernel parameter.

You could do that but I think a more generic parameter will make more
sense.

- Either something along the lines of memmap=
- Or excludemem=x@y
- Or modify memory map in s390 specific bootloading protocol block etc.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-18 15:25                                         ` Vivek Goyal
@ 2011-07-18 18:03                                           ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-18 18:03 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

Hello Vivek,

On Mon, 2011-07-18 at 11:25 -0400, Vivek Goyal wrote:
> On Mon, Jul 18, 2011 at 04:44:13PM +0200, Michael Holzheu wrote:
> > On Mon, 2011-07-18 at 10:19 -0400, Vivek Goyal wrote:
> > > - Make sure these headers are not overwritten by newly booted kernel.
> > 
> > And that was my question: What is the best way to do that. E.g. we could
> > pass a 2nd kernel parameter "elfcorehdr_size", implement s390 boot
> > parameter or implement the memmap kernel parameter.
> 
> You could do that but I think a more generic parameter will make more
> sense.
> 
> - Either something along the lines of memmap=
> - Or excludemem=x@y
> - Or modify memory map in s390 specific bootloading protocol block etc.

Ok, understood. Thanks for the information.

We still have discussions here, if we could somehow implement our
original idea of triggering kdump by the stand-alone dump tools. Sorry
for being so stubborn :-(

So here comes the modified suggestion:

As requested by you we can pre-allocate the ELF header and use purgatory
as done on other architectures.

To allow the stand-alone dump tools as kdump triggers, we then only
would have to provide an s390 specific way to tell the stand-alone dump
tools:
1. Entry point address into purgatory
2. Address, size and checksum for purgatory

We could store address, size and checksum of the purgatory to a fixed
offset in the kdump kernel image. This can be done in the kexec tools
code. Then the dump tools only would need the crashkernel memory offset
to find all information. Then dump tools will verify purgatory and
afterwards jump to the purgatory code. Then purgatory verifies all kexec
segments. For s390, if this check fails, we return to caller
(stand-alone tools). If the check is ok, then purgatory code on s390
saves all registers to the preallocated ELF notes and starts kdump.
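
The flow above could look roughly like the following sketch. Everything
concrete in it is an assumption made for illustration: the descriptor
layout, the offset 0x400, and the use of CRC32 as the checksum are not
part of any existing interface. Memory is modelled as a bytearray and
image_base stands for the crashkernel memory offset:

```python
import struct
import zlib

# Hypothetical descriptor: entry point, address and size (64-bit each)
# plus a 32-bit checksum, big-endian as on s390.
DESCRIPTOR = struct.Struct(">QQQI")
DESCRIPTOR_OFFSET = 0x400  # made-up fixed offset inside the kdump image


def store_descriptor(memory, image_base, entry, addr, size):
    """What kexec tools would do at load time."""
    checksum = zlib.crc32(memory[addr:addr + size])
    DESCRIPTOR.pack_into(memory, image_base + DESCRIPTOR_OFFSET,
                         entry, addr, size, checksum)


def find_kdump_entry(memory, image_base):
    """What a stand-alone dump tool would do before branching.

    Returns the purgatory entry point, or None if purgatory cannot be
    trusted and the tool should continue with a full dump.
    """
    entry, addr, size, checksum = DESCRIPTOR.unpack_from(
        memory, image_base + DESCRIPTOR_OFFSET)
    if size == 0 or addr + size > len(memory):
        return None                      # nothing loaded or bogus range
    if not addr <= entry < addr + size:
        return None                      # entry point outside purgatory
    if zlib.crc32(memory[addr:addr + size]) != checksum:
        return None                      # purgatory was overwritten
    return entry
```

The dump tool needs nothing but image_base to decide between kdump and
a full dump; the check of the remaining kexec segments stays in
purgatory.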

I think, this is all s390 specific and IMHO will not affect other
architectures at all.

What you as kdump framework maintainer would have to accept with this
solution is that it is allowed now to start kdump directly via purgatory
without using code from the old kernel (e.g. crash_kexec). This implies
that everything the old kernel has to initialize for kdump has to be
done before the system crashes. Currently this is only
the initialization of vmcoreinfo.

Would such a solution be acceptable for you?

Michael


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-18 18:03                                           ` Michael Holzheu
@ 2011-07-19 15:04                                             ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-19 15:04 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Mon, Jul 18, 2011 at 08:03:08PM +0200, Michael Holzheu wrote:
> Hello Vivek,
> 
> On Mon, 2011-07-18 at 11:25 -0400, Vivek Goyal wrote:
> > On Mon, Jul 18, 2011 at 04:44:13PM +0200, Michael Holzheu wrote:
> > > On Mon, 2011-07-18 at 10:19 -0400, Vivek Goyal wrote:
> > > > - Make sure these headers are not overwritten by newly booted kernel.
> > > 
> > > And that was my question: What is the best way to do that. E.g. we could
> > > pass a 2nd kernel parameter "elfcorehdr_size", implement s390 boot
> > > parameter or implement the memmap kernel parameter.
> > 
> > You could do that but I think a more generic parameter will make more
> > sense.
> > 
> > - Either something along the lines of memmap=
> > - Or excludemem=x@y
> > - Or modify memory map in s390 specific bootloading protocol block etc.
> 
> Ok, understood. Thanks for the information.
> 
> We still have discussions here, if we could somehow implement our
> original idea of triggering kdump by the stand-alone dump tools. Sorry
> for being so stubborn :-(

What's the advantage of that? Why are we so stubborn about first passing
the control to dump tools after panic()?

The case of purgatory corruption is no different than panic() code
and associated hook code corruption.

It is a corner case and even if it gets corrupted you have other
mechanisms to IPL dump tools and capture dump.

Why do you want to mix two mechanisms? What's the advantage of making
even the dump tools complicated and making them aware of a kernel binary
object, purgatory?

To me the simple interface is that there is no coupling between dump
tools and kdump. If there is no coupling, then there is no need to
exchange any information and no need to make any assumption about
hard coded location where purgatory entry point, size and checksums
are stored.

> 
> So here comes the modified suggestion:
> 
> As requested by you we can pre-allocate the ELF header and use purgatory
> as done on other architectures.
> 
> To allow the stand-alone dump tools as kdump triggers, we then only
> would have to provide an s390 specific way to tell the stand-alone dump
> tools:
> 1. Entry point address into purgatory
> 2. Address, size and checksum for purgatory
> 
> We could store address, size and checksum of the purgatory to a fixed
> offset in the kdump kernel image. This can be done in the kexec tools
> code.

I think this will require kernel changes also? Otherwise how would you
store variables in kernel address space.

Secondly, if the goal is to just be able to checksum purgatory also, then
it probably should be done in a generic manner so that kernel could
checksum purgatory before jumping to it.

> Then the dump tools only would need the crashkernel memory offset
> to find all information. Then dump tools will verify purgatory and
> afterwards jump to the purgatory code. Then purgatory verifies all kexec
> segments. For s390, if this check fails, we return to caller
> (stand-alone tools). If the check is ok, then purgatory code on s390
> saves all registers to the preallocated ELF notes and starts kdump.

So far I really don't think that there is any need of involving dump
tools here. By making it a requirement we are just making the design
complex with no gains.

> 
> I think, this is all s390 specific and IMHO will not affect other
> architectures at all.
> 
> What you as kdump framework maintainer would have to accept with this
> solution is that it is allowed now to start kdump directly via purgatory
> without using code from the old kernel (e.g. crash_kexec). This has as
> implication that all things that the old kernel has to initialize for
> kdump has to be done before the system crashes. Currently this is only
> the initialization of vmcoreinfo.

When would you save vmcoreinfo? I guess I shall have to look at the
patches.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-19 15:04                                             ` Vivek Goyal
@ 2011-07-20  8:00                                               ` Martin Schwidefsky
  -1 siblings, 0 replies; 112+ messages in thread
From: Martin Schwidefsky @ 2011-07-20  8:00 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Michael Holzheu, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Tue, 19 Jul 2011 11:04:23 -0400
Vivek Goyal <vgoyal@redhat.com> wrote:

> On Mon, Jul 18, 2011 at 08:03:08PM +0200, Michael Holzheu wrote:
> > Hello Vivek,
> > 
> > On Mon, 2011-07-18 at 11:25 -0400, Vivek Goyal wrote:
> > > On Mon, Jul 18, 2011 at 04:44:13PM +0200, Michael Holzheu wrote:
> > > > On Mon, 2011-07-18 at 10:19 -0400, Vivek Goyal wrote:
> > > > > - Make sure these headers are not overwritten by newly booted kernel.
> > > > 
> > > > And that was my question: What is the best way to do that. E.g. we could
> > > > pass a 2nd kernel parameter "elfcorehdr_size", implement s390 boot
> > > > parameter or implement the memmap kernel parameter.
> > > 
> > > You could do that but I think a more generic parameter will make more
> > > sense.
> > > 
> > > - Either something along the lines of memmap=
> > > - Or excludemem=x@y
> > > - Or modify memory map in s390 specific bootloading protocol block etc.
> > 
> > Ok, understood. Thanks for the information.
> > 
> > We still have discussions here, if we could somehow implement our
> > original idea of triggering kdump by the stand-alone dump tools. Sorry
> > for being so stubborn :-(
> 
> What's the advantage of that? Why are we so stubborn about first passing
> the control to dump tools after panic()?

I wonder when you will finally get it: we are not talking about the simple
case of a panic. Not all problems of the system will show up as a panic.
We occasionally have systems that just stop dead in their tracks. And this
is where an external dump trigger comes into play. For s390 that is the
stand-alone dumper.

> The case of purgatory corruption is no different than panic() code
> and associated hook code corruption. 

That is true. All the different pieces of code need to be verified with
a checksum.

> It is a corner case and even if it gets corrupted you have other
> mechanisms to IPL dump tools and capture dump.

It is definitely not a corner case. We use the stand-alone dumper as a
trigger to either start kdump if the code for kdump has not been corrupted
or as a fallback to do a full dump. The catch here is that we need a way to
distinguish the two cases. And that is where the checksums come into play.
See?

> Why do you want to mix two mechanisms? What's the advantage of making
> even the dump tools complicated and making them aware of a kernel binary
> object, purgatory?

We do that so we can use kdump in situations where the system just drops
dead and does not go through panic.

> To me the simple interface is that there is no coupling between dump
> tools and kdump. If there is no coupling, then there is no need to
> exchange any information and no need to make any assumption about
> hard coded location where purgatory entry point, size and checksums
> are stored.

Without the coupling we would have to do a full dump in case of an
unresponsive system. No fun if you have lots of main memory.

> > 
> > So here comes the modified suggestion:
> > 
> > As requested by you we can pre-allocate the ELF header and use purgatory
> > as done on other architectures.
> > 
> > To allow the stand-alone dump tools as kdump triggers, we then only
> > would have to provide an s390 specific way to tell the stand-alone dump
> > tools:
> > 1. Entry point address into purgatory
> > 2. Address, size and checksum for purgatory
> > 
> > We could store address, size and checksum of the purgatory to a fixed
> > offset in the kdump kernel image. This can be done in the kexec tools
> > code.
> 
> I think this will require kernel changes also? Otherwise how would you
> store variables in kernel address space.

I would think that this would best be implemented in some arch backend
function that is called on kexec_load.

> Secondly, if the goal is to just be able to checksum purgatory also, then
> it probably should be done in a generic manner so that kernel could
> checksum purgatory before jumping to it.

You still seem to assume that the code that does the checksumming is
included in the main kernel and gets executed with crash_kexec. This is
incorrect in case of an external dump trigger. And before the stand-alone
dumper branches to the purgatory code it had better make sure that it does
not execute random numbers, otherwise we would get no dump at all.
 
> > Then the dump tools only would need the crashkernel memory offset
> > to find all information. Then dump tools will verify purgatory and
> > afterwards jump to the purgatory code. Then purgatory verifies all kexec
> > segments. For s390, if this check fails, we return to caller
> > (stand-alone tools). If the check is ok, then purgatory code on s390
> > saves all registers to the preallocated ELF notes and starts kdump.
> 
> So far I really don't think that there is any need of involving dump
> tools here. By making it a requirement we are just making the design
> complex with no gains.

We think otherwise. The dump trigger via the stand-alone dump tool is
a central requirement for us. And the design impact is minimal with the
latest suggestion from Michael.

> > 
> > I think, this is all s390 specific and IMHO will not affect other
> > architectures at all.
> > 
> > What you as kdump framework maintainer would have to accept with this
> > solution is that it is allowed now to start kdump directly via purgatory
> > without using code from the old kernel (e.g. crash_kexec). This has as
> > implication that all things that the old kernel has to initialize for
> > kdump has to be done before the system crashes. Currently this is only
> > the initialization of vmcoreinfo.
> 
> when would you save vmcoreinfo? I guess I shall have to look at the
> patches.

That should be patch #4 from the series.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-19 15:04                                             ` Vivek Goyal
@ 2011-07-20  9:28                                               ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-20  9:28 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

Hello Vivek,

On Tue, 2011-07-19 at 11:04 -0400, Vivek Goyal wrote:
> On Mon, Jul 18, 2011 at 08:03:08PM +0200, Michael Holzheu wrote:

[snip]
 
> > To allow the stand-alone dump tools as kdump triggers, we then only
> > would have to provide an s390 specific way to tell the stand-alone dump
> > tools:
> > 1. Entry point address into purgatory
> > 2. Address, size and checksum for purgatory
> > 
> > We could store address, size and checksum of the purgatory to a fixed
> > offset in the kdump kernel image. This can be done in the kexec tools
> > code.
> 
> I think this will require kernel changes also? Otherwise how would you
> store variables in kernel address space.

We can store purgatory address, size and checksum in the kernel image
from kexec tools user space as we do it currently for the initrd.
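
As a rough illustration of the check the stand-alone dump tools would do
before branching to purgatory, here is a minimal user-space sketch in C.
Everything in it is an assumption made for illustration: the struct
layout, the names purgatory_desc and purgatory_verify(), and the simple
additive checksum are hypothetical and not taken from the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical descriptor that kexec tools would write at a fixed offset
 * in the kdump kernel image. The layout is an assumption for this sketch.
 */
struct purgatory_desc {
	uint64_t addr;     /* offset of purgatory inside crashkernel memory */
	uint64_t size;     /* length of purgatory in bytes */
	uint32_t checksum; /* checksum over those bytes */
};

/* Simple additive checksum, a stand-in for whatever the real tools use. */
static uint32_t simple_checksum(const uint8_t *p, uint64_t len)
{
	uint32_t sum = 0;

	while (len--)
		sum += *p++;
	return sum;
}

/*
 * Returns 1 if the described region lies inside the crashkernel memory
 * and its checksum matches, 0 otherwise. A dump tool would only branch
 * to purgatory when this returns 1.
 */
static int purgatory_verify(const uint8_t *mem, uint64_t mem_size,
			    const struct purgatory_desc *d)
{
	assert(mem != NULL && d != NULL);

	/* Reject empty or out-of-range regions before touching memory. */
	if (d->size == 0 || d->addr > mem_size || d->size > mem_size - d->addr)
		return 0;
	return simple_checksum(mem + d->addr, d->size) == d->checksum;
}
```

The bounds check comes first on purpose: the descriptor itself lives in
memory that may have been damaged, so its fields cannot be trusted before
they are validated.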

[snip]

> > What you as kdump framework maintainer would have to accept with this
> > solution is that it is allowed now to start kdump directly via purgatory
> > without using code from the old kernel (e.g. crash_kexec). This has as
> > implication that all things that the old kernel has to initialize for
> > kdump has to be done before the system crashes. Currently this is only
> > the initialization of vmcoreinfo.
> 
> when would you save vmcoreinfo? I guess I shall have to look at the
> patches.

patch #4: We have to save vmcoreinfo at startup time of the production
kernel.

Perhaps it would be best to send you a new patch series with our latest
proposal.

Michael



^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-18 18:03                                           ` Michael Holzheu
@ 2011-07-20 19:25                                             ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-20 19:25 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Mon, Jul 18, 2011 at 08:03:08PM +0200, Michael Holzheu wrote:
> Hello Vivek,
> 
> On Mon, 2011-07-18 at 11:25 -0400, Vivek Goyal wrote:
> > On Mon, Jul 18, 2011 at 04:44:13PM +0200, Michael Holzheu wrote:
> > > On Mon, 2011-07-18 at 10:19 -0400, Vivek Goyal wrote:
> > > > - Make sure these headers are not overwritten by newly booted kernel.
> > > 
> > > And that was my question: What is the best way to do that. E.g. we could
> > > pass a 2nd kernel parameter "elfcorehdr_size", implement s390 boot
> > > parameter or implement the memmap kernel parameter.
> > 
> > You could do that but I think a more generic parameter will make more
> > sense.
> > 
> > - Either something along the lines of memmap=
> > - Or excludemem=x@y
> > - Or modify memory map in s390 specific bootloading protocol block etc.
> 
> Ok, understood. Thanks for the information.
> 
> We still have discussions here, if we could somehow implement our
> original idea of triggering kdump by the stand-alone dump tools. Sorry
> for being so stubborn :-(
> 
> So here comes the modified suggestion:
> 
> As requested by you we can pre-allocate the ELF header and use purgatory
> as done on other architectures.
> 
> To allow the stand-alone dump tools as kdump triggers, we then only
> would have to provide an s390 specific way to tell the stand-alone dump
> tools:
> 1. Entry point address into purgatory
> 2. Address, size and checksum for purgatory
> 
> We could store address, size and checksum of the purgatory to a fixed
> offset in the kdump kernel image. This can be done in the kexec tools
> code. Then the dump tools only would need the crashkernel memory offset
> to find all information. Then dump tools will verify purgatory and
> afterwards jump to the purgatory code. Then purgatory verifies all kexec
> segments. For s390, if this check fails, we return to caller
> (stand-alone tools). If the check is ok, then purgatory code on s390
> saves all registers to the preallocated ELF notes and starts kdump.
> 
> I think, this is all s390 specific and IMHO will not affect other
> architectures at all.
> 
> What you as kdump framework maintainer would have to accept with this
> solution is that it is allowed now to start kdump directly via purgatory
> without using code from the old kernel (e.g. crash_kexec). This has as
> implication that all things that the old kernel has to initialize for
> kdump has to be done before the system crashes. Currently this is only
> the initialization of vmcoreinfo.

Hi Michael,

Instead of introducing a new entry point for second kernel, why not
jump to crash_kexec() from stand alone tools? That should be functionally
equivalent to what you described above without any need to pass the 
purgatory details to stand alone tools.

The only thing that needs to be figured out is how to pass the address of
crash_kexec() to stand alone tools and set registers/parameters 
appropriately.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-20  9:28                                               ` Michael Holzheu
@ 2011-07-20 20:24                                                 ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-20 20:24 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Wed, Jul 20, 2011 at 11:28:35AM +0200, Michael Holzheu wrote:

[..]
> > > What you as kdump framework maintainer would have to accept with this
> > > solution is that it is allowed now to start kdump directly via purgatory
> > > without using code from the old kernel (e.g. crash_kexec). This has as
> > > implication that all things that the old kernel has to initialize for
> > > kdump has to be done before the system crashes. Currently this is only
> > > the initialization of vmcoreinfo.
> > 
> > when would you save vmcoreinfo? I guess I shall have to look at the
> > patches.
> 
> patch #4: We have to save vmcoreinfo at startup time of the production
> kernel.
> 
> Perhaps it would be best to send you a new patch series with our latest
> proposal.

This patch 4 looks reasonable. So you will prepare notes after boot and
refresh it after adding CRASHTIME if crash_kexec() is executed.
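
A minimal user-space sketch in C of that two-step idea: fill the
vmcoreinfo data once at boot, then re-append CRASHTIME when the crash path
runs. The buffer type and the function names vmcoreinfo_save_at_boot() and
vmcoreinfo_refresh_on_crash() are hypothetical, chosen only for this
illustration; only the KEY=value line format with OSRELEASE and CRASHTIME
follows the real vmcoreinfo data.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define VMCOREINFO_BYTES 256

/* Hypothetical stand-in for the preallocated vmcoreinfo note data. */
struct vmcoreinfo_buf {
	char data[VMCOREINFO_BYTES];
	size_t len;      /* bytes currently used */
	size_t boot_len; /* bytes used by the boot-time content */
};

/* Append one "KEY=value" line; returns 0 on success, -1 if it does not fit. */
static int vmcoreinfo_append(struct vmcoreinfo_buf *b, const char *key,
			     const char *val)
{
	size_t room;
	int n;

	assert(b != NULL);
	room = sizeof(b->data) - b->len;
	n = snprintf(b->data + b->len, room, "%s=%s\n", key, val);
	if (n < 0 || (size_t)n >= room) {
		b->data[b->len] = '\0'; /* undo the partial write */
		return -1;
	}
	b->len += (size_t)n;
	return 0;
}

/* Done once at startup of the production kernel. */
static void vmcoreinfo_save_at_boot(struct vmcoreinfo_buf *b,
				    const char *osrelease)
{
	b->len = 0;
	b->data[0] = '\0';
	vmcoreinfo_append(b, "OSRELEASE", osrelease);
	b->boot_len = b->len;
}

/* Done only if the crash path in the old kernel is actually executed. */
static void vmcoreinfo_refresh_on_crash(struct vmcoreinfo_buf *b,
					long crash_seconds)
{
	char val[32];

	b->len = b->boot_len; /* drop any earlier CRASHTIME line */
	snprintf(val, sizeof(val), "%ld", crash_seconds);
	vmcoreinfo_append(b, "CRASHTIME", val);
}
```

If the crash path never runs (for example when an external trigger starts
kdump), the boot-time content is still complete and usable; it just lacks
the CRASHTIME line.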

Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-20 19:25                                             ` Vivek Goyal
@ 2011-07-21 14:58                                               ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-21 14:58 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: oomichi, linux-s390, mahesh, heiko.carstens, linux-kernel, hbabu,
	horms, ebiederm, Martin Schwidefsky, kexec

Hello Vivek,

On Wed, 2011-07-20 at 15:25 -0400, Vivek Goyal wrote:
> On Mon, Jul 18, 2011 at 08:03:08PM +0200, Michael Holzheu wrote:

[snip]

> > What you as kdump framework maintainer would have to accept with this
> > solution is that it is allowed now to start kdump directly via purgatory
> > without using code from the old kernel (e.g. crash_kexec). This has as
> > implication that all things that the old kernel has to initialize for
> > kdump has to be done before the system crashes. Currently this is only
> > the initialization of vmcoreinfo.
> 
> Hi Michael,
> 
> Instead of introducing a new entry point for second kernel, why not
> jump to crash_kexec() from stand alone tools? That should be functionally
> equivalent to what you described above without any need to pass the 
> purgatory details to stand alone tools.

That has the drawback that we still execute unchecked code from the
crashed kernel. But ...

... I discussed that with Martin and we had an idea how to deal with
this problem. On s390 when an invalid opcode is executed or invalid
parameters are used, we get a program check interrupt. When the
crash_kexec() code path or data is corrupted, it is almost certain that we
get a program check. The stand-alone dump tools could establish a
program check interrupt handler that jumps back to the dump tools code,
which then creates a full-blown dump.

So I think with this mechanism we could use an entry point (name it
s390_kdump_entry) in the old kernel that calls crash_kexec().

We would change the purgatory code so that on s390 it returns to the
caller if the checksum test fails. This *requires* that
s390_kdump_entry()->crash_kexec()->machine_kexec() is allowed to return.
Currently this is the case.

> Only thing which needs to be figured out is how to pass the address of
> crash_kexec() to stand alone tools and set registers/parameters 
> appropriately.

We could do this in an s390-specific way (e.g. using meminfo). In this
case it would only be used for kernel/dump tools communication and not for
kernel/kernel communication. So I hope this should not be a problem for
you.

Then the design would look like the following:
* Define s390_kdump_entry in old kernel that calls crash_kexec()
* Use preallocated ELF core header
* s390_kdump_entry code path stores registers to ELF notes,  ...
* ... and finally jumps to purgatory code
* For s390 the purgatory code returns to caller in case of
  checksum failure
* dump tools call s390_kdump_entry with program check handler
  for error handling
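
The control flow in this list can be modelled in user space. The following
is only a sketch in C under stated assumptions: setjmp()/longjmp() stand in
for the program check handler jumping back into the dump tool, and the
three entry_*() functions are hypothetical mocks of s390_kdump_entry(). In
reality a successful kdump start never returns; here that case is modelled
by returning 1.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

enum dump_result { DUMP_VIA_KDUMP, DUMP_STANDALONE };

/* Mock signature for s390_kdump_entry(): nonzero means kdump was started. */
typedef int (*kdump_entry_fn)(void);

/* Stands in for the dump tool context saved before calling the old kernel. */
static jmp_buf dump_tool_ctx;

/* Stands in for the program check handler installed by the dump tool. */
static void program_check_handler(void)
{
	longjmp(dump_tool_ctx, 1);
}

/* Mock: kdump code path intact and purgatory checksum ok. */
static int entry_ok(void)
{
	return 1;
}

/* Mock: purgatory checksum failed, so control returns to the caller. */
static int entry_bad_checksum(void)
{
	return 0;
}

/* Mock: corrupted code path hits a program check. */
static int entry_corrupted(void)
{
	program_check_handler();
	return 0; /* not reached */
}

/* Dump tool side: try kdump first, fall back to the stand-alone dump. */
static enum dump_result dump_tool_run(kdump_entry_fn entry)
{
	assert(entry != NULL);

	if (setjmp(dump_tool_ctx) != 0)
		return DUMP_STANDALONE; /* came back via program check */
	if (entry())
		return DUMP_VIA_KDUMP;
	return DUMP_STANDALONE;         /* purgatory returned to caller */
}
```

Both failure cases end in the same place, the stand-alone dump, which is
the point of the design: the dump tool needs no knowledge of purgatory.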

I think, if we do it that way, we do not affect the current kdump
framework at all.

Is that acceptable to you? If yes, I would rework my patches
accordingly.

Michael


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-21 14:58                                               ` Michael Holzheu
@ 2011-07-21 21:22                                                 ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-21 21:22 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: oomichi, linux-s390, mahesh, heiko.carstens, linux-kernel, hbabu,
	horms, ebiederm, Martin Schwidefsky, kexec

On Thu, Jul 21, 2011 at 04:58:18PM +0200, Michael Holzheu wrote:
> Hello Vivek,
> 
> On Wed, 2011-07-20 at 15:25 -0400, Vivek Goyal wrote:
> > On Mon, Jul 18, 2011 at 08:03:08PM +0200, Michael Holzheu wrote:
> 
> [snip]
> 
> > > What you as kdump framework maintainer would have to accept with this
> > > solution is that it is allowed now to start kdump directly via purgatory
> > > without using code from the old kernel (e.g. crash_kexec). This has as
> > > implication that all things that the old kernel has to initialize for
> > > kdump has to be done before the system crashes. Currently this is only
> > > the initialization of vmcoreinfo.
> > 
> > Hi Michael,
> > 
> > Instead of introducing a new entry point for second kernel, why not
> > jump to crash_kexec() from stand alone tools? That should be functionally
> > equivalent to what you described above without any need to pass the 
> > purgatory details to stand alone tools.
> 
> That has the drawback that we still execute unchecked code from the
> crashed kernel. But ...
> 
> ... I discussed that with Martin and we had an idea how to deal with
> this problem. On s390 when an invalid opcode is executed or invalid
> parameters are used, we get a program check interrupt. When the
> crash_kexec() code path or data is corrupted, it is almost sure that we
> get a program check. The stand-alone dump tools could establish a
> program check interrupt handler that jumps back to the dump tools code
> and then create full-blown dump.

Ok, that sounds good. So now you don't have to worry about checksumming
purgatory code.

> 
> So I think with this mechanism we could use an entry point (name it
> s390_kdump_entry) in the old kernel that calls crash_kexec().
> 
> We would change the purgatory code that for s390 it returns to the
> caller, if the checksum test fails. This *requires* that
> s390_kdump_entry()->crash_kexec()->machine_kexec() is allowed to return.
> Currently this is the case.

Can we directly jump to the entry point of the stand-alone dump tools from
purgatory if the checksum fails? We should be able to set this entry point
in user space while loading the kdump kernel.

> 
> > Only thing which needs to be figured out is how to pass the address of
> > crash_kexec() to stand alone tools and set registers/parameters 
> > appropriately.
> 
> We could do this s390 specific (e.g. using meminfo). In this case this
> would only be used for kernel/dump tools communication and not for
> kernel/kernel communication. So I hope this should not be a problem for
> you.

So you will be preparing a block/segment of data (called meminfo, though
this name does not make much sense anymore), and pass it to second kernel?
All done in user space and no first kernel involvement?

I am trying to remember the details of how you tell the second kernel
where this data block is. I recall that last time you said something
about setting this in the kernel in kexec-tools, but I did not understand it.

> 
> Then the design would look like the following:
> * Define s390_kdump_entry in old kernel that calls crash_kexec()
> * Use preallocated ELF core header
> * s390_kdump_entry code path stores registers to ELF notes,  ...

crash_kexec() -> crash_setup_regs() already does that. We just need to
define an s390 specific crash_setup_regs().

> * ... and finally jumps to purgatory code
> * For s390 the purgatory code returns to caller in case of
>   checksum failure
> * dump tools call s390_kdump_entry with program check handler
>   for error handling

I thought that program check handler will call something else and not
s390_kdump_entry()? Because program check handler is supposed to hit
when any of the code we are executing is corrupted and we can not
jump to kdump tool any more. Otherwise we will be nesting.

In fact, how do we differentiate between kdump code being corrupted
and some normal kernel code being corrupted? In the first case we would like
to jump to the dump tools and take a full dump, and in the second case it
would be desirable to jump to the kdump kernel.

> 
> I think, if we do it that way, we do not affect the current kdump
> framework at all.

Can you give some more details about the various code flows and entry
points, such as the panic() path and the hard hang path? From your mail it
sounds like, even with the program check handler, after panic() you would
like to jump to the stand-alone tools first and then call s390_kdump_entry().
I think that should not be required any more, as you are no longer doing any
checksumming in the dump tools?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-21 21:22                                                 ` Vivek Goyal
@ 2011-07-22  9:33                                                   ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-22  9:33 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: oomichi, linux-s390, mahesh, heiko.carstens, linux-kernel, hbabu,
	horms, ebiederm, Martin Schwidefsky, kexec

Hello Vivek,

On Thu, 2011-07-21 at 17:22 -0400, Vivek Goyal wrote:
> On Thu, Jul 21, 2011 at 04:58:18PM +0200, Michael Holzheu wrote:
> > We would change the purgatory code that for s390 it returns to the
> > caller, if the checksum test fails. This *requires* that
> > s390_kdump_entry()->crash_kexec()->machine_kexec() is allowed to return.
> > Currently this is the case.
> 
> Can we directly jump to entry point of stand alone dump tools from purgatory
> if checksum fails? We should be able to set this entry point in user space
> while loading kdump kernel.

I described a new idea with forced program check below.

> > > Only thing which needs to be figured out is how to pass the address of
> > > crash_kexec() to stand alone tools and set registers/parameters 
> > > appropriately.
> > 
> > We could do this s390 specific (e.g. using meminfo). In this case this
> > would only be used for kernel/dump tools communication and not for
> > kernel/kernel communication. So I hope this should not be a problem for
> > you.
> 
> So you will be preparing a block/segment of data (called meminfo, though
> this name does not make much sense anymore), and pass it to second kernel?
> All done in user space and no first kernel involvement?
>
> I am trying to remember the details of how you tell the second kernel
> where this data block is. I recall that last time you said something
> about setting this in kernel in kexec-tools but I did not understand it.

Better you forget everything :-)

We will establish an s390 specific mechanism that allows dump tools to
find s390_kdump_entry and does not affect the kdump framework. Hopefully
it is nothing you have to worry about.

> > 
> > Then the design would look like the following:
> > * Define s390_kdump_entry in old kernel that calls crash_kexec()
> > * Use preallocated ELF core header
> > * s390_kdump_entry code path stores registers to ELF notes,  ...
> 
> crash_kexec() -> crash_setup_regs() already does that. We just need to
> define an s390 specific crash_setup_regs().

I looked at the code. x86 seems to store only the registers of the current
CPU. Where are the registers of all other CPUs stored? ia64 has an empty
implementation. Where are the registers stored there?

> 
> > * ... and finally jumps to purgatory code
> > * For s390 the purgatory code returns to caller in case of
> >   checksum failure
> > * dump tools call s390_kdump_entry with program check handler
> >   for error handling
> 
> I thought that program check handler will call something else and not
> s390_kdump_entry()? Because program check handler is supposed to hit
> when any of the code we are executing is corrupted and we can not
> jump to kdump tool any more. Otherwise we will be nesting.

Looks like the sentence was misleading. What I wanted to say is:
* First the dump tools set up a program check handler that jumps back to
  the dump tool in case kdump fails
* Then the dump tools call s390_kdump_entry

> > 
> > I think, if we do it that way, we do not affect the current kdump
> > framework at all.
> 
> Can you give some more details about various code flows and entry points.
> Like panic() path, hard hang path. From your mail it sounds that even
> with program check handler, after panic() you would like to jump to
> stand alone tools first and then call s390_kdump_entry(). I think that
> should not be required any more as you are not doing any checksumming
> in dump tools anymore?

Ok some code flows:

Generally we have the flow:
* crash_kexec -> machine_kexec -> purgatory -> kdump

crash_kexec can be entered by e.g.:
* panic -> kdump shutdown action -> crash_kexec
* panic -> s390 dump shutdown action -> auto IPL dump tool -> s390_kdump_entry -> crash_kexec
* hard hang -> manual IPL dump tool -> s390_kdump_entry -> crash_kexec

Handling for corrupted kdump:

New idea for returning to dump tools in case of program check:
We could force a program check for s390, if purgatory checksum
fails. Then we would automatically return to stand-alone dump
tools.

The flow would look like the following in this case:

IPL dump tool -> s390_kdump_entry -> crash_kexec +--> purgatory -+->[checksum ok]---> kdump
      ^                                          |               |
      |                                          |        [checksum fail]
      |                                          |               |
      |                                          |     [forced program check]
      +------[program check]---------------------+               |
      |                                                          |
      +----------------------------------------------------------+

Then of course also the kernel code would have to install a special
program check handler before calling purgatory.
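
The flow above can be modelled in user space. The following is only a
sketch under stated assumptions: setjmp()/longjmp() stand in for the
program check handler, a toy checksum stands in for the real purgatory
digest, and struct kdump_image, ipl_dump_tool() and the enum values are
hypothetical names that do not exist in the kernel or in the dump tools.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

enum dump_result { DUMP_VIA_KDUMP, DUMP_VIA_STANDALONE };

/* Hypothetical description of the loaded kdump image. */
struct kdump_image {
	const unsigned char *data;
	size_t len;
	unsigned int expected;	/* checksum recorded at load time */
};

/* Stand-in for the program check handler installed by the dump tool. */
static jmp_buf dump_tool_handler;

/* Toy checksum; it only models "the image was modified or not". */
static unsigned int checksum(const unsigned char *p, size_t n)
{
	unsigned int sum = 0;

	while (n--)
		sum = sum * 31 + *p++;
	return sum;
}

/* Models the forced program check: control goes back to the dump tool. */
static void forced_program_check(void)
{
	longjmp(dump_tool_handler, 1);
}

/* Models purgatory: returns only if the checksum matches. */
static void purgatory(const struct kdump_image *img)
{
	assert(img != NULL);
	if (checksum(img->data, img->len) != img->expected)
		forced_program_check();	/* does not return */
	/* checksum ok: the real code would now start the kdump kernel */
}

/* Models s390_kdump_entry -> crash_kexec -> machine_kexec -> purgatory. */
static void s390_kdump_entry(const struct kdump_image *img)
{
	purgatory(img);
}

/* Models the IPLed dump tool: install the handler, then try kdump. */
static enum dump_result ipl_dump_tool(const struct kdump_image *img)
{
	if (setjmp(dump_tool_handler) != 0)
		return DUMP_VIA_STANDALONE;	/* backup: full-blown dump */
	s390_kdump_entry(img);
	return DUMP_VIA_KDUMP;
}
```

With an intact image ipl_dump_tool() reports DUMP_VIA_KDUMP; if the
recorded checksum does not match, the simulated program check brings
control back to the dump tool, which reports DUMP_VIA_STANDALONE.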
                         
Michael


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-18 15:25                                         ` Vivek Goyal
@ 2011-07-22 15:26                                           ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-22 15:26 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

Hello Vivek,

Still thinking how we best get elfcorehdr size...

On Mon, 2011-07-18 at 11:25 -0400, Vivek Goyal wrote:
> On Mon, Jul 18, 2011 at 04:44:13PM +0200, Michael Holzheu wrote:
> You could do that but I think a more generic parameter will make more
> sense.
> 
> - Either something along the lines of memmap=
> - Or excludemem=x@y
> - Or modify memory map in s390 specific bootloading protocol block etc.

Wouldn't it be most natural to pass the length along with the address of
the ELF core header? What about extending the elfcorehdr kernel
parameter with an optional size:

elfcorehdr=<addr>[KMG],<size>[KMG]

Michael


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-22  9:33                                                   ` Michael Holzheu
@ 2011-07-25 16:02                                                     ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-25 16:02 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: oomichi, linux-s390, mahesh, heiko.carstens, linux-kernel, hbabu,
	horms, ebiederm, Martin Schwidefsky, kexec

On Fri, Jul 22, 2011 at 11:33:11AM +0200, Michael Holzheu wrote:

[..]
> > > 
> > > Then the design would look like the following:
> > > * Define s390_kdump_entry in old kernel that calls crash_kexec()
> > > * Use preallocated ELF core header
> > > * s390_kdump_entry code path stores registers to ELF notes,  ...
> > 
> > crash_kexec() -> crash_setup_regs() already does that. We just need to
> > define an s390 specific crash_setup_regs().
> 
> I looked at the code. x86 seems to store only registers for current CPU.
> Where are all other CPUs stored? ia64 has an empty implementation. Where
> are registers stored there?

native_machine_crash_shutdown()
	kdump_nmi_shootdown_cpus()
		kdump_nmi_callback()
			crash_save_cpu()

Basically the crashing cpu sends an NMI to the other cpus to stop them, and
within the NMI handler each cpu also saves its own state.
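
As a rough user space model of that call chain (a sketch only: the
struct layouts, NR_CPUS value and function names below are made up for
illustration and are not the kernel's), each "other" cpu stores its
registers into its own note slot from the simulated NMI callback, and
the crashing cpu stores its own registers afterwards:

```c
#include <assert.h>
#include <string.h>

#define NR_CPUS 4

/* Hypothetical, heavily reduced register set. */
struct cpu_regs {
	unsigned long ip;
	unsigned long sp;
};

/* Hypothetical per-cpu note slot, standing in for the ELF notes. */
struct crash_note {
	int saved;
	struct cpu_regs regs;
};

static struct crash_note crash_notes[NR_CPUS];

/* Models crash_save_cpu(): copy one cpu's registers into its slot. */
static void save_cpu(int cpu, const struct cpu_regs *regs)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	crash_notes[cpu].regs = *regs;
	crash_notes[cpu].saved = 1;
}

/* Models the NMI callback that runs on every other cpu. */
static void nmi_callback(int cpu, const struct cpu_regs *regs)
{
	save_cpu(cpu, regs);
	/* the real handler would now stop this cpu */
}

/* Models the shutdown step; returns the number of notes written. */
static int crash_shutdown(int crashing_cpu, const struct cpu_regs live[NR_CPUS])
{
	int cpu, saved = 0;

	memset(crash_notes, 0, sizeof(crash_notes));
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu != crashing_cpu)
			nmi_callback(cpu, &live[cpu]);	/* "send NMI" */
	save_cpu(crashing_cpu, &live[crashing_cpu]);
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		saved += crash_notes[cpu].saved;
	return saved;
}
```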

> 
> > 
> > > * ... and finally jumps to purgatory code
> > > * For s390 the purgatory code returns to caller in case of
> > >   checksum failure
> > > * dump tools call s390_kdump_entry with program check handler
> > >   for error handling
> > 
> > I thought that program check handler will call something else and not
> > s390_kdump_entry()? Because program check handler is supposed to hit
> > when any of the code we are executing is corrupted and we can not
> > jump to kdump tool any more. Otherwise we will be nesting.
> 
> Looks like the sentence was misleading. What I wanted to say is:
> * First the dump tools set up a program check handler that jumps back to
>   the dump tool in case kdump fails
> * Then the dump tools call s390_kdump_entry
> 
> > > 
> > > I think, if we do it that way, we do not affect the current kdump
> > > framework at all.
> > 
> > Can you give some more details about various code flows and entry points.
> > Like panic() path, hard hang path. From your mail it sounds that even
> > with program check handler, after panic() you would like to jump to
> > stand alone tools first and then call s390_kdump_entry(). I think that
> > should not be required any more as you are not doing any checksumming
> > in dump tools anymore?
> 
> Ok some code flows:
> 
> Generally we have the flow:
> * crash_kexec -> machine_kexec -> purgatory -> kdump
> 
> crash_kexec can be entered by e.g.:
> * panic -> kdump shutdown action -> crash_kexec
> * panic -> s390 dump shutdown action -> auto IPL dump tool -> s390_kdump_entry -> crash_kexec

So after panic() you will still jump to the dump tools? The only thing you
need to do there is install the program check handler, and that could
easily be done in the kernel too.

> * hard hang -> manual IPL dump tool -> s390_kdump_entry -> crash_kexec

This one makes sense as kernel is hard hung and dump tools need to
force crash_kexec() now. It is more like x86 NMI handler.

> 
> Handling for corrupted kdump:
> 
> New idea for returning to dump tools in case of program check:
> We could force a program check for s390, if purgatory checksum
> fails. Then we would automatically return to stand-alone dump
> tools.
> 
> The flow would look like the following in this case:
> 
> IPL dump tool -> s390_kdump_entry -> crash_kexec +--> purgatory -+->[checksum ok]---> kdump
>       ^                                          |               |
>       |                                          |        [checksum fail]
>       |                                          |               |
>       |                                          |     [forced program check]
>       +------[program check]---------------------+               |
>       |                                                          |
>       +----------------------------------------------------------+
> 
> Then of course also the kernel code would have to install a special
> program check handler before calling purgatory.

If kernel code is going to install the program check handler before
calling purgatory, then we don't need to jump to dump tools at all
after panic()?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-22 15:26                                           ` Michael Holzheu
@ 2011-07-25 18:07                                             ` Vivek Goyal
  -1 siblings, 0 replies; 112+ messages in thread
From: Vivek Goyal @ 2011-07-25 18:07 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Fri, Jul 22, 2011 at 05:26:32PM +0200, Michael Holzheu wrote:
> Hello Vivek,
> 
> Still thinking how we best get elfcorehdr size...
> 
> On Mon, 2011-07-18 at 11:25 -0400, Vivek Goyal wrote:
> > On Mon, Jul 18, 2011 at 04:44:13PM +0200, Michael Holzheu wrote:
> > You could do that but I think a more generic parameter will make more
> > sense.
> > 
> > - Either something along the lines of memmap=
> > - Or excludemem=x@y
> > - Or modify memory map in s390 specific bootloading protocol block etc.
> 
> Wouldn't it be most natural to pass the length along with the address of
> the ELF core header? What about extending the kernel elfcorehdr kernel
> parameter and adding optional size:
> 
> elfcorehdr=<addr>[KMG],<size>[KMG]

I think it could be memmap= style: elfcorehdr=X[KMG]@Y[KMG]. Though to
support backward compatibility we will have to support the old format of
plain elfcorehdr=X too.
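
A minimal user space sketch of such a parser, under assumptions: the
value before '@' is read as the size and the value after it as the
address (following the memmap= convention of size@start), a plain
number is read as the address in the old format, and parse_mem(),
parse_elfcorehdr() and struct elfcorehdr_arg are illustrative names,
with parse_mem() standing in for the kernel's memparse():

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical result of parsing the elfcorehdr= value. */
struct elfcorehdr_arg {
	unsigned long long addr;
	unsigned long long size;	/* 0 if not given (old format) */
};

/* Parse a number with an optional K, M or G suffix; report where
 * parsing stopped through *end. */
static unsigned long long parse_mem(const char *s, const char **end)
{
	char *e;
	unsigned long long v = strtoull(s, &e, 0);

	switch (toupper((unsigned char)*e)) {
	case 'G':
		v <<= 10;
		/* fall through */
	case 'M':
		v <<= 10;
		/* fall through */
	case 'K':
		v <<= 10;
		e++;
		break;
	default:
		break;
	}
	*end = e;
	return v;
}

/* Accepts "size@addr" or plain "addr". Returns 0 on success, -1 on error. */
static int parse_elfcorehdr(const char *arg, struct elfcorehdr_arg *out)
{
	const char *end;
	unsigned long long first;

	assert(out != NULL);
	if (!arg || !*arg)
		return -1;
	first = parse_mem(arg, &end);
	if (end == arg)
		return -1;
	if (*end == '@') {
		const char *start = end + 1;

		out->size = first;
		out->addr = parse_mem(start, &end);
		if (end == start)
			return -1;
	} else {
		out->addr = first;	/* old format: address only */
		out->size = 0;
	}
	return *end == '\0' ? 0 : -1;
}
```

With this sketch "64K@0x2000000" yields size 65536 at address
0x2000000, while the old style "0x2000000" yields the same address with
size 0, so a caller can tell that no size was passed.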

Thanks
Vivek

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-25 18:07                                             ` Vivek Goyal
@ 2011-07-26  9:32                                               ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-26  9:32 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Martin Schwidefsky, ebiederm, hbabu, mahesh, oomichi, horms,
	heiko.carstens, kexec, linux-kernel, linux-s390

On Mon, 2011-07-25 at 14:07 -0400, Vivek Goyal wrote:
> On Fri, Jul 22, 2011 at 05:26:32PM +0200, Michael Holzheu wrote:
> > Hello Vivek,
> > 
> > Still thinking how we best get elfcorehdr size...
> > 
> > On Mon, 2011-07-18 at 11:25 -0400, Vivek Goyal wrote:
> > > On Mon, Jul 18, 2011 at 04:44:13PM +0200, Michael Holzheu wrote:
> > > You could do that but I think a more generic parameter will make more
> > > sense.
> > > 
> > > - Either something along the lines of memmap=
> > > - Or excludemem=x@y
> > > - Or modify memory map in s390 specific bootloading protocol block etc.
> > 
> > Wouldn't it be most natural to pass the length along with the address of
> > the ELF core header? What about extending the kernel elfcorehdr kernel
> > parameter and adding optional size:
> > 
> > elfcorehdr=<addr>[KMG],<size>[KMG]
> 
> I think it could be memmap= style. elfcorehdr=X[KMG]@Y[KMG]. Though to
> support backward compatibility we will have to support old format of
> plain elfcorehdr=X too.

Ok, fine. I will add a patch for that.

Michael



^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
  2011-07-25 16:02                                                     ` Vivek Goyal
@ 2011-07-26  9:44                                                       ` Michael Holzheu
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-26  9:44 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: oomichi, linux-s390, mahesh, heiko.carstens, linux-kernel, hbabu,
	horms, ebiederm, Martin Schwidefsky, kexec

On Mon, 2011-07-25 at 12:02 -0400, Vivek Goyal wrote:
> On Fri, Jul 22, 2011 at 11:33:11AM +0200, Michael Holzheu wrote:

[snip]

> > crash_kexec can be entered by e.g.:
> > * panic -> kdump shutdown action -> crash_kexec
> > * panic -> s390 dump shutdown action -> auto IPL dump tool -> s390_kdump_entry -> crash_kexec
> 
> So after panic() You will still jump to dump tools?

This is our current mechanism to create an automatic dump in case of
panic. I do not see a reason to change that now.

>  The only thing you
> need to do there is installing program check handler and could have been
> easily done in kernel too.
> 
> > * hard hang -> manual IPL dump tool -> s390_kdump_entry -> crash_kexec
> 
> This one makes sense as kernel is hard hung and dump tools need to
> force crash_kexec() now. It is more like x86 NMI handler.
> 
> > 
> > Handling for corrupted kdump:
> > 
> > New idea for returning to dump tools in case of program check:
> > We could force a program check for s390, if purgatory checksum
> > fails. Then we would automatically return to stand-alone dump
> > tools.
> > 
> > The flow would look like the following in this case:
> > 
> > IPL dump tool -> s390_kdump_entry -> crash_kexec +--> purgatory -+->[checksum ok]---> kdump
> >       ^                                          |               |
> >       |                                          |        [checksum fail]
> >       |                                          |               |
> >       |                                          |     [forced program check]
> >       +------[program check]---------------------+               |
> >       |                                                          |
> >       +----------------------------------------------------------+
> > 
> > Then of course also the kernel code would have to install a special
> > program check handler before calling purgatory.
> 
> If kernel code is going to install the program check handler before
> calling purgatory, then we don't need to jump to dump tools at all
> after panic()?

Independent of hard wait or panic, if the dump tools have control, we want
to jump back to the dump tools code in order to be able to create a
full-blown dump as a backup method.

I think we now have a design where almost no change to your kdump
framework is needed. I will resend the updated patch series.

Michael



^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [patch 0/9] kdump: Patch series for s390 support
@ 2011-07-26  9:44                                                       ` Michael Holzheu
  0 siblings, 0 replies; 112+ messages in thread
From: Michael Holzheu @ 2011-07-26  9:44 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: oomichi, linux-s390, mahesh, heiko.carstens, linux-kernel, hbabu,
	horms, ebiederm, Martin Schwidefsky, kexec

On Mon, 2011-07-25 at 12:02 -0400, Vivek Goyal wrote:
> On Fri, Jul 22, 2011 at 11:33:11AM +0200, Michael Holzheu wrote:

[snip]

> > crash_kexec can be entered by e.g.:
> > * panic -> kdump shutdown action -> crash_kexec
> > * panic -> s390 dump shutdown action -> auto IPL dump tool -> s390_kdump_entry -> crash_kexec
> 
> So after panic() You will still jump to dump tools?

This is our current mechanism to create an automatic dump in case of
panic. I do not see a reason to change that now.

>  The only thing you
> need to do there is installing program check handler and could have been
> easily done in kernel too.
> 
> > * hard hang -> manual IPL dump tool -> s390_kdump_entry -> crash_kexec
> 
> This one makes sense as kernel is hard hung and dump tools need to
> force crash_kexec() now. It is more like x86 NMI handler.
> 
> > 
> > Handling for corrupted kdump:
> > 
> > New idea for returning to dump tools in case of program check:
> > We could force a program check for s390, if purgatory checksum
> > fails. Then we would automatically return to stand-alone dump
> > tools.
> > 
> > The flow would look like the following in this case:
> > 
> > IPL dump tool -> s390_kdump_entry -> crash_kexec +--> purgatory -+->[checksum ok]---> kdump
> >       ^                                          |               |
> >       |                                          |        [checksum fail]
> >       |                                          |               |
> >       |                                          |     [forced program check]
> >       +------[program check]---------------------+               |
> >       |                                                          |
> >       +----------------------------------------------------------+
> > 
> > Then of course also the kernel code would have to install a special
> > program check handler before calling purgatory.
> 
> If kernel code is going to install the program check handler before
> calling purgatory, then we don't need to jump to dump tools at all
> after panic()?

Regardless of whether we got there via a hard wait or a panic: if the
dump tools have control, we want to jump back to the dump tool code so
that we can create a full-blown dump as a backup method.
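
To make the flow from the diagram concrete, here is a rough user-space
sketch in C. It only models the control flow and is not kernel code: the
type and function names (kdump_state, purgatory_model,
dump_tool_ipl_model) are hypothetical and do not appear in the patch
series, and the two flags stand in for events that on real hardware
would be a program check interruption and the purgatory checksum result.

```c
#include <stdbool.h>

/* Which dump mechanism ends up writing the dump (hypothetical names). */
enum dump_result {
	DUMP_KDUMP,		/* purgatory handed over to the kdump kernel */
	DUMP_STANDALONE,	/* control returned to the stand-alone dump tool */
};

/* Assumed inputs for the model, not real kernel state. */
struct kdump_state {
	bool prog_check_in_crash_kexec;	/* program check before purgatory runs */
	bool checksum_ok;		/* purgatory checksum verification result */
};

/*
 * Models purgatory: returns true if the checksum is fine and kdump can
 * be started, false if a program check has to be forced.
 */
static bool purgatory_model(const struct kdump_state *s)
{
	return s->checksum_ok;
}

/*
 * Models: IPL dump tool -> s390_kdump_entry -> crash_kexec -> purgatory.
 * Any program check (real or forced) lands in the special program check
 * handler and returns control to the dump tool, which then creates the
 * full dump as backup.
 */
static enum dump_result dump_tool_ipl_model(const struct kdump_state *s)
{
	if (s->prog_check_in_crash_kexec)
		return DUMP_STANDALONE;	/* [program check] edge */
	if (!purgatory_model(s))
		return DUMP_STANDALONE;	/* [forced program check] edge */
	return DUMP_KDUMP;		/* [checksum ok] edge */
}
```

In this model the stand-alone dump is the outcome of every path except
the one where crash_kexec reaches purgatory and the checksum is ok,
which is the point of keeping the dump tools in control.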

I think we now have a design that needs almost no changes to your kdump
framework. I will resend the updated patch series.

Michael





