* [RFC][patch 0/2] kdump: Allow removal of page tables for crashkernel memory
From: Michael Holzheu @ 2011-09-08 13:26 UTC
  To: vgoyal
  Cc: ebiederm, mahesh, schwidefsky, heiko.carstens, kexec,
	linux-kernel, linux-s390

Hello Vivek,

I am back from vacation. We had one topic left that we didn't finish last
time: the removal of page tables for the crashkernel memory to protect the
loaded kdump kernel.

This patch series implements a new proposal for that:
[1] kdump: Add infrastructure for unmapping crashkernel memory
[2] s390: Add architecture code for unmapping crashkernel memory

The patches apply on top of the last kdump patch series that I sent.

Michael

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


* [RFC][patch 1/2] kdump: Add infrastructure for unmapping crashkernel memory
From: Michael Holzheu @ 2011-09-08 13:26 UTC
  To: vgoyal
  Cc: ebiederm, mahesh, schwidefsky, heiko.carstens, kexec,
	linux-kernel, linux-s390

[-- Attachment #1: s390-kdump-common-crash_map_pages.patch --]
[-- Type: text/plain, Size: 3246 bytes --]

From: Michael Holzheu <holzheu@linux.vnet.ibm.com>

This patch introduces a mechanism that allows architecture backends to
remove page tables for the crashkernel memory. This can protect the loaded
kdump kernel from being overwritten by broken kernel code.
A new function, crash_map_pages(), is added that can be implemented by
architecture code. It has the following signature:

void crash_map_pages(int enable);

"enable" can be 0 for removing or 1 for adding page tables.  The function is
called before and after the crashkernel segments are loaded. It is also
called in crash_shrink_memory() to create new page tables when the
crashkernel memory size is reduced.

To support architectures that have large pages, this patch also introduces
a new define, KEXEC_CRASH_MEM_ALIGN. The crashkernel start and size must
always be aligned to KEXEC_CRASH_MEM_ALIGN.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 include/linux/kexec.h |    5 +++++
 kernel/kexec.c        |   16 ++++++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)

--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -37,6 +37,10 @@
 #define KEXEC_CRASH_CONTROL_MEMORY_LIMIT KEXEC_CONTROL_MEMORY_LIMIT
 #endif
 
+#ifndef KEXEC_CRASH_MEM_ALIGN
+#define KEXEC_CRASH_MEM_ALIGN PAGE_SIZE
+#endif
+
 #define KEXEC_NOTE_HEAD_BYTES ALIGN(sizeof(struct elf_note), 4)
 #define KEXEC_CORE_NOTE_NAME "CORE"
 #define KEXEC_CORE_NOTE_NAME_BYTES ALIGN(sizeof(KEXEC_CORE_NOTE_NAME), 4)
@@ -133,6 +137,7 @@ extern void crash_kexec(struct pt_regs *
 int kexec_should_crash(struct task_struct *);
 void crash_save_cpu(struct pt_regs *regs, int cpu);
 void crash_save_vmcoreinfo(void);
+void crash_map_pages(int enable);
 void arch_crash_save_vmcoreinfo(void);
 void vmcoreinfo_append_str(const char *fmt, ...)
 	__attribute__ ((format (printf, 1, 2)));
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -999,6 +999,7 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon
 			kimage_free(xchg(&kexec_crash_image, NULL));
 			result = kimage_crash_alloc(&image, entry,
 						     nr_segments, segments);
+			crash_map_pages(1);
 		}
 		if (result)
 			goto out;
@@ -1015,6 +1016,8 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon
 				goto out;
 		}
 		kimage_terminate(image);
+		if (flags & KEXEC_ON_CRASH)
+			crash_map_pages(0);
 	}
 	/* Install the new kernel, and  Uninstall the old */
 	image = xchg(dest_image, image);
@@ -1026,6 +1029,13 @@ out:
 	return result;
 }
 
+/*
+ * provide an empty default implementation here -- architecture
+ * code may override this
+ */
+void __weak crash_map_pages(int enable)
+{}
+
 #ifdef CONFIG_COMPAT
 asmlinkage long compat_sys_kexec_load(unsigned long entry,
 				unsigned long nr_segments,
@@ -1134,14 +1144,16 @@ int crash_shrink_memory(unsigned long ne
 		goto unlock;
 	}
 
-	start = roundup(start, PAGE_SIZE);
-	end = roundup(start + new_size, PAGE_SIZE);
+	start = roundup(start, KEXEC_CRASH_MEM_ALIGN);
+	end = roundup(start + new_size, KEXEC_CRASH_MEM_ALIGN);
 
+	crash_map_pages(1);
 	crash_free_reserved_phys_range(end, crashk_res.end);
 
 	if ((start == end) && (crashk_res.parent != NULL))
 		release_resource(&crashk_res);
 	crashk_res.end = end - 1;
+	crash_map_pages(0);
 
 unlock:
 	mutex_unlock(&kexec_mutex);



* [RFC][patch 2/2] s390: Add architecture code for unmapping crashkernel memory
From: Michael Holzheu @ 2011-09-08 13:26 UTC
  To: vgoyal
  Cc: ebiederm, mahesh, schwidefsky, heiko.carstens, kexec,
	linux-kernel, linux-s390

[-- Attachment #1: s390-kdump-arch-crash_map_pages.patch --]
[-- Type: text/plain, Size: 3075 bytes --]

From: Michael Holzheu <holzheu@linux.vnet.ibm.com>

This patch implements the crash_map_pages() function for s390.
KEXEC_CRASH_MEM_ALIGN is set to HPAGE_SIZE, in order to support
kernel mappings that use large pages.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 arch/s390/include/asm/kexec.h    |    3 +++
 arch/s390/kernel/machine_kexec.c |   15 +++++++++++++++
 arch/s390/kernel/setup.c         |   10 ++++++----
 3 files changed, 24 insertions(+), 4 deletions(-)

--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -36,6 +36,9 @@
 /* Allocate one page for the pdp and the second for the code */
 #define KEXEC_CONTROL_PAGE_SIZE 4096
 
+/* Alignment of crashkernel memory */
+#define KEXEC_CRASH_MEM_ALIGN HPAGE_SIZE
+
 /* The native architecture */
 #define KEXEC_ARCH KEXEC_ARCH_S390
 
--- a/arch/s390/kernel/machine_kexec.c
+++ b/arch/s390/kernel/machine_kexec.c
@@ -243,6 +243,21 @@ static void __machine_kdump(void *image)
 #endif
 
 /*
+ * Map or unmap crashkernel memory
+ */
+void crash_map_pages(int enable)
+{
+	unsigned long size = crashk_res.end - crashk_res.start + 1;
+
+	BUG_ON(crashk_res.start % KEXEC_CRASH_MEM_ALIGN ||
+	       size % KEXEC_CRASH_MEM_ALIGN);
+	if (enable)
+		vmem_add_mapping(crashk_res.start, size);
+	else
+		vmem_remove_mapping(crashk_res.start, size);
+}
+
+/*
  * Give back memory to hypervisor before new kdump is loaded
  */
 static int machine_kexec_prepare_kdump(void)
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -446,6 +446,7 @@ static void __init setup_resources(void)
 		res->flags = IORESOURCE_BUSY | IORESOURCE_MEM;
 		switch (memory_chunk[i].type) {
 		case CHUNK_READ_WRITE:
+		case CHUNK_CRASHK:
 			res->name = "System RAM";
 			break;
 		case CHUNK_READ_ONLY:
@@ -706,8 +707,8 @@ static void __init reserve_crashkernel(v
 			       &crash_base);
 	if (rc || crash_size == 0)
 		return;
-	crash_base = PAGE_ALIGN(crash_base);
-	crash_size = PAGE_ALIGN(crash_size);
+	crash_base = ALIGN(crash_base, KEXEC_CRASH_MEM_ALIGN);
+	crash_size = ALIGN(crash_size, KEXEC_CRASH_MEM_ALIGN);
 	if (register_memory_notifier(&kdump_mem_nb))
 		return;
 	if (!crash_base)
@@ -727,7 +728,7 @@ static void __init reserve_crashkernel(v
 	crashk_res.start = crash_base;
 	crashk_res.end = crash_base + crash_size - 1;
 	insert_resource(&iomem_resource, &crashk_res);
-	reserve_kdump_bootmem(crash_base, crash_size, CHUNK_READ_WRITE);
+	reserve_kdump_bootmem(crash_base, crash_size, CHUNK_CRASHK);
 	pr_info("Reserving %lluMB of memory at %lluMB "
 		"for crashkernel (System RAM: %luMB)\n",
 		crash_size >> 20, crash_base >> 20, memory_end >> 20);
@@ -802,7 +803,8 @@ setup_memory(void)
 	for (i = 0; i < MEMORY_CHUNKS && memory_chunk[i].size > 0; i++) {
 		unsigned long start_chunk, end_chunk, pfn;
 
-		if (memory_chunk[i].type != CHUNK_READ_WRITE)
+		if (memory_chunk[i].type != CHUNK_READ_WRITE &&
+		    memory_chunk[i].type != CHUNK_CRASHK)
 			continue;
 		start_chunk = PFN_DOWN(memory_chunk[i].addr);
 		end_chunk = start_chunk + PFN_DOWN(memory_chunk[i].size);



* Re: [RFC][patch 1/2] kdump: Add infrastructure for unmapping crashkernel memory
From: Vivek Goyal @ 2011-09-09 18:23 UTC
  To: Michael Holzheu
  Cc: ebiederm, mahesh, schwidefsky, heiko.carstens, kexec,
	linux-kernel, linux-s390

On Thu, Sep 08, 2011 at 03:26:10PM +0200, Michael Holzheu wrote:
> From: Michael Holzheu <holzheu@linux.vnet.ibm.com>
> 
> This patch introduces a mechanism that allows architecture backends to
> remove page tables for the crashkernel memory. This can protect the loaded
> kdump kernel from being overwritten by broken kernel code.
> A new function crash_map_pages() is added that can be implemented by
> architecture code. This function has the following syntax:

I guess having separate functions for mapping and unmapping pages might
look cleaner. Since we are not passing a page range, naming the pages we
are talking about in the function name would make it clearer:

crash_map_reserved_pages()
crash_unmap_reserved_pages().

Secondly, what happens to the code which runs after a crash (crash_kexec())?
The current x86 code assumes that the reserved region is mapped at the time
of the crash and does a few things with the control page there.

So this generic approach is not valid, at least for x86, because it does
not address how to map the reserved range again once the kernel crashes.
It will only work under the assumption that after a crash we don't expect
the reserved range/pages to be mapped.

Thanks
Vivek
 

> 
> void crash_map_pages(int enable);
> 
> "enable" can be 0 for removing or 1 for adding page tables.  The function is
> called before and after the crashkernel segments are loaded. It is also
> called in crash_shrink_memory() to create new page tables when the
> crashkernel memory size is reduced.
> 
> To support architectures that have large pages this patch also introduces
> a new define KEXEC_CRASH_MEM_ALIGN. The crashkernel start and size must 
> always be aligned with KEXEC_CRASH_MEM_ALIGN.
> 
> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
> ---
>  include/linux/kexec.h |    5 +++++
>  kernel/kexec.c        |   16 ++++++++++++++--
>  2 files changed, 19 insertions(+), 2 deletions(-)
> 
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -37,6 +37,10 @@
>  #define KEXEC_CRASH_CONTROL_MEMORY_LIMIT KEXEC_CONTROL_MEMORY_LIMIT
>  #endif
>  
> +#ifndef KEXEC_CRASH_MEM_ALIGN
> +#define KEXEC_CRASH_MEM_ALIGN PAGE_SIZE
> +#endif
> +
>  #define KEXEC_NOTE_HEAD_BYTES ALIGN(sizeof(struct elf_note), 4)
>  #define KEXEC_CORE_NOTE_NAME "CORE"
>  #define KEXEC_CORE_NOTE_NAME_BYTES ALIGN(sizeof(KEXEC_CORE_NOTE_NAME), 4)
> @@ -133,6 +137,7 @@ extern void crash_kexec(struct pt_regs *
>  int kexec_should_crash(struct task_struct *);
>  void crash_save_cpu(struct pt_regs *regs, int cpu);
>  void crash_save_vmcoreinfo(void);
> +void crash_map_pages(int enable);
>  void arch_crash_save_vmcoreinfo(void);
>  void vmcoreinfo_append_str(const char *fmt, ...)
>  	__attribute__ ((format (printf, 1, 2)));
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -999,6 +999,7 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon
>  			kimage_free(xchg(&kexec_crash_image, NULL));
>  			result = kimage_crash_alloc(&image, entry,
>  						     nr_segments, segments);
> +			crash_map_pages(1);
>  		}
>  		if (result)
>  			goto out;
> @@ -1015,6 +1016,8 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon
>  				goto out;
>  		}
>  		kimage_terminate(image);
> +		if (flags & KEXEC_ON_CRASH)
> +			crash_map_pages(0);
>  	}
>  	/* Install the new kernel, and  Uninstall the old */
>  	image = xchg(dest_image, image);
> @@ -1026,6 +1029,13 @@ out:
>  	return result;
>  }
>  
> +/*
> + * provide an empty default implementation here -- architecture
> + * code may override this
> + */
> +void __weak crash_map_pages(int enable)
> +{}
> +
>  #ifdef CONFIG_COMPAT
>  asmlinkage long compat_sys_kexec_load(unsigned long entry,
>  				unsigned long nr_segments,
> @@ -1134,14 +1144,16 @@ int crash_shrink_memory(unsigned long ne
>  		goto unlock;
>  	}
>  
> -	start = roundup(start, PAGE_SIZE);
> -	end = roundup(start + new_size, PAGE_SIZE);
> +	start = roundup(start, KEXEC_CRASH_MEM_ALIGN);
> +	end = roundup(start + new_size, KEXEC_CRASH_MEM_ALIGN);
>  
> +	crash_map_pages(1);
>  	crash_free_reserved_phys_range(end, crashk_res.end);
>  
>  	if ((start == end) && (crashk_res.parent != NULL))
>  		release_resource(&crashk_res);
>  	crashk_res.end = end - 1;
> +	crash_map_pages(0);
>  
>  unlock:
>  	mutex_unlock(&kexec_mutex);


* Re: [RFC][patch 1/2] kdump: Add infrastructure for unmapping crashkernel memory
From: Vivek Goyal @ 2011-09-09 19:30 UTC
  To: Michael Holzheu, Huang Ying
  Cc: ebiederm, mahesh, schwidefsky, heiko.carstens, kexec,
	linux-kernel, linux-s390

On Thu, Sep 08, 2011 at 03:26:10PM +0200, Michael Holzheu wrote:
> From: Michael Holzheu <holzheu@linux.vnet.ibm.com>
> 
> This patch introduces a mechanism that allows architecture backends to
> remove page tables for the crashkernel memory. This can protect the loaded
> kdump kernel from being overwritten by broken kernel code.
> A new function crash_map_pages() is added that can be implemented by
> architecture code. This function has the following syntax:
> 
> void crash_map_pages(int enable);

CCing Huang Ying. I am not sure if the preserve-context feature will be
impacted by this in any way.

Huang,

While I am looking at the x86 code, I had a question. git blame tells me
you changed that code last, so here I go.

What is init_transition_pgtable() and why do we need it? I see that
init_pgtable() sets up identity-mapped page tables from 0 to max_pfn.
Code running in the control page (identity_mapped onwards) will make use
of the identity-mapped page tables. Then I see that init_transition_pgtable()
goes ahead and seems to modify the identity-mapped page tables to map the
address of relocate_kernel to the control code's physical address. Why
do we have to do that?

Thanks
Vivek

> 
> "enable" can be 0 for removing or 1 for adding page tables.  The function is
> called before and after the crashkernel segments are loaded. It is also
> called in crash_shrink_memory() to create new page tables when the
> crashkernel memory size is reduced.
> 
> To support architectures that have large pages this patch also introduces
> a new define KEXEC_CRASH_MEM_ALIGN. The crashkernel start and size must 
> always be aligned with KEXEC_CRASH_MEM_ALIGN.
> 
> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
> ---
>  include/linux/kexec.h |    5 +++++
>  kernel/kexec.c        |   16 ++++++++++++++--
>  2 files changed, 19 insertions(+), 2 deletions(-)
> 
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -37,6 +37,10 @@
>  #define KEXEC_CRASH_CONTROL_MEMORY_LIMIT KEXEC_CONTROL_MEMORY_LIMIT
>  #endif
>  
> +#ifndef KEXEC_CRASH_MEM_ALIGN
> +#define KEXEC_CRASH_MEM_ALIGN PAGE_SIZE
> +#endif
> +
>  #define KEXEC_NOTE_HEAD_BYTES ALIGN(sizeof(struct elf_note), 4)
>  #define KEXEC_CORE_NOTE_NAME "CORE"
>  #define KEXEC_CORE_NOTE_NAME_BYTES ALIGN(sizeof(KEXEC_CORE_NOTE_NAME), 4)
> @@ -133,6 +137,7 @@ extern void crash_kexec(struct pt_regs *
>  int kexec_should_crash(struct task_struct *);
>  void crash_save_cpu(struct pt_regs *regs, int cpu);
>  void crash_save_vmcoreinfo(void);
> +void crash_map_pages(int enable);
>  void arch_crash_save_vmcoreinfo(void);
>  void vmcoreinfo_append_str(const char *fmt, ...)
>  	__attribute__ ((format (printf, 1, 2)));
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -999,6 +999,7 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon
>  			kimage_free(xchg(&kexec_crash_image, NULL));
>  			result = kimage_crash_alloc(&image, entry,
>  						     nr_segments, segments);
> +			crash_map_pages(1);
>  		}
>  		if (result)
>  			goto out;
> @@ -1015,6 +1016,8 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon
>  				goto out;
>  		}
>  		kimage_terminate(image);
> +		if (flags & KEXEC_ON_CRASH)
> +			crash_map_pages(0);
>  	}
>  	/* Install the new kernel, and  Uninstall the old */
>  	image = xchg(dest_image, image);
> @@ -1026,6 +1029,13 @@ out:
>  	return result;
>  }
>  
> +/*
> + * provide an empty default implementation here -- architecture
> + * code may override this
> + */
> +void __weak crash_map_pages(int enable)
> +{}
> +
>  #ifdef CONFIG_COMPAT
>  asmlinkage long compat_sys_kexec_load(unsigned long entry,
>  				unsigned long nr_segments,
> @@ -1134,14 +1144,16 @@ int crash_shrink_memory(unsigned long ne
>  		goto unlock;
>  	}
>  
> -	start = roundup(start, PAGE_SIZE);
> -	end = roundup(start + new_size, PAGE_SIZE);
> +	start = roundup(start, KEXEC_CRASH_MEM_ALIGN);
> +	end = roundup(start + new_size, KEXEC_CRASH_MEM_ALIGN);
>  
> +	crash_map_pages(1);
>  	crash_free_reserved_phys_range(end, crashk_res.end);
>  
>  	if ((start == end) && (crashk_res.parent != NULL))
>  		release_resource(&crashk_res);
>  	crashk_res.end = end - 1;
> +	crash_map_pages(0);
>  
>  unlock:
>  	mutex_unlock(&kexec_mutex);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][patch 1/2] kdump: Add infrastructure for unmapping crashkernel memory
@ 2011-09-09 19:30     ` Vivek Goyal
  0 siblings, 0 replies; 14+ messages in thread
From: Vivek Goyal @ 2011-09-09 19:30 UTC (permalink / raw)
  To: Michael Holzheu, Huang Ying
  Cc: linux-s390, mahesh, heiko.carstens, linux-kernel, ebiederm,
	schwidefsky, kexec

On Thu, Sep 08, 2011 at 03:26:10PM +0200, Michael Holzheu wrote:
> From: Michael Holzheu <holzheu@linux.vnet.ibm.com>
> 
> This patch introduces a mechanism that allows architecture backends to
> remove page tables for the crashkernel memory. This can protect the loaded
> kdump kernel from being overwritten by broken kernel code.
> A new function crash_map_pages() is added that can be implemented by
> architecture code. This function has the following syntax:
> 
> void crash_map_pages(int enable);

CCing Huang Ying. I am not sure if preserve context thing will be impacted
by this in anyway.

Hyuang,

While I am looking at x86 code, I had a question. gitblame tells you
changed that code last, so here I go.

What is init_transition_pgtable(), and why do we need it? I see that
init_pgtable() sets up identity-mapped page tables from 0 to max_pfn.
Code running in the control page (identity_mapped onwards) will make use
of those identity-mapped page tables. Then I see that
init_transition_pgtable() goes ahead and seems to modify the
identity-mapped page tables to map the address of relocate_kernel to the
physical address of the control code. Why do we have to do that?

Thanks
Vivek

> 
> "enable" can be 0 for removing or 1 for adding page tables.  The function is
> called before and after the crashkernel segments are loaded. It is also
> called in crash_shrink_memory() to create new page tables when the
> crashkernel memory size is reduced.
> 
> To support architectures that have large pages this patch also introduces
> a new define KEXEC_CRASH_MEM_ALIGN. The crashkernel start and size must 
> always be aligned with KEXEC_CRASH_MEM_ALIGN.
> 
> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
> ---
>  include/linux/kexec.h |    5 +++++
>  kernel/kexec.c        |   16 ++++++++++++++--
>  2 files changed, 19 insertions(+), 2 deletions(-)
> 
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -37,6 +37,10 @@
>  #define KEXEC_CRASH_CONTROL_MEMORY_LIMIT KEXEC_CONTROL_MEMORY_LIMIT
>  #endif
>  
> +#ifndef KEXEC_CRASH_MEM_ALIGN
> +#define KEXEC_CRASH_MEM_ALIGN PAGE_SIZE
> +#endif
> +
>  #define KEXEC_NOTE_HEAD_BYTES ALIGN(sizeof(struct elf_note), 4)
>  #define KEXEC_CORE_NOTE_NAME "CORE"
>  #define KEXEC_CORE_NOTE_NAME_BYTES ALIGN(sizeof(KEXEC_CORE_NOTE_NAME), 4)
> @@ -133,6 +137,7 @@ extern void crash_kexec(struct pt_regs *
>  int kexec_should_crash(struct task_struct *);
>  void crash_save_cpu(struct pt_regs *regs, int cpu);
>  void crash_save_vmcoreinfo(void);
> +void crash_map_pages(int enable);
>  void arch_crash_save_vmcoreinfo(void);
>  void vmcoreinfo_append_str(const char *fmt, ...)
>  	__attribute__ ((format (printf, 1, 2)));
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -999,6 +999,7 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon
>  			kimage_free(xchg(&kexec_crash_image, NULL));
>  			result = kimage_crash_alloc(&image, entry,
>  						     nr_segments, segments);
> +			crash_map_pages(1);
>  		}
>  		if (result)
>  			goto out;
> @@ -1015,6 +1016,8 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon
>  				goto out;
>  		}
>  		kimage_terminate(image);
> +		if (flags & KEXEC_ON_CRASH)
> +			crash_map_pages(0);
>  	}
>  	/* Install the new kernel, and  Uninstall the old */
>  	image = xchg(dest_image, image);
> @@ -1026,6 +1029,13 @@ out:
>  	return result;
>  }
>  
> +/*
> + * provide an empty default implementation here -- architecture
> + * code may override this
> + */
> +void __weak crash_map_pages(int enable)
> +{}
> +
>  #ifdef CONFIG_COMPAT
>  asmlinkage long compat_sys_kexec_load(unsigned long entry,
>  				unsigned long nr_segments,
> @@ -1134,14 +1144,16 @@ int crash_shrink_memory(unsigned long ne
>  		goto unlock;
>  	}
>  
> -	start = roundup(start, PAGE_SIZE);
> -	end = roundup(start + new_size, PAGE_SIZE);
> +	start = roundup(start, KEXEC_CRASH_MEM_ALIGN);
> +	end = roundup(start + new_size, KEXEC_CRASH_MEM_ALIGN);
>  
> +	crash_map_pages(1);
>  	crash_free_reserved_phys_range(end, crashk_res.end);
>  
>  	if ((start == end) && (crashk_res.parent != NULL))
>  		release_resource(&crashk_res);
>  	crashk_res.end = end - 1;
> +	crash_map_pages(0);
>  
>  unlock:
>  	mutex_unlock(&kexec_mutex);

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


* Re: [RFC][patch 1/2] kdump: Add infrastructure for unmapping crashkernel memory
  2011-09-09 18:23     ` Vivek Goyal
@ 2011-09-12 15:55       ` Michael Holzheu
  -1 siblings, 0 replies; 14+ messages in thread
From: Michael Holzheu @ 2011-09-12 15:55 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: ebiederm, mahesh, schwidefsky, heiko.carstens, kexec,
	linux-kernel, linux-s390

Hello Vivek,

On Fri, 2011-09-09 at 14:23 -0400, Vivek Goyal wrote:
> On Thu, Sep 08, 2011 at 03:26:10PM +0200, Michael Holzheu wrote:
> > From: Michael Holzheu <holzheu@linux.vnet.ibm.com>
> > 
> > This patch introduces a mechanism that allows architecture backends to
> > remove page tables for the crashkernel memory. This can protect the loaded
> > kdump kernel from being overwritten by broken kernel code.
> > A new function crash_map_pages() is added that can be implemented by
> > architecture code. This function has the following syntax:
> 
> I guess having separate functions for mapping and unmapping pages might
> look cleaner. Because we are not passing a page range, so specifying
> what pages we are talking about in function name might make it more
> clear.
> 
> crash_map_reserved_pages()
> crash_unmap_reserved_pages().

Ok fine, no problem.

> Secondly, what happens to the code which runs after crash (crash_kexec()).
> Current x86 code assumes that reserved region is mapped at the time of
> crash and does few things with control page there. 

For s390, purgatory code can run in real mode. No page tables are
required.

> So this generic approach is not valid, at least for x86, because it does
> not tackle the scenario of how to map the reserved range again once the
> kernel crashes. It will only work under the assumption that after
> a crash, we don't expect the reserved range/pages to be mapped.

All architectures that support unmapping of crashkernel memory have to
deal with this problem somehow: either remap the crashkernel memory again
in machine_kexec(), or be able to run in real mode.

I adjusted the patch according to your comments above. Is the following
patch OK for you?
---
Subject: kdump: Add infrastructure for unmapping crashkernel memory

From: Michael Holzheu <holzheu@linux.vnet.ibm.com>

This patch introduces a mechanism that allows architecture backends to
remove page tables for the crashkernel memory. This can protect the loaded
kdump kernel from being overwritten by broken kernel code.  Two new
functions crash_map_reserved_pages() and crash_unmap_reserved_pages() are
added that can be implemented by architecture code.  The
crash_map_reserved_pages() function is called before and
crash_unmap_reserved_pages() after the crashkernel segments are loaded.  The
functions are also called in crash_shrink_memory() to create/remove page
tables when the crashkernel memory size is reduced.

To support architectures that have large pages this patch also introduces
a new define KEXEC_CRASH_MEM_ALIGN. The crashkernel start and size must 
always be aligned with KEXEC_CRASH_MEM_ALIGN.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 include/linux/kexec.h |    6 ++++++
 kernel/kexec.c        |   21 +++++++++++++++++++--
 2 files changed, 25 insertions(+), 2 deletions(-)

--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -37,6 +37,10 @@
 #define KEXEC_CRASH_CONTROL_MEMORY_LIMIT KEXEC_CONTROL_MEMORY_LIMIT
 #endif
 
+#ifndef KEXEC_CRASH_MEM_ALIGN
+#define KEXEC_CRASH_MEM_ALIGN PAGE_SIZE
+#endif
+
 #define KEXEC_NOTE_HEAD_BYTES ALIGN(sizeof(struct elf_note), 4)
 #define KEXEC_CORE_NOTE_NAME "CORE"
 #define KEXEC_CORE_NOTE_NAME_BYTES ALIGN(sizeof(KEXEC_CORE_NOTE_NAME), 4)
@@ -133,6 +137,8 @@ extern void crash_kexec(struct pt_regs *
 int kexec_should_crash(struct task_struct *);
 void crash_save_cpu(struct pt_regs *regs, int cpu);
 void crash_save_vmcoreinfo(void);
+void crash_map_reserved_pages(void);
+void crash_unmap_reserved_pages(void);
 void arch_crash_save_vmcoreinfo(void);
 void vmcoreinfo_append_str(const char *fmt, ...)
 	__attribute__ ((format (printf, 1, 2)));
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -999,6 +999,7 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon
 			kimage_free(xchg(&kexec_crash_image, NULL));
 			result = kimage_crash_alloc(&image, entry,
 						     nr_segments, segments);
+			crash_map_reserved_pages();
 		}
 		if (result)
 			goto out;
@@ -1015,6 +1016,8 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon
 				goto out;
 		}
 		kimage_terminate(image);
+		if (flags & KEXEC_ON_CRASH)
+			crash_unmap_reserved_pages();
 	}
 	/* Install the new kernel, and  Uninstall the old */
 	image = xchg(dest_image, image);
@@ -1026,6 +1029,18 @@ out:
 	return result;
 }
 
+/*
+ * Add and remove page tables for crashkernel memory
+ *
+ * Provide an empty default implementation here -- architecture
+ * code may override this
+ */
+void __weak crash_map_reserved_pages(void)
+{}
+
+void __weak crash_unmap_reserved_pages(void)
+{}
+
 #ifdef CONFIG_COMPAT
 asmlinkage long compat_sys_kexec_load(unsigned long entry,
 				unsigned long nr_segments,
@@ -1134,14 +1149,16 @@ int crash_shrink_memory(unsigned long ne
 		goto unlock;
 	}
 
-	start = roundup(start, PAGE_SIZE);
-	end = roundup(start + new_size, PAGE_SIZE);
+	start = roundup(start, KEXEC_CRASH_MEM_ALIGN);
+	end = roundup(start + new_size, KEXEC_CRASH_MEM_ALIGN);
 
+	crash_map_reserved_pages();
 	crash_free_reserved_phys_range(end, crashk_res.end);
 
 	if ((start == end) && (crashk_res.parent != NULL))
 		release_resource(&crashk_res);
 	crashk_res.end = end - 1;
+	crash_unmap_reserved_pages();
 
 unlock:
 	mutex_unlock(&kexec_mutex);




* Re: [RFC][patch 1/2] kdump: Add infrastructure for unmapping crashkernel memory
  2011-09-12 15:55       ` Michael Holzheu
@ 2011-09-13 13:11         ` Vivek Goyal
  -1 siblings, 0 replies; 14+ messages in thread
From: Vivek Goyal @ 2011-09-13 13:11 UTC (permalink / raw)
  To: Michael Holzheu
  Cc: ebiederm, mahesh, schwidefsky, heiko.carstens, kexec,
	linux-kernel, linux-s390

On Mon, Sep 12, 2011 at 05:55:02PM +0200, Michael Holzheu wrote:
> Hello Vivek,
> 
> On Fri, 2011-09-09 at 14:23 -0400, Vivek Goyal wrote:
> > On Thu, Sep 08, 2011 at 03:26:10PM +0200, Michael Holzheu wrote:
> > > From: Michael Holzheu <holzheu@linux.vnet.ibm.com>
> > > 
> > > This patch introduces a mechanism that allows architecture backends to
> > > remove page tables for the crashkernel memory. This can protect the loaded
> > > kdump kernel from being overwritten by broken kernel code.
> > > A new function crash_map_pages() is added that can be implemented by
> > > architecture code. This function has the following syntax:
> > 
> > I guess having separate functions for mapping and unmapping pages might
> > look cleaner. Because we are not passing a page range, so specifying
> > what pages we are talking about in function name might make it more
> > clear.
> > 
> > crash_map_reserved_pages()
> > crash_unmap_reserved_pages().
> 
> Ok fine, no problem.
> 
> > Secondly, what happens to the code which runs after crash (crash_kexec()).
> > Current x86 code assumes that reserved region is mapped at the time of
> > crash and does few things with control page there. 
> 
> For s390, purgatory code can run in real mode. No page tables are
> required.
> 
> > So this generic approach is not valid, at least for x86, because it does
> > not tackle the scenario of how to map the reserved range again once the
> > kernel crashes. It will only work under the assumption that after
> > a crash, we don't expect the reserved range/pages to be mapped.
> 
> All architectures that support unmapping of crashkernel memory have to
> deal with this problem somehow: either remap the crashkernel memory again
> in machine_kexec(), or be able to run in real mode.
> 
> I adjusted the patch according to your comments above. Is the following
> patch OK for you?

I guess it is fine. At least conceptually it makes sense to unmap
the reserved memory pages to avoid accidental corruption by broken
kernel code. It does not protect against DMA happening to this
memory location, though.

I am not a memory management expert, so I am not sure whether this is
the best way to map/unmap the pages, or whether calls already exist
that can do the job.

Anyway, this change is not visible to the user, and it can be reworked
later without impacting anything else, so it is a low-risk change in
that respect.

So yes, please repost the series (as you need to change the second patch
as well to reflect the new function names). I will ack the first patch.

Thanks
Vivek 


> ---
> Subject: kdump: Add infrastructure for unmapping crashkernel memory
> 
> From: Michael Holzheu <holzheu@linux.vnet.ibm.com>
> 
> This patch introduces a mechanism that allows architecture backends to
> remove page tables for the crashkernel memory. This can protect the loaded
> kdump kernel from being overwritten by broken kernel code.  Two new
> functions crash_map_reserved_pages() and crash_unmap_reserved_pages() are
> added that can be implemented by architecture code.  The
> crash_map_reserved_pages() function is called before and
> crash_unmap_reserved_pages() after the crashkernel segments are loaded.  The
> functions are also called in crash_shrink_memory() to create/remove page
> tables when the crashkernel memory size is reduced.
> 
> To support architectures that have large pages this patch also introduces
> a new define KEXEC_CRASH_MEM_ALIGN. The crashkernel start and size must 
> always be aligned with KEXEC_CRASH_MEM_ALIGN.
> 
> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
> ---
>  include/linux/kexec.h |    6 ++++++
>  kernel/kexec.c        |   21 +++++++++++++++++++--
>  2 files changed, 25 insertions(+), 2 deletions(-)
> 
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -37,6 +37,10 @@
>  #define KEXEC_CRASH_CONTROL_MEMORY_LIMIT KEXEC_CONTROL_MEMORY_LIMIT
>  #endif
>  
> +#ifndef KEXEC_CRASH_MEM_ALIGN
> +#define KEXEC_CRASH_MEM_ALIGN PAGE_SIZE
> +#endif
> +
>  #define KEXEC_NOTE_HEAD_BYTES ALIGN(sizeof(struct elf_note), 4)
>  #define KEXEC_CORE_NOTE_NAME "CORE"
>  #define KEXEC_CORE_NOTE_NAME_BYTES ALIGN(sizeof(KEXEC_CORE_NOTE_NAME), 4)
> @@ -133,6 +137,8 @@ extern void crash_kexec(struct pt_regs *
>  int kexec_should_crash(struct task_struct *);
>  void crash_save_cpu(struct pt_regs *regs, int cpu);
>  void crash_save_vmcoreinfo(void);
> +void crash_map_reserved_pages(void);
> +void crash_unmap_reserved_pages(void);
>  void arch_crash_save_vmcoreinfo(void);
>  void vmcoreinfo_append_str(const char *fmt, ...)
>  	__attribute__ ((format (printf, 1, 2)));
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -999,6 +999,7 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon
>  			kimage_free(xchg(&kexec_crash_image, NULL));
>  			result = kimage_crash_alloc(&image, entry,
>  						     nr_segments, segments);
> +			crash_map_reserved_pages();
>  		}
>  		if (result)
>  			goto out;
> @@ -1015,6 +1016,8 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon
>  				goto out;
>  		}
>  		kimage_terminate(image);
> +		if (flags & KEXEC_ON_CRASH)
> +			crash_unmap_reserved_pages();
>  	}
>  	/* Install the new kernel, and  Uninstall the old */
>  	image = xchg(dest_image, image);
> @@ -1026,6 +1029,18 @@ out:
>  	return result;
>  }
>  
> +/*
> + * Add and remove page tables for crashkernel memory
> + *
> + * Provide an empty default implementation here -- architecture
> + * code may override this
> + */
> +void __weak crash_map_reserved_pages(void)
> +{}
> +
> +void __weak crash_unmap_reserved_pages(void)
> +{}
> +
>  #ifdef CONFIG_COMPAT
>  asmlinkage long compat_sys_kexec_load(unsigned long entry,
>  				unsigned long nr_segments,
> @@ -1134,14 +1149,16 @@ int crash_shrink_memory(unsigned long ne
>  		goto unlock;
>  	}
>  
> -	start = roundup(start, PAGE_SIZE);
> -	end = roundup(start + new_size, PAGE_SIZE);
> +	start = roundup(start, KEXEC_CRASH_MEM_ALIGN);
> +	end = roundup(start + new_size, KEXEC_CRASH_MEM_ALIGN);
>  
> +	crash_map_reserved_pages();
>  	crash_free_reserved_phys_range(end, crashk_res.end);
>  
>  	if ((start == end) && (crashk_res.parent != NULL))
>  		release_resource(&crashk_res);
>  	crashk_res.end = end - 1;
> +	crash_unmap_reserved_pages();
>  
>  unlock:
>  	mutex_unlock(&kexec_mutex);
> 


end of thread, other threads:[~2011-09-13 13:11 UTC | newest]

Thread overview: 14+ messages
2011-09-08 13:26 [RFC][patch 0/2] kdump: Allow removal of page tables for crashkernel memory Michael Holzheu
2011-09-08 13:26 ` Michael Holzheu
2011-09-08 13:26 ` [RFC][patch 1/2] kdump: Add infrastructure for unmapping " Michael Holzheu
2011-09-08 13:26   ` Michael Holzheu
2011-09-09 18:23   ` Vivek Goyal
2011-09-09 18:23     ` Vivek Goyal
2011-09-12 15:55     ` Michael Holzheu
2011-09-12 15:55       ` Michael Holzheu
2011-09-13 13:11       ` Vivek Goyal
2011-09-13 13:11         ` Vivek Goyal
2011-09-09 19:30   ` Vivek Goyal
2011-09-09 19:30     ` Vivek Goyal
2011-09-08 13:26 ` [RFC][patch 2/2] s390: Add architecture code " Michael Holzheu
2011-09-08 13:26   ` Michael Holzheu
