From: Michael Holzheu <holzheu@linux.vnet.ibm.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: ebiederm@xmission.com, mahesh@linux.vnet.ibm.com,
	schwidefsky@de.ibm.com, heiko.carstens@de.ibm.com,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-s390@vger.kernel.org
Subject: Re: [RFC][patch 1/2] kdump: Add infrastructure for unmapping crashkernel memory
Date: Mon, 12 Sep 2011 17:55:02 +0200
Message-ID: <1315842902.3602.9.camel@br98xy6r>
In-Reply-To: <20110909182325.GD15748@redhat.com>

Hello Vivek,

On Fri, 2011-09-09 at 14:23 -0400, Vivek Goyal wrote:
> On Thu, Sep 08, 2011 at 03:26:10PM +0200, Michael Holzheu wrote:
> > From: Michael Holzheu <holzheu@linux.vnet.ibm.com>
> > 
> > This patch introduces a mechanism that allows architecture backends to
> > remove page tables for the crashkernel memory. This can protect the loaded
> > kdump kernel from being overwritten by broken kernel code.
> > A new function crash_map_pages() is added that can be implemented by
> > architecture code. This function has the following syntax:
> 
> I guess having separate functions for mapping and unmapping pages might
> look cleaner. Because we are not passing a page range, so specifying
> what pages we are talking about in function name might make it more
> clear.
> 
> crash_map_reserved_pages()
> crash_unmap_reserved_pages().

Ok fine, no problem.

> Secondly, what happens to the code which runs after crash (crash_kexec()).
> Current x86 code assumes that reserved region is mapped at the time of
> crash and does few things with control page there. 

On s390, the purgatory code can run in real mode, so no page tables are
required.

> So this generic approach is not valid atleast for x86, because it does
> not tackle the scenario about how to map reserved range again once 
> kernel crashes. It will only work if there is assumption that after
> a crash, we don't expect reserved range/pages to be mapped.

All architectures that support unmapping of crashkernel memory have to
deal with this problem somehow: either remap the crashkernel memory in
machine_kexec() again, or be able to run in real mode.

I adjusted the patch according to your comment above. Would the following
patch be OK for you?
---
Subject: kdump: Add infrastructure for unmapping crashkernel memory

From: Michael Holzheu <holzheu@linux.vnet.ibm.com>

This patch introduces a mechanism that allows architecture backends to
remove page tables for the crashkernel memory. This can protect the loaded
kdump kernel from being overwritten by broken kernel code.  Two new
functions crash_map_reserved_pages() and crash_unmap_reserved_pages() are
added that can be implemented by architecture code.
crash_map_reserved_pages() is called before the crashkernel segments are
loaded and crash_unmap_reserved_pages() is called afterwards.  Both
functions are also called in crash_shrink_memory() to create/remove page
tables when the crashkernel memory size is reduced.

To support architectures that use large pages, this patch also introduces
a new define, KEXEC_CRASH_MEM_ALIGN.  The crashkernel start and size must
always be aligned to KEXEC_CRASH_MEM_ALIGN.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 include/linux/kexec.h |    6 ++++++
 kernel/kexec.c        |   21 +++++++++++++++++++--
 2 files changed, 25 insertions(+), 2 deletions(-)

--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -37,6 +37,10 @@
 #define KEXEC_CRASH_CONTROL_MEMORY_LIMIT KEXEC_CONTROL_MEMORY_LIMIT
 #endif
 
+#ifndef KEXEC_CRASH_MEM_ALIGN
+#define KEXEC_CRASH_MEM_ALIGN PAGE_SIZE
+#endif
+
 #define KEXEC_NOTE_HEAD_BYTES ALIGN(sizeof(struct elf_note), 4)
 #define KEXEC_CORE_NOTE_NAME "CORE"
 #define KEXEC_CORE_NOTE_NAME_BYTES ALIGN(sizeof(KEXEC_CORE_NOTE_NAME), 4)
@@ -133,6 +137,8 @@ extern void crash_kexec(struct pt_regs *
 int kexec_should_crash(struct task_struct *);
 void crash_save_cpu(struct pt_regs *regs, int cpu);
 void crash_save_vmcoreinfo(void);
+void crash_map_reserved_pages(void);
+void crash_unmap_reserved_pages(void);
 void arch_crash_save_vmcoreinfo(void);
 void vmcoreinfo_append_str(const char *fmt, ...)
 	__attribute__ ((format (printf, 1, 2)));
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -999,6 +999,7 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon
 			kimage_free(xchg(&kexec_crash_image, NULL));
 			result = kimage_crash_alloc(&image, entry,
 						     nr_segments, segments);
+			crash_map_reserved_pages();
 		}
 		if (result)
 			goto out;
@@ -1015,6 +1016,8 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon
 				goto out;
 		}
 		kimage_terminate(image);
+		if (flags & KEXEC_ON_CRASH)
+			crash_unmap_reserved_pages();
 	}
 	/* Install the new kernel, and  Uninstall the old */
 	image = xchg(dest_image, image);
@@ -1026,6 +1029,18 @@ out:
 	return result;
 }
 
+/*
+ * Add and remove page tables for crashkernel memory
+ *
+ * Provide an empty default implementation here -- architecture
+ * code may override this
+ */
+void __weak crash_map_reserved_pages(void)
+{}
+
+void __weak crash_unmap_reserved_pages(void)
+{}
+
 #ifdef CONFIG_COMPAT
 asmlinkage long compat_sys_kexec_load(unsigned long entry,
 				unsigned long nr_segments,
@@ -1134,14 +1149,16 @@ int crash_shrink_memory(unsigned long ne
 		goto unlock;
 	}
 
-	start = roundup(start, PAGE_SIZE);
-	end = roundup(start + new_size, PAGE_SIZE);
+	start = roundup(start, KEXEC_CRASH_MEM_ALIGN);
+	end = roundup(start + new_size, KEXEC_CRASH_MEM_ALIGN);
 
+	crash_map_reserved_pages();
 	crash_free_reserved_phys_range(end, crashk_res.end);
 
 	if ((start == end) && (crashk_res.parent != NULL))
 		release_resource(&crashk_res);
 	crashk_res.end = end - 1;
+	crash_unmap_reserved_pages();
 
 unlock:
 	mutex_unlock(&kexec_mutex);
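
As a side note on KEXEC_CRASH_MEM_ALIGN: an architecture with large pages
only needs to override the default in its asm/kexec.h. A sketch, not
necessarily what patch 2/2 does:

/* arch/<arch>/include/asm/kexec.h */
#define KEXEC_CRASH_MEM_ALIGN	HPAGE_SIZE	/* large-page size */

With that in place, crash_shrink_memory() rounds the new end of the
crashkernel region up to a large-page boundary, so the remaining area can
still be mapped with large pages.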



Thread overview:
2011-09-08 13:26 [RFC][patch 0/2] kdump: Allow removal of page tables for crashkernel memory Michael Holzheu
2011-09-08 13:26 ` [RFC][patch 1/2] kdump: Add infrastructure for unmapping " Michael Holzheu
2011-09-09 18:23   ` Vivek Goyal
2011-09-12 15:55     ` Michael Holzheu [this message]
2011-09-13 13:11       ` Vivek Goyal
2011-09-09 19:30   ` Vivek Goyal
2011-09-08 13:26 ` [RFC][patch 2/2] s390: Add architecture code " Michael Holzheu