* [PATCH v3 00/11] xen: Initial kexec/kdump implementation
@ 2012-12-27  2:18 ` Daniel Kiper
From: Daniel Kiper @ 2012-12-27  2:18 UTC (permalink / raw)
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel


Hi,

This is v3 of the patch set containing the initial kexec/kdump implementation for Xen.
Currently only dom0 is supported; however, almost all of the infrastructure
required for domU support is ready.

Jan Beulich suggested merging the Xen x86 assembler code with the bare-metal x86 code.
This could simplify the kernel code and reduce its size somewhat. However, this solution
requires some changes to the bare-metal x86 code. First of all, the code which establishes
the transition page table should be moved back from machine_kexec_$(BITS).c to
relocate_kernel_$(BITS).S. Second, the format of the page_list array would have to change,
because the Xen kexec hypercall requires physical addresses to alternate with virtual ones.
These and other required changes have not been done in this version because I am not sure
that the solution will be accepted by the kexec/kdump maintainers. I hope that this email
sparks a discussion about that topic.
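To make the page_list difference concrete, here is a minimal sketch of an array in which every page contributes a physical address immediately followed by its virtual address. The page names (borrowed from the PGD/PUD/PMD/PTE transition page table pages mentioned in this series), their order, and the PL_* macros are assumptions for illustration only; they are not taken from the Xen hypercall ABI.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical page_list layout for this sketch: slot 2*i holds the
 * physical address of page i, slot 2*i + 1 holds its virtual address.
 * The page names and their order are assumptions, not the Xen ABI.
 */
enum { PL_CONTROL_PAGE, PL_PGD, PL_PUD, PL_PMD, PL_PTE, PL_NR_PAGES };

#define PL_PA(page) (2 * (page))	/* slot holding the physical address */
#define PL_VA(page) (2 * (page) + 1)	/* slot holding the virtual address */

struct pl_page {
	unsigned long phys;
	unsigned long virt;
};

/* Fill page_list[] (2 * nr_pages slots) with alternating PA/VA pairs. */
static void fill_page_list(unsigned long *page_list,
			   const struct pl_page *pages, size_t nr_pages)
{
	size_t i;

	assert(nr_pages <= PL_NR_PAGES);
	for (i = 0; i < nr_pages; i++) {
		page_list[PL_PA(i)] = pages[i].phys;
		page_list[PL_VA(i)] = pages[i].virt;
	}
}
```

The bare-metal code, by contrast, keeps physical and virtual entries at fixed, non-interleaved indices, which is why sharing relocate_kernel_$(BITS).S would require changing one format or the other.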

Daniel

 arch/x86/Kconfig                     |    3 +
 arch/x86/include/asm/kexec.h         |   10 +-
 arch/x86/include/asm/xen/hypercall.h |    6 +
 arch/x86/include/asm/xen/kexec.h     |   79 ++++
 arch/x86/kernel/machine_kexec_64.c   |   12 +-
 arch/x86/kernel/vmlinux.lds.S        |    7 +-
 arch/x86/xen/Kconfig                 |    1 +
 arch/x86/xen/Makefile                |    3 +
 arch/x86/xen/enlighten.c             |   11 +
 arch/x86/xen/kexec.c                 |  150 +++++++
 arch/x86/xen/machine_kexec_32.c      |  226 +++++++++++
 arch/x86/xen/machine_kexec_64.c      |  318 +++++++++++++++
 arch/x86/xen/relocate_kernel_32.S    |  323 +++++++++++++++
 arch/x86/xen/relocate_kernel_64.S    |  309 ++++++++++++++
 drivers/xen/sys-hypervisor.c         |   42 ++-
 include/linux/kexec.h                |   26 ++-
 include/xen/interface/xen.h          |   33 ++
 kernel/Makefile                      |    1 +
 kernel/kexec-firmware.c              |  743 ++++++++++++++++++++++++++++++++++
 kernel/kexec.c                       |   46 ++-
 20 files changed, 2331 insertions(+), 18 deletions(-)

Daniel Kiper (11):
      kexec: introduce kexec firmware support
      x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE
      xen: Introduce architecture independent data for kexec/kdump
      x86/xen: Introduce architecture dependent data for kexec/kdump
      x86/xen: Register resources required by kexec-tools
      x86/xen: Add i386 kexec/kdump implementation
      x86/xen: Add x86_64 kexec/kdump implementation
      x86/xen: Add kexec/kdump Kconfig and makefile rules
      x86/xen/enlighten: Add init and crash kexec/kdump hooks
      drivers/xen: Export vmcoreinfo through sysfs
      x86: Add Xen kexec control code size check to linker script

* [PATCH v3 01/11] kexec: introduce kexec firmware support
@ 2012-12-27  2:18   ` Daniel Kiper
From: Daniel Kiper @ 2012-12-27  2:18 UTC (permalink / raw)
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Some kexec/kdump implementations (e.g. Xen PVOPS) cannot use the default
Linux infrastructure and require support from firmware and/or a hypervisor.
To cope with that, this patch introduces a kexec firmware infrastructure.
It allows a developer to use all the kexec/kdump features of a given firmware
or hypervisor.
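A minimal sketch of the dispatch this enables, assuming the generic entry points in kernel/kexec.c test the kexec_use_firmware flag and hand off to the firmware_* variants declared in include/linux/kexec.h. native_kexec_load(), firmware_kexec_load() and kexec_load_dispatch() are hypothetical user-space stand-ins for sys_kexec_load() and firmware_sys_kexec_load(); their return values only identify which path ran.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the native (bare-metal) load path. */
static long native_kexec_load(unsigned long nr_segments)
{
	(void)nr_segments;
	return 1;
}

/* Hypothetical stand-in for firmware_sys_kexec_load(). */
static long firmware_kexec_load(unsigned long nr_segments)
{
	(void)nr_segments;
	return 2;
}

/* Assumed to be set once during early boot by platform code. */
static bool kexec_use_firmware;

/* Route a load request to the firmware/hypervisor path when requested. */
static long kexec_load_dispatch(unsigned long nr_segments)
{
	if (kexec_use_firmware)
		return firmware_kexec_load(nr_segments);
	return native_kexec_load(nr_segments);
}
```

A single global flag keeps the generic code unchanged on bare metal, at the cost of one branch per entry point.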

v3 - suggestions/fixes:
   - replace the kexec_ops struct with the kexec firmware infrastructure
     (suggested by Eric Biederman).

v2 - suggestions/fixes:
   - add a comment for the kexec_ops.crash_alloc_temp_store member
     (suggested by Konrad Rzeszutek Wilk),
   - simplify kexec_ops usage
     (suggested by Konrad Rzeszutek Wilk).

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 include/linux/kexec.h   |   26 ++-
 kernel/Makefile         |    1 +
 kernel/kexec-firmware.c |  743 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec.c          |   46 +++-
 4 files changed, 809 insertions(+), 7 deletions(-)
 create mode 100644 kernel/kexec-firmware.c
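
do_kimage_alloc() in kernel/kexec-firmware.c rejects a segment list unless every destination is page aligned and below KEXEC_DESTINATION_MEMORY_LIMIT, no two destinations overlap, and no buffer is larger than its memory size. Below is a user-space sketch of the same checks folded into a single pass; struct seg, segments_valid() and the SKETCH_* constants are hypothetical, and the limit is passed in rather than taken from the architecture headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Simplified stand-in for struct kexec_segment (no user buffer pointer). */
struct seg {
	unsigned long bufsz;	/* bytes supplied by user space */
	unsigned long mem;	/* destination physical address */
	unsigned long memsz;	/* bytes reserved at the destination */
};

/* Same checks as do_kimage_alloc(); kernel error codes noted in comments. */
static bool segments_valid(const struct seg *s, size_t n, unsigned long limit)
{
	size_t i, j;

	for (i = 0; i < n; i++) {
		unsigned long mstart = s[i].mem;
		unsigned long mend = mstart + s[i].memsz;

		if ((mstart & ~SKETCH_PAGE_MASK) || (mend & ~SKETCH_PAGE_MASK))
			return false;		/* -EADDRNOTAVAIL */
		if (mend >= limit)
			return false;		/* -EADDRNOTAVAIL */
		if (s[i].bufsz > s[i].memsz)
			return false;		/* -EINVAL */
		for (j = 0; j < i; j++) {
			unsigned long pstart = s[j].mem;
			unsigned long pend = pstart + s[j].memsz;

			/* Half-open ranges [mstart, mend) and [pstart, pend) */
			if (mend > pstart && mstart < pend)
				return false;	/* -EINVAL */
		}
	}
	return true;
}
```

Because the ranges are half-open, two segments that merely touch (one ends exactly where the next starts) are accepted.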

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index d0b8458..9568457 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -116,17 +116,34 @@ struct kimage {
 #endif
 };
 
-
-
 /* kexec interface functions */
 extern void machine_kexec(struct kimage *image);
 extern int machine_kexec_prepare(struct kimage *image);
 extern void machine_kexec_cleanup(struct kimage *image);
+extern struct page *mf_kexec_kimage_alloc_pages(gfp_t gfp_mask,
+						unsigned int order,
+						unsigned long limit);
+extern void mf_kexec_kimage_free_pages(struct page *page);
+extern unsigned long mf_kexec_page_to_pfn(struct page *page);
+extern struct page *mf_kexec_pfn_to_page(unsigned long mfn);
+extern unsigned long mf_kexec_virt_to_phys(volatile void *address);
+extern void *mf_kexec_phys_to_virt(unsigned long address);
+extern int mf_kexec_prepare(struct kimage *image);
+extern int mf_kexec_load(struct kimage *image);
+extern void mf_kexec_cleanup(struct kimage *image);
+extern void mf_kexec_unload(struct kimage *image);
+extern void mf_kexec_shutdown(void);
+extern void mf_kexec(struct kimage *image);
 extern asmlinkage long sys_kexec_load(unsigned long entry,
 					unsigned long nr_segments,
 					struct kexec_segment __user *segments,
 					unsigned long flags);
+extern long firmware_sys_kexec_load(unsigned long entry,
+					unsigned long nr_segments,
+					struct kexec_segment __user *segments,
+					unsigned long flags);
 extern int kernel_kexec(void);
+extern int firmware_kernel_kexec(void);
 #ifdef CONFIG_COMPAT
 extern asmlinkage long compat_sys_kexec_load(unsigned long entry,
 				unsigned long nr_segments,
@@ -135,7 +152,10 @@ extern asmlinkage long compat_sys_kexec_load(unsigned long entry,
 #endif
 extern struct page *kimage_alloc_control_pages(struct kimage *image,
 						unsigned int order);
+extern struct page *firmware_kimage_alloc_control_pages(struct kimage *image,
+							unsigned int order);
 extern void crash_kexec(struct pt_regs *);
+extern void firmware_crash_kexec(struct pt_regs *);
 int kexec_should_crash(struct task_struct *);
 void crash_save_cpu(struct pt_regs *regs, int cpu);
 void crash_save_vmcoreinfo(void);
@@ -168,6 +188,8 @@ unsigned long paddr_vmcoreinfo_note(void);
 #define VMCOREINFO_CONFIG(name) \
 	vmcoreinfo_append_str("CONFIG_%s=y\n", #name)
 
+extern bool kexec_use_firmware;
+
 extern struct kimage *kexec_image;
 extern struct kimage *kexec_crash_image;
 
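The firmware path routes every frame and address conversion through the mf_kexec_* hooks declared above instead of calling page_to_pfn() or virt_to_phys() directly. One plausible reason, suggested by the mfn parameter name of mf_kexec_pfn_to_page() but not stated in the patch, is that a Xen PV guest's pseudo-physical frame numbers differ from the machine frame numbers the hypervisor works with. The sketch below shows such a translation with a toy pseudo-physical-to-machine (p2m) table; the table contents and the helper names are hypothetical.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12

/*
 * Toy p2m table: the index is the guest frame number (pfn), the value
 * is the machine frame number (mfn).  Contents are made up.
 */
static const unsigned long p2m[] = { 0x800, 0x3a1, 0x17f, 0x902 };
#define NR_PFNS (sizeof(p2m) / sizeof(p2m[0]))

/* Translate a guest frame number to the machine frame backing it. */
static unsigned long pfn_to_mfn(unsigned long pfn)
{
	assert(pfn < NR_PFNS);
	return p2m[pfn];
}

/* Machine address of a guest physical address (frame plus page offset). */
static unsigned long phys_to_machine(unsigned long paddr)
{
	unsigned long offset = paddr & ((1UL << SKETCH_PAGE_SHIFT) - 1);

	return (pfn_to_mfn(paddr >> SKETCH_PAGE_SHIFT) << SKETCH_PAGE_SHIFT) | offset;
}
```

Contiguous guest frames need not be contiguous in machine memory, so any check on "destination ranges" has to be done in the same address space the hypervisor will use.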
diff --git a/kernel/Makefile b/kernel/Makefile
index 6c072b6..bc96b2f 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -58,6 +58,7 @@ obj-$(CONFIG_MODULE_SIG) += module_signing.o modsign_pubkey.o modsign_certificat
 obj-$(CONFIG_KALLSYMS) += kallsyms.o
 obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o
 obj-$(CONFIG_KEXEC) += kexec.o
+obj-$(CONFIG_KEXEC_FIRMWARE) += kexec-firmware.o
 obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
 obj-$(CONFIG_COMPAT) += compat.o
 obj-$(CONFIG_CGROUPS) += cgroup.o
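
kimage_load_normal_segment() in kernel/kexec-firmware.c copies each segment one destination page at a time: mchunk is the room left in the current page (capped at the remaining memsz), and uchunk is the part of it backed by user data (capped at the remaining bufsz); the rest of the page stays zeroed. A user-space sketch of that arithmetic, with a plain array standing in for the pages kimage_alloc_page() would return and memcpy() standing in for copy_from_user(); load_segment() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL

/*
 * Copy bufsz bytes into consecutive pages starting at offset
 * (maddr % page size) in the first page, zero-filling up to memsz.
 * Returns the number of pages touched.
 */
static size_t load_segment(unsigned char (*pages)[SKETCH_PAGE_SIZE],
			   unsigned long maddr, const unsigned char *buf,
			   size_t bufsz, size_t memsz)
{
	size_t ubytes = bufsz, mbytes = memsz, npages = 0;

	while (mbytes) {
		unsigned char *ptr = pages[npages++];
		size_t offset = maddr & (SKETCH_PAGE_SIZE - 1);
		size_t mchunk, uchunk;

		memset(ptr, 0, SKETCH_PAGE_SIZE);	/* start with a clear page */
		mchunk = SKETCH_PAGE_SIZE - offset;	/* room left in this page */
		if (mchunk > mbytes)
			mchunk = mbytes;
		uchunk = mchunk < ubytes ? mchunk : ubytes; /* real data */
		if (uchunk)
			memcpy(ptr + offset, buf, uchunk);

		ubytes -= uchunk;
		maddr += mchunk;
		buf += uchunk;
		mbytes -= mchunk;
	}
	return npages;
}
```

Since do_kimage_alloc() already requires page-aligned destinations, offset is zero in practice; the general form simply mirrors the kernel loop.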
diff --git a/kernel/kexec-firmware.c b/kernel/kexec-firmware.c
new file mode 100644
index 0000000..f6ddd4c
--- /dev/null
+++ b/kernel/kexec-firmware.c
@@ -0,0 +1,743 @@
+/*
+ * Copyright (C) 2002-2004 Eric Biederman  <ebiederm@xmission.com>
+ * Copyright (C) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * Most of the code here is a copy of kernel/kexec.c.
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
+ */
+
+#include <linux/atomic.h>
+#include <linux/errno.h>
+#include <linux/highmem.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/reboot.h>
+#include <linux/slab.h>
+
+#include <asm/uaccess.h>
+
+/*
+ * KIMAGE_NO_DEST is an impossible destination address..., for
+ * allocating pages whose destination address we do not care about.
+ */
+#define KIMAGE_NO_DEST (-1UL)
+
+static int kimage_is_destination_range(struct kimage *image,
+				       unsigned long start, unsigned long end);
+static struct page *kimage_alloc_page(struct kimage *image,
+				       gfp_t gfp_mask,
+				       unsigned long dest);
+
+static int do_kimage_alloc(struct kimage **rimage, unsigned long entry,
+	                    unsigned long nr_segments,
+                            struct kexec_segment __user *segments)
+{
+	size_t segment_bytes;
+	struct kimage *image;
+	unsigned long i;
+	int result;
+
+	/* Allocate a controlling structure */
+	result = -ENOMEM;
+	image = kzalloc(sizeof(*image), GFP_KERNEL);
+	if (!image)
+		goto out;
+
+	image->head = 0;
+	image->entry = &image->head;
+	image->last_entry = &image->head;
+	image->control_page = ~0; /* By default this does not apply */
+	image->start = entry;
+	image->type = KEXEC_TYPE_DEFAULT;
+
+	/* Initialize the list of control pages */
+	INIT_LIST_HEAD(&image->control_pages);
+
+	/* Initialize the list of destination pages */
+	INIT_LIST_HEAD(&image->dest_pages);
+
+	/* Initialize the list of unusable pages */
+	INIT_LIST_HEAD(&image->unuseable_pages);
+
+	/* Read in the segments */
+	image->nr_segments = nr_segments;
+	segment_bytes = nr_segments * sizeof(*segments);
+	result = copy_from_user(image->segment, segments, segment_bytes);
+	if (result) {
+		result = -EFAULT;
+		goto out;
+	}
+
+	/*
+	 * Verify we have good destination addresses.  The caller is
+	 * responsible for making certain we don't attempt to load
+	 * the new image into invalid or reserved areas of RAM.  This
+	 * just verifies it is an address we can use.
+	 *
+	 * Since the kernel does everything in page size chunks ensure
+	 * the destination addresses are page aligned.  Too many
+	 * special cases crop of when we don't do this.  The most
+	 * insidious is getting overlapping destination addresses
+	 * simply because addresses are changed to page size
+	 * granularity.
+	 */
+	result = -EADDRNOTAVAIL;
+	for (i = 0; i < nr_segments; i++) {
+		unsigned long mstart, mend;
+
+		mstart = image->segment[i].mem;
+		mend   = mstart + image->segment[i].memsz;
+		if ((mstart & ~PAGE_MASK) || (mend & ~PAGE_MASK))
+			goto out;
+		if (mend >= KEXEC_DESTINATION_MEMORY_LIMIT)
+			goto out;
+	}
+
+	/* Verify our destination addresses do not overlap.
+	 * If we allowed overlapping destination addresses
+	 * though, very weird things can happen with no
+	 * easy explanation as one segment stops on another.
+	 */
+	result = -EINVAL;
+	for (i = 0; i < nr_segments; i++) {
+		unsigned long mstart, mend;
+		unsigned long j;
+
+		mstart = image->segment[i].mem;
+		mend   = mstart + image->segment[i].memsz;
+		for (j = 0; j < i; j++) {
+			unsigned long pstart, pend;
+			pstart = image->segment[j].mem;
+			pend   = pstart + image->segment[j].memsz;
+			/* Do the segments overlap ? */
+			if ((mend > pstart) && (mstart < pend))
+				goto out;
+		}
+	}
+
+	/* Ensure our buffer sizes are no larger than
+	 * our memory sizes.  This should always be the case,
+	 * and it is easier to check up front than to be surprised
+	 * later on.
+	 */
+	result = -EINVAL;
+	for (i = 0; i < nr_segments; i++) {
+		if (image->segment[i].bufsz > image->segment[i].memsz)
+			goto out;
+	}
+
+	result = 0;
+out:
+	if (result == 0)
+		*rimage = image;
+	else
+		kfree(image);
+
+	return result;
+
+}
+
+static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
+				unsigned long nr_segments,
+				struct kexec_segment __user *segments)
+{
+	int result;
+	struct kimage *image;
+
+	/* Allocate and initialize a controlling structure */
+	image = NULL;
+	result = do_kimage_alloc(&image, entry, nr_segments, segments);
+	if (result)
+		goto out;
+
+	/* *rimage is set only on success, at the out: label below. */
+
+	/*
+	 * Find a location for the control code buffer, and add it to
+	 * the vector of segments so that its pages will also be
+	 * counted as destination pages.
+	 */
+	result = -ENOMEM;
+	image->control_code_page = firmware_kimage_alloc_control_pages(image,
+					   get_order(KEXEC_CONTROL_PAGE_SIZE));
+	if (!image->control_code_page) {
+		printk(KERN_ERR "Could not allocate control_code_buffer\n");
+		goto out;
+	}
+
+	image->swap_page = firmware_kimage_alloc_control_pages(image, 0);
+	if (!image->swap_page) {
+		printk(KERN_ERR "Could not allocate swap buffer\n");
+		goto out;
+	}
+
+	result = 0;
+ out:
+	if (result == 0)
+		*rimage = image;
+	else
+		kfree(image);
+
+	return result;
+}
+
+static int kimage_crash_alloc(struct kimage **rimage, unsigned long entry,
+				unsigned long nr_segments,
+				struct kexec_segment __user *segments)
+{
+	int result;
+	struct kimage *image;
+	unsigned long i;
+
+	image = NULL;
+	/* Verify we have a valid entry point */
+	if ((entry < crashk_res.start) || (entry > crashk_res.end)) {
+		result = -EADDRNOTAVAIL;
+		goto out;
+	}
+
+	/* Allocate and initialize a controlling structure */
+	result = do_kimage_alloc(&image, entry, nr_segments, segments);
+	if (result)
+		goto out;
+
+	/* Enable the special crash kernel control page
+	 * allocation policy.
+	 */
+	image->control_page = crashk_res.start;
+	image->type = KEXEC_TYPE_CRASH;
+
+	/*
+	 * Verify we have good destination addresses.  Normally
+	 * the caller is responsible for making certain we don't
+	 * attempt to load the new image into invalid or reserved
+	 * areas of RAM.  But crash kernels are preloaded into a
+	 * reserved area of ram.  We must ensure the addresses
+	 * are in the reserved area otherwise preloading the
+	 * kernel could corrupt things.
+	 */
+	result = -EADDRNOTAVAIL;
+	for (i = 0; i < nr_segments; i++) {
+		unsigned long mstart, mend;
+
+		mstart = image->segment[i].mem;
+		mend = mstart + image->segment[i].memsz - 1;
+		/* Ensure we are within the crash kernel limits */
+		if ((mstart < crashk_res.start) || (mend > crashk_res.end))
+			goto out;
+	}
+
+	/*
+	 * Find a location for the control code buffer, and add it to
+	 * the vector of segments so that its pages will also be
+	 * counted as destination pages.
+	 */
+	result = -ENOMEM;
+	image->control_code_page = firmware_kimage_alloc_control_pages(image,
+					   get_order(KEXEC_CONTROL_PAGE_SIZE));
+	if (!image->control_code_page) {
+		printk(KERN_ERR "Could not allocate control_code_buffer\n");
+		goto out;
+	}
+
+	result = 0;
+out:
+	if (result == 0)
+		*rimage = image;
+	else
+		kfree(image);
+
+	return result;
+}
+
+static int kimage_is_destination_range(struct kimage *image,
+					unsigned long start,
+					unsigned long end)
+{
+	unsigned long i;
+
+	for (i = 0; i < image->nr_segments; i++) {
+		unsigned long mstart, mend;
+
+		mstart = image->segment[i].mem;
+		mend = mstart + image->segment[i].memsz;
+		if ((end > mstart) && (start < mend))
+			return 1;
+	}
+
+	return 0;
+}
+
+static void kimage_free_page_list(struct list_head *list)
+{
+	struct list_head *pos, *next;
+
+	list_for_each_safe(pos, next, list) {
+		struct page *page;
+
+		page = list_entry(pos, struct page, lru);
+		list_del(&page->lru);
+		mf_kexec_kimage_free_pages(page);
+	}
+}
+
+static struct page *kimage_alloc_normal_control_pages(struct kimage *image,
+							unsigned int order)
+{
+	/* Control pages are special, they are the intermediaries
+	 * that are needed while we copy the rest of the pages
+	 * to their final resting place.  As such they must
+	 * not conflict with either the destination addresses
+	 * or memory the kernel is already using.
+	 *
+	 * The only case where we really need more than one of
+	 * these are for architectures where we cannot disable
+	 * the MMU and must instead generate an identity mapped
+	 * page table for all of the memory.
+	 *
+	 * At worst this runs in O(N) of the image size.
+	 */
+	struct list_head extra_pages;
+	struct page *pages;
+	unsigned int count;
+
+	count = 1 << order;
+	INIT_LIST_HEAD(&extra_pages);
+
+	/* Loop while I can allocate a page and the page allocated
+	 * is a destination page.
+	 */
+	do {
+		unsigned long pfn, epfn, addr, eaddr;
+
+		pages = mf_kexec_kimage_alloc_pages(GFP_KERNEL, order,
+							KEXEC_CONTROL_MEMORY_LIMIT);
+		if (!pages)
+			break;
+		pfn   = mf_kexec_page_to_pfn(pages);
+		epfn  = pfn + count;
+		addr  = pfn << PAGE_SHIFT;
+		eaddr = epfn << PAGE_SHIFT;
+		if ((epfn >= (KEXEC_CONTROL_MEMORY_LIMIT >> PAGE_SHIFT)) ||
+			      kimage_is_destination_range(image, addr, eaddr)) {
+			list_add(&pages->lru, &extra_pages);
+			pages = NULL;
+		}
+	} while (!pages);
+
+	if (pages) {
+		/* Remember the allocated page... */
+		list_add(&pages->lru, &image->control_pages);
+
+		/* Because the page is already in its destination
+		 * location we will never allocate another page at
+		 * that address.  Therefore mf_kexec_kimage_alloc_pages
+		 * will not return it (again) and we don't need
+		 * to give it an entry in image->segment[].
+		 */
+	}
+	/* Deal with the destination pages I have inadvertently allocated.
+	 *
+	 * Ideally I would convert multi-page allocations into single
+	 * page allocations, and add everything to image->dest_pages.
+	 *
+	 * For now it is simpler to just free the pages.
+	 */
+	kimage_free_page_list(&extra_pages);
+
+	return pages;
+}
+
+struct page *firmware_kimage_alloc_control_pages(struct kimage *image,
+							unsigned int order)
+{
+	return kimage_alloc_normal_control_pages(image, order);
+}
+
+static int kimage_add_entry(struct kimage *image, kimage_entry_t entry)
+{
+	if (*image->entry != 0)
+		image->entry++;
+
+	if (image->entry == image->last_entry) {
+		kimage_entry_t *ind_page;
+		struct page *page;
+
+		page = kimage_alloc_page(image, GFP_KERNEL, KIMAGE_NO_DEST);
+		if (!page)
+			return -ENOMEM;
+
+		ind_page = page_address(page);
+		*image->entry = mf_kexec_virt_to_phys(ind_page) | IND_INDIRECTION;
+		image->entry = ind_page;
+		image->last_entry = ind_page +
+				      ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1);
+	}
+	*image->entry = entry;
+	image->entry++;
+	*image->entry = 0;
+
+	return 0;
+}
+
+static int kimage_set_destination(struct kimage *image,
+				   unsigned long destination)
+{
+	int result;
+
+	destination &= PAGE_MASK;
+	result = kimage_add_entry(image, destination | IND_DESTINATION);
+	if (result == 0)
+		image->destination = destination;
+
+	return result;
+}
+
+
+static int kimage_add_page(struct kimage *image, unsigned long page)
+{
+	int result;
+
+	page &= PAGE_MASK;
+	result = kimage_add_entry(image, page | IND_SOURCE);
+	if (result == 0)
+		image->destination += PAGE_SIZE;
+
+	return result;
+}
+
+
+static void kimage_free_extra_pages(struct kimage *image)
+{
+	/* Walk through and free any extra destination pages I may have */
+	kimage_free_page_list(&image->dest_pages);
+
+	/* Walk through and free any unusable pages I have cached */
+	kimage_free_page_list(&image->unuseable_pages);
+
+}
+static void kimage_terminate(struct kimage *image)
+{
+	if (*image->entry != 0)
+		image->entry++;
+
+	*image->entry = IND_DONE;
+}
+
+#define for_each_kimage_entry(image, ptr, entry) \
+	for (ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE); \
+		ptr = (entry & IND_INDIRECTION)? \
+			mf_kexec_phys_to_virt((entry & PAGE_MASK)): ptr +1)
+
+static void kimage_free_entry(kimage_entry_t entry)
+{
+	struct page *page;
+
+	page = mf_kexec_pfn_to_page(entry >> PAGE_SHIFT);
+	mf_kexec_kimage_free_pages(page);
+}
+
+static void kimage_free(struct kimage *image)
+{
+	kimage_entry_t *ptr, entry;
+	kimage_entry_t ind = 0;
+
+	if (!image)
+		return;
+
+	kimage_free_extra_pages(image);
+	for_each_kimage_entry(image, ptr, entry) {
+		if (entry & IND_INDIRECTION) {
+			/* Free the previous indirection page */
+			if (ind & IND_INDIRECTION)
+				kimage_free_entry(ind);
+			/* Save this indirection page until we are
+			 * done with it.
+			 */
+			ind = entry;
+		}
+		else if (entry & IND_SOURCE)
+			kimage_free_entry(entry);
+	}
+	/* Free the final indirection page */
+	if (ind & IND_INDIRECTION)
+		kimage_free_entry(ind);
+
+	/* Handle any machine specific cleanup */
+	mf_kexec_cleanup(image);
+
+	/* Free the kexec control pages... */
+	kimage_free_page_list(&image->control_pages);
+	kfree(image);
+}
+
+static kimage_entry_t *kimage_dst_used(struct kimage *image,
+					unsigned long page)
+{
+	kimage_entry_t *ptr, entry;
+	unsigned long destination = 0;
+
+	for_each_kimage_entry(image, ptr, entry) {
+		if (entry & IND_DESTINATION)
+			destination = entry & PAGE_MASK;
+		else if (entry & IND_SOURCE) {
+			if (page == destination)
+				return ptr;
+			destination += PAGE_SIZE;
+		}
+	}
+
+	return NULL;
+}
+
+static struct page *kimage_alloc_page(struct kimage *image,
+					gfp_t gfp_mask,
+					unsigned long destination)
+{
+	/*
+	 * Here we implement safeguards to ensure that a source page
+	 * is not copied to its destination page before the data on
+	 * the destination page is no longer useful.
+	 *
+	 * To do this we maintain the invariant that a source page is
+	 * either its own destination page, or it is not a
+	 * destination page at all.
+	 *
+	 * That is slightly stronger than required, but the proof
+	 * that no problems will occur is trivial, and the
+	 * implementation is simply to verify.
+	 *
+	 * When allocating all pages normally this algorithm will run
+	 * in O(N) time, but in the worst case it will run in O(N^2)
+	 * time.   If the runtime is a problem the data structures can
+	 * be fixed.
+	 */
+	struct page *page;
+	unsigned long addr;
+
+	/*
+	 * Walk through the list of destination pages, and see if I
+	 * have a match.
+	 */
+	list_for_each_entry(page, &image->dest_pages, lru) {
+		addr = mf_kexec_page_to_pfn(page) << PAGE_SHIFT;
+		if (addr == destination) {
+			list_del(&page->lru);
+			return page;
+		}
+	}
+	page = NULL;
+	while (1) {
+		kimage_entry_t *old;
+
+		/* Allocate a page, if we run out of memory give up */
+		page = mf_kexec_kimage_alloc_pages(gfp_mask, 0,
+							KEXEC_SOURCE_MEMORY_LIMIT);
+		if (!page)
+			return NULL;
+		/* If the page cannot be used file it away */
+		if (mf_kexec_page_to_pfn(page) >
+				(KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT)) {
+			list_add(&page->lru, &image->unuseable_pages);
+			continue;
+		}
+		addr = mf_kexec_page_to_pfn(page) << PAGE_SHIFT;
+
+		/* If it is the destination page we want use it */
+		if (addr == destination)
+			break;
+
+		/* If the page is not a destination page use it */
+		if (!kimage_is_destination_range(image, addr,
+						  addr + PAGE_SIZE))
+			break;
+
+		/*
+		 * I know that the page is someone's destination page.
+		 * See if there is already a source page for this
+		 * destination page.  And if so swap the source pages.
+		 */
+		old = kimage_dst_used(image, addr);
+		if (old) {
+			/* If so move it */
+			unsigned long old_addr;
+			struct page *old_page;
+
+			old_addr = *old & PAGE_MASK;
+			old_page = mf_kexec_pfn_to_page(old_addr >> PAGE_SHIFT);
+			copy_highpage(page, old_page);
+			*old = addr | (*old & ~PAGE_MASK);
+
+			/* The old page I have found cannot be a
+			 * destination page, so return it if its
+			 * gfp_flags honor the ones passed in.
+			 */
+			if (!(gfp_mask & __GFP_HIGHMEM) &&
+			    PageHighMem(old_page)) {
+				mf_kexec_kimage_free_pages(old_page);
+				continue;
+			}
+			addr = old_addr;
+			page = old_page;
+			break;
+		}
+		else {
+			/* Place the page on the destination list I
+			 * will use it later.
+			 */
+			list_add(&page->lru, &image->dest_pages);
+		}
+	}
+
+	return page;
+}
+
+static int kimage_load_normal_segment(struct kimage *image,
+					 struct kexec_segment *segment)
+{
+	unsigned long maddr;
+	unsigned long ubytes, mbytes;
+	int result;
+	unsigned char __user *buf;
+
+	result = 0;
+	buf = segment->buf;
+	ubytes = segment->bufsz;
+	mbytes = segment->memsz;
+	maddr = segment->mem;
+
+	result = kimage_set_destination(image, maddr);
+	if (result < 0)
+		goto out;
+
+	while (mbytes) {
+		struct page *page;
+		char *ptr;
+		size_t uchunk, mchunk;
+
+		page = kimage_alloc_page(image, GFP_HIGHUSER, maddr);
+		if (!page) {
+			result  = -ENOMEM;
+			goto out;
+		}
+		result = kimage_add_page(image, mf_kexec_page_to_pfn(page)
+								<< PAGE_SHIFT);
+		if (result < 0)
+			goto out;
+
+		ptr = kmap(page);
+		/* Start with a clear page */
+		clear_page(ptr);
+		ptr += maddr & ~PAGE_MASK;
+		mchunk = PAGE_SIZE - (maddr & ~PAGE_MASK);
+		if (mchunk > mbytes)
+			mchunk = mbytes;
+
+		uchunk = mchunk;
+		if (uchunk > ubytes)
+			uchunk = ubytes;
+
+		result = copy_from_user(ptr, buf, uchunk);
+		kunmap(page);
+		if (result) {
+			result = -EFAULT;
+			goto out;
+		}
+		ubytes -= uchunk;
+		maddr  += mchunk;
+		buf    += mchunk;
+		mbytes -= mchunk;
+	}
+out:
+	return result;
+}
+
+static int kimage_load_segment(struct kimage *image,
+				struct kexec_segment *segment)
+{
+	return kimage_load_normal_segment(image, segment);
+}
+
+long firmware_sys_kexec_load(unsigned long entry, unsigned long nr_segments,
+				struct kexec_segment __user *segments,
+				unsigned long flags)
+{
+	struct kimage **dest_image, *image = NULL;
+	int result = 0;
+
+	dest_image = &kexec_image;
+	if (flags & KEXEC_ON_CRASH)
+		dest_image = &kexec_crash_image;
+	if (nr_segments > 0) {
+		unsigned long i;
+
+		/* Loading another kernel to reboot into */
+		if ((flags & KEXEC_ON_CRASH) == 0)
+			result = kimage_normal_alloc(&image, entry,
+							nr_segments, segments);
+		/* Loading another kernel to switch to if this one crashes */
+		else if (flags & KEXEC_ON_CRASH) {
+			/* Free any current crash dump kernel before
+			 * we corrupt it.
+			 */
+			mf_kexec_unload(image);
+			kimage_free(xchg(&kexec_crash_image, NULL));
+			result = kimage_crash_alloc(&image, entry,
+						     nr_segments, segments);
+		}
+		if (result)
+			goto out;
+
+		if (flags & KEXEC_PRESERVE_CONTEXT)
+			image->preserve_context = 1;
+		result = mf_kexec_prepare(image);
+		if (result)
+			goto out;
+
+		for (i = 0; i < nr_segments; i++) {
+			result = kimage_load_segment(image, &image->segment[i]);
+			if (result)
+				goto out;
+		}
+		kimage_terminate(image);
+	}
+
+	result = mf_kexec_load(image);
+
+	if (result)
+		goto out;
+
+	/* Install the new kernel, and uninstall the old */
+	image = xchg(dest_image, image);
+
+out:
+	mf_kexec_unload(image);
+
+	kimage_free(image);
+
+	return result;
+}
+
+void firmware_crash_kexec(struct pt_regs *regs)
+{
+	struct pt_regs fixed_regs;
+
+	crash_setup_regs(&fixed_regs, regs);
+	crash_save_vmcoreinfo();
+	machine_crash_shutdown(&fixed_regs);
+	mf_kexec(kexec_crash_image);
+}
+
+int firmware_kernel_kexec(void)
+{
+	kernel_restart_prepare(NULL);
+	printk(KERN_EMERG "Starting new kernel\n");
+	mf_kexec_shutdown();
+	mf_kexec(kexec_image);
+
+	return 0;
+}
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 5e4bd78..9f3b6cb 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -38,6 +38,10 @@
 #include <asm/io.h>
 #include <asm/sections.h>
 
+#ifdef CONFIG_KEXEC_FIRMWARE
+bool kexec_use_firmware = false;
+#endif
+
 /* Per cpu memory for storing cpu states in case of system crash. */
 note_buf_t __percpu *crash_notes;
 
@@ -924,7 +928,7 @@ static int kimage_load_segment(struct kimage *image,
  *   the devices in a consistent state so a later kernel can
  *   reinitialize them.
  *
- * - A machine specific part that includes the syscall number
+ * - A machine/firmware specific part that includes the syscall number
  *   and the copies the image to it's final destination.  And
  *   jumps into the image at entry.
  *
@@ -978,6 +982,17 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
 	if (!mutex_trylock(&kexec_mutex))
 		return -EBUSY;
 
+#ifdef CONFIG_KEXEC_FIRMWARE
+	if (kexec_use_firmware) {
+		result = firmware_sys_kexec_load(entry, nr_segments,
+							segments, flags);
+
+		mutex_unlock(&kexec_mutex);
+
+		return result;
+	}
+#endif
+
 	dest_image = &kexec_image;
 	if (flags & KEXEC_ON_CRASH)
 		dest_image = &kexec_crash_image;
@@ -1091,10 +1106,17 @@ void crash_kexec(struct pt_regs *regs)
 		if (kexec_crash_image) {
 			struct pt_regs fixed_regs;
 
-			crash_setup_regs(&fixed_regs, regs);
-			crash_save_vmcoreinfo();
-			machine_crash_shutdown(&fixed_regs);
-			machine_kexec(kexec_crash_image);
+#ifdef CONFIG_KEXEC_FIRMWARE
+			if (kexec_use_firmware)
+				firmware_crash_kexec(regs);
+			else
+#endif
+			{
+				crash_setup_regs(&fixed_regs, regs);
+				crash_save_vmcoreinfo();
+				machine_crash_shutdown(&fixed_regs);
+				machine_kexec(kexec_crash_image);
+			}
 		}
 		mutex_unlock(&kexec_mutex);
 	}
@@ -1132,6 +1154,13 @@ int crash_shrink_memory(unsigned long new_size)
 
 	mutex_lock(&kexec_mutex);
 
+#ifdef CONFIG_KEXEC_FIRMWARE
+	if (kexec_use_firmware) {
+		ret = -ENOSYS;
+		goto unlock;
+	}
+#endif
+
 	if (kexec_crash_image) {
 		ret = -ENOENT;
 		goto unlock;
@@ -1536,6 +1565,13 @@ int kernel_kexec(void)
 		goto Unlock;
 	}
 
+#ifdef CONFIG_KEXEC_FIRMWARE
+	if (kexec_use_firmware) {
+		error = firmware_kernel_kexec();
+		goto Unlock;
+	}
+#endif
+
 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {
 		lock_system_sleep();
-- 
1.5.6.5
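
The per-page loop in kimage_load_normal_segment() above is easiest to follow with concrete numbers. What follows is a minimal userspace sketch, not kernel code: the flat dest[] array stands in for the pages that kimage_alloc_page() would return, memset() and memcpy() stand in for clear_page() and copy_from_user(), and a 4 KiB page size is assumed. The names load_segment_sketch, SKETCH_PAGE_SIZE and SKETCH_PAGE_MASK are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL                       /* assumed page size */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/*
 * Userspace model of kimage_load_normal_segment(): copy bufsz bytes from buf
 * into the memsz-byte region starting at address mem, one page per iteration.
 * dest[] plays the role of the destination pages and represents addresses
 * starting at the page-aligned address base. Returns the number of pages
 * touched.
 */
static size_t load_segment_sketch(unsigned char *dest, unsigned long base,
                                  const unsigned char *buf, size_t bufsz,
                                  unsigned long mem, size_t memsz)
{
    size_t ubytes = bufsz, mbytes = memsz, pages = 0;
    unsigned long maddr = mem;

    assert((base & ~SKETCH_PAGE_MASK) == 0 && mem >= base);

    while (mbytes) {
        unsigned char *page = dest + ((maddr & SKETCH_PAGE_MASK) - base);
        size_t off = maddr & ~SKETCH_PAGE_MASK;
        size_t mchunk = SKETCH_PAGE_SIZE - off;
        size_t uchunk;

        memset(page, 0, SKETCH_PAGE_SIZE);      /* clear_page(): start clean */
        if (mchunk > mbytes)
            mchunk = mbytes;
        uchunk = mchunk < ubytes ? mchunk : ubytes;
        if (uchunk)
            memcpy(page + off, buf, uchunk);    /* copy_from_user() */

        ubytes -= uchunk;
        maddr += mchunk;
        buf += uchunk;  /* the patch advances by mchunk; harmless once ubytes is 0 */
        mbytes -= mchunk;
        pages++;
    }
    return pages;
}
```

With bufsz = 5000 and memsz = 12288 at a page-aligned address, the loop runs three times: the first page receives 4096 bytes of data, the second 904 bytes followed by zeros, and the third is only cleared. This is how a segment whose memsz exceeds its bufsz ends up zero-filled past the end of the user buffer.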



* [PATCH v3 01/11] kexec: introduce kexec firmware support
@ 2012-12-27  2:18   ` Daniel Kiper
  0 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2012-12-27  2:18 UTC (permalink / raw)
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Some kexec/kdump implementations (e.g. Xen PVOPS) cannot use the default
Linux infrastructure and require support from the firmware and/or hypervisor.
To address this, introduce a kexec firmware infrastructure. It allows
a developer to use all kexec/kdump features provided by a given firmware
or hypervisor.
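
In practice the infrastructure is a compile-time switch (CONFIG_KEXEC_FIRMWARE) plus a runtime flag (kexec_use_firmware) that the hooks in kernel/kexec.c test before falling back to the machine_* path in sys_kexec_load(), crash_kexec(), crash_shrink_memory() and kernel_kexec(). Below is a rough userspace model of that shape for two of the hooks, assuming the flag is set by platform code elsewhere in the series (this patch only defines it, defaulting to false). All *_sketch names, the backend enum and the SKETCH_KEXEC_FIRMWARE macro are hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_KEXEC_FIRMWARE 1   /* stands in for CONFIG_KEXEC_FIRMWARE=y */

#ifdef SKETCH_KEXEC_FIRMWARE
/* Mirrors kexec_use_firmware: false unless platform code opts in. */
static bool use_firmware_sketch = false;
#endif

/* Records which backend handled the last request (illustration only). */
enum backend_sketch { BACKEND_NONE, BACKEND_MACHINE, BACKEND_FIRMWARE };
static enum backend_sketch last_backend = BACKEND_NONE;

static int machine_kernel_kexec_sketch(void)
{
    last_backend = BACKEND_MACHINE;
    return 0;
}

static int firmware_kernel_kexec_sketch(void)
{
    last_backend = BACKEND_FIRMWARE;
    return 0;
}

/* Shape of the hook added to kernel_kexec(). */
static int kernel_kexec_sketch(void)
{
#ifdef SKETCH_KEXEC_FIRMWARE
    if (use_firmware_sketch)
        return firmware_kernel_kexec_sketch();
#endif
    return machine_kernel_kexec_sketch();
}

/* Shape of the hook added to crash_shrink_memory(). */
static int crash_shrink_memory_sketch(unsigned long new_size)
{
    (void)new_size;
#ifdef SKETCH_KEXEC_FIRMWARE
    if (use_firmware_sketch)
        return -ENOSYS;
#endif
    return 0;   /* placeholder for the existing shrink logic */
}
```

Note that crash_shrink_memory() refuses with -ENOSYS when firmware kexec is active, presumably because the crash region is then managed by the firmware or hypervisor rather than by Linux.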

v3 - suggestions/fixes:
   - replace kexec_ops struct with kexec firmware infrastructure
     (suggested by Eric Biederman).

v2 - suggestions/fixes:
   - add comment for kexec_ops.crash_alloc_temp_store member
     (suggested by Konrad Rzeszutek Wilk),
   - simplify kexec_ops usage
     (suggested by Konrad Rzeszutek Wilk).
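
firmware_sys_kexec_load() keeps the install/uninstall protocol of sys_kexec_load(): the new image (or NULL when nr_segments is 0) is swapped into kexec_image or kexec_crash_image with xchg(), and whatever comes back is unloaded and freed at the out: label. The sketch below models only that swap in userspace with C11 atomic_exchange(). struct image_sketch and the installed_* globals are hypothetical, free() stands in for mf_kexec_unload() plus kimage_free(), and in the kernel the swap additionally runs under kexec_mutex.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kimage; only a placeholder field. */
struct image_sketch {
    unsigned long entry;
};

/* Stand-ins for kexec_image and kexec_crash_image (zero-initialized). */
static _Atomic(struct image_sketch *) installed_image;
static _Atomic(struct image_sketch *) installed_crash_image;

/*
 * Install new_image (NULL means "unload") into the normal or crash slot and
 * release whatever was there before, like the xchg()/out: path.
 */
static void install_image_sketch(struct image_sketch *new_image, int on_crash)
{
    _Atomic(struct image_sketch *) *dest =
        on_crash ? &installed_crash_image : &installed_image;
    struct image_sketch *old = atomic_exchange(dest, new_image);

    /* Reinstalling the same image would free it while it is installed. */
    assert(old == NULL || old != new_image);
    free(old);      /* free(NULL) is a no-op, like kimage_free(NULL) */
}
```

Because the same out: path frees either the displaced old image (on success) or the partially built new one (on failure), the caller never has to track which case occurred.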

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 include/linux/kexec.h   |   26 ++-
 kernel/Makefile         |    1 +
 kernel/kexec-firmware.c |  743 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec.c          |   46 +++-
 4 files changed, 809 insertions(+), 7 deletions(-)
 create mode 100644 kernel/kexec-firmware.c

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index d0b8458..9568457 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -116,17 +116,34 @@ struct kimage {
 #endif
 };
 
-
-
 /* kexec interface functions */
 extern void machine_kexec(struct kimage *image);
 extern int machine_kexec_prepare(struct kimage *image);
 extern void machine_kexec_cleanup(struct kimage *image);
+extern struct page *mf_kexec_kimage_alloc_pages(gfp_t gfp_mask,
+						unsigned int order,
+						unsigned long limit);
+extern void mf_kexec_kimage_free_pages(struct page *page);
+extern unsigned long mf_kexec_page_to_pfn(struct page *page);
+extern struct page *mf_kexec_pfn_to_page(unsigned long mfn);
+extern unsigned long mf_kexec_virt_to_phys(volatile void *address);
+extern void *mf_kexec_phys_to_virt(unsigned long address);
+extern int mf_kexec_prepare(struct kimage *image);
+extern int mf_kexec_load(struct kimage *image);
+extern void mf_kexec_cleanup(struct kimage *image);
+extern void mf_kexec_unload(struct kimage *image);
+extern void mf_kexec_shutdown(void);
+extern void mf_kexec(struct kimage *image);
 extern asmlinkage long sys_kexec_load(unsigned long entry,
 					unsigned long nr_segments,
 					struct kexec_segment __user *segments,
 					unsigned long flags);
+extern long firmware_sys_kexec_load(unsigned long entry,
+					unsigned long nr_segments,
+					struct kexec_segment __user *segments,
+					unsigned long flags);
 extern int kernel_kexec(void);
+extern int firmware_kernel_kexec(void);
 #ifdef CONFIG_COMPAT
 extern asmlinkage long compat_sys_kexec_load(unsigned long entry,
 				unsigned long nr_segments,
@@ -135,7 +152,10 @@ extern asmlinkage long compat_sys_kexec_load(unsigned long entry,
 #endif
 extern struct page *kimage_alloc_control_pages(struct kimage *image,
 						unsigned int order);
+extern struct page *firmware_kimage_alloc_control_pages(struct kimage *image,
+							unsigned int order);
 extern void crash_kexec(struct pt_regs *);
+extern void firmware_crash_kexec(struct pt_regs *);
 int kexec_should_crash(struct task_struct *);
 void crash_save_cpu(struct pt_regs *regs, int cpu);
 void crash_save_vmcoreinfo(void);
@@ -168,6 +188,8 @@ unsigned long paddr_vmcoreinfo_note(void);
 #define VMCOREINFO_CONFIG(name) \
 	vmcoreinfo_append_str("CONFIG_%s=y\n", #name)
 
+extern bool kexec_use_firmware;
+
 extern struct kimage *kexec_image;
 extern struct kimage *kexec_crash_image;
 
diff --git a/kernel/Makefile b/kernel/Makefile
index 6c072b6..bc96b2f 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -58,6 +58,7 @@ obj-$(CONFIG_MODULE_SIG) += module_signing.o modsign_pubkey.o modsign_certificat
 obj-$(CONFIG_KALLSYMS) += kallsyms.o
 obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o
 obj-$(CONFIG_KEXEC) += kexec.o
+obj-$(CONFIG_KEXEC_FIRMWARE) += kexec-firmware.o
 obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
 obj-$(CONFIG_COMPAT) += compat.o
 obj-$(CONFIG_CGROUPS) += cgroup.o
diff --git a/kernel/kexec-firmware.c b/kernel/kexec-firmware.c
new file mode 100644
index 0000000..f6ddd4c
--- /dev/null
+++ b/kernel/kexec-firmware.c
@@ -0,0 +1,743 @@
+/*
+ * Copyright (C) 2002-2004 Eric Biederman  <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
+ * Copyright (C) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * Most of the code here is a copy of kernel/kexec.c.
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
+ */
+
+#include <linux/atomic.h>
+#include <linux/errno.h>
+#include <linux/highmem.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/reboot.h>
+#include <linux/slab.h>
+
+#include <asm/uaccess.h>
+
+/*
+ * KIMAGE_NO_DEST is an impossible destination address..., for
+ * allocating pages whose destination address we do not care about.
+ */
+#define KIMAGE_NO_DEST (-1UL)
+
+static int kimage_is_destination_range(struct kimage *image,
+				       unsigned long start, unsigned long end);
+static struct page *kimage_alloc_page(struct kimage *image,
+				       gfp_t gfp_mask,
+				       unsigned long dest);
+
+static int do_kimage_alloc(struct kimage **rimage, unsigned long entry,
+	                    unsigned long nr_segments,
+                            struct kexec_segment __user *segments)
+{
+	size_t segment_bytes;
+	struct kimage *image;
+	unsigned long i;
+	int result;
+
+	/* Allocate a controlling structure */
+	result = -ENOMEM;
+	image = kzalloc(sizeof(*image), GFP_KERNEL);
+	if (!image)
+		goto out;
+
+	image->head = 0;
+	image->entry = &image->head;
+	image->last_entry = &image->head;
+	image->control_page = ~0; /* By default this does not apply */
+	image->start = entry;
+	image->type = KEXEC_TYPE_DEFAULT;
+
+	/* Initialize the list of control pages */
+	INIT_LIST_HEAD(&image->control_pages);
+
+	/* Initialize the list of destination pages */
+	INIT_LIST_HEAD(&image->dest_pages);
+
+	/* Initialize the list of unusable pages */
+	INIT_LIST_HEAD(&image->unuseable_pages);
+
+	/* Read in the segments */
+	image->nr_segments = nr_segments;
+	segment_bytes = nr_segments * sizeof(*segments);
+	result = copy_from_user(image->segment, segments, segment_bytes);
+	if (result) {
+		result = -EFAULT;
+		goto out;
+	}
+
+	/*
+	 * Verify we have good destination addresses.  The caller is
+	 * responsible for making certain we don't attempt to load
+	 * the new image into invalid or reserved areas of RAM.  This
+	 * just verifies it is an address we can use.
+	 *
+	 * Since the kernel does everything in page size chunks ensure
+	 * the destination addresses are page aligned.  Too many
+	 * special cases crop of when we don't do this.  The most
+	 * insidious is getting overlapping destination addresses
+	 * simply because addresses are changed to page size
+	 * granularity.
+	 */
+	result = -EADDRNOTAVAIL;
+	for (i = 0; i < nr_segments; i++) {
+		unsigned long mstart, mend;
+
+		mstart = image->segment[i].mem;
+		mend   = mstart + image->segment[i].memsz;
+		if ((mstart & ~PAGE_MASK) || (mend & ~PAGE_MASK))
+			goto out;
+		if (mend >= KEXEC_DESTINATION_MEMORY_LIMIT)
+			goto out;
+	}
+
+	/* Verify our destination addresses do not overlap.
+	 * If we alloed overlapping destination addresses
+	 * through very weird things can happen with no
+	 * easy explanation as one segment stops on another.
+	 */
+	result = -EINVAL;
+	for (i = 0; i < nr_segments; i++) {
+		unsigned long mstart, mend;
+		unsigned long j;
+
+		mstart = image->segment[i].mem;
+		mend   = mstart + image->segment[i].memsz;
+		for (j = 0; j < i; j++) {
+			unsigned long pstart, pend;
+			pstart = image->segment[j].mem;
+			pend   = pstart + image->segment[j].memsz;
+			/* Do the segments overlap ? */
+			if ((mend > pstart) && (mstart < pend))
+				goto out;
+		}
+	}
+
+	/* Ensure our buffer sizes are strictly less than
+	 * our memory sizes.  This should always be the case,
+	 * and it is easier to check up front than to be surprised
+	 * later on.
+	 */
+	result = -EINVAL;
+	for (i = 0; i < nr_segments; i++) {
+		if (image->segment[i].bufsz > image->segment[i].memsz)
+			goto out;
+	}
+
+	result = 0;
+out:
+	if (result == 0)
+		*rimage = image;
+	else
+		kfree(image);
+
+	return result;
+
+}
+
+static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
+				unsigned long nr_segments,
+				struct kexec_segment __user *segments)
+{
+	int result;
+	struct kimage *image;
+
+	/* Allocate and initialize a controlling structure */
+	image = NULL;
+	result = do_kimage_alloc(&image, entry, nr_segments, segments);
+	if (result)
+		goto out;
+
+	*rimage = image;
+
+	/*
+	 * Find a location for the control code buffer, and add it
+	 * the vector of segments so that it's pages will also be
+	 * counted as destination pages.
+	 */
+	result = -ENOMEM;
+	image->control_code_page = firmware_kimage_alloc_control_pages(image,
+					   get_order(KEXEC_CONTROL_PAGE_SIZE));
+	if (!image->control_code_page) {
+		printk(KERN_ERR "Could not allocate control_code_buffer\n");
+		goto out;
+	}
+
+	image->swap_page = firmware_kimage_alloc_control_pages(image, 0);
+	if (!image->swap_page) {
+		printk(KERN_ERR "Could not allocate swap buffer\n");
+		goto out;
+	}
+
+	result = 0;
+ out:
+	if (result == 0)
+		*rimage = image;
+	else
+		kfree(image);
+
+	return result;
+}
+
+static int kimage_crash_alloc(struct kimage **rimage, unsigned long entry,
+				unsigned long nr_segments,
+				struct kexec_segment __user *segments)
+{
+	int result;
+	struct kimage *image;
+	unsigned long i;
+
+	image = NULL;
+	/* Verify we have a valid entry point */
+	if ((entry < crashk_res.start) || (entry > crashk_res.end)) {
+		result = -EADDRNOTAVAIL;
+		goto out;
+	}
+
+	/* Allocate and initialize a controlling structure */
+	result = do_kimage_alloc(&image, entry, nr_segments, segments);
+	if (result)
+		goto out;
+
+	/* Enable the special crash kernel control page
+	 * allocation policy.
+	 */
+	image->control_page = crashk_res.start;
+	image->type = KEXEC_TYPE_CRASH;
+
+	/*
+	 * Verify we have good destination addresses.  Normally
+	 * the caller is responsible for making certain we don't
+	 * attempt to load the new image into invalid or reserved
+	 * areas of RAM.  But crash kernels are preloaded into a
+	 * reserved area of ram.  We must ensure the addresses
+	 * are in the reserved area otherwise preloading the
+	 * kernel could corrupt things.
+	 */
+	result = -EADDRNOTAVAIL;
+	for (i = 0; i < nr_segments; i++) {
+		unsigned long mstart, mend;
+
+		mstart = image->segment[i].mem;
+		mend = mstart + image->segment[i].memsz - 1;
+		/* Ensure we are within the crash kernel limits */
+		if ((mstart < crashk_res.start) || (mend > crashk_res.end))
+			goto out;
+	}
+
+	/*
+	 * Find a location for the control code buffer, and add
+	 * the vector of segments so that it's pages will also be
+	 * counted as destination pages.
+	 */
+	result = -ENOMEM;
+	image->control_code_page = firmware_kimage_alloc_control_pages(image,
+					   get_order(KEXEC_CONTROL_PAGE_SIZE));
+	if (!image->control_code_page) {
+		printk(KERN_ERR "Could not allocate control_code_buffer\n");
+		goto out;
+	}
+
+	result = 0;
+out:
+	if (result == 0)
+		*rimage = image;
+	else
+		kfree(image);
+
+	return result;
+}
+
+static int kimage_is_destination_range(struct kimage *image,
+					unsigned long start,
+					unsigned long end)
+{
+	unsigned long i;
+
+	for (i = 0; i < image->nr_segments; i++) {
+		unsigned long mstart, mend;
+
+		mstart = image->segment[i].mem;
+		mend = mstart + image->segment[i].memsz;
+		if ((end > mstart) && (start < mend))
+			return 1;
+	}
+
+	return 0;
+}
+
+static void kimage_free_page_list(struct list_head *list)
+{
+	struct list_head *pos, *next;
+
+	list_for_each_safe(pos, next, list) {
+		struct page *page;
+
+		page = list_entry(pos, struct page, lru);
+		list_del(&page->lru);
+		mf_kexec_kimage_free_pages(page);
+	}
+}
+
+static struct page *kimage_alloc_normal_control_pages(struct kimage *image,
+							unsigned int order)
+{
+	/* Control pages are special, they are the intermediaries
+	 * that are needed while we copy the rest of the pages
+	 * to their final resting place.  As such they must
+	 * not conflict with either the destination addresses
+	 * or memory the kernel is already using.
+	 *
+	 * The only case where we really need more than one of
+	 * these are for architectures where we cannot disable
+	 * the MMU and must instead generate an identity mapped
+	 * page table for all of the memory.
+	 *
+	 * At worst this runs in O(N) of the image size.
+	 */
+	struct list_head extra_pages;
+	struct page *pages;
+	unsigned int count;
+
+	count = 1 << order;
+	INIT_LIST_HEAD(&extra_pages);
+
+	/* Loop while I can allocate a page and the page allocated
+	 * is a destination page.
+	 */
+	do {
+		unsigned long pfn, epfn, addr, eaddr;
+
+		pages = mf_kexec_kimage_alloc_pages(GFP_KERNEL, order,
+							KEXEC_CONTROL_MEMORY_LIMIT);
+		if (!pages)
+			break;
+		pfn   = mf_kexec_page_to_pfn(pages);
+		epfn  = pfn + count;
+		addr  = pfn << PAGE_SHIFT;
+		eaddr = epfn << PAGE_SHIFT;
+		if ((epfn >= (KEXEC_CONTROL_MEMORY_LIMIT >> PAGE_SHIFT)) ||
+			      kimage_is_destination_range(image, addr, eaddr)) {
+			list_add(&pages->lru, &extra_pages);
+			pages = NULL;
+		}
+	} while (!pages);
+
+	if (pages) {
+		/* Remember the allocated page... */
+		list_add(&pages->lru, &image->control_pages);
+
+		/* Because the page is already in it's destination
+		 * location we will never allocate another page at
+		 * that address.  Therefore mf_kexec_kimage_alloc_pages
+		 * will not return it (again) and we don't need
+		 * to give it an entry in image->segment[].
+		 */
+	}
+	/* Deal with the destination pages I have inadvertently allocated.
+	 *
+	 * Ideally I would convert multi-page allocations into single
+	 * page allocations, and add everything to image->dest_pages.
+	 *
+	 * For now it is simpler to just free the pages.
+	 */
+	kimage_free_page_list(&extra_pages);
+
+	return pages;
+}
+
+struct page *firmware_kimage_alloc_control_pages(struct kimage *image,
+							unsigned int order)
+{
+	return kimage_alloc_normal_control_pages(image, order);
+}
+
+static int kimage_add_entry(struct kimage *image, kimage_entry_t entry)
+{
+	if (*image->entry != 0)
+		image->entry++;
+
+	if (image->entry == image->last_entry) {
+		kimage_entry_t *ind_page;
+		struct page *page;
+
+		page = kimage_alloc_page(image, GFP_KERNEL, KIMAGE_NO_DEST);
+		if (!page)
+			return -ENOMEM;
+
+		ind_page = page_address(page);
+		*image->entry = mf_kexec_virt_to_phys(ind_page) | IND_INDIRECTION;
+		image->entry = ind_page;
+		image->last_entry = ind_page +
+				      ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1);
+	}
+	*image->entry = entry;
+	image->entry++;
+	*image->entry = 0;
+
+	return 0;
+}
+
+static int kimage_set_destination(struct kimage *image,
+				   unsigned long destination)
+{
+	int result;
+
+	destination &= PAGE_MASK;
+	result = kimage_add_entry(image, destination | IND_DESTINATION);
+	if (result == 0)
+		image->destination = destination;
+
+	return result;
+}
+
+
+static int kimage_add_page(struct kimage *image, unsigned long page)
+{
+	int result;
+
+	page &= PAGE_MASK;
+	result = kimage_add_entry(image, page | IND_SOURCE);
+	if (result == 0)
+		image->destination += PAGE_SIZE;
+
+	return result;
+}
+
+
+static void kimage_free_extra_pages(struct kimage *image)
+{
+	/* Walk through and free any extra destination pages I may have */
+	kimage_free_page_list(&image->dest_pages);
+
+	/* Walk through and free any unusable pages I have cached */
+	kimage_free_page_list(&image->unuseable_pages);
+
+}
+static void kimage_terminate(struct kimage *image)
+{
+	if (*image->entry != 0)
+		image->entry++;
+
+	*image->entry = IND_DONE;
+}
+
+#define for_each_kimage_entry(image, ptr, entry) \
+	for (ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE); \
+		ptr = (entry & IND_INDIRECTION)? \
+			mf_kexec_phys_to_virt((entry & PAGE_MASK)): ptr +1)
+
+static void kimage_free_entry(kimage_entry_t entry)
+{
+	struct page *page;
+
+	page = mf_kexec_pfn_to_page(entry >> PAGE_SHIFT);
+	mf_kexec_kimage_free_pages(page);
+}
+
+static void kimage_free(struct kimage *image)
+{
+	kimage_entry_t *ptr, entry;
+	kimage_entry_t ind = 0;
+
+	if (!image)
+		return;
+
+	kimage_free_extra_pages(image);
+	for_each_kimage_entry(image, ptr, entry) {
+		if (entry & IND_INDIRECTION) {
+			/* Free the previous indirection page */
+			if (ind & IND_INDIRECTION)
+				kimage_free_entry(ind);
+			/* Save this indirection page until we are
+			 * done with it.
+			 */
+			ind = entry;
+		}
+		else if (entry & IND_SOURCE)
+			kimage_free_entry(entry);
+	}
+	/* Free the final indirection page */
+	if (ind & IND_INDIRECTION)
+		kimage_free_entry(ind);
+
+	/* Handle any machine specific cleanup */
+	mf_kexec_cleanup(image);
+
+	/* Free the kexec control pages... */
+	kimage_free_page_list(&image->control_pages);
+	kfree(image);
+}
+
+static kimage_entry_t *kimage_dst_used(struct kimage *image,
+					unsigned long page)
+{
+	kimage_entry_t *ptr, entry;
+	unsigned long destination = 0;
+
+	for_each_kimage_entry(image, ptr, entry) {
+		if (entry & IND_DESTINATION)
+			destination = entry & PAGE_MASK;
+		else if (entry & IND_SOURCE) {
+			if (page == destination)
+				return ptr;
+			destination += PAGE_SIZE;
+		}
+	}
+
+	return NULL;
+}
+
+static struct page *kimage_alloc_page(struct kimage *image,
+					gfp_t gfp_mask,
+					unsigned long destination)
+{
+	/*
+	 * Here we implement safeguards to ensure that a source page
+	 * is not copied to its destination page before the data on
+	 * the destination page is no longer useful.
+	 *
+	 * To do this we maintain the invariant that a source page is
+	 * either its own destination page, or it is not a
+	 * destination page at all.
+	 *
+	 * That is slightly stronger than required, but the proof
+	 * that no problems will not occur is trivial, and the
+	 * implementation is simply to verify.
+	 *
+	 * When allocating all pages normally this algorithm will run
+	 * in O(N) time, but in the worst case it will run in O(N^2)
+	 * time.   If the runtime is a problem the data structures can
+	 * be fixed.
+	 */
+	struct page *page;
+	unsigned long addr;
+
+	/*
+	 * Walk through the list of destination pages, and see if I
+	 * have a match.
+	 */
+	list_for_each_entry(page, &image->dest_pages, lru) {
+		addr = mf_kexec_page_to_pfn(page) << PAGE_SHIFT;
+		if (addr == destination) {
+			list_del(&page->lru);
+			return page;
+		}
+	}
+	page = NULL;
+	while (1) {
+		kimage_entry_t *old;
+
+		/* Allocate a page, if we run out of memory give up */
+		page = mf_kexec_kimage_alloc_pages(gfp_mask, 0,
+							KEXEC_SOURCE_MEMORY_LIMIT);
+		if (!page)
+			return NULL;
+		/* If the page cannot be used file it away */
+		if (mf_kexec_page_to_pfn(page) >
+				(KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT)) {
+			list_add(&page->lru, &image->unuseable_pages);
+			continue;
+		}
+		addr = mf_kexec_page_to_pfn(page) << PAGE_SHIFT;
+
+		/* If it is the destination page we want use it */
+		if (addr == destination)
+			break;
+
+		/* If the page is not a destination page use it */
+		if (!kimage_is_destination_range(image, addr,
+						  addr + PAGE_SIZE))
+			break;
+
+		/*
+		 * I know that the page is someones destination page.
+		 * See if there is already a source page for this
+		 * destination page.  And if so swap the source pages.
+		 */
+		old = kimage_dst_used(image, addr);
+		if (old) {
+			/* If so move it */
+			unsigned long old_addr;
+			struct page *old_page;
+
+			old_addr = *old & PAGE_MASK;
+			old_page = mf_kexec_pfn_to_page(old_addr >> PAGE_SHIFT);
+			copy_highpage(page, old_page);
+			*old = addr | (*old & ~PAGE_MASK);
+
+			/* The old page I have found cannot be a
+			 * destination page, so return it if it's
+			 * gfp_flags honor the ones passed in.
+			 */
+			if (!(gfp_mask & __GFP_HIGHMEM) &&
+			    PageHighMem(old_page)) {
+				mf_kexec_kimage_free_pages(old_page);
+				continue;
+			}
+			addr = old_addr;
+			page = old_page;
+			break;
+		}
+		else {
+			/* Place the page on the destination list I
+			 * will use it later.
+			 */
+			list_add(&page->lru, &image->dest_pages);
+		}
+	}
+
+	return page;
+}
+
+static int kimage_load_normal_segment(struct kimage *image,
+					 struct kexec_segment *segment)
+{
+	unsigned long maddr;
+	unsigned long ubytes, mbytes;
+	int result;
+	unsigned char __user *buf;
+
+	result = 0;
+	buf = segment->buf;
+	ubytes = segment->bufsz;
+	mbytes = segment->memsz;
+	maddr = segment->mem;
+
+	result = kimage_set_destination(image, maddr);
+	if (result < 0)
+		goto out;
+
+	while (mbytes) {
+		struct page *page;
+		char *ptr;
+		size_t uchunk, mchunk;
+
+		page = kimage_alloc_page(image, GFP_HIGHUSER, maddr);
+		if (!page) {
+			result  = -ENOMEM;
+			goto out;
+		}
+		result = kimage_add_page(image, mf_kexec_page_to_pfn(page)
+								<< PAGE_SHIFT);
+		if (result < 0)
+			goto out;
+
+		ptr = kmap(page);
+		/* Start with a clear page */
+		clear_page(ptr);
+		ptr += maddr & ~PAGE_MASK;
+		mchunk = PAGE_SIZE - (maddr & ~PAGE_MASK);
+		if (mchunk > mbytes)
+			mchunk = mbytes;
+
+		uchunk = mchunk;
+		if (uchunk > ubytes)
+			uchunk = ubytes;
+
+		result = copy_from_user(ptr, buf, uchunk);
+		kunmap(page);
+		if (result) {
+			result = -EFAULT;
+			goto out;
+		}
+		ubytes -= uchunk;
+		maddr  += mchunk;
+		buf    += mchunk;
+		mbytes -= mchunk;
+	}
+out:
+	return result;
+}
+
+static int kimage_load_segment(struct kimage *image,
+				struct kexec_segment *segment)
+{
+	return kimage_load_normal_segment(image, segment);
+}
+
+long firmware_sys_kexec_load(unsigned long entry, unsigned long nr_segments,
+				struct kexec_segment __user *segments,
+				unsigned long flags)
+{
+	struct kimage **dest_image, *image = NULL;
+	int result = 0;
+
+	dest_image = &kexec_image;
+	if (flags & KEXEC_ON_CRASH)
+		dest_image = &kexec_crash_image;
+	if (nr_segments > 0) {
+		unsigned long i;
+
+		/* Loading another kernel to reboot into */
+		if ((flags & KEXEC_ON_CRASH) == 0)
+			result = kimage_normal_alloc(&image, entry,
+							nr_segments, segments);
+		/* Loading another kernel to switch to if this one crashes */
+		else if (flags & KEXEC_ON_CRASH) {
+			/* Free any current crash dump kernel before
+			 * we corrupt it.
+			 */
+			mf_kexec_unload(image);
+			kimage_free(xchg(&kexec_crash_image, NULL));
+			result = kimage_crash_alloc(&image, entry,
+						     nr_segments, segments);
+		}
+		if (result)
+			goto out;
+
+		if (flags & KEXEC_PRESERVE_CONTEXT)
+			image->preserve_context = 1;
+		result = mf_kexec_prepare(image);
+		if (result)
+			goto out;
+
+		for (i = 0; i < nr_segments; i++) {
+			result = kimage_load_segment(image, &image->segment[i]);
+			if (result)
+				goto out;
+		}
+		kimage_terminate(image);
+	}
+
+	result = mf_kexec_load(image);
+
+	if (result)
+		goto out;
+
+	/* Install the new kernel, and  Uninstall the old */
+	image = xchg(dest_image, image);
+
+out:
+	mf_kexec_unload(image);
+
+	kimage_free(image);
+
+	return result;
+}
+
+void firmware_crash_kexec(struct pt_regs *regs)
+{
+	struct pt_regs fixed_regs;
+
+	crash_setup_regs(&fixed_regs, regs);
+	crash_save_vmcoreinfo();
+	machine_crash_shutdown(&fixed_regs);
+	mf_kexec(kexec_crash_image);
+}
+
+int firmware_kernel_kexec(void)
+{
+	kernel_restart_prepare(NULL);
+	printk(KERN_EMERG "Starting new kernel\n");
+	mf_kexec_shutdown();
+	mf_kexec(kexec_image);
+
+	return 0;
+}
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 5e4bd78..9f3b6cb 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -38,6 +38,10 @@
 #include <asm/io.h>
 #include <asm/sections.h>
 
+#ifdef CONFIG_KEXEC_FIRMWARE
+bool kexec_use_firmware = false;
+#endif
+
 /* Per cpu memory for storing cpu states in case of system crash. */
 note_buf_t __percpu *crash_notes;
 
@@ -924,7 +928,7 @@ static int kimage_load_segment(struct kimage *image,
  *   the devices in a consistent state so a later kernel can
  *   reinitialize them.
  *
- * - A machine specific part that includes the syscall number
+ * - A machine/firmware specific part that includes the syscall number
  *   and the copies the image to it's final destination.  And
  *   jumps into the image at entry.
  *
@@ -978,6 +982,17 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
 	if (!mutex_trylock(&kexec_mutex))
 		return -EBUSY;
 
+#ifdef CONFIG_KEXEC_FIRMWARE
+	if (kexec_use_firmware) {
+		result = firmware_sys_kexec_load(entry, nr_segments,
+							segments, flags);
+
+		mutex_unlock(&kexec_mutex);
+
+		return result;
+	}
+#endif
+
 	dest_image = &kexec_image;
 	if (flags & KEXEC_ON_CRASH)
 		dest_image = &kexec_crash_image;
@@ -1091,10 +1106,17 @@ void crash_kexec(struct pt_regs *regs)
 		if (kexec_crash_image) {
 			struct pt_regs fixed_regs;
 
-			crash_setup_regs(&fixed_regs, regs);
-			crash_save_vmcoreinfo();
-			machine_crash_shutdown(&fixed_regs);
-			machine_kexec(kexec_crash_image);
+#ifdef CONFIG_KEXEC_FIRMWARE
+			if (kexec_use_firmware)
+				firmware_crash_kexec(regs);
+			else
+#endif
+			{
+				crash_setup_regs(&fixed_regs, regs);
+				crash_save_vmcoreinfo();
+				machine_crash_shutdown(&fixed_regs);
+				machine_kexec(kexec_crash_image);
+			}
 		}
 		mutex_unlock(&kexec_mutex);
 	}
@@ -1132,6 +1154,13 @@ int crash_shrink_memory(unsigned long new_size)
 
 	mutex_lock(&kexec_mutex);
 
+#ifdef CONFIG_KEXEC_FIRMWARE
+	if (kexec_use_firmware) {
+		ret = -ENOSYS;
+		goto unlock;
+	}
+#endif
+
 	if (kexec_crash_image) {
 		ret = -ENOENT;
 		goto unlock;
@@ -1536,6 +1565,13 @@ int kernel_kexec(void)
 		goto Unlock;
 	}
 
+#ifdef CONFIG_KEXEC_FIRMWARE
+	if (kexec_use_firmware) {
+		error = firmware_kernel_kexec();
+		goto Unlock;
+	}
+#endif
+
 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {
 		lock_system_sleep();
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 217+ messages in thread

* [PATCH v3 01/11] kexec: introduce kexec firmware support
@ 2012-12-27  2:18   ` Daniel Kiper
From: Daniel Kiper @ 2012-12-27  2:18 UTC
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Some kexec/kdump implementations (e.g. Xen PVOPS) cannot use the default
Linux infrastructure and require support from firmware and/or a hypervisor.
To cope with that, this patch introduces a kexec firmware infrastructure.
When kexec_use_firmware is set, sys_kexec_load(), crash_kexec() and
kernel_kexec() hand over to firmware_* counterparts in
kernel/kexec-firmware.c, so a developer can use all kexec/kdump features
of a given firmware or hypervisor.
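A minimal user-space sketch of the dispatch pattern described above, assuming only what the kexec.c hunks show: a global flag selects the firmware path, and when it is set the generic code is skipped rather than run as well. The enum, `native_kexec_load()`, `firmware_kexec_load()` and `kexec_load_dispatch()` are hypothetical stand-ins; only the name and default of `kexec_use_firmware` come from the patch.

```c
#include <stdbool.h>

/* Which back end handled the request (hypothetical, for illustration). */
enum kexec_path {
	KEXEC_PATH_NATIVE,
	KEXEC_PATH_FIRMWARE,
};

/* Same name and default as in the patch; platform code (e.g. Xen dom0
 * setup) is assumed to set it to true early during boot. */
bool kexec_use_firmware = false;

/* Hypothetical stand-in for the generic kexec.c load path. */
static enum kexec_path native_kexec_load(unsigned long entry)
{
	(void)entry;
	return KEXEC_PATH_NATIVE;
}

/* Hypothetical stand-in for firmware_sys_kexec_load(). */
static enum kexec_path firmware_kexec_load(unsigned long entry)
{
	(void)entry;
	return KEXEC_PATH_FIRMWARE;
}

/* Shape of the check added to sys_kexec_load(): when the flag is set,
 * the firmware path runs instead of, not in addition to, the generic one. */
enum kexec_path kexec_load_dispatch(unsigned long entry)
{
	if (kexec_use_firmware)
		return firmware_kexec_load(entry);
	return native_kexec_load(entry);
}
```

The same if/else shape is repeated in crash_kexec() and kernel_kexec(), which keeps the native path untouched when CONFIG_KEXEC_FIRMWARE is disabled or the flag stays false.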

v3 - suggestions/fixes:
   - replace kexec_ops struct with kexec firmware infrastructure
     (suggested by Eric Biederman).

v2 - suggestions/fixes:
   - add comment for kexec_ops.crash_alloc_temp_store member
     (suggested by Konrad Rzeszutek Wilk),
   - simplify kexec_ops usage
     (suggested by Konrad Rzeszutek Wilk).

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 include/linux/kexec.h   |   26 ++-
 kernel/Makefile         |    1 +
 kernel/kexec-firmware.c |  743 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/kexec.c          |   46 +++-
 4 files changed, 809 insertions(+), 7 deletions(-)
 create mode 100644 kernel/kexec-firmware.c

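The segment checks that do_kimage_alloc() performs below (page-aligned destinations, an upper address limit, no overlap between segments, and bufsz no larger than memsz) can be exercised outside the kernel with a sketch like this one. `struct seg`, `SKETCH_PAGE_SIZE`, `SKETCH_PAGE_MASK` and `SKETCH_DEST_LIMIT` are hypothetical stand-ins for struct kexec_segment, PAGE_SIZE, PAGE_MASK and KEXEC_DESTINATION_MEMORY_LIMIT (the limit value is arbitrary); the error codes mirror the ones the patch returns.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real values are architecture specific. */
#define SKETCH_PAGE_SIZE  4096UL
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_DEST_LIMIT 0x80000000UL

/* Reduced model of struct kexec_segment (user buffer pointer omitted). */
struct seg {
	size_t bufsz;       /* bytes supplied by user space */
	unsigned long mem;  /* destination physical address */
	size_t memsz;       /* bytes reserved at the destination */
};

/* Returns 0 if the list passes the same checks as do_kimage_alloc(),
 * otherwise the error code the patch would return. */
int check_segments(const struct seg *s, unsigned long n)
{
	unsigned long i, j;

	/* 1. Page-aligned start and end, below the destination limit. */
	for (i = 0; i < n; i++) {
		unsigned long mstart = s[i].mem;
		unsigned long mend = mstart + s[i].memsz;

		if ((mstart & ~SKETCH_PAGE_MASK) || (mend & ~SKETCH_PAGE_MASK))
			return -EADDRNOTAVAIL;
		if (mend >= SKETCH_DEST_LIMIT)
			return -EADDRNOTAVAIL;
	}

	/* 2. Half-open ranges [mem, mem + memsz) must not intersect. */
	for (i = 0; i < n; i++) {
		unsigned long mstart = s[i].mem;
		unsigned long mend = mstart + s[i].memsz;

		for (j = 0; j < i; j++) {
			unsigned long pstart = s[j].mem;
			unsigned long pend = pstart + s[j].memsz;

			if (mend > pstart && mstart < pend)
				return -EINVAL;
		}
	}

	/* 3. The user buffer may be shorter than the destination area
	 * (the rest is zero-filled on load) but never longer. */
	for (i = 0; i < n; i++)
		if (s[i].bufsz > s[i].memsz)
			return -EINVAL;

	return 0;
}
```

Two segments that merely touch (one ends exactly where the next begins) pass check 2, because the ranges are treated as half-open.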
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index d0b8458..9568457 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -116,17 +116,34 @@ struct kimage {
 #endif
 };
 
-
-
 /* kexec interface functions */
 extern void machine_kexec(struct kimage *image);
 extern int machine_kexec_prepare(struct kimage *image);
 extern void machine_kexec_cleanup(struct kimage *image);
+extern struct page *mf_kexec_kimage_alloc_pages(gfp_t gfp_mask,
+						unsigned int order,
+						unsigned long limit);
+extern void mf_kexec_kimage_free_pages(struct page *page);
+extern unsigned long mf_kexec_page_to_pfn(struct page *page);
+extern struct page *mf_kexec_pfn_to_page(unsigned long mfn);
+extern unsigned long mf_kexec_virt_to_phys(volatile void *address);
+extern void *mf_kexec_phys_to_virt(unsigned long address);
+extern int mf_kexec_prepare(struct kimage *image);
+extern int mf_kexec_load(struct kimage *image);
+extern void mf_kexec_cleanup(struct kimage *image);
+extern void mf_kexec_unload(struct kimage *image);
+extern void mf_kexec_shutdown(void);
+extern void mf_kexec(struct kimage *image);
 extern asmlinkage long sys_kexec_load(unsigned long entry,
 					unsigned long nr_segments,
 					struct kexec_segment __user *segments,
 					unsigned long flags);
+extern long firmware_sys_kexec_load(unsigned long entry,
+					unsigned long nr_segments,
+					struct kexec_segment __user *segments,
+					unsigned long flags);
 extern int kernel_kexec(void);
+extern int firmware_kernel_kexec(void);
 #ifdef CONFIG_COMPAT
 extern asmlinkage long compat_sys_kexec_load(unsigned long entry,
 				unsigned long nr_segments,
@@ -135,7 +152,10 @@ extern asmlinkage long compat_sys_kexec_load(unsigned long entry,
 #endif
 extern struct page *kimage_alloc_control_pages(struct kimage *image,
 						unsigned int order);
+extern struct page *firmware_kimage_alloc_control_pages(struct kimage *image,
+							unsigned int order);
 extern void crash_kexec(struct pt_regs *);
+extern void firmware_crash_kexec(struct pt_regs *);
 int kexec_should_crash(struct task_struct *);
 void crash_save_cpu(struct pt_regs *regs, int cpu);
 void crash_save_vmcoreinfo(void);
@@ -168,6 +188,8 @@ unsigned long paddr_vmcoreinfo_note(void);
 #define VMCOREINFO_CONFIG(name) \
 	vmcoreinfo_append_str("CONFIG_%s=y\n", #name)
 
+extern bool kexec_use_firmware;
+
 extern struct kimage *kexec_image;
 extern struct kimage *kexec_crash_image;
 
diff --git a/kernel/Makefile b/kernel/Makefile
index 6c072b6..bc96b2f 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -58,6 +58,7 @@ obj-$(CONFIG_MODULE_SIG) += module_signing.o modsign_pubkey.o modsign_certificat
 obj-$(CONFIG_KALLSYMS) += kallsyms.o
 obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o
 obj-$(CONFIG_KEXEC) += kexec.o
+obj-$(CONFIG_KEXEC_FIRMWARE) += kexec-firmware.o
 obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
 obj-$(CONFIG_COMPAT) += compat.o
 obj-$(CONFIG_CGROUPS) += cgroup.o
diff --git a/kernel/kexec-firmware.c b/kernel/kexec-firmware.c
new file mode 100644
index 0000000..f6ddd4c
--- /dev/null
+++ b/kernel/kexec-firmware.c
@@ -0,0 +1,743 @@
+/*
+ * Copyright (C) 2002-2004 Eric Biederman  <ebiederm@xmission.com>
+ * Copyright (C) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * Most of the code here is a copy of kernel/kexec.c.
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
+ */
+
+#include <linux/atomic.h>
+#include <linux/errno.h>
+#include <linux/highmem.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/reboot.h>
+#include <linux/slab.h>
+
+#include <asm/uaccess.h>
+
+/*
+ * KIMAGE_NO_DEST is an impossible destination address, used when
+ * allocating pages whose destination address we do not care about.
+ */
+#define KIMAGE_NO_DEST (-1UL)
+
+static int kimage_is_destination_range(struct kimage *image,
+				       unsigned long start, unsigned long end);
+static struct page *kimage_alloc_page(struct kimage *image,
+				       gfp_t gfp_mask,
+				       unsigned long dest);
+
+static int do_kimage_alloc(struct kimage **rimage, unsigned long entry,
+			   unsigned long nr_segments,
+			   struct kexec_segment __user *segments)
+{
+	size_t segment_bytes;
+	struct kimage *image;
+	unsigned long i;
+	int result;
+
+	/* Allocate a controlling structure */
+	result = -ENOMEM;
+	image = kzalloc(sizeof(*image), GFP_KERNEL);
+	if (!image)
+		goto out;
+
+	image->head = 0;
+	image->entry = &image->head;
+	image->last_entry = &image->head;
+	image->control_page = ~0; /* By default this does not apply */
+	image->start = entry;
+	image->type = KEXEC_TYPE_DEFAULT;
+
+	/* Initialize the list of control pages */
+	INIT_LIST_HEAD(&image->control_pages);
+
+	/* Initialize the list of destination pages */
+	INIT_LIST_HEAD(&image->dest_pages);
+
+	/* Initialize the list of unusable pages */
+	INIT_LIST_HEAD(&image->unuseable_pages);
+
+	/* Read in the segments */
+	image->nr_segments = nr_segments;
+	segment_bytes = nr_segments * sizeof(*segments);
+	result = copy_from_user(image->segment, segments, segment_bytes);
+	if (result) {
+		result = -EFAULT;
+		goto out;
+	}
+
+	/*
+	 * Verify we have good destination addresses.  The caller is
+	 * responsible for making certain we don't attempt to load
+	 * the new image into invalid or reserved areas of RAM.  This
+	 * just verifies it is an address we can use.
+	 *
+	 * Since the kernel does everything in page size chunks ensure
+	 * the destination addresses are page aligned.  Too many
+	 * special cases crop up when we don't do this.  The most
+	 * insidious is getting overlapping destination addresses
+	 * simply because addresses are changed to page size
+	 * granularity.
+	 */
+	result = -EADDRNOTAVAIL;
+	for (i = 0; i < nr_segments; i++) {
+		unsigned long mstart, mend;
+
+		mstart = image->segment[i].mem;
+		mend   = mstart + image->segment[i].memsz;
+		if ((mstart & ~PAGE_MASK) || (mend & ~PAGE_MASK))
+			goto out;
+		if (mend >= KEXEC_DESTINATION_MEMORY_LIMIT)
+			goto out;
+	}
+
+	/* Verify our destination addresses do not overlap.
+	 * If we allowed overlapping destination addresses,
+	 * very weird things could happen with no easy
+	 * explanation, as one segment stomps on another.
+	 */
+	result = -EINVAL;
+	for (i = 0; i < nr_segments; i++) {
+		unsigned long mstart, mend;
+		unsigned long j;
+
+		mstart = image->segment[i].mem;
+		mend   = mstart + image->segment[i].memsz;
+		for (j = 0; j < i; j++) {
+			unsigned long pstart, pend;
+			pstart = image->segment[j].mem;
+			pend   = pstart + image->segment[j].memsz;
+			/* Do the segments overlap ? */
+			if ((mend > pstart) && (mstart < pend))
+				goto out;
+		}
+	}
+
+	/* Ensure our buffer sizes do not exceed our memory
+	 * sizes.  This should always be the case, and it is
+	 * easier to check up front than to be surprised
+	 * later on.
+	 */
+	result = -EINVAL;
+	for (i = 0; i < nr_segments; i++) {
+		if (image->segment[i].bufsz > image->segment[i].memsz)
+			goto out;
+	}
+
+	result = 0;
+out:
+	if (result == 0)
+		*rimage = image;
+	else
+		kfree(image);
+
+	return result;
+}
+
+static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
+				unsigned long nr_segments,
+				struct kexec_segment __user *segments)
+{
+	int result;
+	struct kimage *image;
+
+	/* Allocate and initialize a controlling structure */
+	image = NULL;
+	result = do_kimage_alloc(&image, entry, nr_segments, segments);
+	if (result)
+		goto out;
+
+	/*
+	 * Find a location for the control code buffer, and add it to
+	 * the vector of segments so that its pages will also be
+	 * counted as destination pages.
+	 */
+	result = -ENOMEM;
+	image->control_code_page = firmware_kimage_alloc_control_pages(image,
+					   get_order(KEXEC_CONTROL_PAGE_SIZE));
+	if (!image->control_code_page) {
+		printk(KERN_ERR "Could not allocate control_code_buffer\n");
+		goto out;
+	}
+
+	image->swap_page = firmware_kimage_alloc_control_pages(image, 0);
+	if (!image->swap_page) {
+		printk(KERN_ERR "Could not allocate swap buffer\n");
+		goto out;
+	}
+
+	result = 0;
+ out:
+	if (result == 0)
+		*rimage = image;
+	else
+		kfree(image);
+
+	return result;
+}
+
+static int kimage_crash_alloc(struct kimage **rimage, unsigned long entry,
+				unsigned long nr_segments,
+				struct kexec_segment __user *segments)
+{
+	int result;
+	struct kimage *image;
+	unsigned long i;
+
+	image = NULL;
+	/* Verify we have a valid entry point */
+	if ((entry < crashk_res.start) || (entry > crashk_res.end)) {
+		result = -EADDRNOTAVAIL;
+		goto out;
+	}
+
+	/* Allocate and initialize a controlling structure */
+	result = do_kimage_alloc(&image, entry, nr_segments, segments);
+	if (result)
+		goto out;
+
+	/* Enable the special crash kernel control page
+	 * allocation policy.
+	 */
+	image->control_page = crashk_res.start;
+	image->type = KEXEC_TYPE_CRASH;
+
+	/*
+	 * Verify we have good destination addresses.  Normally
+	 * the caller is responsible for making certain we don't
+	 * attempt to load the new image into invalid or reserved
+	 * areas of RAM.  But crash kernels are preloaded into a
+	 * reserved area of ram.  We must ensure the addresses
+	 * are in the reserved area otherwise preloading the
+	 * kernel could corrupt things.
+	 */
+	result = -EADDRNOTAVAIL;
+	for (i = 0; i < nr_segments; i++) {
+		unsigned long mstart, mend;
+
+		mstart = image->segment[i].mem;
+		mend = mstart + image->segment[i].memsz - 1;
+		/* Ensure we are within the crash kernel limits */
+		if ((mstart < crashk_res.start) || (mend > crashk_res.end))
+			goto out;
+	}
+
+	/*
+	 * Find a location for the control code buffer, and add it to
+	 * the vector of segments so that its pages will also be
+	 * counted as destination pages.
+	 */
+	result = -ENOMEM;
+	image->control_code_page = firmware_kimage_alloc_control_pages(image,
+					   get_order(KEXEC_CONTROL_PAGE_SIZE));
+	if (!image->control_code_page) {
+		printk(KERN_ERR "Could not allocate control_code_buffer\n");
+		goto out;
+	}
+
+	result = 0;
+out:
+	if (result == 0)
+		*rimage = image;
+	else
+		kfree(image);
+
+	return result;
+}
+
+static int kimage_is_destination_range(struct kimage *image,
+					unsigned long start,
+					unsigned long end)
+{
+	unsigned long i;
+
+	for (i = 0; i < image->nr_segments; i++) {
+		unsigned long mstart, mend;
+
+		mstart = image->segment[i].mem;
+		mend = mstart + image->segment[i].memsz;
+		if ((end > mstart) && (start < mend))
+			return 1;
+	}
+
+	return 0;
+}
+
+static void kimage_free_page_list(struct list_head *list)
+{
+	struct list_head *pos, *next;
+
+	list_for_each_safe(pos, next, list) {
+		struct page *page;
+
+		page = list_entry(pos, struct page, lru);
+		list_del(&page->lru);
+		mf_kexec_kimage_free_pages(page);
+	}
+}
+
+static struct page *kimage_alloc_normal_control_pages(struct kimage *image,
+							unsigned int order)
+{
+	/* Control pages are special, they are the intermediaries
+	 * that are needed while we copy the rest of the pages
+	 * to their final resting place.  As such they must
+	 * not conflict with either the destination addresses
+	 * or memory the kernel is already using.
+	 *
+	 * The only case where we really need more than one of
+	 * these is for architectures where we cannot disable
+	 * the MMU and must instead generate an identity mapped
+	 * page table for all of the memory.
+	 *
+	 * At worst this runs in O(N) of the image size.
+	 */
+	struct list_head extra_pages;
+	struct page *pages;
+	unsigned int count;
+
+	count = 1 << order;
+	INIT_LIST_HEAD(&extra_pages);
+
+	/* Loop while I can allocate a page and the page allocated
+	 * is a destination page.
+	 */
+	do {
+		unsigned long pfn, epfn, addr, eaddr;
+
+		pages = mf_kexec_kimage_alloc_pages(GFP_KERNEL, order,
+							KEXEC_CONTROL_MEMORY_LIMIT);
+		if (!pages)
+			break;
+		pfn   = mf_kexec_page_to_pfn(pages);
+		epfn  = pfn + count;
+		addr  = pfn << PAGE_SHIFT;
+		eaddr = epfn << PAGE_SHIFT;
+		if ((epfn >= (KEXEC_CONTROL_MEMORY_LIMIT >> PAGE_SHIFT)) ||
+			      kimage_is_destination_range(image, addr, eaddr)) {
+			list_add(&pages->lru, &extra_pages);
+			pages = NULL;
+		}
+	} while (!pages);
+
+	if (pages) {
+		/* Remember the allocated page... */
+		list_add(&pages->lru, &image->control_pages);
+
+		/* Because the page is already in its destination
+		 * location we will never allocate another page at
+		 * that address.  Therefore mf_kexec_kimage_alloc_pages
+		 * will not return it (again) and we don't need
+		 * to give it an entry in image->segment[].
+		 */
+	}
+	/* Deal with the destination pages I have inadvertently allocated.
+	 *
+	 * Ideally I would convert multi-page allocations into single
+	 * page allocations, and add everything to image->dest_pages.
+	 *
+	 * For now it is simpler to just free the pages.
+	 */
+	kimage_free_page_list(&extra_pages);
+
+	return pages;
+}
+
+struct page *firmware_kimage_alloc_control_pages(struct kimage *image,
+							unsigned int order)
+{
+	return kimage_alloc_normal_control_pages(image, order);
+}
+
+static int kimage_add_entry(struct kimage *image, kimage_entry_t entry)
+{
+	if (*image->entry != 0)
+		image->entry++;
+
+	if (image->entry == image->last_entry) {
+		kimage_entry_t *ind_page;
+		struct page *page;
+
+		page = kimage_alloc_page(image, GFP_KERNEL, KIMAGE_NO_DEST);
+		if (!page)
+			return -ENOMEM;
+
+		ind_page = page_address(page);
+		*image->entry = mf_kexec_virt_to_phys(ind_page) | IND_INDIRECTION;
+		image->entry = ind_page;
+		image->last_entry = ind_page +
+				      ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1);
+	}
+	*image->entry = entry;
+	image->entry++;
+	*image->entry = 0;
+
+	return 0;
+}
+
+static int kimage_set_destination(struct kimage *image,
+				   unsigned long destination)
+{
+	int result;
+
+	destination &= PAGE_MASK;
+	result = kimage_add_entry(image, destination | IND_DESTINATION);
+	if (result == 0)
+		image->destination = destination;
+
+	return result;
+}
+
+static int kimage_add_page(struct kimage *image, unsigned long page)
+{
+	int result;
+
+	page &= PAGE_MASK;
+	result = kimage_add_entry(image, page | IND_SOURCE);
+	if (result == 0)
+		image->destination += PAGE_SIZE;
+
+	return result;
+}
+
+static void kimage_free_extra_pages(struct kimage *image)
+{
+	/* Walk through and free any extra destination pages I may have */
+	kimage_free_page_list(&image->dest_pages);
+
+	/* Walk through and free any unusable pages I have cached */
+	kimage_free_page_list(&image->unuseable_pages);
+}
+
+static void kimage_terminate(struct kimage *image)
+{
+	if (*image->entry != 0)
+		image->entry++;
+
+	*image->entry = IND_DONE;
+}
+
+#define for_each_kimage_entry(image, ptr, entry) \
+	for (ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE); \
+		ptr = (entry & IND_INDIRECTION)? \
+			mf_kexec_phys_to_virt((entry & PAGE_MASK)): ptr +1)
+
+static void kimage_free_entry(kimage_entry_t entry)
+{
+	struct page *page;
+
+	page = mf_kexec_pfn_to_page(entry >> PAGE_SHIFT);
+	mf_kexec_kimage_free_pages(page);
+}
+
+static void kimage_free(struct kimage *image)
+{
+	kimage_entry_t *ptr, entry;
+	kimage_entry_t ind = 0;
+
+	if (!image)
+		return;
+
+	kimage_free_extra_pages(image);
+	for_each_kimage_entry(image, ptr, entry) {
+		if (entry & IND_INDIRECTION) {
+			/* Free the previous indirection page */
+			if (ind & IND_INDIRECTION)
+				kimage_free_entry(ind);
+			/* Save this indirection page until we are
+			 * done with it.
+			 */
+			ind = entry;
+		}
+		else if (entry & IND_SOURCE)
+			kimage_free_entry(entry);
+	}
+	/* Free the final indirection page */
+	if (ind & IND_INDIRECTION)
+		kimage_free_entry(ind);
+
+	/* Handle any machine specific cleanup */
+	mf_kexec_cleanup(image);
+
+	/* Free the kexec control pages... */
+	kimage_free_page_list(&image->control_pages);
+	kfree(image);
+}
+
+static kimage_entry_t *kimage_dst_used(struct kimage *image,
+					unsigned long page)
+{
+	kimage_entry_t *ptr, entry;
+	unsigned long destination = 0;
+
+	for_each_kimage_entry(image, ptr, entry) {
+		if (entry & IND_DESTINATION)
+			destination = entry & PAGE_MASK;
+		else if (entry & IND_SOURCE) {
+			if (page == destination)
+				return ptr;
+			destination += PAGE_SIZE;
+		}
+	}
+
+	return NULL;
+}
+
+static struct page *kimage_alloc_page(struct kimage *image,
+					gfp_t gfp_mask,
+					unsigned long destination)
+{
+	/*
+	 * Here we implement safeguards to ensure that a source page
+	 * is not copied to its destination page before the data on
+	 * the destination page is no longer useful.
+	 *
+	 * To do this we maintain the invariant that a source page is
+	 * either its own destination page, or it is not a
+	 * destination page at all.
+	 *
+	 * That is slightly stronger than required, but the proof
+	 * that no problems will occur is trivial, and the
+	 * implementation is simple to verify.
+	 *
+	 * When allocating all pages normally this algorithm will run
+	 * in O(N) time, but in the worst case it will run in O(N^2)
+	 * time.  If the runtime is a problem the data structures can
+	 * be fixed.
+	 */
+	struct page *page;
+	unsigned long addr;
+
+	/*
+	 * Walk through the list of destination pages, and see if I
+	 * have a match.
+	 */
+	list_for_each_entry(page, &image->dest_pages, lru) {
+		addr = mf_kexec_page_to_pfn(page) << PAGE_SHIFT;
+		if (addr == destination) {
+			list_del(&page->lru);
+			return page;
+		}
+	}
+	page = NULL;
+	while (1) {
+		kimage_entry_t *old;
+
+		/* Allocate a page, if we run out of memory give up */
+		page = mf_kexec_kimage_alloc_pages(gfp_mask, 0,
+							KEXEC_SOURCE_MEMORY_LIMIT);
+		if (!page)
+			return NULL;
+		/* If the page cannot be used, file it away */
+		if (mf_kexec_page_to_pfn(page) >
+				(KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT)) {
+			list_add(&page->lru, &image->unuseable_pages);
+			continue;
+		}
+		addr = mf_kexec_page_to_pfn(page) << PAGE_SHIFT;
+
+		/* If it is the destination page we want, use it */
+		if (addr == destination)
+			break;
+
+		/* If the page is not a destination page use it */
+		if (!kimage_is_destination_range(image, addr,
+						  addr + PAGE_SIZE))
+			break;
+
+		/*
+		 * I know that the page is someone's destination page.
+		 * See if there is already a source page for this
+		 * destination page.  And if so swap the source pages.
+		 */
+		old = kimage_dst_used(image, addr);
+		if (old) {
+			/* If so move it */
+			unsigned long old_addr;
+			struct page *old_page;
+
+			old_addr = *old & PAGE_MASK;
+			old_page = mf_kexec_pfn_to_page(old_addr >> PAGE_SHIFT);
+			copy_highpage(page, old_page);
+			*old = addr | (*old & ~PAGE_MASK);
+
+			/* The old page I have found cannot be a
+			 * destination page, so return it if its
+			 * gfp_flags honor the ones passed in.
+			 */
+			if (!(gfp_mask & __GFP_HIGHMEM) &&
+			    PageHighMem(old_page)) {
+				mf_kexec_kimage_free_pages(old_page);
+				continue;
+			}
+			addr = old_addr;
+			page = old_page;
+			break;
+		}
+		else {
+			/* Place the page on the destination list; I
+			 * will use it later.
+			 */
+			list_add(&page->lru, &image->dest_pages);
+		}
+	}
+
+	return page;
+}
+
+static int kimage_load_normal_segment(struct kimage *image,
+					 struct kexec_segment *segment)
+{
+	unsigned long maddr;
+	unsigned long ubytes, mbytes;
+	int result;
+	unsigned char __user *buf;
+
+	result = 0;
+	buf = segment->buf;
+	ubytes = segment->bufsz;
+	mbytes = segment->memsz;
+	maddr = segment->mem;
+
+	result = kimage_set_destination(image, maddr);
+	if (result < 0)
+		goto out;
+
+	while (mbytes) {
+		struct page *page;
+		char *ptr;
+		size_t uchunk, mchunk;
+
+		page = kimage_alloc_page(image, GFP_HIGHUSER, maddr);
+		if (!page) {
+			result = -ENOMEM;
+			goto out;
+		}
+		result = kimage_add_page(image, mf_kexec_page_to_pfn(page)
+								<< PAGE_SHIFT);
+		if (result < 0)
+			goto out;
+
+		ptr = kmap(page);
+		/* Start with a clear page */
+		clear_page(ptr);
+		ptr += maddr & ~PAGE_MASK;
+		mchunk = PAGE_SIZE - (maddr & ~PAGE_MASK);
+		if (mchunk > mbytes)
+			mchunk = mbytes;
+
+		uchunk = mchunk;
+		if (uchunk > ubytes)
+			uchunk = ubytes;
+
+		result = copy_from_user(ptr, buf, uchunk);
+		kunmap(page);
+		if (result) {
+			result = -EFAULT;
+			goto out;
+		}
+		ubytes -= uchunk;
+		maddr  += mchunk;
+		buf    += mchunk;
+		mbytes -= mchunk;
+	}
+out:
+	return result;
+}
+
+static int kimage_load_segment(struct kimage *image,
+				struct kexec_segment *segment)
+{
+	return kimage_load_normal_segment(image, segment);
+}
+
+long firmware_sys_kexec_load(unsigned long entry, unsigned long nr_segments,
+				struct kexec_segment __user *segments,
+				unsigned long flags)
+{
+	struct kimage **dest_image, *image = NULL;
+	int result = 0;
+
+	dest_image = &kexec_image;
+	if (flags & KEXEC_ON_CRASH)
+		dest_image = &kexec_crash_image;
+	if (nr_segments > 0) {
+		unsigned long i;
+
+		/* Loading another kernel to reboot into */
+		if ((flags & KEXEC_ON_CRASH) == 0)
+			result = kimage_normal_alloc(&image, entry,
+							nr_segments, segments);
+		/* Loading another kernel to switch to if this one crashes */
+		else if (flags & KEXEC_ON_CRASH) {
+			/* Free any current crash dump kernel before
+			 * we corrupt it.
+			 */
+			mf_kexec_unload(image);
+			kimage_free(xchg(&kexec_crash_image, NULL));
+			result = kimage_crash_alloc(&image, entry,
+						     nr_segments, segments);
+		}
+		if (result)
+			goto out;
+
+		if (flags & KEXEC_PRESERVE_CONTEXT)
+			image->preserve_context = 1;
+		result = mf_kexec_prepare(image);
+		if (result)
+			goto out;
+
+		for (i = 0; i < nr_segments; i++) {
+			result = kimage_load_segment(image, &image->segment[i]);
+			if (result)
+				goto out;
+		}
+		kimage_terminate(image);
+	}
+
+	result = mf_kexec_load(image);
+
+	if (result)
+		goto out;
+
+	/* Install the new kernel and uninstall the old one */
+	image = xchg(dest_image, image);
+
+out:
+	mf_kexec_unload(image);
+
+	kimage_free(image);
+
+	return result;
+}
+
+void firmware_crash_kexec(struct pt_regs *regs)
+{
+	struct pt_regs fixed_regs;
+
+	crash_setup_regs(&fixed_regs, regs);
+	crash_save_vmcoreinfo();
+	machine_crash_shutdown(&fixed_regs);
+	mf_kexec(kexec_crash_image);
+}
+
+int firmware_kernel_kexec(void)
+{
+	kernel_restart_prepare(NULL);
+	printk(KERN_EMERG "Starting new kernel\n");
+	mf_kexec_shutdown();
+	mf_kexec(kexec_image);
+
+	return 0;
+}
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 5e4bd78..9f3b6cb 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -38,6 +38,10 @@
 #include <asm/io.h>
 #include <asm/sections.h>
 
+#ifdef CONFIG_KEXEC_FIRMWARE
+bool kexec_use_firmware = false;
+#endif
+
 /* Per cpu memory for storing cpu states in case of system crash. */
 note_buf_t __percpu *crash_notes;
 
@@ -924,7 +928,7 @@ static int kimage_load_segment(struct kimage *image,
  *   the devices in a consistent state so a later kernel can
  *   reinitialize them.
  *
- * - A machine specific part that includes the syscall number
+ * - A machine/firmware specific part that includes the syscall number
  *   and the copies the image to it's final destination.  And
  *   jumps into the image at entry.
  *
@@ -978,6 +982,17 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
 	if (!mutex_trylock(&kexec_mutex))
 		return -EBUSY;
 
+#ifdef CONFIG_KEXEC_FIRMWARE
+	if (kexec_use_firmware) {
+		result = firmware_sys_kexec_load(entry, nr_segments,
+							segments, flags);
+
+		mutex_unlock(&kexec_mutex);
+
+		return result;
+	}
+#endif
+
 	dest_image = &kexec_image;
 	if (flags & KEXEC_ON_CRASH)
 		dest_image = &kexec_crash_image;
@@ -1091,10 +1106,17 @@ void crash_kexec(struct pt_regs *regs)
 		if (kexec_crash_image) {
 			struct pt_regs fixed_regs;
 
-			crash_setup_regs(&fixed_regs, regs);
-			crash_save_vmcoreinfo();
-			machine_crash_shutdown(&fixed_regs);
-			machine_kexec(kexec_crash_image);
+#ifdef CONFIG_KEXEC_FIRMWARE
+			if (kexec_use_firmware)
+				firmware_crash_kexec(regs);
+			else
+#endif
+			{
+				crash_setup_regs(&fixed_regs, regs);
+				crash_save_vmcoreinfo();
+				machine_crash_shutdown(&fixed_regs);
+				machine_kexec(kexec_crash_image);
+			}
 		}
 		mutex_unlock(&kexec_mutex);
 	}
@@ -1132,6 +1154,13 @@ int crash_shrink_memory(unsigned long new_size)
 
 	mutex_lock(&kexec_mutex);
 
+#ifdef CONFIG_KEXEC_FIRMWARE
+	if (kexec_use_firmware) {
+		ret = -ENOSYS;
+		goto unlock;
+	}
+#endif
+
 	if (kexec_crash_image) {
 		ret = -ENOENT;
 		goto unlock;
@@ -1536,6 +1565,13 @@ int kernel_kexec(void)
 		goto Unlock;
 	}
 
+#ifdef CONFIG_KEXEC_FIRMWARE
+	if (kexec_use_firmware) {
+		error = firmware_kernel_kexec();
+		goto Unlock;
+	}
+#endif
+
 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {
 		lock_system_sleep();
-- 
1.5.6.5
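
A userspace model of the three segment checks in do_kimage_alloc() may help when reviewing them: page-aligned destinations below a limit, no overlapping destinations, and bufsz not exceeding memsz. This is a sketch, not kernel code. `struct seg`, `validate_segments()`, `SK_PAGE_SIZE` and `SK_DEST_LIMIT` are hypothetical stand-ins for struct kexec_segment, PAGE_SIZE and KEXEC_DESTINATION_MEMORY_LIMIT, and the 2 GiB limit is an arbitrary assumption. The error codes mirror the ones the patch returns.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for PAGE_SIZE and KEXEC_DESTINATION_MEMORY_LIMIT. */
#define SK_PAGE_SIZE  4096UL
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))
#define SK_DEST_LIMIT 0x80000000UL

/* Simplified model of struct kexec_segment (user buffer pointer omitted). */
struct seg {
	unsigned long mem;    /* destination physical address */
	unsigned long memsz;  /* bytes reserved at the destination */
	unsigned long bufsz;  /* bytes supplied by user space */
};

/* Returns 0 if the segments pass the same checks as do_kimage_alloc(),
 * otherwise the error code the patch would return. */
int validate_segments(const struct seg *s, size_t n)
{
	size_t i, j;

	/* Page-aligned start and end, below the destination limit. */
	for (i = 0; i < n; i++) {
		unsigned long mstart = s[i].mem;
		unsigned long mend = mstart + s[i].memsz;

		if ((mstart & ~SK_PAGE_MASK) || (mend & ~SK_PAGE_MASK))
			return -EADDRNOTAVAIL;
		if (mend >= SK_DEST_LIMIT)
			return -EADDRNOTAVAIL;
	}

	/* Half-open ranges [mstart, mend) must not intersect. */
	for (i = 0; i < n; i++) {
		unsigned long mstart = s[i].mem, mend = mstart + s[i].memsz;

		for (j = 0; j < i; j++) {
			unsigned long pstart = s[j].mem;
			unsigned long pend = pstart + s[j].memsz;

			if (mend > pstart && mstart < pend)
				return -EINVAL;
		}
	}

	/* The user buffer must fit in the reserved memory. */
	for (i = 0; i < n; i++)
		if (s[i].bufsz > s[i].memsz)
			return -EINVAL;

	return 0;
}
```

Ranges are treated as half-open [mem, mem + memsz), so two segments that merely touch, such as 0x100000-0x102000 and 0x102000-0x103000, are accepted.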

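The retry loop in kimage_alloc_normal_control_pages() keeps asking the allocator for blocks until it gets one that lies below KEXEC_CONTROL_MEMORY_LIMIT and overlaps no segment destination. The sketch below replays that decision over a fixed list of candidate pfns instead of calling the page allocator. `sk_pick_control_pfn()`, `struct sk_range` and the candidate list are hypothetical. Rejected blocks are simply skipped here, whereas the patch parks them on extra_pages and frees them afterwards.

```c
#include <stddef.h>

#define SK_PAGE_SHIFT 12

/* Hypothetical destination range: [start, end) physical addresses. */
struct sk_range {
	unsigned long start, end;
};

/* Same overlap test as kimage_is_destination_range(). */
static int sk_is_destination_range(const struct sk_range *r, size_t n,
				   unsigned long start, unsigned long end)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (end > r[i].start && start < r[i].end)
			return 1;
	return 0;
}

/* Model of kimage_alloc_normal_control_pages(): `cand` lists the pfns an
 * allocator would hand out, in order. A block of 1 << order pages is
 * accepted only if it ends below `limit` and overlaps no destination range.
 * Returns the accepted pfn, or -1 when the candidates run out. */
long sk_pick_control_pfn(const unsigned long *cand, size_t ncand,
			 unsigned int order, unsigned long limit,
			 const struct sk_range *dest, size_t ndest)
{
	unsigned long count = 1UL << order;
	size_t i;

	for (i = 0; i < ncand; i++) {
		unsigned long pfn = cand[i], epfn = pfn + count;
		unsigned long addr = pfn << SK_PAGE_SHIFT;
		unsigned long eaddr = epfn << SK_PAGE_SHIFT;

		if (epfn >= (limit >> SK_PAGE_SHIFT) ||
		    sk_is_destination_range(dest, ndest, addr, eaddr))
			continue;	/* the patch parks this on extra_pages */
		return (long)pfn;
	}
	return -1;
}
```

Returning -1 when the candidates run out corresponds to mf_kexec_kimage_alloc_pages() returning NULL. In that case, the patch frees the parked blocks and returns NULL to its caller.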

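The entry list built by kimage_add_entry() and walked by for_each_kimage_entry() packs a page-aligned address and a type flag into each word. The sketch below reproduces the walk in kimage_dst_used() in userspace. The IND_* values are those of include/linux/kexec.h. `sk_pool` and `sk_phys_to_virt()` are hypothetical stand-ins for indirection pages and phys_to_virt(): an indirection entry here stores a pool index shifted by the page shift instead of a real physical address.

```c
#include <stddef.h>

/* Flag bits as defined in include/linux/kexec.h. */
#define IND_DESTINATION 0x1UL
#define IND_INDIRECTION 0x2UL
#define IND_DONE        0x4UL
#define IND_SOURCE      0x8UL

#define SK_PAGE_SHIFT 12
#define SK_PAGE_SIZE  (1UL << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))

typedef unsigned long kentry_t;

/* Hypothetical pool of small "indirection pages". */
#define SK_POOL_PAGES 4
#define SK_PER_PAGE   8
kentry_t sk_pool[SK_POOL_PAGES][SK_PER_PAGE];

/* Stand-in for phys_to_virt(): map an encoded address to its pool page. */
static kentry_t *sk_phys_to_virt(kentry_t addr)
{
	return sk_pool[addr >> SK_PAGE_SHIFT];
}

/* Same walk as kimage_dst_used(): return the entry whose source page is
 * copied to `page`, or NULL if no source page targets it. */
kentry_t *sk_dst_used(kentry_t *head, unsigned long page)
{
	kentry_t *ptr, entry;
	unsigned long destination = 0;

	for (ptr = head; (entry = *ptr) && !(entry & IND_DONE);
	     ptr = (entry & IND_INDIRECTION) ?
		   sk_phys_to_virt(entry & SK_PAGE_MASK) : ptr + 1) {
		if (entry & IND_DESTINATION) {
			destination = entry & SK_PAGE_MASK;
		} else if (entry & IND_SOURCE) {
			if (page == destination)
				return ptr;
			/* Each source page advances the destination by a page. */
			destination += SK_PAGE_SIZE;
		}
	}
	return NULL;
}
```

Each IND_SOURCE entry implicitly targets the current destination, which then advances by one page. This is why kimage_add_page() bumps image->destination by PAGE_SIZE after each successful add.
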
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

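kimage_load_normal_segment() copies a segment one destination page at a time. Each page is cleared, at most one page worth of user data is copied into it, and whatever lies between bufsz and memsz stays zero. The userspace sketch below models that loop. The flat `mem` array standing in for physical memory and `sk_load_segment()` are hypothetical, and memcpy()/memset() replace copy_from_user() and clear_page().

```c
#include <string.h>

#define SK_PAGE_SIZE 4096UL

/* Hypothetical model of kimage_load_normal_segment(): `mem` stands in for
 * physical memory starting at physical address `base`. Returns the number
 * of destination pages touched. */
unsigned long sk_load_segment(unsigned char *mem, unsigned long base,
			      unsigned long maddr, unsigned long memsz,
			      const unsigned char *buf, unsigned long bufsz)
{
	unsigned long pages = 0;

	while (memsz) {
		unsigned long off = maddr & (SK_PAGE_SIZE - 1);
		unsigned char *page = mem + (maddr - off - base);
		unsigned long mchunk = SK_PAGE_SIZE - off;
		unsigned long uchunk;

		memset(page, 0, SK_PAGE_SIZE);		/* clear_page() */
		if (mchunk > memsz)
			mchunk = memsz;
		uchunk = mchunk < bufsz ? mchunk : bufsz;
		if (uchunk)
			memcpy(page + off, buf, uchunk);	/* copy_from_user() */

		/* Advance by uchunk; the patch advances buf by mchunk, which is
		 * harmless there because uchunk is 0 once bufsz is exhausted. */
		buf += uchunk;
		bufsz -= uchunk;
		maddr += mchunk;
		memsz -= mchunk;
		pages++;
	}
	return pages;
}
```

Advancing by uchunk avoids forming a pointer past the end of the source buffer, which standard C does not allow even if the pointer is never dereferenced.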

* [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE
@ 2012-12-27  2:18     ` Daniel Kiper
  0 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2012-12-27  2:18 UTC (permalink / raw)
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Some implementations (e.g. Xen PVOPS) cannot use part of the identity page table
to construct the transition page table. This means that they require separate PUDs,
PMDs and PTEs for the virtual and physical (identity) mappings. To satisfy that
requirement, add extra PGD, PUD, PMD and PTE pointers and adjust the existing code.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 arch/x86/include/asm/kexec.h       |   10 +++++++---
 arch/x86/kernel/machine_kexec_64.c |   12 ++++++------
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 6080d26..cedd204 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -157,9 +157,13 @@ struct kimage_arch {
 };
 #else
 struct kimage_arch {
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *pte;
+	pgd_t *pgd;
+	pud_t *pud0;
+	pud_t *pud1;
+	pmd_t *pmd0;
+	pmd_t *pmd1;
+	pte_t *pte0;
+	pte_t *pte1;
 };
 #endif
 
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index b3ea9db..976e54b 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -137,9 +137,9 @@ out:
 
 static void free_transition_pgtable(struct kimage *image)
 {
-	free_page((unsigned long)image->arch.pud);
-	free_page((unsigned long)image->arch.pmd);
-	free_page((unsigned long)image->arch.pte);
+	free_page((unsigned long)image->arch.pud0);
+	free_page((unsigned long)image->arch.pmd0);
+	free_page((unsigned long)image->arch.pte0);
 }
 
 static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
@@ -157,7 +157,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
 		pud = (pud_t *)get_zeroed_page(GFP_KERNEL);
 		if (!pud)
 			goto err;
-		image->arch.pud = pud;
+		image->arch.pud0 = pud;
 		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
 	}
 	pud = pud_offset(pgd, vaddr);
@@ -165,7 +165,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
 		pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
 		if (!pmd)
 			goto err;
-		image->arch.pmd = pmd;
+		image->arch.pmd0 = pmd;
 		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
 	}
 	pmd = pmd_offset(pud, vaddr);
@@ -173,7 +173,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
 		pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
 		if (!pte)
 			goto err;
-		image->arch.pte = pte;
+		image->arch.pte0 = pte;
 		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
 	}
 	pte = pte_offset_kernel(pmd, vaddr);
-- 
1.5.6.5

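init_transition_pgtable() walks PGD -> PUD -> PMD -> PTE for a single virtual address and allocates a missing table at each level. With this patch, those allocations are recorded in pud0/pmd0/pte0, leaving pud1/pmd1/pte1 for a second, separate walk. The sketch below computes the table indices such a walk selects on x86-64 with 4-level paging. `sk_pt_index()` is a hypothetical helper, not a kernel API.

```c
#include <stdint.h>

/* x86-64 4-level paging: 9 index bits per level above a 12-bit offset. */
#define SK_PTRS_PER_TABLE 512u

struct sk_pt_index {
	unsigned int pgd, pud, pmd, pte;
};

/* Indices that pgd_offset(), pud_offset(), pmd_offset() and
 * pte_offset_kernel() would select for `vaddr` under 4-level paging. */
struct sk_pt_index sk_pt_index(uint64_t vaddr)
{
	struct sk_pt_index ix;

	ix.pgd = (unsigned int)((vaddr >> 39) & (SK_PTRS_PER_TABLE - 1));
	ix.pud = (unsigned int)((vaddr >> 30) & (SK_PTRS_PER_TABLE - 1));
	ix.pmd = (unsigned int)((vaddr >> 21) & (SK_PTRS_PER_TABLE - 1));
	ix.pte = (unsigned int)((vaddr >> 12) & (SK_PTRS_PER_TABLE - 1));
	return ix;
}
```

For example, the kernel text address 0xffffffff81000000 selects PGD slot 511 and PUD slot 510. The physical address 0x1000000 selects slot 0 at both levels. The two walks therefore go through different PGD slots and, in general, different lower-level tables. The commit message does not state which walk uses the *0 pointers and which uses the *1 pointers.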

* [PATCH v3 03/11] xen: Introduce architecture independent data for kexec/kdump
@ 2012-12-27  2:18       ` Daniel Kiper
From: Daniel Kiper @ 2012-12-27  2:18 UTC
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Introduce architecture-independent constants and structures
required by the Xen kexec/kdump implementation.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 include/xen/interface/xen.h |   33 +++++++++++++++++++++++++++++++++
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h
index 886a5d8..09c16ab 100644
--- a/include/xen/interface/xen.h
+++ b/include/xen/interface/xen.h
@@ -57,6 +57,7 @@
 #define __HYPERVISOR_event_channel_op     32
 #define __HYPERVISOR_physdev_op           33
 #define __HYPERVISOR_hvm_op               34
+#define __HYPERVISOR_kexec_op             37
 #define __HYPERVISOR_tmem_op              38
 
 /* Architecture-specific hypercall definitions. */
@@ -231,7 +232,39 @@ DEFINE_GUEST_HANDLE_STRUCT(mmuext_op);
 #define VMASST_TYPE_pae_extended_cr3     3
 #define MAX_VMASST_TYPE 3
 
+/*
+ * Commands to HYPERVISOR_kexec_op().
+ */
+#define KEXEC_CMD_kexec			0
+#define KEXEC_CMD_kexec_load		1
+#define KEXEC_CMD_kexec_unload		2
+#define KEXEC_CMD_kexec_get_range	3
+
+/*
+ * Memory ranges for kdump (utilized by HYPERVISOR_kexec_op()).
+ */
+#define KEXEC_RANGE_MA_CRASH		0
+#define KEXEC_RANGE_MA_XEN		1
+#define KEXEC_RANGE_MA_CPU		2
+#define KEXEC_RANGE_MA_XENHEAP		3
+#define KEXEC_RANGE_MA_BOOT_PARAM	4
+#define KEXEC_RANGE_MA_EFI_MEMMAP	5
+#define KEXEC_RANGE_MA_VMCOREINFO	6
+
 #ifndef __ASSEMBLY__
+struct xen_kexec_exec {
+	int type;
+};
+
+struct xen_kexec_range {
+	int range;
+	int nr;
+	unsigned long size;
+	unsigned long start;
+};
+
+extern unsigned long xen_vmcoreinfo_maddr;
+extern unsigned long xen_vmcoreinfo_max_size;
 
 typedef uint16_t domid_t;
 
-- 
1.5.6.5
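
A KEXEC_CMD_kexec_get_range reply describes a range as a start address plus a size, while struct resource uses an inclusive [start, end] pair; the registration code in patch 05/11 therefore computes end = start + size - 1 and skips an empty crash range. A minimal sketch of that conversion, using a copy of struct xen_kexec_range and a hypothetical demo_resource type in place of struct resource:

```c
#include <assert.h>

/* Copied from the xen.h hunk above. */
#define KEXEC_RANGE_MA_CRASH		0

struct xen_kexec_range {
	int range;
	int nr;
	unsigned long size;
	unsigned long start;
};

/* Hypothetical stand-in for struct resource: end is inclusive. */
struct demo_resource {
	unsigned long start;
	unsigned long end;
};

/*
 * Hypothetical helper: convert a (start, size) reply into an inclusive
 * [start, end] pair. Returns 0 on success and -1 for an empty range,
 * which the caller should skip rather than register.
 */
static int range_to_resource(const struct xen_kexec_range *xkr,
			     struct demo_resource *res)
{
	if (!xkr->size)
		return -1;

	res->start = xkr->start;
	res->end = xkr->start + xkr->size - 1;
	/* Fails if start + size wrapped past the top of the address space. */
	assert(res->end >= res->start);
	return 0;
}
```

With a 128 MiB crash region at 16 MiB (start 0x1000000, size 0x8000000) the resulting inclusive end is 0x8ffffff.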


* [PATCH v3 04/11] x86/xen: Introduce architecture dependent data for kexec/kdump
@ 2012-12-27  2:18         ` Daniel Kiper
From: Daniel Kiper @ 2012-12-27  2:18 UTC
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Introduce architecture-dependent constants, structures and
functions required by the Xen kexec/kdump implementation.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 arch/x86/include/asm/xen/hypercall.h |    6 +++
 arch/x86/include/asm/xen/kexec.h     |   79 ++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/xen/kexec.h

diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index c20d1ce..e76a1b8 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -459,6 +459,12 @@ HYPERVISOR_hvm_op(int op, void *arg)
 }
 
 static inline int
+HYPERVISOR_kexec_op(unsigned long op, void *args)
+{
+	return _hypercall2(int, kexec_op, op, args);
+}
+
+static inline int
 HYPERVISOR_tmem_op(
 	struct tmem_op *op)
 {
diff --git a/arch/x86/include/asm/xen/kexec.h b/arch/x86/include/asm/xen/kexec.h
new file mode 100644
index 0000000..d09b52f
--- /dev/null
+++ b/arch/x86/include/asm/xen/kexec.h
@@ -0,0 +1,79 @@
+/*
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under Google Summer
+ * of Code 2011 program and Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _ASM_X86_XEN_KEXEC_H
+#define _ASM_X86_XEN_KEXEC_H
+
+#define KEXEC_XEN_NO_PAGES	17
+
+#define XK_MA_CONTROL_PAGE	0
+#define XK_VA_CONTROL_PAGE	1
+#define XK_MA_PGD_PAGE		2
+#define XK_VA_PGD_PAGE		3
+#define XK_MA_PUD0_PAGE		4
+#define XK_VA_PUD0_PAGE		5
+#define XK_MA_PUD1_PAGE		6
+#define XK_VA_PUD1_PAGE		7
+#define XK_MA_PMD0_PAGE		8
+#define XK_VA_PMD0_PAGE		9
+#define XK_MA_PMD1_PAGE		10
+#define XK_VA_PMD1_PAGE		11
+#define XK_MA_PTE0_PAGE		12
+#define XK_VA_PTE0_PAGE		13
+#define XK_MA_PTE1_PAGE		14
+#define XK_VA_PTE1_PAGE		15
+#define XK_MA_TABLE_PAGE	16
+
+#ifndef __ASSEMBLY__
+struct xen_kexec_image {
+	unsigned long page_list[KEXEC_XEN_NO_PAGES];
+	unsigned long indirection_page;
+	unsigned long start_address;
+};
+
+struct xen_kexec_load {
+	int type;
+	struct xen_kexec_image image;
+};
+
+extern unsigned int xen_kexec_control_code_size;
+
+#ifdef CONFIG_X86_32
+extern void xen_relocate_kernel(unsigned long indirection_page,
+				unsigned long *page_list,
+				unsigned long start_address,
+				unsigned int has_pae,
+				unsigned int preserve_context);
+#else
+extern void xen_relocate_kernel(unsigned long indirection_page,
+				unsigned long *page_list,
+				unsigned long start_address,
+				unsigned int preserve_context);
+#endif
+#endif
+#endif /* _ASM_X86_XEN_KEXEC_H */
-- 
1.5.6.5
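
In struct xen_kexec_image, page_list interleaves the machine address (MA) and the virtual address (VA) of each transition page: every XK_MA_* index is even and the matching XK_VA_* index follows it, except XK_MA_TABLE_PAGE (16), which has no VA slot. The sketch below restates that layout as compile-time checks on a subset of the defines and fills one pair; xk_set_pair() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>

/* Subset copied from arch/x86/include/asm/xen/kexec.h above. */
#define KEXEC_XEN_NO_PAGES	17
#define XK_MA_CONTROL_PAGE	0
#define XK_VA_CONTROL_PAGE	1
#define XK_MA_PGD_PAGE		2
#define XK_VA_PGD_PAGE		3
#define XK_MA_PTE1_PAGE		14
#define XK_VA_PTE1_PAGE		15
#define XK_MA_TABLE_PAGE	16

/* Layout rules implied by the defines. */
_Static_assert(XK_VA_CONTROL_PAGE == XK_MA_CONTROL_PAGE + 1, "VA follows MA");
_Static_assert(XK_VA_PGD_PAGE == XK_MA_PGD_PAGE + 1, "VA follows MA");
_Static_assert(XK_VA_PTE1_PAGE == XK_MA_PTE1_PAGE + 1, "VA follows MA");
_Static_assert(XK_MA_TABLE_PAGE == KEXEC_XEN_NO_PAGES - 1,
	       "table page is the last, unpaired slot");

/* Hypothetical helper: store one page as an (MA, VA) pair. */
static void xk_set_pair(unsigned long *page_list, unsigned int ma_idx,
			unsigned long maddr, unsigned long vaddr)
{
	/* MA slots are even and must leave room for the VA slot after them. */
	assert(ma_idx % 2 == 0 && ma_idx + 1 < KEXEC_XEN_NO_PAGES);
	page_list[ma_idx] = maddr;
	page_list[ma_idx + 1] = vaddr;
}
```

Because the assertion requires ma_idx + 1 < KEXEC_XEN_NO_PAGES, passing XK_MA_TABLE_PAGE is rejected: that slot carries only a machine address.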


* [PATCH v3 05/11] x86/xen: Register resources required by kexec-tools
@ 2012-12-27  2:18           ` Daniel Kiper
From: Daniel Kiper @ 2012-12-27  2:18 UTC
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Register resources required by kexec-tools.

v2 - suggestions/fixes:
   - change logging level
     (suggested by Konrad Rzeszutek Wilk).

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 arch/x86/xen/kexec.c |  150 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 150 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/xen/kexec.c

diff --git a/arch/x86/xen/kexec.c b/arch/x86/xen/kexec.c
new file mode 100644
index 0000000..7ec4c45
--- /dev/null
+++ b/arch/x86/xen/kexec.c
@@ -0,0 +1,150 @@
+/*
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under Google Summer
+ * of Code 2011 program and Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/ioport.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#include <xen/interface/platform.h>
+#include <xen/interface/xen.h>
+#include <xen/xen.h>
+
+#include <asm/xen/hypercall.h>
+
+unsigned long xen_vmcoreinfo_maddr;
+unsigned long xen_vmcoreinfo_max_size;
+
+static int __init xen_init_kexec_resources(void)
+{
+	int rc;
+	static struct resource xen_hypervisor_res = {
+		.name = "Hypervisor code and data",
+		.flags = IORESOURCE_BUSY | IORESOURCE_MEM
+	};
+	struct resource *cpu_res;
+	struct xen_kexec_range xkr;
+	struct xen_platform_op cpuinfo_op;
+	uint32_t cpus, i;
+
+	if (!xen_initial_domain())
+		return 0;
+
+	if (strstr(boot_command_line, "crashkernel="))
+		pr_warn("kexec: Ignoring crashkernel option. "
+			"It should be passed to Xen hypervisor.\n");
+
+	/* Register Crash kernel resource. */
+	xkr.range = KEXEC_RANGE_MA_CRASH;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_get_range, &xkr);
+
+	if (rc) {
+		pr_warn("kexec: %s: HYPERVISOR_kexec_op(KEXEC_RANGE_MA_CRASH)"
+			": %i\n", __func__, rc);
+		return rc;
+	}
+
+	if (!xkr.size)
+		return 0;
+
+	crashk_res.start = xkr.start;
+	crashk_res.end = xkr.start + xkr.size - 1;
+	insert_resource(&iomem_resource, &crashk_res);
+
+	/* Register Hypervisor code and data resource. */
+	xkr.range = KEXEC_RANGE_MA_XEN;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_get_range, &xkr);
+
+	if (rc) {
+		pr_warn("kexec: %s: HYPERVISOR_kexec_op(KEXEC_RANGE_MA_XEN)"
+			": %i\n", __func__, rc);
+		return rc;
+	}
+
+	xen_hypervisor_res.start = xkr.start;
+	xen_hypervisor_res.end = xkr.start + xkr.size - 1;
+	insert_resource(&iomem_resource, &xen_hypervisor_res);
+
+	/* Determine maximum number of physical CPUs. */
+	cpuinfo_op.cmd = XENPF_get_cpuinfo;
+	cpuinfo_op.u.pcpu_info.xen_cpuid = 0;
+	rc = HYPERVISOR_dom0_op(&cpuinfo_op);
+
+	if (rc) {
+		pr_warn("kexec: %s: HYPERVISOR_dom0_op(): %i\n", __func__, rc);
+		return rc;
+	}
+
+	cpus = cpuinfo_op.u.pcpu_info.max_present + 1;
+
+	/* Register CPUs Crash note resources. */
+	cpu_res = kcalloc(cpus, sizeof(struct resource), GFP_KERNEL);
+
+	if (!cpu_res) {
+		pr_warn("kexec: %s: kcalloc(): %i\n", __func__, -ENOMEM);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < cpus; ++i) {
+		xkr.range = KEXEC_RANGE_MA_CPU;
+		xkr.nr = i;
+		rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_get_range, &xkr);
+
+		if (rc) {
+			pr_warn("kexec: %s: cpu: %u: HYPERVISOR_kexec_op"
+				"(KEXEC_RANGE_MA_CPU): %i\n", __func__, i, rc);
+			continue;
+		}
+
+		cpu_res->name = "Crash note";
+		cpu_res->start = xkr.start;
+		cpu_res->end = xkr.start + xkr.size - 1;
+		cpu_res->flags = IORESOURCE_BUSY | IORESOURCE_MEM;
+		insert_resource(&iomem_resource, cpu_res++);
+	}
+
+	/* Get vmcoreinfo address and maximum allowed size. */
+	xkr.range = KEXEC_RANGE_MA_VMCOREINFO;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_get_range, &xkr);
+
+	if (rc) {
+		pr_warn("kexec: %s: HYPERVISOR_kexec_op(KEXEC_RANGE_MA_VMCOREINFO)"
+			": %i\n", __func__, rc);
+		return rc;
+	}
+
+	xen_vmcoreinfo_maddr = xkr.start;
+	xen_vmcoreinfo_max_size = xkr.size;
+
+	return 0;
+}
+
+core_initcall(xen_init_kexec_resources);
-- 
1.5.6.5
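
xen_init_kexec_resources() sizes its per-CPU array from max_present + 1 (the highest present physical CPU id plus one) and, when the KEXEC_RANGE_MA_CPU query for a CPU fails, skips it with continue before cpu_res is advanced, so the registered crash notes stay contiguous. A self-contained sketch of that loop, in which fake_get_cpu_note() is a hypothetical stand-in for the hypercall that pretends odd-numbered CPUs fail:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one registered "Crash note" resource. */
struct note_res {
	unsigned int cpu;
	unsigned long start;
	unsigned long end;
};

/*
 * Hypothetical stand-in for HYPERVISOR_kexec_op(KEXEC_CMD_kexec_get_range)
 * with range = KEXEC_RANGE_MA_CPU: odd-numbered CPUs fail.
 */
static int fake_get_cpu_note(unsigned int cpu, unsigned long *start,
			     unsigned long *size)
{
	if (cpu & 1)
		return -EINVAL;
	*start = 0x100000UL + cpu * 0x1000UL;
	*size = 0x400UL;
	return 0;
}

/*
 * Same shape as the loop in xen_init_kexec_resources(): try
 * max_present + 1 CPUs and skip failures without consuming a slot,
 * so out[] stays contiguous. Returns the number of notes recorded.
 */
static size_t collect_cpu_notes(unsigned int max_present, struct note_res *out)
{
	unsigned int cpus = max_present + 1, i;
	size_t n = 0;

	for (i = 0; i < cpus; ++i) {
		unsigned long start, size;

		if (fake_get_cpu_note(i, &start, &size))
			continue;
		out[n].cpu = i;
		out[n].start = start;
		out[n].end = start + size - 1;
		++n;
	}
	assert(n <= cpus);
	return n;
}
```

With max_present = 3, CPUs 0 and 2 are recorded in out[0] and out[1], while CPUs 1 and 3 are skipped.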


^ permalink raw reply related	[flat|nested] 217+ messages in thread

* [PATCH v3 05/11] x86/xen: Register resources required by kexec-tools
  2012-12-27  2:18         ` Daniel Kiper
                           ` (2 preceding siblings ...)
  (?)
@ 2012-12-27  2:18         ` Daniel Kiper
  -1 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2012-12-27  2:18 UTC (permalink / raw)
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Register resources required by kexec-tools.

v2 - suggestions/fixes:
   - change logging level
     (suggested by Konrad Rzeszutek Wilk).

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 arch/x86/xen/kexec.c |  150 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 150 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/xen/kexec.c

diff --git a/arch/x86/xen/kexec.c b/arch/x86/xen/kexec.c
new file mode 100644
index 0000000..7ec4c45
--- /dev/null
+++ b/arch/x86/xen/kexec.c
@@ -0,0 +1,150 @@
+/*
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under Google Summer
+ * of Code 2011 program and Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/ioport.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#include <xen/interface/platform.h>
+#include <xen/interface/xen.h>
+#include <xen/xen.h>
+
+#include <asm/xen/hypercall.h>
+
+unsigned long xen_vmcoreinfo_maddr;
+unsigned long xen_vmcoreinfo_max_size;
+
+static int __init xen_init_kexec_resources(void)
+{
+	int rc;
+	static struct resource xen_hypervisor_res = {
+		.name = "Hypervisor code and data",
+		.flags = IORESOURCE_BUSY | IORESOURCE_MEM
+	};
+	struct resource *cpu_res;
+	struct xen_kexec_range xkr;
+	struct xen_platform_op cpuinfo_op;
+	uint32_t cpus, i;
+
+	if (!xen_initial_domain())
+		return 0;
+
+	if (strstr(boot_command_line, "crashkernel="))
+		pr_warn("kexec: Ignoring crashkernel option. "
+			"It should be passed to the Xen hypervisor.\n");
+
+	/* Register Crash kernel resource. */
+	xkr.range = KEXEC_RANGE_MA_CRASH;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_get_range, &xkr);
+
+	if (rc) {
+		pr_warn("kexec: %s: HYPERVISOR_kexec_op(KEXEC_RANGE_MA_CRASH)"
+			": %i\n", __func__, rc);
+		return rc;
+	}
+
+	if (!xkr.size)
+		return 0;
+
+	crashk_res.start = xkr.start;
+	crashk_res.end = xkr.start + xkr.size - 1;
+	insert_resource(&iomem_resource, &crashk_res);
+
+	/* Register Hypervisor code and data resource. */
+	xkr.range = KEXEC_RANGE_MA_XEN;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_get_range, &xkr);
+
+	if (rc) {
+		pr_warn("kexec: %s: HYPERVISOR_kexec_op(KEXEC_RANGE_MA_XEN)"
+			": %i\n", __func__, rc);
+		return rc;
+	}
+
+	xen_hypervisor_res.start = xkr.start;
+	xen_hypervisor_res.end = xkr.start + xkr.size - 1;
+	insert_resource(&iomem_resource, &xen_hypervisor_res);
+
+	/* Determine maximum number of physical CPUs. */
+	cpuinfo_op.cmd = XENPF_get_cpuinfo;
+	cpuinfo_op.u.pcpu_info.xen_cpuid = 0;
+	rc = HYPERVISOR_dom0_op(&cpuinfo_op);
+
+	if (rc) {
+		pr_warn("kexec: %s: HYPERVISOR_dom0_op(): %i\n", __func__, rc);
+		return rc;
+	}
+
+	cpus = cpuinfo_op.u.pcpu_info.max_present + 1;
+
+	/* Register per-CPU crash note resources. */
+	cpu_res = kcalloc(cpus, sizeof(struct resource), GFP_KERNEL);
+
+	if (!cpu_res) {
+		pr_warn("kexec: %s: kcalloc(): %i\n", __func__, -ENOMEM);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < cpus; ++i) {
+		xkr.range = KEXEC_RANGE_MA_CPU;
+		xkr.nr = i;
+		rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_get_range, &xkr);
+
+		if (rc) {
+			pr_warn("kexec: %s: cpu: %u: HYPERVISOR_kexec_op"
+				"(KEXEC_RANGE_MA_CPU): %i\n", __func__, i, rc);
+			continue;
+		}
+
+		cpu_res->name = "Crash note";
+		cpu_res->start = xkr.start;
+		cpu_res->end = xkr.start + xkr.size - 1;
+		cpu_res->flags = IORESOURCE_BUSY | IORESOURCE_MEM;
+		insert_resource(&iomem_resource, cpu_res++);
+	}
+
+	/* Get vmcoreinfo address and maximum allowed size. */
+	xkr.range = KEXEC_RANGE_MA_VMCOREINFO;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_get_range, &xkr);
+
+	if (rc) {
+		pr_warn("kexec: %s: HYPERVISOR_kexec_op(KEXEC_RANGE_MA_VMCOREINFO)"
+			": %i\n", __func__, rc);
+		return rc;
+	}
+
+	xen_vmcoreinfo_maddr = xkr.start;
+	xen_vmcoreinfo_max_size = xkr.size;
+
+	return 0;
+}
+
+core_initcall(xen_init_kexec_resources);
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 217+ messages in thread

* [PATCH v3 06/11] x86/xen: Add i386 kexec/kdump implementation
@ 2012-12-27  2:18             ` Daniel Kiper
  0 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2012-12-27  2:18 UTC (permalink / raw)
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Add i386 kexec/kdump implementation.
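
The copy loop in relocate_kernel_32.S walks the kimage entry list. Each
32-bit entry holds a page-aligned address plus a flag in its low bits:
0x1 sets the destination, 0x2 switches to another indirection page, 0x4
ends the walk and 0x8 names a source page to copy. Because rep movsl
advances %edi, consecutive source entries land in consecutive destination
pages. A minimal userspace sketch of the same walk follows; the simulated
mem[] array, the SIM_* constants and walk_entries()/put_entry() are
illustrative assumptions, with offsets into mem[] standing in for machine
addresses.

```c
#include <stdint.h>
#include <string.h>

#define SIM_PAGE_SIZE	4096u
#define SIM_PAGE_MASK	(~(SIM_PAGE_SIZE - 1))
#define SIM_PAGES	8u

/* Entry flags, matching the bits tested in relocate_kernel_32.S. */
#define IND_DESTINATION	0x1u
#define IND_INDIRECTION	0x2u
#define IND_DONE	0x4u
#define IND_SOURCE	0x8u

/* Simulated machine memory; offsets stand in for machine addresses. */
static uint8_t mem[SIM_PAGES * SIM_PAGE_SIZE];

/* Walk the entry list starting at 'head', copying each source page. */
static void walk_entries(uint32_t head)
{
	uint32_t entry = head;
	uint32_t dest = 0, ind = 0;

	for (;;) {
		if (entry & IND_DESTINATION) {
			dest = entry & SIM_PAGE_MASK;
		} else if (entry & IND_INDIRECTION) {
			ind = entry & SIM_PAGE_MASK;
		} else if (entry & IND_DONE) {
			return;
		} else if (entry & IND_SOURCE) {
			uint32_t src = entry & SIM_PAGE_MASK;

			memcpy(&mem[dest], &mem[src], SIM_PAGE_SIZE);
			dest += SIM_PAGE_SIZE;	/* like rep movsl advancing %edi */
		}
		/* Entries without a known flag are ignored; fetch the next one. */
		memcpy(&entry, &mem[ind], sizeof(entry));
		ind += sizeof(entry);
	}
}

/* Store one 32-bit entry at byte offset 'off' in simulated memory. */
static void put_entry(uint32_t off, uint32_t entry)
{
	memcpy(&mem[off], &entry, sizeof(entry));
}
```

For example, an indirection page holding a destination entry, two source
entries and a done entry copies both sources into two adjacent pages
starting at the destination.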

v2 - suggestions/fixes:
   - allocate transition page table pages below 4 GiB
     (suggested by Jan Beulich).
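
mf_kexec_kimage_alloc_pages() turns the allocation limit into the
address_bits argument of xen_create_contiguous_region() with ilog2(),
which rounds down. A minimal userspace sketch follows; floor_log2() is a
hypothetical stand-in for the kernel's ilog2(), and a 64-bit word is
assumed in place of unsigned long. It shows that a 4 GiB limit written as
1 << 32 yields 32 address bits, while an inclusive limit such as
0xffffffff yields 31, which keeps the pages below 2 GiB: stricter than
needed, but still under the limit.

```c
#include <stdint.h>

/* Index of the highest set bit, i.e. floor(log2(v)); v must be > 0. */
static unsigned int floor_log2(uint64_t v)
{
	unsigned int bits = 0;

	while (v >>= 1)
		bits++;
	return bits;
}

/*
 * Mirror of the address_bits computation: an unlimited request
 * (all ones) maps to the full word width, anything else to
 * floor(log2(limit)).
 */
static unsigned int limit_to_address_bits(uint64_t limit)
{
	return (limit == UINT64_MAX) ? 64 : floor_log2(limit);
}
```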

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 arch/x86/xen/machine_kexec_32.c   |  226 ++++++++++++++++++++++++++
 arch/x86/xen/relocate_kernel_32.S |  323 +++++++++++++++++++++++++++++++++++++
 2 files changed, 549 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/xen/machine_kexec_32.c
 create mode 100644 arch/x86/xen/relocate_kernel_32.S

diff --git a/arch/x86/xen/machine_kexec_32.c b/arch/x86/xen/machine_kexec_32.c
new file mode 100644
index 0000000..011a5e8
--- /dev/null
+++ b/arch/x86/xen/machine_kexec_32.c
@@ -0,0 +1,226 @@
+/*
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under Google Summer
+ * of Code 2011 program and Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+
+#include <xen/xen.h>
+#include <xen/xen-ops.h>
+
+#include <asm/xen/hypercall.h>
+#include <asm/xen/kexec.h>
+#include <asm/xen/page.h>
+
+#define __ma(vaddr)	(virt_to_machine(vaddr).maddr)
+
+static void *alloc_pgtable_page(struct kimage *image)
+{
+	struct page *page;
+
+	page = firmware_kimage_alloc_control_pages(image, 0);
+
+	if (!page || !page_address(page))
+		return NULL;
+
+	memset(page_address(page), 0, PAGE_SIZE);
+
+	return page_address(page);
+}
+
+static int alloc_transition_pgtable(struct kimage *image)
+{
+	image->arch.pgd = alloc_pgtable_page(image);
+
+	if (!image->arch.pgd)
+		return -ENOMEM;
+
+	image->arch.pmd0 = alloc_pgtable_page(image);
+
+	if (!image->arch.pmd0)
+		return -ENOMEM;
+
+	image->arch.pmd1 = alloc_pgtable_page(image);
+
+	if (!image->arch.pmd1)
+		return -ENOMEM;
+
+	image->arch.pte0 = alloc_pgtable_page(image);
+
+	if (!image->arch.pte0)
+		return -ENOMEM;
+
+	image->arch.pte1 = alloc_pgtable_page(image);
+
+	if (!image->arch.pte1)
+		return -ENOMEM;
+
+	return 0;
+}
+
+struct page *mf_kexec_kimage_alloc_pages(gfp_t gfp_mask,
+						unsigned int order,
+						unsigned long limit)
+{
+	struct page *pages;
+	unsigned int address_bits, i;
+
+	pages = alloc_pages(gfp_mask, order);
+
+	if (!pages)
+		return NULL;
+
+	address_bits = (limit == ULONG_MAX) ? BITS_PER_LONG : ilog2(limit);
+
+	/* Relocate set of pages below given limit. */
+	if (xen_create_contiguous_region((unsigned long)page_address(pages),
+							order, address_bits)) {
+		__free_pages(pages, order);
+		return NULL;
+	}
+
+	BUG_ON(PagePrivate(pages));
+
+	pages->mapping = NULL;
+	set_page_private(pages, order);
+
+	for (i = 0; i < (1 << order); ++i)
+		SetPageReserved(pages + i);
+
+	return pages;
+}
+
+void mf_kexec_kimage_free_pages(struct page *page)
+{
+	unsigned int i, order;
+
+	order = page_private(page);
+
+	for (i = 0; i < (1 << order); ++i)
+		ClearPageReserved(page + i);
+
+	xen_destroy_contiguous_region((unsigned long)page_address(page), order);
+	__free_pages(page, order);
+}
+
+unsigned long mf_kexec_page_to_pfn(struct page *page)
+{
+	return pfn_to_mfn(page_to_pfn(page));
+}
+
+struct page *mf_kexec_pfn_to_page(unsigned long mfn)
+{
+	return pfn_to_page(mfn_to_pfn(mfn));
+}
+
+unsigned long mf_kexec_virt_to_phys(volatile void *address)
+{
+	return virt_to_machine(address).maddr;
+}
+
+void *mf_kexec_phys_to_virt(unsigned long address)
+{
+	return phys_to_virt(machine_to_phys(XMADDR(address)).paddr);
+}
+
+int mf_kexec_prepare(struct kimage *image)
+{
+#ifdef CONFIG_KEXEC_JUMP
+	if (image->preserve_context) {
+		pr_info_once("kexec: Context preservation is not "
+				"supported in Xen domains.\n");
+		return -ENOSYS;
+	}
+#endif
+
+	return alloc_transition_pgtable(image);
+}
+
+int mf_kexec_load(struct kimage *image)
+{
+	void *control_page;
+	struct xen_kexec_load xkl = {};
+
+	/* Image is unloaded, nothing to do. */
+	if (!image)
+		return 0;
+
+	control_page = page_address(image->control_code_page);
+	memcpy(control_page, xen_relocate_kernel, xen_kexec_control_code_size);
+
+	xkl.type = image->type;
+	xkl.image.page_list[XK_MA_CONTROL_PAGE] = __ma(control_page);
+	xkl.image.page_list[XK_MA_TABLE_PAGE] = 0; /* Unused. */
+	xkl.image.page_list[XK_MA_PGD_PAGE] = __ma(image->arch.pgd);
+	xkl.image.page_list[XK_MA_PUD0_PAGE] = 0; /* Unused. */
+	xkl.image.page_list[XK_MA_PUD1_PAGE] = 0; /* Unused. */
+	xkl.image.page_list[XK_MA_PMD0_PAGE] = __ma(image->arch.pmd0);
+	xkl.image.page_list[XK_MA_PMD1_PAGE] = __ma(image->arch.pmd1);
+	xkl.image.page_list[XK_MA_PTE0_PAGE] = __ma(image->arch.pte0);
+	xkl.image.page_list[XK_MA_PTE1_PAGE] = __ma(image->arch.pte1);
+	xkl.image.indirection_page = image->head;
+	xkl.image.start_address = image->start;
+
+	return HYPERVISOR_kexec_op(KEXEC_CMD_kexec_load, &xkl);
+}
+
+void mf_kexec_cleanup(struct kimage *image)
+{
+}
+
+void mf_kexec_unload(struct kimage *image)
+{
+	int rc;
+	struct xen_kexec_load xkl = {};
+
+	if (!image)
+		return;
+
+	xkl.type = image->type;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_unload, &xkl);
+
+	WARN(rc, "kexec: %s: HYPERVISOR_kexec_op(): %i\n", __func__, rc);
+}
+
+void mf_kexec_shutdown(void)
+{
+}
+
+void mf_kexec(struct kimage *image)
+{
+	int rc;
+	struct xen_kexec_exec xke = {};
+
+	xke.type = image->type;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec, &xke);
+
+	pr_emerg("kexec: %s: HYPERVISOR_kexec_op(): %i\n", __func__, rc);
+	BUG();
+}
diff --git a/arch/x86/xen/relocate_kernel_32.S b/arch/x86/xen/relocate_kernel_32.S
new file mode 100644
index 0000000..0e81830
--- /dev/null
+++ b/arch/x86/xen/relocate_kernel_32.S
@@ -0,0 +1,323 @@
+/*
+ * Copyright (c) 2002-2005 Eric Biederman <ebiederm@xmission.com>
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under Google Summer
+ * of Code 2011 program and Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <asm/cache.h>
+#include <asm/page_types.h>
+#include <asm/pgtable_types.h>
+#include <asm/processor-flags.h>
+
+#include <asm/xen/kexec.h>
+
+#define ARG_INDIRECTION_PAGE	0x4
+#define ARG_PAGE_LIST		0x8
+#define ARG_START_ADDRESS	0xc
+
+#define PTR(x)	(x << 2)
+
+	.text
+	.align	PAGE_SIZE
+	.globl	xen_kexec_control_code_size, xen_relocate_kernel
+
+xen_relocate_kernel:
+	/*
+	 * Must be relocatable PIC code callable as a C function.
+	 *
+	 * This function is called by Xen, but at this point the hypervisor
+	 * is dead and we are running on bare metal.
+	 *
+	 * Every machine address passed to this function through
+	 * page_list (e.g. XK_MA_CONTROL_PAGE) is established
+	 * by dom0 during kexec load phase.
+	 *
+	 * Every virtual address passed to this function through page_list
+	 * (e.g. XK_VA_CONTROL_PAGE) is established by hypervisor during
+	 * HYPERVISOR_kexec_op(KEXEC_CMD_kexec_load) hypercall.
+	 *
+	 * 0x4(%esp) - indirection_page,
+	 * 0x8(%esp) - page_list,
+	 * 0xc(%esp) - start_address,
+	 * 0x10(%esp) - cpu_has_pae (ignored),
+	 * 0x14(%esp) - preserve_context (ignored).
+	 */
+
+	/* Zero out flags, and disable interrupts. */
+	pushl	$0
+	popfl
+
+	/* Get page_list address. */
+	movl	ARG_PAGE_LIST(%esp), %esi
+
+	/*
+	 * Map the control page at its virtual address
+	 * in transition page table.
+	 */
+	movl	PTR(XK_VA_CONTROL_PAGE)(%esi), %eax
+
+	/* Get PGD address and PGD entry index. */
+	movl	PTR(XK_VA_PGD_PAGE)(%esi), %ebx
+	movl	%eax, %ecx
+	shrl	$PGDIR_SHIFT, %ecx
+	andl	$(PTRS_PER_PGD - 1), %ecx
+
+	/* Fill PGD entry with PMD0 reference. */
+	movl	PTR(XK_MA_PMD0_PAGE)(%esi), %edx
+	orl	$_PAGE_PRESENT, %edx
+	movl	%edx, (%ebx, %ecx, 8)
+
+	/* Get PMD0 address and PMD0 entry index. */
+	movl	PTR(XK_VA_PMD0_PAGE)(%esi), %ebx
+	movl	%eax, %ecx
+	shrl	$PMD_SHIFT, %ecx
+	andl	$(PTRS_PER_PMD - 1), %ecx
+
+	/* Fill PMD0 entry with PTE0 reference. */
+	movl	PTR(XK_MA_PTE0_PAGE)(%esi), %edx
+	orl	$_KERNPG_TABLE, %edx
+	movl	%edx, (%ebx, %ecx, 8)
+
+	/* Get PTE0 address and PTE0 entry index. */
+	movl	PTR(XK_VA_PTE0_PAGE)(%esi), %ebx
+	movl	%eax, %ecx
+	shrl	$PAGE_SHIFT, %ecx
+	andl	$(PTRS_PER_PTE - 1), %ecx
+
+	/* Fill PTE0 entry with control page reference. */
+	movl	PTR(XK_MA_CONTROL_PAGE)(%esi), %edx
+	orl	$__PAGE_KERNEL_EXEC, %edx
+	movl	%edx, (%ebx, %ecx, 8)
+
+	/*
+	 * Identity map the control page at its machine address
+	 * in transition page table.
+	 */
+	movl	PTR(XK_MA_CONTROL_PAGE)(%esi), %eax
+
+	/* Get PGD address and PGD entry index. */
+	movl	PTR(XK_VA_PGD_PAGE)(%esi), %ebx
+	movl	%eax, %ecx
+	shrl	$PGDIR_SHIFT, %ecx
+	andl	$(PTRS_PER_PGD - 1), %ecx
+
+	/* Fill PGD entry with PMD1 reference. */
+	movl	PTR(XK_MA_PMD1_PAGE)(%esi), %edx
+	orl	$_PAGE_PRESENT, %edx
+	movl	%edx, (%ebx, %ecx, 8)
+
+	/* Get PMD1 address and PMD1 entry index. */
+	movl	PTR(XK_VA_PMD1_PAGE)(%esi), %ebx
+	movl	%eax, %ecx
+	shrl	$PMD_SHIFT, %ecx
+	andl	$(PTRS_PER_PMD - 1), %ecx
+
+	/* Fill PMD1 entry with PTE1 reference. */
+	movl	PTR(XK_MA_PTE1_PAGE)(%esi), %edx
+	orl	$_KERNPG_TABLE, %edx
+	movl	%edx, (%ebx, %ecx, 8)
+
+	/* Get PTE1 address and PTE1 entry index. */
+	movl	PTR(XK_VA_PTE1_PAGE)(%esi), %ebx
+	movl	%eax, %ecx
+	shrl	$PAGE_SHIFT, %ecx
+	andl	$(PTRS_PER_PTE - 1), %ecx
+
+	/* Fill PTE1 entry with control page reference. */
+	movl	PTR(XK_MA_CONTROL_PAGE)(%esi), %edx
+	orl	$__PAGE_KERNEL_EXEC, %edx
+	movl	%edx, (%ebx, %ecx, 8)
+
+	/*
+	 * Get machine address of control page now.
+	 * This is impossible after page table switch.
+	 */
+	movl	PTR(XK_MA_CONTROL_PAGE)(%esi), %ebx
+
+	/* Get machine address of transition page table now too. */
+	movl	PTR(XK_MA_PGD_PAGE)(%esi), %ecx
+
+	/* Get start_address too. */
+	movl	ARG_START_ADDRESS(%esp), %edx
+
+	/* Get indirection_page address too. */
+	movl	ARG_INDIRECTION_PAGE(%esp), %edi
+
+	/* Switch to transition page table. */
+	movl	%ecx, %cr3
+
+	/* Load IDT. */
+	lidtl	(idt_48 - xen_relocate_kernel)(%ebx)
+
+	/* Load GDT. */
+	leal	(gdt - xen_relocate_kernel)(%ebx), %eax
+	movl	%eax, (gdt_48 - xen_relocate_kernel + 2)(%ebx)
+	lgdtl	(gdt_48 - xen_relocate_kernel)(%ebx)
+
+	/* Load data segment registers. */
+	movl	$(gdt_ds - gdt), %eax
+	movl	%eax, %ds
+	movl	%eax, %es
+	movl	%eax, %fs
+	movl	%eax, %gs
+	movl	%eax, %ss
+
+	/* Set up a new stack at the end of the control page (machine address). */
+	leal	PAGE_SIZE(%ebx), %esp
+
+	/* Store start_address on the stack. */
+	pushl   %edx
+
+	/* Jump to identity mapped page. */
+	pushl	$0
+	pushl	$(gdt_cs - gdt)
+	addl	$(identity_mapped - xen_relocate_kernel), %ebx
+	pushl	%ebx
+	iretl
+
+identity_mapped:
+	/*
+	 * Set %cr0 to a known state:
+	 *   - disable alignment check,
+	 *   - disable floating point emulation,
+	 *   - disable paging,
+	 *   - no task switch,
+	 *   - disable write protect,
+	 *   - enable protected mode.
+	 */
+	movl	%cr0, %eax
+	andl	$~(X86_CR0_AM | X86_CR0_EM | X86_CR0_PG | X86_CR0_TS | X86_CR0_WP), %eax
+	orl	$(X86_CR0_PE), %eax
+	movl	%eax, %cr0
+
+	/* Set %cr4 to a known state. */
+	xorl	%eax, %eax
+	movl	%eax, %cr4
+
+	jmp	1f
+
+1:
+	/* Flush the TLB (needed?). */
+	movl	%eax, %cr3
+
+	/* Do the copies. */
+	movl	%edi, %ecx	/* Put the indirection_page in %ecx. */
+	xorl	%edi, %edi
+	xorl	%esi, %esi
+	jmp	1f
+
+0:
+	/*
+	 * Top: read another doubleword from the indirection page.
+	 * The indirection page is an array of entries, each tagged
+	 * as a destination, source, indirection or done entry.
+	 * If not all entries fit in one page, the last entry of
+	 * the indirection page points to the next one.
+	 * Copying stops when the done indicator
+	 * is found in the indirection page.
+	 */
+	movl	(%ebx), %ecx
+	addl	$4, %ebx
+
+1:
+	testl	$0x1, %ecx	/* Is it a destination page? */
+	jz	2f
+
+	movl	%ecx, %edi
+	andl	$PAGE_MASK, %edi
+	jmp	0b
+
+2:
+	testl	$0x2, %ecx	/* Is it an indirection page? */
+	jz	2f
+
+	movl	%ecx, %ebx
+	andl	$PAGE_MASK, %ebx
+	jmp	0b
+
+2:
+	testl	$0x4, %ecx	/* Is it the done indicator? */
+	jz	2f
+	jmp	3f
+
+2:
+	testl	$0x8, %ecx	/* Is it the source indicator? */
+	jz	0b		/* Ignore it otherwise. */
+
+	movl	%ecx, %esi
+	andl	$PAGE_MASK, %esi
+	movl	$1024, %ecx
+
+	/* Copy page. */
+	rep	movsl
+	jmp	0b
+
+3:
+	/*
+	 * To be certain of avoiding problems with self-modifying code
+	 * I need to execute a serializing instruction here.
+	 * So I flush the TLB by reloading %cr3 here, it's handy,
+	 * and not processor dependent.
+	 */
+	xorl	%eax, %eax
+	movl	%eax, %cr3
+
+	/*
+	 * Set all of the registers to known values.
+	 * Leave %esp alone.
+	 */
+	xorl	%ebx, %ebx
+	xorl    %ecx, %ecx
+	xorl    %edx, %edx
+	xorl    %esi, %esi
+	xorl    %edi, %edi
+	xorl    %ebp, %ebp
+
+	/* Jump to start_address. */
+	retl
+
+	.align	L1_CACHE_BYTES
+
+gdt:
+	.quad	0x0000000000000000	/* NULL descriptor. */
+
+gdt_cs:
+	.quad	0x00cf9a000000ffff	/* 4 GiB code segment at 0x00000000. */
+
+gdt_ds:
+	.quad	0x00cf92000000ffff	/* 4 GiB data segment at 0x00000000. */
+gdt_end:
+
+gdt_48:
+	.word	gdt_end - gdt - 1	/* GDT limit. */
+	.long	0			/* GDT base - filled in by code above. */
+
+idt_48:
+	.word	0			/* IDT limit. */
+	.long	0			/* IDT base. */
+
+xen_kexec_control_code_size:
+	.long	. - xen_relocate_kernel
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 217+ messages in thread

* [PATCH v3 06/11] x86/xen: Add i386 kexec/kdump implementation
  2012-12-27  2:18           ` Daniel Kiper
                             ` (2 preceding siblings ...)
  (?)
@ 2012-12-27  2:18           ` Daniel Kiper
  -1 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2012-12-27  2:18 UTC (permalink / raw)
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Add i386 kexec/kdump implementation.

v2 - suggestions/fixes:
   - allocate transition page table pages below 4 GiB
     (suggested by Jan Beulich).

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 arch/x86/xen/machine_kexec_32.c   |  226 ++++++++++++++++++++++++++
 arch/x86/xen/relocate_kernel_32.S |  323 +++++++++++++++++++++++++++++++++++++
 2 files changed, 549 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/xen/machine_kexec_32.c
 create mode 100644 arch/x86/xen/relocate_kernel_32.S

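mf_kexec_kimage_alloc_pages() in machine_kexec_32.c converts the kexec allocation limit into the address_bits argument of xen_create_contiguous_region() using ilog2(), which rounds down. The minimal sketch below shows the effect of that conversion. `ilog2_ul()` is an assumed portable stand-in for the kernel's ilog2(), and `limit_to_address_bits()` is a hypothetical wrapper name. The sketch also assumes, as the comment "Relocate set of pages below given limit" suggests, that address_bits bounds the machine address to fewer than 2^address_bits.

```c
#include <assert.h>
#include <limits.h>

/* floor(log2(v)) for v > 0; stand-in for the kernel's ilog2(). */
static unsigned int ilog2_ul(unsigned long v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Mirrors: (limit == ULONG_MAX) ? BITS_PER_LONG : ilog2(limit) */
static unsigned int limit_to_address_bits(unsigned long limit)
{
	if (limit == ULONG_MAX)
		return (unsigned int)(sizeof(unsigned long) * CHAR_BIT);
	return ilog2_ul(limit);
}
```

A limit that is not a power of two is tightened. For example, 0xC0000000 yields 31 address bits, which places the pages below 2 GiB rather than below 3 GiB.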
diff --git a/arch/x86/xen/machine_kexec_32.c b/arch/x86/xen/machine_kexec_32.c
new file mode 100644
index 0000000..011a5e8
--- /dev/null
+++ b/arch/x86/xen/machine_kexec_32.c
@@ -0,0 +1,226 @@
+/*
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under Google Summer
+ * of Code 2011 program and Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+
+#include <xen/xen.h>
+#include <xen/xen-ops.h>
+
+#include <asm/xen/hypercall.h>
+#include <asm/xen/kexec.h>
+#include <asm/xen/page.h>
+
+#define __ma(vaddr)	(virt_to_machine(vaddr).maddr)
+
+static void *alloc_pgtable_page(struct kimage *image)
+{
+	struct page *page;
+
+	page = firmware_kimage_alloc_control_pages(image, 0);
+
+	if (!page || !page_address(page))
+		return NULL;
+
+	memset(page_address(page), 0, PAGE_SIZE);
+
+	return page_address(page);
+}
+
+static int alloc_transition_pgtable(struct kimage *image)
+{
+	image->arch.pgd = alloc_pgtable_page(image);
+
+	if (!image->arch.pgd)
+		return -ENOMEM;
+
+	image->arch.pmd0 = alloc_pgtable_page(image);
+
+	if (!image->arch.pmd0)
+		return -ENOMEM;
+
+	image->arch.pmd1 = alloc_pgtable_page(image);
+
+	if (!image->arch.pmd1)
+		return -ENOMEM;
+
+	image->arch.pte0 = alloc_pgtable_page(image);
+
+	if (!image->arch.pte0)
+		return -ENOMEM;
+
+	image->arch.pte1 = alloc_pgtable_page(image);
+
+	if (!image->arch.pte1)
+		return -ENOMEM;
+
+	return 0;
+}
+
+struct page *mf_kexec_kimage_alloc_pages(gfp_t gfp_mask,
+						unsigned int order,
+						unsigned long limit)
+{
+	struct page *pages;
+	unsigned int address_bits, i;
+
+	pages = alloc_pages(gfp_mask, order);
+
+	if (!pages)
+		return NULL;
+
+	address_bits = (limit == ULONG_MAX) ? BITS_PER_LONG : ilog2(limit);
+
+	/* Relocate set of pages below given limit. */
+	if (xen_create_contiguous_region((unsigned long)page_address(pages),
+							order, address_bits)) {
+		__free_pages(pages, order);
+		return NULL;
+	}
+
+	BUG_ON(PagePrivate(pages));
+
+	pages->mapping = NULL;
+	set_page_private(pages, order);
+
+	for (i = 0; i < (1 << order); ++i)
+		SetPageReserved(pages + i);
+
+	return pages;
+}
+
+void mf_kexec_kimage_free_pages(struct page *page)
+{
+	unsigned int i, order;
+
+	order = page_private(page);
+
+	for (i = 0; i < (1 << order); ++i)
+		ClearPageReserved(page + i);
+
+	xen_destroy_contiguous_region((unsigned long)page_address(page), order);
+	__free_pages(page, order);
+}
+
+unsigned long mf_kexec_page_to_pfn(struct page *page)
+{
+	return pfn_to_mfn(page_to_pfn(page));
+}
+
+struct page *mf_kexec_pfn_to_page(unsigned long mfn)
+{
+	return pfn_to_page(mfn_to_pfn(mfn));
+}
+
+unsigned long mf_kexec_virt_to_phys(volatile void *address)
+{
+	return virt_to_machine(address).maddr;
+}
+
+void *mf_kexec_phys_to_virt(unsigned long address)
+{
+	return phys_to_virt(machine_to_phys(XMADDR(address)).paddr);
+}
+
+int mf_kexec_prepare(struct kimage *image)
+{
+#ifdef CONFIG_KEXEC_JUMP
+	if (image->preserve_context) {
+		pr_info_once("kexec: Context preservation is not "
+				"supported in Xen domains.\n");
+		return -ENOSYS;
+	}
+#endif
+
+	return alloc_transition_pgtable(image);
+}
+
+int mf_kexec_load(struct kimage *image)
+{
+	void *control_page;
+	struct xen_kexec_load xkl = {};
+
+	/* Image is unloaded, nothing to do. */
+	if (!image)
+		return 0;
+
+	control_page = page_address(image->control_code_page);
+	memcpy(control_page, xen_relocate_kernel, xen_kexec_control_code_size);
+
+	xkl.type = image->type;
+	xkl.image.page_list[XK_MA_CONTROL_PAGE] = __ma(control_page);
+	xkl.image.page_list[XK_MA_TABLE_PAGE] = 0; /* Unused. */
+	xkl.image.page_list[XK_MA_PGD_PAGE] = __ma(image->arch.pgd);
+	xkl.image.page_list[XK_MA_PUD0_PAGE] = 0; /* Unused. */
+	xkl.image.page_list[XK_MA_PUD1_PAGE] = 0; /* Unused. */
+	xkl.image.page_list[XK_MA_PMD0_PAGE] = __ma(image->arch.pmd0);
+	xkl.image.page_list[XK_MA_PMD1_PAGE] = __ma(image->arch.pmd1);
+	xkl.image.page_list[XK_MA_PTE0_PAGE] = __ma(image->arch.pte0);
+	xkl.image.page_list[XK_MA_PTE1_PAGE] = __ma(image->arch.pte1);
+	xkl.image.indirection_page = image->head;
+	xkl.image.start_address = image->start;
+
+	return HYPERVISOR_kexec_op(KEXEC_CMD_kexec_load, &xkl);
+}
+
+void mf_kexec_cleanup(struct kimage *image)
+{
+}
+
+void mf_kexec_unload(struct kimage *image)
+{
+	int rc;
+	struct xen_kexec_load xkl = {};
+
+	if (!image)
+		return;
+
+	xkl.type = image->type;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_unload, &xkl);
+
+	WARN(rc, "kexec: %s: HYPERVISOR_kexec_op(): %i\n", __func__, rc);
+}
+
+void mf_kexec_shutdown(void)
+{
+}
+
+void mf_kexec(struct kimage *image)
+{
+	int rc;
+	struct xen_kexec_exec xke = {};
+
+	xke.type = image->type;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec, &xke);
+
+	pr_emerg("kexec: %s: HYPERVISOR_kexec_op(): %i\n", __func__, rc);
+	BUG();
+}
diff --git a/arch/x86/xen/relocate_kernel_32.S b/arch/x86/xen/relocate_kernel_32.S
new file mode 100644
index 0000000..0e81830
--- /dev/null
+++ b/arch/x86/xen/relocate_kernel_32.S
@@ -0,0 +1,323 @@
+/*
+ * Copyright (c) 2002-2005 Eric Biederman <ebiederm@xmission.com>
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under Google Summer
+ * of Code 2011 program and Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <asm/cache.h>
+#include <asm/page_types.h>
+#include <asm/pgtable_types.h>
+#include <asm/processor-flags.h>
+
+#include <asm/xen/kexec.h>
+
+#define ARG_INDIRECTION_PAGE	0x4
+#define ARG_PAGE_LIST		0x8
+#define ARG_START_ADDRESS	0xc
+
+#define PTR(x)	(x << 2)
+
+	.text
+	.align	PAGE_SIZE
+	.globl	xen_kexec_control_code_size, xen_relocate_kernel
+
+xen_relocate_kernel:
+	/*
+	 * Must be relocatable PIC code callable as a C function.
+	 *
+	 * This function is called by Xen, but by the time it runs the
+	 * hypervisor is no longer active: we are running on bare metal.
+	 *
+	 * Every machine address passed to this function through
+	 * page_list (e.g. XK_MA_CONTROL_PAGE) is established
+	 * by dom0 during the kexec load phase.
+	 *
+	 * Every virtual address passed to this function through page_list
+	 * (e.g. XK_VA_CONTROL_PAGE) is established by the hypervisor during
+	 * the HYPERVISOR_kexec_op(KEXEC_CMD_kexec_load) hypercall.
+	 *
+	 * 0x4(%esp) - indirection_page,
+	 * 0x8(%esp) - page_list,
+	 * 0xc(%esp) - start_address,
+	 * 0x10(%esp) - cpu_has_pae (ignored),
+	 * 0x14(%esp) - preserve_context (ignored).
+	 */
+
+	/* Zero out flags, and disable interrupts. */
+	pushl	$0
+	popfl
+
+	/* Get page_list address. */
+	movl	ARG_PAGE_LIST(%esp), %esi
+
+	/*
+	 * Map the control page at its virtual address
+	 * in the transition page table.
+	 */
+	movl	PTR(XK_VA_CONTROL_PAGE)(%esi), %eax
+
+	/* Get PGD address and PGD entry index. */
+	movl	PTR(XK_VA_PGD_PAGE)(%esi), %ebx
+	movl	%eax, %ecx
+	shrl	$PGDIR_SHIFT, %ecx
+	andl	$(PTRS_PER_PGD - 1), %ecx
+
+	/* Fill PGD entry with PMD0 reference. */
+	movl	PTR(XK_MA_PMD0_PAGE)(%esi), %edx
+	orl	$_PAGE_PRESENT, %edx
+	movl	%edx, (%ebx, %ecx, 8)
+
+	/* Get PMD0 address and PMD0 entry index. */
+	movl	PTR(XK_VA_PMD0_PAGE)(%esi), %ebx
+	movl	%eax, %ecx
+	shrl	$PMD_SHIFT, %ecx
+	andl	$(PTRS_PER_PMD - 1), %ecx
+
+	/* Fill PMD0 entry with PTE0 reference. */
+	movl	PTR(XK_MA_PTE0_PAGE)(%esi), %edx
+	orl	$_KERNPG_TABLE, %edx
+	movl	%edx, (%ebx, %ecx, 8)
+
+	/* Get PTE0 address and PTE0 entry index. */
+	movl	PTR(XK_VA_PTE0_PAGE)(%esi), %ebx
+	movl	%eax, %ecx
+	shrl	$PAGE_SHIFT, %ecx
+	andl	$(PTRS_PER_PTE - 1), %ecx
+
+	/* Fill PTE0 entry with control page reference. */
+	movl	PTR(XK_MA_CONTROL_PAGE)(%esi), %edx
+	orl	$__PAGE_KERNEL_EXEC, %edx
+	movl	%edx, (%ebx, %ecx, 8)
+
+	/*
+	 * Identity map the control page at its machine address
+	 * in the transition page table.
+	 */
+	movl	PTR(XK_MA_CONTROL_PAGE)(%esi), %eax
+
+	/* Get PGD address and PGD entry index. */
+	movl	PTR(XK_VA_PGD_PAGE)(%esi), %ebx
+	movl	%eax, %ecx
+	shrl	$PGDIR_SHIFT, %ecx
+	andl	$(PTRS_PER_PGD - 1), %ecx
+
+	/* Fill PGD entry with PMD1 reference. */
+	movl	PTR(XK_MA_PMD1_PAGE)(%esi), %edx
+	orl	$_PAGE_PRESENT, %edx
+	movl	%edx, (%ebx, %ecx, 8)
+
+	/* Get PMD1 address and PMD1 entry index. */
+	movl	PTR(XK_VA_PMD1_PAGE)(%esi), %ebx
+	movl	%eax, %ecx
+	shrl	$PMD_SHIFT, %ecx
+	andl	$(PTRS_PER_PMD - 1), %ecx
+
+	/* Fill PMD1 entry with PTE1 reference. */
+	movl	PTR(XK_MA_PTE1_PAGE)(%esi), %edx
+	orl	$_KERNPG_TABLE, %edx
+	movl	%edx, (%ebx, %ecx, 8)
+
+	/* Get PTE1 address and PTE1 entry index. */
+	movl	PTR(XK_VA_PTE1_PAGE)(%esi), %ebx
+	movl	%eax, %ecx
+	shrl	$PAGE_SHIFT, %ecx
+	andl	$(PTRS_PER_PTE - 1), %ecx
+
+	/* Fill PTE1 entry with control page reference. */
+	movl	PTR(XK_MA_CONTROL_PAGE)(%esi), %edx
+	orl	$__PAGE_KERNEL_EXEC, %edx
+	movl	%edx, (%ebx, %ecx, 8)
+
+	/*
+	 * Get the machine address of the control page now;
+	 * page_list may not be mapped after the page table switch.
+	 */
+	movl	PTR(XK_MA_CONTROL_PAGE)(%esi), %ebx
+
+	/* Get the machine address of the transition page table now too. */
+	movl	PTR(XK_MA_PGD_PAGE)(%esi), %ecx
+
+	/* Get start_address too. */
+	movl	ARG_START_ADDRESS(%esp), %edx
+
+	/* Get indirection_page address too. */
+	movl	ARG_INDIRECTION_PAGE(%esp), %edi
+
+	/* Switch to transition page table. */
+	movl	%ecx, %cr3
+
+	/* Load IDT. */
+	lidtl	(idt_48 - xen_relocate_kernel)(%ebx)
+
+	/* Load GDT. */
+	leal	(gdt - xen_relocate_kernel)(%ebx), %eax
+	movl	%eax, (gdt_48 - xen_relocate_kernel + 2)(%ebx)
+	lgdtl	(gdt_48 - xen_relocate_kernel)(%ebx)
+
+	/* Load data segment registers. */
+	movl	$(gdt_ds - gdt), %eax
+	movl	%eax, %ds
+	movl	%eax, %es
+	movl	%eax, %fs
+	movl	%eax, %gs
+	movl	%eax, %ss
+
+	/* Set up a new stack at the end of the control page (machine address). */
+	leal	PAGE_SIZE(%ebx), %esp
+
+	/* Store start_address on the stack. */
+	pushl   %edx
+
+	/* Jump to identity mapped page. */
+	pushl	$0
+	pushl	$(gdt_cs - gdt)
+	addl	$(identity_mapped - xen_relocate_kernel), %ebx
+	pushl	%ebx
+	iretl
+
+identity_mapped:
+	/*
+	 * Set %cr0 to a known state:
+	 *   - disable alignment check,
+	 *   - disable floating point emulation,
+	 *   - disable paging,
+	 *   - no task switch,
+	 *   - disable write protect,
+	 *   - enable protected mode.
+	 */
+	movl	%cr0, %eax
+	andl	$~(X86_CR0_AM | X86_CR0_EM | X86_CR0_PG | X86_CR0_TS | X86_CR0_WP), %eax
+	orl	$(X86_CR0_PE), %eax
+	movl	%eax, %cr0
+
+	/* Set %cr4 to a known state. */
+	xorl	%eax, %eax
+	movl	%eax, %cr4
+
+	jmp	1f
+
+1:
+	/* Flush the TLB (needed?). */
+	movl	%eax, %cr3
+
+	/* Do the copies. */
+	movl	%edi, %ecx	/* Put the indirection_page in %ecx. */
+	xorl	%edi, %edi
+	xorl	%esi, %esi
+	jmp	1f
+
+0:
+	/*
+	 * Top, read another doubleword from the indirection page.
+	 * The indirection page is an array of entries describing
+	 * destination and source pages. If the entries do not fit
+	 * in one page, the last entry of the indirection page
+	 * points to the next one. Copying stops when the done
+	 * indicator is found in the indirection page.
+	movl	(%ebx), %ecx
+	addl	$4, %ebx
+
+1:
+	testl	$0x1, %ecx	/* Is it a destination page? */
+	jz	2f
+
+	movl	%ecx, %edi
+	andl	$PAGE_MASK, %edi
+	jmp	0b
+
+2:
+	testl	$0x2, %ecx	/* Is it an indirection page? */
+	jz	2f
+
+	movl	%ecx, %ebx
+	andl	$PAGE_MASK, %ebx
+	jmp	0b
+
+2:
+	testl	$0x4, %ecx	/* Is it the done indicator? */
+	jz	2f
+	jmp	3f
+
+2:
+	testl	$0x8, %ecx	/* Is it the source indicator? */
+	jz	0b		/* Ignore it otherwise. */
+
+	movl	%ecx, %esi
+	andl	$PAGE_MASK, %esi
+	movl	$1024, %ecx
+
+	/* Copy page. */
+	rep	movsl
+	jmp	0b
+
+3:
+	/*
+	 * To be certain of avoiding problems with self-modifying code
+	 * I need to execute a serializing instruction here.
+	 * So I flush the TLB by reloading %cr3 here, it's handy,
+	 * and not processor dependent.
+	 */
+	xorl	%eax, %eax
+	movl	%eax, %cr3
+
+	/*
+	 * Set all of the registers to known values.
+	 * Leave %esp alone.
+	 */
+	xorl	%ebx, %ebx
+	xorl    %ecx, %ecx
+	xorl    %edx, %edx
+	xorl    %esi, %esi
+	xorl    %edi, %edi
+	xorl    %ebp, %ebp
+
+	/* Jump to start_address. */
+	retl
+
+	.align	L1_CACHE_BYTES
+
+gdt:
+	.quad	0x0000000000000000	/* NULL descriptor. */
+
+gdt_cs:
+	.quad	0x00cf9a000000ffff	/* 4 GiB code segment at 0x00000000. */
+
+gdt_ds:
+	.quad	0x00cf92000000ffff	/* 4 GiB data segment at 0x00000000. */
+gdt_end:
+
+gdt_48:
+	.word	gdt_end - gdt - 1	/* GDT limit. */
+	.long	0			/* GDT base - filled in by code above. */
+
+idt_48:
+	.word	0			/* IDT limit. */
+	.long	0			/* IDT base. */
+
+xen_kexec_control_code_size:
+	.long	. - xen_relocate_kernel
-- 
1.5.6.5

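The transition page table code in relocate_kernel_32.S indexes three PAE levels with the shift and mask pairs PGDIR_SHIFT/PTRS_PER_PGD, PMD_SHIFT/PTRS_PER_PMD and PAGE_SHIFT/PTRS_PER_PTE. It scales each index by 8 because PAE entries are 64 bits wide. The C sketch below reproduces that arithmetic, assuming the standard x86 PAE values: 4 PGD entries, 512 PMD and PTE entries, and shifts of 30, 21 and 12. `pae_split()` and `entry_offset()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86 PAE (3-level) paging constants. */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PGDIR_SHIFT	30
#define PTRS_PER_PTE	512
#define PTRS_PER_PMD	512
#define PTRS_PER_PGD	4
#define ENTRY_SIZE	8	/* scale factor in (%ebx, %ecx, 8) */

struct pae_index {
	unsigned int pgd, pmd, pte;
};

/* Same shift-and-mask sequence as the shrl/andl pairs in the assembly. */
static struct pae_index pae_split(uint32_t va)
{
	struct pae_index i;

	i.pgd = (va >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
	i.pmd = (va >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
	i.pte = (va >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
	return i;
}

/* Byte offset of an entry within its table page. */
static unsigned int entry_offset(unsigned int index)
{
	return index * ENTRY_SIZE;
}
```

Each `movl %edx, (%ebx, %ecx, 8)` writes only the low 32 bits of a 64-bit entry. This is safe only because alloc_pgtable_page() zeroes every table page, so the high half of each entry is already 0.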

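At identity_mapped, relocate_kernel_32.S clears AM, EM, PG, TS and WP in %cr0 and sets PE. The C sketch below applies the same mask to sample values. The constants are the architectural CR0 bit positions, which asm/processor-flags.h also defines. `identity_mapped_cr0()` is a hypothetical name. Turning paging off mid-execution is only safe here because the code runs from the identity-mapped copy of the control page, where virtual and machine addresses coincide.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural CR0 bits used by the assembly. */
#define X86_CR0_PE	0x00000001u	/* protected mode enable */
#define X86_CR0_EM	0x00000004u	/* FPU emulation */
#define X86_CR0_TS	0x00000008u	/* task switched */
#define X86_CR0_WP	0x00010000u	/* write protect */
#define X86_CR0_AM	0x00040000u	/* alignment mask */
#define X86_CR0_PG	0x80000000u	/* paging */

/* Same andl/orl sequence as at identity_mapped. */
static uint32_t identity_mapped_cr0(uint32_t cr0)
{
	cr0 &= ~(X86_CR0_AM | X86_CR0_EM | X86_CR0_PG | X86_CR0_TS | X86_CR0_WP);
	cr0 |= X86_CR0_PE;
	return cr0;
}
```

For example, 0x80050033 (PG, AM, WP, NE, ET, MP, PE) becomes 0x00000033. Protected mode stays enabled, and paging is turned off.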

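The relocation code runs after Xen is gone, so page_list carries machine addresses, which mf_kexec_load() fills in with __ma(). mf_kexec_virt_to_phys() and mf_kexec_phys_to_virt() translate between dom0's pseudo-physical frames and machine frames. The sketch below models only the frame translation. It uses a hypothetical four-entry P2M table. `p2m`, `toy_pfn_to_mfn()` and the other `toy_*` functions are illustrative, not the Xen or Linux API. The frame number is translated, and the offset within the page carries over unchanged.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1ul << PAGE_SHIFT)
#define NR_PFNS		4

/* Hypothetical toy physical-to-machine table: p2m[pfn] == mfn. */
static const unsigned long p2m[NR_PFNS] = { 0x1a0, 0x077, 0x3ff, 0x102 };

static unsigned long toy_pfn_to_mfn(unsigned long pfn)
{
	return p2m[pfn];
}

/* Reverse lookup (M2P); returns ~0ul if the mfn is not owned. */
static unsigned long toy_mfn_to_pfn(unsigned long mfn)
{
	for (unsigned long pfn = 0; pfn < NR_PFNS; pfn++)
		if (p2m[pfn] == mfn)
			return pfn;
	return ~0ul;
}

/* Pseudo-physical address -> machine address. */
static unsigned long toy_phys_to_machine(unsigned long paddr)
{
	return (toy_pfn_to_mfn(paddr >> PAGE_SHIFT) << PAGE_SHIFT) |
	       (paddr & (PAGE_SIZE - 1));
}

/* Machine address -> pseudo-physical address. */
static unsigned long toy_machine_to_phys(unsigned long maddr)
{
	return (toy_mfn_to_pfn(maddr >> PAGE_SHIFT) << PAGE_SHIFT) |
	       (maddr & (PAGE_SIZE - 1));
}
```

Pseudo-physically contiguous pages need not be machine-contiguous (pfn 0 and 1 map to mfn 0x1a0 and 0x077 here). Because of that, the control and page table pages are passed to the hypervisor by machine address.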
* [PATCH v3 06/11] x86/xen: Add i386 kexec/kdump implementation
@ 2012-12-27  2:18             ` Daniel Kiper
  0 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2012-12-27  2:18 UTC (permalink / raw)
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Add i386 kexec/kdump implementation.

v2 - suggestions/fixes:
   - allocate transition page table pages below 4 GiB
     (suggested by Jan Beulich).

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 arch/x86/xen/machine_kexec_32.c   |  226 ++++++++++++++++++++++++++
 arch/x86/xen/relocate_kernel_32.S |  323 +++++++++++++++++++++++++++++++++++++
 2 files changed, 549 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/xen/machine_kexec_32.c
 create mode 100644 arch/x86/xen/relocate_kernel_32.S

diff --git a/arch/x86/xen/machine_kexec_32.c b/arch/x86/xen/machine_kexec_32.c
new file mode 100644
index 0000000..011a5e8
--- /dev/null
+++ b/arch/x86/xen/machine_kexec_32.c
@@ -0,0 +1,226 @@
+/*
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under the Google Summer
+ * of Code 2011 program and by Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+
+#include <xen/xen.h>
+#include <xen/xen-ops.h>
+
+#include <asm/xen/hypercall.h>
+#include <asm/xen/kexec.h>
+#include <asm/xen/page.h>
+
+#define __ma(vaddr)	(virt_to_machine(vaddr).maddr)
+
+static void *alloc_pgtable_page(struct kimage *image)
+{
+	struct page *page;
+
+	page = firmware_kimage_alloc_control_pages(image, 0);
+
+	if (!page || !page_address(page))
+		return NULL;
+
+	memset(page_address(page), 0, PAGE_SIZE);
+
+	return page_address(page);
+}
+
+static int alloc_transition_pgtable(struct kimage *image)
+{
+	image->arch.pgd = alloc_pgtable_page(image);
+
+	if (!image->arch.pgd)
+		return -ENOMEM;
+
+	image->arch.pmd0 = alloc_pgtable_page(image);
+
+	if (!image->arch.pmd0)
+		return -ENOMEM;
+
+	image->arch.pmd1 = alloc_pgtable_page(image);
+
+	if (!image->arch.pmd1)
+		return -ENOMEM;
+
+	image->arch.pte0 = alloc_pgtable_page(image);
+
+	if (!image->arch.pte0)
+		return -ENOMEM;
+
+	image->arch.pte1 = alloc_pgtable_page(image);
+
+	if (!image->arch.pte1)
+		return -ENOMEM;
+
+	return 0;
+}
+
+struct page *mf_kexec_kimage_alloc_pages(gfp_t gfp_mask,
+						unsigned int order,
+						unsigned long limit)
+{
+	struct page *pages;
+	unsigned int address_bits, i;
+
+	pages = alloc_pages(gfp_mask, order);
+
+	if (!pages)
+		return NULL;
+
+	address_bits = (limit == ULONG_MAX) ? BITS_PER_LONG : ilog2(limit);
+
+	/* Exchange the pages for machine-contiguous ones below the given limit. */
+	if (xen_create_contiguous_region((unsigned long)page_address(pages),
+							order, address_bits)) {
+		__free_pages(pages, order);
+		return NULL;
+	}
+
+	BUG_ON(PagePrivate(pages));
+
+	pages->mapping = NULL;
+	set_page_private(pages, order);
+
+	for (i = 0; i < (1 << order); ++i)
+		SetPageReserved(pages + i);
+
+	return pages;
+}
+
+void mf_kexec_kimage_free_pages(struct page *page)
+{
+	unsigned int i, order;
+
+	order = page_private(page);
+
+	for (i = 0; i < (1 << order); ++i)
+		ClearPageReserved(page + i);
+
+	xen_destroy_contiguous_region((unsigned long)page_address(page), order);
+	__free_pages(page, order);
+}
+
+unsigned long mf_kexec_page_to_pfn(struct page *page)
+{
+	return pfn_to_mfn(page_to_pfn(page));
+}
+
+struct page *mf_kexec_pfn_to_page(unsigned long mfn)
+{
+	return pfn_to_page(mfn_to_pfn(mfn));
+}
+
+unsigned long mf_kexec_virt_to_phys(volatile void *address)
+{
+	return virt_to_machine(address).maddr;
+}
+
+void *mf_kexec_phys_to_virt(unsigned long address)
+{
+	return phys_to_virt(machine_to_phys(XMADDR(address)).paddr);
+}
+
+int mf_kexec_prepare(struct kimage *image)
+{
+#ifdef CONFIG_KEXEC_JUMP
+	if (image->preserve_context) {
+		pr_info_once("kexec: Context preservation is not "
+				"supported in Xen domains.\n");
+		return -ENOSYS;
+	}
+#endif
+
+	return alloc_transition_pgtable(image);
+}
+
+int mf_kexec_load(struct kimage *image)
+{
+	void *control_page;
+	struct xen_kexec_load xkl = {};
+
+	/* Image is unloaded, nothing to do. */
+	if (!image)
+		return 0;
+
+	control_page = page_address(image->control_code_page);
+	memcpy(control_page, xen_relocate_kernel, xen_kexec_control_code_size);
+
+	xkl.type = image->type;
+	xkl.image.page_list[XK_MA_CONTROL_PAGE] = __ma(control_page);
+	xkl.image.page_list[XK_MA_TABLE_PAGE] = 0; /* Unused. */
+	xkl.image.page_list[XK_MA_PGD_PAGE] = __ma(image->arch.pgd);
+	xkl.image.page_list[XK_MA_PUD0_PAGE] = 0; /* Unused. */
+	xkl.image.page_list[XK_MA_PUD1_PAGE] = 0; /* Unused. */
+	xkl.image.page_list[XK_MA_PMD0_PAGE] = __ma(image->arch.pmd0);
+	xkl.image.page_list[XK_MA_PMD1_PAGE] = __ma(image->arch.pmd1);
+	xkl.image.page_list[XK_MA_PTE0_PAGE] = __ma(image->arch.pte0);
+	xkl.image.page_list[XK_MA_PTE1_PAGE] = __ma(image->arch.pte1);
+	xkl.image.indirection_page = image->head;
+	xkl.image.start_address = image->start;
+
+	return HYPERVISOR_kexec_op(KEXEC_CMD_kexec_load, &xkl);
+}
+
+void mf_kexec_cleanup(struct kimage *image)
+{
+}
+
+void mf_kexec_unload(struct kimage *image)
+{
+	int rc;
+	struct xen_kexec_load xkl = {};
+
+	if (!image)
+		return;
+
+	xkl.type = image->type;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_unload, &xkl);
+
+	WARN(rc, "kexec: %s: HYPERVISOR_kexec_op(): %i\n", __func__, rc);
+}
+
+void mf_kexec_shutdown(void)
+{
+}
+
+void mf_kexec(struct kimage *image)
+{
+	int rc;
+	struct xen_kexec_exec xke = {};
+
+	xke.type = image->type;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec, &xke);
+
+	pr_emerg("kexec: %s: HYPERVISOR_kexec_op(): %i\n", __func__, rc);
+	BUG();
+}
diff --git a/arch/x86/xen/relocate_kernel_32.S b/arch/x86/xen/relocate_kernel_32.S
new file mode 100644
index 0000000..0e81830
--- /dev/null
+++ b/arch/x86/xen/relocate_kernel_32.S
@@ -0,0 +1,323 @@
+/*
+ * Copyright (c) 2002-2005 Eric Biederman <ebiederm@xmission.com>
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under the Google Summer
+ * of Code 2011 program and by Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <asm/cache.h>
+#include <asm/page_types.h>
+#include <asm/pgtable_types.h>
+#include <asm/processor-flags.h>
+
+#include <asm/xen/kexec.h>
+
+#define ARG_INDIRECTION_PAGE	0x4
+#define ARG_PAGE_LIST		0x8
+#define ARG_START_ADDRESS	0xc
+
+#define PTR(x)	(x << 2)
+
+	.text
+	.align	PAGE_SIZE
+	.globl	xen_kexec_control_code_size, xen_relocate_kernel
+
+xen_relocate_kernel:
+	/*
+	 * Must be relocatable PIC code callable as a C function.
+	 *
+	 * This function is called by Xen, but at this point the hypervisor
+	 * is no longer running; we execute on bare metal.
+	 *
+	 * Every machine address passed to this function through
+	 * page_list (e.g. XK_MA_CONTROL_PAGE) is established
+	 * by dom0 during the kexec load phase.
+	 *
+	 * Every virtual address passed to this function through page_list
+	 * (e.g. XK_VA_CONTROL_PAGE) is established by the hypervisor during
+	 * HYPERVISOR_kexec_op(KEXEC_CMD_kexec_load) hypercall.
+	 *
+	 * 0x4(%esp) - indirection_page,
+	 * 0x8(%esp) - page_list,
+	 * 0xc(%esp) - start_address,
+	 * 0x10(%esp) - cpu_has_pae (ignored),
+	 * 0x14(%esp) - preserve_context (ignored).
+	 */
+
+	/* Zero out flags, and disable interrupts. */
+	pushl	$0
+	popfl
+
+	/* Get page_list address. */
+	movl	ARG_PAGE_LIST(%esp), %esi
+
+	/*
+	 * Map the control page at its virtual address
+	 * in the transition page table.
+	 */
+	movl	PTR(XK_VA_CONTROL_PAGE)(%esi), %eax
+
+	/* Get PGD address and PGD entry index. */
+	movl	PTR(XK_VA_PGD_PAGE)(%esi), %ebx
+	movl	%eax, %ecx
+	shrl	$PGDIR_SHIFT, %ecx
+	andl	$(PTRS_PER_PGD - 1), %ecx
+
+	/* Fill PGD entry with PMD0 reference. */
+	movl	PTR(XK_MA_PMD0_PAGE)(%esi), %edx
+	orl	$_PAGE_PRESENT, %edx
+	movl	%edx, (%ebx, %ecx, 8)
+
+	/* Get PMD0 address and PMD0 entry index. */
+	movl	PTR(XK_VA_PMD0_PAGE)(%esi), %ebx
+	movl	%eax, %ecx
+	shrl	$PMD_SHIFT, %ecx
+	andl	$(PTRS_PER_PMD - 1), %ecx
+
+	/* Fill PMD0 entry with PTE0 reference. */
+	movl	PTR(XK_MA_PTE0_PAGE)(%esi), %edx
+	orl	$_KERNPG_TABLE, %edx
+	movl	%edx, (%ebx, %ecx, 8)
+
+	/* Get PTE0 address and PTE0 entry index. */
+	movl	PTR(XK_VA_PTE0_PAGE)(%esi), %ebx
+	movl	%eax, %ecx
+	shrl	$PAGE_SHIFT, %ecx
+	andl	$(PTRS_PER_PTE - 1), %ecx
+
+	/* Fill PTE0 entry with control page reference. */
+	movl	PTR(XK_MA_CONTROL_PAGE)(%esi), %edx
+	orl	$__PAGE_KERNEL_EXEC, %edx
+	movl	%edx, (%ebx, %ecx, 8)
+
+	/*
+	 * Identity map the control page at its machine address
+	 * in the transition page table.
+	 */
+	movl	PTR(XK_MA_CONTROL_PAGE)(%esi), %eax
+
+	/* Get PGD address and PGD entry index. */
+	movl	PTR(XK_VA_PGD_PAGE)(%esi), %ebx
+	movl	%eax, %ecx
+	shrl	$PGDIR_SHIFT, %ecx
+	andl	$(PTRS_PER_PGD - 1), %ecx
+
+	/* Fill PGD entry with PMD1 reference. */
+	movl	PTR(XK_MA_PMD1_PAGE)(%esi), %edx
+	orl	$_PAGE_PRESENT, %edx
+	movl	%edx, (%ebx, %ecx, 8)
+
+	/* Get PMD1 address and PMD1 entry index. */
+	movl	PTR(XK_VA_PMD1_PAGE)(%esi), %ebx
+	movl	%eax, %ecx
+	shrl	$PMD_SHIFT, %ecx
+	andl	$(PTRS_PER_PMD - 1), %ecx
+
+	/* Fill PMD1 entry with PTE1 reference. */
+	movl	PTR(XK_MA_PTE1_PAGE)(%esi), %edx
+	orl	$_KERNPG_TABLE, %edx
+	movl	%edx, (%ebx, %ecx, 8)
+
+	/* Get PTE1 address and PTE1 entry index. */
+	movl	PTR(XK_VA_PTE1_PAGE)(%esi), %ebx
+	movl	%eax, %ecx
+	shrl	$PAGE_SHIFT, %ecx
+	andl	$(PTRS_PER_PTE - 1), %ecx
+
+	/* Fill PTE1 entry with control page reference. */
+	movl	PTR(XK_MA_CONTROL_PAGE)(%esi), %edx
+	orl	$__PAGE_KERNEL_EXEC, %edx
+	movl	%edx, (%ebx, %ecx, 8)
+
+	/*
+	 * Get the machine address of the control page now.
+	 * This is not possible after the page table switch.
+	 */
+	movl	PTR(XK_MA_CONTROL_PAGE)(%esi), %ebx
+
+	/* Get the machine address of the transition page table now, too. */
+	movl	PTR(XK_MA_PGD_PAGE)(%esi), %ecx
+
+	/* Get start_address too. */
+	movl	ARG_START_ADDRESS(%esp), %edx
+
+	/* Get indirection_page address too. */
+	movl	ARG_INDIRECTION_PAGE(%esp), %edi
+
+	/* Switch to transition page table. */
+	movl	%ecx, %cr3
+
+	/* Load IDT. */
+	lidtl	(idt_48 - xen_relocate_kernel)(%ebx)
+
+	/* Load GDT. */
+	leal	(gdt - xen_relocate_kernel)(%ebx), %eax
+	movl	%eax, (gdt_48 - xen_relocate_kernel + 2)(%ebx)
+	lgdtl	(gdt_48 - xen_relocate_kernel)(%ebx)
+
+	/* Load data segment registers. */
+	movl	$(gdt_ds - gdt), %eax
+	movl	%eax, %ds
+	movl	%eax, %es
+	movl	%eax, %fs
+	movl	%eax, %gs
+	movl	%eax, %ss
+
+	/* Set up a new stack at the end of the control page (machine address). */
+	leal	PAGE_SIZE(%ebx), %esp
+
+	/* Store start_address on the stack. */
+	pushl   %edx
+
+	/* Jump to identity mapped page. */
+	pushl	$0
+	pushl	$(gdt_cs - gdt)
+	addl	$(identity_mapped - xen_relocate_kernel), %ebx
+	pushl	%ebx
+	iretl
+
+identity_mapped:
+	/*
+	 * Set %cr0 to a known state:
+	 *   - disable alignment check,
+	 *   - disable floating point emulation,
+	 *   - disable paging,
+	 *   - no task switch,
+	 *   - disable write protect,
+	 *   - enable protected mode.
+	 */
+	movl	%cr0, %eax
+	andl	$~(X86_CR0_AM | X86_CR0_EM | X86_CR0_PG | X86_CR0_TS | X86_CR0_WP), %eax
+	orl	$(X86_CR0_PE), %eax
+	movl	%eax, %cr0
+
+	/* Set %cr4 to a known state. */
+	xorl	%eax, %eax
+	movl	%eax, %cr4
+
+	jmp	1f
+
+1:
+	/* Flush the TLB (needed?). */
+	movl	%eax, %cr3
+
+	/* Do the copies. */
+	movl	%edi, %ecx	/* Put the indirection_page in %ecx. */
+	xorl	%edi, %edi
+	xorl	%esi, %esi
+	jmp	1f
+
+0:
+	/*
+	 * Top, read another doubleword from the indirection page.
+	 * The indirection page is an array of tagged entries. A
+	 * destination entry sets where the following source pages
+	 * are copied, one after another. If the entries do not fit
+	 * in one page, the last entry of the current indirection
+	 * page points to the next one. Copying stops when the done
+	 * indicator is found.
+	 */
+	movl	(%ebx), %ecx
+	addl	$4, %ebx
+
+1:
+	testl	$0x1, %ecx	/* Is it a destination page? */
+	jz	2f
+
+	movl	%ecx, %edi
+	andl	$PAGE_MASK, %edi
+	jmp	0b
+
+2:
+	testl	$0x2, %ecx	/* Is it an indirection page? */
+	jz	2f
+
+	movl	%ecx, %ebx
+	andl	$PAGE_MASK, %ebx
+	jmp	0b
+
+2:
+	testl	$0x4, %ecx	/* Is it the done indicator? */
+	jz	2f
+	jmp	3f
+
+2:
+	testl	$0x8, %ecx	/* Is it the source indicator? */
+	jz	0b		/* Ignore it otherwise. */
+
+	movl	%ecx, %esi
+	andl	$PAGE_MASK, %esi
+	movl	$1024, %ecx
+
+	/* Copy page. */
+	rep	movsl
+	jmp	0b
+
+3:
+	/*
+	 * To be certain of avoiding problems with self-modifying code
+	 * I need to execute a serializing instruction here.
+	 * So I flush the TLB by reloading %cr3 here, it's handy,
+	 * and not processor dependent.
+	 */
+	xorl	%eax, %eax
+	movl	%eax, %cr3
+
+	/*
+	 * Set all of the registers to known values.
+	 * Leave %esp alone.
+	 */
+	xorl	%ebx, %ebx
+	xorl    %ecx, %ecx
+	xorl    %edx, %edx
+	xorl    %esi, %esi
+	xorl    %edi, %edi
+	xorl    %ebp, %ebp
+
+	/* Jump to start_address. */
+	retl
+
+	.align	L1_CACHE_BYTES
+
+gdt:
+	.quad	0x0000000000000000	/* NULL descriptor. */
+
+gdt_cs:
+	.quad	0x00cf9a000000ffff	/* 4 GiB code segment at 0x00000000. */
+
+gdt_ds:
+	.quad	0x00cf92000000ffff	/* 4 GiB data segment at 0x00000000. */
+gdt_end:
+
+gdt_48:
+	.word	gdt_end - gdt - 1	/* GDT limit. */
+	.long	0			/* GDT base - filled in by code above. */
+
+idt_48:
+	.word	0			/* IDT limit. */
+	.long	0			/* IDT base. */
+
+xen_kexec_control_code_size:
+	.long	. - xen_relocate_kernel
-- 
1.5.6.5
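
relocate_kernel_32.S locates every transition page table entry with the same shift-and-mask pattern (`shrl $SHIFT; andl $(PTRS_PER_x - 1)`). It then stores the entry with 8-byte scaling, `(%ebx, %ecx, 8)`, because PAE entries are 64 bits wide. The C sketch below reproduces that computation. It assumes the usual i386 PAE constants: PAGE_SHIFT 12, PMD_SHIFT 21, PGDIR_SHIFT 30, 512 PTE and PMD entries per table, and 4 PGD entries. The PAE_* names and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed i386 PAE constants (3-level paging, 8-byte entries). */
#define PAE_PAGE_SHIFT		12
#define PAE_PMD_SHIFT		21
#define PAE_PGDIR_SHIFT		30
#define PAE_PTRS_PER_PTE	512
#define PAE_PTRS_PER_PMD	512
#define PAE_PTRS_PER_PGD	4

struct pae_indices {
	unsigned int pgd, pmd, pte;
};

/* Mirror of the shrl/andl pair executed for each level. */
static struct pae_indices pae_split(uint32_t vaddr)
{
	struct pae_indices ix;

	ix.pgd = (vaddr >> PAE_PGDIR_SHIFT) & (PAE_PTRS_PER_PGD - 1);
	ix.pmd = (vaddr >> PAE_PMD_SHIFT) & (PAE_PTRS_PER_PMD - 1);
	ix.pte = (vaddr >> PAE_PAGE_SHIFT) & (PAE_PTRS_PER_PTE - 1);
	return ix;
}

/* Byte offset of an entry inside its table, as in (%ebx, %ecx, 8). */
static unsigned int pae_entry_offset(unsigned int index)
{
	assert(index < 512);
	return index * 8;
}
```

For example, virtual address 0xC0523000 selects PGD entry 3, PMD entry 2 and PTE entry 0x123. The code stores the PTE at byte offset 0x918 of the PTE page.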

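Both relocate_kernel variants walk the kexec indirection list. Each entry is a page-aligned address, and its low bits mark it as one of four kinds: a destination (0x1), a pointer to the next indirection page (0x2), the end of the list (0x4), or a source page to copy (0x8). These are the same values as the kernel's IND_* flags. The sketch below replays that loop in user space, testing the bits in the same order as the assembly. It is built on assumptions for illustration: simulated memory whose "addresses" are byte offsets, a tiny 64-byte page, and the helpers put_entry() and walk_indirection(), which are hypothetical. As with `rep movs`, consecutive source pages land in consecutive destination pages.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Entry tags, matching the bits tested in relocate_kernel_{32,64}.S. */
#define IND_DESTINATION	0x1
#define IND_INDIRECTION	0x2
#define IND_DONE	0x4
#define IND_SOURCE	0x8

#define SIM_PAGE_SIZE	64	/* tiny "page" so the demo stays small */
#define SIM_PAGES	16
#define SIM_PAGE_MASK	(~(uint64_t)(SIM_PAGE_SIZE - 1))

/* Simulated physical memory; "addresses" are byte offsets into it. */
static unsigned char sim_mem[SIM_PAGES * SIM_PAGE_SIZE];

/* Store one 64-bit indirection entry at a simulated address. */
static void put_entry(uint64_t addr, uint64_t val)
{
	assert(addr + sizeof(val) <= sizeof(sim_mem));
	memcpy(&sim_mem[addr], &val, sizeof(val));
}

/* Replay the copy loop starting from head; returns the number of pages copied. */
static unsigned int walk_indirection(uint64_t head)
{
	uint64_t entry = head, ind = 0, dest = 0;
	unsigned int copied = 0;

	for (;;) {
		if (entry & IND_DESTINATION) {
			dest = entry & SIM_PAGE_MASK;
		} else if (entry & IND_INDIRECTION) {
			ind = entry & SIM_PAGE_MASK;
		} else if (entry & IND_DONE) {
			break;
		} else if (entry & IND_SOURCE) {
			uint64_t src = entry & SIM_PAGE_MASK;

			assert(dest + SIM_PAGE_SIZE <= sizeof(sim_mem));
			memcpy(&sim_mem[dest], &sim_mem[src], SIM_PAGE_SIZE);
			/* Like rep movs, leave the destination just past the copied page. */
			dest += SIM_PAGE_SIZE;
			copied++;
		}
		/* Label 0: fetch the next entry and advance the cursor. */
		assert(ind + sizeof(entry) <= sizeof(sim_mem));
		memcpy(&entry, &sim_mem[ind], sizeof(entry));
		ind += sizeof(entry);
	}
	return copied;
}
```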


* [PATCH v3 07/11] x86/xen: Add x86_64 kexec/kdump implementation
@ 2012-12-27  2:18               ` Daniel Kiper
  0 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2012-12-27  2:18 UTC (permalink / raw)
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Add x86_64 kexec/kdump implementation.
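
relocate_kernel_64.S repeats the `shrq`/`andq` pattern over four levels (PGD, PUD, PMD, PTE). It does this twice: once to map the control page at its virtual address, and once to map it at its machine address. The C sketch below computes those indices. It assumes the standard x86_64 4-level constants: PAGE_SHIFT 12, PMD_SHIFT 21, PUD_SHIFT 30, PGDIR_SHIFT 39, and 512 entries per table. The X64_* names and the helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 4-level paging constants (4 KiB pages, 512 entries per table). */
#define X64_PAGE_SHIFT	12
#define X64_PMD_SHIFT	21
#define X64_PUD_SHIFT	30
#define X64_PGDIR_SHIFT	39
#define X64_PTRS	512

/* Same as "shrq $SHIFT, %r10; andq $(PTRS - 1), %r10". */
static unsigned int x64_index(uint64_t addr, unsigned int shift)
{
	assert(shift < 64);
	return (unsigned int)((addr >> shift) & (X64_PTRS - 1));
}

/* Fill idx[] with the PGD, PUD, PMD and PTE indices of addr, top level first. */
static void x64_split(uint64_t addr, unsigned int idx[4])
{
	static const unsigned int shifts[4] = {
		X64_PGDIR_SHIFT, X64_PUD_SHIFT, X64_PMD_SHIFT, X64_PAGE_SHIFT
	};
	int i;

	for (i = 0; i < 4; i++)
		idx[i] = x64_index(addr, shifts[i]);
}
```

A high virtual address such as 0xffffffff81000000 uses PGD slot 511 and PUD slot 510. A low machine address such as 0x12345000 uses slot 0 at both upper levels. In practice the two mappings therefore usually need separate PUD, PMD and PTE pages, which matches why the code keeps a PUD0/PMD0/PTE0 set and a PUD1/PMD1/PTE1 set.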

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 arch/x86/xen/machine_kexec_64.c   |  318 +++++++++++++++++++++++++++++++++++++
 arch/x86/xen/relocate_kernel_64.S |  309 +++++++++++++++++++++++++++++++++++
 2 files changed, 627 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/xen/machine_kexec_64.c
 create mode 100644 arch/x86/xen/relocate_kernel_64.S

diff --git a/arch/x86/xen/machine_kexec_64.c b/arch/x86/xen/machine_kexec_64.c
new file mode 100644
index 0000000..2600342
--- /dev/null
+++ b/arch/x86/xen/machine_kexec_64.c
@@ -0,0 +1,318 @@
+/*
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under the Google Summer
+ * of Code 2011 program and by Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+
+#include <xen/interface/memory.h>
+#include <xen/xen.h>
+
+#include <asm/xen/hypercall.h>
+#include <asm/xen/kexec.h>
+#include <asm/xen/page.h>
+
+#define __ma(vaddr)	(virt_to_machine(vaddr).maddr)
+
+static void init_level2_page(pmd_t *pmd, unsigned long addr)
+{
+	unsigned long end_addr = addr + PUD_SIZE;
+
+	while (addr < end_addr) {
+		native_set_pmd(pmd++, native_make_pmd(addr | __PAGE_KERNEL_LARGE_EXEC));
+		addr += PMD_SIZE;
+	}
+}
+
+static int init_level3_page(struct kimage *image, pud_t *pud,
+				unsigned long addr, unsigned long last_addr)
+{
+	pmd_t *pmd;
+	struct page *page;
+	unsigned long end_addr = addr + PGDIR_SIZE;
+
+	while ((addr < last_addr) && (addr < end_addr)) {
+		page = firmware_kimage_alloc_control_pages(image, 0);
+
+		if (!page)
+			return -ENOMEM;
+
+		pmd = page_address(page);
+		init_level2_page(pmd, addr);
+		native_set_pud(pud++, native_make_pud(__ma(pmd) | _KERNPG_TABLE));
+		addr += PUD_SIZE;
+	}
+
+	/* Clear the unused entries. */
+	while (addr < end_addr) {
+		native_pud_clear(pud++);
+		addr += PUD_SIZE;
+	}
+
+	return 0;
+}
+
+
+static int init_level4_page(struct kimage *image, pgd_t *pgd,
+				unsigned long addr, unsigned long last_addr)
+{
+	int rc;
+	pud_t *pud;
+	struct page *page;
+	unsigned long end_addr = addr + PTRS_PER_PGD * PGDIR_SIZE;
+
+	while ((addr < last_addr) && (addr < end_addr)) {
+		page = firmware_kimage_alloc_control_pages(image, 0);
+
+		if (!page)
+			return -ENOMEM;
+
+		pud = page_address(page);
+		rc = init_level3_page(image, pud, addr, last_addr);
+
+		if (rc)
+			return rc;
+
+		native_set_pgd(pgd++, native_make_pgd(__ma(pud) | _KERNPG_TABLE));
+		addr += PGDIR_SIZE;
+	}
+
+	/* Clear the unused entries. */
+	while (addr < end_addr) {
+		native_pgd_clear(pgd++);
+		addr += PGDIR_SIZE;
+	}
+
+	return 0;
+}
+
+static void free_transition_pgtable(struct kimage *image)
+{
+	free_page((unsigned long)image->arch.pgd);
+	free_page((unsigned long)image->arch.pud0);
+	free_page((unsigned long)image->arch.pud1);
+	free_page((unsigned long)image->arch.pmd0);
+	free_page((unsigned long)image->arch.pmd1);
+	free_page((unsigned long)image->arch.pte0);
+	free_page((unsigned long)image->arch.pte1);
+}
+
+static int alloc_transition_pgtable(struct kimage *image)
+{
+	image->arch.pgd = (pgd_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pgd)
+		goto err;
+
+	image->arch.pud0 = (pud_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pud0)
+		goto err;
+
+	image->arch.pud1 = (pud_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pud1)
+		goto err;
+
+	image->arch.pmd0 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pmd0)
+		goto err;
+
+	image->arch.pmd1 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pmd1)
+		goto err;
+
+	image->arch.pte0 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pte0)
+		goto err;
+
+	image->arch.pte1 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pte1)
+		goto err;
+
+	return 0;
+
+err:
+	free_transition_pgtable(image);
+
+	return -ENOMEM;
+}
+
+static int init_pgtable(struct kimage *image, pgd_t *pgd)
+{
+	int rc;
+	unsigned long max_mfn;
+
+	max_mfn = HYPERVISOR_memory_op(XENMEM_maximum_ram_page, NULL);
+
+	rc = init_level4_page(image, pgd, 0, PFN_PHYS(max_mfn));
+
+	if (rc)
+		return rc;
+
+	return alloc_transition_pgtable(image);
+}
+
+struct page *mf_kexec_kimage_alloc_pages(gfp_t gfp_mask,
+						unsigned int order,
+						unsigned long limit)
+{
+	struct page *pages;
+	unsigned int i;
+
+	pages = alloc_pages(gfp_mask, order);
+
+	if (!pages)
+		return NULL;
+
+	BUG_ON(PagePrivate(pages));
+
+	pages->mapping = NULL;
+	set_page_private(pages, order);
+
+	for (i = 0; i < (1 << order); ++i)
+		SetPageReserved(pages + i);
+
+	return pages;
+}
+
+void mf_kexec_kimage_free_pages(struct page *page)
+{
+	unsigned int i, order;
+
+	order = page_private(page);
+
+	for (i = 0; i < (1 << order); ++i)
+		ClearPageReserved(page + i);
+
+	__free_pages(page, order);
+}
+
+unsigned long mf_kexec_page_to_pfn(struct page *page)
+{
+	return pfn_to_mfn(page_to_pfn(page));
+}
+
+struct page *mf_kexec_pfn_to_page(unsigned long mfn)
+{
+	return pfn_to_page(mfn_to_pfn(mfn));
+}
+
+unsigned long mf_kexec_virt_to_phys(volatile void *address)
+{
+	return virt_to_machine(address).maddr;
+}
+
+void *mf_kexec_phys_to_virt(unsigned long address)
+{
+	return phys_to_virt(machine_to_phys(XMADDR(address)).paddr);
+}
+
+int mf_kexec_prepare(struct kimage *image)
+{
+#ifdef CONFIG_KEXEC_JUMP
+	if (image->preserve_context) {
+		pr_info_once("kexec: Context preservation is not "
+				"supported in Xen domains.\n");
+		return -ENOSYS;
+	}
+#endif
+
+	return init_pgtable(image, page_address(image->control_code_page));
+}
+
+int mf_kexec_load(struct kimage *image)
+{
+	void *control_page, *table_page;
+	struct xen_kexec_load xkl = {};
+
+	/* Image is unloaded, nothing to do. */
+	if (!image)
+		return 0;
+
+	table_page = page_address(image->control_code_page);
+	control_page = table_page + PAGE_SIZE;
+
+	memcpy(control_page, xen_relocate_kernel, xen_kexec_control_code_size);
+
+	xkl.type = image->type;
+	xkl.image.page_list[XK_MA_CONTROL_PAGE] = __ma(control_page);
+	xkl.image.page_list[XK_MA_TABLE_PAGE] = __ma(table_page);
+	xkl.image.page_list[XK_MA_PGD_PAGE] = __ma(image->arch.pgd);
+	xkl.image.page_list[XK_MA_PUD0_PAGE] = __ma(image->arch.pud0);
+	xkl.image.page_list[XK_MA_PUD1_PAGE] = __ma(image->arch.pud1);
+	xkl.image.page_list[XK_MA_PMD0_PAGE] = __ma(image->arch.pmd0);
+	xkl.image.page_list[XK_MA_PMD1_PAGE] = __ma(image->arch.pmd1);
+	xkl.image.page_list[XK_MA_PTE0_PAGE] = __ma(image->arch.pte0);
+	xkl.image.page_list[XK_MA_PTE1_PAGE] = __ma(image->arch.pte1);
+	xkl.image.indirection_page = image->head;
+	xkl.image.start_address = image->start;
+
+	return HYPERVISOR_kexec_op(KEXEC_CMD_kexec_load, &xkl);
+}
+
+void mf_kexec_cleanup(struct kimage *image)
+{
+	free_transition_pgtable(image);
+}
+
+void mf_kexec_unload(struct kimage *image)
+{
+	int rc;
+	struct xen_kexec_load xkl = {};
+
+	if (!image)
+		return;
+
+	xkl.type = image->type;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_unload, &xkl);
+
+	WARN(rc, "kexec: %s: HYPERVISOR_kexec_op(): %i\n", __func__, rc);
+}
+
+void mf_kexec_shutdown(void)
+{
+}
+
+void mf_kexec(struct kimage *image)
+{
+	int rc;
+	struct xen_kexec_exec xke = {};
+
+	xke.type = image->type;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec, &xke);
+
+	pr_emerg("kexec: %s: HYPERVISOR_kexec_op(): %i\n", __func__, rc);
+	BUG();
+}
diff --git a/arch/x86/xen/relocate_kernel_64.S b/arch/x86/xen/relocate_kernel_64.S
new file mode 100644
index 0000000..8f641f1
--- /dev/null
+++ b/arch/x86/xen/relocate_kernel_64.S
@@ -0,0 +1,309 @@
+/*
+ * Copyright (c) 2002-2005 Eric Biederman <ebiederm@xmission.com>
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under the Google Summer
+ * of Code 2011 program and by Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <asm/page_types.h>
+#include <asm/pgtable_types.h>
+#include <asm/processor-flags.h>
+
+#include <asm/xen/kexec.h>
+
+#define PTR(x)	(x << 3)
+
+	.text
+	.code64
+	.globl	xen_kexec_control_code_size, xen_relocate_kernel
+
+xen_relocate_kernel:
+	/*
+	 * Must be relocatable PIC code callable as a C function.
+	 *
+	 * This function is called by Xen, but at this point the hypervisor
+	 * is no longer running; we execute on bare metal.
+	 *
+	 * Every machine address passed to this function through
+	 * page_list (e.g. XK_MA_CONTROL_PAGE) is established
+	 * by dom0 during the kexec load phase.
+	 *
+	 * Every virtual address passed to this function through page_list
+	 * (e.g. XK_VA_CONTROL_PAGE) is established by the hypervisor during
+	 * HYPERVISOR_kexec_op(KEXEC_CMD_kexec_load) hypercall.
+	 *
+	 * %rdi - indirection_page,
+	 * %rsi - page_list,
+	 * %rdx - start_address,
+	 * %ecx - preserve_context (ignored).
+	 */
+
+	/* Zero out flags, and disable interrupts. */
+	pushq	$0
+	popfq
+
+	/*
+	 * Map the control page at its virtual address
+	 * in the transition page table.
+	 */
+	movq	PTR(XK_VA_CONTROL_PAGE)(%rsi), %r8
+
+	/* Get PGD address and PGD entry index. */
+	movq	PTR(XK_VA_PGD_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PGDIR_SHIFT, %r10
+	andq	$(PTRS_PER_PGD - 1), %r10
+
+	/* Fill PGD entry with PUD0 reference. */
+	movq	PTR(XK_MA_PUD0_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PUD0 address and PUD0 entry index. */
+	movq	PTR(XK_VA_PUD0_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PUD_SHIFT, %r10
+	andq	$(PTRS_PER_PUD - 1), %r10
+
+	/* Fill PUD0 entry with PMD0 reference. */
+	movq	PTR(XK_MA_PMD0_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PMD0 address and PMD0 entry index. */
+	movq	PTR(XK_VA_PMD0_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PMD_SHIFT, %r10
+	andq	$(PTRS_PER_PMD - 1), %r10
+
+	/* Fill PMD0 entry with PTE0 reference. */
+	movq	PTR(XK_MA_PTE0_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PTE0 address and PTE0 entry index. */
+	movq	PTR(XK_VA_PTE0_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PAGE_SHIFT, %r10
+	andq	$(PTRS_PER_PTE - 1), %r10
+
+	/* Fill PTE0 entry with control page reference. */
+	movq	PTR(XK_MA_CONTROL_PAGE)(%rsi), %r11
+	orq	$__PAGE_KERNEL_EXEC, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/*
+	 * Identity map the control page at its machine address
+	 * in the transition page table.
+	 */
+	movq	PTR(XK_MA_CONTROL_PAGE)(%rsi), %r8
+
+	/* Get PGD address and PGD entry index. */
+	movq	PTR(XK_VA_PGD_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PGDIR_SHIFT, %r10
+	andq	$(PTRS_PER_PGD - 1), %r10
+
+	/* Fill PGD entry with PUD1 reference. */
+	movq	PTR(XK_MA_PUD1_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PUD1 address and PUD1 entry index. */
+	movq	PTR(XK_VA_PUD1_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PUD_SHIFT, %r10
+	andq	$(PTRS_PER_PUD - 1), %r10
+
+	/* Fill PUD1 entry with PMD1 reference. */
+	movq	PTR(XK_MA_PMD1_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PMD1 address and PMD1 entry index. */
+	movq	PTR(XK_VA_PMD1_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PMD_SHIFT, %r10
+	andq	$(PTRS_PER_PMD - 1), %r10
+
+	/* Fill PMD1 entry with PTE1 reference. */
+	movq	PTR(XK_MA_PTE1_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PTE1 address and PTE1 entry index. */
+	movq	PTR(XK_VA_PTE1_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PAGE_SHIFT, %r10
+	andq	$(PTRS_PER_PTE - 1), %r10
+
+	/* Fill PTE1 entry with control page reference. */
+	movq	PTR(XK_MA_CONTROL_PAGE)(%rsi), %r11
+	orq	$__PAGE_KERNEL_EXEC, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/*
+	 * Get the machine address of the control page now.
+	 * This is not possible after the page table switch.
+	 */
+	movq	PTR(XK_MA_CONTROL_PAGE)(%rsi), %r8
+
+	/* Get the machine address of the identity page table now, too. */
+	movq	PTR(XK_MA_TABLE_PAGE)(%rsi), %r9
+
+	/* Get the machine address of the transition page table now, too. */
+	movq	PTR(XK_MA_PGD_PAGE)(%rsi), %r10
+
+	/* Switch to transition page table. */
+	movq	%r10, %cr3
+
+	/* Set up a new stack at the end of the control page (machine address). */
+	leaq	PAGE_SIZE(%r8), %rsp
+
+	/* Store start_address on the stack. */
+	pushq   %rdx
+
+	/* Jump to identity mapped page. */
+	addq	$(identity_mapped - xen_relocate_kernel), %r8
+	jmpq	*%r8
+
+identity_mapped:
+	/* Switch to identity page table. */
+	movq	%r9, %cr3
+
+	/*
+	 * Set %cr0 to a known state:
+	 *   - disable alignment check,
+	 *   - disable floating point emulation,
+	 *   - no task switch,
+	 *   - disable write protect,
+	 *   - enable protected mode,
+	 *   - enable paging.
+	 */
+	movq	%cr0, %rax
+	andq	$~(X86_CR0_AM | X86_CR0_EM | X86_CR0_TS | X86_CR0_WP), %rax
+	orl	$(X86_CR0_PE | X86_CR0_PG), %eax
+	movq	%rax, %cr0
+
+	/*
+	 * Set %cr4 to a known state:
+	 *   - enable physical address extension.
+	 */
+	movq	$X86_CR4_PAE, %rax
+	movq	%rax, %cr4
+
+	jmp	1f
+
+1:
+	/* Flush the TLB (needed?). */
+	movq	%r9, %cr3
+
+	/* Do the copies. */
+	movq	%rdi, %rcx	/* Put the indirection_page in %rcx. */
+	xorq	%rdi, %rdi
+	xorq	%rsi, %rsi
+	jmp	1f
+
+0:
+	/*
+	 * Top, read another quadword from the indirection page.
+	 * The indirection page is an array of tagged entries. A
+	 * destination entry sets where the following source pages
+	 * are copied, one after another. If the entries do not fit
+	 * in one page, the last entry of the current indirection
+	 * page points to the next one. Copying stops when the done
+	 * indicator is found.
+	 */
+	movq	(%rbx), %rcx
+	addq	$8, %rbx
+
+1:
+	testq	$0x1, %rcx	/* Is it a destination page? */
+	jz	2f
+
+	movq	%rcx, %rdi
+	andq	$PAGE_MASK, %rdi
+	jmp	0b
+
+2:
+	testq	$0x2, %rcx	/* Is it an indirection page? */
+	jz	2f
+
+	movq	%rcx, %rbx
+	andq	$PAGE_MASK, %rbx
+	jmp	0b
+
+2:
+	testq	$0x4, %rcx	/* Is it the done indicator? */
+	jz	2f
+	jmp	3f
+
+2:
+	testq	$0x8, %rcx	/* Is it the source indicator? */
+	jz	0b		/* Ignore it otherwise. */
+
+	movq	%rcx, %rsi
+	andq	$PAGE_MASK, %rsi
+	movq	$512, %rcx
+
+	/* Copy page. */
+	rep	movsq
+	jmp	0b
+
+3:
+	/*
+	 * To be certain of avoiding problems with self-modifying code
+	 * I need to execute a serializing instruction here.
+	 * So I flush the TLB by reloading %cr3 here, it's handy,
+	 * and not processor dependent.
+	 */
+	movq	%cr3, %rax
+	movq	%rax, %cr3
+
+	/*
+	 * Set all of the registers to known values.
+	 * Leave %rsp alone.
+	 */
+	xorq	%rax, %rax
+	xorq	%rbx, %rbx
+	xorq    %rcx, %rcx
+	xorq    %rdx, %rdx
+	xorq    %rsi, %rsi
+	xorq    %rdi, %rdi
+	xorq    %rbp, %rbp
+	xorq	%r8, %r8
+	xorq	%r9, %r9
+	xorq	%r10, %r10
+	xorq	%r11, %r11
+	xorq	%r12, %r12
+	xorq	%r13, %r13
+	xorq	%r14, %r14
+	xorq	%r15, %r15
+
+	/* Jump to start_address. */
+	retq
+
+xen_kexec_control_code_size:
+	.long	. - xen_relocate_kernel
-- 
1.5.6.5
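
In machine_kexec_64.c, init_level4_page(), init_level3_page() and init_level2_page() build an identity map of machine memory from 0 up to PFN_PHYS(max_mfn) using 2 MiB pages. They take one PUD page for every 512 GiB PGD slot below the limit and one PMD page for every 1 GiB PUD slot below it. The PGD itself is the control code page. The sketch below counts the pages those loops request. It assumes PUD_SIZE = 1 GiB and PGDIR_SIZE = 512 GiB, and ident_map_pages() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define X64_PUD_SIZE	(1ULL << 30)	/* assumed: one PMD page maps 1 GiB */
#define X64_PGDIR_SIZE	(1ULL << 39)	/* assumed: one PUD page maps 512 GiB */

/* Hypothetical tally of table pages requested from the control page pool. */
struct ident_map_cost {
	uint64_t pud_pages;	/* one per PGD slot below last_addr */
	uint64_t pmd_pages;	/* one per PUD slot below last_addr */
};

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* Pages the init_level4_page()/init_level3_page() loops allocate for [0, last_addr). */
static struct ident_map_cost ident_map_pages(uint64_t last_addr)
{
	struct ident_map_cost c;

	c.pud_pages = div_round_up(last_addr, X64_PGDIR_SIZE);
	c.pmd_pages = div_round_up(last_addr, X64_PUD_SIZE);
	return c;
}
```

For a host whose highest machine address is 4 GiB, this gives 1 PUD page and 4 PMD pages. init_level2_page() then fills each PMD page completely, so every PMD page maps a full 1 GiB even when last_addr is not 1 GiB aligned.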

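In machine_kexec_64.c, alloc_transition_pgtable() frees everything through free_transition_pgtable() when an allocation fails, and mf_kexec_cleanup() calls free_transition_pgtable() again. Because free_transition_pgtable() does not clear the pointers, the cleanup hook could run after a failed prepare. This is an assumption about how the generic kexec code in this series invokes it. In that case, the already-freed pages would reach free_page() a second time. The user-space sketch below shows one defensive pattern: clear each pointer right after freeing it. Here free() stands in for free_page(), calloc() for get_zeroed_page(), and struct transition_pgtable is a hypothetical stand-in for the fields of image->arch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the transition page table pointers in image->arch. */
struct transition_pgtable {
	void *pgd, *pud0, *pud1, *pmd0, *pmd1, *pte0, *pte1;
};

/* free() the page and clear the pointer; free(NULL) is a no-op, like free_page(0). */
static void free_and_clear(void **p)
{
	free(*p);
	*p = NULL;
}

static void free_transition_pgtable(struct transition_pgtable *t)
{
	free_and_clear(&t->pgd);
	free_and_clear(&t->pud0);
	free_and_clear(&t->pud1);
	free_and_clear(&t->pmd0);
	free_and_clear(&t->pmd1);
	free_and_clear(&t->pte0);
	free_and_clear(&t->pte1);
}

/* Same "free everything on error" shape as alloc_transition_pgtable() in the patch. */
static int alloc_transition_pgtable(struct transition_pgtable *t)
{
	void **slots[] = { &t->pgd, &t->pud0, &t->pud1, &t->pmd0,
			   &t->pmd1, &t->pte0, &t->pte1 };
	size_t i;

	for (i = 0; i < sizeof(slots) / sizeof(slots[0]); i++) {
		*slots[i] = calloc(1, 4096);	/* zeroed, like get_zeroed_page() */
		if (!*slots[i]) {
			free_transition_pgtable(t);
			return -1;
		}
	}
	return 0;
}
```

With the pointers cleared, calling free_transition_pgtable() from both the error path and the cleanup hook is harmless.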


* [PATCH v3 07/11] x86/xen: Add x86_64 kexec/kdump implementation
  2012-12-27  2:18             ` Daniel Kiper
@ 2012-12-27  2:18             ` Daniel Kiper
  -1 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2012-12-27  2:18 UTC (permalink / raw)
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Add x86_64 kexec/kdump implementation.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
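init_pgtable() in machine_kexec_64.c builds a 1:1 map of [0, PFN_PHYS(max_mfn)) with 2 MiB pages: init_level4_page() allocates one PUD page per started 512 GiB and init_level3_page() allocates one PMD page per started 1 GiB, while the PGD itself is the already allocated control_code_page. The sketch below only counts those allocations; it assumes 4 KiB base pages and 4-level paging, and the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* x86_64 4-level paging sizes (assumed: 4 KiB base pages). */
#define PAGE_SHIFT_4K   12
#define PUD_SIZE_1G     (UINT64_C(1) << 30)	/* one PMD page maps 1 GiB */
#define PGDIR_SIZE_512G (UINT64_C(1) << 39)	/* one PUD page covers 512 GiB */

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * Number of control pages that init_level4_page(image, pgd, 0, last_addr)
 * would request, with last_addr = max_mfn << PAGE_SHIFT as in init_pgtable().
 * The PGD page is not counted.
 */
static uint64_t identity_map_control_pages(uint64_t max_mfn)
{
	uint64_t last_addr = max_mfn << PAGE_SHIFT_4K;

	return div_round_up(last_addr, PGDIR_SIZE_512G) +	/* PUD pages */
	       div_round_up(last_addr, PUD_SIZE_1G);		/* PMD pages */
}
```

For example, a host whose highest RAM page gives max_mfn = 0x100000 (4 GiB) needs one PUD page and four PMD pages, so the mapping costs only a handful of control pages even on large machines.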
 arch/x86/xen/machine_kexec_64.c   |  318 +++++++++++++++++++++++++++++++++++++
 arch/x86/xen/relocate_kernel_64.S |  309 +++++++++++++++++++++++++++++++++++
 2 files changed, 627 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/xen/machine_kexec_64.c
 create mode 100644 arch/x86/xen/relocate_kernel_64.S
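xen_relocate_kernel in relocate_kernel_64.S fills two chains of transition tables: PUD0/PMD0/PTE0 map the control page at its Xen virtual address, and PUD1/PMD1/PTE1 map it at its machine address. Each "Get ... entry index" step shifts the address right by the level's shift and masks it with PTRS_PER_* - 1. A minimal C sketch of that computation follows, assuming 4-level paging with 4 KiB pages (512 entries per table); the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* x86_64 4-level paging shifts (assumed: 4 KiB pages, no 5-level paging). */
#define PAGE_SHIFT_4K     12
#define PMD_SHIFT_2M      21
#define PUD_SHIFT_1G      30
#define PGDIR_SHIFT_4L    39
#define ENTRIES_PER_TABLE 512	/* PTRS_PER_{PGD,PUD,PMD,PTE} on x86_64 */

struct pt_index {
	unsigned int pgd, pud, pmd, pte;
};

/* Same shrq/andq pair as each "Get ... entry index" step. */
static unsigned int table_index(uint64_t addr, unsigned int shift)
{
	return (unsigned int)((addr >> shift) & (ENTRIES_PER_TABLE - 1));
}

static struct pt_index pt_index_of(uint64_t addr)
{
	struct pt_index idx = {
		.pgd = table_index(addr, PGDIR_SHIFT_4L),
		.pud = table_index(addr, PUD_SHIFT_1G),
		.pmd = table_index(addr, PMD_SHIFT_2M),
		.pte = table_index(addr, PAGE_SHIFT_4K),
	};

	return idx;
}
```

For instance, pt_index_of(0xffffffff80000000) yields PGD slot 511 and PUD slot 510; the stub writes the entry found this way in every level, then ORs _KERNPG_TABLE (or __PAGE_KERNEL_EXEC for the PTE) into the machine address of the next-level table.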

diff --git a/arch/x86/xen/machine_kexec_64.c b/arch/x86/xen/machine_kexec_64.c
new file mode 100644
index 0000000..2600342
--- /dev/null
+++ b/arch/x86/xen/machine_kexec_64.c
@@ -0,0 +1,318 @@
+/*
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under the Google Summer
+ * of Code 2011 program and by Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+
+#include <xen/interface/memory.h>
+#include <xen/xen.h>
+
+#include <asm/xen/hypercall.h>
+#include <asm/xen/kexec.h>
+#include <asm/xen/page.h>
+
+#define __ma(vaddr)	(virt_to_machine(vaddr).maddr)
+
+static void init_level2_page(pmd_t *pmd, unsigned long addr)
+{
+	unsigned long end_addr = addr + PUD_SIZE;
+
+	while (addr < end_addr) {
+		native_set_pmd(pmd++, native_make_pmd(addr | __PAGE_KERNEL_LARGE_EXEC));
+		addr += PMD_SIZE;
+	}
+}
+
+static int init_level3_page(struct kimage *image, pud_t *pud,
+				unsigned long addr, unsigned long last_addr)
+{
+	pmd_t *pmd;
+	struct page *page;
+	unsigned long end_addr = addr + PGDIR_SIZE;
+
+	while ((addr < last_addr) && (addr < end_addr)) {
+		page = firmware_kimage_alloc_control_pages(image, 0);
+
+		if (!page)
+			return -ENOMEM;
+
+		pmd = page_address(page);
+		init_level2_page(pmd, addr);
+		native_set_pud(pud++, native_make_pud(__ma(pmd) | _KERNPG_TABLE));
+		addr += PUD_SIZE;
+	}
+
+	/* Clear the unused entries. */
+	while (addr < end_addr) {
+		native_pud_clear(pud++);
+		addr += PUD_SIZE;
+	}
+
+	return 0;
+}
+
+
+static int init_level4_page(struct kimage *image, pgd_t *pgd,
+				unsigned long addr, unsigned long last_addr)
+{
+	int rc;
+	pud_t *pud;
+	struct page *page;
+	unsigned long end_addr = addr + PTRS_PER_PGD * PGDIR_SIZE;
+
+	while ((addr < last_addr) && (addr < end_addr)) {
+		page = firmware_kimage_alloc_control_pages(image, 0);
+
+		if (!page)
+			return -ENOMEM;
+
+		pud = page_address(page);
+		rc = init_level3_page(image, pud, addr, last_addr);
+
+		if (rc)
+			return rc;
+
+		native_set_pgd(pgd++, native_make_pgd(__ma(pud) | _KERNPG_TABLE));
+		addr += PGDIR_SIZE;
+	}
+
+	/* Clear the unused entries. */
+	while (addr < end_addr) {
+		native_pgd_clear(pgd++);
+		addr += PGDIR_SIZE;
+	}
+
+	return 0;
+}
+
+static void free_transition_pgtable(struct kimage *image)
+{
+	free_page((unsigned long)image->arch.pgd);
+	free_page((unsigned long)image->arch.pud0);
+	free_page((unsigned long)image->arch.pud1);
+	free_page((unsigned long)image->arch.pmd0);
+	free_page((unsigned long)image->arch.pmd1);
+	free_page((unsigned long)image->arch.pte0);
+	free_page((unsigned long)image->arch.pte1);
+}
+
+static int alloc_transition_pgtable(struct kimage *image)
+{
+	image->arch.pgd = (pgd_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pgd)
+		goto err;
+
+	image->arch.pud0 = (pud_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pud0)
+		goto err;
+
+	image->arch.pud1 = (pud_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pud1)
+		goto err;
+
+	image->arch.pmd0 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pmd0)
+		goto err;
+
+	image->arch.pmd1 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pmd1)
+		goto err;
+
+	image->arch.pte0 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pte0)
+		goto err;
+
+	image->arch.pte1 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pte1)
+		goto err;
+
+	return 0;
+
+err:
+	free_transition_pgtable(image);
+
+	return -ENOMEM;
+}
+
+static int init_pgtable(struct kimage *image, pgd_t *pgd)
+{
+	int rc;
+	unsigned long max_mfn;
+
+	max_mfn = HYPERVISOR_memory_op(XENMEM_maximum_ram_page, NULL);
+
+	rc = init_level4_page(image, pgd, 0, PFN_PHYS(max_mfn));
+
+	if (rc)
+		return rc;
+
+	return alloc_transition_pgtable(image);
+}
+
+struct page *mf_kexec_kimage_alloc_pages(gfp_t gfp_mask,
+						unsigned int order,
+						unsigned long limit)
+{
+	struct page *pages;
+	unsigned int i;
+
+	pages = alloc_pages(gfp_mask, order);
+
+	if (!pages)
+		return NULL;
+
+	BUG_ON(PagePrivate(pages));
+
+	pages->mapping = NULL;
+	set_page_private(pages, order);
+
+	for (i = 0; i < (1 << order); ++i)
+		SetPageReserved(pages + i);
+
+	return pages;
+}
+
+void mf_kexec_kimage_free_pages(struct page *page)
+{
+	unsigned int i, order;
+
+	order = page_private(page);
+
+	for (i = 0; i < (1 << order); ++i)
+		ClearPageReserved(page + i);
+
+	__free_pages(page, order);
+}
+
+unsigned long mf_kexec_page_to_pfn(struct page *page)
+{
+	return pfn_to_mfn(page_to_pfn(page));
+}
+
+struct page *mf_kexec_pfn_to_page(unsigned long mfn)
+{
+	return pfn_to_page(mfn_to_pfn(mfn));
+}
+
+unsigned long mf_kexec_virt_to_phys(volatile void *address)
+{
+	return virt_to_machine(address).maddr;
+}
+
+void *mf_kexec_phys_to_virt(unsigned long address)
+{
+	return phys_to_virt(machine_to_phys(XMADDR(address)).paddr);
+}
+
+int mf_kexec_prepare(struct kimage *image)
+{
+#ifdef CONFIG_KEXEC_JUMP
+	if (image->preserve_context) {
+		pr_info_once("kexec: Context preservation is not "
+				"supported in Xen domains.\n");
+		return -ENOSYS;
+	}
+#endif
+
+	return init_pgtable(image, page_address(image->control_code_page));
+}
+
+int mf_kexec_load(struct kimage *image)
+{
+	void *control_page, *table_page;
+	struct xen_kexec_load xkl = {};
+
+	/* Image is unloaded, nothing to do. */
+	if (!image)
+		return 0;
+
+	table_page = page_address(image->control_code_page);
+	control_page = table_page + PAGE_SIZE;
+
+	memcpy(control_page, xen_relocate_kernel, xen_kexec_control_code_size);
+
+	xkl.type = image->type;
+	xkl.image.page_list[XK_MA_CONTROL_PAGE] = __ma(control_page);
+	xkl.image.page_list[XK_MA_TABLE_PAGE] = __ma(table_page);
+	xkl.image.page_list[XK_MA_PGD_PAGE] = __ma(image->arch.pgd);
+	xkl.image.page_list[XK_MA_PUD0_PAGE] = __ma(image->arch.pud0);
+	xkl.image.page_list[XK_MA_PUD1_PAGE] = __ma(image->arch.pud1);
+	xkl.image.page_list[XK_MA_PMD0_PAGE] = __ma(image->arch.pmd0);
+	xkl.image.page_list[XK_MA_PMD1_PAGE] = __ma(image->arch.pmd1);
+	xkl.image.page_list[XK_MA_PTE0_PAGE] = __ma(image->arch.pte0);
+	xkl.image.page_list[XK_MA_PTE1_PAGE] = __ma(image->arch.pte1);
+	xkl.image.indirection_page = image->head;
+	xkl.image.start_address = image->start;
+
+	return HYPERVISOR_kexec_op(KEXEC_CMD_kexec_load, &xkl);
+}
+
+void mf_kexec_cleanup(struct kimage *image)
+{
+	free_transition_pgtable(image);
+}
+
+void mf_kexec_unload(struct kimage *image)
+{
+	int rc;
+	struct xen_kexec_load xkl = {};
+
+	if (!image)
+		return;
+
+	xkl.type = image->type;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_unload, &xkl);
+
+	WARN(rc, "kexec: %s: HYPERVISOR_kexec_op(): %i\n", __func__, rc);
+}
+
+void mf_kexec_shutdown(void)
+{
+}
+
+void mf_kexec(struct kimage *image)
+{
+	int rc;
+	struct xen_kexec_exec xke = {};
+
+	xke.type = image->type;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec, &xke);
+
+	pr_emerg("kexec: %s: HYPERVISOR_kexec_op(): %i\n", __func__, rc);
+	BUG();
+}
diff --git a/arch/x86/xen/relocate_kernel_64.S b/arch/x86/xen/relocate_kernel_64.S
new file mode 100644
index 0000000..8f641f1
--- /dev/null
+++ b/arch/x86/xen/relocate_kernel_64.S
@@ -0,0 +1,309 @@
+/*
+ * Copyright (c) 2002-2005 Eric Biederman <ebiederm@xmission.com>
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under the Google Summer
+ * of Code 2011 program and by Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <asm/page_types.h>
+#include <asm/pgtable_types.h>
+#include <asm/processor-flags.h>
+
+#include <asm/xen/kexec.h>
+
+#define PTR(x)	(x << 3)
+
+	.text
+	.code64
+	.globl	xen_kexec_control_code_size, xen_relocate_kernel
+
+xen_relocate_kernel:
+	/*
+	 * Must be relocatable PIC code callable as a C function.
+	 *
+	 * This function is called by Xen, but by this point the hypervisor
+	 * is no longer running: we are executing on bare metal.
+	 *
+	 * Every machine address passed to this function through
+	 * page_list (e.g. XK_MA_CONTROL_PAGE) is established
+	 * by dom0 during kexec load phase.
+	 *
+	 * Every virtual address passed to this function through page_list
+	 * (e.g. XK_VA_CONTROL_PAGE) is established by the hypervisor during
+	 * HYPERVISOR_kexec_op(KEXEC_CMD_kexec_load) hypercall.
+	 *
+	 * %rdi - indirection_page,
+	 * %rsi - page_list,
+	 * %rdx - start_address,
+	 * %ecx - preserve_context (ignored).
+	 */
+
+	/* Zero out %rflags, which also disables interrupts. */
+	pushq	$0
+	popfq
+
+	/*
+	 * Map the control page at its virtual address
+	 * in the transition page table.
+	 */
+	movq	PTR(XK_VA_CONTROL_PAGE)(%rsi), %r8
+
+	/* Get PGD address and PGD entry index. */
+	movq	PTR(XK_VA_PGD_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PGDIR_SHIFT, %r10
+	andq	$(PTRS_PER_PGD - 1), %r10
+
+	/* Fill PGD entry with PUD0 reference. */
+	movq	PTR(XK_MA_PUD0_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PUD0 address and PUD0 entry index. */
+	movq	PTR(XK_VA_PUD0_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PUD_SHIFT, %r10
+	andq	$(PTRS_PER_PUD - 1), %r10
+
+	/* Fill PUD0 entry with PMD0 reference. */
+	movq	PTR(XK_MA_PMD0_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PMD0 address and PMD0 entry index. */
+	movq	PTR(XK_VA_PMD0_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PMD_SHIFT, %r10
+	andq	$(PTRS_PER_PMD - 1), %r10
+
+	/* Fill PMD0 entry with PTE0 reference. */
+	movq	PTR(XK_MA_PTE0_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PTE0 address and PTE0 entry index. */
+	movq	PTR(XK_VA_PTE0_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PAGE_SHIFT, %r10
+	andq	$(PTRS_PER_PTE - 1), %r10
+
+	/* Fill PTE0 entry with control page reference. */
+	movq	PTR(XK_MA_CONTROL_PAGE)(%rsi), %r11
+	orq	$__PAGE_KERNEL_EXEC, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/*
+	 * Identity map the control page at its machine address
+	 * in the transition page table.
+	 */
+	movq	PTR(XK_MA_CONTROL_PAGE)(%rsi), %r8
+
+	/* Get PGD address and PGD entry index. */
+	movq	PTR(XK_VA_PGD_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PGDIR_SHIFT, %r10
+	andq	$(PTRS_PER_PGD - 1), %r10
+
+	/* Fill PGD entry with PUD1 reference. */
+	movq	PTR(XK_MA_PUD1_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PUD1 address and PUD1 entry index. */
+	movq	PTR(XK_VA_PUD1_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PUD_SHIFT, %r10
+	andq	$(PTRS_PER_PUD - 1), %r10
+
+	/* Fill PUD1 entry with PMD1 reference. */
+	movq	PTR(XK_MA_PMD1_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PMD1 address and PMD1 entry index. */
+	movq	PTR(XK_VA_PMD1_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PMD_SHIFT, %r10
+	andq	$(PTRS_PER_PMD - 1), %r10
+
+	/* Fill PMD1 entry with PTE1 reference. */
+	movq	PTR(XK_MA_PTE1_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PTE1 address and PTE1 entry index. */
+	movq	PTR(XK_VA_PTE1_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PAGE_SHIFT, %r10
+	andq	$(PTRS_PER_PTE - 1), %r10
+
+	/* Fill PTE1 entry with control page reference. */
+	movq	PTR(XK_MA_CONTROL_PAGE)(%rsi), %r11
+	orq	$__PAGE_KERNEL_EXEC, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/*
+	 * Get the machine address of the control page now: page_list
+	 * is no longer mapped after the page table switch.
+	 */
+	movq	PTR(XK_MA_CONTROL_PAGE)(%rsi), %r8
+
+	/* Get the machine address of the identity page table now too. */
+	movq	PTR(XK_MA_TABLE_PAGE)(%rsi), %r9
+
+	/* Get the machine address of the transition page table now too. */
+	movq	PTR(XK_MA_PGD_PAGE)(%rsi), %r10
+
+	/* Switch to transition page table. */
+	movq	%r10, %cr3
+
+	/* Set up a new stack at the top of the control page (machine address). */
+	leaq	PAGE_SIZE(%r8), %rsp
+
+	/* Store start_address on the stack. */
+	pushq	%rdx
+
+	/* Jump to identity mapped page. */
+	addq	$(identity_mapped - xen_relocate_kernel), %r8
+	jmpq	*%r8
+
+identity_mapped:
+	/* Switch to identity page table. */
+	movq	%r9, %cr3
+
+	/*
+	 * Set %cr0 to a known state:
+	 *   - disable alignment check,
+	 *   - disable floating point emulation,
+	 *   - clear the task-switched flag,
+	 *   - disable write protect,
+	 *   - enable protected mode,
+	 *   - enable paging.
+	 */
+	movq	%cr0, %rax
+	andq	$~(X86_CR0_AM | X86_CR0_EM | X86_CR0_TS | X86_CR0_WP), %rax
+	orl	$(X86_CR0_PE | X86_CR0_PG), %eax
+	movq	%rax, %cr0
+
+	/*
+	 * Set %cr4 to a known state:
+	 *   - enable physical address extension.
+	 */
+	movq	$X86_CR4_PAE, %rax
+	movq	%rax, %cr4
+
+	jmp	1f
+
+1:
+	/* Flush the TLB (needed?). */
+	movq	%r9, %cr3
+
+	/* Do the copies. */
+	movq	%rdi, %rcx	/* Put the indirection_page in %rcx. */
+	xorq	%rdi, %rdi
+	xorq	%rsi, %rsi
+	jmp	1f
+
+0:
+	/*
+	 * Top of the loop: read the next quadword from the indirection page.
+	 * An indirection page is an array of entries, each tagged as a
+	 * destination, source, indirection or done entry. If the entries
+	 * do not all fit in one page, the last entry of the page points
+	 * to the next indirection page. Consecutive source pages are
+	 * copied to consecutive destination pages. Copying stops when the
+	 * done indicator is found.
+	 */
+	movq	(%rbx), %rcx
+	addq	$8, %rbx
+
+1:
+	testq	$0x1, %rcx	/* Is it a destination page? */
+	jz	2f
+
+	movq	%rcx, %rdi
+	andq	$PAGE_MASK, %rdi
+	jmp	0b
+
+2:
+	testq	$0x2, %rcx	/* Is it an indirection page? */
+	jz	2f
+
+	movq	%rcx, %rbx
+	andq	$PAGE_MASK, %rbx
+	jmp	0b
+
+2:
+	testq	$0x4, %rcx	/* Is it the done indicator? */
+	jz	2f
+	jmp	3f
+
+2:
+	testq	$0x8, %rcx	/* Is it the source indicator? */
+	jz	0b		/* Ignore it otherwise. */
+
+	movq	%rcx, %rsi
+	andq	$PAGE_MASK, %rsi
+	movq	$512, %rcx
+
+	/* Copy page. */
+	rep	movsq
+	jmp	0b
+
+3:
+	/*
+	 * To be certain of avoiding problems with self-modifying code
+	 * I need to execute a serializing instruction here, so I flush
+	 * the TLB by reloading %cr3: it is handy and not processor
+	 * dependent.
+	 */
+	movq	%cr3, %rax
+	movq	%rax, %cr3
+
+	/*
+	 * Set all of the registers to known values.
+	 * Leave %rsp alone.
+	 */
+	xorq	%rax, %rax
+	xorq	%rbx, %rbx
+	xorq	%rcx, %rcx
+	xorq	%rdx, %rdx
+	xorq	%rsi, %rsi
+	xorq	%rdi, %rdi
+	xorq	%rbp, %rbp
+	xorq	%r8, %r8
+	xorq	%r9, %r9
+	xorq	%r10, %r10
+	xorq	%r11, %r11
+	xorq	%r12, %r12
+	xorq	%r13, %r13
+	xorq	%r14, %r14
+	xorq	%r15, %r15
+
+	/* Jump to start_address. */
+	retq
+
+xen_kexec_control_code_size:
+	.long	. - xen_relocate_kernel
-- 
1.5.6.5
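At identity_mapped the stub puts %cr0 into a known state with one andq and one orl. The sketch below models that arithmetic in C so the resulting value can be checked by hand. The CR0_* macros are hypothetical local names whose bit positions are the architectural ones used by the kernel's X86_CR0_* definitions; known_cr0() is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural CR0 bits (same positions as the kernel's X86_CR0_*). */
#define CR0_PE (UINT64_C(1) << 0)	/* protected mode enable */
#define CR0_EM (UINT64_C(1) << 2)	/* x87 emulation */
#define CR0_TS (UINT64_C(1) << 3)	/* task switched */
#define CR0_WP (UINT64_C(1) << 16)	/* supervisor write protect */
#define CR0_AM (UINT64_C(1) << 18)	/* alignment mask */
#define CR0_PG (UINT64_C(1) << 31)	/* paging enable */

/* Model of the andq/orl sequence executed at identity_mapped. */
static uint64_t known_cr0(uint64_t cr0)
{
	cr0 &= ~(CR0_AM | CR0_EM | CR0_TS | CR0_WP);
	/*
	 * "orl ..., %eax" writes a 32-bit register, which zero-extends
	 * into %rax, so bits 63:32 end up clear.
	 */
	return (uint32_t)(cr0 | CR0_PE | CR0_PG);
}
```

For example, a typical running value of 0x80050033 (PG, AM, WP, NE, ET, MP and PE set) becomes 0x80000033: only AM and WP are cleared, and PE and PG were already set.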


* [PATCH v3 07/11] x86/xen: Add x86_64 kexec/kdump implementation
@ 2012-12-27  2:18               ` Daniel Kiper
  0 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2012-12-27  2:18 UTC (permalink / raw)
  To: andrew.cooper3-Sxgqhf6Nn4DQT0dZR+AlfA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	jbeulich-IBi9RG/b67k, konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA,
	maxim.uvarov-QHcLZuEGTsvQT0dZR+AlfA,
	mingo-H+wXaHxf7aLQT0dZR+AlfA, tglx-hfZtesqFncYOwBW4kG4KsQ,
	vgoyal-H+wXaHxf7aLQT0dZR+AlfA, x86-DgEjT+Ai2ygdnm+yROfE0A,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR
  Cc: Daniel Kiper

Add x86_64 kexec/kdump implementation.

Signed-off-by: Daniel Kiper <daniel.kiper-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 arch/x86/xen/machine_kexec_64.c   |  318 +++++++++++++++++++++++++++++++++++++
 arch/x86/xen/relocate_kernel_64.S |  309 +++++++++++++++++++++++++++++++++++
 2 files changed, 627 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/xen/machine_kexec_64.c
 create mode 100644 arch/x86/xen/relocate_kernel_64.S

diff --git a/arch/x86/xen/machine_kexec_64.c b/arch/x86/xen/machine_kexec_64.c
new file mode 100644
index 0000000..2600342
--- /dev/null
+++ b/arch/x86/xen/machine_kexec_64.c
@@ -0,0 +1,318 @@
+/*
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under Google Summer
+ * of Code 2011 program and Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+
+#include <xen/interface/memory.h>
+#include <xen/xen.h>
+
+#include <asm/xen/hypercall.h>
+#include <asm/xen/kexec.h>
+#include <asm/xen/page.h>
+
+#define __ma(vaddr)	(virt_to_machine(vaddr).maddr)
+
+static void init_level2_page(pmd_t *pmd, unsigned long addr)
+{
+	unsigned long end_addr = addr + PUD_SIZE;
+
+	while (addr < end_addr) {
+		native_set_pmd(pmd++, native_make_pmd(addr | __PAGE_KERNEL_LARGE_EXEC));
+		addr += PMD_SIZE;
+	}
+}
+
+static int init_level3_page(struct kimage *image, pud_t *pud,
+				unsigned long addr, unsigned long last_addr)
+{
+	pmd_t *pmd;
+	struct page *page;
+	unsigned long end_addr = addr + PGDIR_SIZE;
+
+	while ((addr < last_addr) && (addr < end_addr)) {
+		page = firmware_kimage_alloc_control_pages(image, 0);
+
+		if (!page)
+			return -ENOMEM;
+
+		pmd = page_address(page);
+		init_level2_page(pmd, addr);
+		native_set_pud(pud++, native_make_pud(__ma(pmd) | _KERNPG_TABLE));
+		addr += PUD_SIZE;
+	}
+
+	/* Clear the unused entries. */
+	while (addr < end_addr) {
+		native_pud_clear(pud++);
+		addr += PUD_SIZE;
+	}
+
+	return 0;
+}
+
+
+static int init_level4_page(struct kimage *image, pgd_t *pgd,
+				unsigned long addr, unsigned long last_addr)
+{
+	int rc;
+	pud_t *pud;
+	struct page *page;
+	unsigned long end_addr = addr + PTRS_PER_PGD * PGDIR_SIZE;
+
+	while ((addr < last_addr) && (addr < end_addr)) {
+		page = firmware_kimage_alloc_control_pages(image, 0);
+
+		if (!page)
+			return -ENOMEM;
+
+		pud = page_address(page);
+		rc = init_level3_page(image, pud, addr, last_addr);
+
+		if (rc)
+			return rc;
+
+		native_set_pgd(pgd++, native_make_pgd(__ma(pud) | _KERNPG_TABLE));
+		addr += PGDIR_SIZE;
+	}
+
+	/* Clear the unused entries. */
+	while (addr < end_addr) {
+		native_pgd_clear(pgd++);
+		addr += PGDIR_SIZE;
+	}
+
+	return 0;
+}
+
+static void free_transition_pgtable(struct kimage *image)
+{
+	free_page((unsigned long)image->arch.pgd);
+	free_page((unsigned long)image->arch.pud0);
+	free_page((unsigned long)image->arch.pud1);
+	free_page((unsigned long)image->arch.pmd0);
+	free_page((unsigned long)image->arch.pmd1);
+	free_page((unsigned long)image->arch.pte0);
+	free_page((unsigned long)image->arch.pte1);
+}
+
+static int alloc_transition_pgtable(struct kimage *image)
+{
+	image->arch.pgd = (pgd_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pgd)
+		goto err;
+
+	image->arch.pud0 = (pud_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pud0)
+		goto err;
+
+	image->arch.pud1 = (pud_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pud1)
+		goto err;
+
+	image->arch.pmd0 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pmd0)
+		goto err;
+
+	image->arch.pmd1 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pmd1)
+		goto err;
+
+	image->arch.pte0 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pte0)
+		goto err;
+
+	image->arch.pte1 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pte1)
+		goto err;
+
+	return 0;
+
+err:
+	free_transition_pgtable(image);
+
+	return -ENOMEM;
+}
+
+static int init_pgtable(struct kimage *image, pgd_t *pgd)
+{
+	int rc;
+	unsigned long max_mfn;
+
+	max_mfn = HYPERVISOR_memory_op(XENMEM_maximum_ram_page, NULL);
+
+	rc = init_level4_page(image, pgd, 0, PFN_PHYS(max_mfn));
+
+	if (rc)
+		return rc;
+
+	return alloc_transition_pgtable(image);
+}
+
+struct page *mf_kexec_kimage_alloc_pages(gfp_t gfp_mask,
+						unsigned int order,
+						unsigned long limit)
+{
+	struct page *pages;
+	unsigned int i;
+
+	pages = alloc_pages(gfp_mask, order);
+
+	if (!pages)
+		return NULL;
+
+	BUG_ON(PagePrivate(pages));
+
+	pages->mapping = NULL;
+	set_page_private(pages, order);
+
+	for (i = 0; i < (1 << order); ++i)
+		SetPageReserved(pages + i);
+
+	return pages;
+}
+
+void mf_kexec_kimage_free_pages(struct page *page)
+{
+	unsigned int i, order;
+
+	order = page_private(page);
+
+	for (i = 0; i < (1 << order); ++i)
+		ClearPageReserved(page + i);
+
+	__free_pages(page, order);
+}
+
+unsigned long mf_kexec_page_to_pfn(struct page *page)
+{
+	return pfn_to_mfn(page_to_pfn(page));
+}
+
+struct page *mf_kexec_pfn_to_page(unsigned long mfn)
+{
+	return pfn_to_page(mfn_to_pfn(mfn));
+}
+
+unsigned long mf_kexec_virt_to_phys(volatile void *address)
+{
+	return virt_to_machine(address).maddr;
+}
+
+void *mf_kexec_phys_to_virt(unsigned long address)
+{
+	return phys_to_virt(machine_to_phys(XMADDR(address)).paddr);
+}
+
+int mf_kexec_prepare(struct kimage *image)
+{
+#ifdef CONFIG_KEXEC_JUMP
+	if (image->preserve_context) {
+		pr_info_once("kexec: Context preservation is not "
+				"supported in Xen domains.\n");
+		return -ENOSYS;
+	}
+#endif
+
+	return init_pgtable(image, page_address(image->control_code_page));
+}
+
+int mf_kexec_load(struct kimage *image)
+{
+	void *control_page, *table_page;
+	struct xen_kexec_load xkl = {};
+
+	/* Image is unloaded, nothing to do. */
+	if (!image)
+		return 0;
+
+	table_page = page_address(image->control_code_page);
+	control_page = table_page + PAGE_SIZE;
+
+	memcpy(control_page, xen_relocate_kernel, xen_kexec_control_code_size);
+
+	xkl.type = image->type;
+	xkl.image.page_list[XK_MA_CONTROL_PAGE] = __ma(control_page);
+	xkl.image.page_list[XK_MA_TABLE_PAGE] = __ma(table_page);
+	xkl.image.page_list[XK_MA_PGD_PAGE] = __ma(image->arch.pgd);
+	xkl.image.page_list[XK_MA_PUD0_PAGE] = __ma(image->arch.pud0);
+	xkl.image.page_list[XK_MA_PUD1_PAGE] = __ma(image->arch.pud1);
+	xkl.image.page_list[XK_MA_PMD0_PAGE] = __ma(image->arch.pmd0);
+	xkl.image.page_list[XK_MA_PMD1_PAGE] = __ma(image->arch.pmd1);
+	xkl.image.page_list[XK_MA_PTE0_PAGE] = __ma(image->arch.pte0);
+	xkl.image.page_list[XK_MA_PTE1_PAGE] = __ma(image->arch.pte1);
+	xkl.image.indirection_page = image->head;
+	xkl.image.start_address = image->start;
+
+	return HYPERVISOR_kexec_op(KEXEC_CMD_kexec_load, &xkl);
+}
+
+void mf_kexec_cleanup(struct kimage *image)
+{
+	free_transition_pgtable(image);
+}
+
+void mf_kexec_unload(struct kimage *image)
+{
+	int rc;
+	struct xen_kexec_load xkl = {};
+
+	if (!image)
+		return;
+
+	xkl.type = image->type;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_unload, &xkl);
+
+	WARN(rc, "kexec: %s: HYPERVISOR_kexec_op(): %i\n", __func__, rc);
+}
+
+void mf_kexec_shutdown(void)
+{
+}
+
+void mf_kexec(struct kimage *image)
+{
+	int rc;
+	struct xen_kexec_exec xke = {};
+
+	xke.type = image->type;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec, &xke);
+
+	pr_emerg("kexec: %s: HYPERVISOR_kexec_op(): %i\n", __func__, rc);
+	BUG();
+}
diff --git a/arch/x86/xen/relocate_kernel_64.S b/arch/x86/xen/relocate_kernel_64.S
new file mode 100644
index 0000000..8f641f1
--- /dev/null
+++ b/arch/x86/xen/relocate_kernel_64.S
@@ -0,0 +1,309 @@
+/*
+ * Copyright (c) 2002-2005 Eric Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under Google Summer
+ * of Code 2011 program and Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <asm/page_types.h>
+#include <asm/pgtable_types.h>
+#include <asm/processor-flags.h>
+
+#include <asm/xen/kexec.h>
+
+#define PTR(x)	(x << 3)
+
+	.text
+	.code64
+	.globl	xen_kexec_control_code_size, xen_relocate_kernel
+
+xen_relocate_kernel:
+	/*
+	 * Must be relocatable PIC code callable as a C function.
+	 *
+	 * This function is called by Xen but here hypervisor is dead.
+	 * We are playing on bare metal.
+	 *
+	 * Every machine address passed to this function through
+	 * page_list (e.g. XK_MA_CONTROL_PAGE) is established
+	 * by dom0 during kexec load phase.
+	 *
+	 * Every virtual address passed to this function through page_list
+	 * (e.g. XK_VA_CONTROL_PAGE) is established by hypervisor during
+	 * HYPERVISOR_kexec_op(KEXEC_CMD_kexec_load) hypercall.
+	 *
+	 * %rdi - indirection_page,
+	 * %rsi - page_list,
+	 * %rdx - start_address,
+	 * %ecx - preserve_context (ignored).
+	 */
+
+	/* Zero out flags, and disable interrupts. */
+	pushq	$0
+	popfq
+
+	/*
+	 * Map the control page at its virtual address
+	 * in transition page table.
+	 */
+	movq	PTR(XK_VA_CONTROL_PAGE)(%rsi), %r8
+
+	/* Get PGD address and PGD entry index. */
+	movq	PTR(XK_VA_PGD_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PGDIR_SHIFT, %r10
+	andq	$(PTRS_PER_PGD - 1), %r10
+
+	/* Fill PGD entry with PUD0 reference. */
+	movq	PTR(XK_MA_PUD0_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PUD0 address and PUD0 entry index. */
+	movq	PTR(XK_VA_PUD0_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PUD_SHIFT, %r10
+	andq	$(PTRS_PER_PUD - 1), %r10
+
+	/* Fill PUD0 entry with PMD0 reference. */
+	movq	PTR(XK_MA_PMD0_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PMD0 address and PMD0 entry index. */
+	movq	PTR(XK_VA_PMD0_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PMD_SHIFT, %r10
+	andq	$(PTRS_PER_PMD - 1), %r10
+
+	/* Fill PMD0 entry with PTE0 reference. */
+	movq	PTR(XK_MA_PTE0_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PTE0 address and PTE0 entry index. */
+	movq	PTR(XK_VA_PTE0_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PAGE_SHIFT, %r10
+	andq	$(PTRS_PER_PTE - 1), %r10
+
+	/* Fill PTE0 entry with control page reference. */
+	movq	PTR(XK_MA_CONTROL_PAGE)(%rsi), %r11
+	orq	$__PAGE_KERNEL_EXEC, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/*
+	 * Identity map the control page at its machine address
+	 * in transition page table.
+	 */
+	movq	PTR(XK_MA_CONTROL_PAGE)(%rsi), %r8
+
+	/* Get PGD address and PGD entry index. */
+	movq	PTR(XK_VA_PGD_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PGDIR_SHIFT, %r10
+	andq	$(PTRS_PER_PGD - 1), %r10
+
+	/* Fill PGD entry with PUD1 reference. */
+	movq	PTR(XK_MA_PUD1_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PUD1 address and PUD1 entry index. */
+	movq	PTR(XK_VA_PUD1_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PUD_SHIFT, %r10
+	andq	$(PTRS_PER_PUD - 1), %r10
+
+	/* Fill PUD1 entry with PMD1 reference. */
+	movq	PTR(XK_MA_PMD1_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PMD1 address and PMD1 entry index. */
+	movq	PTR(XK_VA_PMD1_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PMD_SHIFT, %r10
+	andq	$(PTRS_PER_PMD - 1), %r10
+
+	/* Fill PMD1 entry with PTE1 reference. */
+	movq	PTR(XK_MA_PTE1_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PTE1 address and PTE1 entry index. */
+	movq	PTR(XK_VA_PTE1_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PAGE_SHIFT, %r10
+	andq	$(PTRS_PER_PTE - 1), %r10
+
+	/* Fill PTE1 entry with control page reference. */
+	movq	PTR(XK_MA_CONTROL_PAGE)(%rsi), %r11
+	orq	$__PAGE_KERNEL_EXEC, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/*
+	 * Get machine address of control page now.
+	 * This is impossible after page table switch.
+	 */
+	movq	PTR(XK_MA_CONTROL_PAGE)(%rsi), %r8
+
+	/* Get machine address of identity page table now too. */
+	movq	PTR(XK_MA_TABLE_PAGE)(%rsi), %r9
+
+	/* Get machine address of transition page table now too. */
+	movq	PTR(XK_MA_PGD_PAGE)(%rsi), %r10
+
+	/* Switch to transition page table. */
+	movq	%r10, %cr3
+
+	/* Setup a new stack at the end of machine address of control page. */
+	leaq	PAGE_SIZE(%r8), %rsp
+
+	/* Store start_address on the stack. */
+	pushq   %rdx
+
+	/* Jump to identity mapped page. */
+	addq	$(identity_mapped - xen_relocate_kernel), %r8
+	jmpq	*%r8
+
+identity_mapped:
+	/* Switch to identity page table. */
+	movq	%r9, %cr3
+
+	/*
+	 * Set %cr0 to a known state:
+	 *   - disable alignment check,
+	 *   - disable floating point emulation,
+	 *   - no task switch,
+	 *   - disable write protect,
+	 *   - enable protected mode,
+	 *   - enable paging.
+	 */
+	movq	%cr0, %rax
+	andq	$~(X86_CR0_AM | X86_CR0_EM | X86_CR0_TS | X86_CR0_WP), %rax
+	orl	$(X86_CR0_PE | X86_CR0_PG), %eax
+	movq	%rax, %cr0
+
+	/*
+	 * Set %cr4 to a known state:
+	 *   - enable physical address extension.
+	 */
+	movq	$X86_CR4_PAE, %rax
+	movq	%rax, %cr4
+
+	jmp	1f
+
+1:
+	/* Flush the TLB (needed?). */
+	movq	%r9, %cr3
+
+	/* Do the copies. */
+	movq	%rdi, %rcx	/* Put the indirection_page in %rcx. */
+	xorq	%rdi, %rdi
+	xorq	%rsi, %rsi
+	jmp	1f
+
+0:
+	/*
+	 * Top: read another quadword from the indirection page.
+	 * The indirection page is an array of entries tagged with
+	 * destination, source, indirection or done flags. If all
+	 * entries do not fit in one page, the last entry of the
+	 * given indirection page points to the next one.
+	 * Copying stops when the done indicator is found in the
+	 * indirection page.
+	 */
+	movq	(%rbx), %rcx
+	addq	$8, %rbx
+
+1:
+	testq	$0x1, %rcx	/* Is it a destination page? */
+	jz	2f
+
+	movq	%rcx, %rdi
+	andq	$PAGE_MASK, %rdi
+	jmp	0b
+
+2:
+	testq	$0x2, %rcx	/* Is it an indirection page? */
+	jz	2f
+
+	movq	%rcx, %rbx
+	andq	$PAGE_MASK, %rbx
+	jmp	0b
+
+2:
+	testq	$0x4, %rcx	/* Is it the done indicator? */
+	jz	2f
+	jmp	3f
+
+2:
+	testq	$0x8, %rcx	/* Is it the source indicator? */
+	jz	0b		/* Ignore it otherwise. */
+
+	movq	%rcx, %rsi
+	andq	$PAGE_MASK, %rsi
+	movq	$512, %rcx
+
+	/* Copy page. */
+	rep	movsq
+	jmp	0b
+
+3:
+	/*
+	 * To be certain of avoiding problems with self-modifying code
+	 * I need to execute a serializing instruction here.
+	 * So I flush the TLB by reloading %cr3 here, it's handy,
+	 * and not processor dependent.
+	 */
+	movq	%cr3, %rax
+	movq	%rax, %cr3
+
+	/*
+	 * Set all of the registers to known values.
+	 * Leave %rsp alone.
+	 */
+	xorq	%rax, %rax
+	xorq	%rbx, %rbx
+	xorq	%rcx, %rcx
+	xorq	%rdx, %rdx
+	xorq	%rsi, %rsi
+	xorq	%rdi, %rdi
+	xorq	%rbp, %rbp
+	xorq	%r8, %r8
+	xorq	%r9, %r9
+	xorq	%r10, %r10
+	xorq	%r11, %r11
+	xorq	%r12, %r12
+	xorq	%r13, %r13
+	xorq	%r14, %r14
+	xorq	%r15, %r15
+
+	/* Jump to start_address. */
+	retq
+
+xen_kexec_control_code_size:
+	.long	. - xen_relocate_kernel
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 217+ messages in thread

* [PATCH v3 07/11] x86/xen: Add x86_64 kexec/kdump implementation
@ 2012-12-27  2:18               ` Daniel Kiper
  0 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2012-12-27  2:18 UTC (permalink / raw)
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Add x86_64 kexec/kdump implementation.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 arch/x86/xen/machine_kexec_64.c   |  318 +++++++++++++++++++++++++++++++++++++
 arch/x86/xen/relocate_kernel_64.S |  309 +++++++++++++++++++++++++++++++++++
 2 files changed, 627 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/xen/machine_kexec_64.c
 create mode 100644 arch/x86/xen/relocate_kernel_64.S

diff --git a/arch/x86/xen/machine_kexec_64.c b/arch/x86/xen/machine_kexec_64.c
new file mode 100644
index 0000000..2600342
--- /dev/null
+++ b/arch/x86/xen/machine_kexec_64.c
@@ -0,0 +1,318 @@
+/*
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under the Google Summer
+ * of Code 2011 program, and by Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+
+#include <xen/interface/memory.h>
+#include <xen/xen.h>
+
+#include <asm/xen/hypercall.h>
+#include <asm/xen/kexec.h>
+#include <asm/xen/page.h>
+
+#define __ma(vaddr)	(virt_to_machine(vaddr).maddr)
+
+static void init_level2_page(pmd_t *pmd, unsigned long addr)
+{
+	unsigned long end_addr = addr + PUD_SIZE;
+
+	while (addr < end_addr) {
+		native_set_pmd(pmd++, native_make_pmd(addr | __PAGE_KERNEL_LARGE_EXEC));
+		addr += PMD_SIZE;
+	}
+}
+
+static int init_level3_page(struct kimage *image, pud_t *pud,
+				unsigned long addr, unsigned long last_addr)
+{
+	pmd_t *pmd;
+	struct page *page;
+	unsigned long end_addr = addr + PGDIR_SIZE;
+
+	while ((addr < last_addr) && (addr < end_addr)) {
+		page = firmware_kimage_alloc_control_pages(image, 0);
+
+		if (!page)
+			return -ENOMEM;
+
+		pmd = page_address(page);
+		init_level2_page(pmd, addr);
+		native_set_pud(pud++, native_make_pud(__ma(pmd) | _KERNPG_TABLE));
+		addr += PUD_SIZE;
+	}
+
+	/* Clear the unused entries. */
+	while (addr < end_addr) {
+		native_pud_clear(pud++);
+		addr += PUD_SIZE;
+	}
+
+	return 0;
+}
+
+
+static int init_level4_page(struct kimage *image, pgd_t *pgd,
+				unsigned long addr, unsigned long last_addr)
+{
+	int rc;
+	pud_t *pud;
+	struct page *page;
+	unsigned long end_addr = addr + PTRS_PER_PGD * PGDIR_SIZE;
+
+	while ((addr < last_addr) && (addr < end_addr)) {
+		page = firmware_kimage_alloc_control_pages(image, 0);
+
+		if (!page)
+			return -ENOMEM;
+
+		pud = page_address(page);
+		rc = init_level3_page(image, pud, addr, last_addr);
+
+		if (rc)
+			return rc;
+
+		native_set_pgd(pgd++, native_make_pgd(__ma(pud) | _KERNPG_TABLE));
+		addr += PGDIR_SIZE;
+	}
+
+	/* Clear the unused entries. */
+	while (addr < end_addr) {
+		native_pgd_clear(pgd++);
+		addr += PGDIR_SIZE;
+	}
+
+	return 0;
+}
+
+static void free_transition_pgtable(struct kimage *image)
+{
+	free_page((unsigned long)image->arch.pgd);
+	free_page((unsigned long)image->arch.pud0);
+	free_page((unsigned long)image->arch.pud1);
+	free_page((unsigned long)image->arch.pmd0);
+	free_page((unsigned long)image->arch.pmd1);
+	free_page((unsigned long)image->arch.pte0);
+	free_page((unsigned long)image->arch.pte1);
+}
+
+static int alloc_transition_pgtable(struct kimage *image)
+{
+	image->arch.pgd = (pgd_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pgd)
+		goto err;
+
+	image->arch.pud0 = (pud_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pud0)
+		goto err;
+
+	image->arch.pud1 = (pud_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pud1)
+		goto err;
+
+	image->arch.pmd0 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pmd0)
+		goto err;
+
+	image->arch.pmd1 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pmd1)
+		goto err;
+
+	image->arch.pte0 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pte0)
+		goto err;
+
+	image->arch.pte1 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+
+	if (!image->arch.pte1)
+		goto err;
+
+	return 0;
+
+err:
+	free_transition_pgtable(image);
+
+	return -ENOMEM;
+}
+
+static int init_pgtable(struct kimage *image, pgd_t *pgd)
+{
+	int rc;
+	unsigned long max_mfn;
+
+	max_mfn = HYPERVISOR_memory_op(XENMEM_maximum_ram_page, NULL);
+
+	rc = init_level4_page(image, pgd, 0, PFN_PHYS(max_mfn));
+
+	if (rc)
+		return rc;
+
+	return alloc_transition_pgtable(image);
+}
+
+struct page *mf_kexec_kimage_alloc_pages(gfp_t gfp_mask,
+						unsigned int order,
+						unsigned long limit)
+{
+	struct page *pages;
+	unsigned int i;
+
+	pages = alloc_pages(gfp_mask, order);
+
+	if (!pages)
+		return NULL;
+
+	BUG_ON(PagePrivate(pages));
+
+	pages->mapping = NULL;
+	set_page_private(pages, order);
+
+	for (i = 0; i < (1 << order); ++i)
+		SetPageReserved(pages + i);
+
+	return pages;
+}
+
+void mf_kexec_kimage_free_pages(struct page *page)
+{
+	unsigned int i, order;
+
+	order = page_private(page);
+
+	for (i = 0; i < (1 << order); ++i)
+		ClearPageReserved(page + i);
+
+	__free_pages(page, order);
+}
+
+unsigned long mf_kexec_page_to_pfn(struct page *page)
+{
+	return pfn_to_mfn(page_to_pfn(page));
+}
+
+struct page *mf_kexec_pfn_to_page(unsigned long mfn)
+{
+	return pfn_to_page(mfn_to_pfn(mfn));
+}
+
+unsigned long mf_kexec_virt_to_phys(volatile void *address)
+{
+	return virt_to_machine(address).maddr;
+}
+
+void *mf_kexec_phys_to_virt(unsigned long address)
+{
+	return phys_to_virt(machine_to_phys(XMADDR(address)).paddr);
+}
+
+int mf_kexec_prepare(struct kimage *image)
+{
+#ifdef CONFIG_KEXEC_JUMP
+	if (image->preserve_context) {
+		pr_info_once("kexec: Context preservation is not "
+				"supported in Xen domains.\n");
+		return -ENOSYS;
+	}
+#endif
+
+	return init_pgtable(image, page_address(image->control_code_page));
+}
+
+int mf_kexec_load(struct kimage *image)
+{
+	void *control_page, *table_page;
+	struct xen_kexec_load xkl = {};
+
+	/* Image is unloaded, nothing to do. */
+	if (!image)
+		return 0;
+
+	table_page = page_address(image->control_code_page);
+	control_page = table_page + PAGE_SIZE;
+
+	memcpy(control_page, xen_relocate_kernel, xen_kexec_control_code_size);
+
+	xkl.type = image->type;
+	xkl.image.page_list[XK_MA_CONTROL_PAGE] = __ma(control_page);
+	xkl.image.page_list[XK_MA_TABLE_PAGE] = __ma(table_page);
+	xkl.image.page_list[XK_MA_PGD_PAGE] = __ma(image->arch.pgd);
+	xkl.image.page_list[XK_MA_PUD0_PAGE] = __ma(image->arch.pud0);
+	xkl.image.page_list[XK_MA_PUD1_PAGE] = __ma(image->arch.pud1);
+	xkl.image.page_list[XK_MA_PMD0_PAGE] = __ma(image->arch.pmd0);
+	xkl.image.page_list[XK_MA_PMD1_PAGE] = __ma(image->arch.pmd1);
+	xkl.image.page_list[XK_MA_PTE0_PAGE] = __ma(image->arch.pte0);
+	xkl.image.page_list[XK_MA_PTE1_PAGE] = __ma(image->arch.pte1);
+	xkl.image.indirection_page = image->head;
+	xkl.image.start_address = image->start;
+
+	return HYPERVISOR_kexec_op(KEXEC_CMD_kexec_load, &xkl);
+}
+
+void mf_kexec_cleanup(struct kimage *image)
+{
+	free_transition_pgtable(image);
+}
+
+void mf_kexec_unload(struct kimage *image)
+{
+	int rc;
+	struct xen_kexec_load xkl = {};
+
+	if (!image)
+		return;
+
+	xkl.type = image->type;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_unload, &xkl);
+
+	WARN(rc, "kexec: %s: HYPERVISOR_kexec_op(): %i\n", __func__, rc);
+}
+
+void mf_kexec_shutdown(void)
+{
+}
+
+void mf_kexec(struct kimage *image)
+{
+	int rc;
+	struct xen_kexec_exec xke = {};
+
+	xke.type = image->type;
+	rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec, &xke);
+
+	pr_emerg("kexec: %s: HYPERVISOR_kexec_op(): %i\n", __func__, rc);
+	BUG();
+}
diff --git a/arch/x86/xen/relocate_kernel_64.S b/arch/x86/xen/relocate_kernel_64.S
new file mode 100644
index 0000000..8f641f1
--- /dev/null
+++ b/arch/x86/xen/relocate_kernel_64.S
@@ -0,0 +1,309 @@
+/*
+ * Copyright (c) 2002-2005 Eric Biederman <ebiederm@xmission.com>
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under the Google Summer
+ * of Code 2011 program, and by Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <asm/page_types.h>
+#include <asm/pgtable_types.h>
+#include <asm/processor-flags.h>
+
+#include <asm/xen/kexec.h>
+
+#define PTR(x)	((x) << 3)
+
+	.text
+	.code64
+	.globl	xen_kexec_control_code_size, xen_relocate_kernel
+
+xen_relocate_kernel:
+	/*
+	 * Must be relocatable PIC code callable as a C function.
+	 *
+	 * This function is called by Xen, but at this point the hypervisor
+	 * is dead and we are running on bare metal.
+	 *
+	 * Every machine address passed to this function through
+	 * page_list (e.g. XK_MA_CONTROL_PAGE) is established
+	 * by dom0 during the kexec load phase.
+	 *
+	 * Every virtual address passed to this function through page_list
+	 * (e.g. XK_VA_CONTROL_PAGE) is established by the hypervisor during
+	 * the HYPERVISOR_kexec_op(KEXEC_CMD_kexec_load) hypercall.
+	 *
+	 * %rdi - indirection_page,
+	 * %rsi - page_list,
+	 * %rdx - start_address,
+	 * %ecx - preserve_context (ignored).
+	 */
+
+	/* Zero out flags, and disable interrupts. */
+	pushq	$0
+	popfq
+
+	/*
+	 * Map the control page at its virtual address
+	 * in the transition page table.
+	 */
+	movq	PTR(XK_VA_CONTROL_PAGE)(%rsi), %r8
+
+	/* Get PGD address and PGD entry index. */
+	movq	PTR(XK_VA_PGD_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PGDIR_SHIFT, %r10
+	andq	$(PTRS_PER_PGD - 1), %r10
+
+	/* Fill PGD entry with PUD0 reference. */
+	movq	PTR(XK_MA_PUD0_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PUD0 address and PUD0 entry index. */
+	movq	PTR(XK_VA_PUD0_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PUD_SHIFT, %r10
+	andq	$(PTRS_PER_PUD - 1), %r10
+
+	/* Fill PUD0 entry with PMD0 reference. */
+	movq	PTR(XK_MA_PMD0_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PMD0 address and PMD0 entry index. */
+	movq	PTR(XK_VA_PMD0_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PMD_SHIFT, %r10
+	andq	$(PTRS_PER_PMD - 1), %r10
+
+	/* Fill PMD0 entry with PTE0 reference. */
+	movq	PTR(XK_MA_PTE0_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PTE0 address and PTE0 entry index. */
+	movq	PTR(XK_VA_PTE0_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PAGE_SHIFT, %r10
+	andq	$(PTRS_PER_PTE - 1), %r10
+
+	/* Fill PTE0 entry with control page reference. */
+	movq	PTR(XK_MA_CONTROL_PAGE)(%rsi), %r11
+	orq	$__PAGE_KERNEL_EXEC, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/*
+	 * Identity map the control page at its machine address
+	 * in the transition page table.
+	 */
+	movq	PTR(XK_MA_CONTROL_PAGE)(%rsi), %r8
+
+	/* Get PGD address and PGD entry index. */
+	movq	PTR(XK_VA_PGD_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PGDIR_SHIFT, %r10
+	andq	$(PTRS_PER_PGD - 1), %r10
+
+	/* Fill PGD entry with PUD1 reference. */
+	movq	PTR(XK_MA_PUD1_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PUD1 address and PUD1 entry index. */
+	movq	PTR(XK_VA_PUD1_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PUD_SHIFT, %r10
+	andq	$(PTRS_PER_PUD - 1), %r10
+
+	/* Fill PUD1 entry with PMD1 reference. */
+	movq	PTR(XK_MA_PMD1_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PMD1 address and PMD1 entry index. */
+	movq	PTR(XK_VA_PMD1_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PMD_SHIFT, %r10
+	andq	$(PTRS_PER_PMD - 1), %r10
+
+	/* Fill PMD1 entry with PTE1 reference. */
+	movq	PTR(XK_MA_PTE1_PAGE)(%rsi), %r11
+	orq	$_KERNPG_TABLE, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/* Get PTE1 address and PTE1 entry index. */
+	movq	PTR(XK_VA_PTE1_PAGE)(%rsi), %r9
+	movq	%r8, %r10
+	shrq	$PAGE_SHIFT, %r10
+	andq	$(PTRS_PER_PTE - 1), %r10
+
+	/* Fill PTE1 entry with control page reference. */
+	movq	PTR(XK_MA_CONTROL_PAGE)(%rsi), %r11
+	orq	$__PAGE_KERNEL_EXEC, %r11
+	movq	%r11, (%r9, %r10, 8)
+
+	/*
+	 * Get the machine address of the control page now.
+	 * This is impossible after the page table switch.
+	 */
+	movq	PTR(XK_MA_CONTROL_PAGE)(%rsi), %r8
+
+	/* Get the machine address of the identity page table now too. */
+	movq	PTR(XK_MA_TABLE_PAGE)(%rsi), %r9
+
+	/* Get the machine address of the transition page table now too. */
+	movq	PTR(XK_MA_PGD_PAGE)(%rsi), %r10
+
+	/* Switch to transition page table. */
+	movq	%r10, %cr3
+
+	/* Set up a new stack at the end of the identity mapped control page. */
+	leaq	PAGE_SIZE(%r8), %rsp
+
+	/* Store start_address on the stack. */
+	pushq	%rdx
+
+	/* Jump to identity mapped page. */
+	addq	$(identity_mapped - xen_relocate_kernel), %r8
+	jmpq	*%r8
+
+identity_mapped:
+	/* Switch to identity page table. */
+	movq	%r9, %cr3
+
+	/*
+	 * Set %cr0 to a known state:
+	 *   - disable alignment check,
+	 *   - disable floating point emulation,
+	 *   - no task switch,
+	 *   - disable write protect,
+	 *   - enable protected mode,
+	 *   - enable paging.
+	 */
+	movq	%cr0, %rax
+	andq	$~(X86_CR0_AM | X86_CR0_EM | X86_CR0_TS | X86_CR0_WP), %rax
+	orl	$(X86_CR0_PE | X86_CR0_PG), %eax
+	movq	%rax, %cr0
+
+	/*
+	 * Set %cr4 to a known state:
+	 *   - enable physical address extension.
+	 */
+	movq	$X86_CR4_PAE, %rax
+	movq	%rax, %cr4
+
+	jmp	1f
+
+1:
+	/* Flush the TLB (needed?). */
+	movq	%r9, %cr3
+
+	/* Do the copies. */
+	movq	%rdi, %rcx	/* Put the indirection_page in %rcx. */
+	xorq	%rdi, %rdi
+	xorq	%rsi, %rsi
+	jmp	1f
+
+0:
+	/*
+	 * Top: read another quadword from the indirection page.
+	 * The indirection page is an array of entries tagged with
+	 * destination, source, indirection or done flags. If all
+	 * entries do not fit in one page, the last entry of the
+	 * given indirection page points to the next one.
+	 * Copying stops when the done indicator is found in the
+	 * indirection page.
+	 */
+	movq	(%rbx), %rcx
+	addq	$8, %rbx
+
+1:
+	testq	$0x1, %rcx	/* Is it a destination page? */
+	jz	2f
+
+	movq	%rcx, %rdi
+	andq	$PAGE_MASK, %rdi
+	jmp	0b
+
+2:
+	testq	$0x2, %rcx	/* Is it an indirection page? */
+	jz	2f
+
+	movq	%rcx, %rbx
+	andq	$PAGE_MASK, %rbx
+	jmp	0b
+
+2:
+	testq	$0x4, %rcx	/* Is it the done indicator? */
+	jz	2f
+	jmp	3f
+
+2:
+	testq	$0x8, %rcx	/* Is it the source indicator? */
+	jz	0b		/* Ignore it otherwise. */
+
+	movq	%rcx, %rsi
+	andq	$PAGE_MASK, %rsi
+	movq	$512, %rcx
+
+	/* Copy page. */
+	rep	movsq
+	jmp	0b
+
+3:
+	/*
+	 * To be certain of avoiding problems with self-modifying code
+	 * I need to execute a serializing instruction here.
+	 * So I flush the TLB by reloading %cr3 here, it's handy,
+	 * and not processor dependent.
+	 */
+	movq	%cr3, %rax
+	movq	%rax, %cr3
+
+	/*
+	 * Set all of the registers to known values.
+	 * Leave %rsp alone.
+	 */
+	xorq	%rax, %rax
+	xorq	%rbx, %rbx
+	xorq	%rcx, %rcx
+	xorq	%rdx, %rdx
+	xorq	%rsi, %rsi
+	xorq	%rdi, %rdi
+	xorq	%rbp, %rbp
+	xorq	%r8, %r8
+	xorq	%r9, %r9
+	xorq	%r10, %r10
+	xorq	%r11, %r11
+	xorq	%r12, %r12
+	xorq	%r13, %r13
+	xorq	%r14, %r14
+	xorq	%r15, %r15
+
+	/* Jump to start_address. */
+	retq
+
+xen_kexec_control_code_size:
+	.long	. - xen_relocate_kernel
-- 
1.5.6.5


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 217+ messages in thread

* [PATCH v3 08/11] x86/xen: Add kexec/kdump Kconfig and makefile rules
@ 2012-12-27  2:18                 ` Daniel Kiper
  0 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2012-12-27  2:18 UTC (permalink / raw)
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Add kexec/kdump Kconfig and makefile rules.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 arch/x86/Kconfig      |    3 +++
 arch/x86/xen/Kconfig  |    1 +
 arch/x86/xen/Makefile |    3 +++
 3 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 79795af..e2746c4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1600,6 +1600,9 @@ config KEXEC_JUMP
 	  Jump between original kernel and kexeced kernel and invoke
 	  code in physical address mode via KEXEC
 
+config KEXEC_FIRMWARE
+	def_bool n
+
 config PHYSICAL_START
 	hex "Physical address where the kernel is loaded" if (EXPERT || CRASH_DUMP)
 	default "0x1000000"
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 131dacd..8469c1c 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -7,6 +7,7 @@ config XEN
 	select PARAVIRT
 	select PARAVIRT_CLOCK
 	select XEN_HAVE_PVMMU
+	select KEXEC_FIRMWARE if KEXEC
 	depends on X86_64 || (X86_32 && X86_PAE && !X86_VISWS)
 	depends on X86_TSC
 	help
diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index 96ab2c0..99952d7 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -22,3 +22,6 @@ obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o
 obj-$(CONFIG_XEN_DEBUG_FS)	+= debugfs.o
 obj-$(CONFIG_XEN_DOM0)		+= apic.o vga.o
 obj-$(CONFIG_SWIOTLB_XEN)	+= pci-swiotlb-xen.o
+obj-$(CONFIG_KEXEC_FIRMWARE)	+= kexec.o
+obj-$(CONFIG_KEXEC_FIRMWARE)	+= machine_kexec_$(BITS).o
+obj-$(CONFIG_KEXEC_FIRMWARE)	+= relocate_kernel_$(BITS).o
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 217+ messages in thread

* [PATCH v3 09/11] x86/xen/enlighten: Add init and crash kexec/kdump hooks
@ 2012-12-27  2:18                   ` Daniel Kiper
  0 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2012-12-27  2:18 UTC (permalink / raw)
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Add init and crash kexec/kdump hooks.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 arch/x86/xen/enlighten.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 138e566..5025bba 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -31,6 +31,7 @@
 #include <linux/pci.h>
 #include <linux/gfp.h>
 #include <linux/memblock.h>
+#include <linux/kexec.h>
 
 #include <xen/xen.h>
 #include <xen/events.h>
@@ -1276,6 +1277,12 @@ static void xen_machine_power_off(void)
 
 static void xen_crash_shutdown(struct pt_regs *regs)
 {
+#ifdef CONFIG_KEXEC_FIRMWARE
+	if (kexec_crash_image) {
+		crash_save_cpu(regs, safe_smp_processor_id());
+		return;
+	}
+#endif
 	xen_reboot(SHUTDOWN_crash);
 }
 
@@ -1353,6 +1360,10 @@ asmlinkage void __init xen_start_kernel(void)
 
 	xen_init_mmu_ops();
 
+#ifdef CONFIG_KEXEC_FIRMWARE
+	kexec_use_firmware = true;
+#endif
+
 	/* Prevent unwanted bits from being set in PTEs. */
 	__supported_pte_mask &= ~_PAGE_GLOBAL;
 #if 0
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 217+ messages in thread

* [PATCH v3 09/11] x86/xen/enlighten: Add init and crash kexec/kdump hooks
@ 2012-12-27  2:18                   ` Daniel Kiper
  0 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2012-12-27  2:18 UTC (permalink / raw)
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Add init and crash kexec/kdump hooks.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 arch/x86/xen/enlighten.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 138e566..5025bba 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -31,6 +31,7 @@
 #include <linux/pci.h>
 #include <linux/gfp.h>
 #include <linux/memblock.h>
+#include <linux/kexec.h>
 
 #include <xen/xen.h>
 #include <xen/events.h>
@@ -1276,6 +1277,12 @@ static void xen_machine_power_off(void)
 
 static void xen_crash_shutdown(struct pt_regs *regs)
 {
+#ifdef CONFIG_KEXEC_FIRMWARE
+	if (kexec_crash_image) {
+		crash_save_cpu(regs, safe_smp_processor_id());
+		return;
+	}
+#endif
 	xen_reboot(SHUTDOWN_crash);
 }
 
@@ -1353,6 +1360,10 @@ asmlinkage void __init xen_start_kernel(void)
 
 	xen_init_mmu_ops();
 
+#ifdef CONFIG_KEXEC_FIRMWARE
+	kexec_use_firmware = true;
+#endif
+
 	/* Prevent unwanted bits from being set in PTEs. */
 	__supported_pte_mask &= ~_PAGE_GLOBAL;
 #if 0
-- 
1.5.6.5


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 217+ messages in thread

* [PATCH v3 10/11] drivers/xen: Export vmcoreinfo through sysfs
@ 2012-12-27  2:18                     ` Daniel Kiper
  0 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2012-12-27  2:18 UTC (permalink / raw)
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Export vmcoreinfo through sysfs.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 drivers/xen/sys-hypervisor.c |   42 +++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 41 insertions(+), 1 deletions(-)

diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
index 96453f8..9dd290c 100644
--- a/drivers/xen/sys-hypervisor.c
+++ b/drivers/xen/sys-hypervisor.c
@@ -368,6 +368,41 @@ static void xen_properties_destroy(void)
 	sysfs_remove_group(hypervisor_kobj, &xen_properties_group);
 }
 
+#ifdef CONFIG_KEXEC_FIRMWARE
+static ssize_t vmcoreinfo_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+	return sprintf(buffer, "%lx %lx\n", xen_vmcoreinfo_maddr,
+						xen_vmcoreinfo_max_size);
+}
+
+HYPERVISOR_ATTR_RO(vmcoreinfo);
+
+static int __init xen_vmcoreinfo_init(void)
+{
+	if (!xen_vmcoreinfo_max_size)
+		return 0;
+
+	return sysfs_create_file(hypervisor_kobj, &vmcoreinfo_attr.attr);
+}
+
+static void xen_vmcoreinfo_destroy(void)
+{
+	if (!xen_vmcoreinfo_max_size)
+		return;
+
+	sysfs_remove_file(hypervisor_kobj, &vmcoreinfo_attr.attr);
+}
+#else
+static int __init xen_vmcoreinfo_init(void)
+{
+	return 0;
+}
+
+static void xen_vmcoreinfo_destroy(void)
+{
+}
+#endif
+
 static int __init hyper_sysfs_init(void)
 {
 	int ret;
@@ -390,9 +425,14 @@ static int __init hyper_sysfs_init(void)
 	ret = xen_properties_init();
 	if (ret)
 		goto prop_out;
+	ret = xen_vmcoreinfo_init();
+	if (ret)
+		goto vmcoreinfo_out;
 
 	goto out;
 
+vmcoreinfo_out:
+	xen_properties_destroy();
 prop_out:
 	xen_sysfs_uuid_destroy();
 uuid_out:
@@ -407,12 +447,12 @@ out:
 
 static void __exit hyper_sysfs_exit(void)
 {
+	xen_vmcoreinfo_destroy();
 	xen_properties_destroy();
 	xen_compilation_destroy();
 	xen_sysfs_uuid_destroy();
 	xen_sysfs_version_destroy();
 	xen_sysfs_type_destroy();
-
 }
 module_init(hyper_sysfs_init);
 module_exit(hyper_sysfs_exit);
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 217+ messages in thread
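
The new attribute prints two hexadecimal values separated by a space ("%lx %lx\n"): xen_vmcoreinfo_maddr followed by xen_vmcoreinfo_max_size. It is only created when the size is non-zero. Because it is registered on hypervisor_kobj, it should appear as /sys/hypervisor/vmcoreinfo. That path, the struct and the helper names below are assumptions made for illustration. A minimal user-space reader for the format might look like this:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Location of the hypervisor's vmcoreinfo note, as printed by vmcoreinfo_show(). */
struct xen_vmcoreinfo_loc {
	unsigned long long maddr;     /* machine address (xen_vmcoreinfo_maddr) */
	unsigned long long max_size;  /* xen_vmcoreinfo_max_size */
};

/* Parse "<hex maddr> <hex max_size>\n". Returns 0 on success, -1 otherwise. */
int parse_vmcoreinfo(const char *text, struct xen_vmcoreinfo_loc *out)
{
	char *end;

	assert(text != NULL && out != NULL);

	errno = 0;
	out->maddr = strtoull(text, &end, 16);
	if (end == text || *end != ' ' || errno != 0)
		return -1;

	text = end + 1;
	out->max_size = strtoull(text, &end, 16);
	if (end == text || errno != 0)
		return -1;

	if (*end == '\n')          /* sysfs output ends with a newline */
		end++;
	return *end == '\0' ? 0 : -1;
}

/* Read the attribute; the path passed in is an assumption (e.g. /sys/hypervisor/vmcoreinfo). */
int read_vmcoreinfo(const char *path, struct xen_vmcoreinfo_loc *out)
{
	char buf[64];
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return -1;   /* e.g. attribute absent because max_size was 0 */
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_vmcoreinfo(buf, out);
}
```

A missing file is therefore an expected outcome. It most likely means that the hypervisor reported no vmcoreinfo area, or that the kernel was built without CONFIG_KEXEC_FIRMWARE.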

* [PATCH v3 11/11] x86: Add Xen kexec control code size check to linker script
@ 2012-12-27  2:19                       ` Daniel Kiper
  0 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2012-12-27  2:19 UTC (permalink / raw)
  To: andrew.cooper3, ebiederm, hpa, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel
  Cc: Daniel Kiper

Add Xen kexec control code size check to linker script.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 arch/x86/kernel/vmlinux.lds.S |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 22a1530..f18786a 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -360,5 +360,10 @@ INIT_PER_CPU(irq_stack_union);
 
 . = ASSERT(kexec_control_code_size <= KEXEC_CONTROL_CODE_MAX_SIZE,
            "kexec control code size is too big");
-#endif
 
+#ifdef CONFIG_XEN
+. = ASSERT(xen_kexec_control_code_size - xen_relocate_kernel <=
+		KEXEC_CONTROL_CODE_MAX_SIZE,
+		"Xen kexec control code size is too big");
+#endif
+#endif
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 217+ messages in thread
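
A linker-script ASSERT() makes the link fail with the given message when its condition is false. An oversized relocation stub is therefore caught at build time rather than silently exceeding the space reserved for it. Unlike the native check, the Xen variant subtracts xen_relocate_kernel. This suggests that xen_kexec_control_code_size marks the end of the Xen control code rather than holding its length. The C helper below is a hypothetical run-time analogue of the same comparison using one-past-the-end addressing. The _Static_assert shows the compile-time counterpart when both sizes are constants. All sizes in it are assumed example values, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Run-time analogue of the linker check
 *   ASSERT(xen_kexec_control_code_size - xen_relocate_kernel <=
 *          KEXEC_CONTROL_CODE_MAX_SIZE, "...")
 * start: first byte of the control code; end: one past its last byte.
 */
static bool control_code_fits(uintptr_t start, uintptr_t end, uintptr_t max_size)
{
	assert(end >= start);   /* a reversed range indicates a build error */
	return end - start <= max_size;
}

/* With compile-time constants, C11 can reject the build the way ld's ASSERT does. */
#define EXAMPLE_CODE_SIZE 1536u   /* hypothetical */
#define EXAMPLE_MAX_SIZE  2048u   /* hypothetical limit */
_Static_assert(EXAMPLE_CODE_SIZE <= EXAMPLE_MAX_SIZE,
	       "kexec control code size is too big");
```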

* Re: [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE
  2012-12-27  2:18     ` Daniel Kiper
@ 2012-12-27  3:33       ` H. Peter Anvin
  -1 siblings, 0 replies; 217+ messages in thread
From: H. Peter Anvin @ 2012-12-27  3:33 UTC (permalink / raw)
  To: Daniel Kiper, andrew.cooper3, ebiederm, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel

Hmm... this code is being redone at the moment... this might conflict.

Daniel Kiper <daniel.kiper@oracle.com> wrote:

>Some implementations (e.g. Xen PVOPS) cannot use part of the identity
>page table to construct the transition page table. This means that they
>require separate PUDs, PMDs and PTEs for the virtual and the physical
>(identity) mappings. To satisfy that requirement, add extra pointers to
>the PGD, PUD, PMD and PTE and adjust the existing code accordingly.
>
>Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
>---
> arch/x86/include/asm/kexec.h       |   10 +++++++---
> arch/x86/kernel/machine_kexec_64.c |   12 ++++++------
> 2 files changed, 13 insertions(+), 9 deletions(-)
>
>diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>index 6080d26..cedd204 100644
>--- a/arch/x86/include/asm/kexec.h
>+++ b/arch/x86/include/asm/kexec.h
>@@ -157,9 +157,13 @@ struct kimage_arch {
> };
> #else
> struct kimage_arch {
>-	pud_t *pud;
>-	pmd_t *pmd;
>-	pte_t *pte;
>+	pgd_t *pgd;
>+	pud_t *pud0;
>+	pud_t *pud1;
>+	pmd_t *pmd0;
>+	pmd_t *pmd1;
>+	pte_t *pte0;
>+	pte_t *pte1;
> };
> #endif
> 
>diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>index b3ea9db..976e54b 100644
>--- a/arch/x86/kernel/machine_kexec_64.c
>+++ b/arch/x86/kernel/machine_kexec_64.c
>@@ -137,9 +137,9 @@ out:
> 
> static void free_transition_pgtable(struct kimage *image)
> {
>-	free_page((unsigned long)image->arch.pud);
>-	free_page((unsigned long)image->arch.pmd);
>-	free_page((unsigned long)image->arch.pte);
>+	free_page((unsigned long)image->arch.pud0);
>+	free_page((unsigned long)image->arch.pmd0);
>+	free_page((unsigned long)image->arch.pte0);
> }
> 
> static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
>@@ -157,7 +157,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
> 		pud = (pud_t *)get_zeroed_page(GFP_KERNEL);
> 		if (!pud)
> 			goto err;
>-		image->arch.pud = pud;
>+		image->arch.pud0 = pud;
> 		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
> 	}
> 	pud = pud_offset(pgd, vaddr);
>@@ -165,7 +165,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
> 		pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
> 		if (!pmd)
> 			goto err;
>-		image->arch.pmd = pmd;
>+		image->arch.pmd0 = pmd;
> 		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
> 	}
> 	pmd = pmd_offset(pud, vaddr);
>@@ -173,7 +173,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
> 		pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
> 		if (!pte)
> 			goto err;
>-		image->arch.pte = pte;
>+		image->arch.pte0 = pte;
> 		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
> 	}
> 	pte = pte_offset_kernel(pmd, vaddr);

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 217+ messages in thread
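
The commit message quoted above says that some implementations need separate PUD, PMD and PTE pages for the virtual and the identity (physical) mappings. On x86-64 with 4-level paging, each level is indexed by 9 bits of the address (bits 47-39, 38-30, 29-21 and 20-12). In the usual layout, two translations share a lower-level table page only while all higher-level indices agree. The sketch below computes those indices for a typical kernel-text virtual address and a low physical address. The addresses are arbitrary examples and the helper names are hypothetical. Such a pair already diverges at the PGD. So when nothing can be borrowed from an existing page table, a single pud/pmd/pte chain cannot serve both mappings, which is consistent with the second set of pointers added to struct kimage_arch.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 4-level paging: 512 entries (9 index bits) per level, 4 KiB pages. */
#define PT_INDEX_MASK 0x1ffu

struct pt_indices {
	unsigned int pgd, pud, pmd, pte;
};

static struct pt_indices pt_walk_indices(uint64_t addr)
{
	struct pt_indices ix = {
		.pgd = (unsigned int)((addr >> 39) & PT_INDEX_MASK),
		.pud = (unsigned int)((addr >> 30) & PT_INDEX_MASK),
		.pmd = (unsigned int)((addr >> 21) & PT_INDEX_MASK),
		.pte = (unsigned int)((addr >> 12) & PT_INDEX_MASK),
	};
	return ix;
}

/*
 * How many table pages (PGD, PUD, PMD, PTE, counted from the top) two
 * translations share, assuming the usual layout in which every upper-level
 * entry points to its own lower-level page.
 */
static int shared_table_pages(uint64_t a, uint64_t b)
{
	struct pt_indices x = pt_walk_indices(a);
	struct pt_indices y = pt_walk_indices(b);
	int shared = 1;               /* both walks start in the same PGD page */

	if (x.pgd != y.pgd)
		return shared;
	shared++;                     /* same PUD page */
	if (x.pud != y.pud)
		return shared;
	shared++;                     /* same PMD page */
	if (x.pmd != y.pmd)
		return shared;
	return shared + 1;            /* same PTE page */
}
```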

* Re: [PATCH v3 06/11] x86/xen: Add i386 kexec/kdump implementation
  2012-12-27  2:18             ` Daniel Kiper
  (?)
@ 2012-12-27  4:00               ` H. Peter Anvin
  -1 siblings, 0 replies; 217+ messages in thread
From: H. Peter Anvin @ 2012-12-27  4:00 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: andrew.cooper3, ebiederm, jbeulich, konrad.wilk, maxim.uvarov,
	mingo, tglx, vgoyal, x86, kexec, linux-kernel, virtualization,
	xen-devel

On 12/26/2012 06:18 PM, Daniel Kiper wrote:
> Add i386 kexec/kdump implementation.
>
> v2 - suggestions/fixes:
>     - allocate transition page table pages below 4 GiB
>       (suggested by Jan Beulich).
>

Why?

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2012-12-27  2:18 ` Daniel Kiper
  (?)
@ 2012-12-27  4:02   ` H. Peter Anvin
  -1 siblings, 0 replies; 217+ messages in thread
From: H. Peter Anvin @ 2012-12-27  4:02 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: andrew.cooper3, ebiederm, jbeulich, konrad.wilk, maxim.uvarov,
	mingo, tglx, vgoyal, x86, kexec, linux-kernel, virtualization,
	xen-devel

On 12/26/2012 06:18 PM, Daniel Kiper wrote:
> Hi,
>
> This set of patches contains initial kexec/kdump implementation for Xen v3.
> Currently only dom0 is supported; however, almost all infrastructure
> required for domU support is ready.
>
> Jan Beulich suggested merging the Xen x86 assembler code with the baremetal x86
> code. This could simplify the code and slightly reduce the size of the kernel.
> However, this solution requires some changes in the baremetal x86 code. First of
> all, the code which establishes the transition page table should be moved back
> from machine_kexec_$(BITS).c to relocate_kernel_$(BITS).S. Another important thing
> which should be changed in that case is the format of the page_list array: the Xen
> kexec hypercall requires physical addresses to alternate with virtual ones. These
> and other required changes have not been made in this version because I am not
> sure that solution will be accepted by the kexec/kdump maintainers.
> I hope that this email sparks a discussion about that topic.
>

I want a detailed list of the constraints that this assumes and 
therefore imposes on the native implementation as a result of this.  We 
have had way too many patches where Xen PV hacks effectively nailgun 
arbitrary, and sometimes poor, design decisions in place and now we 
can't fix them.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 217+ messages in thread
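
The cover letter quoted above notes that the Xen kexec hypercall wants physical addresses to alternate with virtual ones in the page_list array. Below is a minimal sketch of what such an alternating layout means for indexing. The xk_page enum, the slot helpers and the page set (control, table, swap) are hypothetical. They do not reproduce the kernel's PA_*/VA_* constants or the hypercall's actual interface. Assembler code that reads page_list at fixed offsets would presumably have to agree with whichever layout is chosen. That is one reason the format question is tied to moving code into relocate_kernel_$(BITS).S.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page set; NOT the kernel's PA_* / VA_* index constants. */
enum xk_page {
	XK_CONTROL_PAGE,
	XK_TABLE_PAGE,
	XK_SWAP_PAGE,
	XK_NR_PAGES
};

/* Alternating layout: slot 2*i = physical address of page i, 2*i+1 = its virtual address. */
static size_t pa_slot(enum xk_page p) { return 2u * (size_t)p; }
static size_t va_slot(enum xk_page p) { return 2u * (size_t)p + 1u; }

static void fill_page_list(uint64_t list[2 * XK_NR_PAGES],
			   const uint64_t pa[XK_NR_PAGES],
			   const uint64_t va[XK_NR_PAGES])
{
	for (int i = 0; i < XK_NR_PAGES; i++) {
		enum xk_page p = (enum xk_page)i;

		list[pa_slot(p)] = pa[i];
		list[va_slot(p)] = va[i];
	}
}
```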

* Re: [PATCH v3 01/11] kexec: introduce kexec firmware support
  2012-12-27  2:18   ` Daniel Kiper
  (?)
@ 2012-12-27  4:46     ` Eric W. Biederman
  -1 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2012-12-27  4:46 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: andrew.cooper3, hpa, jbeulich, konrad.wilk, maxim.uvarov, mingo,
	tglx, vgoyal, x86, kexec, linux-kernel, virtualization,
	xen-devel

Daniel Kiper <daniel.kiper@oracle.com> writes:

> Some kexec/kdump implementations (e.g. Xen PVOPS) could not use default
> Linux infrastructure and require some support from firmware and/or hypervisor.
> To cope with that problem kexec firmware infrastructure was introduced.
> It allows a developer to use all kexec/kdump features of given firmware
> or hypervisor.

As it stands, this patch is wrong.

You need to pass an additional flag from userspace through /sbin/kexec
that says to load the kexec image into the firmware.  A global variable
here is not ok.

As I understand it, you are loading a kexec-on-Xen-panic image, which
is semantically different from a kexec-on-Linux-panic image.  It is not
ok to have a silly global variable kexec_use_firmware.

Furthermore, it is not ok to have conditional code outside of header
files.
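The objection can be illustrated with a small C sketch: where an image goes is decided only by flags passed in from userspace, never by a kernel-wide global such as kexec_use_firmware, and running under Xen does not by itself change what a request means. KEXEC_ON_CRASH, KEXEC_PRESERVE_CONTEXT and KEXEC_ARCH_MASK mirror values in <linux/kexec.h>; HYPOTHETICAL_KEXEC_LOAD_IN_HYPERVISOR, its bit value, and route_kexec_load() are invented for illustration and are not part of any real ABI.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as in <linux/kexec.h>. */
#define KEXEC_ON_CRASH		0x00000001UL
#define KEXEC_PRESERVE_CONTEXT	0x00000002UL
#define KEXEC_ARCH_MASK		0xffff0000UL

/* Hypothetical flag: "load this image into the hypervisor". Bit chosen arbitrarily. */
#define HYPOTHETICAL_KEXEC_LOAD_IN_HYPERVISOR	0x00008000UL

static_assert((HYPOTHETICAL_KEXEC_LOAD_IN_HYPERVISOR &
	       (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_ARCH_MASK)) == 0,
	      "hypothetical flag must not overlap existing bits");

enum kexec_route {
	ROUTE_INVALID = -1,
	ROUTE_LINUX_IMAGE,		/* normal kexec image for this kernel */
	ROUTE_LINUX_CRASH_IMAGE,	/* kdump image for a Linux panic */
	ROUTE_HYPERVISOR_IMAGE,		/* image handed to Xen, e.g. for a Xen panic */
};

/*
 * Decide where an image goes purely from the caller's flags.
 * running_on_xen only gates whether the hypervisor route is allowed;
 * it never silently changes the meaning of a request.
 */
static enum kexec_route route_kexec_load(unsigned long flags, bool running_on_xen)
{
	const unsigned long known = KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT |
				    KEXEC_ARCH_MASK |
				    HYPOTHETICAL_KEXEC_LOAD_IN_HYPERVISOR;

	if (flags & ~known)
		return ROUTE_INVALID;	/* unknown bits: reject */

	if (flags & HYPOTHETICAL_KEXEC_LOAD_IN_HYPERVISOR)
		return running_on_xen ? ROUTE_HYPERVISOR_IMAGE : ROUTE_INVALID;

	return (flags & KEXEC_ON_CRASH) ? ROUTE_LINUX_CRASH_IMAGE
					: ROUTE_LINUX_IMAGE;
}
```

Rejecting unknown bits up front is what leaves room to add such a flag later without changing the meaning of existing calls.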

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2012-12-27  4:02   ` H. Peter Anvin
  (?)
@ 2012-12-27  7:53     ` Eric W. Biederman
  -1 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2012-12-27  7:53 UTC (permalink / raw)
  To: H. Peter Anvin, Daniel Kiper
  Cc: andrew.cooper3, jbeulich, konrad.wilk, maxim.uvarov, mingo, tglx,
	vgoyal, x86, kexec, linux-kernel, virtualization, xen-devel

The syscall ABI still has the wrong semantics.

Aka totally unmaintainable and unmergeable.

The concept of domU support is also strange.  What does domU support even mean, when dom0 support means loading a kernel to pick up Xen when Xen falls over?

I expect a lot of decisions about what code can and can't be shared are going to be driven by the simple question: what does the syscall mean?

Sharing machine_kexec.c and relocate_kernel.S does not make much sense to me when what you are doing is effectively passing your arguments through to the Xen version of kexec.

Either Xen has its own version of those routines, or I expect the Xen version of kexec is buggy.  I can't imagine what sharing that code would mean.  By the same token, I can't see any need to duplicate the code either.

Furthermore since this is just passing data from one version of the syscall to another I expect you can share the majority of the code across all architectures that implement Xen.  The only part I can see being arch specific is the Xen syscall stub.
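A rough C sketch of the layering described above: architecture-neutral code validates the segment list and forwards it, and only the hypercall stub differs per architecture. struct segment borrows the field names of struct kexec_segment from <linux/kexec.h> with simplified types; struct hypothetical_xen_kexec_request, xen_kexec_passthrough() and fake_stub() are invented here and do not describe the real Xen kexec hypercall interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Field names borrowed from struct kexec_segment; types simplified. */
struct segment {
	const void *buf;	/* source buffer in the caller */
	size_t bufsz;		/* bytes to copy from buf */
	uint64_t mem;		/* destination physical address */
	size_t memsz;		/* size of the destination region */
};

/* Hypothetical request handed to the hypervisor (not the real ABI). */
struct hypothetical_xen_kexec_request {
	int crash;			/* nonzero: crash image */
	size_t nr_segments;
	const struct segment *segments;
};

/* The only per-architecture piece: issue the hypercall. */
typedef long (*xen_kexec_stub_fn)(const struct hypothetical_xen_kexec_request *req);

/* Architecture-neutral pass-through: basic validation, then forward. */
static long xen_kexec_passthrough(const struct segment *segs, size_t nr,
				  int crash, xen_kexec_stub_fn stub)
{
	if (segs == NULL || nr == 0 || stub == NULL)
		return -EINVAL;

	for (size_t i = 0; i < nr; i++)
		if (segs[i].bufsz > segs[i].memsz)
			return -EINVAL;	/* source must fit in destination */

	struct hypothetical_xen_kexec_request req = {
		.crash = crash,
		.nr_segments = nr,
		.segments = segs,
	};
	return stub(&req);
}

/* Stand-in stub for experimenting without a hypervisor. */
static long fake_stub(const struct hypothetical_xen_kexec_request *req)
{
	assert(req->nr_segments > 0);
	return (long)req->nr_segments;
}
```

In this arrangement, porting to another architecture that runs Xen would mean supplying a different stub, while the validation and request building stay shared.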

With respect to the proposed semantics of silently giving the kexec system call a different meaning when running under Xen: /sbin/kexec has to act somewhat differently when loading code into the Xen hypervisor, so there is no point in not making that explicit in the ABI.

Eric


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2012-12-27  7:53     ` Eric W. Biederman
  (?)
@ 2012-12-27 14:18       ` Andrew Cooper
  -1 siblings, 0 replies; 217+ messages in thread
From: Andrew Cooper @ 2012-12-27 14:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: H. Peter Anvin, Daniel Kiper, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel

On 27/12/2012 07:53, Eric W. Biederman wrote:
> The syscall ABI still has the wrong semantics.
>
> Aka totally unmaintainable and unmergeable.
>
> The concept of domU support is also strange.  What does domU support even mean, when the dom0 support is loading a kernel to pick up Xen when Xen falls over.

There are two requirements pulling at this patch series, but I agree
that we need to clarify them.

When dom0 loads a crash kernel, it is loading one for Xen to use.  As a
dom0 crash causes a Xen crash, having dom0 set up a kdump kernel for
itself is completely useless.  This ability is present in "classic Xen
dom0" kernels, but the feature is currently missing in PVOPS.

Many cloud customers and service providers want a VM administrator to
be able to load a kdump/kexec kernel within a domain[1].  This allows
the VM administrator to take more proactive steps to isolate the cause
of a crash, the state of which is most likely discarded while tearing
down the domain.  In this arrangement, as far as Xen is concerned, the
domain is still alive while the kdump kernel/environment works its
usual magic.  I am not aware of any feature like this existing in the
past.

~Andrew

[1] http://lists.xen.org/archives/html/xen-devel/2012-11/msg01274.html

>
> I expect a lot of decisions about what code can be shared and what code can't is going to be driven by the simple question what does the syscall mean.
>
> Sharing machine_kexec.c and relocate_kernel.S does not make much sense to me when what you are doing is effectively passing your arguments through to the Xen version of kexec.
>
> Either Xen has its own version of those routines or I expect the Xen version of kexec is buggy.   I can't imagine what sharing that code would mean.  By the same token I can't see any need to duplicate the code either.
>
> Furthermore since this is just passing data from one version of the syscall to another I expect you can share the majority of the code across all architectures that implement Xen.  The only part I can see being arch specific is the Xen syscall stub.
>
> With respect to the proposed semantics of silently giving the kexec system call different meaning when running under Xen,
> /sbin/kexec has to act somewhat differently when loading code into the Xen hypervisor so there is no point not making that explicit in the ABI.
>
> Eric
>


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2012-12-27 14:18       ` Andrew Cooper
  (?)
@ 2012-12-27 18:02         ` Eric W. Biederman
  -1 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2012-12-27 18:02 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: H. Peter Anvin, Daniel Kiper, jbeulich, konrad.wilk,
	maxim.uvarov, mingo, tglx, vgoyal, x86, kexec, linux-kernel,
	virtualization, xen-devel

Andrew Cooper <andrew.cooper3@citrix.com> writes:

> On 27/12/2012 07:53, Eric W. Biederman wrote:
>> The syscall ABI still has the wrong semantics.
>>
>> Aka totally unmaintainable and unmergeable.
>>
>> The concept of domU support is also strange.  What does domU support even mean, when the dom0 support is loading a kernel to pick up Xen when Xen falls over.
>
> There are two requirements pulling at this patch series, but I agree
> that we need to clarify them.

It probably makes sense to split them apart a little, even.

> When dom0 loads a crash kernel, it is loading one for Xen to use.  As a
> dom0 crash causes a Xen crash, having dom0 set up a kdump kernel for
> itself is completely useless.  This ability is present in "classic Xen
> dom0" kernels, but the feature is currently missing in PVOPS.

> Many cloud customers and service providers want the ability for a VM
> administrator to be able to load a kdump/kexec kernel within a
> domain[1].  This allows the VM administrator to take more proactive
> steps to isolate the cause of a crash, the state of which is most likely
> discarded while tearing down the domain.  The result being that as far
> as Xen is concerned, the domain is still alive, while the kdump
> kernel/environment can work its usual magic.  I am not aware of any
> feature like this existing in the past.

That makes domU support semantically just the normal kexec/kdump
support.  Got it.

The point of implementing domU is for those times when the hypervisor
admin and the kernel admin are different.

For domU support, modifying machine_kexec.c and relocate_kernel.S, or
adding alternate versions of them, to add paravirtualization support
makes sense.

There is a practical argument that, for efficient crash dump
implementation, it would be better if that support came from the
hypervisor or the hypervisor environment.  But the practical reality
is that the hypervisor environment does not do that today.
Furthermore, kexec working all by itself in a paravirtualized
environment under Xen makes sense.

domU support is what Peter was worried about with respect to
cleanliness: we need some x86 backend ops there, and we generally need
to be careful.


For dom0 support we need to extend the kexec_load system call, and
get it right.

When we are done I expect both dom0 and domU support of kexec to work
in dom0.  I don't know if the normal kexec or kdump case will ever make
sense in dom0 but there is no reason for that case to be broken.

> ~Andrew
>
> [1] http://lists.xen.org/archives/html/xen-devel/2012-11/msg01274.html

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [PATCH v3 09/11] x86/xen/enlighten: Add init and crash kexec/kdump hooks
  2012-12-27  2:18                   ` Daniel Kiper
  (?)
@ 2012-12-27 18:53                     ` H. Peter Anvin
  -1 siblings, 0 replies; 217+ messages in thread
From: H. Peter Anvin @ 2012-12-27 18:53 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: andrew.cooper3, ebiederm, jbeulich, konrad.wilk, maxim.uvarov,
	mingo, tglx, vgoyal, x86, kexec, linux-kernel, virtualization,
	xen-devel

On 12/26/2012 06:18 PM, Daniel Kiper wrote:
> Add init and crash kexec/kdump hooks.
>
> Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
> ---
>   arch/x86/xen/enlighten.c |   11 +++++++++++
>   1 files changed, 11 insertions(+), 0 deletions(-)

On the general issue of hooks:

Hooks need their pre- and post-semantics extremely well documented, lest
they end up being an impossible roadblock to development.
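One way to act on this is to write the pre- and postconditions next to each hook and have the generic caller enforce the preconditions, so a backend cannot quietly depend on more than is documented. The C sketch below is illustrative only: struct kexec_backend_ops, its members, and run_kexec() are hypothetical names, not the interface added by this patch set.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct kexec_state {
	bool image_loaded;
	bool prepared;
	bool jumped;
};

/* Hypothetical backend hooks with their contracts documented inline. */
struct kexec_backend_ops {
	/*
	 * prepare()
	 *   Pre:  st->image_loaded is true.
	 *   Post: returns 0 and sets st->prepared, or returns a negative
	 *         errno and leaves st unchanged. Must not sleep.
	 */
	int (*prepare)(struct kexec_state *st);
	/*
	 * jump()
	 *   Pre:  prepare() returned 0 for this st.
	 *   Post: on real hardware, does not return; this model only
	 *         records the jump in st->jumped.
	 */
	void (*jump)(struct kexec_state *st);
};

static int native_prepare(struct kexec_state *st)
{
	st->prepared = true;
	return 0;
}

static void native_jump(struct kexec_state *st)
{
	assert(st->prepared);	/* documented precondition */
	st->jumped = true;
}

static const struct kexec_backend_ops native_ops = {
	.prepare = native_prepare,
	.jump = native_jump,
};

/* Generic caller: checks the documented preconditions before each hook. */
static int run_kexec(const struct kexec_backend_ops *ops, struct kexec_state *st)
{
	if (!st->image_loaded)
		return -EINVAL;

	int ret = ops->prepare(st);
	if (ret)
		return ret;

	ops->jump(st);
	return 0;
}
```

A Xen backend would then supply its own ops structure and be held to the same written contract as the native one.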

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2012-12-27 18:02         ` Eric W. Biederman
  (?)
@ 2013-01-02 11:26           ` Andrew Cooper
  -1 siblings, 0 replies; 217+ messages in thread
From: Andrew Cooper @ 2013-01-02 11:26 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: x86, konrad.wilk, Daniel Kiper, H. Peter Anvin, kexec,
	linux-kernel, virtualization, mingo, jbeulich, maxim.uvarov,
	tglx, xen-devel, vgoyal

On 27/12/12 18:02, Eric W. Biederman wrote:
> Andrew Cooper<andrew.cooper3@citrix.com>  writes:
>
>> On 27/12/2012 07:53, Eric W. Biederman wrote:
>>> The syscall ABI still has the wrong semantics.
>>>
>>> Aka totally unmaintainable and unmergeable.
>>>
>>> The concept of domU support is also strange.  What does domU support even mean, when the dom0 support is loading a kernel to pick up Xen when Xen falls over.
>> There are two requirements pulling at this patch series, but I agree
>> that we need to clarify them.
> It probably makes sense to split them apart a little even.
>
>

Thinking about this split, there might be a way to simplify it even more.

/sbin/kexec can load the "Xen" crash kernel itself by issuing hypercalls 
using /dev/xen/privcmd.  This would remove the need for the dom0 kernel 
to distinguish between loading a crash kernel for itself and loading a 
kernel for Xen.
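Here is a rough C sketch of the suggestion: user space reaching Xen's kexec hypercall directly through `/dev/xen/privcmd`. The `struct privcmd_hypercall` layout and `IOCTL_PRIVCMD_HYPERCALL` mirror Linux's `<xen/privcmd.h>`. The hypercall number (37) and the `KEXEC_CMD_kexec_load` value (1, from the original interface) are assumptions and should be verified against Xen's public headers. The sketch covers only the transport. The kexec load argument structure itself is not built here. A real tool would also have to keep that argument buffer locked in memory while Xen reads it.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/ioctl.h>

/* Same layout as struct privcmd_hypercall in Linux's <xen/privcmd.h>. */
struct privcmd_hypercall {
	uint64_t op;
	uint64_t arg[5];
};

#define IOCTL_PRIVCMD_HYPERCALL \
	_IOC(_IOC_NONE, 'P', 0, sizeof(struct privcmd_hypercall))

/* Assumed values; check Xen's public xen.h and kexec.h before relying on them. */
#define HYPERVISOR_kexec_op	37
#define KEXEC_CMD_kexec_load	1	/* original (v1) kexec interface */

/* Fill in a kexec_op hypercall: arg[0] = sub-command, arg[1] = argument pointer. */
static void build_kexec_hypercall(struct privcmd_hypercall *hc,
				  uint64_t cmd, const void *arg)
{
	memset(hc, 0, sizeof(*hc));
	hc->op = HYPERVISOR_kexec_op;
	hc->arg[0] = cmd;
	hc->arg[1] = (uint64_t)(uintptr_t)arg;
}

/*
 * Issue the hypercall via privcmd (dom0 only). Returns the ioctl result,
 * or -1 if the device cannot be opened.
 */
long issue_hypercall(struct privcmd_hypercall *hc)
{
	int fd = open("/dev/xen/privcmd", O_RDWR);
	long rc;

	if (fd < 0)
		return -1;
	rc = ioctl(fd, IOCTL_PRIVCMD_HYPERCALL, hc);
	close(fd);
	return rc;
}
```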

Or is this just a silly idea complicating the matter?

~Andrew

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-02 11:26           ` Andrew Cooper
  (?)
@ 2013-01-02 11:47             ` Eric W. Biederman
  -1 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2013-01-02 11:47 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: x86, konrad.wilk, Daniel Kiper, H. Peter Anvin, kexec,
	linux-kernel, virtualization, mingo, jbeulich, maxim.uvarov,
	tglx, xen-devel, vgoyal

Andrew Cooper <andrew.cooper3@citrix.com> writes:

> On 27/12/12 18:02, Eric W. Biederman wrote:
>> Andrew Cooper<andrew.cooper3@citrix.com>  writes:
>>
>>> On 27/12/2012 07:53, Eric W. Biederman wrote:
>>>> The syscall ABI still has the wrong semantics.
>>>>
>>>> Aka totally unmaintainable and unmergeable.
>>>>
>>>> The concept of domU support is also strange.  What does domU support even mean, when the dom0 support is loading a kernel to pick up Xen when Xen falls over.
>>> There are two requirements pulling at this patch series, but I agree
>>> that we need to clarify them.
>> It probably makes sense to split them apart a little even.
>>
>>
>
> Thinking about this split, there might be a way to simplify it even more.
>
> /sbin/kexec can load the "Xen" crash kernel itself by issuing
> hypercalls using /dev/xen/privcmd.  This would remove the need for the
> dom0 kernel to distinguish between loading a crash kernel for itself
> and loading a kernel for Xen.
>
> Or is this just a silly idea complicating the matter?

At a first approximation it sounds reasonable.

If the Xen kexec actually copies the loaded kernel somewhere internal,
like the Linux kexec does, that would be entirely reasonable.  If Xen has
other requirements in the dom0 case, you might not be able to implement
the call without Linux kernel support.
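For comparison, here is how the Linux side handles this. `kexec_load(2)` takes a list of segments. Each segment describes a user-space buffer (`buf`, `bufsz`) and a destination physical range (`mem`, `memsz`). The kernel copies the buffers into its own pages at load time, so the loader's memory does not need to persist afterwards. The struct in the minimal C sketch below mirrors the layout of `struct kexec_segment` in Linux's `<linux/kexec.h>`. `segments_valid()` is a hypothetical, simplified version of the kernel's sanity checks: page-aligned destinations, `bufsz <= memsz`, and no overlapping ranges.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumes 4 KiB pages */

/* Mirrors the layout of struct kexec_segment in Linux's <linux/kexec.h>. */
struct kexec_segment {
	const void *buf;	/* source buffer in the loader's memory */
	size_t bufsz;
	const void *mem;	/* destination physical address */
	size_t memsz;
};

/* Simplified form of the checks done before the buffers are copied. */
static bool segments_valid(const struct kexec_segment *seg, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		uintptr_t start = (uintptr_t)seg[i].mem;
		uintptr_t end = start + seg[i].memsz;

		if ((start | end) & (SKETCH_PAGE_SIZE - 1))
			return false;	/* destination not page aligned */
		if (seg[i].bufsz > seg[i].memsz)
			return false;	/* source larger than destination */
		for (size_t j = 0; j < i; j++) {
			uintptr_t ps = (uintptr_t)seg[j].mem;
			uintptr_t pe = ps + seg[j].memsz;

			if (end > ps && pe > start)
				return false;	/* overlapping destinations */
		}
	}
	return true;
}
```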

But if you can implement it all in terms of /dev/xen/privcmd go for it.

Eric


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [PATCH v3 00/11] xen: Initial kexec/kdump implementation
@ 2013-01-02 15:27         ` Ian Campbell
  0 siblings, 0 replies; 217+ messages in thread
From: Ian Campbell @ 2013-01-02 15:27 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Eric W. Biederman, x86, konrad.wilk, Daniel Kiper,
	H. Peter Anvin, kexec, linux-kernel, virtualization, mingo,
	jbeulich, maxim.uvarov, tglx, xen-devel, vgoyal

On Thu, 2012-12-27 at 14:18 +0000, Andrew Cooper wrote:
> Many cloud customers and service providers want the ability for a VM
> administrator to be able to load a kdump/kexec kernel within a
> domain[1].  This allows the VM administrator to take more proactive
> steps to isolate the cause of a crash, the state of which is most likely
> discarded while tearing down the domain.  The result being that as far
> as Xen is concerned, the domain is still alive, while the kdump
> kernel/environment can work its usual magic.  I am not aware of any
> feature like this existing in the past.

I have a feeling that some versions of the classic-Xen port supported
domU kexec as well. Certainly there was some work on that back in 2005,
although I can't see much evidence that that attempt ever went anywhere
so maybe I'm imagining things.

It's possible that I'm confusing domU kexec support with support for
domU kexec in some dom0 kernels. That was/is used to support "kexec"
from a PV bootloader into the real kernel (which looks to the host a lot
like a domU kexec would).

Ian.


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-02 11:26           ` Andrew Cooper
  (?)
@ 2013-01-03  9:31             ` Jan Beulich
  -1 siblings, 0 replies; 217+ messages in thread
From: Jan Beulich @ 2013-01-03  9:31 UTC (permalink / raw)
  To: Andrew Cooper, Eric W. Biederman
  Cc: x86, tglx, kexec, virtualization, xen-devel, Daniel Kiper,
	konrad.wilk, maxim.uvarov, mingo, vgoyal, linux-kernel,
	H. Peter Anvin

>>> On 02.01.13 at 12:26, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 27/12/12 18:02, Eric W. Biederman wrote:
>> It probably makes sense to split them apart a little even.
> 
> Thinking about this split, there might be a way to simplify it even more.
> 
> /sbin/kexec can load the "Xen" crash kernel itself by issuing hypercalls 
> using /dev/xen/privcmd.  This would remove the need for the dom0 kernel 
> to distinguish between loading a crash kernel for itself and loading a 
> kernel for Xen.
> 
> Or is this just a silly idea complicating the matter?

I don't think so (I suggested that before, in response to an
earlier submission of this patch set), and it would make most of
the discussion here moot.

Jan


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE
  2012-12-27  2:18     ` Daniel Kiper
  (?)
@ 2013-01-03  9:34       ` Jan Beulich
  -1 siblings, 0 replies; 217+ messages in thread
From: Jan Beulich @ 2013-01-03  9:34 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: andrew.cooper3, x86, tglx, kexec, virtualization, xen-devel,
	konrad.wilk, maxim.uvarov, mingo, vgoyal, linux-kernel, ebiederm,
	hpa

>>> On 27.12.12 at 03:18, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> Some implementations (e.g. Xen PVOPS) cannot use part of the identity page
> table to construct the transition page table. This means that they require
> separate PUDs, PMDs and PTEs for the virtual and physical (identity)
> mappings. To satisfy that requirement, add extra pointers to the PGD, PUD,
> PMD and PTE and adjust the existing code accordingly.
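To make the patch description concrete, here is a small C sketch of how x86-64 4-level paging splits an address into PGD, PUD, PMD and PTE indices (9 bits per level, 4 KiB pages). The example addresses in the note below are illustrative assumptions, not values taken from the patch.

```c
#include <stdint.h>

/* Per-level indices for x86-64 4-level paging (4 KiB pages). */
struct pt_index {
	unsigned int pgd, pud, pmd, pte;
};

/* Each level consumes 9 address bits: 47..39, 38..30, 29..21, 20..12. */
static struct pt_index pt_indices(uint64_t addr)
{
	struct pt_index ix = {
		.pgd = (unsigned int)((addr >> 39) & 0x1ff),
		.pud = (unsigned int)((addr >> 30) & 0x1ff),
		.pmd = (unsigned int)((addr >> 21) & 0x1ff),
		.pte = (unsigned int)((addr >> 12) & 0x1ff),
	};
	return ix;
}

/*
 * First level (0 = PGD .. 3 = PTE) at which the page walks for two
 * addresses use different entry indices, or 4 if all indices match.
 */
static int first_divergent_level(uint64_t a, uint64_t b)
{
	struct pt_index x = pt_indices(a), y = pt_indices(b);
	const unsigned int xs[4] = { x.pgd, x.pud, x.pmd, x.pte };
	const unsigned int ys[4] = { y.pgd, y.pud, y.pmd, y.pte };

	for (int i = 0; i < 4; i++)
		if (xs[i] != ys[i])
			return i;
	return 4;
}
```

For instance, a kernel-text-style virtual address 0xffffffff81000000 yields indices (511, 510, 8, 0), while the physical address 0x1000000 yields (0, 0, 8, 0). Mapping one page at both addresses therefore uses distinct entries starting at the PGD. Whether that requires the extra table pointers this patch adds is what the reply below questions.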

So you keep posting this, despite it having been pointed out on each
earlier submission that it is unnecessary, as proven by the fact that
the non-pvops Xen kernels can get away without it. Why?

Jan


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-02 11:26           ` Andrew Cooper
  (?)
@ 2013-01-04 14:22             ` Daniel Kiper
  -1 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2013-01-04 14:22 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Eric W. Biederman, x86, konrad.wilk, H. Peter Anvin, kexec,
	linux-kernel, virtualization, mingo, jbeulich, maxim.uvarov,
	tglx, xen-devel, vgoyal

On Wed, Jan 02, 2013 at 11:26:43AM +0000, Andrew Cooper wrote:
> On 27/12/12 18:02, Eric W. Biederman wrote:
> >Andrew Cooper<andrew.cooper3@citrix.com>  writes:
> >
> >>On 27/12/2012 07:53, Eric W. Biederman wrote:
> >>>The syscall ABI still has the wrong semantics.
> >>>
> >>>Aka totally unmaintainable and unmergeable.
> >>>
> >>>The concept of domU support is also strange.  What does domU support even mean, when the dom0 support is loading a kernel to pick up Xen when Xen falls over.
> >>There are two requirements pulling at this patch series, but I agree
> >>that we need to clarify them.
> >It probably makes sense to split them apart a little even.
> >
> >
>
> Thinking about this split, there might be a way to simplify it even more.
>
> /sbin/kexec can load the "Xen" crash kernel itself by issuing
> hypercalls using /dev/xen/privcmd.  This would remove the need for
> the dom0 kernel to distinguish between loading a crash kernel for
> itself and loading a kernel for Xen.
>
> Or is this just a silly idea complicating the matter?

This is impossible with the current Xen kexec/kdump interface.
It would have to be changed to allow that. However, I suppose that
the Xen community would not be interested in such changes.

Daniel

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-04 14:22             ` Daniel Kiper
  (?)
@ 2013-01-04 14:34               ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 217+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-04 14:34 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: Andrew Cooper, Eric W. Biederman, x86, H. Peter Anvin, kexec,
	linux-kernel, virtualization, mingo, jbeulich, maxim.uvarov,
	tglx, xen-devel, vgoyal

On Fri, Jan 04, 2013 at 03:22:57PM +0100, Daniel Kiper wrote:
> On Wed, Jan 02, 2013 at 11:26:43AM +0000, Andrew Cooper wrote:
> > On 27/12/12 18:02, Eric W. Biederman wrote:
> > >Andrew Cooper<andrew.cooper3@citrix.com>  writes:
> > >
> > >>On 27/12/2012 07:53, Eric W. Biederman wrote:
> > >>>The syscall ABI still has the wrong semantics.
> > >>>
> > >>>Aka totally unmaintainable and unmergeable.
> > >>>
> > >>>The concept of domU support is also strange.  What does domU support even mean, when the dom0 support is loading a kernel to pick up Xen when Xen falls over.
> > >>There are two requirements pulling at this patch series, but I agree
> > >>that we need to clarify them.
> > >It probably makes sense to split them apart a little even.
> > >
> > >
> >
> > Thinking about this split, there might be a way to simplify it even more.
> >
> > /sbin/kexec can load the "Xen" crash kernel itself by issuing
> > hypercalls using /dev/xen/privcmd.  This would remove the need for
> > the dom0 kernel to distinguish between loading a crash kernel for
> > itself and loading a kernel for Xen.
> >
> > Or is this just a silly idea complicating the matter?
> 
> This is impossible with the current Xen kexec/kdump interface.
> It would have to be changed to allow that. However, I suppose that
> the Xen community would not be interested in such changes.

Why not? What is involved in it? IMHO anybody would
welcome a new, clean design that solves this thorny problem.

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-04 14:22             ` Daniel Kiper
  (?)
@ 2013-01-04 14:34               ` Ian Campbell
  -1 siblings, 0 replies; 217+ messages in thread
From: Ian Campbell @ 2013-01-04 14:34 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: Andrew Cooper, xen-devel, konrad.wilk, maxim.uvarov, x86, kexec,
	linux-kernel, virtualization, mingo, Eric W. Biederman, jbeulich,
	H. Peter Anvin, tglx, vgoyal

On Fri, 2013-01-04 at 14:22 +0000, Daniel Kiper wrote:
> On Wed, Jan 02, 2013 at 11:26:43AM +0000, Andrew Cooper wrote:
> > On 27/12/12 18:02, Eric W. Biederman wrote:
> > >Andrew Cooper<andrew.cooper3@citrix.com>  writes:
> > >
> > >>On 27/12/2012 07:53, Eric W. Biederman wrote:
> > >>>The syscall ABI still has the wrong semantics.
> > >>>
> > >>>Aka totally unmaintainable and unmergeable.
> > >>>
> > >>>The concept of domU support is also strange.  What does domU support even mean, when the dom0 support is loading a kernel to pick up Xen when Xen falls over?
> > >>There are two requirements pulling at this patch series, but I agree
> > >>that we need to clarify them.
> > >It probably makes sense to split them apart a little even.
> > >
> > >
> >
> > Thinking about this split, there might be a way to simplify it even more.
> >
> > /sbin/kexec can load the "Xen" crash kernel itself by issuing
> > hypercalls using /dev/xen/privcmd.  This would remove the need for
> > the dom0 kernel to distinguish between loading a crash kernel for
> > itself and loading a kernel for Xen.
> >
> > Or is this just a silly idea complicating the matter?
> 
> This is impossible with the current Xen kexec/kdump interface.
> It would have to be changed to allow that. However, I suppose that
> the Xen community would not be interested in such changes.

The current HYPERVISOR_kexec interface is pretty fricken bad (it
basically hardcodes the Linux circa-2.6.18 internal interface!).

I'd be all for a new HYPERVISOR_kexec (with the old one gaining a _compat
suffix) which implements something more generic that isn't tied to a
particular dom0 kernel implementation (be it differing versions of Linux
or e.g. *BSD).

If that enables /sbin/kexec to load the kernel directly then so much the
better, assuming the /sbin/kexec maintainers are happy with that
approach.

Ian.


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-04 14:22             ` Daniel Kiper
  (?)
@ 2013-01-04 14:38               ` David Vrabel
  -1 siblings, 0 replies; 217+ messages in thread
From: David Vrabel @ 2013-01-04 14:38 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: Andrew Cooper, xen-devel, konrad.wilk, maxim.uvarov, x86, kexec,
	linux-kernel, virtualization, mingo, Eric W. Biederman, jbeulich,
	H. Peter Anvin, tglx, vgoyal

On 04/01/13 14:22, Daniel Kiper wrote:
> On Wed, Jan 02, 2013 at 11:26:43AM +0000, Andrew Cooper wrote:
>> On 27/12/12 18:02, Eric W. Biederman wrote:
>>> Andrew Cooper<andrew.cooper3@citrix.com>  writes:
>>>
>>>> On 27/12/2012 07:53, Eric W. Biederman wrote:
>>>>> The syscall ABI still has the wrong semantics.
>>>>>
>>>>> Aka totally unmaintainable and unmergeable.
>>>>>
>>>>> The concept of domU support is also strange.  What does domU support even mean, when the dom0 support is loading a kernel to pick up Xen when Xen falls over?
>>>> There are two requirements pulling at this patch series, but I agree
>>>> that we need to clarify them.
>>> It probably makes sense to split them apart a little even.
>>>
>>>
>>
>> Thinking about this split, there might be a way to simplify it even more.
>>
>> /sbin/kexec can load the "Xen" crash kernel itself by issuing
>> hypercalls using /dev/xen/privcmd.  This would remove the need for
>> the dom0 kernel to distinguish between loading a crash kernel for
>> itself and loading a kernel for Xen.
>>
>> Or is this just a silly idea complicating the matter?
> 
> This is impossible with the current Xen kexec/kdump interface.
> It would have to be changed to allow that. However, I suppose that
> the Xen community would not be interested in such changes.

I don't see why the hypercall ABI cannot be extended with new sub-ops
that do the right thing -- the existing ABI is a bit weird.

I plan to start prototyping something shortly (hopefully next week) for
the Xen kexec case.

David

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-04 14:22             ` Daniel Kiper
  (?)
@ 2013-01-04 14:41               ` Jan Beulich
  -1 siblings, 0 replies; 217+ messages in thread
From: Jan Beulich @ 2013-01-04 14:41 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: Andrew Cooper, x86, tglx, kexec, virtualization, xen-devel,
	konrad.wilk, maxim.uvarov, mingo, vgoyal, linux-kernel,
	Eric W. Biederman, H. Peter Anvin

>>> On 04.01.13 at 15:22, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> On Wed, Jan 02, 2013 at 11:26:43AM +0000, Andrew Cooper wrote:
>> /sbin/kexec can load the "Xen" crash kernel itself by issuing
>> hypercalls using /dev/xen/privcmd.  This would remove the need for
>> the dom0 kernel to distinguish between loading a crash kernel for
>> itself and loading a kernel for Xen.
>>
>> Or is this just a silly idea complicating the matter?
> 
> This is impossible with the current Xen kexec/kdump interface.

Why?

> It would have to be changed to allow that. However, I suppose that
> the Xen community would not be interested in such changes.

And again - why?

Jan


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE
  2013-01-03  9:34       ` Jan Beulich
  (?)
@ 2013-01-04 15:15         ` Daniel Kiper
  -1 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2013-01-04 15:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: andrew.cooper3, x86, tglx, kexec, virtualization, xen-devel,
	konrad.wilk, maxim.uvarov, mingo, vgoyal, linux-kernel, ebiederm,
	hpa

On Thu, Jan 03, 2013 at 09:34:55AM +0000, Jan Beulich wrote:
> >>> On 27.12.12 at 03:18, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> > Some implementations (e.g. Xen PVOPS) cannot use part of the identity page table
> > to construct the transition page table. This means that they require separate PUDs,
> > PMDs and PTEs for virtual and physical (identity) mappings. To satisfy that
> > requirement, add extra pointers to PGD, PUD, PMD and PTE and adjust the existing
> > code.
>
> So you keep posting this despite it having got pointed out on each
> earlier submission that this is unnecessary, proven by the fact that
> the non-pvops Xen kernels can get away without it. Why?

Sorry, I forgot to reply to your email last time.

I am still not convinced. I have tested the SUSE kernel itself and it does not work.
Maybe I am missing something, but please check arch/x86/kernel/machine_kexec_64.c:init_transition_pgtable()

I can see:

vaddr = (unsigned long)relocate_kernel;

and later:

pgd += pgd_index(vaddr);
...

This is wrong. The virtual address of relocate_kernel() in Xen is different
from its virtual address in the Linux kernel. That is why the transition
page table cannot be established in the Linux kernel, and so on.
How does this work in SUSE? I have no idea.

I am happy to fix that, but whatever the fix is,
I would like to be sure that it works.

Daniel
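
To make the vaddr argument above concrete: on x86_64 with 4-level paging, each level of the transition page table is indexed by 9 bits of the virtual address, which is what the kernel's pgd_index() and the corresponding pud/pmd/pte helpers compute. The C sketch below is only an illustration, not code from this series or from the kernel. The struct, the split_va() helper and both sample addresses are hypothetical; only the shift values (39/30/21/12) and the 512 entries per table are the standard 4-level constants. It shows that the same code reached through two different virtual addresses selects different entries at the PGD, PUD and PMD levels, so a mapping built for one address does not automatically cover the other.

```c
#include <assert.h>
#include <stdint.h>

/* Standard x86_64 4-level paging constants: 512 entries (9 bits) per level. */
#define PTRS_PER_LEVEL 512u
#define PGDIR_SHIFT    39
#define PUD_SHIFT      30
#define PMD_SHIFT      21
#define PAGE_SHIFT     12

/* Hypothetical container for the four table indices of one address. */
struct va_indices {
	unsigned int pgd, pud, pmd, pte;
};

/*
 * Hypothetical helper: split a canonical 48-bit virtual address into the
 * indices used at each paging level (same arithmetic as pgd_index() etc.).
 */
struct va_indices split_va(uint64_t va)
{
	uint64_t top = va >> 47;
	struct va_indices idx;

	/* Bits 63..47 must be all zeros or all ones (canonical address). */
	assert(top == 0 || top == 0x1ffff);

	idx.pgd = (unsigned int)((va >> PGDIR_SHIFT) & (PTRS_PER_LEVEL - 1));
	idx.pud = (unsigned int)((va >> PUD_SHIFT) & (PTRS_PER_LEVEL - 1));
	idx.pmd = (unsigned int)((va >> PMD_SHIFT) & (PTRS_PER_LEVEL - 1));
	idx.pte = (unsigned int)((va >> PAGE_SHIFT) & (PTRS_PER_LEVEL - 1));
	return idx;
}
```

For example, split_va(0xffffffff81000000) yields PGD 511, PUD 510, PMD 8, PTE 0, whereas split_va(0xffff800000200000) yields PGD 256, PUD 0, PMD 1, PTE 0. Whether this matters in the Xen case depends on which address the code actually executes from.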

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE
  2013-01-04 15:15         ` Daniel Kiper
  (?)
@ 2013-01-04 16:12           ` Jan Beulich
  -1 siblings, 0 replies; 217+ messages in thread
From: Jan Beulich @ 2013-01-04 16:12 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: andrew.cooper3, x86, tglx, kexec, virtualization, xen-devel,
	konrad.wilk, maxim.uvarov, mingo, vgoyal, linux-kernel, ebiederm,
	hpa

>>> On 04.01.13 at 16:15, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> On Thu, Jan 03, 2013 at 09:34:55AM +0000, Jan Beulich wrote:
>> >>> On 27.12.12 at 03:18, Daniel Kiper <daniel.kiper@oracle.com> wrote:
>> > Some implementations (e.g. Xen PVOPS) could not use part of identity page table
>> > to construct transition page table. It means that they require separate PUDs,
>> > PMDs and PTEs for virtual and physical (identity) mapping. To satisfy that
>> > requirement add extra pointer to PGD, PUD, PMD and PTE and align existing
>> > code.
>>
>> So you keep posting this despite it having got pointed out on each
>> earlier submission that this is unnecessary, proven by the fact that
>> the non-pvops Xen kernels can get away without it. Why?
> 
> Sorry, I forgot to reply to your email last time.
> 
> I am still not convinced. I have tested the SUSE kernel itself and it does not work.
> Maybe I missed something, but... Please check
> arch/x86/kernel/machine_kexec_64.c:init_transition_pgtable()
> 
> I can see:
> 
> vaddr = (unsigned long)relocate_kernel;
> 
> and later:
> 
> pgd += pgd_index(vaddr);
> ...

I think that mapping is simply irrelevant, as the code at
relocate_kernel gets copied to the control page and
invoked there (unlike in the native case, where
relocate_kernel() gets invoked directly).
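The distinction Jan draws can be illustrated with a toy model. The sketch below assumes, for simplicity, that a page table is just a flat list of (virtual page, physical frame) pairs. It contrasts two cases. In native x86-64 kexec, the CPU is still executing at relocate_kernel()'s kernel virtual address when it loads the transition page table, so that page has to be mapped onto the control page copy. If the copy is instead entered at its identity-mapped address, only the identity entry is needed. The names toy_pt, toy_map() and toy_translate(), and all addresses, are hypothetical. The model says nothing about how the SUSE or Xen code actually sets up its mappings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT        12
#define PAGE_OFFSET_MASK  ((UINT64_C(1) << PAGE_SHIFT) - 1)
#define TOY_MAX_ENTRIES   8

/* Toy page table: flat (virtual page number -> physical frame number) list.
 * A real transition table is a 4-level tree; only the result is modeled. */
struct toy_pt {
	uint64_t vpn[TOY_MAX_ENTRIES];
	uint64_t pfn[TOY_MAX_ENTRIES];
	size_t n;
};

/* Map the page containing vaddr to the frame containing paddr. */
static void toy_map(struct toy_pt *pt, uint64_t vaddr, uint64_t paddr)
{
	assert(pt->n < TOY_MAX_ENTRIES);
	pt->vpn[pt->n] = vaddr >> PAGE_SHIFT;
	pt->pfn[pt->n] = paddr >> PAGE_SHIFT;
	pt->n++;
}

/* Translate vaddr, keeping its offset within the page.
 * Returns 0 for an unmapped address (frame 0 is never used in the toy). */
static uint64_t toy_translate(const struct toy_pt *pt, uint64_t vaddr)
{
	for (size_t i = 0; i < pt->n; i++) {
		if (pt->vpn[i] == vaddr >> PAGE_SHIFT)
			return (pt->pfn[i] << PAGE_SHIFT) | (vaddr & PAGE_OFFSET_MASK);
	}
	return 0;
}
```

In the native-style table, both the identity page and relocate_kernel()'s virtual page resolve to the copy. In the copied-and-entered-at-identity table, the kernel virtual address is simply absent. That absence is the sense in which the mapping would be "irrelevant".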

Jan

> This is wrong: the virtual address of relocate_kernel() in Xen is different
> from its virtual address in the Linux kernel. That is why the transition
> page table cannot be established in the Linux kernel and so on...
> How does this work in SUSE? I have no idea.
> 
> I am happy to fix that, but whatever the fix is,
> I would like to be sure that it works.
> 
> Daniel

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-04 14:38               ` David Vrabel
  (?)
@ 2013-01-04 17:01                 ` Daniel Kiper
  -1 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2013-01-04 17:01 UTC (permalink / raw)
  To: David Vrabel
  Cc: Andrew Cooper, xen-devel, konrad.wilk, maxim.uvarov, x86, kexec,
	linux-kernel, virtualization, mingo, Eric W. Biederman, jbeulich,
	H. Peter Anvin, tglx, vgoyal

On Fri, Jan 04, 2013 at 02:38:44PM +0000, David Vrabel wrote:
> On 04/01/13 14:22, Daniel Kiper wrote:
> > On Wed, Jan 02, 2013 at 11:26:43AM +0000, Andrew Cooper wrote:
> >> On 27/12/12 18:02, Eric W. Biederman wrote:
> >>> Andrew Cooper <andrew.cooper3@citrix.com> writes:
> >>>
> >>>> On 27/12/2012 07:53, Eric W. Biederman wrote:
> >>>>> The syscall ABI still has the wrong semantics.
> >>>>>
> >>>>> Aka totally unmaintainable and unmergeable.
> >>>>>
> >>>>> The concept of domU support is also strange.  What does domU support even mean, when the dom0 support is loading a kernel to pick up Xen when Xen falls over.
> >>>> There are two requirements pulling at this patch series, but I agree
> >>>> that we need to clarify them.
> >>> It probably makes sense to split them apart a little even.
> >>>
> >>>
> >>
> >> Thinking about this split, there might be a way to simplify it even more.
> >>
> >> /sbin/kexec can load the "Xen" crash kernel itself by issuing
> >> hypercalls using /dev/xen/privcmd.  This would remove the need for
> >> the dom0 kernel to distinguish between loading a crash kernel for
> >> itself and loading a kernel for Xen.
> >>
> >> Or is this just a silly idea complicating the matter?
> >
> > This is impossible with the current Xen kexec/kdump interface.
> > It should be changed to do that. However, I suppose that
> > the Xen community would not be interested in such changes.
>
> I don't see why the hypercall ABI cannot be extended with new sub-ops
> that do the right thing -- the existing ABI is a bit weird.
>
> I plan to start prototyping something shortly (hopefully next week) for
> the Xen kexec case.

Wow... As I can see, this time the Xen community is interested in it...
That is great. I agree that the current kexec interface is not ideal.

David, I am happy to help in that process. However, if you wish, I could
carry it myself. Anyway, it looks like I should hold off on my
Linux kexec/kdump patches.

My .5 cents:
  - We should focus on KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload;
    probably we should introduce KEXEC_CMD_kexec_load2 and KEXEC_CMD_kexec_unload2;
    load should __LOAD__ the kernel image and other things into hypervisor memory;
    I suppose that almost everything could be copied from linux/kernel/kexec.c and
    linux/arch/x86/kernel/{machine_kexec_$(BITS).c,relocate_kernel_$(BITS).S};
    I think that KEXEC_CMD_kexec should stay as is.
  - Hmmm... Now I think that we should still use the kexec syscall to load the image
    into Xen memory (with the new KEXEC_CMD_kexec_load2) because it establishes
    everything that is needed to call kdump if dom0 crashes; however,
    I could be wrong...
  - Last but not least, we should think about support for PV guests too.
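The first point, that a new load sub-op would copy the image into hypervisor memory and could borrow most of its logic from linux/kernel/kexec.c, implies validating the segment list before copying anything. The sketch below is loosely modeled on the kind of checks Linux's sanity_check_segment_list() performs: page-aligned destinations, a source no larger than its destination, and no overlaps. Everything in it is hypothetical. struct load2_segment, its field names and load2_check_segments() are invented for illustration and are not part of any Xen interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEG_PAGE_SIZE 4096u

/* Hypothetical segment descriptor for a KEXEC_CMD_kexec_load2-style sub-op. */
struct load2_segment {
	uint64_t buf;    /* dom0 address of the source data (not checked here) */
	uint64_t bufsz;  /* number of bytes to copy */
	uint64_t mem;    /* destination machine address */
	uint64_t memsz;  /* bytes reserved at the destination */
};

/* Returns 0 if every segment passes the checks, -1 otherwise. */
static int load2_check_segments(const struct load2_segment *seg, size_t nr)
{
	assert(seg != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		uint64_t start = seg[i].mem;
		uint64_t end = start + seg[i].memsz;

		/* Destination start and size must be page aligned. */
		if ((start | seg[i].memsz) & (SEG_PAGE_SIZE - 1))
			return -1;
		/* Reject ranges that wrap around the address space. */
		if (end < start)
			return -1;
		/* The source must fit into the reserved destination. */
		if (seg[i].bufsz > seg[i].memsz)
			return -1;
		/* Destinations must not overlap any earlier segment. */
		for (size_t j = 0; j < i; j++) {
			uint64_t ostart = seg[j].mem;
			uint64_t oend = ostart + seg[j].memsz;

			if (start < oend && ostart < end)
				return -1;
		}
	}
	return 0;
}
```

Rejecting the whole list on the first bad segment, rather than trimming it, keeps the hypervisor side simple. The sketch leaves open whether a load2 sub-op should also cap destinations at some hypervisor-chosen limit, as Linux does with KEXEC_DESTINATION_MEMORY_LIMIT.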

Daniel

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
@ 2013-01-04 17:07                 ` Daniel Kiper
  0 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2013-01-04 17:07 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, x86, tglx, kexec, virtualization, xen-devel,
	konrad.wilk, maxim.uvarov, mingo, vgoyal, linux-kernel,
	Eric W. Biederman, H. Peter Anvin

On Fri, Jan 04, 2013 at 02:41:17PM +0000, Jan Beulich wrote:
> >>> On 04.01.13 at 15:22, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> > On Wed, Jan 02, 2013 at 11:26:43AM +0000, Andrew Cooper wrote:
> >> /sbin/kexec can load the "Xen" crash kernel itself by issuing
> >> hypercalls using /dev/xen/privcmd.  This would remove the need for
> >> the dom0 kernel to distinguish between loading a crash kernel for
> >> itself and loading a kernel for Xen.
> >>
> >> Or is this just a silly idea complicating the matter?
> >
> > This is impossible with the current Xen kexec/kdump interface.
>
> Why?

Because the current KEXEC_CMD_kexec_load does not load the kernel
image and other things into Xen memory, they have to live
somewhere in dom0 Linux kernel memory.

Daniel

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE
  2013-01-04 16:12           ` Jan Beulich
  (?)
@ 2013-01-04 17:25             ` Daniel Kiper
  -1 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2013-01-04 17:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: andrew.cooper3, x86, tglx, kexec, virtualization, xen-devel,
	konrad.wilk, maxim.uvarov, mingo, vgoyal, linux-kernel, ebiederm,
	hpa

On Fri, Jan 04, 2013 at 04:12:32PM +0000, Jan Beulich wrote:
> >>> On 04.01.13 at 16:15, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> > On Thu, Jan 03, 2013 at 09:34:55AM +0000, Jan Beulich wrote:
> >> >>> On 27.12.12 at 03:18, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> >> > Some implementations (e.g. Xen PVOPS) could not use part of identity page table
> >> > to construct transition page table. It means that they require separate PUDs,
> >> > PMDs and PTEs for virtual and physical (identity) mapping. To satisfy that
> >> > requirement add extra pointer to PGD, PUD, PMD and PTE and align existing
> >> > code.
> >>
> >> So you keep posting this despite it having got pointed out on each
> >> earlier submission that this is unnecessary, proven by the fact that
> >> the non-pvops Xen kernels can get away without it. Why?
> >
> > Sorry, I forgot to reply to your email last time.
> >
> > I am still not convinced. I have tested the SUSE kernel itself and it does not work.
> > Maybe I missed something, but... Please check
> > arch/x86/kernel/machine_kexec_64.c:init_transition_pgtable()
> >
> > I can see:
> >
> > vaddr = (unsigned long)relocate_kernel;
> >
> > and later:
> >
> > pgd += pgd_index(vaddr);
> > ...
>
> I think that mapping is simply irrelevant, as the code at
> relocate_kernel gets copied to the control page and
> invoked there (unlike in the native case, where
> relocate_kernel() gets invoked directly).

Right, so where is the virtual mapping of the control page established?
I could not find the relevant code in the SLES kernel that does that.

Daniel

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-04 17:07                 ` Daniel Kiper
  (?)
@ 2013-01-04 19:11                   ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 217+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-04 19:11 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: Jan Beulich, Andrew Cooper, x86, tglx, kexec, virtualization,
	xen-devel, maxim.uvarov, mingo, vgoyal, linux-kernel,
	Eric W. Biederman, H. Peter Anvin

On Fri, Jan 04, 2013 at 06:07:51PM +0100, Daniel Kiper wrote:
> On Fri, Jan 04, 2013 at 02:41:17PM +0000, Jan Beulich wrote:
> > >>> On 04.01.13 at 15:22, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> > > On Wed, Jan 02, 2013 at 11:26:43AM +0000, Andrew Cooper wrote:
> > >> /sbin/kexec can load the "Xen" crash kernel itself by issuing
> > >> hypercalls using /dev/xen/privcmd.  This would remove the need for
> > >> the dom0 kernel to distinguish between loading a crash kernel for
> > >> itself and loading a kernel for Xen.
> > >>
> > >> Or is this just a silly idea complicating the matter?
> > >
> > > This is impossible with the current Xen kexec/kdump interface.
> >
> > Why?
> 
> Because the current KEXEC_CMD_kexec_load does not load the kernel
> image and other things into Xen memory. This means that they have
> to live somewhere in dom0 Linux kernel memory.

We could have a very simple hypercall which would have:

struct fancy_new_hypercall {
	xen_pfn_t payload; // IN
	ssize_t len; // IN
#define DATA (1<<1)
#define DATA_EOF (1<<2)
#define DATA_KERNEL (1<<3)
#define DATA_RAMDISK (1<<4)
	unsigned int flags; // IN
	unsigned int status; // OUT
};

which would, in a loop, just iterate over the payloads and
let the hypervisor stick them in the crashkernel space.

This is all hand-waving, of course. There would probably be a need
to figure out how much space is left in Xen's reserved
'crashkernel' memory region too.

> 
> Daniel

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE
  2013-01-04 17:25             ` Daniel Kiper
@ 2013-01-07  9:48               ` Jan Beulich
  -1 siblings, 0 replies; 217+ messages in thread
From: Jan Beulich @ 2013-01-07  9:48 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: andrew.cooper3, x86, tglx, kexec, virtualization, xen-devel,
	konrad.wilk, maxim.uvarov, mingo, vgoyal, linux-kernel, ebiederm,
	hpa

>>> On 04.01.13 at 18:25, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> Right, so where is the virtual mapping of the control page established?
> I could not find the relevant code in the SLES kernel that does that.

In the hypervisor (xen/arch/x86/machine_kexec.c:machine_kexec_load()).
xen/arch/x86/machine_kexec.c:machine_kexec() then simply uses
image->page_list[1].

Jan


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
@ 2013-01-07 10:25                     ` Ian Campbell
  0 siblings, 0 replies; 217+ messages in thread
From: Ian Campbell @ 2013-01-07 10:25 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Daniel Kiper, xen-devel, H. Peter Anvin, Andrew Cooper, x86,
	kexec, linux-kernel, virtualization, mingo, Eric W. Biederman,
	Jan Beulich, maxim.uvarov, tglx, vgoyal

On Fri, 2013-01-04 at 19:11 +0000, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 04, 2013 at 06:07:51PM +0100, Daniel Kiper wrote:
> > On Fri, Jan 04, 2013 at 02:41:17PM +0000, Jan Beulich wrote:
> > > >>> On 04.01.13 at 15:22, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> > > > On Wed, Jan 02, 2013 at 11:26:43AM +0000, Andrew Cooper wrote:
> > > >> /sbin/kexec can load the "Xen" crash kernel itself by issuing
> > > >> hypercalls using /dev/xen/privcmd.  This would remove the need for
> > > >> the dom0 kernel to distinguish between loading a crash kernel for
> > > >> itself and loading a kernel for Xen.
> > > >>
> > > >> Or is this just a silly idea complicating the matter?
> > > >
> > > > This is impossible with the current Xen kexec/kdump interface.
> > >
> > > Why?
> > 
> > Because the current KEXEC_CMD_kexec_load does not load the kernel
> > image and other things into Xen memory. This means that they have
> > to live somewhere in dom0 Linux kernel memory.
> 
> We could have a very simple hypercall which would have:
> 
> struct fancy_new_hypercall {
> 	xen_pfn_t payload; // IN

This would have to be XEN_GUEST_HANDLE(something), since userspace cannot
figure out which pfns back its memory. In any case, since the hypervisor
is going to want to copy the data into the crashkernel space, a virtual
address is convenient to have.

> 	ssize_t len; // IN
> #define DATA (1<<1)
> #define DATA_EOF (1<<2)
> #define DATA_KERNEL (1<<3)
> #define DATA_RAMDISK (1<<4)
> 	unsigned int flags; // IN
> 	unsigned int status; // OUT
> };
> 
> which would, in a loop, just iterate over the payloads and
> let the hypervisor stick them in the crashkernel space.
> 
> This is all hand-waving, of course. There would probably be a need
> to figure out how much space is left in Xen's reserved
> 'crashkernel' memory region too.

This is probably a mad idea, but it's Monday morning and I'm sleep
deprived, so I'll throw it out there...

What about adding DOMID_KEXEC (similar to DOMID_IO etc.)? This would allow
dom0 to map the kexec memory space with the usual privcmd mmap
hypercalls and build things in it directly.

OK, I suspect this might not be practical for a variety of reasons (lack
of a p2m for such domains, so no way to find out the list of mfns; dom0
userspace simply doesn't have sufficient context to write sensible
things here; etc.), but maybe someone has a better head on today...

Ian.


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-07 10:25                     ` Ian Campbell
  (?)
@ 2013-01-07 10:46                       ` Andrew Cooper
  -1 siblings, 0 replies; 217+ messages in thread
From: Andrew Cooper @ 2013-01-07 10:46 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Konrad Rzeszutek Wilk, Daniel Kiper, xen-devel, H. Peter Anvin,
	x86, kexec, linux-kernel, virtualization, mingo,
	Eric W. Biederman, Jan Beulich, maxim.uvarov, tglx, vgoyal

On 07/01/13 10:25, Ian Campbell wrote:
> On Fri, 2013-01-04 at 19:11 +0000, Konrad Rzeszutek Wilk wrote:
>> On Fri, Jan 04, 2013 at 06:07:51PM +0100, Daniel Kiper wrote:
>>> Because the current KEXEC_CMD_kexec_load does not load the kernel
>>> image and other things into Xen memory. This means that they have
>>> to live somewhere in dom0 Linux kernel memory.
>> We could have a very simple hypercall which would have:
>>
>> struct fancy_new_hypercall {
>> 	xen_pfn_t payload; // IN
> This would have to be XEN_GUEST_HANDLE(something), since userspace cannot
> figure out which pfns back its memory. In any case, since the hypervisor
> is going to want to copy the data into the crashkernel space, a virtual
> address is convenient to have.
>
>> 	ssize_t len; // IN
>> #define DATA (1<<1)
>> #define DATA_EOF (1<<2)
>> #define DATA_KERNEL (1<<3)
>> #define DATA_RAMDISK (1<<4)
>> 	unsigned int flags; // IN
>> 	unsigned int status; // OUT
>> };
>>
>> which would, in a loop, just iterate over the payloads and
>> let the hypervisor stick them in the crashkernel space.
>>
>> This is all hand-waving, of course. There would probably be a need
>> to figure out how much space is left in Xen's reserved
>> 'crashkernel' memory region too.
> This is probably a mad idea, but it's Monday morning and I'm sleep
> deprived, so I'll throw it out there...
>
> What about adding DOMID_KEXEC (similar to DOMID_IO etc.)? This would allow
> dom0 to map the kexec memory space with the usual privcmd mmap
> hypercalls and build things in it directly.
>
> OK, I suspect this might not be practical for a variety of reasons (lack
> of a p2m for such domains, so no way to find out the list of mfns; dom0
> userspace simply doesn't have sufficient context to write sensible
> things here; etc.), but maybe someone has a better head on today...
>
> Ian.
>

Given that /sbin/kexec creates a binary blob in memory, surely the
simplest thing is to get it to suitably mlock() the region and give a
list of VAs to the hypervisor.

This way, Xen can properly take care of what it does with the
information and where. For example, at the moment, allowing dom0 to
choose which parts of the Xen crash area get overwritten is a recipe
for disaster if a crash occurs midway through loading/reloading the
crash kernel.

~Andrew


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-07 10:46                       ` Andrew Cooper
  (?)
@ 2013-01-07 10:54                         ` Ian Campbell
  -1 siblings, 0 replies; 217+ messages in thread
From: Ian Campbell @ 2013-01-07 10:54 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Konrad Rzeszutek Wilk, Daniel Kiper, xen-devel, H. Peter Anvin,
	x86, kexec, linux-kernel, virtualization, mingo,
	Eric W. Biederman, Jan Beulich, maxim.uvarov, tglx, vgoyal

On Mon, 2013-01-07 at 10:46 +0000, Andrew Cooper wrote:

> Given that /sbin/kexec creates a binary blob in memory, surely the
> simplest thing is to get it to suitably mlock() the region and give a
> list of VAs to the hypervisor.

More than likely. The DOMID_KEXEC thing was just a random musing ;-)

> This way, Xen can properly take care of what it does with the
> information and where. For example, at the moment, allowing dom0 to
> choose which parts of the Xen crash area get overwritten is a recipe
> for disaster if a crash occurs midway through loading/reloading the
> crash kernel.

That's true. I think there is a double-buffering scheme in the current
implementation, and we should preserve that in any new one.

Ian.


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
@ 2013-01-07 10:54                         ` Ian Campbell
  0 siblings, 0 replies; 217+ messages in thread
From: Ian Campbell @ 2013-01-07 10:54 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: xen-devel, Konrad Rzeszutek Wilk, Daniel Kiper, x86, kexec,
	linux-kernel, virtualization, mingo, Eric W. Biederman,
	Jan Beulich, H. Peter Anvin, maxim.uvarov, tglx, vgoyal

On Mon, 2013-01-07 at 10:46 +0000, Andrew Cooper wrote:

> Given that /sbin/kexec creates a binary blob in memory, surely the most 
> simple thing is to get it to suitably mlock() the region and give a list 
> of VAs to the hypervisor.

More than likely. The DOMID_KEXEC thing was just a radon musing ;-)

> This way, Xen can properly take care of what it does with information 
> and where.  For example, at the moment, allowing dom0 to choose where 
> gets overwritten in the Xen crash area is a recipe for disaster if a 
> crash occurs midway through loading/reloading the crash kernel.

That's true. I think there is a double buffering scheme in the current
thing and we should preserve that in any new implementation.

Ian.


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-04 19:11                   ` Konrad Rzeszutek Wilk
  (?)
@ 2013-01-07 12:34                     ` Daniel Kiper
  -1 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2013-01-07 12:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jan Beulich, Andrew Cooper, x86, tglx, kexec, virtualization,
	xen-devel, maxim.uvarov, mingo, vgoyal, linux-kernel,
	Eric W. Biederman, H. Peter Anvin

On Fri, Jan 04, 2013 at 02:11:46PM -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 04, 2013 at 06:07:51PM +0100, Daniel Kiper wrote:
> > On Fri, Jan 04, 2013 at 02:41:17PM +0000, Jan Beulich wrote:
> > > >>> On 04.01.13 at 15:22, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> > > > On Wed, Jan 02, 2013 at 11:26:43AM +0000, Andrew Cooper wrote:
> > > >> /sbin/kexec can load the "Xen" crash kernel itself by issuing
> > > >> hypercalls using /dev/xen/privcmd.  This would remove the need for
> > > >> the dom0 kernel to distinguish between loading a crash kernel for
> > > >> itself and loading a kernel for Xen.
> > > >>
> > > >> Or is this just a silly idea complicating the matter?
> > > >
> > > > This is impossible with current Xen kexec/kdump interface.
> > >
> > > Why?
> >
> > Because current KEXEC_CMD_kexec_load does not load kernel
> > image and other things into Xen memory. It means that it
> > should live somewhere in dom0 Linux kernel memory.
>
> We could have a very simple hypercall which would have:
>
> struct fancy_new_hypercall {
> 	xen_pfn_t payload; // IN
> 	ssize_t len; // IN
> #define DATA (1<<1)
> #define DATA_EOF (1<<2)
> #define DATA_KERNEL (1<<3)
> #define DATA_RAMDISK (1<<4)
> 	unsigned int flags; // IN
> 	unsigned int status; // OUT
> };
>
> which would in a loop just iterate over the payloads and
> let the hypervisor stick it in the crashkernel space.
>
> This is all hand-waving of course. There probably would be a need
> to figure out how much space you have in the reserved Xen's
> 'crashkernel' memory region too.

I think that the new kexec hypercall function should mimic the kexec
syscall. That means all arguments passed to the hypercall should have
the same types where possible, and where that is not possible, the
conversion should be straightforward. Additionally, I think that a
single call of the new hypercall load function should load everything
needed into the right place and return a relevant status. Last but not
least, the new functionality should be available through
/dev/xen/privcmd or directly from the kernel without much effort.
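A sketch of what mimicking kexec_load(2) could mean for argument checking. The struct loosely mirrors the syscall's struct kexec_segment (buf, bufsz, mem, memsz), but it is a local, hypothetical copy. The checks are modeled on the ones Linux applies to a segment list (at most 16 segments, bufsz not larger than memsz, page-aligned destinations, no overlaps), assuming a 4 KiB page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_SEGMENT_MAX 16 /* same limit as Linux KEXEC_SEGMENT_MAX */

/* Hypothetical local mirror of a kexec_load(2) segment. */
struct seg {
    const void *buf; /* source buffer in the caller's address space */
    size_t bufsz;    /* bytes to copy from buf */
    uintptr_t mem;   /* destination physical address */
    size_t memsz;    /* destination size; the remainder is zero-filled */
};

/* Returns 0 if the list passes checks similar to kexec_load(2), else -1. */
int check_segments(const struct seg *s, size_t n)
{
    if (n > SKETCH_SEGMENT_MAX)
        return -1;

    for (size_t i = 0; i < n; i++) {
        uintptr_t start = s[i].mem;
        uintptr_t end = s[i].mem + s[i].memsz;

        if (s[i].bufsz > s[i].memsz)
            return -1;
        if ((start | end) & (SKETCH_PAGE_SIZE - 1)) /* page aligned */
            return -1;
        if (end < start) /* address wrap-around */
            return -1;
        for (size_t j = 0; j < i; j++) { /* no overlapping destinations */
            uintptr_t ps = s[j].mem, pe = s[j].mem + s[j].memsz;
            if (start < pe && ps < end)
                return -1;
        }
    }
    return 0;
}
```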

Daniel

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE
  2013-01-07  9:48               ` Jan Beulich
  (?)
@ 2013-01-07 12:52                 ` Daniel Kiper
  -1 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2013-01-07 12:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: andrew.cooper3, x86, tglx, kexec, virtualization, xen-devel,
	konrad.wilk, maxim.uvarov, mingo, vgoyal, linux-kernel, ebiederm,
	hpa

On Mon, Jan 07, 2013 at 09:48:20AM +0000, Jan Beulich wrote:
> >>> On 04.01.13 at 18:25, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> > Right, so where is virtual mapping of control page established?
> > I could not find relevant code in SLES kernel which does that.
>
> In the hypervisor (xen/arch/x86/machine_kexec.c:machine_kexec_load()).
> xen/arch/x86/machine_kexec.c:machine_kexec() then simply uses
> image->page_list[1].

This (xen/arch/x86/machine_kexec.c:machine_kexec_load()) maps the relevant
page (allocated earlier by dom0) into the hypervisor fixmap area. However,
it does not create a corresponding mapping in the transition page table,
which leads to a crash when %cr3 is switched from the Xen page table to
the transition page table.
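For the control page to stay reachable after the %cr3 switch, the transition page table needs present entries at all four levels (PGD, PUD, PMD, PTE) for the virtual address the code runs at. This small sketch computes those indices, assuming x86-64 4-level paging with 4 KiB pages. The example addresses are illustrative only, not Xen's actual fixmap address.

```c
#include <assert.h>
#include <stdint.h>

/* Index into each paging level for one virtual address:
 * 9 bits per level, 4 KiB pages (bits 11..0 are the page offset). */
struct pt_indices {
    unsigned pgd, pud, pmd, pte;
};

struct pt_indices pt_indices_of(uint64_t va)
{
    struct pt_indices ix = {
        .pgd = (unsigned)((va >> 39) & 0x1ff),
        .pud = (unsigned)((va >> 30) & 0x1ff),
        .pmd = (unsigned)((va >> 21) & 0x1ff),
        .pte = (unsigned)((va >> 12) & 0x1ff),
    };
    return ix;
}
```

For example, 0xffff800000000000 lands in PGD slot 256 with all lower indices 0, while 0x201000 needs PMD slot 1 and PTE slot 1 under PGD/PUD slot 0.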

Daniel

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE
  2013-01-07 12:52                 ` Daniel Kiper
  (?)
@ 2013-01-07 13:05                   ` Jan Beulich
  -1 siblings, 0 replies; 217+ messages in thread
From: Jan Beulich @ 2013-01-07 13:05 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: andrew.cooper3, x86, tglx, kexec, virtualization, xen-devel,
	konrad.wilk, maxim.uvarov, mingo, vgoyal, linux-kernel, ebiederm,
	hpa

>>> On 07.01.13 at 13:52, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> On Mon, Jan 07, 2013 at 09:48:20AM +0000, Jan Beulich wrote:
>> >>> On 04.01.13 at 18:25, Daniel Kiper <daniel.kiper@oracle.com> wrote:
>> > Right, so where is virtual mapping of control page established?
>> > I could not find relevant code in SLES kernel which does that.
>>
>> In the hypervisor (xen/arch/x86/machine_kexec.c:machine_kexec_load()).
>> xen/arch/x86/machine_kexec.c:machine_kexec() then simply uses
>> image->page_list[1].
> 
> This (xen/arch/x86/machine_kexec.c:machine_kexec_load()) maps relevant
> page (allocated earlier by dom0) in hypervisor fixmap area. However,
> it does not make relevant mapping in transition page table which
> leads to crash when %cr3 is switched from Xen page table to
> transition page table.

That indeed could explain _random_ failures - the fixmap entries
get created with _PAGE_GLOBAL set, i.e. don't get flushed with
the CR3 write unless CR4.PGE is clear.
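Jan's point about global fixmap entries reduces to a small predicate. Per the x86 architecture, a plain MOV to CR3 (ignoring PCID) does not invalidate TLB entries for pages marked global while CR4.PGE is set. Whether such a stale entry is still cached when it is used depends on TLB pressure, which fits failures that are random rather than deterministic. The bit positions are the architectural ones (G is PTE bit 8, PGE is CR4 bit 7); the function itself is only illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_GLOBAL (1ULL << 8) /* x86 PTE G bit (Linux/Xen _PAGE_GLOBAL) */
#define CR4_PGE    (1ULL << 7) /* CR4.PGE: global pages enabled */

/* Is a cached translation built from a PTE with these flags left
 * in the TLB by a plain MOV to CR3 (no PCID)? */
bool kept_across_cr3_write(uint64_t pte_flags, uint64_t cr4)
{
    return (pte_flags & PTE_GLOBAL) && (cr4 & CR4_PGE);
}
```

Clearing CR4.PGE (or not setting G on the mapping) is what makes the CR3 write flush such an entry too.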

And I don't see how your allocation of intermediate page tables
would help: You wouldn't know where the mapping of the control
page lives until you're actually in the early relocate_kernel code.
Or is that what distinguishes your cloned code from the
native original?

Jan


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
@ 2013-01-07 13:49                       ` Ian Campbell
  0 siblings, 0 replies; 217+ messages in thread
From: Ian Campbell @ 2013-01-07 13:49 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: Konrad Rzeszutek Wilk, xen-devel, H. Peter Anvin, Andrew Cooper,
	x86, kexec, linux-kernel, virtualization, mingo,
	Eric W. Biederman, Jan Beulich, maxim.uvarov, tglx, vgoyal

On Mon, 2013-01-07 at 12:34 +0000, Daniel Kiper wrote:
> I think that new kexec hypercall function should mimics kexec syscall.

We want an interface that can be used by non-Linux domains (both dom0
and domU) as well, though, so please bear this in mind.

Historically we've not always been good at this when the hypercall
interface is strongly tied to a particular guest implementation (in some
sense this is the problem with the current kexec hypercall).

Also what makes for a good syscall interface does not necessarily make
for a good hypercall interface.
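Ian's point can be made concrete with a hypothetical, guest-neutral argument layout. It uses only fixed-width integers (no pointers, size_t or long), and the two 32-bit fields are paired so that 32-bit and 64-bit guests, Linux or not, see the same offsets. None of these names exist in Xen; the layout is pinned down at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment descriptor: identical layout for any guest. */
struct hyp_kexec_segment {
    uint64_t buf_gva;    /* guest virtual address of the data */
    uint64_t buf_size;   /* bytes of data at buf_gva */
    uint64_t dest_maddr; /* destination machine address */
    uint64_t dest_size;  /* destination size, zero-filled past buf_size */
};

/* Hypothetical load request; the uint32_t pair keeps entry_maddr at
 * offset 8 even where uint64_t is only 4-byte aligned (i386). */
struct hyp_kexec_load {
    uint32_t nr_segments;
    uint32_t flags;
    uint64_t entry_maddr;  /* where execution starts */
    uint64_t segments_gva; /* array of struct hyp_kexec_segment */
};

_Static_assert(sizeof(struct hyp_kexec_segment) == 32, "segment size");
_Static_assert(sizeof(struct hyp_kexec_load) == 24, "load size");
_Static_assert(offsetof(struct hyp_kexec_load, entry_maddr) == 8,
               "no hidden padding");

/* Convert native, guest-specific types into the fixed-width form. */
struct hyp_kexec_segment pack_segment(const void *buf, size_t bufsz,
                                      uint64_t dest, uint64_t destsz)
{
    struct hyp_kexec_segment s = {
        .buf_gva = (uint64_t)(uintptr_t)buf,
        .buf_size = bufsz,
        .dest_maddr = dest,
        .dest_size = destsz,
    };
    return s;
}
```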

Ian.


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-07 12:34                     ` Daniel Kiper
  (?)
@ 2013-01-07 16:20                       ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 217+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-07 16:20 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: Jan Beulich, Andrew Cooper, x86, tglx, kexec, virtualization,
	xen-devel, maxim.uvarov, mingo, vgoyal, linux-kernel,
	Eric W. Biederman, H. Peter Anvin

On Mon, Jan 07, 2013 at 01:34:04PM +0100, Daniel Kiper wrote:
> On Fri, Jan 04, 2013 at 02:11:46PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jan 04, 2013 at 06:07:51PM +0100, Daniel Kiper wrote:
> > > On Fri, Jan 04, 2013 at 02:41:17PM +0000, Jan Beulich wrote:
> > > > >>> On 04.01.13 at 15:22, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> > > > > On Wed, Jan 02, 2013 at 11:26:43AM +0000, Andrew Cooper wrote:
> > > > >> /sbin/kexec can load the "Xen" crash kernel itself by issuing
> > > > >> hypercalls using /dev/xen/privcmd.  This would remove the need for
> > > > >> the dom0 kernel to distinguish between loading a crash kernel for
> > > > >> itself and loading a kernel for Xen.
> > > > >>
> > > > >> Or is this just a silly idea complicating the matter?
> > > > >
> > > > > This is impossible with current Xen kexec/kdump interface.
> > > >
> > > > Why?
> > >
> > > Because current KEXEC_CMD_kexec_load does not load kernel
> > > image and other things into Xen memory. It means that it
> > > should live somewhere in dom0 Linux kernel memory.
> >
> > We could have a very simple hypercall which would have:
> >
> > struct fancy_new_hypercall {
> > 	xen_pfn_t payload; // IN
> > 	ssize_t len; // IN
> > #define DATA (1<<1)
> > #define DATA_EOF (1<<2)
> > #define DATA_KERNEL (1<<3)
> > #define DATA_RAMDISK (1<<4)
> > 	unsigned int flags; // IN
> > 	unsigned int status; // OUT
> > };
> >
> > which would in a loop just iterate over the payloads and
> > let the hypervisor stick it in the crashkernel space.
> >
> > This is all hand-waving of course. There probably would be a need
> > to figure out how much space you have in the reserved Xen's
> > 'crashkernel' memory region too.
> 
> I think that new kexec hypercall function should mimics kexec syscall.
> It means that all arguments passed to hypercall should have same types
> if it is possible or if it is not possible then conversion should be done
> in very easy way. Additionally, I think that one call of new hypercall
> load function should load all needed thinks in right place and
> return relevant status. Last but not least, new functionality should

We are not restricted to just _one_ hypercall. And this loading
thing could be similar to the microcode hypercall - which just points
to a virtual address along with the length - and says 'load me'.
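A hypothetical sketch of the "point at a VA and a length, load me" style, issued as several calls rather than one. The hypervisor side is a stub that copies into a fixed reserved region and refuses anything that would not fit, which is the space check Konrad mentions earlier in the thread. The flag and function names are invented, and a real design would still need the double buffering discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_EOF 0x1u /* hypothetical flag: last chunk of this image */

/* Stand-in for the hypervisor's reserved crash kernel region. */
struct crash_region {
    unsigned char *base;
    size_t size;
    size_t used;
    int complete;
};

/* Stand-in for one "here is a VA and a length, load me" call.
 * Returns 0, or -1 if the data would overflow the reserved region. */
int load_chunk(struct crash_region *r, const void *va, size_t len,
               unsigned flags)
{
    if (len > r->size - r->used)
        return -1;
    memcpy(r->base + r->used, va, len);
    r->used += len;
    if (flags & CHUNK_EOF)
        r->complete = 1;
    return 0;
}

/* Caller side: issue as many calls as needed, flagging the last one. */
int load_image(struct crash_region *r, const unsigned char *img,
               size_t len, size_t chunk)
{
    size_t off = 0;

    assert(chunk > 0);
    do {
        size_t n = (len - off < chunk) ? len - off : chunk;
        unsigned flags = (off + n == len) ? CHUNK_EOF : 0;

        if (load_chunk(r, img + off, n, flags) != 0)
            return -1;
        off += n;
    } while (off < len);
    return 0;
}
```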

> be available through /dev/xen/privcmd or directly from kernel without
> bigger effort.

Perhaps we should have an email thread on xen-devel where we hash out
some ideas. Eric, would you be OK being included on this? It would make
sense for this mechanism to be as future-proof as possible, and I am not
sure what your plans for kexec are in the future.
> 
> Daniel

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE
  2013-01-07 13:05                   ` Jan Beulich
@ 2013-01-09 18:42                     ` Daniel Kiper
  -1 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2013-01-09 18:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: andrew.cooper3, x86, tglx, kexec, virtualization, xen-devel,
	konrad.wilk, maxim.uvarov, mingo, vgoyal, linux-kernel, ebiederm,
	hpa

On Mon, Jan 07, 2013 at 01:05:10PM +0000, Jan Beulich wrote:
> >>> On 07.01.13 at 13:52, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> > On Mon, Jan 07, 2013 at 09:48:20AM +0000, Jan Beulich wrote:
> >> >>> On 04.01.13 at 18:25, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> >> > Right, so where is the virtual mapping of the control page established?
> >> > I could not find the relevant code in the SLES kernel which does that.
> >>
> >> In the hypervisor (xen/arch/x86/machine_kexec.c:machine_kexec_load()).
> >> xen/arch/x86/machine_kexec.c:machine_kexec() then simply uses
> >> image->page_list[1].
> >
> > This (xen/arch/x86/machine_kexec.c:machine_kexec_load()) maps the relevant
> > page (allocated earlier by dom0) in the hypervisor fixmap area. However,
> > it does not create the corresponding mapping in the transition page table,
> > which leads to a crash when %cr3 is switched from the Xen page table to
> > the transition page table.
>
> That indeed could explain _random_ failures - the fixmap entries
> get created with _PAGE_GLOBAL set, i.e. don't get flushed with
> the CR3 write unless CR4.PGE is clear.

This does not matter. As I stated earlier, the virtual mapping is wrong:
relocate_kernel() is mapped at its virtual address in the Linux kernel,
instead of the control page being mapped at its virtual address in the Xen
hypervisor. I tested the SLES kernel once again. It does not work.

> And I don't see how your allocation of intermediate page tables
> would help: You wouldn't know where the mapping of the control
> page lives until you're actually in the early relocate_kernel code.

Right. Allocation by itself is not a solution for this problem.
It must be accompanied by code which establishes the transition
page table in relocate_kernel() (whose code is later copied
to the control page).

> Or was that what distinguishes your cloned code from the
> native original?

No, my code is based on the native original.
There are some implementation differences.

Daniel
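
For readers following the _PAGE_GLOBAL/CR4.PGE exchange above, a toy software model of a TLB shows the rule Jan refers to: a CR3 write drops non-global translations, while entries marked global survive as long as CR4.PGE is set. This is a deliberately simplified sketch (no PCIDs, no INVLPG, invented type and function names), not hardware, Linux or Xen code.

```c
#include <stdbool.h>
#include <stddef.h>

#define TLB_SLOTS 8

/* One cached translation in the toy TLB. */
struct tlb_entry {
    unsigned long vpn;   /* virtual page number */
    unsigned long pfn;   /* physical frame number */
    bool global;         /* PTE had the global (G) bit set */
    bool valid;
};

struct toy_tlb {
    struct tlb_entry e[TLB_SLOTS];
    bool cr4_pge;        /* models CR4.PGE */
};

/* Model of a CR3 write: invalidate every entry except global ones,
 * and the G bit is only honoured while CR4.PGE is set. */
void toy_write_cr3(struct toy_tlb *t)
{
    for (size_t i = 0; i < TLB_SLOTS; i++)
        if (!(t->e[i].global && t->cr4_pge))
            t->e[i].valid = false;
}

/* Returns true and fills *pfn if vpn still has a cached translation. */
bool toy_lookup(const struct toy_tlb *t, unsigned long vpn, unsigned long *pfn)
{
    for (size_t i = 0; i < TLB_SLOTS; i++)
        if (t->e[i].valid && t->e[i].vpn == vpn) {
            *pfn = t->e[i].pfn;
            return true;
        }
    return false;
}
```

A stale global entry can keep a fixmap address translating after the switch to the transition page table, so whether the code survives may depend on what happens to be cached, which is one plausible explanation for the 'random' failures mentioned above.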

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE
  2013-01-04 15:15         ` Daniel Kiper
  (?)
@ 2013-01-10 14:07           ` David Vrabel
  -1 siblings, 0 replies; 217+ messages in thread
From: David Vrabel @ 2013-01-10 14:07 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: Jan Beulich, hpa, konrad.wilk, andrew.cooper3, x86, kexec,
	linux-kernel, xen-devel, mingo, ebiederm, maxim.uvarov, tglx,
	virtualization, vgoyal

On 04/01/13 15:15, Daniel Kiper wrote:
> On Thu, Jan 03, 2013 at 09:34:55AM +0000, Jan Beulich wrote:
>>>>> On 27.12.12 at 03:18, Daniel Kiper <daniel.kiper@oracle.com> wrote:
>>> Some implementations (e.g. Xen PVOPS) cannot use part of the identity page table
>>> to construct the transition page table. This means that they require separate PUDs,
>>> PMDs and PTEs for the virtual and physical (identity) mappings. To satisfy that
>>> requirement, add extra pointers to PGD, PUD, PMD and PTE and adjust the existing
>>> code.
>>
>> So you keep posting this despite it having been pointed out on each
>> earlier submission that this is unnecessary, proven by the fact that
>> the non-pvops Xen kernels can get away without it. Why?
> 
> Sorry, I forgot to reply to your email last time.
> 
> I am still not convinced. I have tested the SUSE kernel itself and it does not work.
> Maybe I missed something, but... Please check arch/x86/kernel/machine_kexec_64.c:init_transition_pgtable()
> 
> I can see:
> 
> vaddr = (unsigned long)relocate_kernel;
> 
> and later:
> 
> pgd += pgd_index(vaddr);
> ...
> 
> It is wrong. The virtual address of relocate_kernel() in Xen is different
> from its virtual address in the Linux kernel. That is why the transition
> page table cannot be established in the Linux kernel, and so on...
> How does this work in SUSE? I have no idea.
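
To make the vaddr argument concrete: on x86-64 with 4-level paging, pgd_index(), pud_index(), pmd_index() and pte_index() each pick a 9-bit field out of the virtual address (bits 39-47, 30-38, 21-29 and 12-20). A minimal sketch of that arithmetic follows; the helper names are made up for illustration. If two addresses differ in any of those fields, page-table entries populated for one (for example, Linux's address of relocate_kernel) do nothing for the other (for example, wherever Xen maps the control page).

```c
#include <stdint.h>

/* x86-64 4-level paging: 512 entries (9 bits) per level. */
#define PAGE_SHIFT_4K   12
#define PMD_SHIFT_4L    21
#define PUD_SHIFT_4L    30
#define PGDIR_SHIFT_4L  39
#define IDX_MASK        0x1ffULL

struct pt_indices {
    unsigned pgd, pud, pmd, pte;
};

/* Split a virtual address into per-level table indices, i.e. the
 * same shifts and masks the *_index() helpers apply on 4-level x86-64. */
struct pt_indices split_vaddr(uint64_t vaddr)
{
    struct pt_indices ix = {
        .pgd = (unsigned)((vaddr >> PGDIR_SHIFT_4L) & IDX_MASK),
        .pud = (unsigned)((vaddr >> PUD_SHIFT_4L) & IDX_MASK),
        .pmd = (unsigned)((vaddr >> PMD_SHIFT_4L) & IDX_MASK),
        .pte = (unsigned)((vaddr >> PAGE_SHIFT_4K) & IDX_MASK),
    };
    return ix;
}

/* Nonzero if both addresses walk to the same leaf slot, i.e. a
 * transition table built for one also maps the other's page. */
int same_translation_slot(uint64_t a, uint64_t b)
{
    struct pt_indices x = split_vaddr(a), y = split_vaddr(b);
    return x.pgd == y.pgd && x.pud == y.pud &&
           x.pmd == y.pmd && x.pte == y.pte;
}
```

For instance, `split_vaddr(0xffffffff81000000ULL)` gives PGD 511, PUD 510, PMD 8, PTE 0; the addresses used here and in the checks are placeholders, not actual Linux or Xen link addresses.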

The real problem here is attempting to transition from the Xen page
tables to an identity mapping set of page tables by using some
trampoline code and page tables provided by the dom0 kernel.

This works[*] with PV because the page tables from the PV dom0 have
machine addresses and get mapped into the fixmap on kexec load, but it's
completely broken for a PVH dom0.

I shall be ditching this (bizarre) method and putting the trampoline and
transition/identity map page tables into Xen.

David

[*] Works for us in our old classic kernels, YMMV.
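
On the 'page tables from the PV dom0 have machine addresses' point: a PV guest writes machine frame numbers (MFNs) into its page-table entries, translating its own pseudo-physical frame numbers (PFNs) through a physical-to-machine (p2m) table first. The toy sketch below shows only that translation step; the array-based p2m, the `make_pv_pte()` name and the 12-bit flags mask are assumptions for illustration, not the Linux or Xen implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT      12
#define INVALID_MFN     UINT64_MAX
#define PTE_FLAGS_MASK  0xfffULL   /* low 12 bits hold flags in this toy layout */

/* Hypothetical p2m table: index = guest PFN, value = machine frame. */
struct p2m {
    const uint64_t *mfn;
    size_t nr_pfns;
};

/* Build a PV-style PTE: machine address of the frame plus flag bits.
 * Returns 0 (not present) if the PFN has no machine backing. */
uint64_t make_pv_pte(const struct p2m *p, uint64_t pfn, uint64_t flags)
{
    if (pfn >= p->nr_pfns || p->mfn[pfn] == INVALID_MFN)
        return 0;
    return (p->mfn[pfn] << PAGE_SHIFT) | (flags & PTE_FLAGS_MASK);
}
```

Because the resulting entries already contain machine addresses, the hypervisor can in principle use such tables without further translation, which is presumably what lets the existing scheme work for PV.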

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-04 17:01                 ` Daniel Kiper
  (?)
@ 2013-01-10 14:19                   ` David Vrabel
  -1 siblings, 0 replies; 217+ messages in thread
From: David Vrabel @ 2013-01-10 14:19 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: David Vrabel, xen-devel, H. Peter Anvin, konrad.wilk,
	Andrew Cooper, x86, kexec, linux-kernel, virtualization, mingo,
	Eric W. Biederman, jbeulich, maxim.uvarov, tglx, vgoyal

On 04/01/13 17:01, Daniel Kiper wrote:
> On Fri, Jan 04, 2013 at 02:38:44PM +0000, David Vrabel wrote:
>> On 04/01/13 14:22, Daniel Kiper wrote:
>>> On Wed, Jan 02, 2013 at 11:26:43AM +0000, Andrew Cooper wrote:
>>>> On 27/12/12 18:02, Eric W. Biederman wrote:
>>>>> Andrew Cooper<andrew.cooper3@citrix.com>  writes:
>>>>>
>>>>>> On 27/12/2012 07:53, Eric W. Biederman wrote:
>>>>>>> The syscall ABI still has the wrong semantics.
>>>>>>>
>>>>>>> Aka totally unmaintainable and unmergeable.
>>>>>>>
>>>>>>> The concept of domU support is also strange.  What does domU support even mean, when the dom0 support is loading a kernel to pick up Xen when Xen falls over.
>>>>>> There are two requirements pulling at this patch series, but I agree
>>>>>> that we need to clarify them.
>>>>> It probably makes sense to split them apart a little even.
>>>>>
>>>>>
>>>>
>>>> Thinking about this split, there might be a way to simplify it even more.
>>>>
>>>> /sbin/kexec can load the "Xen" crash kernel itself by issuing
>>>> hypercalls using /dev/xen/privcmd.  This would remove the need for
>>>> the dom0 kernel to distinguish between loading a crash kernel for
>>>> itself and loading a kernel for Xen.
>>>>
>>>> Or is this just a silly idea complicating the matter?
>>>
>>> This is impossible with the current Xen kexec/kdump interface.
>>> It would have to be changed to do that. However, I suppose that
>>> the Xen community would not be interested in such changes.
>>
>> I don't see why the hypercall ABI cannot be extended with new sub-ops
>> that do the right thing -- the existing ABI is a bit weird.
>>
>> I plan to start prototyping something shortly (hopefully next week) for
>> the Xen kexec case.
> 
> Wow... As I can see, this time the Xen community is interested...
> That is great. I agree that the current kexec interface is not ideal.

I spent some more time looking at the existing interface and
implementation and it really is broken.

> David, I am happy to help in that process. However, if you wish, I could
> carry it myself. Anyway, it looks like I should hold off on my
> Linux kexec/kdump patches.

I should be able to post some prototype patches for Xen in a few weeks.
 No guarantees though.

> My .5 cents:
>   - We should focus on KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload;
>     probably we should introduce KEXEC_CMD_kexec_load2 and KEXEC_CMD_kexec_unload2;
>     load should __LOAD__ the kernel image and other things into hypervisor memory;

Yes, but I don't see how we can easily support both ABIs.  I'd be
in favour of replacing the existing hypercalls and requiring updated
kexec tools in dom0 (this isn't that different from requiring the correct
libxc in dom0).
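
One way to picture 'replacing the existing hypercalls and requiring updated kexec tools' is a dispatcher that stops servicing the old load/unload sub-ops and returns a distinct error for them, so an outdated tool fails cleanly instead of half-working. The sub-op numbers and handler names below are hypothetical and do not match Xen's public kexec interface.

```c
#include <errno.h>

/* Hypothetical sub-op numbers, for illustration only. */
enum kexec_subop {
    SUBOP_LOAD_V1   = 1,   /* retired: old-ABI load */
    SUBOP_UNLOAD_V1 = 2,   /* retired: old-ABI unload */
    SUBOP_LOAD_V2   = 16,  /* replacement load */
    SUBOP_UNLOAD_V2 = 17,  /* replacement unload */
};

static int image_loaded;   /* simulated hypervisor-side state */

static int do_load_v2(void)   { image_loaded = 1; return 0; }
static int do_unload_v2(void) { image_loaded = 0; return 0; }

int kexec_image_loaded(void) { return image_loaded; }

/* Route a request. Retired and unknown sub-ops both get -ENOSYS, so
 * out-of-date userspace sees an unambiguous "not supported". */
int kexec_dispatch(int op)
{
    switch (op) {
    case SUBOP_LOAD_V2:
        return do_load_v2();
    case SUBOP_UNLOAD_V2:
        return do_unload_v2();
    case SUBOP_LOAD_V1:
    case SUBOP_UNLOAD_V1:
    default:
        return -ENOSYS;
    }
}
```

The trade-off is the one described above: no compatibility shim to maintain, at the cost of requiring matching tools in dom0.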

>     I suppose that almost everything could be copied from linux/kernel/kexec.c,
>     linux/arch/x86/kernel/{machine_kexec_$(BITS).c,relocate_kernel_$(BITS).S};
>     I think that KEXEC_CMD_kexec should stay as is,

I don't think we want all the junk from Linux inside Xen -- we only want
to support the kdump case and do not have to handle returning from the
kexec image.

>   - Hmmm... Now I think that we should still use the kexec syscall to load the image
>     into Xen memory (with the new KEXEC_CMD_kexec_load2) because it sets up
>     everything needed to call kdump if dom0 crashes; however,
>     I could be wrong...

I don't think we need the kexec syscall.  The kernel can unconditionally
do the crash hypercall, which will return if the kdump kernel isn't
loaded, so the kernel can fall back to the regular non-kexec panic.

This will allow the kexec syscall to be used only for the domU kexec case.
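
The panic path proposed here, trying the crash hypercall unconditionally and treating a return as 'no kdump image loaded', can be sketched as below. `hypothetical_crash_hypercall()`, `set_kdump_loaded()` and the simulated `kdump_loaded` flag are placeholders so the control flow runs in user space; a real crash hypercall would not return at all when an image is loaded.

```c
#include <stdbool.h>

enum panic_outcome { HANDED_TO_KDUMP, FELL_BACK_TO_PANIC };

/* Simulated hypervisor state: whether a kdump image has been loaded. */
static bool kdump_loaded;

void set_kdump_loaded(bool loaded) { kdump_loaded = loaded; }

/* Placeholder for the crash hypercall. In reality it only returns when
 * no image is loaded; this simulation returns true for "Xen took over". */
static bool hypothetical_crash_hypercall(void)
{
    return kdump_loaded;
}

/* dom0 panic hook: always try Xen's kdump first, no kexec syscall needed. */
enum panic_outcome dom0_panic(void)
{
    if (hypothetical_crash_hypercall())
        return HANDED_TO_KDUMP;   /* real code would never get here */
    /* The hypercall came back: nothing loaded, use the ordinary panic path. */
    return FELL_BACK_TO_PANIC;
}
```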

>   - last but not least, we should think about support for PV guests
>     too.

I won't be looking at this.

To avoid confusion about the two largely orthogonal sorts of kexec, how
about defining some terms?  I suggest:

Xen kexec: Xen executes the image in response to a Xen crash or a
hypercall from a privileged domain.

Guest kexec: The guest kernel executes the image within the domain in
response to a guest kernel crash or a system call.

David

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
@ 2013-01-10 14:19                   ` David Vrabel
  0 siblings, 0 replies; 217+ messages in thread
From: David Vrabel @ 2013-01-10 14:19 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: xen-devel, konrad.wilk, Andrew Cooper, maxim.uvarov, x86, kexec,
	linux-kernel, virtualization, mingo, David Vrabel, jbeulich,
	H. Peter Anvin, tglx, vgoyal, Eric W. Biederman

On 04/01/13 17:01, Daniel Kiper wrote:
> On Fri, Jan 04, 2013 at 02:38:44PM +0000, David Vrabel wrote:
>> On 04/01/13 14:22, Daniel Kiper wrote:
>>> On Wed, Jan 02, 2013 at 11:26:43AM +0000, Andrew Cooper wrote:
>>>> On 27/12/12 18:02, Eric W. Biederman wrote:
>>>>> Andrew Cooper<andrew.cooper3@citrix.com>  writes:
>>>>>
>>>>>> On 27/12/2012 07:53, Eric W. Biederman wrote:
>>>>>>> The syscall ABI still has the wrong semantics.
>>>>>>>
>>>>>>> Aka totally unmaintainable and unmergeable.
>>>>>>>
>>>>>>> The concept of domU support is also strange.  What does domU support even mean, when the dom0 support is loading a kernel to pick up Xen when Xen falls over.
>>>>>> There are two requirements pulling at this patch series, but I agree
>>>>>> that we need to clarify them.
>>>>> It probably makes sense to split them apart a little even.
>>>>>
>>>>>
>>>>
>>>> Thinking about this split, there might be a way to simplify it even more.
>>>>
>>>> /sbin/kexec can load the "Xen" crash kernel itself by issuing
>>>> hypercalls using /dev/xen/privcmd.  This would remove the need for
>>>> the dom0 kernel to distinguish between loading a crash kernel for
>>>> itself and loading a kernel for Xen.
>>>>
>>>> Or is this just a silly idea complicating the matter?
>>>
>>> This is impossible with the current Xen kexec/kdump interface.
>>> It would have to be changed to do that. However, I suppose that
>>> the Xen community would not be interested in such changes.
>>
>> I don't see why the hypercall ABI cannot be extended with new sub-ops
>> that do the right thing -- the existing ABI is a bit weird.
>>
>> I plan to start prototyping something shortly (hopefully next week) for
>> the Xen kexec case.
> 
> Wow... As I can see, this time the Xen community is interested...
> That is great. I agree that the current kexec interface is not ideal.

I spent some more time looking at the existing interface and
implementation and it really is broken.

> David, I am happy to help in that process. However, if you wish, I could
> carry it myself. Anyway, it looks like I should hold off on my
> Linux kexec/kdump patches.

I should be able to post some prototype patches for Xen in a few weeks.
No guarantees though.

> My .5 cents:
>   - We should focus on KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload;
>     probably we should introduce KEXEC_CMD_kexec_load2 and KEXEC_CMD_kexec_unload2;
>     load should __LOAD__ kernel image and other things into hypervisor memory;

Yes, but I don't see how we can support both ABIs easily.  I'd be
in favour of replacing the existing hypercalls and requiring updated
kexec tools in dom0 (this isn't that different to requiring the correct
libxc in dom0).

>     I suppose that almost all things could be copied from linux/kernel/kexec.c,
>     linux/arch/x86/kernel/{machine_kexec_$(BITS).c,relocate_kernel_$(BITS).S};
>     I think that KEXEC_CMD_kexec should stay as is,

I don't think we want all the junk from Linux inside Xen -- we only want
to support the kdump case and do not have to handle returning from the
kexec image.

>   - Hmmm... Now I think that we should still use kexec syscall to load image
>     into Xen memory (with new KEXEC_CMD_kexec_load2) because it establishes
>     all things which are needed to call kdump if dom0 crashes; however,
>     I could be wrong...

I don't think we need the kexec syscall.  The kernel can unconditionally
do the crash hypercall, which will return if the kdump kernel isn't
loaded and the kernel can fall back to the regular non-kexec panic.

This will allow the kexec syscall to be used only for the domU kexec case.
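A minimal sketch of the panic path proposed above, assuming a crash hypercall that returns an error when no kdump image is loaded. Everything here is hypothetical: `hv_crash_image_loaded`, `hypothetical_crash_hypercall()` and `sketch_panic_path()` stand in for hypervisor state and for the Xen/Linux code. A real crash hypercall would not return at all once an image is loaded; the model returns 0 only so the control flow can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for "the hypervisor has a kdump image loaded". */
static bool hv_crash_image_loaded;

enum panic_outcome { OUTCOME_KDUMP, OUTCOME_REGULAR_PANIC };

/* Hypothetical crash hypercall. With an image loaded, a real hypercall
 * would jump to the kdump kernel and never return; this model returns 0
 * instead. Without an image it fails and returns to the caller. */
static int hypothetical_crash_hypercall(void)
{
	return hv_crash_image_loaded ? 0 : -1;
}

/* Proposed dom0 panic path: always issue the crash hypercall and fall
 * back to the regular (non-kexec) panic only if it comes back. */
static enum panic_outcome sketch_panic_path(void)
{
	if (hypothetical_crash_hypercall() == 0)
		return OUTCOME_KDUMP;	/* not reached with a real hypercall */

	/* In this model the hypercall only fails when nothing is loaded. */
	assert(!hv_crash_image_loaded);
	return OUTCOME_REGULAR_PANIC;
}
```

The attraction of this shape is that dom0 needs no record of whether an image is loaded; the trade-off is that the hypercall must be able to return safely when nothing is loaded.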

>   - last but not least, we should think about support for PV guests
>     too.

I won't be looking at this.

To avoid confusion about the two largely orthogonal sorts of kexec, how
about defining some terms.  I suggest:

Xen kexec: Xen executes the image in response to a Xen crash or a
hypercall from a privileged domain.

Guest kexec: The guest kernel executes the image within the domain in
response to a guest kernel crash or a system call.

David

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-07 16:20                       ` Konrad Rzeszutek Wilk
  (?)
@ 2013-01-11  4:16                         ` Eric W. Biederman
  -1 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2013-01-11  4:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Daniel Kiper, xen-devel, H. Peter Anvin, Andrew Cooper, x86,
	kexec, linux-kernel, virtualization, mingo, Jan Beulich,
	maxim.uvarov, tglx, vgoyal

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Mon, Jan 07, 2013 at 01:34:04PM +0100, Daniel Kiper wrote:
>> I think that the new kexec hypercall function should mimic the kexec syscall.
>> This means that all arguments passed to the hypercall should have the same types
>> where possible, and where that is not possible the conversion should be
>> very simple. Additionally, I think that one call of the new hypercall
>> load function should load everything needed into the right place and
>> return the relevant status. Last but not least, the new functionality should
>
> We are not restricted to just _one_ hypercall. And this loading
> thing could be similar to the microcode hypercall - which just points
> to a virtual address along with the length - and says 'load me'.
>
>> be available through /dev/xen/privcmd or directly from the kernel without
>> much effort.
>
> Perhaps we should have an email thread on xen-devel where we hash out
> some ideas. Eric, would you be OK being included on this - it would make
> sense for this mechanism to be as future-proof as possible - and I am not
> sure what your plans for kexec are in the future?

The basic kexec interface is:

- load ranges of virtual addresses into physical addresses;
- jump to the physical address with identity-mapped page tables.

There are a few flags to allow for different usage scenarios like
kexec on panic vs normal kexec.

It is very very simple and very extensible.  All of the weird glue
happens in userspace.
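To make the shape of that interface concrete, here is a sketch of a segment list plus the kind of sanity checks a loader might apply before the final jump. The struct is modelled on the segment descriptor of the Linux kexec_load(2) system call (source buffer and size in the caller's virtual memory, destination physical address and size), but `sketch_kexec_segment` and the helper functions are illustrative names, and the fixed 4 KiB page size is an assumption. The alignment, overlap and size checks are broadly similar to what Linux verifies at load time; the entry-point check is an extra illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* Modelled on the kexec_load(2) segment descriptor: buf/bufsz is the
 * source in the caller's virtual memory, mem/memsz the physical target. */
struct sketch_kexec_segment {
	const void *buf;
	size_t bufsz;
	uint64_t mem;
	size_t memsz;
};

/* Per-segment checks: page-aligned destination, source fits in it. */
static bool segment_ok(const struct sketch_kexec_segment *s)
{
	assert(s != NULL);
	if (s->mem % SKETCH_PAGE_SIZE != 0 || s->memsz % SKETCH_PAGE_SIZE != 0)
		return false;
	return s->bufsz <= s->memsz;
}

/* Whole-list check: every segment valid and no two destinations overlap.
 * Quadratic, which is fine for the handful of segments usually loaded. */
static bool segments_ok(const struct sketch_kexec_segment *segs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!segment_ok(&segs[i]))
			return false;
		for (size_t j = i + 1; j < n; j++) {
			uint64_t a0 = segs[i].mem, a1 = a0 + segs[i].memsz;
			uint64_t b0 = segs[j].mem, b1 = b0 + segs[j].memsz;

			if (a0 < b1 && b0 < a1)	/* half-open ranges intersect */
				return false;
		}
	}
	return true;
}

/* The final step jumps to a physical entry point; require it to be loaded. */
static bool entry_in_segments(uint64_t entry,
			      const struct sketch_kexec_segment *segs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (entry >= segs[i].mem && entry < segs[i].mem + segs[i].memsz)
			return true;
	return false;
}
```

Everything beyond this (choosing addresses, building boot parameters, placing an initrd) would stay in userspace, which is the point being made above.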

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-10 14:19                   ` David Vrabel
  (?)
@ 2013-01-11 13:22                     ` Daniel Kiper
  -1 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2013-01-11 13:22 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, H. Peter Anvin, konrad.wilk, Andrew Cooper, x86,
	kexec, linux-kernel, virtualization, mingo, Eric W. Biederman,
	jbeulich, maxim.uvarov, tglx, vgoyal

On Thu, Jan 10, 2013 at 02:19:55PM +0000, David Vrabel wrote:
> On 04/01/13 17:01, Daniel Kiper wrote:
> > On Fri, Jan 04, 2013 at 02:38:44PM +0000, David Vrabel wrote:
> >> On 04/01/13 14:22, Daniel Kiper wrote:
> >>> On Wed, Jan 02, 2013 at 11:26:43AM +0000, Andrew Cooper wrote:
> >>>> On 27/12/12 18:02, Eric W. Biederman wrote:
> >>>>> Andrew Cooper<andrew.cooper3@citrix.com>  writes:
> >>>>>
> >>>>>> On 27/12/2012 07:53, Eric W. Biederman wrote:
> >>>>>>> The syscall ABI still has the wrong semantics.
> >>>>>>>
> >>>>>>> Aka totally unmaintainable and unmergeable.
> >>>>>>>
> >>>>>>> The concept of domU support is also strange.  What does domU support even mean, when the dom0 support is loading a kernel to pick up Xen when Xen falls over.
> >>>>>> There are two requirements pulling at this patch series, but I agree
> >>>>>> that we need to clarify them.
> >>>>> It probably makes sense to split them apart a little even.
> >>>>>
> >>>>>
> >>>>
> >>>> Thinking about this split, there might be a way to simplify it even more.
> >>>>
> >>>> /sbin/kexec can load the "Xen" crash kernel itself by issuing
> >>>> hypercalls using /dev/xen/privcmd.  This would remove the need for
> >>>> the dom0 kernel to distinguish between loading a crash kernel for
> >>>> itself and loading a kernel for Xen.
> >>>>
> >>>> Or is this just a silly idea complicating the matter?
> >>>
> >>> This is impossible with current Xen kexec/kdump interface.
> >>> It should be changed to do that. However, I suppose that
> >>> the Xen community would not be interested in such changes.
> >>
> >> I don't see why the hypercall ABI cannot be extended with new sub-ops
> >> that do the right thing -- the existing ABI is a bit weird.
> >>
> >> I plan to start prototyping something shortly (hopefully next week) for
> >> the Xen kexec case.
> >
> > Wow... As I can see, this time the Xen community is interested...
> > That is great. I agree that current kexec interface is not ideal.
>
> I spent some more time looking at the existing interface and
> implementation and it really is broken.
>
> > David, I am happy to help in that process. However, if you wish, I could
> > carry it myself. Anyway, it looks like I should hold off on my
> > Linux kexec/kdump patches.
>
> I should be able to post some prototype patches for Xen in a few weeks.
> No guarantees though.

That is great. If you need any help, drop me a line.

> > My .5 cents:
> >   - We should focus on KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload;
> >     probably we should introduce KEXEC_CMD_kexec_load2 and KEXEC_CMD_kexec_unload2;
> >     load should __LOAD__ kernel image and other things into hypervisor memory;
>
> Yes, but I don't see how we can support both ABIs easily.  I'd be
> in favour of replacing the existing hypercalls and requiring updated
> kexec tools in dom0 (this isn't that different to requiring the correct
> libxc in dom0).

Why? Just define new structures for the new functions of the kexec hypercall.
That should suffice.

> >     I suppose that almost all things could be copied from linux/kernel/kexec.c,
> >     linux/arch/x86/kernel/{machine_kexec_$(BITS).c,relocate_kernel_$(BITS).S};
> >     I think that KEXEC_CMD_kexec should stay as is,
>
> I don't think we want all the junk from Linux inside Xen -- we only want
> to support the kdump case and do not have to handle returning from the
> kexec image.

I do not want to implement kexec jump or anything like that. However, I think
it is worth reusing code where it can be reused. As far as I know, a lot of
Xen code was taken, with smaller or bigger changes, from the Linux kernel.
Why would we want to reinvent the wheel this time?

Additionally, we should not drop kexec support. It is the main part of kdump.
In the kdump case, the new kernel (and other stuff) is placed in preallocated
space, in contrast to kexec. That's all. kexec is useful if you would like
to switch quickly (skipping the BIOS) from Xen to bare-metal Linux. If you drop
kexec support from Xen, then the kexec-tools package in a bunch of distros
has to be altered to take the new Xen behavior into account.
I do not think that is what we want to do.
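The placement difference described above can be sketched as a single check, assuming a crash image's destination ranges must lie entirely within the region reserved at boot (for example via a crashkernel= boot option), while a regular kexec image is not restricted by this check. The type names and `placement_ok()` are hypothetical, and a real loader would still validate default images against usable RAM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_image_type { SKETCH_IMAGE_DEFAULT, SKETCH_IMAGE_CRASH };

/* Half-open physical range [start, end). */
struct sketch_range {
	uint64_t start;
	uint64_t end;
};

/* A crash (kdump) image must sit entirely inside the memory reserved at
 * boot; a default kexec image is not restricted by this check. */
static bool placement_ok(enum sketch_image_type type,
			 const struct sketch_range *dest, size_t n,
			 struct sketch_range reserved)
{
	if (type != SKETCH_IMAGE_CRASH)
		return true;
	for (size_t i = 0; i < n; i++) {
		assert(dest[i].start <= dest[i].end);
		if (dest[i].start < reserved.start || dest[i].end > reserved.end)
			return false;
	}
	return true;
}
```

Because the reserved region is set aside before anything crashes, the crash image can be copied there at load time and left untouched until it is needed.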

> >   - Hmmm... Now I think that we should still use kexec syscall to load image
> >     into Xen memory (with new KEXEC_CMD_kexec_load2) because it establishes
> >     all things which are needed to call kdump if dom0 crashes; however,
> >     I could be wrong...
>
> I don't think we need the kexec syscall.  The kernel can unconditionally
> do the crash hypercall, which will return if the kdump kernel isn't
> loaded and the kernel can fall back to the regular non-kexec panic.

No, please do not do that. When you call HYPERVISOR_kexec_op(KEXEC_CMD_kexec),
the system is completely shut down. Returning from HYPERVISOR_kexec_op(KEXEC_CMD_kexec)
would require restoring some kernel functionality, which may be impossible
in some cases. Additionally, it means that some changes would have to be made
in the generic kexec code path. As far as I know, the kexec maintainers are very
reluctant to make such changes.
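A sketch of the alternative implied here, assuming dom0 records at load time whether the hypervisor accepted a crash image, so the panic path decides up front and the exec hypercall is never expected to return. All names are hypothetical; the exec stub only sets a flag where a real hypercall would transfer control to the crash kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical dom0-side record: set when the load hypercall succeeds. */
static bool dom0_crash_image_registered;
/* Set by the stub below; a real exec hypercall never returns. */
static bool exec_hypercall_issued;

enum panic_action { ACTION_KDUMP, ACTION_REGULAR_PANIC };

/* Hypothetical load path: remember whether the hypervisor took the image. */
static int sketch_load_crash_image(bool hypervisor_accepts)
{
	int rc = hypervisor_accepts ? 0 : -1;

	dom0_crash_image_registered = (rc == 0);
	return rc;
}

/* Hypothetical exec hypercall stub: only ever called when an image exists. */
static void hypothetical_exec_hypercall(void)
{
	assert(dom0_crash_image_registered);
	exec_hypercall_issued = true;
}

/* Panic path: decide before the point of no return instead of relying
 * on the hypercall coming back. */
static enum panic_action sketch_panic(void)
{
	if (!dom0_crash_image_registered)
		return ACTION_REGULAR_PANIC;
	hypothetical_exec_hypercall();
	return ACTION_KDUMP;	/* not reached with a real hypercall */
}
```

One limitation of this shape: if the hypervisor's copy of the image goes away through some other path, the dom0 flag goes stale, so a real design would need to keep the two in sync.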

> This will allow the kexec syscall to be used only for the domU kexec case.
>
> >   - last but not least, we should think about support for PV guests
> >     too.
>
> I won't be looking at this.

OK.

> To avoid confusion about the two largely orthogonal sorts of kexec, how
> about defining some terms.  I suggest:
>
> Xen kexec: Xen executes the image in response to a Xen crash or a
> hypercall from a privileged domain.
>
> Guest kexec: The guest kernel executes the image within the domain in
> response to a guest kernel crash or a system call.

OK.

Daniel

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE
  2013-01-10 14:07           ` David Vrabel
  (?)
@ 2013-01-11 13:36             ` Daniel Kiper
  -1 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2013-01-11 13:36 UTC (permalink / raw)
  To: David Vrabel
  Cc: Jan Beulich, hpa, konrad.wilk, andrew.cooper3, x86, kexec,
	linux-kernel, xen-devel, mingo, ebiederm, maxim.uvarov, tglx,
	virtualization, vgoyal

On Thu, Jan 10, 2013 at 02:07:31PM +0000, David Vrabel wrote:
> On 04/01/13 15:15, Daniel Kiper wrote:
> > On Thu, Jan 03, 2013 at 09:34:55AM +0000, Jan Beulich wrote:
> >>>>> On 27.12.12 at 03:18, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> >>> Some implementations (e.g. Xen PVOPS) cannot use part of the identity page table
> >>> to construct the transition page table. This means that they require separate PUDs,
> >>> PMDs and PTEs for the virtual and physical (identity) mappings. To satisfy that
> >>> requirement, add extra pointers to PGD, PUD, PMD and PTE and align the existing
> >>> code.
> >>
> >> So you keep posting this despite it having got pointed out on each
> >> earlier submission that this is unnecessary, proven by the fact that
> >> the non-pvops Xen kernels can get away without it. Why?
> >
> > Sorry, I forgot to reply to your email last time.
> >
> > I am still not convinced. I have tested the SUSE kernel itself and it does not work.
> > Maybe I missed something, but... please check arch/x86/kernel/machine_kexec_64.c:init_transition_pgtable()
> >
> > I can see:
> >
> > vaddr = (unsigned long)relocate_kernel;
> >
> > and later:
> >
> > pgd += pgd_index(vaddr);
> > ...
> >
> > This is wrong. The virtual address of relocate_kernel() in Xen is different
> > from its virtual address in the Linux kernel. That is why the transition
> > page table cannot be established in the Linux kernel, and so on...
> > How does this work in SUSE? I have no idea.
>
> The real problem here is attempting to transition from the Xen page
> tables to an identity mapping set of page tables by using some
> trampoline code and page tables provided by the dom0 kernel.
>
> This works[*] with PV because the page tables from the PV dom0 have
> machine addresses and get mapped into the fixmap on kexec load, but it's
> completely broken for a PVH dom0.
>
> I shall be ditching this (bizarre) method and putting the trampoline and
> transition/identity map page tables into Xen.

Great... Sorry to repeat myself, but please look at
linux/arch/x86/kernel/{machine_kexec_$(BITS).c,relocate_kernel_$(BITS).S}
You may find a lot of useful things there.

> David
>
> [*] Works for us in our old classic kernels, YMMV.

Ha... It works because the virtual mapping of the control page in the transition page table
is established by relocate_kernel(), which resides in the control page during kexec/kdump
execution. Unless you changed something...
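To make the address mismatch concrete: on x86-64 with 4-level paging, pgd_index() and the lower-level index helpers just select 9-bit fields of the virtual address, so a transition page table built around Linux's virtual address of relocate_kernel() fills different slots than the ones covering the address Xen actually runs from. This is an illustrative sketch only. The shift constants follow the standard x86-64 layout, while the demo_* names and both example addresses (a typical Linux kernel text address, 0xffffffff81000000, and 0xffff82c480000000, assumed to lie in Xen's own text region) are assumptions, not values taken from either code base.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Standard x86-64 4-level paging shifts; each level indexes 9 bits (512 entries). */
#define DEMO_PAGE_SHIFT   12
#define DEMO_PMD_SHIFT    21
#define DEMO_PUD_SHIFT    30
#define DEMO_PGDIR_SHIFT  39
#define DEMO_PTRS_PER_TBL 512u

/* Index into the table at the level selected by shift (cf. pgd_index() for PGDIR). */
unsigned int demo_table_index(uint64_t vaddr, unsigned int shift)
{
    assert(shift <= DEMO_PGDIR_SHIFT);
    return (unsigned int)((vaddr >> shift) & (DEMO_PTRS_PER_TBL - 1));
}

/* Print the PGD/PUD/PMD/PTE slots a transition page table must fill to map vaddr. */
void demo_show_slots(const char *label, uint64_t vaddr)
{
    printf("%-24s pgd=%3u pud=%3u pmd=%3u pte=%3u\n", label,
           demo_table_index(vaddr, DEMO_PGDIR_SHIFT),
           demo_table_index(vaddr, DEMO_PUD_SHIFT),
           demo_table_index(vaddr, DEMO_PMD_SHIFT),
           demo_table_index(vaddr, DEMO_PAGE_SHIFT));
}
```

For example, demo_show_slots("Linux", 0xffffffff81000000ULL) reports PGD slot 511, while 0xffff82c480000000ULL lands in PGD slot 261, so entries created for the first address do nothing to map the second.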

Daniel

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-07 13:49                       ` Ian Campbell
  (?)
@ 2013-01-11 13:47                         ` Daniel Kiper
  -1 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2013-01-11 13:47 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Konrad Rzeszutek Wilk, xen-devel, H. Peter Anvin, Andrew Cooper,
	x86, kexec, linux-kernel, virtualization, mingo,
	Eric W. Biederman, Jan Beulich, maxim.uvarov, tglx, vgoyal

On Mon, Jan 07, 2013 at 01:49:44PM +0000, Ian Campbell wrote:
> On Mon, 2013-01-07 at 12:34 +0000, Daniel Kiper wrote:
> > I think that the new kexec hypercall function should mimic the kexec syscall.
> 
> We want to have an interface that can be used by non-Linux domains (both dom0
> and domU) as well, though, so please bear this in mind.

I agree, but all arguments passed to the kexec syscall are quite generic and
do not impose any limitations. Just look at include/linux/kexec.h.
That is why I think that a lot could be taken from the
Linux kexec implementation.
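As an illustration of how little the kexec syscall arguments assume about the caller, the checks below apply the kind of rules Linux enforces on kexec_load() segments (page-aligned, non-overlapping destination ranges, and bufsz no larger than memsz) to a plain C structure. This is a sketch loosely modelled on the kernel's sanity_check_segment_list(); the struct, the demo_* names and the exact error codes are assumptions for illustration. Nothing in it is specific to Linux or Xen.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE   4096u
#define DEMO_SEGMENT_MAX 16u   /* same limit as Linux's KEXEC_SEGMENT_MAX */

/* Hypothetical mirror of a kexec segment: copy bufsz bytes from buf to physical mem. */
struct demo_kexec_segment {
    const void *buf;   /* source buffer in the caller's address space */
    size_t bufsz;      /* number of bytes to copy from buf */
    uint64_t mem;      /* destination physical address */
    uint64_t memsz;    /* destination size; bytes past bufsz are zero-filled */
};

/* Return 0 if the segment list is acceptable, or a negative errno value otherwise. */
int demo_check_segments(const struct demo_kexec_segment *seg, unsigned long nr)
{
    assert(seg != NULL || nr == 0);
    if (nr > DEMO_SEGMENT_MAX)
        return -EINVAL;
    for (unsigned long i = 0; i < nr; i++) {
        uint64_t start = seg[i].mem, end = start + seg[i].memsz;

        /* Both ends of the destination range must be page aligned. */
        if (start % DEMO_PAGE_SIZE || end % DEMO_PAGE_SIZE)
            return -EADDRNOTAVAIL;
        /* The source may not be larger than the reserved destination. */
        if (seg[i].bufsz > seg[i].memsz)
            return -EINVAL;
        /* Destination ranges must not overlap any earlier segment. */
        for (unsigned long j = 0; j < i; j++) {
            uint64_t pstart = seg[j].mem, pend = pstart + seg[j].memsz;
            if (end > pstart && start < pend)
                return -EINVAL;
        }
    }
    return 0;
}
```

The same validation could run in the kernel or in the hypervisor unchanged, which is the argument for reusing the syscall's argument layout.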

Daniel

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-11 13:22                     ` Daniel Kiper
  (?)
@ 2013-01-11 15:22                       ` David Vrabel
  -1 siblings, 0 replies; 217+ messages in thread
From: David Vrabel @ 2013-01-11 15:22 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: xen-devel, H. Peter Anvin, konrad.wilk, Andrew Cooper, x86,
	kexec, linux-kernel, virtualization, mingo, Eric W. Biederman,
	jbeulich, maxim.uvarov, tglx, vgoyal

On 11/01/13 13:22, Daniel Kiper wrote:
> On Thu, Jan 10, 2013 at 02:19:55PM +0000, David Vrabel wrote:
>> On 04/01/13 17:01, Daniel Kiper wrote:
>>> My .5 cents:
>>>   - We should focus on KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload;
>>>     probably we should introduce KEXEC_CMD_kexec_load2 and KEXEC_CMD_kexec_unload2;
>>>     load should __LOAD__ kernel image and other things into hypervisor memory;
>>
>> Yes, but I don't see how we can easily support both ABIs.  I'd be
>> in favour of replacing the existing hypercalls and requiring updated
>> kexec tools in dom0 (this isn't that different to requiring the correct
>> libxc in dom0).
> 
> Why? Just define new structures for the new functions of the kexec hypercall.
> That should suffice.

The current hypervisor ABI depends on an internal kernel ABI (i.e., the
ABI provided by relocate_kernel).  We do not want hypervisor internals
to be constrained by having to be compatible with kernel internals.

>>>   - Hmmm... Now I think that we should still use kexec syscall to load image
>>>     into Xen memory (with new KEXEC_CMD_kexec_load2) because it establishes
>>>     all things which are needed to call kdump if dom0 crashes; however,
>>>     I could be wrong...
>>
>> I don't think we need the kexec syscall.  The kernel can unconditionally
>> do the crash hypercall, which will return if the kdump kernel isn't
>> loaded and the kernel can fall back to the regular non-kexec panic.
> 
> No, please do not do that. When you call HYPERVISOR_kexec_op(KEXEC_CMD_kexec)
> the system is completely shut down. Returning from HYPERVISOR_kexec_op(KEXEC_CMD_kexec)
> would require restoring some kernel functionality, which may be impossible
> in some cases. Additionally, it would require changes to the generic kexec
> code path, and as far as I know the kexec maintainers are very reluctant
> to accept such changes.

Huh?  There only needs to be a call to a new hypervisor_crash_kexec()
function (which would then call the Xen-specific crash hypercall) at the
very beginning of crash_kexec().  If this returns, the normal
crash/shutdown path is followed (which could even include a guest kexec!).
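The control flow proposed here can be sketched as a small user-space model. All names are hypothetical: demo_hypervisor_crash_kexec() stands in for the proposed hypervisor_crash_kexec(), and a true result models Xen taking over (a real successful crash hypercall would never return). The point disputed in the thread, whether the hypercall can always return cleanly once it has started shutting the machine down, is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of what has been loaded before the crash. */
struct demo_crash_state {
    bool xen_image_loaded;    /* crash image loaded into Xen by hypercall */
    bool guest_image_loaded;  /* crash image loaded by the guest's own kexec syscall */
};

enum demo_crash_outcome { DEMO_XEN_KDUMP, DEMO_GUEST_KEXEC, DEMO_PANIC };

/* Stand-in for hypervisor_crash_kexec(): true models "Xen took over"; false
 * models Xen returning because no crash image is loaded. */
static bool demo_hypervisor_crash_kexec(const struct demo_crash_state *s)
{
    return s->xen_image_loaded;
}

/* Model of crash_kexec() with the Xen hook at its very beginning. */
enum demo_crash_outcome demo_crash_kexec(const struct demo_crash_state *s)
{
    assert(s != NULL);
    if (demo_hypervisor_crash_kexec(s))
        return DEMO_XEN_KDUMP;
    /* The hook returned: carry on with the normal crash path. */
    if (s->guest_image_loaded)
        return DEMO_GUEST_KEXEC;
    return DEMO_PANIC;
}
```

The design choice being argued for is that the Xen hook is tried first and unconditionally, so no kexec syscall is needed to decide whether Xen has an image loaded.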

David

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-11  4:16                         ` Eric W. Biederman
  (?)
@ 2013-01-11 16:55                           ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 217+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-11 16:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Daniel Kiper, xen-devel, H. Peter Anvin, Andrew Cooper, x86,
	kexec, linux-kernel, virtualization, mingo, Jan Beulich,
	maxim.uvarov, tglx, vgoyal

On Thu, Jan 10, 2013 at 08:16:48PM -0800, Eric W. Biederman wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> 
> > On Mon, Jan 07, 2013 at 01:34:04PM +0100, Daniel Kiper wrote:
> >> I think that the new kexec hypercall function should mimic the kexec syscall.
> >> It means that all arguments passed to the hypercall should have the same types
> >> if possible, or, if that is not possible, conversion should be done
> >> in a very easy way. Additionally, I think that one call of the new hypercall
> >> load function should load all needed things in the right place and
> >> return the relevant status. Last but not least, new functionality should
> >
> > We are not restricted to just _one_ hypercall. And this loading
> > thing could be similar to the microcode hypercall - which just points
> > to a virtual address along with the length - and says 'load me'.
> >
> >> be available through /dev/xen/privcmd or directly from the kernel without
> >> much effort.
> >
> > Perhaps we should have an email thread on xen-devel where we hash out
> > some ideas. Eric, would you be OK being included on this - it would make
> > sense for this mechanism to be as future-proof as possible - and I am not
> > sure what your plans for kexec are in the future?
> 
> The basic kexec interface is:
> 
> load ranges from virtual addresses to physical addresses.
> jump to the physical address with identity mapped page tables.
> 
> There are a few flags to allow for different usage scenarios like
> kexec on panic vs normal kexec.

And is there nothing fancy to be done for EFI and SecureBoot? Or is
that something the kernel has to handle on its own (somehow passing
certificates somewhere)?

> 
> It is very very simple and very extensible.  All of the weird glue
> happens in userspace.
> 
> Eric
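The "load ranges" half of the interface described above can be modelled with a simulated physical memory array. This sketch assumes each segment is copied from buf to offset mem and the rest of memsz is zero-filled, as with Linux kexec segments; the demo_* names are hypothetical, and the second step (jumping to the entry point with identity mapped page tables) is not modelled. A "point at a buffer plus a length and say 'load me'" hypercall, as suggested for the microcode case, would hand over essentially this list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical segment: copy bufsz bytes from buf to offset mem of simulated RAM. */
struct demo_segment {
    const void *buf;
    size_t bufsz;
    size_t mem;     /* destination offset in the simulated physical memory */
    size_t memsz;   /* destination size; the tail beyond bufsz is zero-filled */
};

/* "Load ranges": place every segment in phys[], returning 0 on success or -1
 * if a segment is malformed or does not fit. The later jump is not modelled. */
int demo_load_segments(unsigned char *phys, size_t phys_size,
                       const struct demo_segment *seg, size_t nr)
{
    assert(phys != NULL && (seg != NULL || nr == 0));
    for (size_t i = 0; i < nr; i++) {
        const struct demo_segment *s = &seg[i];

        if (s->bufsz > s->memsz || s->mem > phys_size ||
            s->memsz > phys_size - s->mem)
            return -1;
        memcpy(phys + s->mem, s->buf, s->bufsz);
        memset(phys + s->mem + s->bufsz, 0, s->memsz - s->bufsz);
    }
    return 0;
}
```

Everything image-format specific (bzImage parsing, ELF headers, command lines) stays with the caller, which matches the "weird glue happens in userspace" point.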

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-11 15:22                       ` David Vrabel
  (?)
@ 2013-01-11 17:34                         ` Daniel Kiper
  -1 siblings, 0 replies; 217+ messages in thread
From: Daniel Kiper @ 2013-01-11 17:34 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, H. Peter Anvin, konrad.wilk, Andrew Cooper, x86,
	kexec, linux-kernel, virtualization, mingo, Eric W. Biederman,
	jbeulich, maxim.uvarov, tglx, vgoyal

On Fri, Jan 11, 2013 at 03:22:35PM +0000, David Vrabel wrote:
> On 11/01/13 13:22, Daniel Kiper wrote:
> > On Thu, Jan 10, 2013 at 02:19:55PM +0000, David Vrabel wrote:
> >> On 04/01/13 17:01, Daniel Kiper wrote:
> >>> My .5 cents:
> >>>   - We should focus on KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload;
> >>>     probably we should introduce KEXEC_CMD_kexec_load2 and KEXEC_CMD_kexec_unload2;
> >>>     load should __LOAD__ kernel image and other things into hypervisor memory;
> >>
> >> Yes, but I don't see how we can easily support both ABIs.  I'd be
> >> in favour of replacing the existing hypercalls and requiring updated
> >> kexec tools in dom0 (this isn't that different to requiring the correct
> >> libxc in dom0).
> >
> > Why? Just define new structures for the new functions of the kexec hypercall.
> > That should suffice.
>
> The current hypervisor ABI depends on an internal kernel ABI (i.e., the
> ABI provided by relocate_kernel).  We do not want hypervisor internals
> to be constrained by having to be compatible with kernel internals.

I agree. I did not suggest staying with the current interface. The old KEXEC_CMD_kexec_load
and KEXEC_CMD_kexec_unload should stay as they are for backward compatibility (maybe
someday they could be removed). However, I do not see any problem in adding
new KEXEC_CMD_kexec_load2 and KEXEC_CMD_kexec_unload2 functions, with completely
new arguments, to the existing kexec hypercall. Something like this:

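struct kexec_segment {
  void *buf;           /* source buffer */
  size_t bufsz;        /* number of bytes in buf */
  unsigned long mem;   /* destination physical address */
  size_t memsz;        /* size of the destination region */
};

struct xen_kexec_load2 {
  unsigned long entry;              /* entry point of the new image */
  unsigned long nr_segments;        /* number of elements in segments */
  struct kexec_segment *segments;   /* segments to load */
  unsigned long flags;              /* e.g. kexec on panic vs. normal kexec */
};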

struct xen_kexec_load2 xkl2;

...

rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_load2, &xkl2);
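A handler for a call like KEXEC_CMD_kexec_load2 would presumably have to sanity-check the segment list before copying anything into hypervisor memory. The sketch below is similar in spirit to the checks Linux applies to its own struct kexec_segment array in kexec_load(): page-aligned destination ranges, bufsz no larger than memsz, no overlapping destinations and a bounded segment count. The function name, the 4 KiB page size and the 16-segment limit are assumptions for illustration, not part of any agreed Xen ABI.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE   4096UL  /* assumed page size */
#define SKETCH_SEGMENT_MAX 16UL    /* assumed limit on nr_segments */

struct kexec_segment {
  void *buf;            /* source buffer in the caller's memory */
  size_t bufsz;         /* number of bytes in buf */
  unsigned long mem;    /* destination physical address */
  size_t memsz;         /* size of the destination range */
};

/* Hypothetical validation a load2 handler might run; returns 0 or -errno. */
static int check_segments(const struct kexec_segment *segs, unsigned long nr)
{
  unsigned long i, j;

  if (nr > SKETCH_SEGMENT_MAX)
    return -EINVAL;

  for (i = 0; i < nr; i++) {
    unsigned long start = segs[i].mem;
    unsigned long end = start + segs[i].memsz;

    /* Reject ranges that wrap around the address space. */
    if (end < start)
      return -EADDRNOTAVAIL;
    /* Destination range must be page aligned at both ends. */
    if ((start | end) & (SKETCH_PAGE_SIZE - 1))
      return -EADDRNOTAVAIL;
    /* More source data than destination room makes no sense. */
    if (segs[i].bufsz > segs[i].memsz)
      return -EINVAL;

    /* Destination ranges [start, end) must not overlap earlier ones. */
    for (j = 0; j < i; j++) {
      unsigned long ps = segs[j].mem;
      unsigned long pe = ps + segs[j].memsz;

      if (end > ps && pe > start)
        return -EINVAL;
    }
  }
  return 0;
}
```

Rejecting bad input up front keeps the later copy and relocation steps free of error handling, which matters because they run when the system is already going down.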

Regarding relocate_kernel(), it should be Xen hypervisor specific, but
most of its code will probably be similar to the Linux kernel version.
The only requirement is that, at the end, it leaves the machine in the same
state as the Linux kernel version of relocate_kernel() does, so that it stays
compatible with existing kexec/kdump implementations.

> >>>   - Hmmm... Now I think that we should still use the kexec syscall to load the image
> >>>     into Xen memory (with the new KEXEC_CMD_kexec_load2) because it sets up
> >>>     everything that is needed to invoke kdump if dom0 crashes; however,
> >>>     I could be wrong...
> >>
> >> I don't think we need the kexec syscall.  The kernel can unconditionally
> >> do the crash hypercall, which will return if the kdump kernel isn't
> >> loaded and the kernel can fall back to the regular non-kexec panic.
> >
> > No, please do not do that. When you call HYPERVISOR_kexec_op(KEXEC_CMD_kexec)
> > the system is completely shut down. Returning from HYPERVISOR_kexec_op(KEXEC_CMD_kexec)
> > would require restoring some kernel functionality. That may be impossible
> > in some cases. Additionally, it means that some changes would have to be made
> > in the generic kexec code path. As far as I know, kexec maintainers are very
> > reluctant to make such changes.
>
> Huh?  There only needs to be a call to a new hypervisor_crash_kexec()
> function (which would then call the Xen specific crash hypercall) at the
> very beginning of crash_kexec().  If this returns, the normal
> crash/shutdown path is taken (which could even include a guest kexec!).

I am still not convinced. However, go ahead with your vision in this case.
Later we will see whether it makes sense.

Daniel

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-11 15:22                       ` David Vrabel
  (?)
  (?)
@ 2013-01-11 20:05                         ` Eric W. Biederman
  -1 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2013-01-11 20:05 UTC (permalink / raw)
  To: David Vrabel
  Cc: Daniel Kiper, xen-devel, H. Peter Anvin, konrad.wilk,
	Andrew Cooper, x86, kexec, linux-kernel, virtualization, mingo,
	jbeulich, maxim.uvarov, tglx, vgoyal

David Vrabel <david.vrabel@citrix.com> writes:

> On 11/01/13 13:22, Daniel Kiper wrote:
>> On Thu, Jan 10, 2013 at 02:19:55PM +0000, David Vrabel wrote:
>>> On 04/01/13 17:01, Daniel Kiper wrote:
>>>> My .5 cents:
>>>>   - We should focus on KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload;
>>>>     probably we should introduce KEXEC_CMD_kexec_load2 and KEXEC_CMD_kexec_unload2;
>>>>     load should __LOAD__ kernel image and other things into hypervisor memory;
>>>
>>> Yes, but I don't see how we can easily support both ABIs.  I'd be
>>> in favour of replacing the existing hypercalls and requiring updated
>>> kexec tools in dom0 (this isn't that different to requiring the correct
>>> libxc in dom0).
>> 
>> Why? Just define new structures for the new functions of the kexec hypercall.
>> That should suffice.
>
> The current hypervisor ABI depends on an internal kernel ABI (i.e., the
> ABI provided by relocate_kernel).  We do not want hypervisor internals
> to be constrained by having to be compatible with kernel internals.

I think this is violent agreement.  A new call with new arguments seems
agreed upon.  The only open question is what happens to the old
hypercall: keep the current, deprecated hypercall with the current
ABI and stop updating it, or modify it to return the Xen equivalent
of -ENOSYS.

Certainly /sbin/kexec will only support the new hypercall once the
support has merged.
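The two options for the old hypercall can be sketched as a command dispatcher. This is a hypothetical model, not Xen's kexec_op implementation: the command names are simplified stand-ins for KEXEC_CMD_kexec_load, KEXEC_CMD_kexec_unload and the proposed *_load2/*_unload2 commands, their numeric values are placeholders, and the handlers are empty stubs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder command numbers for illustration only (not Xen's real values). */
enum sketch_kexec_cmd {
  CMD_LOAD,     /* old ABI, tied to Linux's relocate_kernel layout */
  CMD_UNLOAD,
  CMD_LOAD2,    /* proposed new ABI with its own argument structures */
  CMD_UNLOAD2,
};

/* Hypothetical handlers; real ones would copy in and validate arguments. */
static int do_legacy(int cmd, void *arg) { (void)cmd; (void)arg; return 0; }
static int do_load2(void *arg) { (void)arg; return 0; }
static int do_unload2(void *arg) { (void)arg; return 0; }

/*
 * legacy_retired selects between the two options under discussion:
 *   false: keep the old commands, frozen at the current ABI;
 *   true:  make the old commands fail with -ENOSYS.
 */
static int kexec_op_sketch(int cmd, void *arg, bool legacy_retired)
{
  switch (cmd) {
  case CMD_LOAD:
  case CMD_UNLOAD:
    return legacy_retired ? -ENOSYS : do_legacy(cmd, arg);
  case CMD_LOAD2:
    return do_load2(arg);
  case CMD_UNLOAD2:
    return do_unload2(arg);
  default:
    return -ENOSYS;  /* unknown command */
  }
}
```

Either way, an updated /sbin/kexec would only issue the new commands; the choice only decides what an old binary sees.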

>> No, please do not do that. When you call HYPERVISOR_kexec_op(KEXEC_CMD_kexec)
>> the system is completely shut down. Returning from HYPERVISOR_kexec_op(KEXEC_CMD_kexec)
>> would require restoring some kernel functionality. That may be impossible
>> in some cases. Additionally, it means that some changes would have to be made
>> in the generic kexec code path. As far as I know, kexec maintainers are very
>> reluctant to make such changes.
>
> Huh?  There only needs to be a call to a new hypervisor_crash_kexec()
> function (which would then call the Xen specific crash hypercall) at the
> very beginning of crash_kexec().  If this returns, the normal
> crash/shutdown path is taken (which could even include a guest kexec!).

Can you imagine what crash_kexec would look like if every architecture
hard coded its own little piece in there?

The practical issue with changing crash_kexec is that you are hard
coding Xen policy just before a jump to a piece of code whose purpose
is to implement policy.

From a maintenance and code comprehension standpoint it is much cleaner
to put the hypervisor_crash_kexec() hypercall into the code that is
loaded with sys_kexec_load and is branched to by crash_kexec.  I would
have no problem with hard coding that behavior into /sbin/kexec in
the case of Xen dom0.

Having code with different semantics when running under Xen is a
maintenance nightmare, and it is why we are having this conversation years and
years after the initial deployment of Xen.  A tiny hard coded stub that
calls a hypercall should work indefinitely with no one having to do
anything.
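To make the contrast concrete, here is a user-space toy model of the arrangement described above: crash_kexec() only jumps to whatever image was loaded, and in the Xen dom0 case /sbin/kexec loads a tiny stub whose only job is to issue the crash hypercall. Every name here is hypothetical, and plain function pointers stand in for the real purgatory code, sys_kexec_load and the hypercall; the point is only where the Xen policy lives.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model: a "loaded image" is just an entry point. */
typedef void (*image_entry_t)(void);

static image_entry_t kexec_crash_image;   /* set by the load model below */
static bool xen_crash_hypercall_issued;   /* records what the stub did */

/* Generic crash_kexec(): contains no hypervisor knowledge at all. */
static bool crash_kexec_model(void)
{
  if (kexec_crash_image == NULL)
    return false;          /* nothing loaded: the regular panic path continues */
  kexec_crash_image();     /* in a real kernel this jump does not return */
  return true;
}

/* Stand-in for the Xen crash hypercall (hypothetical). */
static void hypervisor_crash_kexec_model(void)
{
  xen_crash_hypercall_issued = true;
}

/* The tiny stub /sbin/kexec would load in the Xen dom0 case. */
static void xen_dom0_crash_stub(void)
{
  hypervisor_crash_kexec_model();
}

/* Model of loading a crash image via sys_kexec_load(): record the entry. */
static void load_crash_image_model(image_entry_t entry)
{
  kexec_crash_image = entry;
}
```

Nothing in crash_kexec_model() changes when the stub is swapped for an ordinary kdump image, which is the maintenance property argued for above.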

Eric


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-11 16:55                           ` Konrad Rzeszutek Wilk
  (?)
@ 2013-01-11 20:26                             ` H. Peter Anvin
  -1 siblings, 0 replies; 217+ messages in thread
From: H. Peter Anvin @ 2013-01-11 20:26 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Eric W. Biederman, Daniel Kiper, xen-devel, Andrew Cooper, x86,
	kexec, linux-kernel, virtualization, mingo, Jan Beulich,
	maxim.uvarov, tglx, vgoyal, David Woodhouse

>
> And there is nothing fancy to be done for EFI and SecureBoot? Or is
> that something that the kernel has to handle on its own (so somehow
> passing some certificates to somewhere).
>

For EFI, no... other than passing the EFI parameters, which apparently 
is *not* currently done (David Woodhouse is working on it.)  Secure boot 
is still a work in progress.


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-11 16:55                           ` Konrad Rzeszutek Wilk
  (?)
@ 2013-01-11 20:26                             ` Eric W. Biederman
  -1 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2013-01-11 20:26 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Daniel Kiper, xen-devel, H. Peter Anvin, Andrew Cooper, x86,
	kexec, linux-kernel, virtualization, mingo, Jan Beulich,
	maxim.uvarov, tglx, vgoyal

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Thu, Jan 10, 2013 at 08:16:48PM -0800, Eric W. Biederman wrote:

>> The basic kexec interface is:
>>
>> load ranges of virtual addresses into physical addresses.
>> jump to the physical address with identity mapped page tables.
>> 
>> There are a few flags to allow for different usage scenarios like
>> kexec on panic vs normal kexec.
>
> And there is nothing fancy to be done for EFI and SecureBoot?

There is a mess with EFI.  Reports are that EFI is a bug-ridden pile,
and people keep advocating that we make more and more EFI calls in the
main kernel.  There is an argument over SetVirtualAddressMap(), a
call that can be made only once and that relocates the EFI code to a
different address, which makes life inconvenient for kexec.  There is
another argument that EFI doesn't actually work if you don't make the
SetVirtualAddressMap() call, so we can't remove it and always use physical
addresses.

Frankly, the only sane way to run a Linux kernel under EFI is to scrape
up the information needed to talk to the hardware directly and ignore
EFI.  That is what we have historically done in the face of BIOS madness
and if anything the situation is worse with EFI, but it looks like we
are going to have to learn that the hard way.

Recently there has been a desire to figure out how /sbin/kexec can support
signed kernel images.  What will probably happen is that a specially
trusted userspace application performs the verification, sort of like
dom0 for the Linux userspace.  A few other ideas have been batted around,
but none that have stuck.

None of that is really about SecureBoot.  It is all about trusting the kernel
binary but not trusting userspace, with SecureBoot being an excuse for
coming up with a policy like that.

It looks like the answer to SecureBoot at this point may simply be to
reconfigure your BIOS, or root Windows and EFI, to get the hardware to do
what you want.

So, looking forward, the answer for Xen dom0 is: a trusted /sbin/kexec
won't require changes.  The other suggested solution is a flag that says a
specific chunk of the loaded image is a signature that the magic trust
fairies can verify.  As long as you have a flag bit free, you should be
able to implement that policy if we ever implement it.
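The point about keeping a flag bit free can be illustrated with a short sketch. It assumes, purely for illustration, that the flags word of a load2-style call reserves one hypothetical bit meaning "the last segment is a detached signature"; neither the bit values nor that convention come from Xen or Linux. The useful design property is that unrecognized flag bits are rejected rather than ignored, so a later signature policy can claim a free bit without older implementations silently loading an image they never verified.

```c
#include <errno.h>

/* Hypothetical flag bits for a load2-style "flags" field. */
#define SKETCH_LOAD_ON_CRASH      (1UL << 0)  /* load as the crash (kdump) image */
#define SKETCH_LOAD_HAS_SIGNATURE (1UL << 1)  /* last segment is a signature */
#define SKETCH_LOAD_KNOWN_FLAGS   (SKETCH_LOAD_ON_CRASH | SKETCH_LOAD_HAS_SIGNATURE)

/*
 * On success returns 0 and stores the index of the signature segment in
 * *sig_index (-1 if there is none). Returns -EINVAL for unknown flag bits
 * or for a signature flag without at least one image segment before it.
 */
static int find_signature_segment(unsigned long flags,
                                  unsigned long nr_segments, long *sig_index)
{
  if (flags & ~SKETCH_LOAD_KNOWN_FLAGS)
    return -EINVAL;               /* unknown bit: refuse rather than ignore */

  if (!(flags & SKETCH_LOAD_HAS_SIGNATURE)) {
    *sig_index = -1;              /* plain, unsigned load */
    return 0;
  }

  if (nr_segments < 2)
    return -EINVAL;               /* need an image segment plus the signature */

  *sig_index = (long)(nr_segments - 1);
  return 0;
}
```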

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
@ 2013-01-11 20:26                             ` Eric W. Biederman
  0 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2013-01-11 20:26 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Andrew Cooper, Daniel Kiper, x86, kexec, linux-kernel,
	virtualization, mingo, Jan Beulich, H. Peter Anvin, maxim.uvarov,
	tglx, vgoyal

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Thu, Jan 10, 2013 at 08:16:48PM -0800, Eric W. Biederman wrote:

>> The basic kexec interface is.
>> 
>> load ranges of virtual addresses physical addresses.
>> jump to the physical address  with identity mapped page tables.
>> 
>> There are a few flags to allow for different usage scenarios like
>> kexec on panic vs normal kexec.
>
> And there is nothing fancy to be done for EFI and SecureBoot?

There is a mess with EFI.  Reports are that EFI is a bug ridden pile,
and people keep advocating that we make more and more EFI calls in the
main kernel.  There is an argument over set_virtual_mapping, which is a
call that can be made only once which relocates the EFI code to a
different address, which makes life inconvient for kexec.  There is
another argument that EFI doesn't actually work if you don't make the
set_virtual_mapping call so we can't remove it and always use physical
addresses.

Frankly the only sane way to run a linux kernel under EFI is to scrape
up the information needed to talk to the hardware directly and ignore
EFI.  That is what we have historically done in the face of BIOS madness
and if anything the situation is worse with EFI, but it looks like we
are going to have to learn that the hard way.

Recently there is a desire to figure out how to /sbin/kexec support
signed kernel images.  What will probably happen is to have a specially
trusted userspace application perform the verification.  Sort of like
dom0 for the linux userspace.  A few other ideas have been batted around
but none that have stuck.

None of that is really about SecureBoot.  It is all trusting the kernel
binary but not trusting userspace.  With SecureBoot being an excuse for
coming up with a policy like that.

It looks like the answer to SecureBoot at this point may simply be just
reconfigure your BIOS or root Windows and EFI to get the hardware to do
what you want.

So the answer for looking forward for Xen dom0 is: A trusted /sbin/kexec
won't require changes.  The other suggest solution is a flag that says a
specific chunk of the loaded image is a signature that the magic trust
faires can verify.  As long as you have a flag bit free you should be
able to implement that policy if we ever implement it.

Eric

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-11 20:26                             ` H. Peter Anvin
  (?)
@ 2013-01-11 20:43                               ` Vivek Goyal
  -1 siblings, 0 replies; 217+ messages in thread
From: Vivek Goyal @ 2013-01-11 20:43 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Konrad Rzeszutek Wilk, Eric W. Biederman, Daniel Kiper,
	xen-devel, Andrew Cooper, x86, kexec, linux-kernel,
	virtualization, mingo, Jan Beulich, maxim.uvarov, tglx,
	David Woodhouse

On Fri, Jan 11, 2013 at 12:26:48PM -0800, H. Peter Anvin wrote:
> >
> >And there is nothing fancy to be done for EFI and SecureBoot? Or is
> >that something that the kernel has to handle on its own (so somehow
> >passing some certificates to somewhere).
> >
> 
> For EFI, no... other than passing the EFI parameters, which
> apparently is *not* currently done (David Woodhouse is working on
> it.)  Secure boot is still a work in progress.

For secureboot, as a first step in that direction, I just wrote some code
to sign ELF executables and verify them in the kernel upon exec().  I
plan to post the RFC code soon (most likely next week).

Hopefully we will be able to sign a statically linked /sbin/kexec and give
it an extra capability (upon signature verification) to call
sys_exec().

Thanks
Vivek

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-11 20:26                             ` Eric W. Biederman
  (?)
@ 2013-01-11 20:52                               ` Vivek Goyal
  -1 siblings, 0 replies; 217+ messages in thread
From: Vivek Goyal @ 2013-01-11 20:52 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Konrad Rzeszutek Wilk, Daniel Kiper, xen-devel, H. Peter Anvin,
	Andrew Cooper, x86, kexec, linux-kernel, virtualization, mingo,
	Jan Beulich, maxim.uvarov, tglx, David Howells

On Fri, Jan 11, 2013 at 12:26:56PM -0800, Eric W. Biederman wrote:

[..]
> Recently there is a desire to figure out how to /sbin/kexec support
> signed kernel images.  What will probably happen is to have a specially
> trusted userspace application perform the verification.  Sort of like
> dom0 for the linux userspace.  A few other ideas have been batted around
> but none that have stuck.

[ CC David Howells ]

Eric,

In a private conversation, David Howells suggested passing the kernel
signature to the kernel in a segment and letting the kernel do the
verification.

The signature of /sbin/kexec itself is verified by the kernel at exec()
time.  /sbin/kexec then passes one signature segment (after the regular
segment) for each segment being loaded.  Segments which don't have a
signature are followed by a zero-size signature segment.  The
signature-passing behavior can be controlled by one new kexec flag.
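A rough C sketch of the segment layout proposed here, assuming signature segments are interleaved as [image segment, signature segment] pairs. struct kexec_segment mirrors the layout of the userspace struct in <linux/kexec.h>; struct signed_chunk, build_signed_segments(), count_signed(), and the choice to leave mem/memsz at zero for signature segments are all hypothetical. As far as I know, the existing segment sanity checks reject bufsz > memsz, so a kernel would only accept such segments behind the proposed new flag.

```c
#include <stddef.h>

/* Same field layout as struct kexec_segment in <linux/kexec.h>. */
struct kexec_segment {
	const void *buf;	/* userspace buffer */
	size_t bufsz;
	const void *mem;	/* physical destination */
	size_t memsz;
};

/* Hypothetical input: one image segment plus an optional detached signature. */
struct signed_chunk {
	struct kexec_segment seg;
	const void *sig;	/* NULL means "unsigned" */
	size_t sigsz;
};

/*
 * Build [seg0, sig0, seg1, sig1, ...]; out must hold 2 * n entries.
 * Returns the number of entries written.
 */
static size_t build_signed_segments(const struct signed_chunk *in, size_t n,
				    struct kexec_segment *out)
{
	size_t i, k = 0;

	for (i = 0; i < n; i++) {
		out[k++] = in[i].seg;
		/* Signature segments carry data only and are never copied into
		 * the new image, so mem/memsz stay zero (assumption). */
		out[k].buf = in[i].sig;
		out[k].bufsz = in[i].sig ? in[i].sigsz : 0;
		out[k].mem = NULL;
		out[k].memsz = 0;
		k++;
	}
	return k;
}

/*
 * Kernel-side sketch: count image segments that arrived with a non-empty
 * signature.  Returns -1 if the list is not made of (segment, signature) pairs.
 */
static long count_signed(const struct kexec_segment *segs, size_t nr)
{
	size_t i;
	long signed_count = 0;

	if (nr % 2 != 0)
		return -1;
	for (i = 0; i < nr; i += 2)
		if (segs[i + 1].bufsz != 0)
			signed_count++;
	return signed_count;
}
```

A real implementation would hand each non-empty signature buffer, together with its preceding segment, to the kernel's signature-verification code instead of merely counting.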

That way /sbin/kexec does not have to worry about doing any verification
by itself.  In fact, I am not sure how it could do the verification when
the crypto libraries it needs are not signed (assuming they are not
statically linked in).

What do you think about this idea?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
@ 2013-01-11 21:03                                 ` H. Peter Anvin
  0 siblings, 0 replies; 217+ messages in thread
From: H. Peter Anvin @ 2013-01-11 21:03 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Eric W. Biederman, Konrad Rzeszutek Wilk, Daniel Kiper,
	xen-devel, Andrew Cooper, x86, kexec, linux-kernel,
	virtualization, mingo, Jan Beulich, maxim.uvarov, tglx,
	David Howells

On 01/11/2013 12:52 PM, Vivek Goyal wrote:
> 
> Eric,
> 
> In a private conversation, David Howells suggested why not pass kernel
> signature in a segment to kernel and kernel can do the verification.
> 
> /sbin/kexec signature is verified by kernel at exec() time. Then
> /sbin/kexec just passes one signature segment (after regular segment) for
> each segment being loaded. The segments which don't have signature,
> are passed with section size 0. And signature passing behavior can be
> controlled by one new kexec flag.
> 
> That way /sbin/kexec does not have to worry about doing any verification
> by itself. In fact, I am not sure how it can do the verification when
> crypto libraries it will need are not signed (assuming they are not
> statically linked in).
> 
> What do you think about this idea?
> 

A signed /sbin/kexec would realistically have to be statically linked,
at least in the short term; otherwise the libraries and ld.so would need
verification as well.

Now, that *might* very well have some real value -- there are certainly
users out there who would very much want only binaries signed with
specific keys to get run on their system.

	-hpa



^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
@ 2013-01-11 21:08                                   ` Vivek Goyal
  0 siblings, 0 replies; 217+ messages in thread
From: Vivek Goyal @ 2013-01-11 21:08 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Eric W. Biederman, Konrad Rzeszutek Wilk, Daniel Kiper,
	xen-devel, Andrew Cooper, x86, kexec, linux-kernel,
	virtualization, mingo, Jan Beulich, maxim.uvarov, tglx,
	David Howells

On Fri, Jan 11, 2013 at 01:03:41PM -0800, H. Peter Anvin wrote:
> On 01/11/2013 12:52 PM, Vivek Goyal wrote:
> > 
> > Eric,
> > 
> > In a private conversation, David Howells suggested why not pass kernel
> > signature in a segment to kernel and kernel can do the verification.
> > 
> > /sbin/kexec signature is verified by kernel at exec() time. Then
> > /sbin/kexec just passes one signature segment (after regular segment) for
> > each segment being loaded. The segments which don't have signature,
> > are passed with section size 0. And signature passing behavior can be
> > controlled by one new kexec flag.
> > 
> > That way /sbin/kexec does not have to worry about doing any verification
> > by itself. In fact, I am not sure how it can do the verification when
> > crypto libraries it will need are not signed (assuming they are not
> > statically linked in).
> > 
> > What do you think about this idea?
> > 
> 
> A signed /sbin/kexec would realistically have to be statically linked,
> at least in the short term; otherwise the libraries and ld.so would need
> verification as well.

Yes, that's the expectation: sign only statically linked executables which
don't do any dlopen() stuff either.

In fact, in the patch I fail the exec() if a signed executable has an
interpreter.
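The interpreter check can be sketched in userspace C with <elf.h>: walk the program headers and report whether any has type PT_INTERP, which is what marks a dynamically linked executable that needs ld.so. This is not the patch referred to above (it is not shown in this thread); elf64_has_interp() is a hypothetical helper, and it assumes a native-endian ELF64 image already read into memory. In the kernel, the analogous program-header scan happens in the ELF loader (fs/binfmt_elf.c).

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the ELF64 image buf[0..len) has a PT_INTERP program header,
 * 0 if it has none, or -1 if the image is truncated or not ELF64.
 */
static int elf64_has_interp(const unsigned char *buf, size_t len)
{
	Elf64_Ehdr eh;
	size_t i;

	if (len < sizeof(eh))
		return -1;
	memcpy(&eh, buf, sizeof(eh));	/* copy to avoid unaligned access */
	if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh.e_ident[EI_CLASS] != ELFCLASS64)
		return -1;
	if (eh.e_phnum != 0 && eh.e_phentsize != sizeof(Elf64_Phdr))
		return -1;
	if (eh.e_phoff > len)
		return -1;

	for (i = 0; i < eh.e_phnum; i++) {
		Elf64_Phdr ph;
		size_t off = eh.e_phoff + i * sizeof(ph);

		/* Each program header must lie entirely inside the buffer. */
		if (off > len || len - off < sizeof(ph))
			return -1;
		memcpy(&ph, buf + off, sizeof(ph));
		if (ph.p_type == PT_INTERP)
			return 1;	/* needs a dynamic loader: refuse to trust */
	}
	return 0;
}
```

Note that this check alone does not detect dlopen() use, which is listed above as a separate expectation for signed binaries.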

Thanks
Vivek

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
  2013-01-11 21:08                                   ` Vivek Goyal
  (?)
@ 2013-01-11 21:14                                     ` H. Peter Anvin
  -1 siblings, 0 replies; 217+ messages in thread
From: H. Peter Anvin @ 2013-01-11 21:14 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Eric W. Biederman, Konrad Rzeszutek Wilk, Daniel Kiper,
	xen-devel, Andrew Cooper, x86, kexec, linux-kernel,
	virtualization, mingo, Jan Beulich, maxim.uvarov, tglx,
	David Howells

On 01/11/2013 01:08 PM, Vivek Goyal wrote:
>>
>> A signed /sbin/kexec would realistically have to be statically linked,
>> at least in the short term; otherwise the libraries and ld.so would need
>> verification as well.
>
> Yes. That's the expectation. Sign only statically linked executables which
> don't do any of dlopen() stuff either.
>
> In fact in the patch, I fail the exec() if signed executable has
> interpreter.
>

As I said, though (and possibly not for kexec, that depends): in the
long term we probably want a way to sign all kinds of binaries
in the system.

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 217+ messages in thread

end of thread

Thread overview: 217+ messages
2012-12-27  2:18 [PATCH v3 00/11] xen: Initial kexec/kdump implementation Daniel Kiper
2012-12-27  2:18 ` Daniel Kiper
2012-12-27  2:18 ` [PATCH v3 01/11] kexec: introduce kexec firmware support Daniel Kiper
2012-12-27  2:18 ` Daniel Kiper
2012-12-27  2:18   ` Daniel Kiper
2012-12-27  2:18   ` [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE Daniel Kiper
2012-12-27  2:18     ` Daniel Kiper
2012-12-27  2:18     ` [PATCH v3 03/11] xen: Introduce architecture independent data for kexec/kdump Daniel Kiper
2012-12-27  2:18       ` Daniel Kiper
2012-12-27  2:18       ` [PATCH v3 04/11] x86/xen: Introduce architecture dependent " Daniel Kiper
2012-12-27  2:18       ` Daniel Kiper
2012-12-27  2:18         ` Daniel Kiper
2012-12-27  2:18         ` [PATCH v3 05/11] x86/xen: Register resources required by kexec-tools Daniel Kiper
2012-12-27  2:18           ` Daniel Kiper
2012-12-27  2:18           ` [PATCH v3 06/11] x86/xen: Add i386 kexec/kdump implementation Daniel Kiper
2012-12-27  2:18             ` Daniel Kiper
2012-12-27  2:18             ` [PATCH v3 07/11] x86/xen: Add x86_64 " Daniel Kiper
2012-12-27  2:18             ` Daniel Kiper
2012-12-27  2:18               ` Daniel Kiper
2012-12-27  2:18               ` [PATCH v3 08/11] x86/xen: Add kexec/kdump Kconfig and makefile rules Daniel Kiper
2012-12-27  2:18               ` Daniel Kiper
2012-12-27  2:18                 ` Daniel Kiper
2012-12-27  2:18                 ` [PATCH v3 09/11] x86/xen/enlighten: Add init and crash kexec/kdump hooks Daniel Kiper
2012-12-27  2:18                 ` Daniel Kiper
2012-12-27  2:18                   ` Daniel Kiper
2012-12-27  2:18                   ` [PATCH v3 10/11] drivers/xen: Export vmcoreinfo through sysfs Daniel Kiper
2012-12-27  2:18                   ` Daniel Kiper
2012-12-27  2:18                     ` Daniel Kiper
2012-12-27  2:19                     ` [PATCH v3 11/11] x86: Add Xen kexec control code size check to linker script Daniel Kiper
2012-12-27  2:19                       ` Daniel Kiper
2012-12-27  2:19                     ` Daniel Kiper
2012-12-27 18:53                   ` [PATCH v3 09/11] x86/xen/enlighten: Add init and crash kexec/kdump hooks H. Peter Anvin
2012-12-27 18:53                     ` H. Peter Anvin
2012-12-27  4:00             ` [PATCH v3 06/11] x86/xen: Add i386 kexec/kdump implementation H. Peter Anvin
2012-12-27  4:00               ` H. Peter Anvin
2012-12-27  2:18           ` Daniel Kiper
2012-12-27  2:18         ` [PATCH v3 05/11] x86/xen: Register resources required by kexec-tools Daniel Kiper
2012-12-27  2:18     ` [PATCH v3 03/11] xen: Introduce architecture independent data for kexec/kdump Daniel Kiper
2012-12-27  3:33     ` [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE H. Peter Anvin
2012-12-27  3:33       ` H. Peter Anvin
2013-01-03  9:34     ` Jan Beulich
2013-01-03  9:34       ` Jan Beulich
2013-01-04 15:15       ` Daniel Kiper
2013-01-04 15:15         ` Daniel Kiper
2013-01-04 16:12         ` Jan Beulich
2013-01-04 16:12           ` Jan Beulich
2013-01-04 17:25           ` Daniel Kiper
2013-01-04 17:25             ` Daniel Kiper
2013-01-07  9:48             ` Jan Beulich
2013-01-07  9:48               ` Jan Beulich
2013-01-07 12:52               ` Daniel Kiper
2013-01-07 12:52                 ` Daniel Kiper
2013-01-07 13:05                 ` Jan Beulich
2013-01-07 13:05                   ` Jan Beulich
2013-01-09 18:42                   ` Daniel Kiper
2013-01-09 18:42                     ` Daniel Kiper
2013-01-07 13:05                 ` Jan Beulich
2013-01-07  9:48             ` Jan Beulich
2013-01-04 16:12         ` Jan Beulich
2013-01-10 14:07         ` David Vrabel
2013-01-10 14:07         ` [Xen-devel] " David Vrabel
2013-01-10 14:07           ` David Vrabel
2013-01-11 13:36           ` Daniel Kiper
2013-01-11 13:36           ` [Xen-devel] " Daniel Kiper
2013-01-11 13:36             ` Daniel Kiper
2013-01-04 15:15       ` Daniel Kiper
2012-12-27  2:18   ` Daniel Kiper
2012-12-27  4:46   ` [PATCH v3 01/11] kexec: introduce kexec firmware support Eric W. Biederman
2012-12-27  4:46     ` Eric W. Biederman
2012-12-27  4:02 ` [PATCH v3 00/11] xen: Initial kexec/kdump implementation H. Peter Anvin
2012-12-27  4:02   ` H. Peter Anvin
2012-12-27  7:53   ` Eric W. Biederman
2012-12-27  7:53     ` Eric W. Biederman
2012-12-27 14:18     ` Andrew Cooper
2012-12-27 14:18       ` Andrew Cooper
2012-12-27 18:02       ` Eric W. Biederman
2012-12-27 18:02         ` Eric W. Biederman
2013-01-02 11:26         ` [Xen-devel] " Andrew Cooper
2013-01-02 11:26           ` Andrew Cooper
2013-01-02 11:47           ` Eric W. Biederman
2013-01-02 11:47             ` Eric W. Biederman
2013-01-03  9:31           ` Jan Beulich
2013-01-03  9:31             ` Jan Beulich
2013-01-04 14:22           ` Daniel Kiper
2013-01-04 14:22             ` Daniel Kiper
2013-01-04 14:34             ` Konrad Rzeszutek Wilk
2013-01-04 14:34               ` Konrad Rzeszutek Wilk
2013-01-04 14:34             ` Ian Campbell
2013-01-04 14:34               ` Ian Campbell
2013-01-04 14:38             ` David Vrabel
2013-01-04 14:38               ` David Vrabel
2013-01-04 17:01               ` Daniel Kiper
2013-01-04 17:01                 ` Daniel Kiper
2013-01-10 14:19                 ` David Vrabel
2013-01-10 14:19                   ` David Vrabel
2013-01-11 13:22                   ` Daniel Kiper
2013-01-11 13:22                     ` Daniel Kiper
2013-01-11 15:22                     ` David Vrabel
2013-01-11 15:22                       ` David Vrabel
2013-01-11 17:34                       ` Daniel Kiper
2013-01-11 17:34                         ` Daniel Kiper
2013-01-11 20:05                       ` Eric W. Biederman
2013-01-11 20:05                         ` Eric W. Biederman
2013-01-04 14:41             ` Jan Beulich
2013-01-04 14:41               ` Jan Beulich
2013-01-04 17:07               ` Daniel Kiper
2013-01-04 17:07                 ` Daniel Kiper
2013-01-04 19:11                 ` Konrad Rzeszutek Wilk
2013-01-04 19:11                   ` Konrad Rzeszutek Wilk
2013-01-07 10:25                   ` Ian Campbell
2013-01-07 10:25                     ` Ian Campbell
2013-01-07 10:46                     ` Andrew Cooper
2013-01-07 10:46                       ` Andrew Cooper
2013-01-07 10:54                       ` Ian Campbell
2013-01-07 10:54                         ` Ian Campbell
2013-01-07 12:34                   ` Daniel Kiper
2013-01-07 12:34                     ` Daniel Kiper
2013-01-07 13:49                     ` Ian Campbell
2013-01-07 13:49                       ` Ian Campbell
2013-01-11 13:47                       ` Daniel Kiper
2013-01-11 13:47                         ` Daniel Kiper
2013-01-07 16:20                     ` Konrad Rzeszutek Wilk
2013-01-07 16:20                       ` Konrad Rzeszutek Wilk
2013-01-11  4:16                       ` Eric W. Biederman
2013-01-11  4:16                         ` Eric W. Biederman
2013-01-11 16:55                         ` Konrad Rzeszutek Wilk
2013-01-11 16:55                           ` Konrad Rzeszutek Wilk
2013-01-11 20:26                           ` H. Peter Anvin
2013-01-11 20:26                             ` H. Peter Anvin
2013-01-11 20:43                             ` Vivek Goyal
2013-01-11 20:43                               ` Vivek Goyal
2013-01-11 20:26                           ` Eric W. Biederman
2013-01-11 20:26                             ` Eric W. Biederman
2013-01-11 20:52                             ` Vivek Goyal
2013-01-11 20:52                               ` Vivek Goyal
2013-01-11 21:03                               ` H. Peter Anvin
2013-01-11 21:03                                 ` H. Peter Anvin
2013-01-11 21:08                                 ` Vivek Goyal
2013-01-11 21:08                                   ` Vivek Goyal
2013-01-11 21:14                                   ` H. Peter Anvin
2013-01-11 21:14                                     ` H. Peter Anvin
2013-01-04 17:07               ` Daniel Kiper
2013-01-02 15:27       ` Ian Campbell
2013-01-02 15:27         ` Ian Campbell
2013-01-02 15:27       ` Ian Campbell