* [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels
@ 2013-02-21 17:48 David Vrabel
  2013-02-21 17:48 ` [PATCH 1/8] x86: give FIX_EFI_MPF its own fixmap entry David Vrabel
                   ` (23 more replies)
  0 siblings, 24 replies; 106+ messages in thread
From: David Vrabel @ 2013-02-21 17:48 UTC
  To: xen-devel; +Cc: Daniel Kiper, kexec, David Vrabel

This series improves the kexec hypercall by making Xen responsible for
loading and relocating the image.  This makes kexec usable by pv-ops
kernels and should make it usable from an HVM or PVH privileged
domain.

The first patch is a simple clean-up.

The second patch allows hypercall structures to be ABI compatible
between 32- and 64-bit guests (by reusing the mechanism already present
for domctls and sysctls).  This seems better than having to keep adding
compat handling for new hypercalls.

Patch 3 introduces the new ABI.

Patches 4 and 5 almost completely reimplement the kexec load, unload
and exec sub-ops.  The old load_v1 sub-op is then implemented on top of
the new code.

Patch 6 calls the kexec image when dom0 crashes.  This avoids having
to alter dom0 kernels to make an exec sub-op call on crash: the
existing SHUTDOWN_crash is sufficient to trigger it.

Patches 7 and 8 add the libxc API for the kexec calls.

The required patch series for kexec-tools will be posted shortly.

David



* [PATCH 1/8] x86: give FIX_EFI_MPF its own fixmap entry
  2013-02-21 17:48 [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels David Vrabel
@ 2013-02-21 17:48 ` David Vrabel
  2013-02-21 17:48 ` David Vrabel
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 106+ messages in thread
From: David Vrabel @ 2013-02-21 17:48 UTC
  To: xen-devel; +Cc: Daniel Kiper, kexec, David Vrabel

From: David Vrabel <david.vrabel@citrix.com>

FIX_EFI_MPF was defined as an alias of FIX_KEXEC_BASE_0, which is going
away.  Give FIX_EFI_MPF its own fixmap entry instead.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/arch/x86/mpparse.c       |    2 --
 xen/include/asm-x86/fixmap.h |    1 +
 2 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
index 97ab5d3..f13ba93 100644
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -538,8 +538,6 @@ static inline void __init construct_default_ISA_mptable(int mpc_default_type)
 	}
 }
 
-#define FIX_EFI_MPF FIX_KEXEC_BASE_0
-
 static __init void efi_unmap_mpf(void)
 {
 	if (efi_enabled)
diff --git a/xen/include/asm-x86/fixmap.h b/xen/include/asm-x86/fixmap.h
index d026d78..2eefcf4 100644
--- a/xen/include/asm-x86/fixmap.h
+++ b/xen/include/asm-x86/fixmap.h
@@ -71,6 +71,7 @@ enum fixed_addresses {
     FIX_APEI_RANGE_BASE,
     FIX_APEI_RANGE_END = FIX_APEI_RANGE_BASE + FIX_APEI_RANGE_MAX -1,
     FIX_IGD_MMIO,
+    FIX_EFI_MPF,
     __end_of_fixed_addresses
 };
 
-- 
1.7.2.5


* [PATCH 2/8] xen: make GUEST_HANDLE_64() and uint64_aligned_t available everywhere
  2013-02-21 17:48 [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels David Vrabel
  2013-02-21 17:48 ` [PATCH 1/8] x86: give FIX_EFI_MPF its own fixmap entry David Vrabel
  2013-02-21 17:48 ` David Vrabel
@ 2013-02-21 17:48 ` David Vrabel
  2013-02-21 17:48 ` David Vrabel
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 106+ messages in thread
From: David Vrabel @ 2013-02-21 17:48 UTC
  To: xen-devel; +Cc: Daniel Kiper, kexec, David Vrabel

From: David Vrabel <david.vrabel@citrix.com>

GUEST_HANDLE_64() and uint64_aligned_t allow hypercall ABI structures
to be identical (binary compatible) for 32- and 64-bit guests.  They
are currently only available for use in sysctls and domctls.  Relax
this limit so they may be used by any new structures.

There is a minimal cost for 32-bit guests on 64-bit hypervisors, as
set_guest_handle() needs to zero the whole field on GUEST_HANDLE_64()
handles, but this is expected to be less than the overhead of having
to translate compat structures.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
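Note (illustrative, not part of the patch): a minimal sketch of why an
8-byte, 8-byte-aligned handle keeps a structure's layout the same for
32- and 64-bit callers, and why setting such a handle has to zero the
whole field first.  The demo_* names are hypothetical stand-ins written
for this note, not the Xen definitions, and the attribute syntax and
anonymous union assume GCC or Clang in C11 mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for uint64_aligned_t: always 8-byte aligned. */
typedef uint64_t __attribute__((aligned(8))) demo_u64_aligned_t;

/*
 * Hypothetical stand-in for a 64-bit guest handle: the union pads a
 * (possibly 4-byte) pointer out to 8 bytes.
 */
typedef struct {
    union {
        const void *p;          /* native-width pointer */
        demo_u64_aligned_t q;   /* forces size and alignment to 8 */
    };
} demo_handle_64_t;

/* An argument structure built only from width-invariant fields. */
struct demo_args {
    demo_handle_64_t buf;
    demo_u64_aligned_t len;
};

/* The handle is 8 bytes whatever the pointer width (up to 64 bits). */
_Static_assert(sizeof(demo_handle_64_t) == 8, "handle must be 8 bytes");

/*
 * Zero the whole 8-byte field before storing the pointer, so that a
 * 32-bit caller does not leave stale data in the upper half.
 */
static void demo_set_handle(demo_handle_64_t *h, const void *ptr)
{
    assert(h != NULL);
    h->q = 0;
    h->p = ptr;
}
```

Built for either 32-bit or 64-bit x86, struct demo_args should come out
at 16 bytes with len at offset 8, which is the property the commit
message relies on.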
 xen/include/public/arch-x86/xen-x86_32.h |    4 +---
 xen/include/public/xen.h                 |   13 ++++++++-----
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/xen/include/public/arch-x86/xen-x86_32.h b/xen/include/public/arch-x86/xen-x86_32.h
index 906e74a..08fac7a 100644
--- a/xen/include/public/arch-x86/xen-x86_32.h
+++ b/xen/include/public/arch-x86/xen-x86_32.h
@@ -91,8 +91,7 @@
 #define machine_to_phys_mapping ((unsigned long *)MACH2PHYS_VIRT_START)
 #endif
 
-/* 32-/64-bit invariability for control interfaces (domctl/sysctl). */
-#if defined(__XEN__) || defined(__XEN_TOOLS__)
+/* 32-/64-bit invariability. */
 #undef ___DEFINE_XEN_GUEST_HANDLE
 #define ___DEFINE_XEN_GUEST_HANDLE(name, type)                  \
     typedef struct { type *p; }                                 \
@@ -107,7 +106,6 @@
 #define uint64_aligned_t uint64_t __attribute__((aligned(8)))
 #define __XEN_GUEST_HANDLE_64(name) __guest_handle_64_ ## name
 #define XEN_GUEST_HANDLE_64(name) __XEN_GUEST_HANDLE_64(name)
-#endif
 
 #ifndef __ASSEMBLY__
 
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 5593066..c18e7ce 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -840,9 +840,14 @@ __DEFINE_XEN_GUEST_HANDLE(uint64, uint64_t);
 
 #endif /* !__ASSEMBLY__ */
 
-/* Default definitions for macros used by domctl/sysctl. */
-#if defined(__XEN__) || defined(__XEN_TOOLS__)
-
+/*
+ * Default definitions for 32/64-bit invariant macros.
+ *
+ * Use these in ABI structures that should be identical for 32 and
+ * 64-bit guests. There is some (very small) overhead in using
+ * XEN_GUEST_HANDLE_64() instead of XEN_GUEST_HANDLE() so avoid for
+ * very hot paths.
+ */
 #ifndef uint64_aligned_t
 #define uint64_aligned_t uint64_t
 #endif
@@ -857,8 +862,6 @@ struct xenctl_cpumap {
 };
 #endif
 
-#endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
-
 #endif /* __XEN_PUBLIC_XEN_H__ */
 
 /*
-- 
1.7.2.5


* [PATCH 3/8] kexec: add public interface for improved load/unload sub-ops
  2013-02-21 17:48 [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels David Vrabel
                   ` (3 preceding siblings ...)
  2013-02-21 17:48 ` David Vrabel
@ 2013-02-21 17:48 ` David Vrabel
  2013-02-21 17:48 ` David Vrabel
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 106+ messages in thread
From: David Vrabel @ 2013-02-21 17:48 UTC
  To: xen-devel; +Cc: Daniel Kiper, kexec, David Vrabel

From: David Vrabel <david.vrabel@citrix.com>

Add replacement KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload
sub-ops to the kexec hypercall.  These new sub-ops allow a privileged
guest to provide the image data to be loaded into Xen memory or the
crash region, instead of guests loading the image data themselves and
providing the relocation code and metadata.

The old interface is provided to guests requesting an interface
version prior to 4.3.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
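Note (illustrative, not part of the patch): a rough sketch of how a
caller might describe a one-segment image with the new load structure.
The field names follow the diff below; everything prefixed demo_ or
DEMO_ is a hypothetical stand-in so the example builds without the Xen
headers.  In particular the handle is modelled as a plain 8-byte
integer, the value of DEMO_KEXEC_TYPE_DEFAULT is assumed, and placing
the entry point at the start of the segment is only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for XEN_GUEST_HANDLE_64(): an 8-byte, 8-byte-aligned slot. */
typedef uint64_t __attribute__((aligned(8))) demo_handle_64_t;

#define DEMO_KEXEC_TYPE_DEFAULT 0   /* assumed value, for illustration */
#define DEMO_EM_X86_64          62  /* ELF machine type for x86-64 */

typedef struct demo_kexec_segment {
    demo_handle_64_t buf;   /* source buffer in the caller's memory */
    uint64_t buf_size;      /* bytes to copy from buf */
    uint64_t dest_maddr;    /* destination machine address */
    uint64_t dest_size;     /* size of the destination region */
} demo_kexec_segment_t;

typedef struct demo_kexec_load {
    uint8_t  type;          /* one of the KEXEC_TYPE_* values */
    uint16_t arch;          /* ELF machine type */
    uint32_t pad;           /* named __pad in the patch */
    uint64_t entry_maddr;   /* entry point machine address */
    uint32_t nr_segments;
    demo_handle_64_t segments;
} demo_kexec_load_t;

/* Describe a single segment and the load request that refers to it. */
static void demo_fill_load(demo_kexec_load_t *load,
                           demo_kexec_segment_t *seg,
                           const void *data, uint64_t len,
                           uint64_t dest, uint64_t dest_size)
{
    memset(seg, 0, sizeof(*seg));
    seg->buf        = (uint64_t)(uintptr_t)data;
    seg->buf_size   = len;
    seg->dest_maddr = dest;
    seg->dest_size  = dest_size;

    memset(load, 0, sizeof(*load));
    load->type        = DEMO_KEXEC_TYPE_DEFAULT;
    load->arch        = DEMO_EM_X86_64;
    load->entry_maddr = dest;   /* assumption: entry at segment start */
    load->nr_segments = 1;
    load->segments    = (uint64_t)(uintptr_t)seg;
}
```

A real caller would pass the filled structure to the kexec hypercall
with the new load sub-op; that step is omitted here because it depends
on the libxc API added later in the series.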
 xen/common/kexec.c         |   12 ++++----
 xen/include/public/kexec.h |   66 +++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 68 insertions(+), 10 deletions(-)

diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index 6dd20c6..2cbb62c 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -732,7 +732,7 @@ static void crash_save_vmcoreinfo(void)
 #endif
 }
 
-static int kexec_load_unload_internal(unsigned long op, xen_kexec_load_t *load)
+static int kexec_load_unload_internal(unsigned long op, xen_kexec_load_v1_t *load)
 {
     xen_kexec_image_t *image;
     int base, bit, pos;
@@ -779,7 +779,7 @@ static int kexec_load_unload_internal(unsigned long op, xen_kexec_load_t *load)
 
 static int kexec_load_unload(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) uarg)
 {
-    xen_kexec_load_t load;
+    xen_kexec_load_v1_t load;
 
     if ( unlikely(copy_from_guest(&load, uarg, 1)) )
         return -EFAULT;
@@ -791,8 +791,8 @@ static int kexec_load_unload_compat(unsigned long op,
                                     XEN_GUEST_HANDLE_PARAM(void) uarg)
 {
 #ifdef CONFIG_COMPAT
-    compat_kexec_load_t compat_load;
-    xen_kexec_load_t load;
+    compat_kexec_load_v1_t compat_load;
+    xen_kexec_load_v1_t load;
 
     if ( unlikely(copy_from_guest(&compat_load, uarg, 1)) )
         return -EFAULT;
@@ -864,8 +864,8 @@ static int do_kexec_op_internal(unsigned long op,
         else
                 ret = kexec_get_range(uarg);
         break;
-    case KEXEC_CMD_kexec_load:
-    case KEXEC_CMD_kexec_unload:
+    case KEXEC_CMD_kexec_load_v1:
+    case KEXEC_CMD_kexec_unload_v1:
         spin_lock_irqsave(&kexec_lock, flags);
         if (!test_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags))
         {
diff --git a/xen/include/public/kexec.h b/xen/include/public/kexec.h
index 61a8d7d..5259446 100644
--- a/xen/include/public/kexec.h
+++ b/xen/include/public/kexec.h
@@ -116,12 +116,12 @@ typedef struct xen_kexec_exec {
  * type  == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in]
  * image == relocation information for kexec (ignored for unload) [in]
  */
-#define KEXEC_CMD_kexec_load            1
-#define KEXEC_CMD_kexec_unload          2
-typedef struct xen_kexec_load {
+#define KEXEC_CMD_kexec_load_v1         1 /* obsolete since 0x00040300 */
+#define KEXEC_CMD_kexec_unload_v1       2 /* obsolete since 0x00040300 */
+typedef struct xen_kexec_load_v1 {
     int type;
     xen_kexec_image_t image;
-} xen_kexec_load_t;
+} xen_kexec_load_v1_t;
 
 #define KEXEC_RANGE_MA_CRASH      0 /* machine address and size of crash area */
 #define KEXEC_RANGE_MA_XEN        1 /* machine address and size of Xen itself */
@@ -152,6 +152,64 @@ typedef struct xen_kexec_range {
     unsigned long start;
 } xen_kexec_range_t;
 
+#if __XEN_INTERFACE_VERSION__ >= 0x00040300
+/*
+ * A contiguous chunk of a kexec image and its destination machine
+ * address.
+ */
+typedef struct xen_kexec_segment {
+    XEN_GUEST_HANDLE_64(const_void) buf;
+    uint64_t buf_size;
+    uint64_t dest_maddr;
+    uint64_t dest_size;
+} xen_kexec_segment_t;
+DEFINE_XEN_GUEST_HANDLE(xen_kexec_segment_t);
+
+/*
+ * Load a kexec image into memory.
+ *
+ * For KEXEC_TYPE_DEFAULT images, the segments may be anywhere in RAM.
+ * The image is relocated prior to being executed.
+ *
+ * For KEXEC_TYPE_CRASH images, each segment of the image must reside
+ * in the memory region reserved for kexec (KEXEC_RANGE_MA_CRASH) and
+ * the entry point must be within the image. The caller is responsible
+ * for ensuring that multiple images do not overlap.
+ */
+
+#define KEXEC_CMD_kexec_load 4
+typedef struct xen_kexec_load {
+    uint8_t  type;        /* One of KEXEC_TYPE_* */
+    uint16_t arch;        /* ELF machine type (EM_*). */
+    uint32_t __pad;
+    uint64_t entry_maddr; /* image entry point machine address. */
+    uint32_t nr_segments;
+    XEN_GUEST_HANDLE_64(xen_kexec_segment_t) segments;
+} xen_kexec_load_t;
+DEFINE_XEN_GUEST_HANDLE(xen_kexec_load_t);
+
+/*
+ * Unload a kexec image.
+ *
+ * Type must be one of KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH.
+ */
+#define KEXEC_CMD_kexec_unload 5
+typedef struct xen_kexec_unload {
+    uint8_t type;
+} xen_kexec_unload_t;
+DEFINE_XEN_GUEST_HANDLE(xen_kexec_unload_t);
+
+#else /* __XEN_INTERFACE_VERSION__ < 0x00040300 */
+
+#undef KEXEC_CMD_kexec_load
+#undef KEXEC_CMD_kexec_unload
+#define KEXEC_CMD_kexec_load KEXEC_CMD_kexec_load_v1
+#define KEXEC_CMD_kexec_unload KEXEC_CMD_kexec_unload_v1
+
+typedef xen_kexec_load_v1_t xen_kexec_load_t;
+
+#endif
+
 #endif /* _XEN_PUBLIC_KEXEC_H */
 
 /*
-- 
1.7.2.5


* [PATCH 4/8] kexec: add infrastructure for handling kexec images
  2013-02-21 17:48 [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels David Vrabel
                   ` (6 preceding siblings ...)
  2013-02-21 17:48 ` [PATCH 4/8] kexec: add infrastructure for handling kexec images David Vrabel
@ 2013-02-21 17:48 ` David Vrabel
  2013-02-21 17:48 ` [PATCH 5/8] kexec: extend hypercall with improved load/unload ops David Vrabel
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 106+ messages in thread
From: David Vrabel @ 2013-02-21 17:48 UTC
  To: xen-devel; +Cc: Daniel Kiper, kexec, David Vrabel

From: David Vrabel <david.vrabel@citrix.com>

Add the code needed to handle and load kexec images into Xen memory or
into the crash region.  This is needed for the new KEXEC_CMD_kexec_load
and KEXEC_CMD_kexec_unload hypercall sub-ops.

Much of this code is derived from the Linux kernel.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
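Note (illustrative, not part of the patch): the segment sanity checks
done by do_kimage_alloc() in the diff below can be sketched as a
standalone function.  This is a simplified model under stated
assumptions: the demo_* names and the 4 KiB DEMO_PAGE_SIZE are
hypothetical, and the upper-limit check against
KEXEC_DESTINATION_MEMORY_LIMIT is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL  /* assumed page size */

struct demo_segment {
    uint64_t buf_size;    /* bytes of image data supplied */
    uint64_t dest_maddr;  /* destination machine address */
    uint64_t dest_size;   /* size of the destination region */
};

/* Return 0 if the segment list passes all three checks. */
static int demo_check_segments(const struct demo_segment *segs, size_t nr)
{
    size_t i, j;

    /* 1. Destination start and end must be page aligned. */
    for (i = 0; i < nr; i++) {
        uint64_t start = segs[i].dest_maddr;
        uint64_t end   = start + segs[i].dest_size;

        if ((start | end) & (DEMO_PAGE_SIZE - 1))
            return -EADDRNOTAVAIL;
    }

    /* 2. No two destination ranges may overlap. */
    for (i = 0; i < nr; i++) {
        uint64_t mstart = segs[i].dest_maddr;
        uint64_t mend   = mstart + segs[i].dest_size;

        for (j = 0; j < i; j++) {
            uint64_t pstart = segs[j].dest_maddr;
            uint64_t pend   = pstart + segs[j].dest_size;

            if (mend > pstart && mstart < pend)
                return -EINVAL;
        }
    }

    /* 3. The source data must fit in its destination region. */
    for (i = 0; i < nr; i++) {
        if (segs[i].buf_size > segs[i].dest_size)
            return -EINVAL;
    }

    return 0;
}
```

Keeping the three passes separate mirrors the patch, where each pass
sets a different error code before it runs.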
 xen/common/Makefile      |    1 +
 xen/common/kimage.c      |  887 ++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/kimage.h |   64 ++++
 3 files changed, 952 insertions(+), 0 deletions(-)
 create mode 100644 xen/common/kimage.c
 create mode 100644 xen/include/xen/kimage.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 1677342..4c04018 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -11,6 +11,7 @@ obj-y += irq.o
 obj-y += kernel.o
 obj-y += keyhandler.o
 obj-$(HAS_KEXEC) += kexec.o
+obj-$(HAS_KEXEC) += kimage.o
 obj-y += lib.o
 obj-y += memory.o
 obj-y += multicall.o
diff --git a/xen/common/kimage.c b/xen/common/kimage.c
new file mode 100644
index 0000000..c5f07c3
--- /dev/null
+++ b/xen/common/kimage.c
@@ -0,0 +1,887 @@
+/*
+ * Kexec Image
+ *
+ * Copyright (C) 2013 Citrix Systems R&D Ltd.
+ *
+ * Derived from kernel/kexec.c from Linux:
+ *
+ *   Copyright (C) 2002-2004 Eric Biederman  <ebiederm@xmission.com>
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
+ */
+
+#include <xen/config.h>
+#include <xen/types.h>
+#include <xen/init.h>
+#include <xen/kernel.h>
+#include <xen/errno.h>
+#include <xen/spinlock.h>
+#include <xen/guest_access.h>
+#include <xen/mm.h>
+#include <xen/kexec.h>
+#include <xen/kimage.h>
+
+#include <asm/page.h>
+
+/*
+ * When kexec transitions to the new kernel there is a one-to-one
+ * mapping between physical and virtual addresses.  On processors
+ * where you can disable the MMU this is trivial.  For
+ * others it is still a simple predictable page table to set up.
+ *
+ * In that environment kexec copies the new kernel to its final
+ * resting place.  This means I can only support memory whose
+ * physical address can fit in an unsigned long.  In particular
+ * addresses where (pfn << PAGE_SHIFT) > ULONG_MAX cannot be handled.
+ * If the assembly stub has more restrictive requirements
+ * KEXEC_SOURCE_MEMORY_LIMIT and KEXEC_DEST_MEMORY_LIMIT can be
+ * defined more restrictively in <asm/kexec.h>.
+ *
+ * The code for the transition from the current kernel to the
+ * new kernel is placed in the control_code_buffer, whose size
+ * is given by KEXEC_CONTROL_PAGE_SIZE.  In the best case only a single
+ * page of memory is necessary, but some architectures require more.
+ * Because this memory must be identity mapped in the transition from
+ * virtual to physical addresses it must live in the range
+ * 0 - TASK_SIZE, as only the user space mappings are arbitrarily
+ * modifiable.
+ *
+ * The assembly stub in the control code buffer is passed a linked list
+ * of descriptor pages detailing the source pages of the new kernel,
+ * and the destination addresses of those source pages.  As this data
+ * structure is not used in the context of the current OS, it must
+ * be self-contained.
+ *
+ * The code has been made to work with highmem pages and will use a
+ * destination page in its final resting place (if it happens
+ * to allocate it).  The end product of this is that most of the
+ * physical address space, and most of RAM can be used.
+ *
+ * Future directions include:
+ *  - allocating a page table with the control code buffer identity
+ *    mapped, to simplify machine_kexec and make kexec_on_panic more
+ *    reliable.
+ */
+
+/*
+ * KIMAGE_NO_DEST is an impossible destination address, used for
+ * allocating pages whose destination address we do not care about.
+ */
+#define KIMAGE_NO_DEST (-1UL)
+
+static int kimage_is_destination_range(struct kexec_image *image,
+                                       unsigned long start, unsigned long end);
+static struct page_info *kimage_alloc_page(struct kexec_image *image,
+                                           unsigned long dest);
+
+static struct page_info *kimage_alloc_xen_page(void)
+{
+    void *p;
+
+    p = alloc_xenheap_page();
+    if ( p == NULL )
+        return NULL;
+    return virt_to_page(p);
+}
+
+static void kimage_free_xen_page(struct page_info *page)
+{
+    free_xenheap_page(page_to_virt(page));
+}
+
+static int do_kimage_alloc(struct kexec_image **rimage, unsigned long entry,
+                           unsigned long nr_segments,
+                           xen_kexec_segment_t *segments)
+{
+    struct kexec_image *image;
+    unsigned long i;
+    int result;
+
+    /* Allocate a controlling structure */
+    result = -ENOMEM;
+    image = xzalloc(typeof(*image));
+    if ( !image )
+        goto out;
+
+    image->head = 0;
+    image->entry = &image->head;
+    image->last_entry = &image->head;
+    image->control_page = ~0; /* By default this does not apply */
+    image->entry_maddr = entry;
+    image->type = KEXEC_TYPE_DEFAULT;
+    image->nr_segments = nr_segments;
+    image->segments = segments;
+
+    INIT_PAGE_LIST_HEAD(&image->control_pages);
+    INIT_PAGE_LIST_HEAD(&image->dest_pages);
+    INIT_PAGE_LIST_HEAD(&image->unuseable_pages);
+
+    /*
+     * Verify we have good destination addresses.  The caller is
+     * responsible for making certain we don't attempt to load
+     * the new image into invalid or reserved areas of RAM.  This
+     * just verifies it is an address we can use.
+     *
+     * Since the kernel does everything in page size chunks ensure
+     * the destination addresses are page aligned.  Too many
+     * special cases crop up when we don't do this.  The most
+     * insidious is getting overlapping destination addresses
+     * simply because addresses are changed to page size
+     * granularity.
+     */
+    result = -EADDRNOTAVAIL;
+    for ( i = 0; i < nr_segments; i++ )
+    {
+        unsigned long mstart, mend;
+
+        mstart = image->segments[i].dest_maddr;
+        mend   = mstart + image->segments[i].dest_size;
+        if ( (mstart & ~PAGE_MASK) || (mend & ~PAGE_MASK) )
+            goto out;
+        if ( mend >= KEXEC_DESTINATION_MEMORY_LIMIT )
+            goto out;
+    }
+
+    /* Verify our destination addresses do not overlap.
+     * If we allowed overlapping destination addresses
+     * through, very weird things can happen with no
+     * easy explanation as one segment stomps on another.
+     */
+    result = -EINVAL;
+    for ( i = 0; i < nr_segments; i++ )
+    {
+        unsigned long mstart, mend;
+        unsigned long j;
+
+        mstart = image->segments[i].dest_maddr;
+        mend   = mstart + image->segments[i].dest_size;
+        for (j = 0; j < i; j++ )
+        {
+            unsigned long pstart, pend;
+            pstart = image->segments[j].dest_maddr;
+            pend   = pstart + image->segments[j].dest_size;
+            /* Do the segments overlap ? */
+            if ( (mend > pstart) && (mstart < pend) )
+                goto out;
+        }
+    }
+
+    /* Ensure our buffer sizes are no larger than
+     * our memory sizes.  This should always be the case,
+     * and it is easier to check up front than to be surprised
+     * later on.
+     */
+    result = -EINVAL;
+    for ( i = 0; i < nr_segments; i++ )
+    {
+        if ( image->segments[i].buf_size > image->segments[i].dest_size )
+            goto out;
+    }
+
+    result = 0;
+out:
+    if ( result == 0 )
+        *rimage = image;
+    else
+        kimage_free(image);
+
+    return result;
+
+}
+
+static int kimage_normal_alloc(struct kexec_image **rimage, unsigned long entry,
+                               unsigned long nr_segments,
+                               xen_kexec_segment_t *segments)
+{
+    int result;
+    struct kexec_image *image;
+    void *code_page;
+
+    /* Allocate and initialize a controlling structure */
+    image = NULL;
+    result = do_kimage_alloc(&image, entry, nr_segments, segments);
+    if ( result )
+        goto out;
+
+    *rimage = image;
+
+    /*
+     * The control code page must still be accessible after the
+     * processor has switched to 32-bit mode.
+     */
+    code_page = alloc_xenheap_pages(0, MEMF_bits(32));
+    if ( code_page == NULL )
+    {
+        result = -ENOMEM;
+        gdprintk(XENLOG_WARNING, "Could not allocate control_code_buffer\n");
+        goto out;
+    }
+    image->control_code_page = virt_to_page(code_page);
+
+    result = 0;
+out:
+    if ( result == 0 )
+        *rimage = image;
+    else
+        xfree(image);
+
+    return result;
+}
+
+static int kimage_crash_alloc(struct kexec_image **rimage, unsigned long entry,
+                              unsigned long nr_segments,
+                              xen_kexec_segment_t *segments)
+{
+    int result;
+    struct kexec_image *image;
+    unsigned long i;
+
+    image = NULL;
+    /* Verify we have a valid entry point */
+    if ( (entry < kexec_crash_area.start)
+         || (entry > kexec_crash_area.start + kexec_crash_area.size))
+    {
+        result = -EADDRNOTAVAIL;
+        goto out;
+    }
+
+    /* Allocate and initialize a controlling structure */
+    result = do_kimage_alloc(&image, entry, nr_segments, segments);
+    if ( result )
+        goto out;
+
+    /* Enable the special crash kernel control page
+     * allocation policy.
+     */
+    image->control_page = kexec_crash_area.start;
+    image->type = KEXEC_TYPE_CRASH;
+
+    /*
+     * Verify we have good destination addresses.  Normally
+     * the caller is responsible for making certain we don't
+     * attempt to load the new image into invalid or reserved
+     * areas of RAM.  But crash kernels are preloaded into a
+     * reserved area of ram.  We must ensure the addresses
+     * are in the reserved area otherwise preloading the
+     * kernel could corrupt things.
+     */
+    result = -EADDRNOTAVAIL;
+    for ( i = 0; i < nr_segments; i++ )
+    {
+        unsigned long mstart, mend;
+
+        mstart = image->segments[i].dest_maddr;
+        mend = mstart + image->segments[i].dest_size - 1;
+        /* Ensure we are within the crash kernel limits */
+        if ( (mstart < kexec_crash_area.start )
+             || (mend > kexec_crash_area.start + kexec_crash_area.size))
+            goto out;
+    }
+
+    /*
+     * Find a location for the control code buffer, and add
+     * the vector of segments so that its pages will also be
+     * counted as destination pages.
+     */
+    result = -ENOMEM;
+    image->control_code_page = kimage_alloc_control_page(image);
+    if ( !image->control_code_page )
+    {
+        gdprintk(XENLOG_WARNING, "Could not allocate control_code_buffer\n");
+        goto out;
+    }
+
+    result = 0;
+out:
+    if ( result == 0 )
+        *rimage = image;
+    else
+        xfree(image);
+
+    return result;
+}
+
+static int kimage_is_destination_range(struct kexec_image *image,
+                                       unsigned long start,
+                                       unsigned long end)
+{
+    unsigned long i;
+
+    for ( i = 0; i < image->nr_segments; i++ )
+    {
+        unsigned long mstart, mend;
+
+        mstart = image->segments[i].dest_maddr;
+        mend = mstart + image->segments[i].dest_size;
+        if ( (end > mstart) && (start < mend) )
+            return 1;
+    }
+
+    return 0;
+}
+
+static void kimage_free_page_list(struct page_list_head *list)
+{
+    struct page_info *page, *next;
+
+    page_list_for_each_safe(page, next, list)
+    {
+        printk("delete page %p\n", page);
+        page_list_del(page, list);
+        kimage_free_xen_page(page);
+    }
+}
+
+static struct page_info *kimage_alloc_normal_control_page(struct kexec_image *image)
+{
+    /* Control pages are special, they are the intermediaries
+     * that are needed while we copy the rest of the pages
+     * to their final resting place.  As such they must
+     * not conflict with either the destination addresses
+     * or memory the kernel is already using.
+     *
+     * The only case where we really need more than one of
+     * these is for architectures where we cannot disable
+     * the MMU and must instead generate an identity mapped
+     * page table for all of the memory.
+     *
+     * At worst this runs in O(N) of the image size.
+     */
+    struct page_list_head extra_pages;
+    struct page_info *page = NULL;
+
+    INIT_PAGE_LIST_HEAD(&extra_pages);
+
+    /* Loop while I can allocate a page and the page allocated
+     * is a destination page.
+     */
+    do {
+        unsigned long mfn, emfn, addr, eaddr;
+
+        page = kimage_alloc_xen_page();
+        if ( !page )
+            break;
+        mfn   = page_to_mfn(page);
+        emfn  = mfn + 1;
+        addr  = mfn << PAGE_SHIFT;
+        eaddr = emfn << PAGE_SHIFT;
+        if ( (emfn >= (KEXEC_CONTROL_MEMORY_LIMIT >> PAGE_SHIFT)) ||
+             kimage_is_destination_range(image, addr, eaddr) )
+        {
+            printk("add page %p\n", page);
+            page_list_add(page, &extra_pages);
+            page = NULL;
+        }
+    } while ( !page );
+
+    if ( page )
+    {
+        /* Remember the allocated page... */
+        page_list_add(page, &image->control_pages);
+
+        /* Because the page is already in its destination
+         * location we will never allocate another page at
+         * that address.  Therefore kimage_alloc_page
+         * will not return it (again) and we don't need
+         * to give it an entry in image->segments[].
+         */
+    }
+    /* Deal with the destination pages I have inadvertently allocated.
+     *
+     * Ideally I would convert multi-page allocations into single
+     * page allocations, and add everything to image->dest_pages.
+     *
+     * For now it is simpler to just free the pages.
+     */
+    kimage_free_page_list(&extra_pages);
+
+    return page;
+}
+
+static struct page_info *kimage_alloc_crash_control_page(struct kexec_image *image)
+{
+    /* Control pages are special, they are the intermediaries
+     * that are needed while we copy the rest of the pages
+     * to their final resting place.  As such they must
+     * not conflict with either the destination addresses
+     * or memory the kernel is already using.
+     *
+     * Control pages are also the only pages we must allocate
+     * when loading a crash kernel.  All of the other pages
+     * are specified by the segments and we just memcpy
+     * into them directly.
+     *
+     * The only case where we really need more than one of
+     * these is for architectures where we cannot disable
+     * the MMU and must instead generate an identity mapped
+     * page table for all of the memory.
+     *
+     * Given the low demand this implements a very simple
+     * allocator that finds the first hole of the appropriate
+     * size in the reserved memory region, and allocates all
+     * of the memory up to and including the hole.
+     */
+    unsigned long hole_start, hole_end, size;
+    struct page_info *page;
+
+    page = NULL;
+    size = PAGE_SIZE;
+    hole_start = (image->control_page + (size - 1)) & ~(size - 1);
+    hole_end   = hole_start + size - 1;
+    while ( hole_end <= kexec_crash_area.start + kexec_crash_area.size )
+    {
+        unsigned long i;
+
+        if ( hole_end > KEXEC_CRASH_CONTROL_MEMORY_LIMIT )
+            break;
+        if ( hole_end > kexec_crash_area.start + kexec_crash_area.size )
+            break;
+        /* See if I overlap any of the segments */
+        for ( i = 0; i < image->nr_segments; i++ )
+        {
+            unsigned long mstart, mend;
+
+            mstart = image->segments[i].dest_maddr;
+            mend   = mstart + image->segments[i].dest_size - 1;
+            if ( (hole_end >= mstart) && (hole_start <= mend) )
+            {
+                /* Advance the hole to the end of the segment */
+                hole_start = (mend + (size - 1)) & ~(size - 1);
+                hole_end   = hole_start + size - 1;
+                break;
+            }
+        }
+        /* If I don't overlap any segments I have found my hole! */
+        if ( i == image->nr_segments )
+        {
+            page = mfn_to_page(hole_start >> PAGE_SHIFT);
+            break;
+        }
+    }
+    if ( page )
+        image->control_page = hole_end;
+
+    return page;
+}
+
+
+struct page_info *kimage_alloc_control_page(struct kexec_image *image)
+{
+    struct page_info *pages = NULL;
+
+    switch ( image->type )
+    {
+    case KEXEC_TYPE_DEFAULT:
+        pages = kimage_alloc_normal_control_page(image);
+        break;
+    case KEXEC_TYPE_CRASH:
+        pages = kimage_alloc_crash_control_page(image);
+        break;
+    }
+
+    if ( pages )
+        clear_page(page_to_virt(pages));
+
+    return pages;
+}
+
+static int kimage_add_entry(struct kexec_image *image, kimage_entry_t entry)
+{
+    if ( *image->entry != 0 )
+        image->entry++;
+
+    if ( image->entry == image->last_entry )
+    {
+        kimage_entry_t *ind_page;
+        struct page_info *page;
+
+        page = kimage_alloc_page(image, KIMAGE_NO_DEST);
+        if ( !page )
+            return -ENOMEM;
+
+        ind_page = page_to_virt(page);
+        *image->entry = page_to_maddr(page) | IND_INDIRECTION;
+        image->entry = ind_page;
+        image->last_entry = ind_page +
+            ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1);
+    }
+    *image->entry = entry;
+    image->entry++;
+    *image->entry = 0;
+
+    return 0;
+}
+
+static int kimage_set_destination(struct kexec_image *image,
+                                  unsigned long destination)
+{
+    int result;
+
+    destination &= PAGE_MASK;
+    result = kimage_add_entry(image, destination | IND_DESTINATION);
+    if ( result == 0 )
+        image->destination = destination;
+
+    return result;
+}
+
+
+static int kimage_add_page(struct kexec_image *image, unsigned long page)
+{
+    int result;
+
+    page &= PAGE_MASK;
+    result = kimage_add_entry(image, page | IND_SOURCE);
+    if ( result == 0 )
+        image->destination += PAGE_SIZE;
+
+    return result;
+}
+
+
+static void kimage_free_extra_pages(struct kexec_image *image)
+{
+    kimage_free_page_list(&image->dest_pages);
+    kimage_free_page_list(&image->unuseable_pages);
+
+}
+
+static void kimage_terminate(struct kexec_image *image)
+{
+    if ( *image->entry != 0 )
+        image->entry++;
+
+    *image->entry = IND_DONE;
+}
+
+#define for_each_kimage_entry(image, ptr, entry)                      \
+    for ( ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE ); \
+          ptr = (entry & IND_INDIRECTION) ?                           \
+              maddr_to_virt((entry & PAGE_MASK)) : ptr + 1)
+
+static void kimage_free_entry(kimage_entry_t entry)
+{
+    struct page_info *page;
+
+    page = mfn_to_page(entry >> PAGE_SHIFT);
+    kimage_free_xen_page(page);
+}
+
+void kimage_free(struct kexec_image *image)
+{
+    kimage_entry_t *ptr, entry;
+    kimage_entry_t ind = 0;
+
+    if ( !image )
+        return;
+
+    kimage_free_extra_pages(image);
+    for_each_kimage_entry(image, ptr, entry)
+    {
+        if ( entry & IND_INDIRECTION )
+        {
+            /* Free the previous indirection page */
+            if ( ind & IND_INDIRECTION )
+                kimage_free_entry(ind);
+            /* Save this indirection page until we are
+             * done with it.
+             */
+            ind = entry;
+        }
+        else if ( entry & IND_SOURCE )
+            kimage_free_entry(entry);
+    }
+    /* Free the final indirection page */
+    if ( ind & IND_INDIRECTION )
+        kimage_free_entry(ind);
+
+    /* Free the kexec control pages... */
+    kimage_free_page_list(&image->control_pages);
+    xfree(image->segments);
+    xfree(image);
+}
+
+static kimage_entry_t *kimage_dst_used(struct kexec_image *image,
+                                       unsigned long page)
+{
+    kimage_entry_t *ptr, entry;
+    unsigned long destination = 0;
+
+    for_each_kimage_entry(image, ptr, entry)
+    {
+        if ( entry & IND_DESTINATION )
+            destination = entry & PAGE_MASK;
+        else if ( entry & IND_SOURCE )
+        {
+            if ( page == destination )
+                return ptr;
+            destination += PAGE_SIZE;
+        }
+    }
+
+    return NULL;
+}
+
+static struct page_info *kimage_alloc_page(struct kexec_image *image,
+                                      unsigned long destination)
+{
+    /*
+     * Here we implement safeguards to ensure that a source page
+     * is not copied to its destination page before the data on
+     * the destination page is no longer useful.
+     *
+     * To do this we maintain the invariant that a source page is
+     * either its own destination page, or it is not a
+     * destination page at all.
+     *
+     * That is slightly stronger than required, but the proof
+     * that no problems will occur is trivial, and the
+     * implementation is simple to verify.
+     *
+     * When allocating all pages normally this algorithm will run
+     * in O(N) time, but in the worst case it will run in O(N^2)
+     * time.   If the runtime is a problem the data structures can
+     * be fixed.
+     */
+    struct page_info *page;
+    unsigned long addr;
+
+    /*
+     * Walk through the list of destination pages, and see if I
+     * have a match.
+     */
+    page_list_for_each(page, &image->dest_pages)
+    {
+        addr = page_to_mfn(page) << PAGE_SHIFT;
+        if ( addr == destination )
+        {
+            page_list_del(page, &image->dest_pages);
+            return page;
+        }
+    }
+    page = NULL;
+    for (;;)
+    {
+        kimage_entry_t *old;
+
+        /* Allocate a page, if we run out of memory give up */
+        page = kimage_alloc_xen_page();
+        if ( !page )
+            return NULL;
+        /* If the page cannot be used file it away */
+        if ( page_to_mfn(page) >
+             (KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT) )
+        {
+            page_list_add(page, &image->unuseable_pages);
+            continue;
+        }
+        addr = page_to_mfn(page) << PAGE_SHIFT;
+
+        /* If it is the destination page we want use it */
+        if ( addr == destination )
+            break;
+
+        /* If the page is not a destination page use it */
+        if ( !kimage_is_destination_range(image, addr,
+                                          addr + PAGE_SIZE) )
+            break;
+
+        /*
+         * I know that the page is someone's destination page.
+         * See if there is already a source page for this
+         * destination page.  And if so swap the source pages.
+         */
+        old = kimage_dst_used(image, addr);
+        if ( old )
+        {
+            /* If so move it */
+            unsigned long old_addr;
+            struct page_info *old_page;
+
+            old_addr = *old & PAGE_MASK;
+            old_page = mfn_to_page(old_addr >> PAGE_SHIFT);
+            copy_page(page_to_virt(page), page_to_virt(old_page));
+            *old = addr | (*old & ~PAGE_MASK);
+
+            addr = old_addr;
+            page = old_page;
+            break;
+        }
+        else
+        {
+            /* Place the page on the destination list; I
+             * will use it later.
+             */
+            page_list_add(page, &image->dest_pages);
+        }
+    }
+
+    return page;
+}
+
+static int kimage_load_normal_segment(struct kexec_image *image,
+                                      xen_kexec_segment_t *segment)
+{
+    unsigned long to_copy;
+    unsigned long src_offset;
+    unsigned long dest;
+    int ret;
+
+    to_copy = segment->buf_size;
+    src_offset = 0;
+    dest = segment->dest_maddr;
+
+    ret = kimage_set_destination(image, dest);
+    if ( ret < 0 )
+        return ret;
+
+    while ( to_copy )
+    {
+        unsigned long dest_mfn;
+        size_t dest_off;
+        struct page_info *page;
+        void *dest_va;
+        size_t size;
+
+        dest_mfn = dest >> PAGE_SHIFT;
+        dest_off = dest & ~PAGE_MASK;
+
+        size = min(PAGE_SIZE - dest_off, to_copy);
+
+        page = kimage_alloc_page(image, dest);
+        if ( !page )
+            return -ENOMEM;
+        ret = kimage_add_page(image, page_to_mfn(page) << PAGE_SHIFT);
+        if ( ret < 0 )
+            return ret;
+
+        dest_va = page_to_virt(page);
+        clear_page(dest_va);
+        ret = copy_from_guest_offset(dest_va + dest_off, segment->buf, src_offset, size);
+        if ( ret )
+            return -EFAULT;
+
+        to_copy -= size;
+        src_offset += size;
+        dest += size;
+    }
+
+    return 0;
+}
+
+static int kimage_load_crash_segment(struct kexec_image *image,
+                                     xen_kexec_segment_t *segment)
+{
+    /* For crash dump kernels we simply copy the data from
+     * the guest to its destination.
+     * We do things a page at a time for the sake of vmap().
+     */
+    unsigned long dest;
+    unsigned long sbytes, dbytes;
+    int ret = 0;
+    unsigned long src_offset = 0;
+
+    sbytes = segment->buf_size;
+    dbytes = segment->dest_size;
+    dest = segment->dest_maddr;
+
+    while ( dbytes )
+    {
+        unsigned long dest_mfn;
+        size_t dest_off;
+        void *dest_va;
+        size_t schunk, dchunk;
+
+        dest_mfn = dest >> PAGE_SHIFT;
+        dest_off = dest & ~PAGE_MASK;
+
+        dchunk = min(PAGE_SIZE - dest_off, dbytes);
+        schunk = min(dchunk, sbytes);
+
+        dest_va = vmap(&dest_mfn, 1);
+        if ( dest_va == NULL )
+            return -EINVAL;
+
+        ret = copy_from_guest_offset(dest_va + dest_off, segment->buf, src_offset, schunk);
+        memset(dest_va + dest_off + schunk, 0, dchunk - schunk);
+
+        vunmap(dest_va);
+        if ( ret )
+            return -EFAULT;
+
+        dbytes -= dchunk;
+        sbytes -= schunk;
+        dest += dchunk;
+        src_offset += schunk;
+    }
+
+    return 0;
+}
+
+static int kimage_load_segment(struct kexec_image *image, xen_kexec_segment_t *segment)
+{
+    int result = -ENOMEM;
+
+    switch ( image->type )
+    {
+    case KEXEC_TYPE_DEFAULT:
+        result = kimage_load_normal_segment(image, segment);
+        break;
+    case KEXEC_TYPE_CRASH:
+        result = kimage_load_crash_segment(image, segment);
+        break;
+    }
+
+    return result;
+}
+
+int kimage_alloc(struct kexec_image **rimage, uint8_t type, uint16_t arch,
+                 uint64_t entry_maddr,
+                 uint32_t nr_segments, xen_kexec_segment_t *segment)
+{
+    int result;
+
+    switch( type )
+    {
+    case KEXEC_TYPE_DEFAULT:
+        result = kimage_normal_alloc(rimage, entry_maddr, nr_segments, segment);
+        break;
+    case KEXEC_TYPE_CRASH:
+        result = kimage_crash_alloc(rimage, entry_maddr, nr_segments, segment);
+        break;
+    default:
+        result = -EINVAL;
+        break;
+    }
+    if ( result < 0 )
+        return result;
+
+    (*rimage)->arch = arch;
+
+    return result;
+}
+
+int kimage_load_segments(struct kexec_image *image)
+{
+    int s;
+    int result;
+
+    for ( s = 0; s < image->nr_segments; s++ ) {
+        result = kimage_load_segment(image, &image->segments[s]);
+        if ( result < 0 )
+            return result;
+    }
+    kimage_terminate(image);
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/kimage.h b/xen/include/xen/kimage.h
new file mode 100644
index 0000000..dc71b87
--- /dev/null
+++ b/xen/include/xen/kimage.h
@@ -0,0 +1,64 @@
+#ifndef __XEN_KIMAGE_H__
+#define __XEN_KIMAGE_H__
+
+#include <xen/list.h>
+#include <xen/mm.h>
+#include <public/kexec.h>
+
+#define KEXEC_DESTINATION_MEMORY_LIMIT (~0ul)
+#define KEXEC_CONTROL_MEMORY_LIMIT (~0ul)
+#define KEXEC_CRASH_CONTROL_MEMORY_LIMIT (~0ul)
+#define KEXEC_SOURCE_MEMORY_LIMIT (~0ul)
+
+#define KEXEC_CONTROL_PAGE_SIZE PAGE_SIZE
+
+#define KEXEC_SEGMENT_MAX 16
+
+typedef unsigned long kimage_entry_t;
+#define IND_DESTINATION  0x1
+#define IND_INDIRECTION  0x2
+#define IND_DONE         0x4
+#define IND_SOURCE       0x8
+
+struct kexec_image {
+    uint8_t type;
+    uint16_t arch;
+    uint64_t entry_maddr;
+    uint32_t nr_segments;
+    xen_kexec_segment_t *segments;
+
+    kimage_entry_t head;
+    kimage_entry_t *entry;
+    kimage_entry_t *last_entry;
+
+    unsigned long destination;
+
+    struct page_info *control_code_page;
+    struct page_info *aux_page;
+
+    struct page_list_head control_pages;
+    struct page_list_head dest_pages;
+    struct page_list_head unuseable_pages;
+
+    /* Address of next control page to allocate for crash kernels. */
+    unsigned long control_page;
+};
+
+int kimage_alloc(struct kexec_image **rimage, uint8_t type, uint16_t arch,
+                 uint64_t entry_maddr,
+                 uint32_t nr_segments, xen_kexec_segment_t *segment);
+void kimage_free(struct kexec_image *image);
+int kimage_load_segments(struct kexec_image *image);
+struct page_info *kimage_alloc_control_page(struct kexec_image *image);
+
+#endif /* __XEN_KIMAGE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.2.5
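
For readers following the entry list that kimage_add_entry(),
kimage_set_destination() and kimage_add_page() build in the patch above,
here is a minimal user-space sketch of how such a list can be decoded.
It mirrors the walk in kimage_dst_used() but is not the patch's code:
the list is a flat array (indirection pages are left out), and every
name starting with sketch_ or SK_ is hypothetical, as is the 4 KiB page
size.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_entry_t;

/* Assumption: 4 KiB pages; flag values copied from the IND_* defines. */
#define SK_PAGE_SIZE        4096ull
#define SK_PAGE_MASK        (~(SK_PAGE_SIZE - 1))
#define SK_IND_DESTINATION  0x1ull
#define SK_IND_INDIRECTION  0x2ull  /* not handled in this flat sketch */
#define SK_IND_DONE         0x4ull
#define SK_IND_SOURCE       0x8ull

/*
 * Return the address of the source page that would be copied to 'dest',
 * or 0 if no entry targets that destination.  (Using 0 as "not found"
 * is a simplification of this sketch.)
 */
static uint64_t sketch_source_for(const sketch_entry_t *list, uint64_t dest)
{
    uint64_t cur = 0;   /* destination the next source entry maps to */
    const sketch_entry_t *p;

    for ( p = list; *p != 0 && !(*p & SK_IND_DONE); p++ )
    {
        if ( *p & SK_IND_DESTINATION )
            cur = *p & SK_PAGE_MASK;
        else if ( *p & SK_IND_SOURCE )
        {
            if ( cur == dest )
                return *p & SK_PAGE_MASK;
            /* Each source entry advances the destination by one page. */
            cur += SK_PAGE_SIZE;
        }
    }

    return 0;
}
```

A destination entry sets the running destination address, and each
source entry that follows consumes one page of it, which is why
kimage_add_page() bumps image->destination by PAGE_SIZE.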

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH 4/8] kexec: add infrastructure for handling kexec images
  2013-02-21 17:48 [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels David Vrabel
                   ` (5 preceding siblings ...)
  2013-02-21 17:48 ` David Vrabel
@ 2013-02-21 17:48 ` David Vrabel
  2013-03-08 11:37   ` Daniel Kiper
  2013-03-08 11:37   ` Daniel Kiper
  2013-02-21 17:48 ` David Vrabel
                   ` (16 subsequent siblings)
  23 siblings, 2 replies; 106+ messages in thread
From: David Vrabel @ 2013-02-21 17:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Daniel Kiper, kexec, David Vrabel

From: David Vrabel <david.vrabel@citrix.com>

Add the code needed to handle and load kexec images into Xen memory or
into the crash region.  This is needed for the new KEXEC_CMD_load and
KEXEC_CMD_unload hypercall sub-ops.

Much of this code is derived from the Linux kernel.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
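
Note (not part of the commit message): the segment checks done up front
in do_kimage_alloc() can be summarised with the stand-alone sketch
below.  It is an illustration under stated assumptions, not the code in
this patch: struct sketch_segment is a hypothetical stand-in for
xen_kexec_segment_t, the page size is assumed to be 4 KiB, and a single
-1 replaces the distinct errno values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Hypothetical stand-in for xen_kexec_segment_t. */
struct sketch_segment {
    uint64_t dest_maddr;  /* destination machine address */
    uint64_t dest_size;   /* bytes reserved at the destination */
    uint64_t buf_size;    /* bytes supplied by the caller */
};

/* Half-open ranges [s1, e1) and [s2, e2) overlap iff both tests hold. */
static bool sketch_ranges_overlap(uint64_t s1, uint64_t e1,
                                  uint64_t s2, uint64_t e2)
{
    return e1 > s2 && s1 < e2;
}

/* Return 0 if every segment passes the checks, -1 otherwise. */
static int sketch_validate_segments(const struct sketch_segment *seg,
                                    size_t nr)
{
    size_t i, j;

    for ( i = 0; i < nr; i++ )
    {
        uint64_t start = seg[i].dest_maddr;
        uint64_t end   = start + seg[i].dest_size;

        /* Both ends of the destination must be page aligned. */
        if ( (start % SKETCH_PAGE_SIZE) || (end % SKETCH_PAGE_SIZE) )
            return -1;

        /* The buffer must fit in its destination (equal is allowed). */
        if ( seg[i].buf_size > seg[i].dest_size )
            return -1;

        /* No overlap with any earlier segment. */
        for ( j = 0; j < i; j++ )
        {
            uint64_t pstart = seg[j].dest_maddr;
            uint64_t pend   = pstart + seg[j].dest_size;

            if ( sketch_ranges_overlap(start, end, pstart, pend) )
                return -1;
        }
    }

    return 0;
}
```

Because the ranges are half-open, two segments that merely touch (one
ends at the address where the next begins) are accepted.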
 xen/common/Makefile      |    1 +
 xen/common/kimage.c      |  887 ++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/kimage.h |   64 ++++
 3 files changed, 952 insertions(+), 0 deletions(-)
 create mode 100644 xen/common/kimage.c
 create mode 100644 xen/include/xen/kimage.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 1677342..4c04018 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -11,6 +11,7 @@ obj-y += irq.o
 obj-y += kernel.o
 obj-y += keyhandler.o
 obj-$(HAS_KEXEC) += kexec.o
+obj-$(HAS_KEXEC) += kimage.o
 obj-y += lib.o
 obj-y += memory.o
 obj-y += multicall.o
diff --git a/xen/common/kimage.c b/xen/common/kimage.c
new file mode 100644
index 0000000..c5f07c3
--- /dev/null
+++ b/xen/common/kimage.c
@@ -0,0 +1,887 @@
+/*
+ * Kexec Image
+ *
+ * Copyright (C) 2013 Citrix Systems R&D Ltd.
+ *
+ * Derived from kernel/kexec.c from Linux:
+ *
+ *   Copyright (C) 2002-2004 Eric Biederman  <ebiederm@xmission.com>
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
+ */
+
+#include <xen/config.h>
+#include <xen/types.h>
+#include <xen/init.h>
+#include <xen/kernel.h>
+#include <xen/errno.h>
+#include <xen/spinlock.h>
+#include <xen/guest_access.h>
+#include <xen/mm.h>
+#include <xen/kexec.h>
+#include <xen/kimage.h>
+
+#include <asm/page.h>
+
+/*
+ * When kexec transitions to the new kernel there is a one-to-one
+ * mapping between physical and virtual addresses.  On processors
+ * where you can disable the MMU this is trivial, and easy.  For
+ * others it is still a simple predictable page table to set up.
+ *
+ * In that environment kexec copies the new kernel to its final
+ * resting place.  This means I can only support memory whose
+ * physical address can fit in an unsigned long.  In particular
+ * addresses where (pfn << PAGE_SHIFT) > ULONG_MAX cannot be handled.
+ * If the assembly stub has more restrictive requirements
+ * KEXEC_SOURCE_MEMORY_LIMIT and KEXEC_DESTINATION_MEMORY_LIMIT can be
+ * defined more restrictively in <asm/kexec.h>.
+ *
+ * The code for the transition from the current kernel to the
+ * new kernel is placed in the control_code_buffer, whose size
+ * is given by KEXEC_CONTROL_PAGE_SIZE.  In the best case only a single
+ * page of memory is necessary, but some architectures require more.
+ * Because this memory must be identity mapped in the transition from
+ * virtual to physical addresses it must live in the range
+ * 0 - TASK_SIZE, as only the user space mappings are arbitrarily
+ * modifiable.
+ *
+ * The assembly stub in the control code buffer is passed a linked list
+ * of descriptor pages detailing the source pages of the new kernel,
+ * and the destination addresses of those source pages.  As this data
+ * structure is not used in the context of the current OS, it must
+ * be self-contained.
+ *
+ * The code has been made to work with highmem pages and will use a
+ * destination page in its final resting place (if it happens
+ * to allocate it).  The end product of this is that most of the
+ * physical address space, and most of RAM can be used.
+ *
+ * Future directions include:
+ *  - allocating a page table with the control code buffer identity
+ *    mapped, to simplify machine_kexec and make kexec_on_panic more
+ *    reliable.
+ */
+
+/*
+ * KIMAGE_NO_DEST is an impossible destination address..., for
+ * allocating pages whose destination address we do not care about.
+ */
+#define KIMAGE_NO_DEST (-1UL)
+
+static int kimage_is_destination_range(struct kexec_image *image,
+                                       unsigned long start, unsigned long end);
+static struct page_info *kimage_alloc_page(struct kexec_image *image,
+                                           unsigned long dest);
+
+static struct page_info *kimage_alloc_xen_page(void)
+{
+    void *p;
+
+    p = alloc_xenheap_page();
+    if ( p == NULL )
+        return NULL;
+    return virt_to_page(p);
+}
+
+static void kimage_free_xen_page(struct page_info *page)
+{
+    free_xenheap_page(page_to_virt(page));
+}
+
+static int do_kimage_alloc(struct kexec_image **rimage, unsigned long entry,
+                           unsigned long nr_segments,
+                           xen_kexec_segment_t *segments)
+{
+    struct kexec_image *image;
+    unsigned long i;
+    int result;
+
+    /* Allocate a controlling structure */
+    result = -ENOMEM;
+    image = xzalloc(typeof(*image));
+    if ( !image )
+        goto out;
+
+    image->head = 0;
+    image->entry = &image->head;
+    image->last_entry = &image->head;
+    image->control_page = ~0; /* By default this does not apply */
+    image->entry_maddr = entry;
+    image->type = KEXEC_TYPE_DEFAULT;
+    image->nr_segments = nr_segments;
+    image->segments = segments;
+
+    INIT_PAGE_LIST_HEAD(&image->control_pages);
+    INIT_PAGE_LIST_HEAD(&image->dest_pages);
+    INIT_PAGE_LIST_HEAD(&image->unuseable_pages);
+
+    /*
+     * Verify we have good destination addresses.  The caller is
+     * responsible for making certain we don't attempt to load
+     * the new image into invalid or reserved areas of RAM.  This
+     * just verifies it is an address we can use.
+     *
+     * Since the kernel does everything in page size chunks ensure
+     * the destination addresses are page aligned.  Too many
+     * special cases crop up when we don't do this.  The most
+     * insidious is getting overlapping destination addresses
+     * simply because addresses are changed to page size
+     * granularity.
+     */
+    result = -EADDRNOTAVAIL;
+    for ( i = 0; i < nr_segments; i++ )
+    {
+        unsigned long mstart, mend;
+
+        mstart = image->segments[i].dest_maddr;
+        mend   = mstart + image->segments[i].dest_size;
+        if ( (mstart & ~PAGE_MASK) || (mend & ~PAGE_MASK) )
+            goto out;
+        if ( mend >= KEXEC_DESTINATION_MEMORY_LIMIT )
+            goto out;
+    }
+
+    /* Verify our destination addresses do not overlap.
+     * If we allowed overlapping destination addresses
+     * through, very weird things can happen with no
+     * easy explanation as one segment stomps on another.
+     */
+    result = -EINVAL;
+    for ( i = 0; i < nr_segments; i++ )
+    {
+        unsigned long mstart, mend;
+        unsigned long j;
+
+        mstart = image->segments[i].dest_maddr;
+        mend   = mstart + image->segments[i].dest_size;
+        for (j = 0; j < i; j++ )
+        {
+            unsigned long pstart, pend;
+            pstart = image->segments[j].dest_maddr;
+            pend   = pstart + image->segments[j].dest_size;
+            /* Do the segments overlap ? */
+            if ( (mend > pstart) && (mstart < pend) )
+                goto out;
+        }
+    }
+
+    /* Ensure our buffer sizes are no larger than
+     * our memory sizes.  This should always be the case,
+     * and it is easier to check up front than to be surprised
+     * later on.
+     */
+    result = -EINVAL;
+    for ( i = 0; i < nr_segments; i++ )
+    {
+        if ( image->segments[i].buf_size > image->segments[i].dest_size )
+            goto out;
+    }
+
+    result = 0;
+out:
+    if ( result == 0 )
+        *rimage = image;
+    else
+        kimage_free(image);
+
+    return result;
+
+}
+
+static int kimage_normal_alloc(struct kexec_image **rimage, unsigned long entry,
+                               unsigned long nr_segments,
+                               xen_kexec_segment_t *segments)
+{
+    int result;
+    struct kexec_image *image;
+    void *code_page;
+
+    /* Allocate and initialize a controlling structure */
+    image = NULL;
+    result = do_kimage_alloc(&image, entry, nr_segments, segments);
+    if ( result )
+        goto out;
+
+    *rimage = image;
+
+    /*
+     * The control code page must still be accessible after the
+     * processor has switched to 32-bit mode.
+     */
+    code_page = alloc_xenheap_pages(0, MEMF_bits(32));
+    if ( code_page == NULL )
+    {
+        result = -ENOMEM;
+        gdprintk(XENLOG_WARNING, "Could not allocate control_code_buffer\n");
+        goto out;
+    }
+    image->control_code_page = virt_to_page(code_page);
+
+    result = 0;
+out:
+    if ( result == 0 )
+        *rimage = image;
+    else
+        xfree(image);
+
+    return result;
+}
+
+static int kimage_crash_alloc(struct kexec_image **rimage, unsigned long entry,
+                              unsigned long nr_segments,
+                              xen_kexec_segment_t *segments)
+{
+    int result;
+    struct kexec_image *image;
+    unsigned long i;
+
+    image = NULL;
+    /* Verify we have a valid entry point */
+    if ( (entry < kexec_crash_area.start)
+         || (entry > kexec_crash_area.start + kexec_crash_area.size))
+    {
+        result = -EADDRNOTAVAIL;
+        goto out;
+    }
+
+    /* Allocate and initialize a controlling structure */
+    result = do_kimage_alloc(&image, entry, nr_segments, segments);
+    if ( result )
+        goto out;
+
+    /* Enable the special crash kernel control page
+     * allocation policy.
+     */
+    image->control_page = kexec_crash_area.start;
+    image->type = KEXEC_TYPE_CRASH;
+
+    /*
+     * Verify we have good destination addresses.  Normally
+     * the caller is responsible for making certain we don't
+     * attempt to load the new image into invalid or reserved
+     * areas of RAM.  But crash kernels are preloaded into a
+     * reserved area of ram.  We must ensure the addresses
+     * are in the reserved area otherwise preloading the
+     * kernel could corrupt things.
+     */
+    result = -EADDRNOTAVAIL;
+    for ( i = 0; i < nr_segments; i++ )
+    {
+        unsigned long mstart, mend;
+
+        mstart = image->segments[i].dest_maddr;
+        mend = mstart + image->segments[i].dest_size - 1;
+        /* Ensure we are within the crash kernel limits */
+        if ( (mstart < kexec_crash_area.start)
+             || (mend >= kexec_crash_area.start + kexec_crash_area.size) )
+            goto out;
+    }
+
+    /*
+     * Find a location for the control code buffer, and add
+     * the vector of segments so that its pages will also be
+     * counted as destination pages.
+     */
+    result = -ENOMEM;
+    image->control_code_page = kimage_alloc_control_page(image);
+    if ( !image->control_code_page )
+    {
+        gdprintk(XENLOG_WARNING, "Could not allocate control_code_buffer\n");
+        goto out;
+    }
+
+    result = 0;
+out:
+    if ( result == 0 )
+        *rimage = image;
+    else
+        xfree(image);
+
+    return result;
+}
+
+static int kimage_is_destination_range(struct kexec_image *image,
+                                       unsigned long start,
+                                       unsigned long end)
+{
+    unsigned long i;
+
+    for ( i = 0; i < image->nr_segments; i++ )
+    {
+        unsigned long mstart, mend;
+
+        mstart = image->segments[i].dest_maddr;
+        mend = mstart + image->segments[i].dest_size;
+        if ( (end > mstart) && (start < mend) )
+            return 1;
+    }
+
+    return 0;
+}
+
+static void kimage_free_page_list(struct page_list_head *list)
+{
+    struct page_info *page, *next;
+
+    page_list_for_each_safe(page, next, list)
+    {
+        printk("delete page %p\n", page);
+        page_list_del(page, list);
+        kimage_free_xen_page(page);
+    }
+}
+
+static struct page_info *kimage_alloc_normal_control_page(struct kexec_image *image)
+{
+    /* Control pages are special, they are the intermediaries
+     * that are needed while we copy the rest of the pages
+     * to their final resting place.  As such they must
+     * not conflict with either the destination addresses
+     * or memory the kernel is already using.
+     *
+     * The only case where we really need more than one of
+     * these are for architectures where we cannot disable
+     * the MMU and must instead generate an identity mapped
+     * page table for all of the memory.
+     *
+     * At worst this runs in O(N) of the image size.
+     */
+    struct page_list_head extra_pages;
+    struct page_info *page = NULL;
+
+    INIT_PAGE_LIST_HEAD(&extra_pages);
+
+    /* Loop while I can allocate a page and the page allocated
+     * is a destination page.
+     */
+    do {
+        unsigned long mfn, emfn, addr, eaddr;
+
+        page = kimage_alloc_xen_page();
+        if ( !page )
+            break;
+        mfn   = page_to_mfn(page);
+        emfn  = mfn + 1;
+        addr  = mfn << PAGE_SHIFT;
+        eaddr = emfn << PAGE_SHIFT;
+        if ( (emfn >= (KEXEC_CONTROL_MEMORY_LIMIT >> PAGE_SHIFT)) ||
+             kimage_is_destination_range(image, addr, eaddr) )
+        {
+            printk("add page %p\n", page);
+            page_list_add(page, &extra_pages);
+            page = NULL;
+        }
+    } while ( !page );
+
+    if ( page )
+    {
+        /* Remember the allocated page... */
+        page_list_add(page, &image->control_pages);
+
+        /* Because the page is already in its destination
+         * location we will never allocate another page at
+         * that address.  Therefore kimage_alloc_page
+         * will not return it (again) and we don't need
+         * to give it an entry in image->segments[].
+         */
+    }
+    /* Deal with the destination pages I have inadvertently allocated.
+     *
+     * Ideally I would convert multi-page allocations into single
+     * page allocations, and add everything to image->dest_pages.
+     *
+     * For now it is simpler to just free the pages.
+     */
+    kimage_free_page_list(&extra_pages);
+
+    return page;
+}
+
+static struct page_info *kimage_alloc_crash_control_page(struct kexec_image *image)
+{
+    /* Control pages are special, they are the intermediaries
+     * that are needed while we copy the rest of the pages
+     * to their final resting place.  As such they must
+     * not conflict with either the destination addresses
+     * or memory the kernel is already using.
+     *
+     * Control pages are also the only pages we must allocate
+     * when loading a crash kernel.  All of the other pages
+     * are specified by the segments and we just memcpy
+     * into them directly.
+     *
+     * The only case where we really need more than one of
+     * these are for architectures where we cannot disable
+     * the MMU and must instead generate an identity mapped
+     * page table for all of the memory.
+     *
+     * Given the low demand this implements a very simple
+     * allocator that finds the first hole of the appropriate
+     * size in the reserved memory region, and allocates all
+     * of the memory up to and including the hole.
+     */
+    unsigned long hole_start, hole_end, size;
+    struct page_info *page;
+
+    page = NULL;
+    size = PAGE_SIZE;
+    hole_start = (image->control_page + (size - 1)) & ~(size - 1);
+    hole_end   = hole_start + size - 1;
+    while ( hole_end <= kexec_crash_area.start + kexec_crash_area.size )
+    {
+        unsigned long i;
+
+        if ( hole_end > KEXEC_CRASH_CONTROL_MEMORY_LIMIT )
+            break;
+        if ( hole_end > kexec_crash_area.start + kexec_crash_area.size )
+            break;
+        /* See if I overlap any of the segments */
+        for ( i = 0; i < image->nr_segments; i++ )
+        {
+            unsigned long mstart, mend;
+
+            mstart = image->segments[i].dest_maddr;
+            mend   = mstart + image->segments[i].dest_size - 1;
+            if ( (hole_end >= mstart) && (hole_start <= mend) )
+            {
+                /* Advance the hole to the end of the segment */
+                hole_start = (mend + (size - 1)) & ~(size - 1);
+                hole_end   = hole_start + size - 1;
+                break;
+            }
+        }
+        /* If I don't overlap any segments I have found my hole! */
+        if ( i == image->nr_segments )
+        {
+            page = mfn_to_page(hole_start >> PAGE_SHIFT);
+            break;
+        }
+    }
+    if ( page )
+        image->control_page = hole_end;
+
+    return page;
+}
+
+
+struct page_info *kimage_alloc_control_page(struct kexec_image *image)
+{
+    struct page_info *pages = NULL;
+
+    switch ( image->type )
+    {
+    case KEXEC_TYPE_DEFAULT:
+        pages = kimage_alloc_normal_control_page(image);
+        break;
+    case KEXEC_TYPE_CRASH:
+        pages = kimage_alloc_crash_control_page(image);
+        break;
+    }
+
+    if ( pages )
+        clear_page(page_to_virt(pages));
+
+    return pages;
+}
+
+static int kimage_add_entry(struct kexec_image *image, kimage_entry_t entry)
+{
+    if ( *image->entry != 0 )
+        image->entry++;
+
+    if ( image->entry == image->last_entry )
+    {
+        kimage_entry_t *ind_page;
+        struct page_info *page;
+
+        page = kimage_alloc_page(image, KIMAGE_NO_DEST);
+        if ( !page )
+            return -ENOMEM;
+
+        ind_page = page_to_virt(page);
+        *image->entry = page_to_maddr(page) | IND_INDIRECTION;
+        image->entry = ind_page;
+        image->last_entry = ind_page +
+            ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1);
+    }
+    *image->entry = entry;
+    image->entry++;
+    *image->entry = 0;
+
+    return 0;
+}
+
+static int kimage_set_destination(struct kexec_image *image,
+                                  unsigned long destination)
+{
+    int result;
+
+    destination &= PAGE_MASK;
+    result = kimage_add_entry(image, destination | IND_DESTINATION);
+    if ( result == 0 )
+        image->destination = destination;
+
+    return result;
+}
+
+
+static int kimage_add_page(struct kexec_image *image, unsigned long page)
+{
+    int result;
+
+    page &= PAGE_MASK;
+    result = kimage_add_entry(image, page | IND_SOURCE);
+    if ( result == 0 )
+        image->destination += PAGE_SIZE;
+
+    return result;
+}
+
+
+static void kimage_free_extra_pages(struct kexec_image *image)
+{
+    kimage_free_page_list(&image->dest_pages);
+    kimage_free_page_list(&image->unuseable_pages);
+
+}
+
+static void kimage_terminate(struct kexec_image *image)
+{
+    if ( *image->entry != 0 )
+        image->entry++;
+
+    *image->entry = IND_DONE;
+}
+
+#define for_each_kimage_entry(image, ptr, entry)                      \
+    for ( ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE ); \
+          ptr = (entry & IND_INDIRECTION) ?                           \
+              maddr_to_virt((entry & PAGE_MASK)) : ptr + 1)
+
+static void kimage_free_entry(kimage_entry_t entry)
+{
+    struct page_info *page;
+
+    page = mfn_to_page(entry >> PAGE_SHIFT);
+    kimage_free_xen_page(page);
+}
+
+void kimage_free(struct kexec_image *image)
+{
+    kimage_entry_t *ptr, entry;
+    kimage_entry_t ind = 0;
+
+    if ( !image )
+        return;
+
+    kimage_free_extra_pages(image);
+    for_each_kimage_entry(image, ptr, entry)
+    {
+        if ( entry & IND_INDIRECTION )
+        {
+            /* Free the previous indirection page */
+            if ( ind & IND_INDIRECTION )
+                kimage_free_entry(ind);
+            /* Save this indirection page until we are
+             * done with it.
+             */
+            ind = entry;
+        }
+        else if ( entry & IND_SOURCE )
+            kimage_free_entry(entry);
+    }
+    /* Free the final indirection page */
+    if ( ind & IND_INDIRECTION )
+        kimage_free_entry(ind);
+
+    /* Free the kexec control pages... */
+    kimage_free_page_list(&image->control_pages);
+    xfree(image->segments);
+    xfree(image);
+}
+
+static kimage_entry_t *kimage_dst_used(struct kexec_image *image,
+                                       unsigned long page)
+{
+    kimage_entry_t *ptr, entry;
+    unsigned long destination = 0;
+
+    for_each_kimage_entry(image, ptr, entry)
+    {
+        if ( entry & IND_DESTINATION )
+            destination = entry & PAGE_MASK;
+        else if ( entry & IND_SOURCE )
+        {
+            if ( page == destination )
+                return ptr;
+            destination += PAGE_SIZE;
+        }
+    }
+
+    return NULL;
+}
+
+static struct page_info *kimage_alloc_page(struct kexec_image *image,
+                                      unsigned long destination)
+{
+    /*
+     * Here we implement safeguards to ensure that a source page
+     * is not copied to its destination page before the data on
+     * the destination page is no longer useful.
+     *
+     * To do this we maintain the invariant that a source page is
+     * either its own destination page, or it is not a
+     * destination page at all.
+     *
+     * That is slightly stronger than required, but the proof
+     * that no problems will occur is trivial, and the
+     * implementation is simple to verify.
+     *
+     * When allocating all pages normally this algorithm will run
+     * in O(N) time, but in the worst case it will run in O(N^2)
+     * time.   If the runtime is a problem the data structures can
+     * be fixed.
+     */
+    struct page_info *page;
+    unsigned long addr;
+
+    /*
+     * Walk through the list of destination pages, and see if I
+     * have a match.
+     */
+    page_list_for_each(page, &image->dest_pages)
+    {
+        addr = page_to_mfn(page) << PAGE_SHIFT;
+        if ( addr == destination )
+        {
+            page_list_del(page, &image->dest_pages);
+            return page;
+        }
+    }
+    page = NULL;
+    for (;;)
+    {
+        kimage_entry_t *old;
+
+        /* Allocate a page, if we run out of memory give up */
+        page = kimage_alloc_xen_page();
+        if ( !page )
+            return NULL;
+        /* If the page cannot be used file it away */
+        if ( page_to_mfn(page) >
+             (KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT) )
+        {
+            page_list_add(page, &image->unuseable_pages);
+            continue;
+        }
+        addr = page_to_mfn(page) << PAGE_SHIFT;
+
+        /* If it is the destination page we want use it */
+        if ( addr == destination )
+            break;
+
+        /* If the page is not a destination page use it */
+        if ( !kimage_is_destination_range(image, addr,
+                                          addr + PAGE_SIZE) )
+            break;
+
+        /*
+         * I know that the page is someone's destination page.
+         * See if there is already a source page for this
+         * destination page.  And if so swap the source pages.
+         */
+        old = kimage_dst_used(image, addr);
+        if ( old )
+        {
+            /* If so move it */
+            unsigned long old_addr;
+            struct page_info *old_page;
+
+            old_addr = *old & PAGE_MASK;
+            old_page = mfn_to_page(old_addr >> PAGE_SHIFT);
+            copy_page(page_to_virt(page), page_to_virt(old_page));
+            *old = addr | (*old & ~PAGE_MASK);
+
+            addr = old_addr;
+            page = old_page;
+            break;
+        }
+        else
+        {
+            /* Place the page on the destination list; I
+             * will use it later.
+             */
+            page_list_add(page, &image->dest_pages);
+        }
+    }
+
+    return page;
+}
+
+static int kimage_load_normal_segment(struct kexec_image *image,
+                                      xen_kexec_segment_t *segment)
+{
+    unsigned long to_copy;
+    unsigned long src_offset;
+    unsigned long dest;
+    int ret;
+
+    to_copy = segment->buf_size;
+    src_offset = 0;
+    dest = segment->dest_maddr;
+
+    ret = kimage_set_destination(image, dest);
+    if ( ret < 0 )
+        return ret;
+
+    while ( to_copy )
+    {
+        unsigned long dest_mfn;
+        size_t dest_off;
+        struct page_info *page;
+        void *dest_va;
+        size_t size;
+
+        dest_mfn = dest >> PAGE_SHIFT;
+        dest_off = dest & ~PAGE_MASK;
+
+        size = min(PAGE_SIZE - dest_off, to_copy);
+
+        page = kimage_alloc_page(image, dest);
+        if ( !page )
+            return -ENOMEM;
+        ret = kimage_add_page(image, page_to_mfn(page) << PAGE_SHIFT);
+        if ( ret < 0 )
+            return ret;
+
+        dest_va = page_to_virt(page);
+        clear_page(dest_va);
+        ret = copy_from_guest_offset(dest_va + dest_off, segment->buf, src_offset, size);
+        if ( ret )
+            return -EFAULT;
+
+        to_copy -= size;
+        src_offset += size;
+        dest += size;
+    }
+
+    return 0;
+}
+
+static int kimage_load_crash_segment(struct kexec_image *image,
+                                     xen_kexec_segment_t *segment)
+{
+    /* For crash dump kernels we simply copy the data from
+     * user space to its destination.
+     * We do things a page at a time for the sake of vmap.
+     */
+    unsigned long dest;
+    unsigned long sbytes, dbytes;
+    int ret = 0;
+    unsigned long src_offset = 0;
+
+    sbytes = segment->buf_size;
+    dbytes = segment->dest_size;
+    dest = segment->dest_maddr;
+
+    while ( dbytes )
+    {
+        unsigned long dest_mfn;
+        size_t dest_off;
+        void *dest_va;
+        size_t schunk, dchunk;
+
+        dest_mfn = dest >> PAGE_SHIFT;
+        dest_off = dest & ~PAGE_MASK;
+
+        dchunk = min(PAGE_SIZE - dest_off, dbytes);
+        schunk = min(dchunk, sbytes);
+
+        dest_va = vmap(&dest_mfn, 1);
+        if ( dest_va == NULL )
+            return -EINVAL;
+
+        ret = copy_from_guest_offset(dest_va + dest_off, segment->buf, src_offset, schunk);
+        memset(dest_va + dest_off + schunk, 0, dchunk - schunk);
+
+        vunmap(dest_va);
+        if ( ret )
+            return -EFAULT;
+
+        dbytes -= dchunk;
+        sbytes -= schunk;
+        dest += dchunk;
+        src_offset += schunk;
+    }
+
+    return 0;
+}
+
+static int kimage_load_segment(struct kexec_image *image, xen_kexec_segment_t *segment)
+{
+    int result = -ENOMEM;
+
+    switch ( image->type )
+    {
+    case KEXEC_TYPE_DEFAULT:
+        result = kimage_load_normal_segment(image, segment);
+        break;
+    case KEXEC_TYPE_CRASH:
+        result = kimage_load_crash_segment(image, segment);
+        break;
+    }
+
+    return result;
+}
+
+int kimage_alloc(struct kexec_image **rimage, uint8_t type, uint16_t arch,
+                 uint64_t entry_maddr,
+                 uint32_t nr_segments, xen_kexec_segment_t *segment)
+{
+    int result;
+
+    switch ( type )
+    {
+    case KEXEC_TYPE_DEFAULT:
+        result = kimage_normal_alloc(rimage, entry_maddr, nr_segments, segment);
+        break;
+    case KEXEC_TYPE_CRASH:
+        result = kimage_crash_alloc(rimage, entry_maddr, nr_segments, segment);
+        break;
+    default:
+        result = -EINVAL;
+        break;
+    }
+    if ( result < 0 )
+        return result;
+
+    (*rimage)->arch = arch;
+
+    return result;
+}
+
+int kimage_load_segments(struct kexec_image *image)
+{
+    int s;
+    int result;
+
+    for ( s = 0; s < image->nr_segments; s++ ) {
+        result = kimage_load_segment(image, &image->segments[s]);
+        if ( result < 0 )
+            return result;
+    }
+    kimage_terminate(image);
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/kimage.h b/xen/include/xen/kimage.h
new file mode 100644
index 0000000..dc71b87
--- /dev/null
+++ b/xen/include/xen/kimage.h
@@ -0,0 +1,64 @@
+#ifndef __XEN_KIMAGE_H__
+#define __XEN_KIMAGE_H__
+
+#include <xen/list.h>
+#include <xen/mm.h>
+#include <public/kexec.h>
+
+#define KEXEC_DESTINATION_MEMORY_LIMIT (~0ul)
+#define KEXEC_CONTROL_MEMORY_LIMIT (~0ul)
+#define KEXEC_CRASH_CONTROL_MEMORY_LIMIT (~0ul)
+#define KEXEC_SOURCE_MEMORY_LIMIT (~0ul)
+
+#define KEXEC_CONTROL_PAGE_SIZE PAGE_SIZE
+
+#define KEXEC_SEGMENT_MAX 16
+
+typedef unsigned long kimage_entry_t;
+#define IND_DESTINATION  0x1
+#define IND_INDIRECTION  0x2
+#define IND_DONE         0x4
+#define IND_SOURCE       0x8
+
+struct kexec_image {
+    uint8_t type;
+    uint16_t arch;
+    uint64_t entry_maddr;
+    uint32_t nr_segments;
+    xen_kexec_segment_t *segments;
+
+    kimage_entry_t head;
+    kimage_entry_t *entry;
+    kimage_entry_t *last_entry;
+
+    unsigned long destination;
+
+    struct page_info *control_code_page;
+    struct page_info *aux_page;
+
+    struct page_list_head control_pages;
+    struct page_list_head dest_pages;
+    struct page_list_head unuseable_pages;
+
+    /* Address of next control page to allocate for crash kernels. */
+    unsigned long control_page;
+};
+
+int kimage_alloc(struct kexec_image **rimage, uint8_t type, uint16_t arch,
+                 uint64_t entry_maddr,
+                 uint32_t nr_segments, xen_kexec_segment_t *segment);
+void kimage_free(struct kexec_image *image);
+int kimage_load_segments(struct kexec_image *image);
+struct page_info *kimage_alloc_control_page(struct kexec_image *image);
+
+#endif /* __XEN_KIMAGE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.2.5



* [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
From: David Vrabel @ 2013-02-21 17:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Daniel Kiper, kexec, David Vrabel

From: David Vrabel <david.vrabel@citrix.com>

In the existing kexec hypercall, the load and unload ops depend on
internals of the Linux kernel (the page list and code page provided by
the kernel).  The code page is used to transition between Xen context
and the image so using kernel code doesn't make sense and will not
work for PVH guests.

Add replacement KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload ops
that no longer require a code page to be provided by the guest -- Xen
now provides the code for calling the image directly.

The new load op looks similar to the Linux kexec_load system call and
allows the guest to provide the image data to be loaded.  The guest
specifies the architecture of the image which may be a 32-bit subarch
of the hypervisor's architecture (i.e., an EM_386 image on an
EM_X86_64 hypervisor).

The toolstack can now load images without kernel involvement.  This is
required for supporting kexec when using a dom0 with an upstream
kernel.

Crash images are copied directly into the crash region on load.
Default images are copied into Xen heap pages and a list of source and
destination machine addresses is created.  This list is used in
kexec_reloc() to relocate the image to its destination.
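
As a rough illustration of how such a list is consumed, here is a
userspace sketch of the walk.  It is not the code in kexec_reloc.S:
sketch_relocate() and sketch_demo() are hypothetical names, the SK_*
constants are assumptions, and ordinary page-aligned pointers stand in
for machine addresses.  Only the IND_* flag values are taken from
kimage.h in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_PAGE_SIZE 4096  /* assumed page size for this sketch */
#define SK_PAGE_MASK (~(uintptr_t)(SK_PAGE_SIZE - 1))

/* Flag values as defined in xen/include/xen/kimage.h. */
#define IND_DESTINATION 0x1
#define IND_INDIRECTION 0x2
#define IND_DONE        0x4
#define IND_SOURCE      0x8

typedef uintptr_t kimage_entry_t;

/*
 * Walk the entry list starting from 'head' and copy every source page
 * to the current destination.  Returns the number of pages copied.
 */
size_t sketch_relocate(kimage_entry_t head)
{
    const kimage_entry_t *ptr = NULL;
    unsigned char *dest = NULL;
    kimage_entry_t entry = head;
    size_t copied = 0;

    while ( entry && !(entry & IND_DONE) )
    {
        if ( entry & IND_INDIRECTION )
            /* Continue reading entries from another page. */
            ptr = (const kimage_entry_t *)(entry & SK_PAGE_MASK);
        else if ( entry & IND_DESTINATION )
            /* Set where the following source pages will land. */
            dest = (unsigned char *)(entry & SK_PAGE_MASK);
        else if ( entry & IND_SOURCE )
        {
            assert(dest != NULL);
            memcpy(dest, (const void *)(entry & SK_PAGE_MASK), SK_PAGE_SIZE);
            dest += SK_PAGE_SIZE;
            copied++;
        }
        assert(ptr != NULL); /* head must be an indirection entry */
        entry = *ptr++;
    }

    return copied;
}

/* Build a two-page list by hand and relocate it; returns 0 on success. */
int sketch_demo(void)
{
    static _Alignas(SK_PAGE_SIZE) kimage_entry_t
        ind[SK_PAGE_SIZE / sizeof(kimage_entry_t)];
    static _Alignas(SK_PAGE_SIZE) unsigned char src[2][SK_PAGE_SIZE];
    static _Alignas(SK_PAGE_SIZE) unsigned char dst[2][SK_PAGE_SIZE];

    memset(src[0], 0xAA, SK_PAGE_SIZE);
    memset(src[1], 0xBB, SK_PAGE_SIZE);
    memset(dst, 0, sizeof(dst));

    ind[0] = (kimage_entry_t)dst[0] | IND_DESTINATION;
    ind[1] = (kimage_entry_t)src[0] | IND_SOURCE;
    ind[2] = (kimage_entry_t)src[1] | IND_SOURCE;
    ind[3] = IND_DONE;

    if ( sketch_relocate((kimage_entry_t)ind | IND_INDIRECTION) != 2 )
        return -1;
    if ( dst[0][0] != 0xAA || dst[1][SK_PAGE_SIZE - 1] != 0xBB )
        return -1;

    return 0;
}
```

Entries carry their type in the low bits of a page-aligned address,
which is why each address is masked before use.  The sketch ignores
everything the real relocation code has to handle around the copy
(page tables, mode switches, overlapping source and destination
pages).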

The old load and unload sub-ops are still available (as
KEXEC_CMD_load_v1 and KEXEC_CMD_unload_v1) and are implemented on top
of the new infrastructure.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/arch/x86/machine_kexec.c        |  261 ++++++++++++++++++-------
 xen/arch/x86/x86_64/Makefile        |    2 +-
 xen/arch/x86/x86_64/compat_kexec.S  |  187 -----------------
 xen/arch/x86/x86_64/kexec_reloc.S   |  229 +++++++++++++++++++++
 xen/common/kexec.c                  |  377 +++++++++++++++++++++++++++++------
 xen/include/asm-x86/fixmap.h        |    3 -
 xen/include/asm-x86/machine_kexec.h |   14 ++
 xen/include/xen/kexec.h             |   14 +-
 8 files changed, 755 insertions(+), 332 deletions(-)
 delete mode 100644 xen/arch/x86/x86_64/compat_kexec.S
 create mode 100644 xen/arch/x86/x86_64/kexec_reloc.S
 create mode 100644 xen/include/asm-x86/machine_kexec.h

diff --git a/xen/arch/x86/machine_kexec.c b/xen/arch/x86/machine_kexec.c
index 8191ef1..0ec8c56 100644
--- a/xen/arch/x86/machine_kexec.c
+++ b/xen/arch/x86/machine_kexec.c
@@ -1,9 +1,18 @@
 /******************************************************************************
  * machine_kexec.c
  *
+ * Copyright (C) 2013 Citrix Systems R&D Ltd.
+ *
+ * Portions derived from Linux's arch/x86/kernel/machine_kexec_64.c.
+ *
+ *   Copyright (C) 2002-2005 Eric Biederman  <ebiederm@xmission.com>
+ *
  * Xen port written by:
  * - Simon 'Horms' Horman <horms@verge.net.au>
  * - Magnus Damm <magnus@valinux.co.jp>
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
  */
 
 #include <xen/types.h>
@@ -11,63 +20,195 @@
 #include <xen/guest_access.h>
 #include <asm/fixmap.h>
 #include <asm/hpet.h>
+#include <asm/page.h>
+#include <asm/machine_kexec.h>
+
+static void init_level2_page(l2_pgentry_t *l2, unsigned long addr)
+{
+    unsigned long end_addr;
+
+    addr &= PAGE_MASK;
+    end_addr = addr + L2_PAGETABLE_ENTRIES * (1ul << L2_PAGETABLE_SHIFT);
+
+    while ( addr < end_addr )
+    {
+        l2e_write(l2++, l2e_from_paddr(addr, __PAGE_HYPERVISOR | _PAGE_PSE));
 
-typedef void (*relocate_new_kernel_t)(
-                unsigned long indirection_page,
-                unsigned long *page_list,
-                unsigned long start_address,
-                unsigned int preserve_context);
+        addr += 1ul << L2_PAGETABLE_SHIFT;
+    }
+}
 
-int machine_kexec_load(int type, int slot, xen_kexec_image_t *image)
+static int init_level3_page(struct kexec_image *image, l3_pgentry_t *l3,
+                            unsigned long addr, unsigned long last_addr)
 {
-    unsigned long prev_ma = 0;
-    int fix_base = FIX_KEXEC_BASE_0 + (slot * (KEXEC_XEN_NO_PAGES >> 1));
-    int k;
+    unsigned long end_addr;
 
-    /* setup fixmap to point to our pages and record the virtual address
-     * in every odd index in page_list[].
-     */
+    addr &= PAGE_MASK;
+    end_addr = addr + L3_PAGETABLE_ENTRIES * (1ul << L3_PAGETABLE_SHIFT);
+
+    while ( (addr < last_addr) && (addr < end_addr) )
+    {
+        struct page_info *l2_page;
+        l2_pgentry_t *l2;
+
+        l2_page = kimage_alloc_control_page(image);
+        if ( !l2_page )
+            return -ENOMEM;
+        l2 = page_to_virt(l2_page);
+
+        init_level2_page(l2, addr);
+        l3e_write(l3++, l3e_from_page(l2_page, __PAGE_HYPERVISOR));
+
+        addr += 1ul << L3_PAGETABLE_SHIFT;
+    }
+
+    return 0;
+}
+
+/*
+ * Build a complete page table to identity map [addr, last_addr).
+ *
+ * Control pages are used so they do not overlap with the image source
+ * or destination.
+ */
+static int init_level4_page(struct kexec_image *image, l4_pgentry_t *l4,
+                            unsigned long addr, unsigned long last_addr)
+{
+    unsigned long end_addr;
+    int result;
+
+    addr &= PAGE_MASK;
+    end_addr = addr + L4_PAGETABLE_ENTRIES * (1ul << L4_PAGETABLE_SHIFT);
+
+    while ( (addr < last_addr) && (addr < end_addr) )
+    {
+        struct page_info *l3_page;
+        l3_pgentry_t *l3;
+
+        l3_page = kimage_alloc_control_page(image);
+        if ( !l3_page )
+            return -ENOMEM;
+        l3 = page_to_virt(l3_page);
+
+        result = init_level3_page(image, l3, addr, last_addr);
+        if ( result )
+            return result;
+        l4e_write(l4++, l4e_from_page(l3_page, __PAGE_HYPERVISOR));
+
+        addr += 1ul << L4_PAGETABLE_SHIFT;
+    }
+
+    return 0;
+}
 
-    for ( k = 0; k < KEXEC_XEN_NO_PAGES; k++ )
+/*
+ * Add a mapping for the control code page to the same virtual address
+ * as kexec_reloc.  This allows us to keep running after these page
+ * tables are loaded in kexec_reloc.
+ *
+ * We don't really need to allocate control pages here as these
+ * entries won't be used while the kexec image is being copied, but it
+ * makes clean-up easier.
+ */
+static int init_transition_pgtable(struct kexec_image *image, l4_pgentry_t *l4)
+{
+    struct page_info *l3_page;
+    struct page_info *l2_page;
+    struct page_info *l1_page;
+    unsigned long vaddr, paddr;
+    l3_pgentry_t *l3;
+    l2_pgentry_t *l2;
+    l1_pgentry_t *l1;
+
+    vaddr = (unsigned long)kexec_reloc;
+    paddr = page_to_maddr(image->control_code_page);
+
+    l4 += l4_table_offset(vaddr);
+    if ( !(l4e_get_flags(*l4) & _PAGE_PRESENT) )
+    {
+        l3_page = kimage_alloc_control_page(image);
+        if ( !l3_page )
+            return -ENOMEM;
+        l4e_write(l4, l4e_from_page(l3_page, __PAGE_HYPERVISOR));
+    }
+
+    l3 = l4e_to_l3e(*l4) + l3_table_offset(vaddr);
+    if ( !(l3e_get_flags(*l3) & _PAGE_PRESENT) )
+    {
+        l2_page = kimage_alloc_control_page(image);
+        if ( !l2_page )
+            return -ENOMEM;
+        l3e_write(l3, l3e_from_page(l2_page, __PAGE_HYPERVISOR));
+    }
+
+    l2 = l3e_to_l2e(*l3) + l2_table_offset(vaddr);
+    if ( !(l2e_get_flags(*l2) & _PAGE_PRESENT) )
+    {
+        l1_page = kimage_alloc_control_page(image);
+        if ( !l1_page )
+            return -ENOMEM;
+        l2e_write(l2, l2e_from_page(l1_page, __PAGE_HYPERVISOR));
+    }
+
+    l1 = l2e_to_l1e(*l2) + l1_table_offset(vaddr);
+    l1e_write(l1, l1e_from_pfn(paddr >> PAGE_SHIFT, __PAGE_HYPERVISOR));
+    return 0;
+}
+
+
+static int build_reloc_page_table(struct kexec_image *image)
+{
+    struct page_info *l4_page;
+    l4_pgentry_t *l4;
+    int result;
+
+    l4_page = kimage_alloc_control_page(image);
+    if ( !l4_page )
+        return -ENOMEM;
+
+    l4 = page_to_virt(l4_page);
+    result = init_level4_page(image, l4, 0, max_page << PAGE_SHIFT);
+    if ( result )
+        return result;
+
+    result = init_transition_pgtable(image, l4);
+    if ( result )
+        return result;
+
+    image->aux_page = l4_page;
+    return 0;
+}
+
+int machine_kexec_load(struct kexec_image *image)
+{
+    void *code_page;
+    int ret;
+
+    switch ( image->arch )
     {
-        if ( (k & 1) == 0 )
-        {
-            /* Even pages: machine address. */
-            prev_ma = image->page_list[k];
-        }
-        else
-        {
-            /* Odd pages: va for previous ma. */
-            if ( is_pv_32on64_domain(dom0) )
-            {
-                /*
-                 * The compatability bounce code sets up a page table
-                 * with a 1-1 mapping of the first 1G of memory so
-                 * VA==PA here.
-                 *
-                 * This Linux purgatory code still sets up separate
-                 * high and low mappings on the control page (entries
-                 * 0 and 1) but it is harmless if they are equal since
-                 * that PT is not live at the time.
-                 */
-                image->page_list[k] = prev_ma;
-            }
-            else
-            {
-                set_fixmap(fix_base + (k >> 1), prev_ma);
-                image->page_list[k] = fix_to_virt(fix_base + (k >> 1));
-            }
-        }
+    case EM_386:
+    case EM_X86_64:
+        break;
+    default:
+        return -EINVAL;
     }
 
+    code_page = page_to_virt(image->control_code_page);
+    memcpy(code_page, kexec_reloc, PAGE_SIZE);
+
+    ret = build_reloc_page_table(image);
+    if ( ret < 0 )
+        return ret;
+
     return 0;
 }
 
-void machine_kexec_unload(int type, int slot, xen_kexec_image_t *image)
+void machine_kexec_unload(struct kexec_image *image)
 {
+    /* no-op. kimage_free() frees all control pages. */
 }
 
-void machine_reboot_kexec(xen_kexec_image_t *image)
+void machine_reboot_kexec(struct kexec_image *image)
 {
     BUG_ON(smp_processor_id() != 0);
     smp_send_stop();
@@ -75,13 +216,10 @@ void machine_reboot_kexec(xen_kexec_image_t *image)
     BUG();
 }
 
-void machine_kexec(xen_kexec_image_t *image)
+void machine_kexec(struct kexec_image *image)
 {
-    struct desc_ptr gdt_desc = {
-        .base = (unsigned long)(boot_cpu_gdt_table - FIRST_RESERVED_GDT_ENTRY),
-        .limit = LAST_RESERVED_GDT_BYTE
-    };
     int i;
+    unsigned long reloc_flags = 0;
 
     /* We are about to permenantly jump out of the Xen context into the kexec
      * purgatory code.  We really dont want to be still servicing interupts.
@@ -109,29 +247,12 @@ void machine_kexec(xen_kexec_image_t *image)
      * not like running with NMIs disabled. */
     enable_nmis();
 
-    /*
-     * compat_machine_kexec() returns to idle pagetables, which requires us
-     * to be running on a static GDT mapping (idle pagetables have no GDT
-     * mappings in their per-domain mapping area).
-     */
-    asm volatile ( "lgdt %0" : : "m" (gdt_desc) );
+    if ( image->arch == EM_386 )
+        reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
 
-    if ( is_pv_32on64_domain(dom0) )
-    {
-        compat_machine_kexec(image->page_list[1],
-                             image->indirection_page,
-                             image->page_list,
-                             image->start_address);
-    }
-    else
-    {
-        relocate_new_kernel_t rnk;
-
-        rnk = (relocate_new_kernel_t) image->page_list[1];
-        (*rnk)(image->indirection_page, image->page_list,
-               image->start_address,
-               0 /* preserve_context */);
-    }
+    kexec_reloc(page_to_maddr(image->control_code_page),
+                page_to_maddr(image->aux_page),
+                image->head, image->entry_maddr, reloc_flags);
 }
 
 int machine_kexec_get(xen_kexec_range_t *range)
diff --git a/xen/arch/x86/x86_64/Makefile b/xen/arch/x86/x86_64/Makefile
index d56e12d..7f8fb3d 100644
--- a/xen/arch/x86/x86_64/Makefile
+++ b/xen/arch/x86/x86_64/Makefile
@@ -11,11 +11,11 @@ obj-y += mmconf-fam10h.o
 obj-y += mmconfig_64.o
 obj-y += mmconfig-shared.o
 obj-y += compat.o
-obj-bin-y += compat_kexec.o
 obj-y += domain.o
 obj-y += physdev.o
 obj-y += platform_hypercall.o
 obj-y += cpu_idle.o
 obj-y += cpufreq.o
+obj-bin-y += kexec_reloc.o
 
 obj-$(crash_debug)   += gdbstub.o
diff --git a/xen/arch/x86/x86_64/compat_kexec.S b/xen/arch/x86/x86_64/compat_kexec.S
deleted file mode 100644
index fc92af9..0000000
--- a/xen/arch/x86/x86_64/compat_kexec.S
+++ /dev/null
@@ -1,187 +0,0 @@
-/*
- * Compatibility kexec handler.
- */
-
-/*
- * NOTE: We rely on Xen not relocating itself above the 4G boundary. This is
- * currently true but if it ever changes then compat_pg_table will
- * need to be moved back below 4G at run time.
- */
-
-#include <xen/config.h>
-
-#include <asm/asm_defns.h>
-#include <asm/msr.h>
-#include <asm/page.h>
-
-/* The unrelocated physical address of a symbol. */
-#define SYM_PHYS(sym)          ((sym) - __XEN_VIRT_START)
-
-/* Load physical address of symbol into register and relocate it. */
-#define RELOCATE_SYM(sym,reg)  mov $SYM_PHYS(sym), reg ; \
-                               add xen_phys_start(%rip), reg
-
-/*
- * Relocate a physical address in memory. Size of temporary register
- * determines size of the value to relocate.
- */
-#define RELOCATE_MEM(addr,reg) mov addr(%rip), reg ; \
-                               add xen_phys_start(%rip), reg ; \
-                               mov reg, addr(%rip)
-
-        .text
-
-        .code64
-
-ENTRY(compat_machine_kexec)
-        /* x86/64                        x86/32  */
-        /* %rdi - relocate_new_kernel_t  CALL    */
-        /* %rsi - indirection page       4(%esp) */
-        /* %rdx - page_list              8(%esp) */
-        /* %rcx - start address         12(%esp) */
-        /*        cpu has pae           16(%esp) */
-
-        /* Shim the 64 bit page_list into a 32 bit page_list. */
-        mov $12,%r9
-        lea compat_page_list(%rip), %rbx
-1:      dec %r9
-        movl (%rdx,%r9,8),%eax
-        movl %eax,(%rbx,%r9,4)
-        test %r9,%r9
-        jnz 1b
-
-        RELOCATE_SYM(compat_page_list,%rdx)
-
-        /* Relocate compatibility mode entry point address. */
-        RELOCATE_MEM(compatibility_mode_far,%eax)
-
-        /* Relocate compat_pg_table. */
-        RELOCATE_MEM(compat_pg_table,     %rax)
-        RELOCATE_MEM(compat_pg_table+0x8, %rax)
-        RELOCATE_MEM(compat_pg_table+0x10,%rax)
-        RELOCATE_MEM(compat_pg_table+0x18,%rax)
-
-        /*
-         * Setup an identity mapped region in PML4[0] of idle page
-         * table.
-         */
-        RELOCATE_SYM(l3_identmap,%rax)
-        or  $0x63,%rax
-        mov %rax, idle_pg_table(%rip)
-
-        /* Switch to idle page table. */
-        RELOCATE_SYM(idle_pg_table,%rax)
-        movq %rax, %cr3
-
-        /* Switch to identity mapped compatibility stack. */
-        RELOCATE_SYM(compat_stack,%rax)
-        movq %rax, %rsp
-
-        /* Save xen_phys_start for 32 bit code. */
-        movq xen_phys_start(%rip), %rbx
-
-        /* Jump to low identity mapping in compatibility mode. */
-        ljmp *compatibility_mode_far(%rip)
-        ud2
-
-compatibility_mode_far:
-        .long SYM_PHYS(compatibility_mode)
-        .long __HYPERVISOR_CS32
-
-        /*
-         * We use 5 words of stack for the arguments passed to the kernel. The
-         * kernel only uses 1 word before switching to its own stack. Allocate
-         * 16 words to give "plenty" of room.
-         */
-        .fill 16,4,0
-compat_stack:
-
-        .code32
-
-#undef RELOCATE_SYM
-#undef RELOCATE_MEM
-
-/*
- * Load physical address of symbol into register and relocate it. %rbx
- * contains xen_phys_start(%rip) saved before jump to compatibility
- * mode.
- */
-#define RELOCATE_SYM(sym,reg) mov $SYM_PHYS(sym), reg ; \
-                              add %ebx, reg
-
-compatibility_mode:
-        /* Setup some sane segments. */
-        movl $__HYPERVISOR_DS32, %eax
-        movl %eax, %ds
-        movl %eax, %es
-        movl %eax, %fs
-        movl %eax, %gs
-        movl %eax, %ss
-
-        /* Push arguments onto stack. */
-        pushl $0   /* 20(%esp) - preserve context */
-        pushl $1   /* 16(%esp) - cpu has pae */
-        pushl %ecx /* 12(%esp) - start address */
-        pushl %edx /*  8(%esp) - page list */
-        pushl %esi /*  4(%esp) - indirection page */
-        pushl %edi /*  0(%esp) - CALL */
-
-        /* Disable paging and therefore leave 64 bit mode. */
-        movl %cr0, %eax
-        andl $~X86_CR0_PG, %eax
-        movl %eax, %cr0
-
-        /* Switch to 32 bit page table. */
-        RELOCATE_SYM(compat_pg_table, %eax)
-        movl  %eax, %cr3
-
-        /* Clear MSR_EFER[LME], disabling long mode */
-        movl    $MSR_EFER,%ecx
-        rdmsr
-        btcl    $_EFER_LME,%eax
-        wrmsr
-
-        /* Re-enable paging, but only 32 bit mode now. */
-        movl %cr0, %eax
-        orl $X86_CR0_PG, %eax
-        movl %eax, %cr0
-        jmp 1f
-1:
-
-        popl %eax
-        call *%eax
-        ud2
-
-        .data
-        .align 4
-compat_page_list:
-        .fill 12,4,0
-
-        .align 32,0
-
-        /*
-         * These compat page tables contain an identity mapping of the
-         * first 4G of the physical address space.
-         */
-compat_pg_table:
-        .long SYM_PHYS(compat_pg_table_l2) + 0*PAGE_SIZE + 0x01, 0
-        .long SYM_PHYS(compat_pg_table_l2) + 1*PAGE_SIZE + 0x01, 0
-        .long SYM_PHYS(compat_pg_table_l2) + 2*PAGE_SIZE + 0x01, 0
-        .long SYM_PHYS(compat_pg_table_l2) + 3*PAGE_SIZE + 0x01, 0
-
-        .section .data.page_aligned, "aw", @progbits
-        .align PAGE_SIZE,0
-compat_pg_table_l2:
-        .macro identmap from=0, count=512
-        .if \count-1
-        identmap "(\from+0)","(\count/2)"
-        identmap "(\from+(0x200000*(\count/2)))","(\count/2)"
-        .else
-        .quad 0x00000000000000e3 + \from
-        .endif
-        .endm
-
-        identmap 0x00000000
-        identmap 0x40000000
-        identmap 0x80000000
-        identmap 0xc0000000
diff --git a/xen/arch/x86/x86_64/kexec_reloc.S b/xen/arch/x86/x86_64/kexec_reloc.S
new file mode 100644
index 0000000..e68842c
--- /dev/null
+++ b/xen/arch/x86/x86_64/kexec_reloc.S
@@ -0,0 +1,229 @@
+/*
+ * Relocate a kexec_image to its destination and call it.
+ *
+ * Copyright (C) 2013 Citrix Systems R&D Ltd.
+ *
+ * Portions derived from Linux's arch/x86/kernel/relocate_kernel_64.S.
+ * 
+ *   Copyright (C) 2002-2005 Eric Biederman  <ebiederm@xmission.com>
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
+ */
+#include <xen/config.h>
+
+#include <asm/asm_defns.h>
+#include <asm/msr.h>
+#include <asm/page.h>
+#include <asm/machine_kexec.h>
+
+/* The unrelocated physical address of a symbol. */
+#define SYM_PHYS(sym)          ((sym) - __XEN_VIRT_START)
+
+/* Load physical address of symbol into register and relocate it. */
+#define RELOCATE_SYM(sym,reg)  mov $SYM_PHYS(sym), reg ; \
+                               add xen_phys_start(%rip), reg
+
+#define DBG(c) \
+1:      mov     $0x3f8+5, %dx ; \
+        inb     %dx, %al     ; \
+        test    $0x20, %al   ; \
+        je      1b           ; \
+        mov     $0x3f8, %dx  ; \
+        mov     $c, %al      ; \
+        outb    %al, %dx     ;
+
+        .text
+        .align PAGE_SIZE
+        .code64
+
+ENTRY(kexec_reloc)
+        /* %rdi - code_page maddr */
+        /* %rsi - page table maddr */
+        /* %rdx - indirection page maddr */
+        /* %rcx - entry maddr */
+        /* %r8 - flags */
+
+        mov %rdx, %rbx
+
+        DBG('A')
+
+        /* Setup stack. */
+        RELOCATE_SYM(reloc_stack, %rax)
+        mov %rax, %rsp
+
+        DBG('B')
+
+        wbinvd
+        movq %cr4, %rax
+        andq $~(X86_CR4_PGE|X86_CR4_PCE|X86_CR4_MCE), %rax
+        movq %rax, %cr4
+
+        /* Load reloc page table. */
+        movq %rsi, %cr3
+
+        DBG('C')
+
+        /* Jump to identity mapped code. */
+        movq %rdi, %r9
+        addq $(identity_mapped - kexec_reloc), %r9
+
+        DBG('D')
+
+        jmp *%r9
+
+identity_mapped:
+        DBG('E')
+        
+        pushq %rcx
+        pushq %rbx
+        pushq %rsi
+        pushq %rdi
+
+        movq %rbx, %rdi
+        call swap_pages
+
+        popq %rdi
+        popq %rsi
+        popq %rbx
+        popq %rcx
+
+        DBG('F')
+
+        /* Need to switch to 32-bit mode? */
+        testq $KEXEC_RELOC_FLAG_COMPAT, %r8
+        jnz call_32_bit
+
+call_64_bit:
+        DBG('6')
+
+        /* Call the image entry point.  This should never return. */
+        call *%rcx
+        ud2
+
+call_32_bit:
+        DBG('3')
+
+        /* Relocate compatibility mode entry point address. */
+        movl %edi, %eax
+        addl $(compatibility_mode - kexec_reloc), %eax
+        movl %eax, compatibility_mode_far(%rip)
+
+        DBG('I')
+        
+        /* Load compat GDT. */
+        movq %rdi, %rax
+        addq $(compat_mode_gdt - kexec_reloc), %rax
+        movq %rax, (compat_mode_gdt_desc + 2)(%rip)
+        lgdt compat_mode_gdt_desc(%rip)
+
+        DBG('J')
+        
+        /* Enter compatibility mode. */
+        ljmp *compatibility_mode_far(%rip)
+
+swap_pages:
+        /* %rdi - indirection page maddr */
+        movq    %rdi, %rcx
+        xorq    %rdi, %rdi
+        xorq    %rsi, %rsi
+        jmp     1f
+
+0:      /* top, read another word for the indirection page */
+
+        movq    (%rbx), %rcx
+        addq    $8,     %rbx
+1:
+        testq   $0x1,   %rcx  /* is it a destination page? */
+        jz      2f
+        movq    %rcx,   %rdi
+        andq    $0xfffffffffffff000, %rdi
+        jmp     0b
+2:
+        testq   $0x2,   %rcx  /* is it an indirection page? */
+        jz      2f
+        movq    %rcx,   %rbx
+        andq    $0xfffffffffffff000, %rbx
+        jmp     0b
+2:
+        testq   $0x4,   %rcx  /* is it the done indicator? */
+        jz      2f
+        jmp     3f
+2:
+        testq   $0x8,   %rcx  /* is it the source indicator? */
+        jz      0b            /* Ignore it otherwise */
+        movq    %rcx,   %rsi  /* For every source page do a copy */
+        andq    $0xfffffffffffff000, %rsi
+
+        movq    %rdi, %rdx
+        movq    %rsi, %rax
+
+        movq    %r10, %rdi
+        movq    $512,   %rcx
+        rep movsq
+
+        movq    %rax, %rdi
+        movq    %rdx, %rsi
+        movq    $512,   %rcx
+        rep movsq
+
+        movq    %rdx, %rdi
+        movq    %r10, %rsi
+        movq    $512,   %rcx
+        rep movsq
+
+        lea     PAGE_SIZE(%rax), %rsi
+        jmp     0b
+3:
+        ret
+
+        .code32
+
+compatibility_mode:
+        DBG('K')
+
+        /* Setup some sane segments. */
+        movl $0x0008, %eax
+        movl %eax, %ds
+        movl %eax, %es
+        movl %eax, %fs
+        movl %eax, %gs
+        movl %eax, %ss
+
+        DBG('L')
+        
+        /* Disable paging and therefore leave 64 bit mode. */
+        movl %cr0, %eax
+        andl $~X86_CR0_PG, %eax
+        movl %eax, %cr0
+
+        DBG('M')
+
+        /* Call the image entry point.  This should never return. */
+        call *%ecx
+        ud2
+
+        .align 16
+compatibility_mode_far:
+        .long SYM_PHYS(compatibility_mode)
+        .word 0x0010
+
+        .align 16
+compat_mode_gdt_desc:
+        .word (3*8)-1
+        .quad SYM_PHYS(compat_mode_gdt)
+
+        .align 16
+compat_mode_gdt:
+        .quad 0x0000000000000000     /* null                              */
+        .quad 0x00cf92000000ffff     /* 0x0008 ring 0 data                */
+        .quad 0x00cf9a000000ffff     /* 0x0010 ring 0 code, compatibility */
+
+        /*
+         * 16 words of stack are more than enough.
+         */
+        .fill 16,8,0
+reloc_stack:
+
+        .globl kexec_reloc_size
+        .set kexec_reloc_size, . - kexec_reloc
diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index 2cbb62c..2926274 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -23,6 +23,7 @@
 #include <xen/version.h>
 #include <xen/console.h>
 #include <xen/kexec.h>
+#include <xen/kimage.h>
 #include <public/elfnote.h>
 #include <xsm/xsm.h>
 #include <xen/cpu.h>
@@ -45,7 +46,7 @@ static Elf_Note *xen_crash_note;
 
 static cpumask_t crash_saved_cpus;
 
-static xen_kexec_image_t kexec_image[KEXEC_IMAGE_NR];
+static struct kexec_image *kexec_image[KEXEC_IMAGE_NR];
 
 #define KEXEC_FLAG_DEFAULT_POS   (KEXEC_IMAGE_NR + 0)
 #define KEXEC_FLAG_CRASH_POS     (KEXEC_IMAGE_NR + 1)
@@ -309,14 +310,14 @@ void kexec_crash(void)
     kexec_common_shutdown();
     kexec_crash_save_cpu();
     machine_crash_shutdown();
-    machine_kexec(&kexec_image[KEXEC_IMAGE_CRASH_BASE + pos]);
+    machine_kexec(kexec_image[KEXEC_IMAGE_CRASH_BASE + pos]);
 
     BUG();
 }
 
 static long kexec_reboot(void *_image)
 {
-    xen_kexec_image_t *image = _image;
+    struct kexec_image *image = _image;
 
     kexecing = TRUE;
 
@@ -732,63 +733,245 @@ static void crash_save_vmcoreinfo(void)
 #endif
 }
 
-static int kexec_load_unload_internal(unsigned long op, xen_kexec_load_v1_t *load)
+static void kexec_unload_image(struct kexec_image *image)
+{
+    if ( !image )
+        return;
+
+    machine_kexec_unload(image);
+}
+
+static int kexec_exec(XEN_GUEST_HANDLE_PARAM(void) uarg)
+{
+    xen_kexec_exec_t exec;
+    struct kexec_image *image;
+    int base, bit, pos, ret = -EINVAL;
+
+    if ( unlikely(copy_from_guest(&exec, uarg, 1)) )
+        return -EFAULT;
+
+    if ( kexec_load_get_bits(exec.type, &base, &bit) )
+        return -EINVAL;
+
+    pos = (test_bit(bit, &kexec_flags) != 0);
+
+    /* Only allow kexec/kdump into loaded images */
+    if ( !test_bit(base + pos, &kexec_flags) )
+        return -ENOENT;
+
+    switch (exec.type)
+    {
+    case KEXEC_TYPE_DEFAULT:
+        image = kexec_image[base + pos];
+        ret = continue_hypercall_on_cpu(0, kexec_reboot, image);
+        break;
+    case KEXEC_TYPE_CRASH:
+        kexec_crash(); /* Does not return */
+        break;
+    }
+
+    return -EINVAL; /* never reached */
+}
+
+static int kexec_swap_images(int type, struct kexec_image *new,
+                             struct kexec_image **old)
 {
-    xen_kexec_image_t *image;
     int base, bit, pos;
-    int ret = 0;
+    int new_slot, old_slot;
+
+    *old = NULL;
+
+    spin_lock(&kexec_lock);
+
+    if ( test_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) )
+    {
+        spin_unlock(&kexec_lock);
+        return -EBUSY;
+    }
 
-    if ( kexec_load_get_bits(load->type, &base, &bit) )
+    if ( kexec_load_get_bits(type, &base, &bit) )
         return -EINVAL;
 
     pos = (test_bit(bit, &kexec_flags) != 0);
+    old_slot = base + pos;
+    new_slot = base + !pos;
 
-    /* Load the user data into an unused image */
-    if ( op == KEXEC_CMD_kexec_load )
+    if ( new )
     {
-        image = &kexec_image[base + !pos];
+        kexec_image[new_slot] = new;
+        set_bit(new_slot, &kexec_flags);
+    }
+    change_bit(bit, &kexec_flags);
 
-        BUG_ON(test_bit((base + !pos), &kexec_flags)); /* must be free */
+    clear_bit(old_slot, &kexec_flags);
+    *old = kexec_image[old_slot];
 
-        memcpy(image, &load->image, sizeof(*image));
+    spin_unlock(&kexec_lock);
 
-        if ( !(ret = machine_kexec_load(load->type, base + !pos, image)) )
-        {
-            /* Set image present bit */
-            set_bit((base + !pos), &kexec_flags);
+    return 0;
+}
 
-            /* Make new image the active one */
-            change_bit(bit, &kexec_flags);
-        }
+static int kexec_load_slot(struct kexec_image *kimage)
+{
+    struct kexec_image *old_kimage;
+    int ret = -ENOMEM;
+
+    ret = machine_kexec_load(kimage);
+    if ( ret < 0 )
+        goto error;
+
+    crash_save_vmcoreinfo();
 
-        crash_save_vmcoreinfo();
+    ret = kexec_swap_images(kimage->type, kimage, &old_kimage);
+    if ( ret < 0 )
+        goto error;
+
+    kexec_unload_image(old_kimage);
+    
+    return 0;
+
+error:
+    kimage_free(kimage);
+    return ret;
+}
+
+static uint16_t kexec_load_v1_arch(void)
+{
+#ifdef CONFIG_X86
+    return is_pv_32on64_domain(dom0) ? EM_386 : EM_X86_64;
+#else
+    return EM_NONE;
+#endif
+}
+
+static int kexec_segments_add_page(unsigned *nr_segments,
+                                   xen_kexec_segment_t *segments,
+                                   unsigned long mfn)
+{
+    unsigned long maddr = mfn << PAGE_SHIFT;
+    int n = *nr_segments;
+
+    /* Need a new segment? */
+    if ( n == 0
+         || segments[n-1].dest_maddr + segments[n-1].dest_size != maddr )
+    {
+        n++;
+        if ( n == KEXEC_SEGMENT_MAX )
+            return -EINVAL;
+        *nr_segments = n;
+
+        set_xen_guest_handle(segments[n-1].buf, NULL);
+        segments[n-1].buf_size = 0;
+        segments[n-1].dest_maddr = maddr;
+        segments[n-1].dest_size = 0;
     }
 
-    /* Unload the old image if present and load successful */
-    if ( ret == 0 && !test_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) )
+    segments[n-1].dest_size += PAGE_SIZE;
+
+    return 0;
+}
+
+static int kexec_segments_from_ind_page(unsigned long mfn,
+                                        unsigned *nr_segments,
+                                        xen_kexec_segment_t *segments)
+{
+    void *page;
+    unsigned long *entry;
+    int ret;
+
+    page = vmap(&mfn, 1);
+    if ( page == NULL )
+        return -ENOMEM;
+
+    /*
+     * Walk the indirection page list, adding destination pages to the
+     * segments.
+     */
+    for ( entry = page; ; entry++ )
     {
-        if ( test_and_clear_bit((base + pos), &kexec_flags) )
+        unsigned long ind;
+
+        ind = (*entry) & 0xf;
+        mfn = (*entry) >> PAGE_SHIFT;
+
+        switch ( ind )
         {
-            image = &kexec_image[base + pos];
-            machine_kexec_unload(load->type, base + pos, image);
+        case IND_DESTINATION:
+            ret = kexec_segments_add_page(nr_segments, segments, mfn);
+            if ( ret < 0 )
+                return ret;
+            break;
+        case IND_INDIRECTION:
+            vunmap(page);
+            page = vmap(&mfn, 1);
+            if ( page == NULL )
+                return -ENOMEM;
+            entry = page;
+            break;
+        case IND_DONE:
+            goto done;
+        case IND_SOURCE:
+            break;
         }
     }
+done:
+    return 0;
+}
+
+static int kexec_do_load_v1(xen_kexec_load_v1_t *load)
+{
+    struct kexec_image *kimage = NULL;
+    xen_kexec_segment_t *segments;
+    uint16_t arch;
+    unsigned nr_segments = 0;
+    int ret;
+
+    arch = kexec_load_v1_arch();
+    if ( arch == EM_NONE )
+        return -ENOSYS;
+
+    segments = xmalloc_array(xen_kexec_segment_t, KEXEC_SEGMENT_MAX);
+    if ( segments == NULL )
+        return -ENOMEM;
+
+    ret = kexec_segments_from_ind_page(load->image.indirection_page >> PAGE_SHIFT,
+                                       &nr_segments, segments);
+    if ( ret < 0 )
+        goto error;
+
+    ret = kimage_alloc(&kimage, load->type, arch, load->image.start_address,
+                       nr_segments, segments);
+    if ( ret < 0 )
+        goto error;
+
+    /* kexec_reloc() uses the same format for the indirection pages so
+       reuse the provided ones. */
+    kimage->head = load->image.indirection_page;
+
+    ret = kexec_load_slot(kimage);
+    if ( ret < 0 )
+        goto error;
+
+    return 0;
 
+error:
+    if ( !kimage )
+        xfree(segments);
+    kimage_free(kimage);
     return ret;
 }
 
-static int kexec_load_unload(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) uarg)
+static int kexec_load_v1(XEN_GUEST_HANDLE_PARAM(void) uarg)
 {
     xen_kexec_load_v1_t load;
 
     if ( unlikely(copy_from_guest(&load, uarg, 1)) )
         return -EFAULT;
 
-    return kexec_load_unload_internal(op, &load);
+    return kexec_do_load_v1(&load);
 }
 
-static int kexec_load_unload_compat(unsigned long op,
-                                    XEN_GUEST_HANDLE_PARAM(void) uarg)
+static int kexec_load_v1_compat(XEN_GUEST_HANDLE_PARAM(void) uarg)
 {
 #ifdef CONFIG_COMPAT
     compat_kexec_load_v1_t compat_load;
@@ -807,49 +990,113 @@ static int kexec_load_unload_compat(unsigned long op,
     load.type = compat_load.type;
     XLAT_kexec_image(&load.image, &compat_load.image);
 
-    return kexec_load_unload_internal(op, &load);
-#else /* CONFIG_COMPAT */
+    return kexec_do_load_v1(&load);
+#else
     return 0;
-#endif /* CONFIG_COMPAT */
+#endif
 }
 
-static int kexec_exec(XEN_GUEST_HANDLE_PARAM(void) uarg)
+static int kexec_load(XEN_GUEST_HANDLE_PARAM(void) uarg)
 {
-    xen_kexec_exec_t exec;
-    xen_kexec_image_t *image;
-    int base, bit, pos, ret = -EINVAL;
+    xen_kexec_load_t load;
+    xen_kexec_segment_t *segments;
+    struct kexec_image *kimage = NULL;
+    int ret;
 
-    if ( unlikely(copy_from_guest(&exec, uarg, 1)) )
+    if ( copy_from_guest(&load, uarg, 1) )
         return -EFAULT;
 
-    if ( kexec_load_get_bits(exec.type, &base, &bit) )
+    if ( load.nr_segments >= KEXEC_SEGMENT_MAX )
         return -EINVAL;
 
-    pos = (test_bit(bit, &kexec_flags) != 0);
-
-    /* Only allow kexec/kdump into loaded images */
-    if ( !test_bit(base + pos, &kexec_flags) )
-        return -ENOENT;
+    segments = xmalloc_array(xen_kexec_segment_t, load.nr_segments);
+    if ( segments == NULL )
+        return -ENOMEM;
 
-    switch (exec.type)
+    if ( copy_from_guest(segments, load.segments, load.nr_segments) )
     {
-    case KEXEC_TYPE_DEFAULT:
-        image = &kexec_image[base + pos];
-        ret = continue_hypercall_on_cpu(0, kexec_reboot, image);
-        break;
-    case KEXEC_TYPE_CRASH:
-        kexec_crash(); /* Does not return */
-        break;
+        ret = -EFAULT;
+        goto error;
     }
 
-    return -EINVAL; /* never reached */
+    ret = kimage_alloc(&kimage, load.type, load.arch, load.entry_maddr,
+                       load.nr_segments, segments);
+    if ( ret < 0 )
+        goto error;
+
+    ret = kimage_load_segments(kimage);
+    if ( ret < 0 )
+        goto error;
+
+    ret = kexec_load_slot(kimage);
+    if ( ret < 0 )
+        goto error;
+
+    return 0;
+
+error:
+    if ( !kimage )
+        xfree(segments);
+    kimage_free(kimage);
+    return ret;
+}
+
+static int kexec_do_unload(xen_kexec_unload_t *unload)
+{
+    struct kexec_image *old_kimage;
+    int ret;
+
+    ret = kexec_swap_images(unload->type, NULL, &old_kimage);
+    if ( ret < 0 )
+        return ret;
+
+    kexec_unload_image(old_kimage);
+
+    return 0;
+}
+
+static int kexec_unload_v1(XEN_GUEST_HANDLE_PARAM(void) uarg)
+{
+    xen_kexec_load_v1_t load;
+    xen_kexec_unload_t unload;
+
+    if ( copy_from_guest(&load, uarg, 1) )
+        return -EFAULT;
+
+    unload.type = load.type;
+    return kexec_do_unload(&unload);
+}
+
+static int kexec_unload_v1_compat(XEN_GUEST_HANDLE_PARAM(void) uarg)
+{
+#ifdef CONFIG_COMPAT
+    compat_kexec_load_v1_t compat_load;
+    xen_kexec_unload_t unload;
+
+    if ( copy_from_guest(&compat_load, uarg, 1) )
+        return -EFAULT;
+
+    unload.type = compat_load.type;
+    return kexec_do_unload(&unload);
+#else
+    return 0;
+#endif
+}
+
+static int kexec_unload(XEN_GUEST_HANDLE_PARAM(void) uarg)
+{
+    xen_kexec_unload_t unload;
+
+    if ( unlikely(copy_from_guest(&unload, uarg, 1)) )
+        return -EFAULT;
+
+    return kexec_do_unload(&unload);
 }
 
 static int do_kexec_op_internal(unsigned long op,
                                 XEN_GUEST_HANDLE_PARAM(void) uarg,
                                 bool_t compat)
 {
-    unsigned long flags;
     int ret = -EINVAL;
 
     ret = xsm_kexec(XSM_PRIV);
@@ -865,20 +1112,26 @@ static int do_kexec_op_internal(unsigned long op,
                 ret = kexec_get_range(uarg);
         break;
     case KEXEC_CMD_kexec_load_v1:
+        if ( compat )
+            ret = kexec_load_v1_compat(uarg);
+        else
+            ret = kexec_load_v1(uarg);
+        break;
     case KEXEC_CMD_kexec_unload_v1:
-        spin_lock_irqsave(&kexec_lock, flags);
-        if (!test_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags))
-        {
-                if (compat)
-                        ret = kexec_load_unload_compat(op, uarg);
-                else
-                        ret = kexec_load_unload(op, uarg);
-        }
-        spin_unlock_irqrestore(&kexec_lock, flags);
+        if ( compat )
+            ret = kexec_unload_v1_compat(uarg);
+        else
+            ret = kexec_unload_v1(uarg);
         break;
     case KEXEC_CMD_kexec:
         ret = kexec_exec(uarg);
         break;
+    case KEXEC_CMD_kexec_load:
+        ret = kexec_load(uarg);
+        break;
+    case KEXEC_CMD_kexec_unload:
+        ret = kexec_unload(uarg);
+        break;
     }
 
     return ret;
diff --git a/xen/include/asm-x86/fixmap.h b/xen/include/asm-x86/fixmap.h
index 2eefcf4..1695228 100644
--- a/xen/include/asm-x86/fixmap.h
+++ b/xen/include/asm-x86/fixmap.h
@@ -57,9 +57,6 @@ enum fixed_addresses {
     FIX_ACPI_END = FIX_ACPI_BEGIN + FIX_ACPI_PAGES - 1,
     FIX_HPET_BASE,
     FIX_CYCLONE_TIMER,
-    FIX_KEXEC_BASE_0,
-    FIX_KEXEC_BASE_END = FIX_KEXEC_BASE_0 \
-      + ((KEXEC_XEN_NO_PAGES >> 1) * KEXEC_IMAGE_NR) - 1,
     FIX_IOMMU_REGS_BASE_0,
     FIX_IOMMU_REGS_END = FIX_IOMMU_REGS_BASE_0 + MAX_IOMMUS-1,
     FIX_IOMMU_MMIO_BASE_0,
diff --git a/xen/include/asm-x86/machine_kexec.h b/xen/include/asm-x86/machine_kexec.h
new file mode 100644
index 0000000..ec41099
--- /dev/null
+++ b/xen/include/asm-x86/machine_kexec.h
@@ -0,0 +1,14 @@
+#ifndef __X86_MACHINE_KEXEC_H__
+#define __X86_MACHINE_KEXEC_H__
+
+#define KEXEC_RELOC_FLAG_COMPAT 0x1 /* 32-bit image */
+
+#ifndef __ASSEMBLY__
+
+extern void kexec_reloc(unsigned long reloc_code, unsigned long reloc_pt,
+                        unsigned long ind_maddr, unsigned long entry_maddr,
+                        unsigned long flags);
+
+#endif
+
+#endif /* __X86_MACHINE_KEXEC_H__ */
diff --git a/xen/include/xen/kexec.h b/xen/include/xen/kexec.h
index b3ca8b0..b1177d8 100644
--- a/xen/include/xen/kexec.h
+++ b/xen/include/xen/kexec.h
@@ -6,6 +6,7 @@
 #include <public/kexec.h>
 #include <asm/percpu.h>
 #include <xen/elfcore.h>
+#include <xen/kimage.h>
 
 typedef struct xen_kexec_reserve {
     unsigned long size;
@@ -40,11 +41,11 @@ extern enum low_crashinfo low_crashinfo_mode;
 extern paddr_t crashinfo_maxaddr_bits;
 void kexec_early_calculations(void);
 
-int machine_kexec_load(int type, int slot, xen_kexec_image_t *image);
-void machine_kexec_unload(int type, int slot, xen_kexec_image_t *image);
+int machine_kexec_load(struct kexec_image *image);
+void machine_kexec_unload(struct kexec_image *image);
 void machine_kexec_reserved(xen_kexec_reserve_t *reservation);
-void machine_reboot_kexec(xen_kexec_image_t *image);
-void machine_kexec(xen_kexec_image_t *image);
+void machine_reboot_kexec(struct kexec_image *image);
+void machine_kexec(struct kexec_image *image);
 void kexec_crash(void);
 void kexec_crash_save_cpu(void);
 crash_xen_info_t *kexec_crash_save_info(void);
@@ -52,11 +53,6 @@ void machine_crash_shutdown(void);
 int machine_kexec_get(xen_kexec_range_t *range);
 int machine_kexec_get_xen(xen_kexec_range_t *range);
 
-void compat_machine_kexec(unsigned long rnk,
-                          unsigned long indirection_page,
-                          unsigned long *page_list,
-                          unsigned long start_address);
-
 /* vmcoreinfo stuff */
 #define VMCOREINFO_BYTES           (4096)
 #define VMCOREINFO_NOTE_NAME       "VMCOREINFO_XEN"
-- 
1.7.2.5


* [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-02-21 17:48 [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels David Vrabel
                   ` (8 preceding siblings ...)
  2013-02-21 17:48 ` [PATCH 5/8] kexec: extend hypercall with improved load/unload ops David Vrabel
@ 2013-02-21 17:48 ` David Vrabel
  2013-02-21 22:41   ` Daniel Kiper
                     ` (7 more replies)
  2013-02-21 17:48 ` [PATCH 6/8] xen: kexec crash image when dom0 crashes David Vrabel
                   ` (13 subsequent siblings)
  23 siblings, 8 replies; 106+ messages in thread
From: David Vrabel @ 2013-02-21 17:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Daniel Kiper, kexec, David Vrabel

From: David Vrabel <david.vrabel@citrix.com>

In the existing kexec hypercall, the load and unload ops depend on
internals of the Linux kernel (the page list and code page provided by
the kernel).  The code page is used to transition between Xen context
and the image, so using kernel code doesn't make sense and will not
work for PVH guests.

Add replacement KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload ops
that no longer require a code page to be provided by the guest -- Xen
now provides the code for calling the image directly.

The new load op looks similar to the Linux kexec_load system call and
allows the guest to provide the image data to be loaded.  The guest
specifies the architecture of the image which may be a 32-bit subarch
of the hypervisor's architecture (i.e., an EM_386 image on an
EM_X86_64 hypervisor).
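
As a rough illustration of the arch handling (a sketch only, not code
from this series): the EM_* values are the standard ELF machine
numbers and KEXEC_RELOC_FLAG_COMPAT is the flag added by this patch.
Both helper names are made up for the example, and the acceptance
check in particular is an assumption about what an x86-64 hypervisor
would allow.

```c
#include <stdint.h>

#define EM_386      3   /* standard ELF e_machine values */
#define EM_X86_64  62

#define KEXEC_RELOC_FLAG_COMPAT 0x1 /* 32-bit image */

/* Hypothetical helper: would an x86-64 hypervisor accept this arch? */
static int image_arch_ok(uint16_t arch)
{
    return arch == EM_X86_64 || arch == EM_386;
}

/* Hypothetical helper: flags to pass to the relocation code. */
static unsigned long reloc_flags_for_arch(uint16_t arch)
{
    return arch == EM_386 ? KEXEC_RELOC_FLAG_COMPAT : 0;
}
```

The second helper mirrors the check machine_kexec() makes on
image->arch before calling kexec_reloc().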

The toolstack can now load images without kernel involvement.  This is
required for supporting kexec when using a dom0 with an upstream
kernel.

Crash images are copied directly into the crash region on load.
Default images are copied into Xen heap pages and a list of source and
destination machine addresses is created.  This list is used in
kexec_reloc() to relocate the image to its destination.
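
The list walk can be sketched in C roughly as follows.  This is a
simplified user-space model, not the code in this patch: machine
memory is simulated with an array, all function names are
hypothetical, and each source page is only copied to its destination,
whereas swap_pages in kexec_reloc.S is written to swap the two pages.
The flag values match the bits tested by swap_pages.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE        4096u
#define WORDS_PER_PAGE   (PAGE_SIZE / sizeof(uint64_t))
#define NR_PAGES         8u

/* Entry type flags, matching the bits tested in swap_pages. */
#define IND_DESTINATION  0x1
#define IND_INDIRECTION  0x2
#define IND_DONE         0x4
#define IND_SOURCE       0x8

/* Simulated machine memory; "machine addresses" are byte offsets. */
static uint64_t mem[NR_PAGES * WORDS_PER_PAGE];

static uint64_t *page(uint64_t maddr)
{
    return &mem[maddr / sizeof(uint64_t)];
}

/* Walk the list starting from the head entry; returns pages copied. */
static unsigned relocate(uint64_t head)
{
    uint64_t entry = head;
    const uint64_t *next = NULL;   /* next list word to read */
    uint64_t dest = 0;             /* current destination address */
    unsigned copied = 0;

    for ( ;; )
    {
        uint64_t maddr = entry & ~(uint64_t)(PAGE_SIZE - 1);

        if ( entry & IND_DESTINATION )
            dest = maddr;                  /* start of a new run */
        else if ( entry & IND_INDIRECTION )
            next = page(maddr);            /* continue in another page */
        else if ( entry & IND_DONE )
            break;
        else if ( entry & IND_SOURCE )
        {
            memcpy(page(dest), page(maddr), PAGE_SIZE);
            dest += PAGE_SIZE;             /* next page follows on */
            copied++;
        }

        if ( next == NULL )
            break;                         /* malformed: nothing to read */
        entry = *next++;
    }

    return copied;
}

/*
 * Build a two-page demo image: sources in pages 2 and 3, destination
 * pages 5 and 6, list in page 1.  Returns the head entry.
 */
static uint64_t build_demo_image(void)
{
    uint64_t *list = page(1 * PAGE_SIZE);

    memset(mem, 0, sizeof(mem));
    page(2 * PAGE_SIZE)[0] = 0xaaaa;
    page(3 * PAGE_SIZE)[0] = 0xbbbb;

    list[0] = 5 * PAGE_SIZE | IND_DESTINATION;
    list[1] = 2 * PAGE_SIZE | IND_SOURCE;
    list[2] = 3 * PAGE_SIZE | IND_SOURCE;
    list[3] = IND_DONE;

    return 1 * PAGE_SIZE | IND_INDIRECTION;
}
```

Calling relocate(build_demo_image()) walks four list entries and
leaves the contents of pages 2 and 3 in pages 5 and 6.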

The old load and unload sub-ops are still available (as
KEXEC_CMD_kexec_load_v1 and KEXEC_CMD_kexec_unload_v1) and are
implemented on top of the new infrastructure.
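
For the v1 load path, the destination pages named in the guest's
indirection list are folded into segments before the image is
allocated.  A minimal sketch of that folding, loosely modelled on
kexec_segments_add_page() in this patch: the struct and function
names here are made up, the struct keeps only the two destination
fields, and the segment limit is passed in rather than taken from
KEXEC_SEGMENT_MAX.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Hypothetical, cut-down segment: destination range only. */
struct segment {
    uint64_t dest_maddr;
    uint64_t dest_size;
};

/*
 * Add one destination page.  A page that directly follows the last
 * segment extends it; any other page starts a new segment.
 * Returns 0 on success, -1 if the segment array is full.
 */
static int segments_add_page(struct segment *segs, unsigned *nr,
                             unsigned max, uint64_t mfn)
{
    uint64_t maddr = mfn << PAGE_SHIFT;
    unsigned n = *nr;

    /* Need a new segment? */
    if ( n == 0 || segs[n-1].dest_maddr + segs[n-1].dest_size != maddr )
    {
        if ( n == max )
            return -1;
        segs[n].dest_maddr = maddr;
        segs[n].dest_size = 0;
        *nr = ++n;
    }

    segs[n-1].dest_size += PAGE_SIZE;

    return 0;
}
```

Feeding it the frames 0x100, 0x101, 0x102 and 0x200 gives two
segments: three pages at 0x100000 and one page at 0x200000.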

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/arch/x86/machine_kexec.c        |  261 ++++++++++++++++++-------
 xen/arch/x86/x86_64/Makefile        |    2 +-
 xen/arch/x86/x86_64/compat_kexec.S  |  187 -----------------
 xen/arch/x86/x86_64/kexec_reloc.S   |  229 +++++++++++++++++++++
 xen/common/kexec.c                  |  377 +++++++++++++++++++++++++++++------
 xen/include/asm-x86/fixmap.h        |    3 -
 xen/include/asm-x86/machine_kexec.h |   14 ++
 xen/include/xen/kexec.h             |   14 +-
 8 files changed, 755 insertions(+), 332 deletions(-)
 delete mode 100644 xen/arch/x86/x86_64/compat_kexec.S
 create mode 100644 xen/arch/x86/x86_64/kexec_reloc.S
 create mode 100644 xen/include/asm-x86/machine_kexec.h

diff --git a/xen/arch/x86/machine_kexec.c b/xen/arch/x86/machine_kexec.c
index 8191ef1..0ec8c56 100644
--- a/xen/arch/x86/machine_kexec.c
+++ b/xen/arch/x86/machine_kexec.c
@@ -1,9 +1,18 @@
 /******************************************************************************
  * machine_kexec.c
  *
+ * Copyright (C) 2013 Citrix Systems R&D Ltd.
+ *
+ * Portions derived from Linux's arch/x86/kernel/machine_kexec_64.c.
+ *
+ *   Copyright (C) 2002-2005 Eric Biederman  <ebiederm@xmission.com>
+ *
  * Xen port written by:
  * - Simon 'Horms' Horman <horms@verge.net.au>
  * - Magnus Damm <magnus@valinux.co.jp>
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
  */
 
 #include <xen/types.h>
@@ -11,63 +20,195 @@
 #include <xen/guest_access.h>
 #include <asm/fixmap.h>
 #include <asm/hpet.h>
+#include <asm/page.h>
+#include <asm/machine_kexec.h>
+
+static void init_level2_page(l2_pgentry_t *l2, unsigned long addr)
+{
+    unsigned long end_addr;
+
+    addr &= PAGE_MASK;
+    end_addr = addr + L2_PAGETABLE_ENTRIES * (1ul << L2_PAGETABLE_SHIFT);
+
+    while ( addr < end_addr )
+    {
+        l2e_write(l2++, l2e_from_paddr(addr, __PAGE_HYPERVISOR | _PAGE_PSE));
 
-typedef void (*relocate_new_kernel_t)(
-                unsigned long indirection_page,
-                unsigned long *page_list,
-                unsigned long start_address,
-                unsigned int preserve_context);
+        addr += 1ul << L2_PAGETABLE_SHIFT;
+    }
+}
 
-int machine_kexec_load(int type, int slot, xen_kexec_image_t *image)
+static int init_level3_page(struct kexec_image *image, l3_pgentry_t *l3,
+                            unsigned long addr, unsigned long last_addr)
 {
-    unsigned long prev_ma = 0;
-    int fix_base = FIX_KEXEC_BASE_0 + (slot * (KEXEC_XEN_NO_PAGES >> 1));
-    int k;
+    unsigned long end_addr;
 
-    /* setup fixmap to point to our pages and record the virtual address
-     * in every odd index in page_list[].
-     */
+    addr &= PAGE_MASK;
+    end_addr = addr + L3_PAGETABLE_ENTRIES * (1ul << L3_PAGETABLE_SHIFT);
+
+    while( (addr < last_addr) && (addr < end_addr) )
+    {
+        struct page_info *l2_page;
+        l2_pgentry_t *l2;
+
+        l2_page = kimage_alloc_control_page(image);
+        if ( !l2_page )
+            return -ENOMEM;
+        l2 = page_to_virt(l2_page);
+
+        init_level2_page(l2, addr);
+        l3e_write(l3++, l3e_from_page(l2_page, __PAGE_HYPERVISOR));
+
+        addr += 1ul << L3_PAGETABLE_SHIFT;
+    }
+
+    return 0;
+}
+
+/*
+ * Build a complete page table to identity map [addr, last_addr).
+ *
+ * Control pages are used so they do not overlap with the image source
+ * or destination.
+ */
+static int init_level4_page(struct kexec_image *image, l4_pgentry_t *l4,
+                            unsigned long addr, unsigned long last_addr)
+{
+    unsigned long end_addr;
+    int result;
+
+    addr &= PAGE_MASK;
+    end_addr = addr + L4_PAGETABLE_ENTRIES * (1ul << L4_PAGETABLE_SHIFT);
+
+    while ( (addr < last_addr) && (addr < end_addr) )
+    {
+        struct page_info *l3_page;
+        l3_pgentry_t *l3;
+
+        l3_page = kimage_alloc_control_page(image);
+        if ( !l3_page )
+            return -ENOMEM;
+        l3 = page_to_virt(l3_page);
+
+        result = init_level3_page(image, l3, addr, last_addr);
+        if (result)
+            return result;
+        l4e_write(l4++, l4e_from_page(l3_page, __PAGE_HYPERVISOR));
+
+        addr += 1ul << L4_PAGETABLE_SHIFT;
+    }
+
+    return 0;
+}
 
-    for ( k = 0; k < KEXEC_XEN_NO_PAGES; k++ )
+/*
+ * Add a mapping for the control code page to the same virtual address
+ * as kexec_reloc.  This allows us to keep running after these page
+ * tables are loaded in kexec_reloc.
+ * 
+ * We don't really need to allocate control pages here as these
+ * entries won't be used while the kexec image is being copied, but it
+ * makes clean-up easier.
+ */
+static int init_transition_pgtable(struct kexec_image *image, l4_pgentry_t *l4)
+{
+    struct page_info *l3_page;
+    struct page_info *l2_page;
+    struct page_info *l1_page;
+    unsigned long vaddr, paddr;
+    l3_pgentry_t *l3;
+    l2_pgentry_t *l2;
+    l1_pgentry_t *l1;
+
+    vaddr = (unsigned long)kexec_reloc;
+    paddr = page_to_maddr(image->control_code_page);
+
+    l4 += l4_table_offset(vaddr);
+    if ( !(l4e_get_flags(*l4) & _PAGE_PRESENT) )
+    {
+        l3_page = kimage_alloc_control_page(image);
+        if ( !l3_page )
+            return -ENOMEM;
+        l4e_write(l4, l4e_from_page(l3_page, __PAGE_HYPERVISOR));
+    }
+
+    l3 = l4e_to_l3e(*l4) + l3_table_offset(vaddr);
+    if ( !(l3e_get_flags(*l3) & _PAGE_PRESENT) )
+    {
+        l2_page = kimage_alloc_control_page(image);
+        if ( !l2_page )
+            return -ENOMEM;
+        l3e_write(l3, l3e_from_page(l2_page, __PAGE_HYPERVISOR));
+    }
+
+    l2 = l3e_to_l2e(*l3) + l2_table_offset(vaddr);
+    if ( !(l2e_get_flags(*l2) & _PAGE_PRESENT) )
+    {
+        l1_page = kimage_alloc_control_page(image);
+        if ( !l1_page )
+            return -ENOMEM;
+        l2e_write(l2, l2e_from_page(l1_page, __PAGE_HYPERVISOR));
+    }
+
+    l1 = l2e_to_l1e(*l2) + l1_table_offset(vaddr);
+    l1e_write(l1, l1e_from_pfn(paddr >> PAGE_SHIFT, __PAGE_HYPERVISOR));
+    return 0;
+}
+
+
+static int build_reloc_page_table(struct kexec_image *image)
+{
+    struct page_info *l4_page;
+    l4_pgentry_t *l4;
+    int result;
+
+    l4_page = kimage_alloc_control_page(image);
+    if ( !l4_page )
+        return -ENOMEM;
+
+    l4 = page_to_virt(l4_page);
+    result = init_level4_page(image, l4, 0, max_page << PAGE_SHIFT);
+    if ( result )
+        return result;
+
+    result = init_transition_pgtable(image, l4);
+    if ( result )
+        return result;
+
+    image->aux_page = l4_page;
+    return 0;
+}
+
+int machine_kexec_load(struct kexec_image *image)
+{
+    void *code_page;
+    int ret;
+
+    switch ( image->arch )
     {
-        if ( (k & 1) == 0 )
-        {
-            /* Even pages: machine address. */
-            prev_ma = image->page_list[k];
-        }
-        else
-        {
-            /* Odd pages: va for previous ma. */
-            if ( is_pv_32on64_domain(dom0) )
-            {
-                /*
-                 * The compatability bounce code sets up a page table
-                 * with a 1-1 mapping of the first 1G of memory so
-                 * VA==PA here.
-                 *
-                 * This Linux purgatory code still sets up separate
-                 * high and low mappings on the control page (entries
-                 * 0 and 1) but it is harmless if they are equal since
-                 * that PT is not live at the time.
-                 */
-                image->page_list[k] = prev_ma;
-            }
-            else
-            {
-                set_fixmap(fix_base + (k >> 1), prev_ma);
-                image->page_list[k] = fix_to_virt(fix_base + (k >> 1));
-            }
-        }
+    case EM_386:
+    case EM_X86_64:
+        break;
+    default:
+        return -EINVAL;
     }
 
+    code_page = page_to_virt(image->control_code_page);
+    memcpy(code_page, kexec_reloc, PAGE_SIZE);
+
+    ret = build_reloc_page_table(image);
+    if ( ret < 0 )
+        return ret;
+
     return 0;
 }
 
-void machine_kexec_unload(int type, int slot, xen_kexec_image_t *image)
+void machine_kexec_unload(struct kexec_image *image)
 {
+    /* no-op. kimage_free() frees all control pages. */
 }
 
-void machine_reboot_kexec(xen_kexec_image_t *image)
+void machine_reboot_kexec(struct kexec_image *image)
 {
     BUG_ON(smp_processor_id() != 0);
     smp_send_stop();
@@ -75,13 +216,10 @@ void machine_reboot_kexec(xen_kexec_image_t *image)
     BUG();
 }
 
-void machine_kexec(xen_kexec_image_t *image)
+void machine_kexec(struct kexec_image *image)
 {
-    struct desc_ptr gdt_desc = {
-        .base = (unsigned long)(boot_cpu_gdt_table - FIRST_RESERVED_GDT_ENTRY),
-        .limit = LAST_RESERVED_GDT_BYTE
-    };
     int i;
+    unsigned long reloc_flags = 0;
 
     /* We are about to permenantly jump out of the Xen context into the kexec
      * purgatory code.  We really dont want to be still servicing interupts.
@@ -109,29 +247,12 @@ void machine_kexec(xen_kexec_image_t *image)
      * not like running with NMIs disabled. */
     enable_nmis();
 
-    /*
-     * compat_machine_kexec() returns to idle pagetables, which requires us
-     * to be running on a static GDT mapping (idle pagetables have no GDT
-     * mappings in their per-domain mapping area).
-     */
-    asm volatile ( "lgdt %0" : : "m" (gdt_desc) );
+    if ( image->arch == EM_386 )
+        reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
 
-    if ( is_pv_32on64_domain(dom0) )
-    {
-        compat_machine_kexec(image->page_list[1],
-                             image->indirection_page,
-                             image->page_list,
-                             image->start_address);
-    }
-    else
-    {
-        relocate_new_kernel_t rnk;
-
-        rnk = (relocate_new_kernel_t) image->page_list[1];
-        (*rnk)(image->indirection_page, image->page_list,
-               image->start_address,
-               0 /* preserve_context */);
-    }
+    kexec_reloc(page_to_maddr(image->control_code_page), 
+                page_to_maddr(image->aux_page),
+                image->head, image->entry_maddr, reloc_flags);
 }
 
 int machine_kexec_get(xen_kexec_range_t *range)
diff --git a/xen/arch/x86/x86_64/Makefile b/xen/arch/x86/x86_64/Makefile
index d56e12d..7f8fb3d 100644
--- a/xen/arch/x86/x86_64/Makefile
+++ b/xen/arch/x86/x86_64/Makefile
@@ -11,11 +11,11 @@ obj-y += mmconf-fam10h.o
 obj-y += mmconfig_64.o
 obj-y += mmconfig-shared.o
 obj-y += compat.o
-obj-bin-y += compat_kexec.o
 obj-y += domain.o
 obj-y += physdev.o
 obj-y += platform_hypercall.o
 obj-y += cpu_idle.o
 obj-y += cpufreq.o
+obj-bin-y += kexec_reloc.o
 
 obj-$(crash_debug)   += gdbstub.o
diff --git a/xen/arch/x86/x86_64/compat_kexec.S b/xen/arch/x86/x86_64/compat_kexec.S
deleted file mode 100644
index fc92af9..0000000
--- a/xen/arch/x86/x86_64/compat_kexec.S
+++ /dev/null
@@ -1,187 +0,0 @@
-/*
- * Compatibility kexec handler.
- */
-
-/*
- * NOTE: We rely on Xen not relocating itself above the 4G boundary. This is
- * currently true but if it ever changes then compat_pg_table will
- * need to be moved back below 4G at run time.
- */
-
-#include <xen/config.h>
-
-#include <asm/asm_defns.h>
-#include <asm/msr.h>
-#include <asm/page.h>
-
-/* The unrelocated physical address of a symbol. */
-#define SYM_PHYS(sym)          ((sym) - __XEN_VIRT_START)
-
-/* Load physical address of symbol into register and relocate it. */
-#define RELOCATE_SYM(sym,reg)  mov $SYM_PHYS(sym), reg ; \
-                               add xen_phys_start(%rip), reg
-
-/*
- * Relocate a physical address in memory. Size of temporary register
- * determines size of the value to relocate.
- */
-#define RELOCATE_MEM(addr,reg) mov addr(%rip), reg ; \
-                               add xen_phys_start(%rip), reg ; \
-                               mov reg, addr(%rip)
-
-        .text
-
-        .code64
-
-ENTRY(compat_machine_kexec)
-        /* x86/64                        x86/32  */
-        /* %rdi - relocate_new_kernel_t  CALL    */
-        /* %rsi - indirection page       4(%esp) */
-        /* %rdx - page_list              8(%esp) */
-        /* %rcx - start address         12(%esp) */
-        /*        cpu has pae           16(%esp) */
-
-        /* Shim the 64 bit page_list into a 32 bit page_list. */
-        mov $12,%r9
-        lea compat_page_list(%rip), %rbx
-1:      dec %r9
-        movl (%rdx,%r9,8),%eax
-        movl %eax,(%rbx,%r9,4)
-        test %r9,%r9
-        jnz 1b
-
-        RELOCATE_SYM(compat_page_list,%rdx)
-
-        /* Relocate compatibility mode entry point address. */
-        RELOCATE_MEM(compatibility_mode_far,%eax)
-
-        /* Relocate compat_pg_table. */
-        RELOCATE_MEM(compat_pg_table,     %rax)
-        RELOCATE_MEM(compat_pg_table+0x8, %rax)
-        RELOCATE_MEM(compat_pg_table+0x10,%rax)
-        RELOCATE_MEM(compat_pg_table+0x18,%rax)
-
-        /*
-         * Setup an identity mapped region in PML4[0] of idle page
-         * table.
-         */
-        RELOCATE_SYM(l3_identmap,%rax)
-        or  $0x63,%rax
-        mov %rax, idle_pg_table(%rip)
-
-        /* Switch to idle page table. */
-        RELOCATE_SYM(idle_pg_table,%rax)
-        movq %rax, %cr3
-
-        /* Switch to identity mapped compatibility stack. */
-        RELOCATE_SYM(compat_stack,%rax)
-        movq %rax, %rsp
-
-        /* Save xen_phys_start for 32 bit code. */
-        movq xen_phys_start(%rip), %rbx
-
-        /* Jump to low identity mapping in compatibility mode. */
-        ljmp *compatibility_mode_far(%rip)
-        ud2
-
-compatibility_mode_far:
-        .long SYM_PHYS(compatibility_mode)
-        .long __HYPERVISOR_CS32
-
-        /*
-         * We use 5 words of stack for the arguments passed to the kernel. The
-         * kernel only uses 1 word before switching to its own stack. Allocate
-         * 16 words to give "plenty" of room.
-         */
-        .fill 16,4,0
-compat_stack:
-
-        .code32
-
-#undef RELOCATE_SYM
-#undef RELOCATE_MEM
-
-/*
- * Load physical address of symbol into register and relocate it. %rbx
- * contains xen_phys_start(%rip) saved before jump to compatibility
- * mode.
- */
-#define RELOCATE_SYM(sym,reg) mov $SYM_PHYS(sym), reg ; \
-                              add %ebx, reg
-
-compatibility_mode:
-        /* Setup some sane segments. */
-        movl $__HYPERVISOR_DS32, %eax
-        movl %eax, %ds
-        movl %eax, %es
-        movl %eax, %fs
-        movl %eax, %gs
-        movl %eax, %ss
-
-        /* Push arguments onto stack. */
-        pushl $0   /* 20(%esp) - preserve context */
-        pushl $1   /* 16(%esp) - cpu has pae */
-        pushl %ecx /* 12(%esp) - start address */
-        pushl %edx /*  8(%esp) - page list */
-        pushl %esi /*  4(%esp) - indirection page */
-        pushl %edi /*  0(%esp) - CALL */
-
-        /* Disable paging and therefore leave 64 bit mode. */
-        movl %cr0, %eax
-        andl $~X86_CR0_PG, %eax
-        movl %eax, %cr0
-
-        /* Switch to 32 bit page table. */
-        RELOCATE_SYM(compat_pg_table, %eax)
-        movl  %eax, %cr3
-
-        /* Clear MSR_EFER[LME], disabling long mode */
-        movl    $MSR_EFER,%ecx
-        rdmsr
-        btcl    $_EFER_LME,%eax
-        wrmsr
-
-        /* Re-enable paging, but only 32 bit mode now. */
-        movl %cr0, %eax
-        orl $X86_CR0_PG, %eax
-        movl %eax, %cr0
-        jmp 1f
-1:
-
-        popl %eax
-        call *%eax
-        ud2
-
-        .data
-        .align 4
-compat_page_list:
-        .fill 12,4,0
-
-        .align 32,0
-
-        /*
-         * These compat page tables contain an identity mapping of the
-         * first 4G of the physical address space.
-         */
-compat_pg_table:
-        .long SYM_PHYS(compat_pg_table_l2) + 0*PAGE_SIZE + 0x01, 0
-        .long SYM_PHYS(compat_pg_table_l2) + 1*PAGE_SIZE + 0x01, 0
-        .long SYM_PHYS(compat_pg_table_l2) + 2*PAGE_SIZE + 0x01, 0
-        .long SYM_PHYS(compat_pg_table_l2) + 3*PAGE_SIZE + 0x01, 0
-
-        .section .data.page_aligned, "aw", @progbits
-        .align PAGE_SIZE,0
-compat_pg_table_l2:
-        .macro identmap from=0, count=512
-        .if \count-1
-        identmap "(\from+0)","(\count/2)"
-        identmap "(\from+(0x200000*(\count/2)))","(\count/2)"
-        .else
-        .quad 0x00000000000000e3 + \from
-        .endif
-        .endm
-
-        identmap 0x00000000
-        identmap 0x40000000
-        identmap 0x80000000
-        identmap 0xc0000000
diff --git a/xen/arch/x86/x86_64/kexec_reloc.S b/xen/arch/x86/x86_64/kexec_reloc.S
new file mode 100644
index 0000000..e68842c
--- /dev/null
+++ b/xen/arch/x86/x86_64/kexec_reloc.S
@@ -0,0 +1,229 @@
+/*
+ * Relocate a kexec_image to its destination and call it.
+ *
+ * Copyright (C) 2013 Citrix Systems R&D Ltd.
+ *
+ * Portions derived from Linux's arch/x86/kernel/relocate_kernel_64.S.
+ * 
+ *   Copyright (C) 2002-2005 Eric Biederman  <ebiederm@xmission.com>
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
+ */
+#include <xen/config.h>
+
+#include <asm/asm_defns.h>
+#include <asm/msr.h>
+#include <asm/page.h>
+#include <asm/machine_kexec.h>
+
+/* The unrelocated physical address of a symbol. */
+#define SYM_PHYS(sym)          ((sym) - __XEN_VIRT_START)
+
+/* Load physical address of symbol into register and relocate it. */
+#define RELOCATE_SYM(sym,reg)  mov $SYM_PHYS(sym), reg ; \
+                               add xen_phys_start(%rip), reg
+
+#define DBG(c) \
+1:      mov     $0x3f8+5, %dx ; \
+        inb     %dx, %al     ; \
+        test    $0x20, %al   ; \
+        je      1b           ; \
+        mov     $0x3f8, %dx  ; \
+        mov     $c, %al      ; \
+        outb    %al, %dx     ;
+
+        .text
+	.align PAGE_SIZE
+        .code64
+
+ENTRY(kexec_reloc)
+        /* %rdi - code_page maddr */
+        /* %rsi - page table maddr */
+        /* %rdx - indirection page maddr */
+        /* %rcx - entry maddr */
+        /* %r8 - flags */
+
+        mov %rdx, %rbx
+
+        DBG('A')
+
+        /* Setup stack. */
+        RELOCATE_SYM(reloc_stack, %rax)
+        mov %rax, %rsp
+
+        DBG('B')
+
+        wbinvd
+        movq %cr4, %rax
+        andq $~(X86_CR4_PGE|X86_CR4_PCE|X86_CR4_MCE), %rax
+        movq %rax, %cr4
+
+        /* Load reloc page table. */
+        movq %rsi, %cr3
+
+        DBG('C')
+
+        /* Jump to identity mapped code. */
+        movq %rdi, %r9
+        addq $(identity_mapped - kexec_reloc), %r9
+
+        DBG('D')
+
+        jmp *%r9
+
+identity_mapped:
+        DBG('E')
+        
+        pushq %rcx
+        pushq %rbx
+        pushq %rsi
+        pushq %rdi
+
+        movq %rbx, %rdi
+        call swap_pages
+
+        popq %rdi
+        popq %rsi
+        popq %rbx
+        popq %rcx
+
+        DBG('F')
+
+        /* Need to switch to 32-bit mode? */
+        testq $KEXEC_RELOC_FLAG_COMPAT, %r8
+        jnz call_32_bit
+
+call_64_bit:
+        DBG('6')
+
+        /* Call the image entry point.  This should never return. */
+        call *%rcx
+        ud2
+
+call_32_bit:
+        DBG('3')
+
+        /* Relocate compatibility mode entry point address. */
+        movl %edi, %eax
+        addl $(compatibility_mode - kexec_reloc), %eax
+        movl %eax, compatibility_mode_far(%rip)
+
+        DBG('I')
+        
+        /* Load compat GDT. */
+        movq %rdi, %rax
+        addq $(compat_mode_gdt - kexec_reloc), %rax
+        movq %rax, (compat_mode_gdt_desc + 2)(%rip)
+        lgdt compat_mode_gdt_desc(%rip)
+
+        DBG('J')
+        
+        /* Enter compatibility mode. */
+        ljmp *compatibility_mode_far(%rip)
+
+swap_pages:
+        /* %rdi - indirection page maddr */
+        movq    %rdi, %rcx
+        xorq    %rdi, %rdi
+        xorq    %rsi, %rsi
+        jmp     1f
+
+0:      /* top, read another word for the indirection page */
+
+        movq    (%rbx), %rcx
+        addq    $8,     %rbx
+1:
+        testq   $0x1,   %rcx  /* is it a destination page? */
+        jz      2f
+        movq    %rcx,   %rdi
+        andq    $0xfffffffffffff000, %rdi
+        jmp     0b
+2:
+        testq   $0x2,   %rcx  /* is it an indirection page? */
+        jz      2f
+        movq    %rcx,   %rbx
+        andq    $0xfffffffffffff000, %rbx
+        jmp     0b
+2:
+        testq   $0x4,   %rcx  /* is it the done indicator? */
+        jz      2f
+        jmp     3f
+2:
+        testq   $0x8,   %rcx  /* is it the source indicator? */
+        jz      0b            /* Ignore it otherwise */
+        movq    %rcx,   %rsi  /* For every source page do a copy */
+        andq    $0xfffffffffffff000, %rsi
+
+        movq    %rdi, %rdx
+        movq    %rsi, %rax
+
+        movq    %r10, %rdi
+        movq    $512,   %rcx
+        rep movsq
+
+        movq    %rax, %rdi
+        movq    %rdx, %rsi
+        movq    $512,   %rcx
+        rep movsq
+
+        movq    %rdx, %rdi
+        movq    %r10, %rsi
+        movq    $512,   %rcx
+        rep movsq
+
+        lea     PAGE_SIZE(%rax), %rsi
+        jmp     0b
+3:
+        ret
+
+        .code32
+
+compatibility_mode:
+        DBG('K')
+
+        /* Setup some sane segments. */
+        movl $0x0008, %eax
+        movl %eax, %ds
+        movl %eax, %es
+        movl %eax, %fs
+        movl %eax, %gs
+        movl %eax, %ss
+
+        DBG('L')
+        
+        /* Disable paging and therefore leave 64 bit mode. */
+        movl %cr0, %eax
+        andl $~X86_CR0_PG, %eax
+        movl %eax, %cr0
+
+        DBG('M')
+
+        /* Call the image entry point.  This should never return. */
+        call *%ecx
+        ud2
+
+        .align 16
+compatibility_mode_far:
+        .long SYM_PHYS(compatibility_mode)
+        .word 0x0010
+
+        .align 16
+compat_mode_gdt_desc:
+        .word (3*8)-1
+        .quad SYM_PHYS(compat_mode_gdt)
+
+        .align 16
+compat_mode_gdt:
+        .quad 0x0000000000000000     /* null                              */
+        .quad 0x00cf92000000ffff     /* 0x0008 ring 0 data                */
+        .quad 0x00cf9a000000ffff     /* 0x0010 ring 0 code, compatibility */
+
+        /*
+         * 16 words of stack are more than enough.
+         */
+        .fill 16,8,0
+reloc_stack:
+
+        .globl kexec_reloc_size
+        .set kexec_reloc_size, . - kexec_reloc
diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index 2cbb62c..2926274 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -23,6 +23,7 @@
 #include <xen/version.h>
 #include <xen/console.h>
 #include <xen/kexec.h>
+#include <xen/kimage.h>
 #include <public/elfnote.h>
 #include <xsm/xsm.h>
 #include <xen/cpu.h>
@@ -45,7 +46,7 @@ static Elf_Note *xen_crash_note;
 
 static cpumask_t crash_saved_cpus;
 
-static xen_kexec_image_t kexec_image[KEXEC_IMAGE_NR];
+static struct kexec_image *kexec_image[KEXEC_IMAGE_NR];
 
 #define KEXEC_FLAG_DEFAULT_POS   (KEXEC_IMAGE_NR + 0)
 #define KEXEC_FLAG_CRASH_POS     (KEXEC_IMAGE_NR + 1)
@@ -309,14 +310,14 @@ void kexec_crash(void)
     kexec_common_shutdown();
     kexec_crash_save_cpu();
     machine_crash_shutdown();
-    machine_kexec(&kexec_image[KEXEC_IMAGE_CRASH_BASE + pos]);
+    machine_kexec(kexec_image[KEXEC_IMAGE_CRASH_BASE + pos]);
 
     BUG();
 }
 
 static long kexec_reboot(void *_image)
 {
-    xen_kexec_image_t *image = _image;
+    struct kexec_image *image = _image;
 
     kexecing = TRUE;
 
@@ -732,63 +733,245 @@ static void crash_save_vmcoreinfo(void)
 #endif
 }
 
-static int kexec_load_unload_internal(unsigned long op, xen_kexec_load_v1_t *load)
+static void kexec_unload_image(struct kexec_image *image)
+{
+    if ( !image )
+        return;
+
+    machine_kexec_unload(image);
+}
+
+static int kexec_exec(XEN_GUEST_HANDLE_PARAM(void) uarg)
+{
+    xen_kexec_exec_t exec;
+    struct kexec_image *image;
+    int base, bit, pos, ret = -EINVAL;
+
+    if ( unlikely(copy_from_guest(&exec, uarg, 1)) )
+        return -EFAULT;
+
+    if ( kexec_load_get_bits(exec.type, &base, &bit) )
+        return -EINVAL;
+
+    pos = (test_bit(bit, &kexec_flags) != 0);
+
+    /* Only allow kexec/kdump into loaded images */
+    if ( !test_bit(base + pos, &kexec_flags) )
+        return -ENOENT;
+
+    switch (exec.type)
+    {
+    case KEXEC_TYPE_DEFAULT:
+        image = kexec_image[base + pos];
+        ret = continue_hypercall_on_cpu(0, kexec_reboot, image);
+        break;
+    case KEXEC_TYPE_CRASH:
+        kexec_crash(); /* Does not return */
+        break;
+    }
+
+    return -EINVAL; /* never reached */
+}
+
+static int kexec_swap_images(int type, struct kexec_image *new,
+                             struct kexec_image **old)
 {
-    xen_kexec_image_t *image;
     int base, bit, pos;
-    int ret = 0;
+    int new_slot, old_slot;
+
+    *old = NULL;
+
+    spin_lock(&kexec_lock);
+
+    if ( test_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) )
+    {
+        spin_unlock(&kexec_lock);
+        return -EBUSY;
+    }
 
-    if ( kexec_load_get_bits(load->type, &base, &bit) )
+    if ( kexec_load_get_bits(type, &base, &bit) )
         return -EINVAL;
 
     pos = (test_bit(bit, &kexec_flags) != 0);
+    old_slot = base + pos;
+    new_slot = base + !pos;
 
-    /* Load the user data into an unused image */
-    if ( op == KEXEC_CMD_kexec_load )
+    if ( new )
     {
-        image = &kexec_image[base + !pos];
+        kexec_image[new_slot] = new;
+        set_bit(new_slot, &kexec_flags);
+    }
+    change_bit(bit, &kexec_flags);
 
-        BUG_ON(test_bit((base + !pos), &kexec_flags)); /* must be free */
+    clear_bit(old_slot, &kexec_flags);
+    *old = kexec_image[old_slot];
 
-        memcpy(image, &load->image, sizeof(*image));
+    spin_unlock(&kexec_lock);
 
-        if ( !(ret = machine_kexec_load(load->type, base + !pos, image)) )
-        {
-            /* Set image present bit */
-            set_bit((base + !pos), &kexec_flags);
+    return 0;
+}
 
-            /* Make new image the active one */
-            change_bit(bit, &kexec_flags);
-        }
+static int kexec_load_slot(struct kexec_image *kimage)
+{
+    struct kexec_image *old_kimage;
+    int ret = -ENOMEM;
+
+    ret = machine_kexec_load(kimage);
+    if ( ret < 0 )
+        goto error;
+
+    crash_save_vmcoreinfo();
 
-        crash_save_vmcoreinfo();
+    ret = kexec_swap_images(kimage->type, kimage, &old_kimage);
+    if ( ret < 0 )
+        goto error;
+
+    kexec_unload_image(old_kimage);
+    
+    return 0;
+
+error:
+    kimage_free(kimage);
+    return ret;
+}
+
+static uint16_t kexec_load_v1_arch(void)
+{
+#ifdef CONFIG_X86
+    return is_pv_32on64_domain(dom0) ? EM_386 : EM_X86_64;
+#else
+    return EM_NONE;
+#endif
+}
+
+static int kexec_segments_add_page(unsigned *nr_segments,
+                                   xen_kexec_segment_t *segments,
+                                   unsigned long mfn)
+{
+    unsigned long maddr = mfn << PAGE_SHIFT;
+    int n = *nr_segments;
+
+    /* Need a new segment? */
+    if ( n == 0
+         || segments[n-1].dest_maddr + segments[n-1].dest_size != maddr )
+    {
+        n++;
+        if ( n == KEXEC_SEGMENT_MAX )
+            return -EINVAL;
+        *nr_segments = n;
+
+        set_xen_guest_handle(segments[n-1].buf, NULL);
+        segments[n-1].buf_size = 0;
+        segments[n-1].dest_maddr = maddr;
+        segments[n-1].dest_size = 0;
     }
 
-    /* Unload the old image if present and load successful */
-    if ( ret == 0 && !test_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) )
+    segments[n-1].dest_size += PAGE_SIZE;
+
+    return 0;
+}
+
+static int kexec_segments_from_ind_page(unsigned long mfn,
+                                        unsigned *nr_segments,
+                                        xen_kexec_segment_t *segments)
+{
+    void *page;
+    unsigned long *entry;
+    int ret;
+
+    page = vmap(&mfn, 1);
+    if ( page == NULL )
+        return -ENOMEM;
+
+    /*
+     * Walk the indirection page list, adding destination pages to the
+     * segments.
+     */
+    for ( entry = page; ; entry++ )
     {
-        if ( test_and_clear_bit((base + pos), &kexec_flags) )
+        unsigned long ind;
+
+        ind = (*entry) & 0xf;
+        mfn = (*entry) >> PAGE_SHIFT;
+
+        switch ( ind )
         {
-            image = &kexec_image[base + pos];
-            machine_kexec_unload(load->type, base + pos, image);
+        case IND_DESTINATION:
+            ret = kexec_segments_add_page(nr_segments, segments, mfn);
+            if ( ret < 0 )
+                return ret;
+            break;
+        case IND_INDIRECTION:
+            vunmap(page);
+            page = vmap(&mfn, 1);
+            if ( page == NULL )
+                return -ENOMEM;
+            entry = page;
+            break;
+        case IND_DONE:
+            goto done;
+        case IND_SOURCE:
+            break;
         }
     }
+done:
+    return 0;
+}
+
+static int kexec_do_load_v1(xen_kexec_load_v1_t *load)
+{
+    struct kexec_image *kimage = NULL;
+    xen_kexec_segment_t *segments;
+    uint16_t arch;
+    unsigned nr_segments = 0;
+    int ret;
+
+    arch = kexec_load_v1_arch();
+    if ( arch == EM_NONE )
+        return -ENOSYS;
+
+    segments = xmalloc_array(xen_kexec_segment_t, KEXEC_SEGMENT_MAX);
+    if ( segments == NULL )
+        return -ENOMEM;
+
+    ret = kexec_segments_from_ind_page(load->image.indirection_page >> PAGE_SHIFT,
+                                       &nr_segments, segments);
+    if ( ret < 0 )
+        goto error;
+
+    ret = kimage_alloc(&kimage, load->type, arch, load->image.start_address,
+                       nr_segments, segments);
+    if ( ret < 0 )
+        goto error;
+
+    /* kexec_reloc() uses the same format for the indirection pages so
+       reuse the provided ones. */
+    kimage->head = load->image.indirection_page;
+
+    ret = kexec_load_slot(kimage);
+    if ( ret < 0 )
+        goto error;
+
+    return 0;
 
+error:
+    if ( !kimage )
+        xfree(segments);
+    kimage_free(kimage);
     return ret;
 }
 
-static int kexec_load_unload(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) uarg)
+static int kexec_load_v1(XEN_GUEST_HANDLE_PARAM(void) uarg)
 {
     xen_kexec_load_v1_t load;
 
     if ( unlikely(copy_from_guest(&load, uarg, 1)) )
         return -EFAULT;
 
-    return kexec_load_unload_internal(op, &load);
+    return kexec_do_load_v1(&load);
 }
 
-static int kexec_load_unload_compat(unsigned long op,
-                                    XEN_GUEST_HANDLE_PARAM(void) uarg)
+static int kexec_load_v1_compat(XEN_GUEST_HANDLE_PARAM(void) uarg)
 {
 #ifdef CONFIG_COMPAT
     compat_kexec_load_v1_t compat_load;
@@ -807,49 +990,113 @@ static int kexec_load_unload_compat(unsigned long op,
     load.type = compat_load.type;
     XLAT_kexec_image(&load.image, &compat_load.image);
 
-    return kexec_load_unload_internal(op, &load);
-#else /* CONFIG_COMPAT */
+    return kexec_do_load_v1(&load);
+#else
     return 0;
-#endif /* CONFIG_COMPAT */
+#endif
 }
 
-static int kexec_exec(XEN_GUEST_HANDLE_PARAM(void) uarg)
+static int kexec_load(XEN_GUEST_HANDLE_PARAM(void) uarg)
 {
-    xen_kexec_exec_t exec;
-    xen_kexec_image_t *image;
-    int base, bit, pos, ret = -EINVAL;
+    xen_kexec_load_t load;
+    xen_kexec_segment_t *segments;
+    struct kexec_image *kimage = NULL;
+    int ret;
 
-    if ( unlikely(copy_from_guest(&exec, uarg, 1)) )
+    if ( copy_from_guest(&load, uarg, 1) )
         return -EFAULT;
 
-    if ( kexec_load_get_bits(exec.type, &base, &bit) )
+    if ( load.nr_segments >= KEXEC_SEGMENT_MAX )
         return -EINVAL;
 
-    pos = (test_bit(bit, &kexec_flags) != 0);
-
-    /* Only allow kexec/kdump into loaded images */
-    if ( !test_bit(base + pos, &kexec_flags) )
-        return -ENOENT;
+    segments = xmalloc_array(xen_kexec_segment_t, load.nr_segments);
+    if ( segments == NULL )
+        return -ENOMEM;
 
-    switch (exec.type)
+    if ( copy_from_guest(segments, load.segments, load.nr_segments) )
     {
-    case KEXEC_TYPE_DEFAULT:
-        image = &kexec_image[base + pos];
-        ret = continue_hypercall_on_cpu(0, kexec_reboot, image);
-        break;
-    case KEXEC_TYPE_CRASH:
-        kexec_crash(); /* Does not return */
-        break;
+        ret = -EFAULT;
+        goto error;
     }
 
-    return -EINVAL; /* never reached */
+    ret = kimage_alloc(&kimage, load.type, load.arch, load.entry_maddr,
+                       load.nr_segments, segments);
+    if ( ret < 0 )
+        goto error;
+
+    ret = kimage_load_segments(kimage);
+    if ( ret < 0 )
+        goto error;
+
+    ret = kexec_load_slot(kimage);
+    if ( ret < 0 )
+        goto error;
+
+    return 0;
+
+error:
+    if ( ! kimage )
+        xfree(segments);
+    kimage_free(kimage);
+    return ret;
+}
+
+static int kexec_do_unload(xen_kexec_unload_t *unload)
+{
+    struct kexec_image *old_kimage;
+    int ret;
+
+    ret = kexec_swap_images(unload->type, NULL, &old_kimage);
+    if ( ret < 0 )
+        return ret;
+
+    kexec_unload_image(old_kimage);
+
+    return 0;
+}
+
+static int kexec_unload_v1(XEN_GUEST_HANDLE_PARAM(void) uarg)
+{
+    xen_kexec_load_v1_t load;
+    xen_kexec_unload_t unload;
+
+    if ( copy_from_guest(&load, uarg, 1) )
+        return -EFAULT;
+
+    unload.type = load.type;
+    return kexec_do_unload(&unload);
+}
+
+static int kexec_unload_v1_compat(XEN_GUEST_HANDLE_PARAM(void) uarg)
+{
+#ifdef CONFIG_COMPAT
+    compat_kexec_load_v1_t compat_load;
+    xen_kexec_unload_t unload;
+
+    if ( copy_from_guest(&compat_load, uarg, 1) )
+        return -EFAULT;
+
+    unload.type = compat_load.type;
+    return kexec_do_unload(&unload);
+#else
+    return 0;
+#endif
+}
+
+static int kexec_unload(XEN_GUEST_HANDLE_PARAM(void) uarg)
+{
+    xen_kexec_unload_t unload;
+
+    if ( unlikely(copy_from_guest(&unload, uarg, 1)) )
+        return -EFAULT;
+
+    return kexec_do_unload(&unload);
 }
 
 static int do_kexec_op_internal(unsigned long op,
                                 XEN_GUEST_HANDLE_PARAM(void) uarg,
                                 bool_t compat)
 {
-    unsigned long flags;
     int ret = -EINVAL;
 
     ret = xsm_kexec(XSM_PRIV);
@@ -865,20 +1112,26 @@ static int do_kexec_op_internal(unsigned long op,
                 ret = kexec_get_range(uarg);
         break;
     case KEXEC_CMD_kexec_load_v1:
+        if ( compat )
+            ret = kexec_load_v1_compat(uarg);
+        else
+            ret = kexec_load_v1(uarg);
+        break;
     case KEXEC_CMD_kexec_unload_v1:
-        spin_lock_irqsave(&kexec_lock, flags);
-        if (!test_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags))
-        {
-                if (compat)
-                        ret = kexec_load_unload_compat(op, uarg);
-                else
-                        ret = kexec_load_unload(op, uarg);
-        }
-        spin_unlock_irqrestore(&kexec_lock, flags);
+        if ( compat )
+            ret = kexec_unload_v1_compat(uarg);
+        else
+            ret = kexec_unload_v1(uarg);
         break;
     case KEXEC_CMD_kexec:
         ret = kexec_exec(uarg);
         break;
+    case KEXEC_CMD_kexec_load:
+        ret = kexec_load(uarg);
+        break;
+    case KEXEC_CMD_kexec_unload:
+        ret = kexec_unload(uarg);
+        break;
     }
 
     return ret;
diff --git a/xen/include/asm-x86/fixmap.h b/xen/include/asm-x86/fixmap.h
index 2eefcf4..1695228 100644
--- a/xen/include/asm-x86/fixmap.h
+++ b/xen/include/asm-x86/fixmap.h
@@ -57,9 +57,6 @@ enum fixed_addresses {
     FIX_ACPI_END = FIX_ACPI_BEGIN + FIX_ACPI_PAGES - 1,
     FIX_HPET_BASE,
     FIX_CYCLONE_TIMER,
-    FIX_KEXEC_BASE_0,
-    FIX_KEXEC_BASE_END = FIX_KEXEC_BASE_0 \
-      + ((KEXEC_XEN_NO_PAGES >> 1) * KEXEC_IMAGE_NR) - 1,
     FIX_IOMMU_REGS_BASE_0,
     FIX_IOMMU_REGS_END = FIX_IOMMU_REGS_BASE_0 + MAX_IOMMUS-1,
     FIX_IOMMU_MMIO_BASE_0,
diff --git a/xen/include/asm-x86/machine_kexec.h b/xen/include/asm-x86/machine_kexec.h
new file mode 100644
index 0000000..ec41099
--- /dev/null
+++ b/xen/include/asm-x86/machine_kexec.h
@@ -0,0 +1,14 @@
+#ifndef __X86_MACHINE_KEXEC_H__
+#define __X86_MACHINE_KEXEC_H__
+
+#define KEXEC_RELOC_FLAG_COMPAT 0x1 /* 32-bit image */
+
+#ifndef __ASSEMBLY__
+
+extern void kexec_reloc(unsigned long reloc_code, unsigned long reloc_pt,
+                        unsigned long ind_maddr, unsigned long entry_maddr,
+                        unsigned long flags);
+
+#endif
+
+#endif /* __X86_MACHINE_KEXEC_H__ */
diff --git a/xen/include/xen/kexec.h b/xen/include/xen/kexec.h
index b3ca8b0..b1177d8 100644
--- a/xen/include/xen/kexec.h
+++ b/xen/include/xen/kexec.h
@@ -6,6 +6,7 @@
 #include <public/kexec.h>
 #include <asm/percpu.h>
 #include <xen/elfcore.h>
+#include <xen/kimage.h>
 
 typedef struct xen_kexec_reserve {
     unsigned long size;
@@ -40,11 +41,11 @@ extern enum low_crashinfo low_crashinfo_mode;
 extern paddr_t crashinfo_maxaddr_bits;
 void kexec_early_calculations(void);
 
-int machine_kexec_load(int type, int slot, xen_kexec_image_t *image);
-void machine_kexec_unload(int type, int slot, xen_kexec_image_t *image);
+int machine_kexec_load(struct kexec_image *image);
+void machine_kexec_unload(struct kexec_image *image);
 void machine_kexec_reserved(xen_kexec_reserve_t *reservation);
-void machine_reboot_kexec(xen_kexec_image_t *image);
-void machine_kexec(xen_kexec_image_t *image);
+void machine_reboot_kexec(struct kexec_image *image);
+void machine_kexec(struct kexec_image *image);
 void kexec_crash(void);
 void kexec_crash_save_cpu(void);
 crash_xen_info_t *kexec_crash_save_info(void);
@@ -52,11 +53,6 @@ void machine_crash_shutdown(void);
 int machine_kexec_get(xen_kexec_range_t *range);
 int machine_kexec_get_xen(xen_kexec_range_t *range);
 
-void compat_machine_kexec(unsigned long rnk,
-                          unsigned long indirection_page,
-                          unsigned long *page_list,
-                          unsigned long start_address);
-
 /* vmcoreinfo stuff */
 #define VMCOREINFO_BYTES           (4096)
 #define VMCOREINFO_NOTE_NAME       "VMCOREINFO_XEN"
-- 
1.7.2.5
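
Note: the v1 unload wrappers in this patch only copy the type field out
of the legacy argument and hand it to the common unload path.  A minimal
standalone sketch of that adapter pattern follows.  The mock_* names,
the field types and the slot table are assumptions made for
illustration; they are not the Xen definitions.

```c
#include <errno.h>

/* Mock stand-ins for the legacy and new argument layouts (assumed). */
struct mock_load_v1 { unsigned int type; unsigned long image[4]; };
struct mock_unload  { unsigned int type; };

#define MOCK_NR_TYPES 2
static int mock_loaded[MOCK_NR_TYPES]; /* 1 if an image occupies the slot */

/* Common path: validate the type, then clear the slot. */
static int mock_do_unload(const struct mock_unload *unload)
{
    if ( unload->type >= MOCK_NR_TYPES )
        return -EINVAL;

    mock_loaded[unload->type] = 0;
    return 0;
}

/* Legacy entry point: translate the old argument, reuse the common path. */
static int mock_unload_v1(const struct mock_load_v1 *load)
{
    struct mock_unload unload;

    unload.type = load->type;
    return mock_do_unload(&unload);
}
```

Keeping the translation in a thin wrapper means only the common path
has to know how a slot is actually released.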


^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH 6/8] xen: kexec crash image when dom0 crashes
  2013-02-21 17:48 [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels David Vrabel
                   ` (10 preceding siblings ...)
  2013-02-21 17:48 ` [PATCH 6/8] xen: kexec crash image when dom0 crashes David Vrabel
@ 2013-02-21 17:48 ` David Vrabel
  2013-02-21 17:48 ` [PATCH 7/8] libxc: add hypercall buffer arrays David Vrabel
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 106+ messages in thread
From: David Vrabel @ 2013-02-21 17:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Daniel Kiper, kexec, David Vrabel

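From: David Vrabel <david.vrabel@citrix.com>

Execute the kexec crash image when dom0 crashes.  This avoids having
to alter dom0 kernels to make an exec sub-op call on crash; the
existing SHUTDOWN_crash is used instead.
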
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
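The intended ordering can be modelled as below.  This is only a sketch:
it assumes kexec_crash() returns to its caller when no crash image is
loaded, which this patch does not show, and every mock_* name is
hypothetical.

```c
#include <stdbool.h>

enum mock_outcome { MOCK_CRASH_KERNEL, MOCK_REBOOT };

/* Set to true once a crash image has been loaded (assumed state). */
static bool mock_crash_image_loaded;

/* Models kexec_crash(): true means control passed to the crash image. */
static bool mock_kexec_crash(void)
{
    return mock_crash_image_loaded;
}

/* Models the dom0 crash path: try the crash image first, then reboot. */
static enum mock_outcome mock_dom0_crashed(void)
{
    if ( mock_kexec_crash() )
        return MOCK_CRASH_KERNEL;

    return MOCK_REBOOT;
}
```

With no crash image loaded the model falls through to the reboot, which
matches the behaviour before this patch.
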
 xen/common/shutdown.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/xen/common/shutdown.c b/xen/common/shutdown.c
index b18ef5d..12aa034 100644
--- a/xen/common/shutdown.c
+++ b/xen/common/shutdown.c
@@ -46,6 +46,9 @@ void dom0_shutdown(u8 reason)
     {
         debugger_trap_immediate();
         printk("Domain 0 crashed: ");
+#ifdef CONFIG_KEXEC
+        kexec_crash();
+#endif
         maybe_reboot();
         break; /* not reached */
     }
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH 7/8] libxc: add hypercall buffer arrays
  2013-02-21 17:48 [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels David Vrabel
                   ` (11 preceding siblings ...)
  2013-02-21 17:48 ` David Vrabel
@ 2013-02-21 17:48 ` David Vrabel
  2013-02-21 17:48 ` David Vrabel
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 106+ messages in thread
From: David Vrabel @ 2013-02-21 17:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Daniel Kiper, kexec, David Vrabel

From: David Vrabel <david.vrabel@citrix.com>

Hypercall buffer arrays are used when a hypercall takes a
variable-length array of buffers.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
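The lifecycle described in the xenctrl.h comment (create, allocate each
slot in a loop, look a slot up again, destroy) can be sketched without
libxc as follows.  The mock_* functions are malloc-backed stand-ins
written for illustration, not the libxc API.  Unlike the patch, which
calls abort() on a bad index or a reused slot, the mock returns NULL.

```c
#include <stdlib.h>
#include <string.h>

struct mock_buf { void *hbuf; size_t size; };
struct mock_buf_array { unsigned max_bufs; struct mock_buf *bufs; };

static struct mock_buf_array *mock_array_create(unsigned n)
{
    struct mock_buf_array *array = malloc(sizeof(*array));

    if ( array == NULL )
        return NULL;
    array->bufs = calloc(n, sizeof(*array->bufs));
    if ( array->bufs == NULL )
    {
        free(array);
        return NULL;
    }
    array->max_bufs = n;
    return array;
}

/* Allocate slot 'index'; NULL on bad index, reused slot or no memory. */
static void *mock_array_alloc(struct mock_buf_array *array,
                              unsigned index, size_t size)
{
    if ( index >= array->max_bufs || array->bufs[index].hbuf != NULL )
        return NULL;
    array->bufs[index].hbuf = calloc(1, size);
    if ( array->bufs[index].hbuf != NULL )
        array->bufs[index].size = size;
    return array->bufs[index].hbuf;
}

/* Look up a previously allocated slot; NULL if out of range or empty. */
static void *mock_array_get(const struct mock_buf_array *array,
                            unsigned index)
{
    if ( index >= array->max_bufs )
        return NULL;
    return array->bufs[index].hbuf;
}

/* Free every slot, then the array itself.  NULL is accepted. */
static void mock_array_destroy(struct mock_buf_array *array)
{
    unsigned i;

    if ( array == NULL )
        return;
    for ( i = 0; i < array->max_bufs; i++ )
        free(array->bufs[i].hbuf);
    free(array->bufs);
    free(array);
}

/* Full lifecycle: returns 0 if every slot reads back what was written. */
static int mock_fill_and_check(unsigned n, size_t size)
{
    struct mock_buf_array *array = mock_array_create(n);
    unsigned i;
    int ret = -1;

    if ( array == NULL )
        return -1;
    for ( i = 0; i < n; i++ )
    {
        unsigned char *p = mock_array_alloc(array, i, size);

        if ( p == NULL )
            goto out;
        memset(p, (int)(i + 1), size);
    }
    for ( i = 0; i < n; i++ )
    {
        const unsigned char *p = mock_array_get(array, i);

        if ( p == NULL || p[0] != (unsigned char)(i + 1) )
            goto out;
    }
    ret = 0;
out:
    mock_array_destroy(array); /* single cleanup point for all slots */
    return ret;
}
```

The point of the array is the single cleanup call: an error part-way
through the allocation loop needs no per-buffer unwinding.
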
 tools/libxc/xc_hcall_buf.c |   73 ++++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xenctrl.h      |   27 ++++++++++++++++
 2 files changed, 100 insertions(+), 0 deletions(-)

diff --git a/tools/libxc/xc_hcall_buf.c b/tools/libxc/xc_hcall_buf.c
index ced9abd..3e01f3f 100644
--- a/tools/libxc/xc_hcall_buf.c
+++ b/tools/libxc/xc_hcall_buf.c
@@ -228,6 +228,79 @@ void xc__hypercall_bounce_post(xc_interface *xch, xc_hypercall_buffer_t *b)
     xc__hypercall_buffer_free(xch, b);
 }
 
+struct xc_hypercall_buffer_array {
+    unsigned max_bufs;
+    xc_hypercall_buffer_t *bufs;
+};
+
+xc_hypercall_buffer_array_t *xc_hypercall_buffer_array_create(xc_interface *xch,
+                                                              unsigned n)
+{
+    xc_hypercall_buffer_array_t *array;
+    xc_hypercall_buffer_t *bufs = NULL;
+
+    array = malloc(sizeof(*array));
+    if ( array == NULL )
+        goto error;
+
+    bufs = calloc(n, sizeof(*bufs));
+    if ( bufs == NULL )
+        goto error;
+
+    array->max_bufs = n;
+    array->bufs     = bufs;
+
+    return array;
+
+error:
+    free(bufs);
+    free(array);
+    return NULL;
+}
+
+void *xc__hypercall_buffer_array_alloc(xc_interface *xch,
+                                       xc_hypercall_buffer_array_t *array,
+                                       unsigned index,
+                                       xc_hypercall_buffer_t *hbuf,
+                                       size_t size)
+{
+    void *buf;
+
+    if ( index >= array->max_bufs || array->bufs[index].hbuf )
+        abort();
+
+    buf = xc__hypercall_buffer_alloc(xch, hbuf, size);
+    if ( buf )
+        array->bufs[index] = *hbuf;
+    return buf;
+}
+
+void *xc__hypercall_buffer_array_get(xc_interface *xch,
+                                     xc_hypercall_buffer_array_t *array,
+                                     unsigned index,
+                                     xc_hypercall_buffer_t *hbuf)
+{
+    if ( index >= array->max_bufs || array->bufs[index].hbuf == NULL )
+        abort();
+
+    *hbuf = array->bufs[index];
+    return array->bufs[index].hbuf;
+}
+
+void xc_hypercall_buffer_array_destroy(xc_interface *xc,
+                                       xc_hypercall_buffer_array_t *array)
+{
+    unsigned i;
+
+    if ( array == NULL )
+        return;
+
+    for (i = 0; i < array->max_bufs; i++ )
+        xc__hypercall_buffer_free(xc, &array->bufs[i]);
+    free(array->bufs);
+    free(array);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 32122fd..c3b2c28 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -317,6 +317,33 @@ void xc__hypercall_buffer_free_pages(xc_interface *xch, xc_hypercall_buffer_t *b
 #define xc_hypercall_buffer_free_pages(_xch, _name, _nr) xc__hypercall_buffer_free_pages(_xch, HYPERCALL_BUFFER(_name), _nr)
 
 /*
+ * Array of hypercall buffers.
+ *
+ * Create an array with xc_hypercall_buffer_array_create() and
+ * populate it by declaring one hypercall buffer in a loop and
+ * allocating the buffer with xc_hypercall_buffer_array_alloc().
+ *
+ * To access a previously allocated buffer, declare a new hypercall
+ * buffer and call xc_hypercall_buffer_array_get().
+ *
+ * Destroy the array with xc_hypercall_buffer_array_destroy() to free
+ * the array and all its allocated hypercall buffers.
+ */
+struct xc_hypercall_buffer_array;
+typedef struct xc_hypercall_buffer_array xc_hypercall_buffer_array_t;
+
+xc_hypercall_buffer_array_t *xc_hypercall_buffer_array_create(xc_interface *xch, unsigned n);
+void *xc__hypercall_buffer_array_alloc(xc_interface *xch, xc_hypercall_buffer_array_t *array,
+                                       unsigned index, xc_hypercall_buffer_t *hbuf, size_t size);
+#define xc_hypercall_buffer_array_alloc(_xch, _array, _index, _name, _size) \
+    xc__hypercall_buffer_array_alloc(_xch, _array, _index, HYPERCALL_BUFFER(_name), _size)
+void *xc__hypercall_buffer_array_get(xc_interface *xch, xc_hypercall_buffer_array_t *array,
+                                     unsigned index, xc_hypercall_buffer_t *hbuf);
+#define xc_hypercall_buffer_array_get(_xch, _array, _index, _name, _size) \
+    xc__hypercall_buffer_array_get(_xch, _array, _index, HYPERCALL_BUFFER(_name))
+void xc_hypercall_buffer_array_destroy(xc_interface *xc, xc_hypercall_buffer_array_t *array);
+
+/*
  * CPUMAP handling
  */
 typedef uint8_t *xc_cpumap_t;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH 8/8] libxc: add API for kexec hypercall
  2013-02-21 17:48 [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels David Vrabel
                   ` (13 preceding siblings ...)
  2013-02-21 17:48 ` David Vrabel
@ 2013-02-21 17:48 ` David Vrabel
  2013-02-21 17:48 ` David Vrabel
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 106+ messages in thread
From: David Vrabel @ 2013-02-21 17:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Daniel Kiper, kexec, David Vrabel

From: David Vrabel <david.vrabel@citrix.com>

Add xc_kexec(), xc_kexec_get_range(), xc_kexec_load(), and
xc_kexec_unload().  The load and unload calls require the v2 load and
unload ops.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
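The xc_kexec_load() documentation lists EINVAL when the entry point is
not within one of the segments.  A caller could pre-check that condition
along the lines below.  This is a sketch only: struct mock_segment and
its field names are assumptions for illustration, and the hypervisor's
own check is not part of this patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a segment's destination range. */
struct mock_segment {
    uint64_t dest_maddr; /* machine address the segment is loaded at */
    uint64_t dest_size;  /* size of the destination range in bytes */
};

/* True if entry_maddr lies inside the destination of any segment. */
static bool mock_entry_in_segments(uint64_t entry_maddr,
                                   const struct mock_segment *segs,
                                   size_t nr)
{
    size_t i;

    for ( i = 0; i < nr; i++ )
    {
        /* Subtraction form avoids overflow in dest_maddr + dest_size. */
        if ( entry_maddr >= segs[i].dest_maddr &&
             entry_maddr - segs[i].dest_maddr < segs[i].dest_size )
            return true;
    }

    return false;
}
```

The range is treated as half-open, so an entry point equal to
dest_maddr + dest_size is rejected.
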
 tools/libxc/Makefile   |    1 +
 tools/libxc/xc_kexec.c |  140 ++++++++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xenctrl.h  |   55 +++++++++++++++++++
 3 files changed, 196 insertions(+), 0 deletions(-)
 create mode 100644 tools/libxc/xc_kexec.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index d44abf9..39badf9 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -31,6 +31,7 @@ CTRL_SRCS-y       += xc_mem_access.c
 CTRL_SRCS-y       += xc_memshr.c
 CTRL_SRCS-y       += xc_hcall_buf.c
 CTRL_SRCS-y       += xc_foreign_memory.c
+CTRL_SRCS-y       += xc_kexec.c
 CTRL_SRCS-y       += xtl_core.c
 CTRL_SRCS-y       += xtl_logger_stdio.c
 CTRL_SRCS-$(CONFIG_X86) += xc_pagetab.c
diff --git a/tools/libxc/xc_kexec.c b/tools/libxc/xc_kexec.c
new file mode 100644
index 0000000..88d0278
--- /dev/null
+++ b/tools/libxc/xc_kexec.c
@@ -0,0 +1,140 @@
+/******************************************************************************
+ * xc_kexec.c
+ *
+ * API for loading and executing kexec images.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * Copyright (C) 2013 Citrix Systems R&D Ltd.
+ */
+#include "xc_private.h"
+
+int xc_kexec(xc_interface *xch, int type)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_kexec_exec_t, exec);
+    int ret = -1;
+
+    exec = xc_hypercall_buffer_alloc(xch, exec, sizeof(*exec));
+    if ( exec == NULL )
+    {
+        PERROR("Could not alloc bounce buffer for kexec_exec hypercall");
+        goto out;
+    }
+
+    exec->type = type;
+
+    hypercall.op = __HYPERVISOR_kexec_op;
+    hypercall.arg[0] = KEXEC_CMD_kexec;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(exec);
+
+    ret = do_xen_hypercall(xch, &hypercall);
+
+out:
+    xc_hypercall_buffer_free(xch, exec);
+
+    return ret;
+}
+
+int xc_kexec_get_range(xc_interface *xch, int range,  int nr,
+                       uint64_t *size, uint64_t *start)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_kexec_range_t, get_range);
+    int ret = -1;
+
+    get_range = xc_hypercall_buffer_alloc(xch, get_range, sizeof(*get_range));
+    if ( get_range == NULL )
+    {
+        PERROR("Could not alloc bounce buffer for kexec_get_range hypercall");
+        goto out;
+    }
+
+    get_range->range = range;
+    get_range->nr = nr;
+
+    hypercall.op = __HYPERVISOR_kexec_op;
+    hypercall.arg[0] = KEXEC_CMD_kexec_get_range;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(get_range);
+
+    ret = do_xen_hypercall(xch, &hypercall);
+
+    *size = get_range->size;
+    *start = get_range->start;
+
+out:
+    xc_hypercall_buffer_free(xch, get_range);
+
+    return ret;
+}
+
+int xc_kexec_load(xc_interface *xch, uint8_t type, uint16_t arch,
+                  uint64_t entry_maddr,
+                  uint32_t nr_segments, xen_kexec_segment_t *segments)
+{
+    int ret = -1;
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BOUNCE(segments, sizeof(*segments) * nr_segments,
+                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    DECLARE_HYPERCALL_BUFFER(xen_kexec_load_t, load);
+    
+    if ( xc_hypercall_bounce_pre(xch, segments) )
+    {
+        PERROR("Could not allocate bounce buffer for kexec load hypercall");
+        goto out;
+    }
+    load = xc_hypercall_buffer_alloc(xch, load, sizeof(*load));
+    if ( load == NULL )
+    {
+        PERROR("Could not allocate buffer for kexec load hypercall");
+        goto out;
+    }
+
+    load->type = type;
+    load->arch = arch;
+    load->entry_maddr = entry_maddr;
+    load->nr_segments = nr_segments;
+    set_xen_guest_handle(load->segments, segments);
+
+    hypercall.op = __HYPERVISOR_kexec_op;
+    hypercall.arg[0] = KEXEC_CMD_kexec_load;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(load);
+
+    ret = do_xen_hypercall(xch, &hypercall);
+
+out:
+    xc_hypercall_buffer_free(xch, load);
+    xc_hypercall_bounce_post(xch, segments);
+
+    return ret;
+}
+
+int xc_kexec_unload(xc_interface *xch, int type)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_kexec_unload_t, unload);
+    int ret = -1;
+
+    unload = xc_hypercall_buffer_alloc(xch, unload, sizeof(*unload));
+    if ( unload == NULL )
+    {
+        PERROR("Could not alloc buffer for kexec unload hypercall");
+        goto out;
+    }
+
+    unload->type = type;
+
+    hypercall.op = __HYPERVISOR_kexec_op;
+    hypercall.arg[0] = KEXEC_CMD_kexec_unload;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(unload);
+
+    ret = do_xen_hypercall(xch, &hypercall);
+
+out:
+    xc_hypercall_buffer_free(xch, unload);
+
+    return ret;
+}
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index c3b2c28..d6c4877 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -46,6 +46,7 @@
 #include <xen/hvm/params.h>
 #include <xen/xsm/flask_op.h>
 #include <xen/tmem.h>
+#include <xen/kexec.h>
 
 #include "xentoollog.h"
 
@@ -2263,4 +2264,58 @@ int xc_compression_uncompress_page(xc_interface *xch, char *compbuf,
 				   unsigned long compbuf_size,
 				   unsigned long *compbuf_pos, char *dest);
 
+/*
+ * Execute an image previously loaded with xc_kexec_load().
+ *
+ * Does not return on success.
+ *
+ * Fails with:
+ *   ENOENT if the specified image has not been loaded.
+ */
+int xc_kexec(xc_interface *xch, int type);
+
+/*
+ * Find the machine address and size of certain memory areas.
+ *
+ *   KEXEC_RANGE_MA_CRASH       crash area
+ *   KEXEC_RANGE_MA_XEN         Xen itself
+ *   KEXEC_RANGE_MA_CPU         CPU note for CPU number 'nr'
+ *   KEXEC_RANGE_MA_XENHEAP     xenheap
+ *   KEXEC_RANGE_MA_EFI_MEMMAP  EFI Memory Map
+ *   KEXEC_RANGE_MA_VMCOREINFO  vmcoreinfo
+ *
+ * Fails with:
+ *   EINVAL if the range or CPU number isn't valid.
+ */
+int xc_kexec_get_range(xc_interface *xch, int range,  int nr,
+                       uint64_t *size, uint64_t *start);
+
+/*
+ * Load a kexec image into memory.
+ *
+ * The image may be of type KEXEC_TYPE_DEFAULT (executed on request)
+ * or KEXEC_TYPE_CRASH (executed on a crash).
+ *
+ * The image architecture may be a 32-bit variant of the hypervisor
+ * architecture (e.g., EM_386 on an x86-64 hypervisor).
+ *
+ * Fails with:
+ *   ENOMEM if there is insufficient memory for the new image.
+ *   EINVAL if the image does not fit into the crash area or the entry
+ *          point isn't within one of the segments.
+ *   EBUSY  if another image is being executed.
+ */
+int xc_kexec_load(xc_interface *xch, uint8_t type, uint16_t arch,
+                  uint64_t entry_maddr,
+                  uint32_t nr_segments, xen_kexec_segment_t *segments);
+
+/*
+ * Unload a kexec image.
+ *
+ * This prevents a KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH image from
+ * being executed.  The crash images are not cleared from the crash
+ * region.
+ */
+int xc_kexec_unload(xc_interface *xch, int type);
+
 #endif /* XENCTRL_H */
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 106+ messages in thread

* [PATCH 8/8] libxc: add API for kexec hypercall
  2013-02-21 17:48 [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels David Vrabel
                   ` (14 preceding siblings ...)
  2013-02-21 17:48 ` [PATCH 8/8] libxc: add API for kexec hypercall David Vrabel
@ 2013-02-21 17:48 ` David Vrabel
  2013-03-07  2:46   ` Ian Campbell
  2013-03-07  2:46   ` [Xen-devel] " Ian Campbell
  2013-02-21 22:47 ` [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels Daniel Kiper
                   ` (7 subsequent siblings)
  23 siblings, 2 replies; 106+ messages in thread
From: David Vrabel @ 2013-02-21 17:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Daniel Kiper, kexec, David Vrabel

From: David Vrabel <david.vrabel@citrix.com>

Add xc_kexec_exec(), xc_kexec_get_ranges(), xc_kexec_load(), and
xc_kexec_unload().  The load and unload calls require the v2 load and
unload ops.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 tools/libxc/Makefile   |    1 +
 tools/libxc/xc_kexec.c |  140 ++++++++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xenctrl.h  |   55 +++++++++++++++++++
 3 files changed, 196 insertions(+), 0 deletions(-)
 create mode 100644 tools/libxc/xc_kexec.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index d44abf9..39badf9 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -31,6 +31,7 @@ CTRL_SRCS-y       += xc_mem_access.c
 CTRL_SRCS-y       += xc_memshr.c
 CTRL_SRCS-y       += xc_hcall_buf.c
 CTRL_SRCS-y       += xc_foreign_memory.c
+CTRL_SRCS-y       += xc_kexec.c
 CTRL_SRCS-y       += xtl_core.c
 CTRL_SRCS-y       += xtl_logger_stdio.c
 CTRL_SRCS-$(CONFIG_X86) += xc_pagetab.c
diff --git a/tools/libxc/xc_kexec.c b/tools/libxc/xc_kexec.c
new file mode 100644
index 0000000..88d0278
--- /dev/null
+++ b/tools/libxc/xc_kexec.c
@@ -0,0 +1,140 @@
+/******************************************************************************
+ * xc_kexec.c
+ *
+ * API for loading and executing kexec images.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * Copyright (C) 2013 Citrix Systems R&D Ltd.
+ */
+#include "xc_private.h"
+
+int xc_kexec(xc_interface *xch, int type)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_kexec_exec_t, exec);
+    int ret = -1;
+
+    exec = xc_hypercall_buffer_alloc(xch, exec, sizeof(*exec));
+    if ( exec == NULL )
+    {
+        PERROR("Count not alloc bounce buffer for kexec_exec hypercall");
+        goto out;
+    }
+
+    exec->type = type;
+
+    hypercall.op = __HYPERVISOR_kexec_op;
+    hypercall.arg[0] = KEXEC_CMD_kexec;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(exec);
+
+    ret = do_xen_hypercall(xch, &hypercall);
+
+out:
+    xc_hypercall_buffer_free(xch, exec);
+
+    return ret;
+}
+
+int xc_kexec_get_range(xc_interface *xch, int range,  int nr,
+                       uint64_t *size, uint64_t *start)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_kexec_range_t, get_range);
+    int ret = -1;
+
+    get_range = xc_hypercall_buffer_alloc(xch, get_range, sizeof(*get_range));
+    if ( get_range == NULL )
+    {
+        PERROR("Could not alloc bounce buffer for kexec_get_range hypercall");
+        goto out;
+    }
+
+    get_range->range = range;
+    get_range->nr = nr;
+
+    hypercall.op = __HYPERVISOR_kexec_op;
+    hypercall.arg[0] = KEXEC_CMD_kexec_get_range;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(get_range);
+
+    ret = do_xen_hypercall(xch, &hypercall);
+
+    *size = get_range->size;
+    *start = get_range->start;
+
+out:
+    xc_hypercall_buffer_free(xch, get_range);
+
+    return ret;
+}
+
+int xc_kexec_load(xc_interface *xch, uint8_t type, uint16_t arch,
+                  uint64_t entry_maddr,
+                  uint32_t nr_segments, xen_kexec_segment_t *segments)
+{
+    int ret = -1;
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BOUNCE(segments, sizeof(*segments) * nr_segments,
+                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    DECLARE_HYPERCALL_BUFFER(xen_kexec_load_t, load);
+    
+    if ( xc_hypercall_bounce_pre(xch, segments) )
+    {
+        PERROR("Could not allocate bounce buffer for kexec load hypercall");
+        goto out;
+    }
+    load = xc_hypercall_buffer_alloc(xch, load, sizeof(*load));
+    if ( load == NULL )
+    {
+        PERROR("Could not allocate buffer for kexec load hypercall");
+        goto out;
+    }
+
+    load->type = type;
+    load->arch = arch;
+    load->entry_maddr = entry_maddr;
+    load->nr_segments = nr_segments;
+    set_xen_guest_handle(load->segments, segments);
+
+    hypercall.op = __HYPERVISOR_kexec_op;
+    hypercall.arg[0] = KEXEC_CMD_kexec_load;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(load);
+
+    ret = do_xen_hypercall(xch, &hypercall);
+
+out:
+    xc_hypercall_buffer_free(xch, load);
+    xc_hypercall_bounce_post(xch, segments);
+
+    return ret;
+}
+
+int xc_kexec_unload(xc_interface *xch, int type)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_kexec_unload_t, unload);
+    int ret = -1;
+
+    unload = xc_hypercall_buffer_alloc(xch, unload, sizeof(*unload));
+    if ( unload == NULL )
+    {
+        PERROR("Count not alloc buffer for kexec unload hypercall");
+        goto out;
+    }
+
+    unload->type = type;
+
+    hypercall.op = __HYPERVISOR_kexec_op;
+    hypercall.arg[0] = KEXEC_CMD_kexec_unload;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(unload);
+
+    ret = do_xen_hypercall(xch, &hypercall);
+
+out:
+    xc_hypercall_buffer_free(xch, unload);
+
+    return ret;
+}
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index c3b2c28..d6c4877 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -46,6 +46,7 @@
 #include <xen/hvm/params.h>
 #include <xen/xsm/flask_op.h>
 #include <xen/tmem.h>
+#include <xen/kexec.h>
 
 #include "xentoollog.h"
 
@@ -2263,4 +2264,58 @@ int xc_compression_uncompress_page(xc_interface *xch, char *compbuf,
 				   unsigned long compbuf_size,
 				   unsigned long *compbuf_pos, char *dest);
 
+/*
+ * Execute an image previously loaded with xc_kexec_load().
+ *
+ * Does not return on success.
+ *
+ * Fails with:
+ *   ENOENT if the specified image has not been loaded.
+ */
+int xc_kexec(xc_interface *xch, int type);
+
+/*
+ * Find the machine address and size of certain memory areas.
+ *
+ *   KEXEC_RANGE_MA_CRASH       crash area
+ *   KEXEC_RANGE_MA_XEN         Xen itself
+ *   KEXEC_RANGE_MA_CPU         CPU note for CPU number 'nr'
+ *   KEXEC_RANGE_MA_XENHEAP     xenheap
+ *   KEXEC_RANGE_MA_EFI_MEMMAP  EFI Memory Map
+ *   KEXEC_RANGE_MA_VMCOREINFO  vmcoreinfo
+ *
+ * Fails with:
+ *   EINVAL if the range or CPU number isn't valid.
+ */
+int xc_kexec_get_range(xc_interface *xch, int range,  int nr,
+                       uint64_t *size, uint64_t *start);
+
+/*
+ * Load a kexec image into memory.
+ *
+ * The image may be of type KEXEC_TYPE_DEFAULT (executed on request)
+ * or KEXEC_TYPE_CRASH (executed on a crash).
+ *
+ * The image architecture may be a 32-bit variant of the hypervisor
+ * architecture (e.g., EM_386 on an x86-64 hypervisor).
+ *
+ * Fails with:
+ *   ENOMEM if there is insufficient memory for the new image.
+ *   EINVAL if the image does not fit into the crash area or the entry
+ *          point isn't within one of the segments.
+ *   EBUSY  if another image is being executed.
+ */
+int xc_kexec_load(xc_interface *xch, uint8_t type, uint16_t arch,
+                  uint64_t entry_maddr,
+                  uint32_t nr_segments, xen_kexec_segment_t *segments);
+
+/*
+ * Unload a kexec image.
+ *
+ * This prevents a KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH image from
+ * being executed.  The crash images are not cleared from the crash
+ * region.
+ */
+int xc_kexec_unload(xc_interface *xch, int type);
+
 #endif /* XENCTRL_H */
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 106+ messages in thread
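
The EINVAL conditions documented for xc_kexec_load() above (segments
outside the crash area, entry point outside every segment) can be
illustrated with a small caller-side pre-check. This is only a sketch
and not Xen's implementation: seg_sketch_t, range_contains() and
check_crash_image() are hypothetical names, and the segment type keeps
just the destination fields of xen_kexec_segment_t.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for xen_kexec_segment_t: destination range only. */
typedef struct seg_sketch {
    uint64_t dest_maddr;
    uint64_t dest_size;
} seg_sketch_t;

/* Non-zero if [sub_start, sub_start + sub_size) lies inside
 * [start, start + size); written to avoid 64-bit overflow. */
static inline int range_contains(uint64_t start, uint64_t size,
                                 uint64_t sub_start, uint64_t sub_size)
{
    return sub_start >= start && sub_size <= size &&
           sub_start - start <= size - sub_size;
}

/* Returns 0 if every segment fits in the crash area and the entry point
 * falls inside at least one segment, -EINVAL otherwise. */
static inline int check_crash_image(uint64_t crash_start, uint64_t crash_size,
                                    uint64_t entry_maddr,
                                    const seg_sketch_t *segs, uint32_t nr)
{
    int entry_ok = 0;

    for (uint32_t i = 0; i < nr; i++) {
        if (!range_contains(crash_start, crash_size,
                            segs[i].dest_maddr, segs[i].dest_size))
            return -EINVAL;
        if (range_contains(segs[i].dest_maddr, segs[i].dest_size,
                           entry_maddr, 1))
            entry_ok = 1;
    }
    return entry_ok ? 0 : -EINVAL;
}
```

In real use the crash area bounds would come from
xc_kexec_get_range() with KEXEC_RANGE_MA_CRASH; the hypervisor still
performs its own validation.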

* Re: [PATCH 3/8] kexec: add public interface for improved load/unload sub-ops
  2013-02-21 17:48 ` David Vrabel
@ 2013-02-21 22:29   ` Daniel Kiper
  2013-02-21 22:29   ` Daniel Kiper
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-02-21 22:29 UTC (permalink / raw)
  To: David Vrabel; +Cc: kexec, xen-devel

On Thu, Feb 21, 2013 at 05:48:09PM +0000, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> Add replacement KEXEC_CMD_load and KEXEC_CMD_unload sub-ops to the
> kexec hypercall.  These new sub-ops allow a privileged guest to
> provide the image data to be loaded into Xen memory or the crash
> region instead of guests loading the image data themselves and
> providing the relocation code and metadata.
>
> The old interface is provided to guests requesting an interface
> version prior to 4.3.
>
> Signed-off: David Vrabel <david.vrabel@citrix.com>
> ---
>  xen/common/kexec.c         |   12 ++++----
>  xen/include/public/kexec.h |   66 +++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 68 insertions(+), 10 deletions(-)
>
> diff --git a/xen/common/kexec.c b/xen/common/kexec.c
> index 6dd20c6..2cbb62c 100644
> --- a/xen/common/kexec.c
> +++ b/xen/common/kexec.c
> @@ -732,7 +732,7 @@ static void crash_save_vmcoreinfo(void)
>  #endif
>  }
>
> -static int kexec_load_unload_internal(unsigned long op, xen_kexec_load_t *load)
> +static int kexec_load_unload_internal(unsigned long op, xen_kexec_load_v1_t *load)
>  {
>      xen_kexec_image_t *image;
>      int base, bit, pos;
> @@ -779,7 +779,7 @@ static int kexec_load_unload_internal(unsigned long op, xen_kexec_load_t *load)
>
>  static int kexec_load_unload(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) uarg)
>  {
> -    xen_kexec_load_t load;
> +    xen_kexec_load_v1_t load;
>
>      if ( unlikely(copy_from_guest(&load, uarg, 1)) )
>          return -EFAULT;
> @@ -791,8 +791,8 @@ static int kexec_load_unload_compat(unsigned long op,
>                                      XEN_GUEST_HANDLE_PARAM(void) uarg)
>  {
>  #ifdef CONFIG_COMPAT
> -    compat_kexec_load_t compat_load;
> -    xen_kexec_load_t load;
> +    compat_kexec_load_v1_t compat_load;
> +    xen_kexec_load_v1_t load;
>
>      if ( unlikely(copy_from_guest(&compat_load, uarg, 1)) )
>          return -EFAULT;
> @@ -864,8 +864,8 @@ static int do_kexec_op_internal(unsigned long op,
>          else
>                  ret = kexec_get_range(uarg);
>          break;
> -    case KEXEC_CMD_kexec_load:
> -    case KEXEC_CMD_kexec_unload:
> +    case KEXEC_CMD_kexec_load_v1:
> +    case KEXEC_CMD_kexec_unload_v1:
>          spin_lock_irqsave(&kexec_lock, flags);
>          if (!test_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags))
>          {
> diff --git a/xen/include/public/kexec.h b/xen/include/public/kexec.h
> index 61a8d7d..5259446 100644
> --- a/xen/include/public/kexec.h
> +++ b/xen/include/public/kexec.h
> @@ -116,12 +116,12 @@ typedef struct xen_kexec_exec {
>   * type  == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in]
>   * image == relocation information for kexec (ignored for unload) [in]
>   */
> -#define KEXEC_CMD_kexec_load            1
> -#define KEXEC_CMD_kexec_unload          2
> -typedef struct xen_kexec_load {
> +#define KEXEC_CMD_kexec_load_v1         1 /* obsolete since 0x00040300 */
> +#define KEXEC_CMD_kexec_unload_v1       2 /* obsolete since 0x00040300 */
> +typedef struct xen_kexec_load_v1 {
>      int type;
>      xen_kexec_image_t image;
> -} xen_kexec_load_t;
> +} xen_kexec_load_v1_t;
>
>  #define KEXEC_RANGE_MA_CRASH      0 /* machine address and size of crash area */
>  #define KEXEC_RANGE_MA_XEN        1 /* machine address and size of Xen itself */
> @@ -152,6 +152,64 @@ typedef struct xen_kexec_range {
>      unsigned long start;
>  } xen_kexec_range_t;
>
> +#if __XEN_INTERFACE_VERSION__ >= 0x00040300
> +/*
> + * A contiguous chunk of a kexec image and its destination machine
> + * address.
> + */
> +typedef struct xen_kexec_segment {
> +    XEN_GUEST_HANDLE_64(const_void) buf;
> +    uint64_t buf_size;
> +    uint64_t dest_maddr;
> +    uint64_t dest_size;
> +} xen_kexec_segment_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_kexec_segment_t);
> +
> +/*
> + * Load a kexec image into memory.
> + *
> + * For KEXEC_TYPE_DEFAULT images, the segments may be anywhere in RAM.
> + * The image is relocated prior to being executed.
> + *
> + * For KEXEC_TYPE_CRASH images, each segment of the image must reside
> + * in the memory region reserved for kexec (KEXEC_RANGE_MA_CRASH) and
> + * the entry point must be within the image. The caller is responsible
> + * for ensuring that multiple images do not overlap.

What do you mean by "The caller is responsible for ensuring
that multiple images do not overlap."?

> + */
> +
> +#define KEXEC_CMD_kexec_load 4
> +typedef struct xen_kexec_load {
> +    uint8_t  type;        /* One of KEXEC_TYPE_* */
> +    uint16_t arch;        /* ELF machine type (EM_*). */
> +    uint32_t __pad;

Why do you need __pad here?

> +    uint64_t entry_maddr; /* image entry point machine address. */
> +    uint32_t nr_segments;
> +    XEN_GUEST_HANDLE_64(xen_kexec_segment_t) segments;
> +} xen_kexec_load_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_kexec_load_t);
> +
> +/*
> + * Unload a kexec image.
> + *
> + * Type must be one of KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH.
> + */
> +#define KEXEC_CMD_kexec_unload 5
> +typedef struct xen_kexec_unload {
> +    uint8_t type;
> +} xen_kexec_unload_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_kexec_unload_t);
> +
> +#else /* __XEN_INTERFACE_VERSION__ < 0x00040300 */
> +
> +#undef KEXEC_CMD_kexec_load
> +#undef KEXEC_CMD_kexec_unload
> +#define KEXEC_CMD_kexec_load KEXEC_CMD_kexec_load_v1
> +#define KEXEC_CMD_kexec_unload KEXEC_CMD_kexec_unload_v1

Could you define all constants in one place at the
beginning of this file? As it stands, it is very difficult
to see what is going on. The #undefs in particular are
confusing to me.

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread
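
One possible shape for the request above, as a hedged C sketch: every
versioned sub-op number is defined once, and the unversioned names are
selected by the interface version without any #undef. The _v2 names,
XEN_INTERFACE_VERSION_SKETCH (standing in for
__XEN_INTERFACE_VERSION__) and the two helper functions are
assumptions made for illustration; only the values 1, 2, 4 and 5 come
from the patch.

```c
#include <assert.h>

/* Stand-in for __XEN_INTERFACE_VERSION__ (assumed caller-selected). */
#ifndef XEN_INTERFACE_VERSION_SKETCH
#define XEN_INTERFACE_VERSION_SKETCH 0x00040300
#endif

/* All sub-op numbers in one place, always defined. */
#define KEXEC_CMD_kexec_load_v1    1  /* obsolete since 0x00040300 */
#define KEXEC_CMD_kexec_unload_v1  2  /* obsolete since 0x00040300 */
#define KEXEC_CMD_kexec_load_v2    4  /* hypothetical name for the new op */
#define KEXEC_CMD_kexec_unload_v2  5  /* hypothetical name for the new op */

/* The unversioned names are defined exactly once, so no #undef is needed. */
#if XEN_INTERFACE_VERSION_SKETCH >= 0x00040300
#define KEXEC_CMD_kexec_load    KEXEC_CMD_kexec_load_v2
#define KEXEC_CMD_kexec_unload  KEXEC_CMD_kexec_unload_v2
#else
#define KEXEC_CMD_kexec_load    KEXEC_CMD_kexec_load_v1
#define KEXEC_CMD_kexec_unload  KEXEC_CMD_kexec_unload_v1
#endif

/* Illustration only: the sub-op number a caller would end up issuing. */
static inline int kexec_load_cmd(void)   { return KEXEC_CMD_kexec_load; }
static inline int kexec_unload_cmd(void) { return KEXEC_CMD_kexec_unload; }
```

Building with -DXEN_INTERFACE_VERSION_SKETCH=0x00040200 would make the
same unversioned names resolve to the v1 numbers.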

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-02-21 17:48 ` David Vrabel
@ 2013-02-21 22:41   ` Daniel Kiper
  2013-02-21 22:41   ` Daniel Kiper
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-02-21 22:41 UTC (permalink / raw)
  To: David Vrabel; +Cc: kexec, xen-devel

On Thu, Feb 21, 2013 at 05:48:11PM +0000, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> In the existing kexec hypercall, the load and unload ops depend on
> internals of the Linux kernel (the page list and code page provided by
> the kernel).  The code page is used to transition between Xen context
> and the image so using kernel code doesn't make sense and will not
> work for PVH guests.
>
> Add replacement KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload ops
> that no longer require a code page to be provided by the guest -- Xen
> now provides the code for calling the image directly.
>
> The new load op looks similar to the Linux kexec_load system call and
> allows the guest to provide the image data to be loaded.  The guest
> specifies the architecture of the image which may be a 32-bit subarch
> of the hypervisor's architecture (i.e., an EM_386 image on an
> EM_X86_64 hypervisor).
>
> The toolstack can now load images without kernel involvement.  This is
> required for supporting kexec when using a dom0 with an upstream
> kernel.
>
> Crash images are copied directly into the crash region on load.
> Default images are copied into Xen heap pages and a list of source and
> destination machine addresses is created.  This list is used in
> kexec_reloc() to relocate the image to its destination.
>
> The old load and unload sub-ops are still available (as
> KEXEC_CMD_load_v1 and KEXEC_CMD_unload_v1) and are implemented on top
> of the new infrastructure.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

[...]

> diff --git a/xen/arch/x86/x86_64/kexec_reloc.S b/xen/arch/x86/x86_64/kexec_reloc.S
> new file mode 100644
> index 0000000..e68842c
> --- /dev/null
> +++ b/xen/arch/x86/x86_64/kexec_reloc.S
> @@ -0,0 +1,229 @@
> +/*
> + * Relocate a kexec_image to its destination and call it.
> + *
> + * Copyright (C) 2013 Citrix Systems R&D Ltd.
> + *
> + * Portions derived from Linux's arch/x86/kernel/relocate_kernel_64.S.
> + *
> + *   Copyright (C) 2002-2005 Eric Biederman  <ebiederm@xmission.com>
> + *
> + * This source code is licensed under the GNU General Public License,
> + * Version 2.  See the file COPYING for more details.
> + */
> +#include <xen/config.h>
> +
> +#include <asm/asm_defns.h>
> +#include <asm/msr.h>
> +#include <asm/page.h>
> +#include <asm/machine_kexec.h>
> +
> +/* The unrelocated physical address of a symbol. */
> +#define SYM_PHYS(sym)          ((sym) - __XEN_VIRT_START)
> +
> +/* Load physical address of symbol into register and relocate it. */
> +#define RELOCATE_SYM(sym,reg)  mov $SYM_PHYS(sym), reg ; \
> +                               add xen_phys_start(%rip), reg
> +
> +#define DBG(c) \
> +1:      mov     $0x3f8+5, %dx ; \
> +        inb     %dx, %al     ; \
> +        test    $0x20, %al   ; \
> +        je      1b           ; \
> +        mov     $0x3f8, %dx  ; \
> +        mov     $c, %al      ; \
> +        outb    %al, %dx     ;

Nice feature, but I think it is dangerous to write to the
serial port unconditionally. There are a lot of machines in
the wild which do not have COM ports these days. Maybe it
should be enabled only if Xen is compiled with debugging
enabled. The serial port address should then be taken from
the console argument.

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread
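
The gating suggested above can be modelled in C, as a sketch only: the
real DBG() macro is assembly, and nothing here touches hardware.
KEXEC_RELOC_DEBUG and all dbg_* names are hypothetical; the register
offset (+5) and the 0x20 status bit are the ones DBG() already uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical build-time switch standing in for a Xen debug build. */
#ifndef KEXEC_RELOC_DEBUG
#define KEXEC_RELOC_DEBUG 0
#endif

#define UART_LSR_OFFSET 5     /* line status register, as in 0x3f8+5 */
#define UART_LSR_THRE   0x20  /* transmit holding register empty */

/* I/O base of the serial console; 0 means none was configured. */
static uint16_t dbg_uart_base;

/* Would be called with the base parsed from the console argument. */
static inline void dbg_set_uart_base(uint16_t base)
{
    dbg_uart_base = base;
}

/* Debug output is allowed only in debug builds with a known port. */
static inline bool dbg_enabled(void)
{
    return KEXEC_RELOC_DEBUG && dbg_uart_base != 0;
}

/* Port to poll before writing a character, or 0 when output is disabled. */
static inline uint16_t dbg_lsr_port(void)
{
    return dbg_enabled() ? (uint16_t)(dbg_uart_base + UART_LSR_OFFSET) : 0;
}

/* True once the UART reports it can accept another character. */
static inline bool dbg_tx_ready(uint8_t lsr)
{
    return (lsr & UART_LSR_THRE) != 0;
}
```

With the switch left at 0 no port is ever selected, which matches the
concern about machines without a serial port.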

* Re: [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels
  2013-02-21 17:48 [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels David Vrabel
                   ` (16 preceding siblings ...)
  2013-02-21 22:47 ` [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels Daniel Kiper
@ 2013-02-21 22:47 ` Daniel Kiper
  2013-02-22  8:17 ` Jan Beulich
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-02-21 22:47 UTC (permalink / raw)
  To: David Vrabel; +Cc: kexec, xen-devel

On Thu, Feb 21, 2013 at 05:48:06PM +0000, David Vrabel wrote:
> The series improves the kexec hypercall by making Xen responsible for
> loading and relocating the image.  This allows kexec to be usable by
> pv-ops kernels and should allow kexec to be usable from a HVM or PVH
> privileged domain.
>
> The first patch is a simple clean-up.
>
> The second patch allows hypercall structures to be ABI compatible
> between 32- and 64-bit guests (by reusing stuff present for domctls
> and sysctls).  This seems better than having to keep adding compat
> handling for new hypercalls etc.
>
> Patch 3 introduces the new ABI.
>
> Patch 4 and 5 nearly completely reimplement the kexec load, unload and
> exec sub-ops.  The old load_v1 sub-op is then implemented on top of
> the new code.
>
> Patch 6 calls the kexec image when dom0 crashes.  This avoids having
> to alter dom0 kernels to do a exec sub-op call on crash -- an existing
> SHUTDOWN_crash.
>
> Patches 7 and 8 add the libxc API for the kexec calls.
>
> The required patch series for kexec-tools will be posted shortly.

At first sight both patch series look quite good to me.
Give me a week or two to do some tests.

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels
  2013-02-21 17:48 [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels David Vrabel
                   ` (17 preceding siblings ...)
  2013-02-21 22:47 ` Daniel Kiper
@ 2013-02-22  8:17 ` Jan Beulich
  2013-02-22  8:17 ` [Xen-devel] " Jan Beulich
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 106+ messages in thread
From: Jan Beulich @ 2013-02-22  8:17 UTC (permalink / raw)
  To: David Vrabel; +Cc: Daniel Kiper, kexec, xen-devel

>>> On 21.02.13 at 18:48, David Vrabel <david.vrabel@citrix.com> wrote:
> The series improves the kexec hypercall by making Xen responsible for
> loading and relocating the image.  This allows kexec to be usable by
> pv-ops kernels and should allow kexec to be usable from a HVM or PVH
> privileged domain.
> 
> The first patch is a simple clean-up.
> 
> The second patch allows hypercall structures to be ABI compatible
> between 32- and 64-bit guests (by reusing stuff present for domctls
> and sysctls).  This seems better than having to keep adding compat
> handling for new hypercalls etc.
> 
> Patch 3 introduces the new ABI.
> 
> Patch 4 and 5 nearly completely reimplement the kexec load, unload and
> exec sub-ops.  The old load_v1 sub-op is then implemented on top of
> the new code.
> 
> Patch 6 calls the kexec image when dom0 crashes.  This avoids having
> to alter dom0 kernels to do a exec sub-op call on crash -- an existing
> SHUTDOWN_crash.

Am I right in understanding that at this point no kexec support is
necessary in the Dom0 kernel at all anymore? If so, that's a very
nice move - thanks for doing that!

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 3/8] kexec: add public interface for improved load/unload sub-ops
  2013-02-21 17:48 ` David Vrabel
                     ` (2 preceding siblings ...)
  2013-02-22  8:33   ` [Xen-devel] " Jan Beulich
@ 2013-02-22  8:33   ` Jan Beulich
  2013-03-08 10:50   ` Daniel Kiper
  2013-03-08 10:50   ` Daniel Kiper
  5 siblings, 0 replies; 106+ messages in thread
From: Jan Beulich @ 2013-02-22  8:33 UTC (permalink / raw)
  To: David Vrabel; +Cc: Daniel Kiper, kexec, xen-devel

>>> On 21.02.13 at 18:48, David Vrabel <david.vrabel@citrix.com> wrote:
> --- a/xen/include/public/kexec.h
> +++ b/xen/include/public/kexec.h
> @@ -116,12 +116,12 @@ typedef struct xen_kexec_exec {
>   * type  == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in]
>   * image == relocation information for kexec (ignored for unload) [in]
>   */
> -#define KEXEC_CMD_kexec_load            1
> -#define KEXEC_CMD_kexec_unload          2
> -typedef struct xen_kexec_load {
> +#define KEXEC_CMD_kexec_load_v1         1 /* obsolete since 0x00040300 */
> +#define KEXEC_CMD_kexec_unload_v1       2 /* obsolete since 0x00040300 */
> +typedef struct xen_kexec_load_v1 {
>      int type;
>      xen_kexec_image_t image;
> -} xen_kexec_load_t;
> +} xen_kexec_load_v1_t;

You can't change type names like this without also guarding them
with a __XEN_INTERFACE_VERSION__ conditional or providing
backward compatibility #define-s.

>  
>  #define KEXEC_RANGE_MA_CRASH      0 /* machine address and size of crash area */
>  #define KEXEC_RANGE_MA_XEN        1 /* machine address and size of Xen itself */
> @@ -152,6 +152,64 @@ typedef struct xen_kexec_range {
>      unsigned long start;
>  } xen_kexec_range_t;
>  
> +#if __XEN_INTERFACE_VERSION__ >= 0x00040300
> +/*
> + * A contiguous chunk of a kexec image and its destination machine
> + * address.
> + */
> +typedef struct xen_kexec_segment {
> +    XEN_GUEST_HANDLE_64(const_void) buf;
> +    uint64_t buf_size;
> +    uint64_t dest_maddr;
> +    uint64_t dest_size;
> +} xen_kexec_segment_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_kexec_segment_t);
> +
> +/*
> + * Load a kexec image into memory.
> + *
> + * For KEXEC_TYPE_DEFAULT images, the segments may be anywhere in RAM.
> + * The image is relocated prior to being executed.
> + *
> + * For KEXEC_TYPE_CRASH images, each segment of the image must reside
> + * in the memory region reserved for kexec (KEXEC_RANGE_MA_CRASH) and
> + * the entry point must be within the image. The caller is responsible
> + * for ensuring that multiple images do not overlap.
> + */
> +
> +#define KEXEC_CMD_kexec_load 4
> +typedef struct xen_kexec_load {
> +    uint8_t  type;        /* One of KEXEC_TYPE_* */

uint8_t __pad1;

> +    uint16_t arch;        /* ELF machine type (EM_*). */
> +    uint32_t __pad;

Put nr_segments here instead?

> +    uint64_t entry_maddr; /* image entry point machine address. */
> +    uint32_t nr_segments;
> +    XEN_GUEST_HANDLE_64(xen_kexec_segment_t) segments;
> +} xen_kexec_load_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_kexec_load_t);
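
For illustration, the two layout suggestions above (an explicit pad byte
after type, and nr_segments moved into the 4-byte hole) can be checked at
compile time. This is only a sketch: the demo_* names are made up, and
demo_handle_64_t is an assumed stand-in for XEN_GUEST_HANDLE_64(), modelled
as a 64-bit, 8-byte-aligned slot (GCC/Clang attribute syntax).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for XEN_GUEST_HANDLE_64(): 64 bits wide, 8-byte aligned. */
typedef struct {
    uint64_t p __attribute__((aligned(8)));
} demo_handle_64_t;

/* Field order following the review comments (hypothetical type name). */
typedef struct demo_kexec_load {
    uint8_t  type;          /* One of KEXEC_TYPE_* */
    uint8_t  __pad1;        /* explicit pad byte, as suggested */
    uint16_t arch;          /* ELF machine type (EM_*) */
    uint32_t nr_segments;   /* fills the hole previously taken by __pad */
    uint64_t entry_maddr;   /* lands on an 8-byte boundary without padding */
    demo_handle_64_t segments;
} demo_kexec_load_t;

/* No implicit padding anywhere: every offset is fixed by the field sizes. */
_Static_assert(offsetof(demo_kexec_load_t, __pad1) == 1, "__pad1 offset");
_Static_assert(offsetof(demo_kexec_load_t, arch) == 2, "arch offset");
_Static_assert(offsetof(demo_kexec_load_t, nr_segments) == 4, "nr_segments offset");
_Static_assert(offsetof(demo_kexec_load_t, entry_maddr) == 8, "entry_maddr offset");
_Static_assert(offsetof(demo_kexec_load_t, segments) == 16, "segments offset");
_Static_assert(sizeof(demo_kexec_load_t) == 24, "total size");

/* Helper so the size can also be inspected at run time. */
size_t demo_kexec_load_size(void)
{
    return sizeof(demo_kexec_load_t);
}
```

With this ordering the structure is 24 bytes; the ordering in the quoted
hunk would need 32 under the same assumptions, because nr_segments is
followed by four bytes of implicit padding before the handle.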

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-02-21 17:48 ` David Vrabel
                     ` (2 preceding siblings ...)
  2013-02-22  8:42   ` [Xen-devel] " Jan Beulich
@ 2013-02-22  8:42   ` Jan Beulich
  2013-03-08 11:23   ` Daniel Kiper
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 106+ messages in thread
From: Jan Beulich @ 2013-02-22  8:42 UTC (permalink / raw)
  To: David Vrabel; +Cc: Daniel Kiper, kexec, xen-devel

>>> On 21.02.13 at 18:48, David Vrabel <david.vrabel@citrix.com> wrote:
> Crash images are copied directly into the crash region on load.
> Default images are copied into Xen heap pages and a list of source and
> destination machine addresses is created.  This list is used in
> kexec_reloc() to relocate the image to its destination.

Did you carefully consider the implications of using Xen heap pages
here as opposed to domain heap ones? On huge systems, this may
prevent kexec from working, as you're not just trying to allocate a
handful of pages. IOW, is the less complex code really worth the
increased likelihood of a failure here?
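
To make the quoted description concrete, here is a simplified user-space
model of "a list of source and destination addresses" being walked to
relocate an image. It is a sketch only: it is not the kexec_reloc() code
from the patch, the demo_* names are hypothetical, and ordinary pointers
stand in for machine addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u

/* One page to move: where it was staged on load, and where it must end up. */
struct demo_reloc_entry {
    const uint8_t *src;   /* staging page (heap page in the description) */
    uint8_t *dest;        /* final destination of that page */
};

/*
 * Copy every staged page to its destination, in list order.
 * Assumes no source page overlaps any destination page.
 */
void demo_relocate(const struct demo_reloc_entry *list, size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        memcpy(list[i].dest, list[i].src, DEMO_PAGE_SIZE);
}
```

The number of staging pages grows with the image size, which is why the
choice of allocator for them matters on large systems.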

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 3/8] kexec: add public interface for improved load/unload sub-ops
  2013-02-21 22:29   ` Daniel Kiper
@ 2013-02-22 11:49     ` David Vrabel
  2013-02-22 11:49     ` David Vrabel
  1 sibling, 0 replies; 106+ messages in thread
From: David Vrabel @ 2013-02-22 11:49 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: kexec, xen-devel

On 21/02/13 22:29, Daniel Kiper wrote:
> On Thu, Feb 21, 2013 at 05:48:09PM +0000, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> Add replacement KEXEC_CMD_load and KEXEC_CMD_unload sub-ops to the
>> kexec hypercall.  These new sub-ops allow a privileged guest to
>> provide the image data to be loaded into Xen memory or the crash
>> region instead of guests loading the image data themselves and
>> providing the relocation code and metadata.
>>
>> The old interface is provided to guests requesting an interface
>> version prior to 4.3.
>>
[...]
>> +/*
>> + * Load a kexec image into memory.
>> + *
>> + * For KEXEC_TYPE_DEFAULT images, the segments may be anywhere in RAM.
>> + * The image is relocated prior to being executed.
>> + *
>> + * For KEXEC_TYPE_CRASH images, each segment of the image must reside
>> + * in the memory region reserved for kexec (KEXEC_RANGE_MA_CRASH) and
>> + * the entry point must be within the image. The caller is responsible
>> + * for ensuring that multiple images do not overlap.
> 
> What do you mean by "The caller is responsible for ensuring
> that multiple images do not overlap."?

The intention here is to allow for safe replacement of a crash image by
loading the second image at a different location in the crash region.

This won't actually work however, as the control pages (also allocated
from the crash region) will conflict.

This is the behaviour of the Linux implementation.  It's less than ideal
and something I plan to look at later on (it's low priority as replacing
crash images isn't an interesting use case).
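
As a rough illustration of the constraints in the quoted comment (every
segment inside the crash region, entry point inside the image, and the
caller keeping separate images apart), the checks could look like the
sketch below. All demo_* names are hypothetical and this is not the code
from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A machine address range [start, start + size). */
struct demo_range {
    uint64_t start;
    uint64_t size;
};

/* True if seg lies entirely inside region; written to avoid overflow. */
static bool demo_within(struct demo_range seg, struct demo_range region)
{
    if (seg.start < region.start || seg.size > region.size)
        return false;
    return seg.start - region.start <= region.size - seg.size;
}

/* True if the two ranges share at least one byte. */
static bool demo_overlap(struct demo_range a, struct demo_range b)
{
    if (a.size == 0 || b.size == 0)
        return false;
    if (a.start <= b.start)
        return b.start - a.start < a.size;
    return a.start - b.start < b.size;
}

/* Load-time check: segments in the crash region, entry inside a segment. */
bool demo_crash_image_ok(const struct demo_range *segs, size_t nr,
                         struct demo_range crash, uint64_t entry)
{
    bool entry_found = false;

    for (size_t i = 0; i < nr; i++) {
        if (!demo_within(segs[i], crash))
            return false;
        if (entry >= segs[i].start && entry - segs[i].start < segs[i].size)
            entry_found = true;
    }
    return entry_found;
}

/* Caller-side check: does any segment of image a touch image b? */
bool demo_images_overlap(const struct demo_range *a, size_t na,
                         const struct demo_range *b, size_t nb)
{
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (demo_overlap(a[i], b[j]))
                return true;
    return false;
}
```

Note that this only covers the image segments; it says nothing about the
control pages, which is where the conflict described above comes from.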

>> + */
>> +
>> +#define KEXEC_CMD_kexec_load 4
>> +typedef struct xen_kexec_load {
>> +    uint8_t  type;        /* One of KEXEC_TYPE_* */
>> +    uint16_t arch;        /* ELF machine type (EM_*). */
>> +    uint32_t __pad;
> 
> Why do you need __pad here?

To ensure that the following uint64_t is aligned to 8 bytes in both 32-
and 64-bit builds.

Annoyingly, uint64_t only has 4-byte alignment on 32-bit x86.
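
The effect can be modelled on any build by forcing the alignment of the
64-bit field. The sketch below is an illustration under assumptions: the
demo_* typedefs are made up, they rely on the GCC/Clang aligned attribute,
and the 4-byte variant stands in for how the 32-bit x86 ABI lays out a
uint64_t inside a structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-bit integer with the 4-byte alignment of the 32-bit x86 ABI. */
typedef uint64_t __attribute__((aligned(4))) demo_u64_abi32;

/* 64-bit integer with the natural 8-byte alignment of a 64-bit build. */
typedef uint64_t __attribute__((aligned(8))) demo_u64_abi64;

/* Without the pad field the two ABIs disagree on where entry_maddr lives. */
struct demo_nopad_abi32 { uint8_t type; uint16_t arch; demo_u64_abi32 entry_maddr; };
struct demo_nopad_abi64 { uint8_t type; uint16_t arch; demo_u64_abi64 entry_maddr; };

/* With an explicit 32-bit pad both ABIs place entry_maddr at offset 8. */
struct demo_pad_abi32 { uint8_t type; uint16_t arch; uint32_t pad; demo_u64_abi32 entry_maddr; };
struct demo_pad_abi64 { uint8_t type; uint16_t arch; uint32_t pad; demo_u64_abi64 entry_maddr; };

_Static_assert(offsetof(struct demo_nopad_abi32, entry_maddr) == 4, "32-bit, no pad");
_Static_assert(offsetof(struct demo_nopad_abi64, entry_maddr) == 8, "64-bit, no pad");
_Static_assert(offsetof(struct demo_pad_abi32, entry_maddr) == 8, "32-bit, padded");
_Static_assert(offsetof(struct demo_pad_abi64, entry_maddr) == 8, "64-bit, padded");
```

The explicit pad is what lets one structure definition serve both guest
widths without a separate compat translation.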

>> +    uint64_t entry_maddr; /* image entry point machine address. */
>> +    uint32_t nr_segments;
>> +    XEN_GUEST_HANDLE_64(xen_kexec_segment_t) segments;
>> +} xen_kexec_load_t;
>> +DEFINE_XEN_GUEST_HANDLE(xen_kexec_load_t);
>> +
>> +/*
>> + * Unload a kexec image.
>> + *
>> + * Type must be one of KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH.
>> + */
>> +#define KEXEC_CMD_kexec_unload 5
>> +typedef struct xen_kexec_unload {
>> +    uint8_t type;
>> +} xen_kexec_unload_t;
>> +DEFINE_XEN_GUEST_HANDLE(xen_kexec_unload_t);
>> +
>> +#else /* __XEN_INTERFACE_VERSION__ < 0x00040300 */
>> +
>> +#undef KEXEC_CMD_kexec_load
>> +#undef KEXEC_CMD_kexec_unload
>> +#define KEXEC_CMD_kexec_load KEXEC_CMD_kexec_load_v1
>> +#define KEXEC_CMD_kexec_unload KEXEC_CMD_kexec_unload_v1
> 
> Could you define all constants in one place at the
> beginning of this file? It is very difficult to
> see what is going on. Especially those undefs are
> crazy for me.

I was copying the style used for sched_op_compat.

David

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 3/8] kexec: add public interface for improved load/unload sub-ops
  2013-02-22  8:33   ` [Xen-devel] " Jan Beulich
@ 2013-02-22 11:50     ` David Vrabel
  2013-02-22 11:50     ` [Xen-devel] " David Vrabel
  1 sibling, 0 replies; 106+ messages in thread
From: David Vrabel @ 2013-02-22 11:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Daniel Kiper, kexec, xen-devel

On 22/02/13 08:33, Jan Beulich wrote:
>>>> On 21.02.13 at 18:48, David Vrabel <david.vrabel@citrix.com> wrote:
>> --- a/xen/include/public/kexec.h
>> +++ b/xen/include/public/kexec.h
>> @@ -116,12 +116,12 @@ typedef struct xen_kexec_exec {
>>   * type  == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in]
>>   * image == relocation information for kexec (ignored for unload) [in]
>>   */
>> -#define KEXEC_CMD_kexec_load            1
>> -#define KEXEC_CMD_kexec_unload          2
>> -typedef struct xen_kexec_load {
>> +#define KEXEC_CMD_kexec_load_v1         1 /* obsolete since 0x00040300 */
>> +#define KEXEC_CMD_kexec_unload_v1       2 /* obsolete since 0x00040300 */
>> +typedef struct xen_kexec_load_v1 {
>>      int type;
>>      xen_kexec_image_t image;
>> -} xen_kexec_load_t;
>> +} xen_kexec_load_v1_t;
> 
> You can't change type names like this without also guarding them
> with a __XEN_INTERFACE_VERSION__ conditional or providing
> backward compatibility #define-s.

There are backward compatible definitions provided at the end of the
header file.

I will probably refactor this as it confused Daniel as well.

David

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-02-22  8:42   ` [Xen-devel] " Jan Beulich
  2013-02-22 11:54     ` David Vrabel
@ 2013-02-22 11:54     ` David Vrabel
  1 sibling, 0 replies; 106+ messages in thread
From: David Vrabel @ 2013-02-22 11:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Daniel Kiper, kexec, xen-devel

On 22/02/13 08:42, Jan Beulich wrote:
>>>> On 21.02.13 at 18:48, David Vrabel <david.vrabel@citrix.com> wrote:
>> Crash images are copied directly into the crash region on load.
>> Default images are copied into Xen heap pages and a list of source and
>> destination machine addresses is created.  This list is used in
>> kexec_reloc() to relocate the image to its destination.
> 
> Did you carefully consider the implications of using Xen heap pages
> here as opposed to domain heap ones? On huge systems, this may
> prevent kexec from working, as you're not just trying to allocate a
> handful of pages. IOW, is the less complex code really worth the
> increased likelihood of a failure here?

I wouldn't say carefully considered... I thought about using dom heap
briefly and took the lazy route.

I take your point though and will change it to use the dom heap.  Is
there a way to verify that all the map/unmaps are correctly done and it
isn't just working by chance?  Some sort of debug option?

David

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels
  2013-02-22  8:17 ` [Xen-devel] " Jan Beulich
  2013-02-22 11:56   ` David Vrabel
@ 2013-02-22 11:56   ` David Vrabel
  1 sibling, 0 replies; 106+ messages in thread
From: David Vrabel @ 2013-02-22 11:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Daniel Kiper, kexec, xen-devel

On 22/02/13 08:17, Jan Beulich wrote:
>>>> On 21.02.13 at 18:48, David Vrabel <david.vrabel@citrix.com> wrote:
>> The series improves the kexec hypercall by making Xen responsible for
>> loading and relocating the image.  This allows kexec to be usable by
>> pv-ops kernels and should allow kexec to be usable from a HVM or PVH
>> privileged domain.
>>
>> The first patch is a simple clean-up.
>>
>> The second patch allows hypercall structures to be ABI compatible
>> between 32- and 64-bit guests (by reusing stuff present for domctls
>> and sysctls).  This seems better than having to keep adding compat
>> handling for new hypercalls etc.
>>
>> Patch 3 introduces the new ABI.
>>
>> Patch 4 and 5 nearly completely reimplement the kexec load, unload and
>> exec sub-ops.  The old load_v1 sub-op is then implemented on top of
>> the new code.
>>
>> Patch 6 calls the kexec image when dom0 crashes.  This avoids having
>> to alter dom0 kernels to do an exec sub-op call on crash -- an existing
>> SHUTDOWN_crash.
> 
> Am I right in understanding that at this point no kexec support is
> necessary in the Dom0 kernel at all anymore? If so, that's a very
> nice move - thanks for doing that!

Yes.  It will kexec slightly later than it would on native (or classic)
but I don't think this will be a problem in practice.

David

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 3/8] kexec: add public interface for improved load/unload sub-ops
  2013-02-22 11:50     ` [Xen-devel] " David Vrabel
@ 2013-02-22 13:09       ` Jan Beulich
  2013-02-22 13:09       ` [Xen-devel] " Jan Beulich
  1 sibling, 0 replies; 106+ messages in thread
From: Jan Beulich @ 2013-02-22 13:09 UTC (permalink / raw)
  To: David Vrabel; +Cc: Daniel Kiper, kexec, xen-devel

>>> On 22.02.13 at 12:50, David Vrabel <david.vrabel@citrix.com> wrote:
> On 22/02/13 08:33, Jan Beulich wrote:
>>>>> On 21.02.13 at 18:48, David Vrabel <david.vrabel@citrix.com> wrote:
>>> --- a/xen/include/public/kexec.h
>>> +++ b/xen/include/public/kexec.h
>>> @@ -116,12 +116,12 @@ typedef struct xen_kexec_exec {
>>>   * type  == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in]
>>>   * image == relocation information for kexec (ignored for unload) [in]
>>>   */
>>> -#define KEXEC_CMD_kexec_load            1
>>> -#define KEXEC_CMD_kexec_unload          2
>>> -typedef struct xen_kexec_load {
>>> +#define KEXEC_CMD_kexec_load_v1         1 /* obsolete since 0x00040300 */
>>> +#define KEXEC_CMD_kexec_unload_v1       2 /* obsolete since 0x00040300 */
>>> +typedef struct xen_kexec_load_v1 {
>>>      int type;
>>>      xen_kexec_image_t image;
>>> -} xen_kexec_load_t;
>>> +} xen_kexec_load_v1_t;
>> 
>> You can't change type names like this without also guarding them
>> with a __XEN_INTERFACE_VERSION__ conditional or providing
>> backward compatibility #define-s.
> 
> There are backward compatible definitions provided at the end of the
> header file.

There's a typedef producing xen_kexec_load_t, but there's no
way for a consumer to use struct xen_kexec_load afaics.
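
A reduced example of the distinction, with made-up demo_* names and the
image field replaced by a placeholder: a compatibility typedef restores
only the typedef name, so an old consumer spelling the type with the
struct keyword still breaks unless the tag is aliased as well. The #define
shown is just one possible way to do that, not necessarily what the header
should adopt.

```c
#include <assert.h>
#include <stddef.h>

/* The renamed v1 structure (image field reduced to a placeholder). */
typedef struct demo_kexec_load_v1 {
    int type;
    unsigned long image_start;   /* placeholder for xen_kexec_image_t */
} demo_kexec_load_v1_t;

/* Compatibility typedef: brings back the old typedef name only. */
typedef demo_kexec_load_v1_t demo_kexec_load_t;

/*
 * Without this line, "struct demo_kexec_load" below would name a new,
 * incomplete type and the old consumer code would not compile.
 */
#define demo_kexec_load demo_kexec_load_v1

/* Old-style consumer code that uses both spellings of the type. */
size_t demo_old_consumer_size(void)
{
    struct demo_kexec_load by_tag = { .type = 0, .image_start = 0 };
    demo_kexec_load_t by_typedef = by_tag;   /* same type, so this assigns */

    (void)by_typedef;
    return sizeof(struct demo_kexec_load);
}
```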

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-02-22 11:54     ` David Vrabel
  2013-02-22 13:11       ` Jan Beulich
@ 2013-02-22 13:11       ` Jan Beulich
  1 sibling, 0 replies; 106+ messages in thread
From: Jan Beulich @ 2013-02-22 13:11 UTC (permalink / raw)
  To: David Vrabel; +Cc: Daniel Kiper, kexec, xen-devel

>>> On 22.02.13 at 12:54, David Vrabel <david.vrabel@citrix.com> wrote:
> I take your point though and will change it to use the dom heap.  Is
> there a way to verify that all the map/unmaps are correctly done and it
> isn't just working by chance?  Some sort of debug option?

Not currently (other than running out of map space at some point,
which would trigger a BUG_ON()).
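
Lacking a built-in option, one ad-hoc way to gain some confidence is to
count outstanding mappings in a debug build and check that the count
returns to zero. The sketch below is purely hypothetical: the demo_*
wrappers are not an existing Xen facility, and they only stand in for
wherever the real map and unmap calls would be made.

```c
#include <assert.h>
#include <stddef.h>

/* Number of mappings handed out and not yet released. */
static long demo_outstanding_maps;

/* Hypothetical wrapper around a map call: count it, then return the mapping. */
void *demo_map_page(void *page)
{
    demo_outstanding_maps++;
    return page;   /* a real wrapper would return the mapped address */
}

/* Hypothetical wrapper around the matching unmap call. */
void demo_unmap_page(void *va)
{
    (void)va;
    demo_outstanding_maps--;
}

/* Expected to be zero once a load or unload operation has completed. */
long demo_outstanding(void)
{
    return demo_outstanding_maps;
}
```

A non-zero count at the end of an operation would point at a leaked or
doubled unmap well before the map space runs out.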

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels
  2013-02-21 17:48 [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels David Vrabel
                   ` (19 preceding siblings ...)
  2013-02-22  8:17 ` [Xen-devel] " Jan Beulich
@ 2013-02-26 13:58 ` Don Slutz
  2013-02-26 13:58 ` [Xen-devel] " Don Slutz
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 106+ messages in thread
From: Don Slutz @ 2013-02-26 13:58 UTC (permalink / raw)
  To: David Vrabel; +Cc: Daniel Kiper, kexec, xen-devel

On 02/21/13 12:48, David Vrabel wrote:
> The series improves the kexec hypercall by making Xen responsible for
> loading and relocating the image.  This allows kexec to be usable by
> pv-ops kernels and should allow kexec to be usable from a HVM or PVH
> privileged domain.
>
> The first patch is a simple clean-up.
>
> The second patch allows hypercall structures to be ABI compatible
> between 32- and 64-bit guests (by reusing stuff present for domctls
> and sysctls).  This seems better than having to keep adding compat
> handling for new hypercalls etc.
>
> Patch 3 introduces the new ABI.
>
> Patch 4 and 5 nearly completely reimplement the kexec load, unload and
> exec sub-ops.  The old load_v1 sub-op is then implemented on top of
> the new code.
>
> Patch 6 calls the kexec image when dom0 crashes.  This avoids having
> to alter dom0 kernels to do an exec sub-op call on crash -- an existing
> SHUTDOWN_crash.
>
> Patches 7 and 8 add the libxc API for the kexec calls.
>
> The required patch series for kexec-tools will be posted shortly.
>
> David
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
I have tested this patch set on a Fedora 17 dom0 with kernels:

3.7.3-101.fc17.x86_64
3.6.5-1.fc17.x86_64

So:

Tested-by: Don Slutz <Don@CloudSwitch.com>

    -Don Slutz

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels
  2013-02-21 17:48 [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels David Vrabel
                   ` (21 preceding siblings ...)
  2013-02-26 13:58 ` [Xen-devel] " Don Slutz
@ 2013-03-05 11:04 ` David Vrabel
  2013-03-05 11:04 ` [Xen-devel] " David Vrabel
  23 siblings, 0 replies; 106+ messages in thread
From: David Vrabel @ 2013-03-05 11:04 UTC (permalink / raw)
  To: David Vrabel; +Cc: Ian Jackson, Daniel Kiper, kexec, Ian Campbell, xen-devel

On 21/02/13 17:48, David Vrabel wrote:
> The series improves the kexec hypercall by making Xen responsible for
> loading and relocating the image.  This allows kexec to be usable by
> pv-ops kernels and should allow kexec to be usable from a HVM or PVH
> privileged domain.

Any further comments/acks for any of these patches? Specifically patch 2
and the toolstack patches (7 and 8).

Thanks.

> The first patch is a simple clean-up.
> 
> The second patch allows hypercall structures to be ABI compatible
> between 32- and 64-bit guests (by reusing stuff present for domctls
> and sysctls).  This seems better than having to keep adding compat
> handling for new hypercalls etc.
> 
> Patch 3 introduces the new ABI.
> 
> Patch 4 and 5 nearly completely reimplement the kexec load, unload and
> exec sub-ops.  The old load_v1 sub-op is then implemented on top of
> the new code.
> 
> Patch 6 calls the kexec image when dom0 crashes.  This avoids having
> to alter dom0 kernels to do an exec sub-op call on crash -- an existing
> SHUTDOWN_crash.
> 
> Patches 7 and 8 add the libxc API for the kexec calls.
> 
> The required patch series for kexec-tools will be posted shortly.

David

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 7/8] libxc: add hypercall buffer arrays
  2013-02-21 17:48 ` David Vrabel
@ 2013-03-06 14:25   ` Ian Jackson
  2013-03-06 14:25   ` [Xen-devel] " Ian Jackson
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 106+ messages in thread
From: Ian Jackson @ 2013-03-06 14:25 UTC (permalink / raw)
  To: David Vrabel; +Cc: Daniel Kiper, kexec, Ian Campbell, xen-devel

David Vrabel writes ("[Xen-devel] [PATCH 7/8] libxc: add hypercall buffer arrays"):
> From: David Vrabel <david.vrabel@citrix.com>
> 
> Hypercall buffer arrays are used when a hypercall takes a variable
> length array of buffers.

Ian Campbell did the hypercall buffers and is the relevant authority,
so CC'ing him.

Ian.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 7/8] libxc: add hypercall buffer arrays
  2013-02-21 17:48 ` David Vrabel
                     ` (2 preceding siblings ...)
  2013-03-07  2:44   ` Ian Campbell
@ 2013-03-07  2:44   ` Ian Campbell
  3 siblings, 0 replies; 106+ messages in thread
From: Ian Campbell @ 2013-03-07  2:44 UTC (permalink / raw)
  To: David Vrabel; +Cc: Daniel Kiper, kexec, xen-devel

On Thu, 2013-02-21 at 17:48 +0000, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
> 
> Hypercall buffer arrays are used when a hypercall takes a variable
> length array of buffers.

This looks good to me, I took a quick peek at the 4/4 patch in the kexec
tools series which uses it as well.

Acked-by: Ian Campbell <ian.campbell@citrix.com>

> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
>  tools/libxc/xc_hcall_buf.c |   73 ++++++++++++++++++++++++++++++++++++++++++++
>  tools/libxc/xenctrl.h      |   27 ++++++++++++++++
>  2 files changed, 100 insertions(+), 0 deletions(-)
> 
> diff --git a/tools/libxc/xc_hcall_buf.c b/tools/libxc/xc_hcall_buf.c
> index ced9abd..3e01f3f 100644
> --- a/tools/libxc/xc_hcall_buf.c
> +++ b/tools/libxc/xc_hcall_buf.c
> @@ -228,6 +228,79 @@ void xc__hypercall_bounce_post(xc_interface *xch, xc_hypercall_buffer_t *b)
>      xc__hypercall_buffer_free(xch, b);
>  }
>  
> +struct xc_hypercall_buffer_array {
> +    unsigned max_bufs;
> +    xc_hypercall_buffer_t *bufs;
> +};
> +
> +xc_hypercall_buffer_array_t *xc_hypercall_buffer_array_create(xc_interface *xch,
> +                                                              unsigned n)
> +{
> +    xc_hypercall_buffer_array_t *array;
> +    xc_hypercall_buffer_t *bufs = NULL;
> +
> +    array = malloc(sizeof(*array));
> +    if ( array == NULL )
> +        goto error;
> +
> +    bufs = calloc(n, sizeof(*bufs));
> +    if ( bufs == NULL )
> +        goto error;
> +
> +    array->max_bufs = n;
> +    array->bufs     = bufs;
> +
> +    return array;
> +
> +error:
> +    free(bufs);
> +    free(array);
> +    return NULL;
> +}
> +
> +void *xc__hypercall_buffer_array_alloc(xc_interface *xch,
> +                                       xc_hypercall_buffer_array_t *array,
> +                                       unsigned index,
> +                                       xc_hypercall_buffer_t *hbuf,
> +                                       size_t size)
> +{
> +    void *buf;
> +
> +    if ( index >= array->max_bufs || array->bufs[index].hbuf )
> +        abort();
> +
> +    buf = xc__hypercall_buffer_alloc(xch, hbuf, size);
> +    if ( buf )
> +        array->bufs[index] = *hbuf;
> +    return buf;
> +}
> +
> +void *xc__hypercall_buffer_array_get(xc_interface *xch,
> +                                     xc_hypercall_buffer_array_t *array,
> +                                     unsigned index,
> +                                     xc_hypercall_buffer_t *hbuf)
> +{
> +    if ( index >= array->max_bufs || array->bufs[index].hbuf == NULL )
> +        abort();
> +
> +    *hbuf = array->bufs[index];
> +    return array->bufs[index].hbuf;
> +}
> +
> +void xc_hypercall_buffer_array_destroy(xc_interface *xc,
> +                                       xc_hypercall_buffer_array_t *array)
> +{
> +    unsigned i;
> +
> +    if ( array == NULL )
> +        return;
> +
> +    for (i = 0; i < array->max_bufs; i++ )
> +        xc__hypercall_buffer_free(xc, &array->bufs[i]);
> +    free(array->bufs);
> +    free(array);
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> index 32122fd..c3b2c28 100644
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -317,6 +317,33 @@ void xc__hypercall_buffer_free_pages(xc_interface *xch, xc_hypercall_buffer_t *b
>  #define xc_hypercall_buffer_free_pages(_xch, _name, _nr) xc__hypercall_buffer_free_pages(_xch, HYPERCALL_BUFFER(_name), _nr)
>  
>  /*
> + * Array of hypercall buffers.
> + *
> + * Create an array with xc_hypercall_buffer_array_create() and
> + * populate it by declaring one hypercall buffer in a loop and
> + * allocating the buffer with xc_hypercall_buffer_array_alloc().
> + *
> + * To access a previously allocated buffer, declare a new hypercall
> + * buffer and call xc_hypercall_buffer_array_get().
> + *
> + * Destroy the array with xc_hypercall_buffer_array_destroy() to free
> + * the array and all its allocated hypercall buffers.
> + */
> +struct xc_hypercall_buffer_array;
> +typedef struct xc_hypercall_buffer_array xc_hypercall_buffer_array_t;
> +
> +xc_hypercall_buffer_array_t *xc_hypercall_buffer_array_create(xc_interface *xch, unsigned n);
> +void *xc__hypercall_buffer_array_alloc(xc_interface *xch, xc_hypercall_buffer_array_t *array,
> +                                       unsigned index, xc_hypercall_buffer_t *hbuf, size_t size);
> +#define xc_hypercall_buffer_array_alloc(_xch, _array, _index, _name, _size) \
> +    xc__hypercall_buffer_array_alloc(_xch, _array, _index, HYPERCALL_BUFFER(_name), _size)
> +void *xc__hypercall_buffer_array_get(xc_interface *xch, xc_hypercall_buffer_array_t *array,
> +                                     unsigned index, xc_hypercall_buffer_t *hbuf);
> +#define xc_hypercall_buffer_array_get(_xch, _array, _index, _name, _size) \
> +    xc__hypercall_buffer_array_get(_xch, _array, _index, HYPERCALL_BUFFER(_name))
> +void xc_hypercall_buffer_array_destroy(xc_interface *xc, xc_hypercall_buffer_array_t *array);
> +
> +/*
>   * CPUMAP handling
>   */
>  typedef uint8_t *xc_cpumap_t;
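
For anyone reading along without a Xen build tree, the create / alloc /
get / destroy life cycle described in the header comment above can be
modelled roughly as follows. This is only a sketch: the model_* names are
made up for illustration, the buffers are plain heap memory rather than
hypercall-safe memory, and the HYPERCALL_BUFFER() declaration step used
by the real macros is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for xc_hypercall_buffer_t: only the field the array logic checks. */
typedef struct {
    void *hbuf;                 /* NULL means "slot not allocated yet" */
} model_buffer_t;

typedef struct {
    unsigned max_bufs;
    model_buffer_t *bufs;
} model_buffer_array_t;

static model_buffer_array_t *model_array_create(unsigned n)
{
    model_buffer_array_t *array = malloc(sizeof(*array));
    model_buffer_t *bufs = calloc(n, sizeof(*bufs));

    if ( array == NULL || bufs == NULL )
    {
        free(bufs);
        free(array);
        return NULL;
    }
    array->max_bufs = n;
    array->bufs = bufs;
    return array;
}

/* Allocate slot 'index'.  Misuse (bad index, slot in use) aborts, as in the patch. */
static void *model_array_alloc(model_buffer_array_t *array, unsigned index,
                               size_t size)
{
    if ( index >= array->max_bufs || array->bufs[index].hbuf )
        abort();
    array->bufs[index].hbuf = calloc(1, size);  /* heap memory, model only */
    return array->bufs[index].hbuf;
}

/* Look up a slot that was allocated earlier. */
static void *model_array_get(model_buffer_array_t *array, unsigned index)
{
    if ( index >= array->max_bufs || array->bufs[index].hbuf == NULL )
        abort();
    return array->bufs[index].hbuf;
}

/* Free every allocated slot, then the array itself. */
static void model_array_destroy(model_buffer_array_t *array)
{
    unsigned i;

    if ( array == NULL )
        return;
    for ( i = 0; i < array->max_bufs; i++ )
        free(array->bufs[i].hbuf);      /* free(NULL) is fine for unused slots */
    free(array->bufs);
    free(array);
}

/* Fill n segments of seg_size bytes, read them back; returns 0 on success. */
static int model_demo(unsigned n, size_t seg_size)
{
    model_buffer_array_t *array = model_array_create(n);
    unsigned i;
    int rc = 0;

    if ( array == NULL )
        return -1;

    for ( i = 0; i < n; i++ )
    {
        unsigned char *seg = model_array_alloc(array, i, seg_size);

        if ( seg == NULL )
        {
            rc = -1;
            goto out;
        }
        memset(seg, (int)(i + 1), seg_size);
    }

    for ( i = 0; i < n; i++ )
    {
        unsigned char *seg = model_array_get(array, i);

        if ( seg[0] != (unsigned char)(i + 1) )
            rc = -1;
    }

out:
    model_array_destroy(array);
    return rc;
}
```

The point of the array is that the caller only has to remember one
handle: a single destroy call releases every segment buffer, including
on an error path where only some slots were filled.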

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 8/8] libxc: add API for kexec hypercall
  2013-02-21 17:48 ` David Vrabel
@ 2013-03-07  2:46   ` Ian Campbell
  2013-03-07  2:46   ` [Xen-devel] " Ian Campbell
  1 sibling, 0 replies; 106+ messages in thread
From: Ian Campbell @ 2013-03-07  2:46 UTC (permalink / raw)
  To: David Vrabel; +Cc: Daniel Kiper, kexec, xen-devel

On Thu, 2013-02-21 at 17:48 +0000, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
> 
> Add xc_kexec_exec(), xc_kexec_get_range(), xc_kexec_load(), and
> xc_kexec_unload().  The load and unload calls require the v2 load and
> unload ops.

Looks good, few nits below, but

Acked-by: Ian Campbell <ian.campbell@citrix.com>

> +/*
> + * Find the machine address and size of certain memory areas.
> + *
> + *   KEXEC_RANGE_MA_CRASH       crash area
> + *   KEXEC_RANGE_MA_XEN         Xen itself
> + *   KEXEC_RANGE_MA_CPU         CPU note for CPU number 'nr'
> + *   KEXEC_RANGE_MA_XENHEAP     xenheap
> + *   KEXEC_RANGE_MA_EFI_MEMMAP  EFI Memory Map
> + *   KEXEC_RANGE_MA_VMCOREINFO  vmcoreinfo

I expect there is a canonical list of these somewhere, can we get a
pointer to it here?

> + *
> + * Fails with:
> + *   EINVAL if the range or CPU number isn't valid.
> + */
> +int xc_kexec_get_range(xc_interface *xch, int range,  int nr,
> +                       uint64_t *size, uint64_t *start);
> +
> +/*
> + * Load a kexec image into memory.
> + *
> + * The image may be of type KEXEC_TYPE_DEFAULT (executed on request)
> + * or KEXEC_TYPE_CRASH (executed on a crash).
> + *
> + * The image architecture may be a 32-bit variant of the hypervisor
> + * architecture (e.g., EM_386 on an x86-64 hypervisor).

arch is an ELF arch? Worth mentioning explicitly I think.

Ian

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 3/8] kexec: add public interface for improved load/unload sub-ops
  2013-02-21 17:48 ` David Vrabel
                     ` (3 preceding siblings ...)
  2013-02-22  8:33   ` Jan Beulich
@ 2013-03-08 10:50   ` Daniel Kiper
  2013-03-08 10:50   ` Daniel Kiper
  5 siblings, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-08 10:50 UTC (permalink / raw)
  To: David Vrabel; +Cc: kexec, xen-devel

On Thu, Feb 21, 2013 at 05:48:09PM +0000, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> Add replacement KEXEC_CMD_load and KEXEC_CMD_unload sub-ops to the
> kexec hypercall.  These new sub-ops allow a privileged guest to
> provide the image data to be loaded into Xen memory or the crash
> region instead of guests loading the image data themselves and
> providing the relocation code and metadata.
>
> The old interface is provided to guests requesting an interface
> version prior to 4.3.
>
> Signed-off: David Vrabel <david.vrabel@citrix.com>

[...]

> diff --git a/xen/include/public/kexec.h b/xen/include/public/kexec.h
> index 61a8d7d..5259446 100644
> --- a/xen/include/public/kexec.h
> +++ b/xen/include/public/kexec.h
> @@ -116,12 +116,12 @@ typedef struct xen_kexec_exec {
>   * type  == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in]
>   * image == relocation information for kexec (ignored for unload) [in]
>   */
> -#define KEXEC_CMD_kexec_load            1
> -#define KEXEC_CMD_kexec_unload          2
> -typedef struct xen_kexec_load {
> +#define KEXEC_CMD_kexec_load_v1         1 /* obsolete since 0x00040300 */
> +#define KEXEC_CMD_kexec_unload_v1       2 /* obsolete since 0x00040300 */
> +typedef struct xen_kexec_load_v1 {
>      int type;
>      xen_kexec_image_t image;
> -} xen_kexec_load_t;
> +} xen_kexec_load_v1_t;

I do not think it is a good idea to redefine the meaning of constants,
types, structures, etc. IMO it is comparable to redefining the meaning
of words in a language (e.g. English). It will be very confusing
and may easily lead to stupid bugs. I think the old interface should
stay as is (with its bad behavior). The new interface should be
introduced with a "_v2" suffix, e.g. KEXEC_CMD_kexec_load_v2, ...
This would not confuse our descendants.

Daniel
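
To make the two naming schemes under discussion concrete, here is a
rough sketch. The quoted hunk only shows the rename to *_v1; how the
unsuffixed names are mapped for old callers is not shown, so the #if
below is an assumption based on the commit message ("provided to guests
requesting an interface version prior to 4.3"). All MODEL_* names are
invented, and the numbers 4 and 5 for the new sub-ops are placeholders,
not taken from the patch.

```c
#include <assert.h>

/* Interface version the caller builds against (assumed knob, model only). */
#ifndef MODEL_INTERFACE_VERSION
#define MODEL_INTERFACE_VERSION 0x00040300
#endif

/* Old sub-ops: numbers 1 and 2 as in the quoted hunk. */
#define MODEL_CMD_kexec_load_v1    1
#define MODEL_CMD_kexec_unload_v1  2

/* New sub-ops: always reachable under an explicit _v2 name (placeholder numbers). */
#define MODEL_CMD_kexec_load_v2    4
#define MODEL_CMD_kexec_unload_v2  5

/*
 * Unsuffixed names: these change meaning with the requested interface
 * version, which is the behaviour being objected to above.
 */
#if MODEL_INTERFACE_VERSION < 0x00040300
#define MODEL_CMD_kexec_load    MODEL_CMD_kexec_load_v1
#define MODEL_CMD_kexec_unload  MODEL_CMD_kexec_unload_v1
#else
#define MODEL_CMD_kexec_load    MODEL_CMD_kexec_load_v2
#define MODEL_CMD_kexec_unload  MODEL_CMD_kexec_unload_v2
#endif

/* What a caller using the unsuffixed names ends up issuing. */
static int model_load_cmd(void)
{
    return MODEL_CMD_kexec_load;
}

static int model_unload_cmd(void)
{
    return MODEL_CMD_kexec_unload;
}
```

With the explicit _v1/_v2 names a reader can tell from the call site
which ABI is in use; with the unsuffixed names that depends on a
version macro defined elsewhere in the build.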

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-02-21 17:48 ` David Vrabel
                     ` (3 preceding siblings ...)
  2013-02-22  8:42   ` Jan Beulich
@ 2013-03-08 11:23   ` Daniel Kiper
  2013-03-08 11:23   ` Daniel Kiper
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-08 11:23 UTC (permalink / raw)
  To: David Vrabel; +Cc: kexec, xen-devel

On Thu, Feb 21, 2013 at 05:48:11PM +0000, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> In the existing kexec hypercall, the load and unload ops depend on
> internals of the Linux kernel (the page list and code page provided by
> the kernel).  The code page is used to transition between Xen context
> and the image so using kernel code doesn't make sense and will not
> work for PVH guests.
>
> Add replacement KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload ops
> that no longer require a code page to be provided by the guest -- Xen
> now provides the code for calling the image directly.
>
> The new load op looks similar to the Linux kexec_load system call and
> allows the guest to provide the image data to be loaded.  The guest
> specifies the architecture of the image which may be a 32-bit subarch
> of the hypervisor's architecture (i.e., an EM_386 image on an
> EM_X86_64 hypervisor).
>
> The toolstack can now load images without kernel involvement.  This is
> required for supporting kexec when using a dom0 with an upstream
> kernel.
>
> Crash images are copied directly into the crash region on load.
> Default images are copied into Xen heap pages and a list of source and
> destination machine addresses is created.  This list is used in
> kexec_reloc() to relocate the image to its destination.
>
> The old load and unload sub-ops are still available (as
> KEXEC_CMD_load_v1 and KEXEC_CMD_unload_v1) and are implemented on top
> of the new infrastructure.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

[...]

> diff --git a/xen/arch/x86/x86_64/kexec_reloc.S b/xen/arch/x86/x86_64/kexec_reloc.S
> new file mode 100644
> index 0000000..e68842c
> --- /dev/null
> +++ b/xen/arch/x86/x86_64/kexec_reloc.S
> @@ -0,0 +1,229 @@
> +/*
> + * Relocate a kexec_image to its destination and call it.
> + *
> + * Copyright (C) 2013 Citrix Systems R&D Ltd.
> + *
> + * Portions derived from Linux's arch/x86/kernel/relocate_kernel_64.S.
> + *
> + *   Copyright (C) 2002-2005 Eric Biederman  <ebiederm@xmission.com>
> + *
> + * This source code is licensed under the GNU General Public License,
> + * Version 2.  See the file COPYING for more details.
> + */
> +#include <xen/config.h>
> +
> +#include <asm/asm_defns.h>
> +#include <asm/msr.h>
> +#include <asm/page.h>
> +#include <asm/machine_kexec.h>
> +
> +/* The unrelocated physical address of a symbol. */
> +#define SYM_PHYS(sym)          ((sym) - __XEN_VIRT_START)
> +
> +/* Load physical address of symbol into register and relocate it. */
> +#define RELOCATE_SYM(sym,reg)  mov $SYM_PHYS(sym), reg ; \
> +                               add xen_phys_start(%rip), reg
> +
> +#define DBG(c) \
> +1:      mov     $0x3f8+5, %dx ; \
> +        inb     %dx, %al     ; \
> +        test    $0x20, %al   ; \
> +        je      1b           ; \
> +        mov     $0x3f8, %dx  ; \
> +        mov     $c, %al      ; \
> +        outb    %al, %dx     ;
> +
> +        .text
> +        .align PAGE_SIZE
> +        .code64
> +
> +ENTRY(kexec_reloc)
> +        /* %rdi - code_page maddr */
> +        /* %rsi - page table maddr */
> +        /* %rdx - indirection page maddr */
> +        /* %rcx - entry maddr */
> +        /* %r8 - flags */
> +
> +        mov %rdx, %rbx
> +
> +        DBG('A')
> +
> +        /* Setup stack. */
> +        RELOCATE_SYM(reloc_stack, %rax)
> +        mov %rax, %rsp
> +
> +        DBG('B')
> +
> +        wbinvd
> +        movq %cr4, %rax
> +        andq $~(X86_CR4_PGE|X86_CR4_PCE|X86_CR4_MCE), %rax
> +        movq %rax, %cr4
> +
> +        /* Load reloc page table. */
> +        movq %rsi, %cr3
> +
> +        DBG('C')
> +
> +        /* Jump to identity mapped code. */
> +        movq %rdi, %r9
> +        addq $(identity_mapped - kexec_reloc), %r9
> +
> +        DBG('D')
> +
> +        jmp *%r9
> +
> +identity_mapped:
> +        DBG('E')
> +
> +        pushq %rcx
> +        pushq %rbx
> +        pushq %rsi
> +        pushq %rdi
> +
> +        movq %rbx, %rdi
> +        call swap_pages
> +
> +        popq %rdi
> +        popq %rsi
> +        popq %rbx
> +        popq %rcx
> +
> +        DBG('F')
> +
> +        /* Need to switch to 32-bit mode? */
> +        testq $KEXEC_RELOC_FLAG_COMPAT, %r8
> +        jnz call_32_bit

Why do you need that? This is not needed because purgatory code
from kexec-tools always switches to 32-bit mode. Please check
kexec-tools/purgatory/arch/x86_64/entry64.S.

Daniel
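
For context on the flag being questioned: the testq on %r8 implies that
the C side sets KEXEC_RELOC_FLAG_COMPAT when the loaded image is a
32-bit one. That code is not quoted here, so the following is only a
guess at its shape, assuming the flag is derived from the ELF machine
type given at load time. The MODEL_* names and the flag's bit value are
invented; the two e_machine numbers are the standard ELF ones.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_EM_386      3     /* ELF e_machine for i386 */
#define MODEL_EM_X86_64  62     /* ELF e_machine for x86-64 */

#define MODEL_RELOC_FLAG_COMPAT  0x1UL  /* assumed bit, not from the patch */

/*
 * Pick the relocation flags for an image of the given ELF machine type
 * on an x86-64 hypervisor.  Returns 0 on success, -1 for an
 * architecture that is neither native nor the 32-bit variant.
 */
static int model_reloc_flags(uint16_t image_arch, unsigned long *flags)
{
    switch ( image_arch )
    {
    case MODEL_EM_X86_64:
        *flags = 0;                         /* enter the image in 64-bit mode */
        return 0;
    case MODEL_EM_386:
        *flags = MODEL_RELOC_FLAG_COMPAT;   /* switch to 32-bit mode first */
        return 0;
    default:
        return -1;
    }
}
```

If purgatory always drops to 32-bit mode by itself, as suggested above,
the EM_386 case would not be needed for kexec-tools images; it would
only matter for images entered without purgatory.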

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-02-21 17:48 ` David Vrabel
                     ` (4 preceding siblings ...)
  2013-03-08 11:23   ` Daniel Kiper
@ 2013-03-08 11:23   ` Daniel Kiper
  2013-03-08 11:40     ` David Vrabel
  2013-03-08 11:40     ` David Vrabel
  2013-03-12 11:36   ` Daniel Kiper
  2013-03-12 11:36   ` Daniel Kiper
  7 siblings, 2 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-08 11:23 UTC (permalink / raw)
  To: David Vrabel; +Cc: kexec, xen-devel

On Thu, Feb 21, 2013 at 05:48:11PM +0000, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> In the existing kexec hypercall, the load and unload ops depend on
> internals of the Linux kernel (the page list and code page provided by
> the kernel).  The code page is used to transition between Xen context
> and the image so using kernel code doesn't make sense and will not
> work for PVH guests.
>
> Add replacement KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload ops
> that no longer require a code page to be provided by the guest -- Xen
> now provides the code for calling the image directly.
>
> The new load op looks similar to the Linux kexec_load system call and
> allows the guest to provide the image data to be loaded.  The guest
> specifies the architecture of the image which may be a 32-bit subarch
> of the hypervisor's architecture (i.e., an EM_386 image on an
> EM_X86_64 hypervisor).
>
> The toolstack can now load images without kernel involvement.  This is
> required for supporting kexec when using a dom0 with an upstream
> kernel.
>
> Crash images are copied directly into the crash region on load.
> Default images are copied into Xen heap pages and a list of source and
> destination machine addresses is created.  This list is used in
> kexec_reloc() to relocate the image to its destination.
>
> The old load and unload sub-ops are still available (as
> KEXEC_CMD_load_v1 and KEXEC_CMD_unload_v1) and are implemented on top
> of the new infrastructure.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

[...]

> diff --git a/xen/arch/x86/x86_64/kexec_reloc.S b/xen/arch/x86/x86_64/kexec_reloc.S
> new file mode 100644
> index 0000000..e68842c
> --- /dev/null
> +++ b/xen/arch/x86/x86_64/kexec_reloc.S
> @@ -0,0 +1,229 @@
> +/*
> + * Relocate a kexec_image to its destination and call it.
> + *
> + * Copyright (C) 2013 Citrix Systems R&D Ltd.
> + *
> + * Portions derived from Linux's arch/x86/kernel/relocate_kernel_64.S.
> + *
> + *   Copyright (C) 2002-2005 Eric Biederman  <ebiederm@xmission.com>
> + *
> + * This source code is licensed under the GNU General Public License,
> + * Version 2.  See the file COPYING for more details.
> + */
> +#include <xen/config.h>
> +
> +#include <asm/asm_defns.h>
> +#include <asm/msr.h>
> +#include <asm/page.h>
> +#include <asm/machine_kexec.h>
> +
> +/* The unrelocated physical address of a symbol. */
> +#define SYM_PHYS(sym)          ((sym) - __XEN_VIRT_START)
> +
> +/* Load physical address of symbol into register and relocate it. */
> +#define RELOCATE_SYM(sym,reg)  mov $SYM_PHYS(sym), reg ; \
> +                               add xen_phys_start(%rip), reg
> +
> +#define DBG(c) \
> +1:      mov     $0x3f8+5, %dx ; \
> +        inb     %dx, %al     ; \
> +        test    $0x20, %al   ; \
> +        je      1b           ; \
> +        mov     $0x3f8, %dx  ; \
> +        mov     $c, %al      ; \
> +        outb    %al, %dx     ;
> +
> +        .text
> +	.align PAGE_SIZE
> +        .code64
> +
> +ENTRY(kexec_reloc)
> +        /* %rdi - code_page maddr */
> +        /* %rsi - page table maddr */
> +        /* %rdx - indirection page maddr */
> +        /* %rcx - entry maddr */
> +        /* %r8 - flags */
> +
> +        mov %rdx, %rbx
> +
> +        DBG('A')
> +
> +        /* Setup stack. */
> +        RELOCATE_SYM(reloc_stack, %rax)
> +        mov %rax, %rsp
> +
> +        DBG('B')
> +
> +        wbinvd
> +        movq %cr4, %rax
> +        andq $~(X86_CR4_PGE|X86_CR4_PCE|X86_CR4_MCE), %rax
> +        movq %rax, %cr4
> +
> +        /* Load reloc page table. */
> +        movq %rsi, %cr3
> +
> +        DBG('C')
> +
> +        /* Jump to identity mapped code. */
> +        movq %rdi, %r9
> +        addq $(identity_mapped - kexec_reloc), %r9
> +
> +        DBG('D')
> +
> +        jmp *%r9
> +
> +identity_mapped:
> +        DBG('E')
> +
> +        pushq %rcx
> +        pushq %rbx
> +        pushq %rsi
> +        pushq %rdi
> +
> +        movq %rbx, %rdi
> +        call swap_pages
> +
> +        popq %rdi
> +        popq %rsi
> +        popq %rbx
> +        popq %rcx
> +
> +        DBG('F')
> +
> +        /* Need to switch to 32-bit mode? */
> +        testq $KEXEC_RELOC_FLAG_COMPAT, %r8
> +        jnz call_32_bit

Why do you need that? This is not needed because purgatory code
from kexec-tools always switches to 32-bit mode. Please check
kexec-tools/purgatory/arch/x86_64/entry64.S.

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 4/8] kexec: add infrastructure for handling kexec images
  2013-02-21 17:48 ` [PATCH 4/8] kexec: add infrastructure for handling kexec images David Vrabel
  2013-03-08 11:37   ` Daniel Kiper
@ 2013-03-08 11:37   ` Daniel Kiper
  1 sibling, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-08 11:37 UTC (permalink / raw)
  To: David Vrabel; +Cc: kexec, xen-devel

On Thu, Feb 21, 2013 at 05:48:10PM +0000, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> Add the code needed to handle and load kexec images into Xen memory or
> into the crash region.  This is needed for the new KEXEC_CMD_load and
> KEXEC_CMD_unload hypercall sub-ops.
>
> Much of this code is derived from the Linux kernel.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

[...]

> diff --git a/xen/common/kimage.c b/xen/common/kimage.c

[...]

> +static int kimage_load_segment(struct kexec_image *image, xen_kexec_segment_t *segment)
> +{
> +    int result = -ENOMEM;

Somewhere there should be a check that the architecture of the loaded
image is compatible with the architecture we are currently running on.

> +    switch ( image->type )
> +    {
> +    case KEXEC_TYPE_DEFAULT:
> +        result = kimage_load_normal_segment(image, segment);
> +        break;
> +    case KEXEC_TYPE_CRASH:
> +        result = kimage_load_crash_segment(image, segment);
> +        break;
> +    }
> +
> +    return result;
> +}

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 4/8] kexec: add infrastructure for handling kexec images
  2013-02-21 17:48 ` [PATCH 4/8] kexec: add infrastructure for handling kexec images David Vrabel
@ 2013-03-08 11:37   ` Daniel Kiper
  2013-03-08 11:42     ` David Vrabel
  2013-03-08 11:42     ` David Vrabel
  2013-03-08 11:37   ` Daniel Kiper
  1 sibling, 2 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-08 11:37 UTC (permalink / raw)
  To: David Vrabel; +Cc: kexec, xen-devel

On Thu, Feb 21, 2013 at 05:48:10PM +0000, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> Add the code needed to handle and load kexec images into Xen memory or
> into the crash region.  This is needed for the new KEXEC_CMD_load and
> KEXEC_CMD_unload hypercall sub-ops.
>
> Much of this code is derived from the Linux kernel.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

[...]

> diff --git a/xen/common/kimage.c b/xen/common/kimage.c

[...]

> +static int kimage_load_segment(struct kexec_image *image, xen_kexec_segment_t *segment)
> +{
> +    int result = -ENOMEM;

Somewhere there should be a check that the architecture of the loaded
image is compatible with the architecture we are currently running on.

> +    switch ( image->type )
> +    {
> +    case KEXEC_TYPE_DEFAULT:
> +        result = kimage_load_normal_segment(image, segment);
> +        break;
> +    case KEXEC_TYPE_CRASH:
> +        result = kimage_load_crash_segment(image, segment);
> +        break;
> +    }
> +
> +    return result;
> +}

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-08 11:23   ` Daniel Kiper
  2013-03-08 11:40     ` David Vrabel
@ 2013-03-08 11:40     ` David Vrabel
  1 sibling, 0 replies; 106+ messages in thread
From: David Vrabel @ 2013-03-08 11:40 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: kexec, xen-devel

On 08/03/13 11:23, Daniel Kiper wrote:
> On Thu, Feb 21, 2013 at 05:48:11PM +0000, David Vrabel wrote:
>> 
>> +        /* Need to switch to 32-bit mode? */
>> +        testq $KEXEC_RELOC_FLAG_COMPAT, %r8
>> +        jnz call_32_bit
> 
> Why do you need that? This is not needed because purgatory code
> from kexec-tools always switches to 32-bit mode. Please check
> kexec-tools/purgatory/arch/x86_64/entry64.S.

The sub-architecture is a property of the image.  Why should the tool
know or care about the sub-architecture of the hypervisor?

The ABI isn't designed only for kexec-tools.
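
To illustrate the point, here is a minimal sketch of how the mode switch
could be derived from the image's architecture rather than from the tool.
The helper name reloc_flags_for_arch() and the flag value are assumptions
made for this example and are not taken from the patch; only EM_386,
EM_X86_64 and the name KEXEC_RELOC_FLAG_COMPAT appear in the series.

```c
#include <errno.h>

/* ELF machine numbers, as defined by the ELF specification. */
#define EM_386     3
#define EM_X86_64 62

/* Assumed value for illustration only; the patch defines the real one. */
#define KEXEC_RELOC_FLAG_COMPAT 0x1UL

/*
 * Hypothetical helper: map the image's architecture to relocation flags.
 * Returns 0 and stores the flags in *flags, or -EINVAL for an
 * architecture that a 64-bit x86 hypervisor cannot start.
 */
int reloc_flags_for_arch(unsigned int arch, unsigned long *flags)
{
    switch ( arch )
    {
    case EM_X86_64:
        *flags = 0;                       /* image is entered in 64-bit mode */
        return 0;
    case EM_386:
        *flags = KEXEC_RELOC_FLAG_COMPAT; /* switch to 32-bit mode first */
        return 0;
    default:
        return -EINVAL;
    }
}
```

With something of this shape, the caller only states what the image is;
whether a mode switch is needed stays a decision of the relocation code.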

David

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-08 11:23   ` Daniel Kiper
@ 2013-03-08 11:40     ` David Vrabel
  2013-03-08 12:21       ` Daniel Kiper
  2013-03-08 12:21       ` Daniel Kiper
  2013-03-08 11:40     ` David Vrabel
  1 sibling, 2 replies; 106+ messages in thread
From: David Vrabel @ 2013-03-08 11:40 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: kexec, xen-devel

On 08/03/13 11:23, Daniel Kiper wrote:
> On Thu, Feb 21, 2013 at 05:48:11PM +0000, David Vrabel wrote:
>> 
>> +        /* Need to switch to 32-bit mode? */
>> +        testq $KEXEC_RELOC_FLAG_COMPAT, %r8
>> +        jnz call_32_bit
> 
> Why do you need that? This is not needed because purgatory code
> from kexec-tools always switches to 32-bit mode. Please check
> kexec-tools/purgatory/arch/x86_64/entry64.S.

The sub-architecture is a property of the image.  Why should the tool
know or care about the sub-architecture of the hypervisor?

The ABI isn't designed only for kexec-tools.

David

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 4/8] kexec: add infrastructure for handling kexec images
  2013-03-08 11:37   ` Daniel Kiper
  2013-03-08 11:42     ` David Vrabel
@ 2013-03-08 11:42     ` David Vrabel
  1 sibling, 0 replies; 106+ messages in thread
From: David Vrabel @ 2013-03-08 11:42 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: kexec, xen-devel

On 08/03/13 11:37, Daniel Kiper wrote:
> On Thu, Feb 21, 2013 at 05:48:10PM +0000, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> Add the code needed to handle and load kexec images into Xen memory or
>> into the crash region.  This is needed for the new KEXEC_CMD_load and
>> KEXEC_CMD_unload hypercall sub-ops.
>>
>> Much of this code is derived from the Linux kernel.
>>
>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> 
> [...]
> 
>> diff --git a/xen/common/kimage.c b/xen/common/kimage.c
> 
> [...]
> 
>> +static int kimage_load_segment(struct kexec_image *image, xen_kexec_segment_t *segment)
>> +{
>> +    int result = -ENOMEM;
> 
> Somewhere there should be a check that the architecture of the loaded
> image is compatible with the architecture we are currently running on.

See machine_kexec_load() in patch 5.

David

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 4/8] kexec: add infrastructure for handling kexec images
  2013-03-08 11:37   ` Daniel Kiper
@ 2013-03-08 11:42     ` David Vrabel
  2013-03-08 11:58       ` Daniel Kiper
  2013-03-08 11:58       ` Daniel Kiper
  2013-03-08 11:42     ` David Vrabel
  1 sibling, 2 replies; 106+ messages in thread
From: David Vrabel @ 2013-03-08 11:42 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: kexec, xen-devel

On 08/03/13 11:37, Daniel Kiper wrote:
> On Thu, Feb 21, 2013 at 05:48:10PM +0000, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> Add the code needed to handle and load kexec images into Xen memory or
>> into the crash region.  This is needed for the new KEXEC_CMD_load and
>> KEXEC_CMD_unload hypercall sub-ops.
>>
>> Much of this code is derived from the Linux kernel.
>>
>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> 
> [...]
> 
>> diff --git a/xen/common/kimage.c b/xen/common/kimage.c
> 
> [...]
> 
>> +static int kimage_load_segment(struct kexec_image *image, xen_kexec_segment_t *segment)
>> +{
>> +    int result = -ENOMEM;
> 
> Somewhere there should be a check that the architecture of the loaded
> image is compatible with the architecture we are currently running on.

See machine_kexec_load() in patch 5.

David

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 3/8] kexec: add public interface for improved load/unload sub-ops
  2013-03-08 10:50   ` Daniel Kiper
  2013-03-08 11:52     ` David Vrabel
@ 2013-03-08 11:52     ` David Vrabel
  1 sibling, 0 replies; 106+ messages in thread
From: David Vrabel @ 2013-03-08 11:52 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: kexec, xen-devel

On 08/03/13 10:50, Daniel Kiper wrote:
> On Thu, Feb 21, 2013 at 05:48:09PM +0000, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> Add replacement KEXEC_CMD_load and KEXEC_CMD_unload sub-ops to the
>> kexec hypercall.  These new sub-ops allow a privileged guest to
>> provide the image data to be loaded into Xen memory or the crash
>> region instead of guests loading the image data themselves and
>> providing the relocation code and metadata.
>>
>> The old interface is provided to guests requesting an interface
>> version prior to 4.3.
>>
>> Signed-off: David Vrabel <david.vrabel@citrix.com>
> 
> [...]
> 
>> diff --git a/xen/include/public/kexec.h b/xen/include/public/kexec.h
>> index 61a8d7d..5259446 100644
>> --- a/xen/include/public/kexec.h
>> +++ b/xen/include/public/kexec.h
>> @@ -116,12 +116,12 @@ typedef struct xen_kexec_exec {
>>   * type  == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in]
>>   * image == relocation information for kexec (ignored for unload) [in]
>>   */
>> -#define KEXEC_CMD_kexec_load            1
>> -#define KEXEC_CMD_kexec_unload          2
>> -typedef struct xen_kexec_load {
>> +#define KEXEC_CMD_kexec_load_v1         1 /* obsolete since 0x00040300 */
>> +#define KEXEC_CMD_kexec_unload_v1       2 /* obsolete since 0x00040300 */
>> +typedef struct xen_kexec_load_v1 {
>>      int type;
>>      xen_kexec_image_t image;
>> -} xen_kexec_load_t;
>> +} xen_kexec_load_v1_t;
> 
> I think it is not a good idea to redefine the meaning of constants,
> types, structures, etc. IMO it is comparable to redefining the meaning
> of words in any language (e.g. English). It will be very confusing
> and may easily lead to stupid bugs. I think the old interface should
> stay as is (with its bad behavior). The new interface should be
> introduced with a "_v2" suffix, e.g. KEXEC_CMD_kexec_load_v2, ...
> This would not confuse our descendants.

This is something that was requested (by Ian C) as the Xen way of doing it.

David

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 3/8] kexec: add public interface for improved load/unload sub-ops
  2013-03-08 10:50   ` Daniel Kiper
@ 2013-03-08 11:52     ` David Vrabel
  2013-03-08 12:28       ` Daniel Kiper
  2013-03-08 12:28       ` Daniel Kiper
  2013-03-08 11:52     ` David Vrabel
  1 sibling, 2 replies; 106+ messages in thread
From: David Vrabel @ 2013-03-08 11:52 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: kexec, xen-devel

On 08/03/13 10:50, Daniel Kiper wrote:
> On Thu, Feb 21, 2013 at 05:48:09PM +0000, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> Add replacement KEXEC_CMD_load and KEXEC_CMD_unload sub-ops to the
>> kexec hypercall.  These new sub-ops allow a privileged guest to
>> provide the image data to be loaded into Xen memory or the crash
>> region instead of guests loading the image data themselves and
>> providing the relocation code and metadata.
>>
>> The old interface is provided to guests requesting an interface
>> version prior to 4.3.
>>
>> Signed-off: David Vrabel <david.vrabel@citrix.com>
> 
> [...]
> 
>> diff --git a/xen/include/public/kexec.h b/xen/include/public/kexec.h
>> index 61a8d7d..5259446 100644
>> --- a/xen/include/public/kexec.h
>> +++ b/xen/include/public/kexec.h
>> @@ -116,12 +116,12 @@ typedef struct xen_kexec_exec {
>>   * type  == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in]
>>   * image == relocation information for kexec (ignored for unload) [in]
>>   */
>> -#define KEXEC_CMD_kexec_load            1
>> -#define KEXEC_CMD_kexec_unload          2
>> -typedef struct xen_kexec_load {
>> +#define KEXEC_CMD_kexec_load_v1         1 /* obsolete since 0x00040300 */
>> +#define KEXEC_CMD_kexec_unload_v1       2 /* obsolete since 0x00040300 */
>> +typedef struct xen_kexec_load_v1 {
>>      int type;
>>      xen_kexec_image_t image;
>> -} xen_kexec_load_t;
>> +} xen_kexec_load_v1_t;
> 
> I think it is not a good idea to redefine the meaning of constants,
> types, structures, etc. IMO it is comparable to redefining the meaning
> of words in any language (e.g. English). It will be very confusing
> and may easily lead to stupid bugs. I think the old interface should
> stay as is (with its bad behavior). The new interface should be
> introduced with a "_v2" suffix, e.g. KEXEC_CMD_kexec_load_v2, ...
> This would not confuse our descendants.

This is something that was requested (by Ian C) as the Xen way of doing it.

David

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 4/8] kexec: add infrastructure for handling kexec images
  2013-03-08 11:42     ` David Vrabel
  2013-03-08 11:58       ` Daniel Kiper
@ 2013-03-08 11:58       ` Daniel Kiper
  1 sibling, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-08 11:58 UTC (permalink / raw)
  To: David Vrabel; +Cc: kexec, xen-devel

On Fri, Mar 08, 2013 at 11:42:27AM +0000, David Vrabel wrote:
> On 08/03/13 11:37, Daniel Kiper wrote:
> > On Thu, Feb 21, 2013 at 05:48:10PM +0000, David Vrabel wrote:
> >> From: David Vrabel <david.vrabel@citrix.com>
> >>
> >> Add the code needed to handle and load kexec images into Xen memory or
> >> into the crash region.  This is needed for the new KEXEC_CMD_load and
> >> KEXEC_CMD_unload hypercall sub-ops.
> >>
> >> Much of this code is derived from the Linux kernel.
> >>
> >> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> >
> > [...]
> >
> >> diff --git a/xen/common/kimage.c b/xen/common/kimage.c
> >
> > [...]
> >
> >> +static int kimage_load_segment(struct kexec_image *image, xen_kexec_segment_t *segment)
> >> +{
> >> +    int result = -ENOMEM;
> >
> > Somewhere there should be a check that the architecture of the loaded
> > image is compatible with the architecture we are currently running on.
>
> See machine_kexec_load() in patch 5.

Thanks. I missed that.

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 4/8] kexec: add infrastructure for handling kexec images
  2013-03-08 11:42     ` David Vrabel
@ 2013-03-08 11:58       ` Daniel Kiper
  2013-03-08 11:58       ` Daniel Kiper
  1 sibling, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-08 11:58 UTC (permalink / raw)
  To: David Vrabel; +Cc: kexec, xen-devel

On Fri, Mar 08, 2013 at 11:42:27AM +0000, David Vrabel wrote:
> On 08/03/13 11:37, Daniel Kiper wrote:
> > On Thu, Feb 21, 2013 at 05:48:10PM +0000, David Vrabel wrote:
> >> From: David Vrabel <david.vrabel@citrix.com>
> >>
> >> Add the code needed to handle and load kexec images into Xen memory or
> >> into the crash region.  This is needed for the new KEXEC_CMD_load and
> >> KEXEC_CMD_unload hypercall sub-ops.
> >>
> >> Much of this code is derived from the Linux kernel.
> >>
> >> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> >
> > [...]
> >
> >> diff --git a/xen/common/kimage.c b/xen/common/kimage.c
> >
> > [...]
> >
> >> +static int kimage_load_segment(struct kexec_image *image, xen_kexec_segment_t *segment)
> >> +{
> >> +    int result = -ENOMEM;
> >
> > Somewhere there should be a check that the architecture of the loaded
> > image is compatible with the architecture we are currently running on.
>
> See machine_kexec_load() in patch 5.

Thanks. I missed that.

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-08 11:40     ` David Vrabel
  2013-03-08 12:21       ` Daniel Kiper
@ 2013-03-08 12:21       ` Daniel Kiper
  1 sibling, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-08 12:21 UTC (permalink / raw)
  To: David Vrabel; +Cc: kexec, xen-devel

On Fri, Mar 08, 2013 at 11:40:44AM +0000, David Vrabel wrote:
> On 08/03/13 11:23, Daniel Kiper wrote:
> > On Thu, Feb 21, 2013 at 05:48:11PM +0000, David Vrabel wrote:
> >>
> >> +        /* Need to switch to 32-bit mode? */
> >> +        testq $KEXEC_RELOC_FLAG_COMPAT, %r8
> >> +        jnz call_32_bit
> >
> > Why do you need that? This is not needed because purgatory code
> > from kexec-tools always switches to 32-bit mode. Please check
> > kexec-tools/purgatory/arch/x86_64/entry64.S.
>
> The sub-architecture is a property of the image.  Why should the tool
> know or care about the sub-architecture of the hypervisor?
>
> The ABI isn't designed only for kexec-tools.

OK, but I think it is much easier to assume that the machine state
is not changed by the kexec syscall/hypercall and to move this task
out to a separate module (in this case the purgatory code) which does
everything needed (in this case it sets the machine to "native" mode,
which is close to the machine state after BIOS initialization; as far
as I know this assumption is common for other architectures too). This
way you could get what you need (e.g. 64-bit -> 64-bit, 64-bit ->
32-bit, ...) without changing a single instruction in the hypervisor
or kernel. Just make the changes in purgatory (it could be called
differently in your private kexec-tool) and voila.

Additionally, you duplicate code which exists and works well.

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-08 11:40     ` David Vrabel
@ 2013-03-08 12:21       ` Daniel Kiper
  2013-03-08 14:01         ` David Vrabel
  2013-03-08 14:01         ` David Vrabel
  2013-03-08 12:21       ` Daniel Kiper
  1 sibling, 2 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-08 12:21 UTC (permalink / raw)
  To: David Vrabel; +Cc: kexec, xen-devel

On Fri, Mar 08, 2013 at 11:40:44AM +0000, David Vrabel wrote:
> On 08/03/13 11:23, Daniel Kiper wrote:
> > On Thu, Feb 21, 2013 at 05:48:11PM +0000, David Vrabel wrote:
> >>
> >> +        /* Need to switch to 32-bit mode? */
> >> +        testq $KEXEC_RELOC_FLAG_COMPAT, %r8
> >> +        jnz call_32_bit
> >
> > Why do you need that? This is not needed because purgatory code
> > from kexec-tools always switches to 32-bit mode. Please check
> > kexec-tools/purgatory/arch/x86_64/entry64.S.
>
> The sub-architecture is a property of the image.  Why should the tool
> know or care about the sub-architecture of the hypervisor?
>
> The ABI isn't designed only for kexec-tools.

OK, but I think it is much easier to assume that the machine state
is not changed by the kexec syscall/hypercall and to move this task
out to a separate module (in this case the purgatory code) which does
everything needed (in this case it sets the machine to "native" mode,
which is close to the machine state after BIOS initialization; as far
as I know this assumption is common for other architectures too). This
way you could get what you need (e.g. 64-bit -> 64-bit, 64-bit ->
32-bit, ...) without changing a single instruction in the hypervisor
or kernel. Just make the changes in purgatory (it could be called
differently in your private kexec-tool) and voila.

Additionally, you duplicate code which exists and works well.

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 3/8] kexec: add public interface for improved load/unload sub-ops
  2013-03-08 11:52     ` David Vrabel
@ 2013-03-08 12:28       ` Daniel Kiper
  2013-03-08 12:28       ` Daniel Kiper
  1 sibling, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-08 12:28 UTC (permalink / raw)
  To: David Vrabel; +Cc: kexec, ian.campbell, xen-devel

On Fri, Mar 08, 2013 at 11:52:21AM +0000, David Vrabel wrote:
> On 08/03/13 10:50, Daniel Kiper wrote:
> > On Thu, Feb 21, 2013 at 05:48:09PM +0000, David Vrabel wrote:
> >> From: David Vrabel <david.vrabel@citrix.com>
> >>
> >> Add replacement KEXEC_CMD_load and KEXEC_CMD_unload sub-ops to the
> >> kexec hypercall.  These new sub-ops allow a privileged guest to
> >> provide the image data to be loaded into Xen memory or the crash
> >> region instead of guests loading the image data themselves and
> >> providing the relocation code and metadata.
> >>
> >> The old interface is provided to guests requesting an interface
> >> version prior to 4.3.
> >>
> >> Signed-off: David Vrabel <david.vrabel@citrix.com>
> >
> > [...]
> >
> >> diff --git a/xen/include/public/kexec.h b/xen/include/public/kexec.h
> >> index 61a8d7d..5259446 100644
> >> --- a/xen/include/public/kexec.h
> >> +++ b/xen/include/public/kexec.h
> >> @@ -116,12 +116,12 @@ typedef struct xen_kexec_exec {
> >>   * type  == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in]
> >>   * image == relocation information for kexec (ignored for unload) [in]
> >>   */
> >> -#define KEXEC_CMD_kexec_load            1
> >> -#define KEXEC_CMD_kexec_unload          2
> >> -typedef struct xen_kexec_load {
> >> +#define KEXEC_CMD_kexec_load_v1         1 /* obsolete since 0x00040300 */
> >> +#define KEXEC_CMD_kexec_unload_v1       2 /* obsolete since 0x00040300 */
> >> +typedef struct xen_kexec_load_v1 {
> >>      int type;
> >>      xen_kexec_image_t image;
> >> -} xen_kexec_load_t;
> >> +} xen_kexec_load_v1_t;
> >
> > > I think it is not a good idea to redefine the meaning of constants,
> > > types, structures, etc. IMO it is comparable to redefining the meaning
> > > of words in any language (e.g. English). It will be very confusing
> > > and may easily lead to stupid bugs. I think the old interface should
> > > stay as is (with its bad behavior). The new interface should be
> > > introduced with a "_v2" suffix, e.g. KEXEC_CMD_kexec_load_v2, ...
> > > This would not confuse our descendants.
>
> This is something that was requested (by Ian C) as the Xen way of doing it.

Yes, I remember, but I still do not agree with that idea in general.
Maybe the discussion of the kexec interface is a good point at which
to change that Xen community behavior? Ian?

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 3/8] kexec: add public interface for improved load/unload sub-ops
  2013-03-08 11:52     ` David Vrabel
  2013-03-08 12:28       ` Daniel Kiper
@ 2013-03-08 12:28       ` Daniel Kiper
  2013-03-08 12:36         ` [Xen-devel] " Jan Beulich
  2013-03-08 12:36         ` Jan Beulich
  1 sibling, 2 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-08 12:28 UTC (permalink / raw)
  To: David Vrabel; +Cc: kexec, ian.campbell, xen-devel

On Fri, Mar 08, 2013 at 11:52:21AM +0000, David Vrabel wrote:
> On 08/03/13 10:50, Daniel Kiper wrote:
> > On Thu, Feb 21, 2013 at 05:48:09PM +0000, David Vrabel wrote:
> >> From: David Vrabel <david.vrabel@citrix.com>
> >>
> >> Add replacement KEXEC_CMD_load and KEXEC_CMD_unload sub-ops to the
> >> kexec hypercall.  These new sub-ops allow a privileged guest to
> >> provide the image data to be loaded into Xen memory or the crash
> >> region instead of guests loading the image data themselves and
> >> providing the relocation code and metadata.
> >>
> >> The old interface is provided to guests requesting an interface
> >> version prior to 4.3.
> >>
> >> Signed-off: David Vrabel <david.vrabel@citrix.com>
> >
> > [...]
> >
> >> diff --git a/xen/include/public/kexec.h b/xen/include/public/kexec.h
> >> index 61a8d7d..5259446 100644
> >> --- a/xen/include/public/kexec.h
> >> +++ b/xen/include/public/kexec.h
> >> @@ -116,12 +116,12 @@ typedef struct xen_kexec_exec {
> >>   * type  == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in]
> >>   * image == relocation information for kexec (ignored for unload) [in]
> >>   */
> >> -#define KEXEC_CMD_kexec_load            1
> >> -#define KEXEC_CMD_kexec_unload          2
> >> -typedef struct xen_kexec_load {
> >> +#define KEXEC_CMD_kexec_load_v1         1 /* obsolete since 0x00040300 */
> >> +#define KEXEC_CMD_kexec_unload_v1       2 /* obsolete since 0x00040300 */
> >> +typedef struct xen_kexec_load_v1 {
> >>      int type;
> >>      xen_kexec_image_t image;
> >> -} xen_kexec_load_t;
> >> +} xen_kexec_load_v1_t;
> >
> > > I think it is not a good idea to redefine the meaning of constants,
> > > types, structures, etc. IMO it is comparable to redefining the meaning
> > > of words in any language (e.g. English). It will be very confusing
> > > and may easily lead to stupid bugs. I think the old interface should
> > > stay as is (with its bad behavior). The new interface should be
> > > introduced with a "_v2" suffix, e.g. KEXEC_CMD_kexec_load_v2, ...
> > > This would not confuse our descendants.
>
> This is something that was requested (by Ian C) as the Xen way of doing it.

Yes, I remember, but I still do not agree with that idea in general.
Maybe the discussion of the kexec interface is a good point at which
to change that Xen community behavior? Ian?

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 3/8] kexec: add public interface for improved load/unload sub-ops
  2013-03-08 12:28       ` Daniel Kiper
  2013-03-08 12:36         ` [Xen-devel] " Jan Beulich
@ 2013-03-08 12:36         ` Jan Beulich
  1 sibling, 0 replies; 106+ messages in thread
From: Jan Beulich @ 2013-03-08 12:36 UTC (permalink / raw)
  To: David Vrabel, Daniel Kiper; +Cc: kexec, ian.campbell, xen-devel

>>> On 08.03.13 at 13:28, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> On Fri, Mar 08, 2013 at 11:52:21AM +0000, David Vrabel wrote:
>> On 08/03/13 10:50, Daniel Kiper wrote:
>> > On Thu, Feb 21, 2013 at 05:48:09PM +0000, David Vrabel wrote:
>> >> From: David Vrabel <david.vrabel@citrix.com>
>> >>
>> >> Add replacement KEXEC_CMD_load and KEXEC_CMD_unload sub-ops to the
>> >> kexec hypercall.  These new sub-ops allow a privileged guest to
>> >> provide the image data to be loaded into Xen memory or the crash
>> >> region instead of guests loading the image data themselves and
>> >> providing the relocation code and metadata.
>> >>
>> >> The old interface is provided to guests requesting an interface
>> >> version prior to 4.3.
>> >>
>> >> Signed-off: David Vrabel <david.vrabel@citrix.com>
>> >
>> > [...]
>> >
>> >> diff --git a/xen/include/public/kexec.h b/xen/include/public/kexec.h
>> >> index 61a8d7d..5259446 100644
>> >> --- a/xen/include/public/kexec.h
>> >> +++ b/xen/include/public/kexec.h
>> >> @@ -116,12 +116,12 @@ typedef struct xen_kexec_exec {
>> >>   * type  == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in]
>> >>   * image == relocation information for kexec (ignored for unload) [in]
>> >>   */
>> >> -#define KEXEC_CMD_kexec_load            1
>> >> -#define KEXEC_CMD_kexec_unload          2
>> >> -typedef struct xen_kexec_load {
>> >> +#define KEXEC_CMD_kexec_load_v1         1 /* obsolete since 0x00040300 */
>> >> +#define KEXEC_CMD_kexec_unload_v1       2 /* obsolete since 0x00040300 */
>> >> +typedef struct xen_kexec_load_v1 {
>> >>      int type;
>> >>      xen_kexec_image_t image;
>> >> -} xen_kexec_load_t;
>> >> +} xen_kexec_load_v1_t;
>> >
>> > I think that this is not good idea to redefine meaning of constants,
>> > types, structures, etc. IMO it is comparable to redefining meaning
>> > of words in any language (e.g. English). It will be very confusing
>> > and may easily lead to stupid bugs. I think that old interface should
>> > stay as is (with its bad behavior). New interface should be introduced
>> > with "_v2" suffix, e.g. KEXEC_CMD_kexec_load_v2, ...
>> > This would not confuse our descendants.
>>
>> This is something that was requested (by Ian C) as the Xen way of doing it.
> 
> Yes, I remember but still do not agree with that idea in general.
> Maybe discussion on kexec interface is good point to change
> that Xen community behavior? Ian?

Together with all consumers being expected to properly make
use of __XEN_INTERFACE_VERSION__ (or else they get the
lowest possible definitions), I don't think there's a big issue here
as long as all old definitions retain their meaning when said
symbol is set low enough.
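
A minimal sketch of the version gating described above, assuming a public
header that keeps the old meaning of the unsuffixed name for consumers
requesting an interface version below 0x00040300. The value 1 for the old
sub-op comes from the quoted diff; the name KEXEC_CMD_kexec_load_new, its
value 5, and the helper kexec_load_cmd() are hypothetical and only serve the
illustration.

```c
/* Assumption for this sketch: a consumer that does not set the symbol
 * gets the lowest (oldest) definitions. */
#ifndef __XEN_INTERFACE_VERSION__
#define __XEN_INTERFACE_VERSION__ 0x00000000
#endif

/* The old sub-op keeps its number under an explicit _v1 name
 * (value taken from the quoted diff). */
#define KEXEC_CMD_kexec_load_v1 1

/* Hypothetical name and number for the replacement sub-op. */
#define KEXEC_CMD_kexec_load_new 5

#if __XEN_INTERFACE_VERSION__ < 0x00040300
/* Old consumers: the unsuffixed name retains its original meaning. */
#define KEXEC_CMD_kexec_load KEXEC_CMD_kexec_load_v1
#else
/* New consumers: the unsuffixed name refers to the new interface. */
#define KEXEC_CMD_kexec_load KEXEC_CMD_kexec_load_new
#endif

/* Hypothetical helper: the sub-op number this translation unit would use. */
int kexec_load_cmd(void)
{
    return KEXEC_CMD_kexec_load;
}
```

Built without setting the symbol, kexec_load_cmd() yields the old number;
built with -D__XEN_INTERFACE_VERSION__=0x00040300 it yields the new one.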

Jan

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-08 12:21       ` Daniel Kiper
  2013-03-08 14:01         ` David Vrabel
@ 2013-03-08 14:01         ` David Vrabel
  1 sibling, 0 replies; 106+ messages in thread
From: David Vrabel @ 2013-03-08 14:01 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: kexec, xen-devel

On 08/03/13 12:21, Daniel Kiper wrote:
> On Fri, Mar 08, 2013 at 11:40:44AM +0000, David Vrabel wrote:
>> On 08/03/13 11:23, Daniel Kiper wrote:
>>> On Thu, Feb 21, 2013 at 05:48:11PM +0000, David Vrabel wrote:
>>>>
>>>> +        /* Need to switch to 32-bit mode? */
>>>> +        testq $KEXEC_RELOC_FLAG_COMPAT, %r8
>>>> +        jnz call_32_bit
>>>
>>> Why do you need that? This is not needed because purgatory code
>>> from kexec-tools always switches to 32-bit mode. Please check
>>> kexec-tools/purgatory/arch/x86_64/entry64.S.
>>
>> The sub-architecture is a property of the image.  Why should the tool
>> know or care about the sub-architecture of the hypervisor?
>>
>> The ABI isn't designed only for kexec-tools.
> 
> OK, but I think it is much easier to assume that machine state
> is not changed by kexec syscall/hypercall

What machine state is that?  The one seen by the tools or the guest
kernel or by the hypervisor?

The tools know what mode the image must be called in and can tell the
hypervisor, which can then trivially set up the correct mode.

I propose:

* Tools say: "here's an image, call it in mode X".

You suggest:

* Hypervisor implicitly says through some unspecified side channel: "I
only call images in mode Y".
* Tools say: "here's an image. I set it up for mode Y. I hope that
works for you."

Finally, the v1 interface will call images loaded by a 32-bit dom0
kernel in 32-bit mode and we need to continue to do the same.

> Additionally, you duplicate code which exists and works well.

It's only 17 instructions and 6 bytes of data.

David

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-08 14:01         ` David Vrabel
@ 2013-03-08 15:23             ` Daniel Kiper
  0 siblings, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-08 15:23 UTC (permalink / raw)
  To: David Vrabel; +Cc: kexec, xen-devel

On Fri, Mar 08, 2013 at 02:01:19PM +0000, David Vrabel wrote:
> On 08/03/13 12:21, Daniel Kiper wrote:
> > On Fri, Mar 08, 2013 at 11:40:44AM +0000, David Vrabel wrote:
> >> On 08/03/13 11:23, Daniel Kiper wrote:
> >>> On Thu, Feb 21, 2013 at 05:48:11PM +0000, David Vrabel wrote:
> >>>>
> >>>> +        /* Need to switch to 32-bit mode? */
> >>>> +        testq $KEXEC_RELOC_FLAG_COMPAT, %r8
> >>>> +        jnz call_32_bit
> >>>
> >>> Why do you need that? This is not needed because purgatory code
> >>> from kexec-tools always switches to 32-bit mode. Please check
> >>> kexec-tools/purgatory/arch/x86_64/entry64.S.
> >>
> >> The sub-architecture is a property of the image.  Why should the tool
> >> know or care about the sub-architecture of the hypervisor?
> >>
> >> The ABI isn't designed only for kexec-tools.
> >
> > OK, but I think it is much easier to assume that machine state
> > is not changed by kexec syscall/hypercall
>
> What machine state is that?  The one seen by the tools or the guest
> kernel or by the hypervisor?

The state of the machine as set by the hypervisor before the purgatory call.

> The tools know what mode the image must be called in and can tell the
> hypervisor, which can then trivially set up the correct mode.
>
> I propose:
>
> * Tools say: "here's an image, call it in mode X".
>
> You suggest:
>
> * Hypervisor implicitly says through some unspecified side channel: "I
> only call images in mode Y".

Purgatory is clearly defined. Please look into kexec-tools/purgatory.
It is an integral part of the kexec infrastructure.

> * Tools say: "here's an image. I set it up for mode Y. I hope that
> works for you."

The new kernel is never called directly by the old kernel in current kexec
implementations. The new system is always started in the following way:

old_kernel -> purgatory -> new_kernel

I described what purgatory does earlier, more or less.

Why do you want to change that? It works on many architectures.
Why do we need something different for Xen (and Xen only)?
If we choose the existing solution we do not lose any flexibility.
Additionally, we could maintain compatibility, at least with
Linux, for free.

> Finally, the v1 interface will call images loaded by a 32-bit dom0
> kernel in 32-bit mode and we need to continue to do the same.

Purgatory does it. It is used even with the current
Xen kexec implementation.

> > Additionally, you duplicate code which exists and works well.
>
> It's only 17 instructions and 6 bytes of data.

For me it is always worth optimizing code, in this case too.
However, to be precise, if you remove this unneeded code then
you could gain PAGE_SIZE - 1 bytes in the worst case. Just remove
.align PAGE_SIZE in xen/xen/arch/x86/x86_64/kexec_reloc.S.
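
The worst-case figure can be checked with a small calculation. This is only
a sketch, assuming 4 KiB pages and that the alignment directive pads the
current offset up to the next page boundary; align_padding() is a
hypothetical helper, not something taken from Xen.

```c
#include <stddef.h>

#define PAGE_SIZE 4096u  /* assumption: 4 KiB pages, as on x86 */

/* Number of padding bytes inserted when the current offset is `offset`
 * and the requested alignment is `align` (which must be a power of two). */
size_t align_padding(size_t offset, size_t align)
{
    return (align - (offset & (align - 1))) & (align - 1);
}
```

The padding is zero when the offset already sits on a page boundary and
reaches PAGE_SIZE - 1 when the preceding content ends one byte past a
boundary.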

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 3/8] kexec: add public interface for improved load/unload sub-ops
  2013-03-08 12:36         ` [Xen-devel] " Jan Beulich
  2013-03-08 15:34           ` Daniel Kiper
@ 2013-03-08 15:34           ` Daniel Kiper
  1 sibling, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-08 15:34 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, kexec, David Vrabel, ian.campbell

On Fri, Mar 08, 2013 at 12:36:16PM +0000, Jan Beulich wrote:
> >>> On 08.03.13 at 13:28, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> > On Fri, Mar 08, 2013 at 11:52:21AM +0000, David Vrabel wrote:
> >> On 08/03/13 10:50, Daniel Kiper wrote:
> >> > On Thu, Feb 21, 2013 at 05:48:09PM +0000, David Vrabel wrote:
> >> >> From: David Vrabel <david.vrabel@citrix.com>
> >> >>
> >> >> Add replacement KEXEC_CMD_load and KEXEC_CMD_unload sub-ops to the
> >> >> kexec hypercall.  These new sub-ops allow a privileged guest to
> >> >> provide the image data to be loaded into Xen memory or the crash
> >> >> region instead of guests loading the image data themselves and
> >> >> providing the relocation code and metadata.
> >> >>
> >> >> The old interface is provided to guests requesting an interface
> >> >> version prior to 4.3.
> >> >>
> >> >> Signed-off: David Vrabel <david.vrabel@citrix.com>
> >> >
> >> > [...]
> >> >
> >> >> diff --git a/xen/include/public/kexec.h b/xen/include/public/kexec.h
> >> >> index 61a8d7d..5259446 100644
> >> >> --- a/xen/include/public/kexec.h
> >> >> +++ b/xen/include/public/kexec.h
> >> >> @@ -116,12 +116,12 @@ typedef struct xen_kexec_exec {
> >> >>   * type  == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in]
> >> >>   * image == relocation information for kexec (ignored for unload) [in]
> >> >>   */
> >> >> -#define KEXEC_CMD_kexec_load            1
> >> >> -#define KEXEC_CMD_kexec_unload          2
> >> >> -typedef struct xen_kexec_load {
> >> >> +#define KEXEC_CMD_kexec_load_v1         1 /* obsolete since 0x00040300 */
> >> >> +#define KEXEC_CMD_kexec_unload_v1       2 /* obsolete since 0x00040300 */
> >> >> +typedef struct xen_kexec_load_v1 {
> >> >>      int type;
> >> >>      xen_kexec_image_t image;
> >> >> -} xen_kexec_load_t;
> >> >> +} xen_kexec_load_v1_t;
> >> >
> >> > I think that this is not good idea to redefine meaning of constants,
> >> > types, structures, etc. IMO it is comparable to redefining meaning
> >> > of words in any language (e.g. English). It will be very confusing
> >> > and may easily lead to stupid bugs. I think that old interface should
> >> > stay as is (with its bad behavior). New interface should be introduced
> >> > with "_v2" suffix, e.g. KEXEC_CMD_kexec_load_v2, ...
> >> > This would not confuse our descendants.
> >>
> >> This is something that was requested (by Ian C) as the Xen way of doing it.
> >
> > Yes, I remember but still do not agree with that idea in general.
> > Maybe discussion on kexec interface is good point to change
> > that Xen community behavior? Ian?
>
> Together with all consumers being expected to properly make
> use of __XEN_INTERFACE_VERSION__ (or else they get the
> lowest possible definitions), I don't think there's a big issue here
> as long as all old definitions retain their meaning when said
> symbol is set low enough.

That is good for dropping or adding functionality. However, I still
do not agree that it is a sufficient solution in this case. Here we are
redefining an interface. I know that this has been done many times, but that
does not mean it is good. Additionally, the compiler does not care; it reads
everything. However, for me at least, it is difficult to read code in which
the same "words" have a meaning that depends only on placement. I would like
to read sources without thinking about where I am. That is all.

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-08 15:23             ` Daniel Kiper
@ 2013-03-08 17:29               ` Andrew Cooper
  -1 siblings, 0 replies; 106+ messages in thread
From: Andrew Cooper @ 2013-03-08 17:29 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: kexec, David Vrabel, xen-devel

<snip>
>> The tools know what mode the image must be called in and can tell the
>> hypervisor, which can then trivially set up the correct mode.
>>
>> I propose:
>>
>> * Tools say: "here's an image, call it in mode X".
>>
>> You suggest:
>>
>> * Hypervisor implicitly says through some unspecified side channel: "I
>> only call images in mode Y".
> Purgatory is clearly defined. Please look into kexec-tools/purgatory.
> It is integral part of kexec infrastructure.

Purgatory might be well defined, but that is not relevant here.

The kexec syscall and hypercall basically amount to "Here is a blob. 
Its architecture is $X and its entry point is $Y" (Give or take some
reconstruction)

Xen should not be making any assumptions about these things.

As it currently stands, Xen will assume that KEXEC_load from a pv_32on64
domain is an i386 image, while a KEXEC_load from a 64bit PV domain is an
x86_64 image.

The fact that this currently works in the common case of having the
crash kernel with the same architecture as the dom0 kernel is by luck
rather than good guidance.

Furthermore, the design of the interface should not be deliberately
crippled because the common user of it "can deal with it like this";
kexec-tools is not the only potential consumer of this interface.

~Andrew

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-08 17:29               ` [Xen-devel] " Andrew Cooper
  (?)
  (?)
@ 2013-03-08 21:45               ` Daniel Kiper
  -1 siblings, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-08 21:45 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: kexec, David Vrabel, xen-devel

On Fri, Mar 08, 2013 at 05:29:05PM +0000, Andrew Cooper wrote:
> <snip>
> >> The tools know what mode the image must be called in and can tell the
> >> hypervisor, which can then trivially set up the correct mode.
> >>
> >> I propose:
> >>
> >> * Tools say: "here's an image, call it in mode X".
> >>
> >> You suggest:
> >>
> >> * Hypervisor implicitly says through some unspecified side channel: "I
> >> only call images in mode Y".
> > Purgatory is clearly defined. Please look into kexec-tools/purgatory.
> > It is integral part of kexec infrastructure.
>
> Purgatory might be well defined, but that is not relevant here.
>
> The kexec syscall and hypercall basically amount to "Here is a blob.
> Its architecture is $X and its entry point is $Y"

The kexec syscall uses architecture information to check that the given
image could be executed on the given platform. That is all.

> (Give or take some reconstruction)

What does this reconstruction? The hypervisor?

> Xen should not be making any assumptions about these things.
>
> As it currently stands, Xen will assume that KEXEC_load from a pv_32on64
> domain is an i386 image, while a KEXEC_load from a 64bit PV domain is an
> x86_64 image.

I do not understand. First you write that "Xen should not be making any
assumptions about these things" and in the next sentence you state
that "Xen will assume that...". What do you mean by that?

And why do you force users to use an image for one architecture (in this case
subarchitecture)? I (as a user) would like to have a choice.

> The fact that this currently works in the common case of having the
> crash kernel with the same architecture as the dom0 kernel is by luck
> rather than good guidance.

OK, I agree, but in this case the following part of patch 5/8:

if ( image->arch == EM_386 )
  reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;

should be changed to:

if ( is_pv_32on64_domain(dom0) )
  reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
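
The difference between the two conditions above can be sketched as follows.
The struct layouts, the field is_32on64, the flag value and both function
names are hypothetical simplifications for illustration; only the EM_386 and
EM_X86_64 numbers are the standard ELF machine values.

```c
#include <stdbool.h>

/* ELF machine numbers as defined by the ELF specification. */
#define EM_386     3
#define EM_X86_64 62

/* Hypothetical flag value; the real one is defined by the patch. */
#define KEXEC_RELOC_FLAG_COMPAT 0x1u

/* Hypothetical, simplified stand-ins for the hypervisor's types. */
struct image  { int arch; };        /* architecture stated by the caller */
struct domain { bool is_32on64; };  /* true for a 32-bit PV domain on 64-bit Xen */

/* Variant from patch 5/8: the entry mode follows the image. */
unsigned int flags_from_image(const struct image *image)
{
    return image->arch == EM_386 ? KEXEC_RELOC_FLAG_COMPAT : 0;
}

/* Variant suggested above: the entry mode follows the loading domain. */
unsigned int flags_from_domain(const struct domain *dom0)
{
    return dom0->is_32on64 ? KEXEC_RELOC_FLAG_COMPAT : 0;
}
```

The two variants agree whenever the image architecture matches that of dom0.
They diverge when, for example, a 64-bit dom0 loads an EM_386 image: the
first variant sets the flag and the second does not.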

> Furthermore, the design of the interface should not be deliberately
> crippled because the common user of it "can deal with it like this";

If something is good and has been tested in many ways, on many architectures,
for a very long time, why not use it? What is the difference between Xen
and other architectures?

> kexec-tools is not the only potential consumer of this interface.

Potentially yes, but as far as I know (correct me if I am wrong) kexec-tools
is the only tool, so far, that uses the kexec syscall/hypercall.
If we use this tool we should align with widely accepted rules.
If we do not like them then we should convince the maintainers that
our approach is better, or write our own tool with our own rules.
But then we should not call it kexec.

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Xen-devel] [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-08 17:29               ` [Xen-devel] " Andrew Cooper
  (?)
@ 2013-03-08 21:45               ` Daniel Kiper
  2013-03-08 23:38                 ` Andrew Cooper
  2013-03-08 23:38                 ` [Xen-devel] " Andrew Cooper
  -1 siblings, 2 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-08 21:45 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: kexec, David Vrabel, xen-devel

On Fri, Mar 08, 2013 at 05:29:05PM +0000, Andrew Cooper wrote:
> <snip>
> >> The tools know what mode the image must be called it and it can tell the
> >> hypervisor and the hypervisor can trivial setup the correct mode.
> >>
> >> I propose:
> >>
> >> * Tools say: "here's an image, call it in mode X".
> >>
> >> You suggest:
> >>
> >> * Hypervisor implicitly says through some unspecified side channel: "I
> >> only call images in mode Y".
> > Purgatory is clearly defined. Please look into kexec-tools/purgatory.
> > It is integral part of kexec infrastructure.
>
> Purgatory might be well defined, but that is not relevant here.
>
> The kexec syscall and hypercall basically amount to "Here is a blob.
> Its architecture is $X and its entry point is $Y"

kexec syscall use architecture information to check that given
image could be executed on given platform. That is all.

> (Give or take some reconstruction)

What does this reconstruction? Hypervisor?

> Xen should not be making any assumptions about these things.
>
> As it currently stands, Xen will assume that KEXEC_load from a pv_32on64
> domain is an i386 image, while a KEXEC_load from a 64bit PV domain is an
> x86_64 image.

I do not understand. First you write that "Xen should not be making any
assumptions about these things" and in the next sentence you state
that "Xen will assume that...". What do you mean by that?

And why do you force users to use image for one architecture (in this case
subarchitecture)? I (as a user) would like to have a choice.

> The fact that this currently works in the common case of having the
> crash kernel with the same architecture as the dom0 kernel is by luck
> rather than good guidance.

OK, I agree but in this case following part of patch 5/8:

if ( image->arch == EM_386 )
  reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;

should be change to:

if ( is_pv_32on64_domain(dom0) )
  reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;

> Furthmore, the design of the interface should not be deliberately
> crippled because the common user of it "can deal with it like this";

If something is good and tested in many ways, on many architectures,
very long time, why not use it? What is the difference between Xen
and other architectures?

> kexec-tools is not the only potential consumer of this interface.

Potentially yes, but as far as I know (correct me if I am wrong) kexec-tools
is so far the only tool which uses the kexec syscall/hypercall.
If we use this tool we should align with its widely accepted rules.
If we do not like them then we should convince the maintainers that
our approach is better, or write our own tool with our own rules.
But then we should not call it kexec.

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-08 21:45               ` Daniel Kiper
@ 2013-03-08 23:38                 ` Andrew Cooper
  2013-03-08 23:38                 ` [Xen-devel] " Andrew Cooper
  1 sibling, 0 replies; 106+ messages in thread
From: Andrew Cooper @ 2013-03-08 23:38 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: kexec, David Vrabel, xen-devel

On 08/03/13 21:45, Daniel Kiper wrote:
> On Fri, Mar 08, 2013 at 05:29:05PM +0000, Andrew Cooper wrote:
>> <snip>
>>>> The tools know what mode the image must be called in and can tell the
>>>> hypervisor, and the hypervisor can trivially set up the correct mode.
>>>>
>>>> I propose:
>>>>
>>>> * Tools say: "here's an image, call it in mode X".
>>>>
>>>> You suggest:
>>>>
>>>> * Hypervisor implicitly says through some unspecified side channel: "I
>>>> only call images in mode Y".
>>> Purgatory is clearly defined. Please look into kexec-tools/purgatory.
>>> It is integral part of kexec infrastructure.
>> Purgatory might be well defined, but that is not relevant here.
>>
>> The kexec syscall and hypercall basically amount to "Here is a blob.
>> Its architecture is $X and its entry point is $Y"
> kexec syscall use architecture information to check that given
> image could be executed on given platform. That is all.

And how is 'could' distinguished?

A basic sanity check at load time of "is $X an operating mode I can get
to at some point in the future" is fine, and useful to eliminate the
case of trying to load something claiming to be an ARM blob on an x86
machine.

However, the entry point given can only possibly work in one operating
mode.  If $X is i386 and Xen jumps to it with long mode enabled, then it
will crash very quickly.  Conversely, if $X is x86_64 and Xen jumps to
it in protected mode, another crash will occur.
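
The two checks described above can be sketched roughly as follows. This is
only an illustration, not Xen code: the function names and the entry_mode
enum are invented for the example, and only the EM_* values are taken from
the ELF specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ELF machine identifiers, values as defined by the ELF specification. */
#define EM_386     3
#define EM_ARM    40
#define EM_X86_64 62

/* Hypothetical CPU modes an x86 hypervisor could set up before the jump. */
enum entry_mode { MODE_PROTECTED_32, MODE_LONG_64 };

/* Load-time check: is this a mode an x86 machine can ever get to? */
bool arch_is_loadable_on_x86(uint16_t arch)
{
    return arch == EM_386 || arch == EM_X86_64;
}

/*
 * Exec-time choice: a given entry point works in exactly one mode, so the
 * mode has to follow from the declared architecture of the image.
 */
enum entry_mode entry_mode_for(uint16_t arch)
{
    /* Anything else must already have been rejected at load time. */
    assert(arch_is_loadable_on_x86(arch));
    return arch == EM_386 ? MODE_PROTECTED_32 : MODE_LONG_64;
}
```

The load-time check only filters out images that can never run on the
machine (the ARM case); it does not by itself say which mode to jump in.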

>
>> (Give or take some reconstruction)
> What does this reconstruction? Hypervisor?

Under the current implementation, the dom0 kernel.  Under the new
planned implementation, Xen.

>
>> Xen should not be making any assumptions about these things.
>>
>> As it currently stands, Xen will assume that KEXEC_load from a pv_32on64
>> domain is an i386 image, while a KEXEC_load from a 64bit PV domain is an
>> x86_64 image.
> I do not understand. First you write that "Xen should not be making any
> assumptions about these things" and in the next sentence you state
> that "Xen will assume that...". What do you mean by that?

Sorry for the confusion - that is what happens in the current
implementation.

>
> And why do you force users to use image for one architecture (in this case
> subarchitecture)? I (as a user) would like to have a choice.

The image can do whatever it wants once it is running.

>
>> The fact that this currently works in the common case of having the
>> crash kernel with the same architecture as the dom0 kernel is by luck
>> rather than good guidance.
> OK, I agree but in this case following part of patch 5/8:
>
> if ( image->arch == EM_386 )
>   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
>
> should be changed to:
>
> if ( is_pv_32on64_domain(dom0) )
>   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;

No - specifically not.  This is the whole problem we are trying to avoid.

The current running architecture of dom0 has no place trying to
second-guess the intended architecture of the blob.

What happens if I as the user am currently running a 32bit dom0 on 64
bit Xen, and want to load a 64bit blob to jump to?

Under your suggestion, I as the user have to declare it to be a 32bit
blob and write a 32->64 shim at the beginning of it.  Under David's
suggestion, all I as the user have to do is to tell Xen that it is
indeed a 64bit image.
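
To make the difference between the two conditions concrete, here is a small
sketch. It is an illustration only: the function names are invented, the
dom0 property is passed in as a plain boolean instead of calling
is_pv_32on64_domain(), and the value given to KEXEC_RELOC_FLAG_COMPAT is a
placeholder rather than the definition from the patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ELF machine identifiers, values as defined by the ELF specification. */
#define EM_386     3
#define EM_X86_64 62

/* Placeholder bit value, assumed for this example only. */
#define KEXEC_RELOC_FLAG_COMPAT 0x1u

/* Condition from patch 5/8: the declared image architecture decides. */
unsigned int flags_from_image(uint16_t image_arch)
{
    assert(image_arch == EM_386 || image_arch == EM_X86_64);
    return image_arch == EM_386 ? KEXEC_RELOC_FLAG_COMPAT : 0;
}

/* Suggested alternative: the word size of the loading domain decides. */
unsigned int flags_from_domain(bool dom0_is_32on64)
{
    return dom0_is_32on64 ? KEXEC_RELOC_FLAG_COMPAT : 0;
}

/* True when the domain-based condition would select a different mode. */
bool policies_disagree(uint16_t image_arch, bool dom0_is_32on64)
{
    return flags_from_image(image_arch) != flags_from_domain(dom0_is_32on64);
}
```

With a 32bit dom0 and a 64bit image, policies_disagree(EM_X86_64, true) is
true, which is exactly the case described above.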

>
>> Furthermore, the design of the interface should not be deliberately
>> crippled because the common user of it "can deal with it like this";
> If something is good and tested in many ways, on many architectures,
> very long time, why not use it? What is the difference between Xen
> and other architectures?

argumentum ad antiquitatem

Not that I wish to jibe at kexec-tools, but to point out the fallacy of
an argument on that basis.


About "good and tested", the current kexec handover mechanism is insane,
and it is frankly a miracle it ever worked in the first place.

Let's take the example of a 32bit dom0 on 64bit Xen and a 32bit crash kernel.

(The following is to the best of my understanding, so apologies if I
have misunderstood bits)

1) /sbin/kexec bundles a 32bit kernel and initrd, along with purgatory
etc and makes a kexec system call
2) dom0 copies the segments into regular kalloc()'d chunks
3) dom0 constructs a control page, bundles some control state together
and makes a kexec hypercall
4) Xen saves the control data and overwrites the dom0 provided virtual
addresses

In the case of a crash

1) Xen writes crash notes and shuts down as fast as possible
2) Because dom0 is 32bit, Xen sets up 32bit mode non-pae 1:1mapped and
3a) might die there and then because the control page living in dom0
kalloc()'d space might now be above the 4GB boundary
3b) be lucky that the control page is below the 4GB and
4) Execute the control page which sets up 32bit mode non-pae 1:1mapped
(on a different set of pagetables/GDT etc)
5) Works to reconstruct the image in the crash region which
6a) might copy in the wrong block because of 32bit truncation issues
7) Jump to the beginning of purgatory which sets up 32bit mode

And amongst all of that, I am still unsure of whether there are other
issues because of an "unsigned long page_list[]" in the 64bit hypervisor
being different from the "unsigned long page_list[]" used by the 32bit
control page.  In machine_kexec_load() in the hypervisor, we make no
sanity checks against the assertions of the comments.
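
The kind of mismatch meant here can be shown with fixed-width types. This
is a sketch under assumptions, not the hypervisor code: ulong64_t and
ulong32_t stand in for "unsigned long" as seen by 64bit and 32bit code, and
the helper names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for "unsigned long" in 64bit and 32bit code respectively. */
typedef uint64_t ulong64_t;
typedef uint32_t ulong32_t;

/* Byte offset of entry idx in a page_list[] with the given element size. */
size_t page_list_offset(size_t idx, size_t elem_size)
{
    assert(elem_size == sizeof(ulong32_t) || elem_size == sizeof(ulong64_t));
    return idx * elem_size;
}

/* What 32bit code keeps of a 64bit machine address. */
uint32_t truncate_to_32(uint64_t maddr)
{
    return (uint32_t)maddr;
}

/* Nonzero when the address cannot be represented in 32 bits. */
int above_4gb(uint64_t maddr)
{
    return maddr > UINT32_MAX;
}
```

Entry 3 sits at byte offset 24 for the 64bit writer but at byte offset 12
for a 32bit reader, and an address such as 0x100002000 silently becomes
0x2000 once it is truncated.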


In the proposed new interface, we do not need to set up the correct
state for purgatory, jump into the dom0 control page which re-sets up
different equivalent state, just to reconstruct the image and jump to it.

As for the different architecture of Xen, I hope the above shows exactly
why it is different, and why it is dangerous to use assumptions based on
is_pv_32on64_domain(dom0)

>
>> kexec-tools is not the only potential consumer of this interface.
> Potentially yes but as I know (correct me if I am wrong) kexec-tools
> is only one tool, until now, which uses kexec syscall/hypercall.
> If we use this tool we should align to widely accepted rules.
> If we do not like them then we should convince maintainers that
> our approach is better or write our own tool with our own rules.
> But then we should not call it kexec.
>
> Daniel

I see no reason why David's proposed interface is incompatible with
kexec-tools.  Do you?

~Andrew

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-08 23:38                 ` [Xen-devel] " Andrew Cooper
@ 2013-03-11 11:17                   ` Daniel Kiper
  2013-03-11 11:17                   ` [Xen-devel] " Daniel Kiper
  1 sibling, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-11 11:17 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: kexec, David Vrabel, xen-devel

On Fri, Mar 08, 2013 at 11:38:03PM +0000, Andrew Cooper wrote:
> On 08/03/13 21:45, Daniel Kiper wrote:
> > On Fri, Mar 08, 2013 at 05:29:05PM +0000, Andrew Cooper wrote:
> >> <snip>
> >>>> The tools know what mode the image must be called in and can tell the
> >>>> hypervisor, and the hypervisor can trivially set up the correct mode.
> >>>>
> >>>> I propose:
> >>>>
> >>>> * Tools say: "here's an image, call it in mode X".
> >>>>
> >>>> You suggest:
> >>>>
> >>>> * Hypervisor implicitly says through some unspecified side channel: "I
> >>>> only call images in mode Y".
> >>> Purgatory is clearly defined. Please look into kexec-tools/purgatory.
> >>> It is integral part of kexec infrastructure.
> >> Purgatory might be well defined, but that is not relevant here.
> >>
> >> The kexec syscall and hypercall basically amount to "Here is a blob.
> >> Its architecture is $X and its entry point is $Y"
> > kexec syscall use architecture information to check that given
> > image could be executed on given platform. That is all.
>
> And how is 'could' distinguished?
>
> A basic sanity check at load time of "is $X an operating mode I can get
> to at some point in the future" is fine, and useful to eliminate the
> case of trying to load something claiming to be an ARM blob on an x86
> machine.
>
> However, the entry point given can only possibly work in one operating
> mode.  If $X is i386 and Xen jumps to it with long mode enabled, then it
> will crash very quickly.  Conversely, if $X is x86_64 and Xen jumps to
> it in protected mode, another crash will occur.

It always works because purgatory sets "native mode". This means that, before
the new kernel is executed, the machine is in the same state as it would be
after BIOS initialization. This is assumed on all architectures and it is
always done by purgatory.

> >> (Give or take some reconstruction)
> > What does this reconstruction? Hypervisor?
>
> Under the current implementation, the dom0 kernel.  Under the new
> planned implementation, Xen.

What do you mean by reconstruction? Setting to "native mode"?

[...]

> >> The fact that this currently works in the common case of having the
> >> crash kernel with the same architecture as the dom0 kernel is by luck
> >> rather than good guidance.
> > OK, I agree but in this case following part of patch 5/8:
> >
> > if ( image->arch == EM_386 )
> >   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
> >
> > should be changed to:
> >
> > if ( is_pv_32on64_domain(dom0) )
> >   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
>
> No - specifically not.  This is the whole problem we are trying to avoid.
>
> The current running architecture of dom0 has no place trying to
> second-guess the intended architecture of the blob.
>
> What happens if I as the user am currently running a 32bit dom0 on 64
> bit Xen, and want to load a 64bit blob to jump to?
>
> Under your suggestion, I as the user have to declare it to be a 32bit
> blob and write a 32->64 shim at the beginning of it.  Under David's
> suggestion, all I as the user have to do is to tell Xen that it is
> indeed a 64bit image.

You forgot about the purgatory code. Just a reminder:

old_kernel (Xen) -> purgatory (native mode) -> new_kernel

The purgatory architecture is the same as the kexec-tools architecture. If
you use an i386 dom0, then kexec-tools is (and must be) i386 too.
We do not support i386 Xen anymore. This means that my condition is
correct.

> >> Furthermore, the design of the interface should not be deliberately
> >> crippled because the common user of it "can deal with it like this";
> > If something is good and tested in many ways, on many architectures,
> > very long time, why not use it? What is the difference between Xen
> > and other architectures?
>
> argumentum ad antiquitatem
>
> Not that I wish to jibe at kexec-tools, but to point out the fallacy of
> an argument on that basis.
>
>
> About "good and tested", the current kexec handover mechanism is insane,
> and is frankly a miracle it ever worked in the first place.
>
> Let's take the example of a 32bit dom0 on 64bit Xen and a 32bit crash kernel
>
> (The following is to the best of my understanding, so apologies if I
> have misunderstood bits)
>
> 1) /sbin/kexec bundles a 32bit kernel and initrd, along with purgatory
> etc and makes a kexec system call
> 2) dom0 copies the segments into regular kalloc()'d chunks
> 3) dom0 constructs a control page, bundles some control state together
> and makes a kexec hypercall
> 4) Xen saves the control data and overwrites the dom0 provided virtual
> addresses
>
> In the case of a crash
>
> 1) Xen writes crash notes and shuts down as fast as possible
> 2) Because dom0 is 32bit, Xen sets up 32bit mode non-pae 1:1mapped and
> 3a) might die there and then because the control page living in dom0
> kalloc()'d space might now be above the 4GB boundary
> 3b) be lucky that the control page is below the 4GB and
> 4) Execute the control page which sets up 32bit mode non-pae 1:1mapped
> (on a different set of pagetables/GDT etc)
> 5) Works to reconstruct the image in the crash region which
> 6a) might copy in the wrong block because of 32bit truncation issues
> 7) Jump to the beginning of purgatory which sets up 32bit mode
>
> And amongst all of that, I am still unsure of whether there are other
> issues because of an "unsigned long page_list[]" in the 64bit hypervisor
> being different from the "unsigned long page_list[]" used by the 32bit
> control page.  In machine_kexec_load() in the hypervisor, we make no
> sanity checks against the assertions of the comments.
>
>
> In the proposed new interface, we do not need to set up the correct
> state for purgatory, jump into the dom0 control page which re-sets up
> different equivalent state, just to reconstruct the image and jump to it.
>
> As for the different architecture of Xen, I hope the above shows exactly
> why it is different, and why it is dangerous to use assumptions based on
> is_pv_32on64_domain(dom0)
>
> >
> >> kexec-tools is not the only potential consumer of this interface.
> > Potentially yes but as I know (correct me if I am wrong) kexec-tools
> > is only one tool, until now, which uses kexec syscall/hypercall.
> > If we use this tool we should align to widely accepted rules.
> > If we do not like them then we should convince maintainers that
> > our approach is better or write our own tool with our own rules.
> > But then we should not call it kexec.
> >
> > Daniel
>
> I see no reason why David's proposed interface is incompatible with
> kexec-tools.  Do you?

Heh... It looks like there is a misunderstanding. At first I thought
that David was going to replace the purgatory functionality by switching
from 64-bit to 32-bit in kexec_reloc. But later I realized that
I had missed the Xen 64-bit/dom0 32-bit case. Now I agree that this switch
must stay as is. However, I now think that there is another
small mistake which should be fixed. Please look above.

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-11 11:17                   ` [Xen-devel] " Daniel Kiper
@ 2013-03-11 13:21                     ` David Vrabel
  2013-03-11 13:21                     ` [Xen-devel] " David Vrabel
  1 sibling, 0 replies; 106+ messages in thread
From: David Vrabel @ 2013-03-11 13:21 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: Andrew Cooper, kexec, xen-devel

On 11/03/13 11:17, Daniel Kiper wrote:
> 
> Heh... It looks that there is a misunderstanding. At first I thought
> that David was going to replace purgatory functionality by switching
> from 64-bit to 32-bit in kexec_reloc. But later I realized that
> I missed Xen 64-bit/dom 32-bit case. Now I agree that this switch
> must stay as is. However, now I think that there is another
> small mistake which should be fixed. Please look above.

Which mistake?  I'm not sure what you're referring to.

David

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-11 13:21                     ` [Xen-devel] " David Vrabel
  2013-03-11 13:30                       ` Daniel Kiper
@ 2013-03-11 13:30                       ` Daniel Kiper
  1 sibling, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-11 13:30 UTC (permalink / raw)
  To: David Vrabel; +Cc: Andrew Cooper, kexec, xen-devel

On Mon, Mar 11, 2013 at 01:21:30PM +0000, David Vrabel wrote:
> On 11/03/13 11:17, Daniel Kiper wrote:
> >
> > Heh... It looks that there is a misunderstanding. At first I thought
> > that David was going to replace purgatory functionality by switching
> > from 64-bit to 32-bit in kexec_reloc. But later I realized that
> > I missed Xen 64-bit/dom 32-bit case. Now I agree that this switch
> > must stay as is. However, now I think that there is another
> > small mistake which should be fixed. Please look above.
>
> Which mistake?  I'm not sure what you're referring to.

I was thinking of this:

if ( image->arch == EM_386 )
  reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;

It should be changed to:

if ( is_pv_32on64_domain(dom0) )
  reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Xen-devel] [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-11 13:21                     ` [Xen-devel] " David Vrabel
@ 2013-03-11 13:30                       ` Daniel Kiper
  2013-03-11 13:43                         ` David Vrabel
  2013-03-11 13:43                         ` [Xen-devel] " David Vrabel
  2013-03-11 13:30                       ` Daniel Kiper
  1 sibling, 2 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-11 13:30 UTC (permalink / raw)
  To: David Vrabel; +Cc: Andrew Cooper, kexec, xen-devel

On Mon, Mar 11, 2013 at 01:21:30PM +0000, David Vrabel wrote:
> On 11/03/13 11:17, Daniel Kiper wrote:
> >
> > Heh... It looks that there is a misunderstanding. At first I thought
> > that David was going to replace purgatory functionality by switching
> > from 64-bit to 32-bit in kexec_reloc. But later I realized that
> > I missed Xen 64-bit/dom 32-bit case. Now I agree that this switch
> > must stay as is. However, now I think that there is another
> > small mistake which should be fixed. Please look above.
>
> Which mistake?  I'm not sure what you're referring to.

I was thinking of this:

if ( image->arch == EM_386 )
  reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;

It should be changed to:

if ( is_pv_32on64_domain(dom0) )
  reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-11 13:30                       ` Daniel Kiper
@ 2013-03-11 13:43                         ` David Vrabel
  2013-03-11 13:43                         ` [Xen-devel] " David Vrabel
  1 sibling, 0 replies; 106+ messages in thread
From: David Vrabel @ 2013-03-11 13:43 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: Andrew Cooper, kexec, xen-devel

On 11/03/13 13:30, Daniel Kiper wrote:
> On Mon, Mar 11, 2013 at 01:21:30PM +0000, David Vrabel wrote:
>> On 11/03/13 11:17, Daniel Kiper wrote:
>>>
>>> Heh... It looks that there is a misunderstanding. At first I thought
>>> that David was going to replace purgatory functionality by switching
>>> from 64-bit to 32-bit in kexec_reloc. But later I realized that
>>> I missed Xen 64-bit/dom 32-bit case. Now I agree that this switch
>>> must stay as is. However, now I think that there is another
>>> small mistake which should be fixed. Please look above.
>>
>> Which mistake?  I'm not sure what you're referring to.
> 
> I thought about that:
> 
> if ( image->arch == EM_386 )
>   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
> 
> It should be change to:
> 
> if ( is_pv_32on64_domain(dom0) )
>   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;

This isn't a mistake but a deliberate improvement to the old interface.

It is clearer and more useful for this sub-architecture to be explicitly
supplied in the kexec_load call than implicitly through some other
side-channel.

If we go with what you suggest then you prevent kexec from being used
by: a) PVH dom0s; b) suitably privileged service domains; c) 32-bit
guests wanting to load an image with a 64-bit entry point; and d)
possibly other use cases you or I haven't even thought about yet.

David

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Xen-devel] [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-11 13:30                       ` Daniel Kiper
  2013-03-11 13:43                         ` David Vrabel
@ 2013-03-11 13:43                         ` David Vrabel
  2013-03-11 14:13                           ` Daniel Kiper
  2013-03-11 14:13                           ` [Xen-devel] " Daniel Kiper
  1 sibling, 2 replies; 106+ messages in thread
From: David Vrabel @ 2013-03-11 13:43 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: Andrew Cooper, kexec, xen-devel

On 11/03/13 13:30, Daniel Kiper wrote:
> On Mon, Mar 11, 2013 at 01:21:30PM +0000, David Vrabel wrote:
>> On 11/03/13 11:17, Daniel Kiper wrote:
>>>
>>> Heh... It looks that there is a misunderstanding. At first I thought
>>> that David was going to replace purgatory functionality by switching
>>> from 64-bit to 32-bit in kexec_reloc. But later I realized that
>>> I missed Xen 64-bit/dom 32-bit case. Now I agree that this switch
>>> must stay as is. However, now I think that there is another
>>> small mistake which should be fixed. Please look above.
>>
>> Which mistake?  I'm not sure what you're referring to.
> 
> I thought about that:
> 
> if ( image->arch == EM_386 )
>   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
> 
> It should be change to:
> 
> if ( is_pv_32on64_domain(dom0) )
>   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;

This isn't a mistake but a deliberate improvement to the old interface.

It is clearer and more useful for this sub-architecture to be explicitly
supplied in the kexec_load call than implicitly through some other
side-channel.

If we go with what you suggest then you prevent kexec from being used
by: a) PVH dom0s; b) suitably privileged service domains; c) 32-bit
guests wanting to load an image with a 64-bit entry point; and d)
possibly other use cases you or I haven't even thought about yet.

David

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-11 13:43                         ` [Xen-devel] " David Vrabel
@ 2013-03-11 14:13                           ` Daniel Kiper
  2013-03-11 14:13                           ` [Xen-devel] " Daniel Kiper
  1 sibling, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-11 14:13 UTC (permalink / raw)
  To: David Vrabel; +Cc: Andrew Cooper, kexec, xen-devel

On Mon, Mar 11, 2013 at 01:43:02PM +0000, David Vrabel wrote:
> On 11/03/13 13:30, Daniel Kiper wrote:
> > On Mon, Mar 11, 2013 at 01:21:30PM +0000, David Vrabel wrote:
> >> On 11/03/13 11:17, Daniel Kiper wrote:
> >>>
> >>> Heh... It looks that there is a misunderstanding. At first I thought
> >>> that David was going to replace purgatory functionality by switching
> >>> from 64-bit to 32-bit in kexec_reloc. But later I realized that
> >>> I missed Xen 64-bit/dom 32-bit case. Now I agree that this switch
> >>> must stay as is. However, now I think that there is another
> >>> small mistake which should be fixed. Please look above.
> >>
> >> Which mistake?  I'm not sure what you're referring to.
> >
> > I thought about that:
> >
> > if ( image->arch == EM_386 )
> >   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
> >
> > It should be change to:
> >
> > if ( is_pv_32on64_domain(dom0) )
> >   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
>
> This isn't a mistake but a deliberate improvement to the old interface.

I am still not convinced.

> It is clearer and more useful for this sub-architecture to be explicitly
> supplied in the kexec_load call than implicitly through some other
> side-channel.

First of all, you do not need to pass any info about the architecture to
the new kernel or anything like that (please check my previous emails).
And if you did, there is another question: what do you do when you need
a second or third argument? You redefine the kexec interface once again.
For what? Additionally, a lot of stuff is currently passed to the new
kernel via purgatory. And purgatory is called by your interface too...

> If we go with what you suggest then you prevent kexec from being used
> by: a) PVH dom0s; b) suitably privileged service domains; c) 32-bit

Maybe PVH needs a different check. However,
we do not have PVH in Xen yet.

> guests wanting to load an image with a 64-bit entry point; and d)

Once again:

old_kernel (Xen) -> purgatory (native mode) -> new_kernel

The purgatory architecture is the same as the kexec-tools architecture.
If you use an i386 dom0, it means that kexec-tools is (and must be) i386
too. We do not support Xen i386 anymore. It means that my condition is
correct.

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Xen-devel] [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-11 13:43                         ` [Xen-devel] " David Vrabel
  2013-03-11 14:13                           ` Daniel Kiper
@ 2013-03-11 14:13                           ` Daniel Kiper
  2013-03-11 14:27                             ` Andrew Cooper
  2013-03-11 14:27                             ` Andrew Cooper
  1 sibling, 2 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-11 14:13 UTC (permalink / raw)
  To: David Vrabel; +Cc: Andrew Cooper, kexec, xen-devel

On Mon, Mar 11, 2013 at 01:43:02PM +0000, David Vrabel wrote:
> On 11/03/13 13:30, Daniel Kiper wrote:
> > On Mon, Mar 11, 2013 at 01:21:30PM +0000, David Vrabel wrote:
> >> On 11/03/13 11:17, Daniel Kiper wrote:
> >>>
> >>> Heh... It looks that there is a misunderstanding. At first I thought
> >>> that David was going to replace purgatory functionality by switching
> >>> from 64-bit to 32-bit in kexec_reloc. But later I realized that
> >>> I missed Xen 64-bit/dom 32-bit case. Now I agree that this switch
> >>> must stay as is. However, now I think that there is another
> >>> small mistake which should be fixed. Please look above.
> >>
> >> Which mistake?  I'm not sure what you're referring to.
> >
> > I thought about that:
> >
> > if ( image->arch == EM_386 )
> >   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
> >
> > It should be change to:
> >
> > if ( is_pv_32on64_domain(dom0) )
> >   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
>
> This isn't a mistake but a deliberate improvement to the old interface.

I am still not convinced.

> It is clearer and more useful for this sub-architecture to be explicitly
> supplied in the kexec_load call than implicitly through some other
> side-channel.

First of all, you do not need to pass any info about the architecture to
the new kernel or anything like that (please check my previous emails).
And if you did, there is another question: what do you do when you need
a second or third argument? You redefine the kexec interface once again.
For what? Additionally, a lot of stuff is currently passed to the new
kernel via purgatory. And purgatory is called by your interface too...

> If we go with what you suggest then you prevent kexec from being used
> by: a) PVH dom0s; b) suitably privileged service domains; c) 32-bit

Maybe PVH needs a different check. However,
we do not have PVH in Xen yet.

> guests wanting to load an image with a 64-bit entry point; and d)

Once again:

old_kernel (Xen) -> purgatory (native mode) -> new_kernel

The purgatory architecture is the same as the kexec-tools architecture.
If you use an i386 dom0, it means that kexec-tools is (and must be) i386
too. We do not support Xen i386 anymore. It means that my condition is
correct.

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-11 14:13                           ` [Xen-devel] " Daniel Kiper
  2013-03-11 14:27                             ` Andrew Cooper
@ 2013-03-11 14:27                             ` Andrew Cooper
  1 sibling, 0 replies; 106+ messages in thread
From: Andrew Cooper @ 2013-03-11 14:27 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: kexec, David Vrabel, xen-devel

On 11/03/13 14:13, Daniel Kiper wrote:
> On Mon, Mar 11, 2013 at 01:43:02PM +0000, David Vrabel wrote:
>> On 11/03/13 13:30, Daniel Kiper wrote:
>>> On Mon, Mar 11, 2013 at 01:21:30PM +0000, David Vrabel wrote:
>>>> On 11/03/13 11:17, Daniel Kiper wrote:
>>>>> Heh... It looks that there is a misunderstanding. At first I thought
>>>>> that David was going to replace purgatory functionality by switching
>>>>> from 64-bit to 32-bit in kexec_reloc. But later I realized that
>>>>> I missed Xen 64-bit/dom 32-bit case. Now I agree that this switch
>>>>> must stay as is. However, now I think that there is another
>>>>> small mistake which should be fixed. Please look above.
>>>> Which mistake?  I'm not sure what you're referring to.
>>> I thought about that:
>>>
>>> if ( image->arch == EM_386 )
>>>   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
>>>
>>> It should be change to:
>>>
>>> if ( is_pv_32on64_domain(dom0) )
>>>   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
>> This isn't a mistake but a deliberate improvement to the old interface.
> I am still not convinced.
>
>> It is clearer and more useful for this sub-architecture to be explicitly
>> supplied in the kexec_load call than implicitly through some other
>> side-channel.
> First of all you do not need to pass any info about architecure to
> new kernel or something like that (please check my previous emails).

Yes - you really do.  Guessing the architecture of a blob of code is
insane, and any current interface which relies on this guessing is
broken by design.

> If any then there is another questions. What do you do if you need
> second or third argument?. You redefine kexec interface once again.
> For what? Additionally, currently there are a lot of stuff passed
> to new kernel via purgatory. And purgatory is called by your
> interface too...
>
>> If we go with what you suggest then you prevent kexec from being used
>> by: a) PVH dom0s; b) suitably privileged service domains; c) 32-bit
> Maybe for PVH should be different check. However,
> until now we do not have it in Xen yet.
>
>> guests wanting to load an image with a 64-bit entry point; and d)
> Once again:
>
> old_kernel (Xen) -> purgatory (native mode) -> new_kernel
>
> purgatory architecture is same as kexec-tools architecture. If you
> use dom0 i386 it means that kexec-tools is (and must be) i386 too.
> We do not support Xen i386 anymore. It means that my condition is
> correct.
>
> Daniel

And what happens when kexec-tools is using ia32-libs under a 64bit dom0?
That would also break.

The logic is very simple.

If the blob passed in kexec_load claims to be 32bit, Xen will switch
into 32bit mode before executing it.  If the blob claims to be 64bit
then Xen will stay in 64bit mode to execute it.

This way, purgatory from any kind of multi-arch setup in dom0 will work,
as well as all of the other usecases which your suggestion breaks.
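
As a rough sketch of that rule (not the code from the patch): the
descriptor type, the helper name and the flag value below are
hypothetical stand-ins, and only the EM_* machine numbers are taken from
the ELF specification.

```c
#include <stdint.h>

/* ELF machine identifiers (values from the ELF specification). */
#define EM_386     3
#define EM_X86_64 62

/* Hypothetical flag value; the real definition lives in the patch series. */
#define KEXEC_RELOC_FLAG_COMPAT 0x1u

/* Hypothetical, cut-down stand-in for the image descriptor. */
struct kexec_image_sketch {
    uint16_t arch;   /* ELF machine type declared by the caller */
};

/* Choose relocation flags purely from what the image claims to be. */
static unsigned int reloc_flags_for(const struct kexec_image_sketch *image)
{
    unsigned int reloc_flags = 0;

    if ( image->arch == EM_386 )
        reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;  /* enter in 32bit mode */

    /* Anything else (e.g. EM_X86_64) is entered in 64bit mode. */
    return reloc_flags;
}
```

Nothing about the calling domain appears in the decision, which is the
point: the caller's own bitness is never consulted.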

~Andrew

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Xen-devel] [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-11 14:13                           ` [Xen-devel] " Daniel Kiper
@ 2013-03-11 14:27                             ` Andrew Cooper
  2013-03-11 20:45                               ` Daniel Kiper
  2013-03-11 20:45                               ` Daniel Kiper
  2013-03-11 14:27                             ` Andrew Cooper
  1 sibling, 2 replies; 106+ messages in thread
From: Andrew Cooper @ 2013-03-11 14:27 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: kexec, David Vrabel, xen-devel

On 11/03/13 14:13, Daniel Kiper wrote:
> On Mon, Mar 11, 2013 at 01:43:02PM +0000, David Vrabel wrote:
>> On 11/03/13 13:30, Daniel Kiper wrote:
>>> On Mon, Mar 11, 2013 at 01:21:30PM +0000, David Vrabel wrote:
>>>> On 11/03/13 11:17, Daniel Kiper wrote:
>>>>> Heh... It looks that there is a misunderstanding. At first I thought
>>>>> that David was going to replace purgatory functionality by switching
>>>>> from 64-bit to 32-bit in kexec_reloc. But later I realized that
>>>>> I missed Xen 64-bit/dom 32-bit case. Now I agree that this switch
>>>>> must stay as is. However, now I think that there is another
>>>>> small mistake which should be fixed. Please look above.
>>>> Which mistake?  I'm not sure what you're referring to.
>>> I thought about that:
>>>
>>> if ( image->arch == EM_386 )
>>>   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
>>>
>>> It should be change to:
>>>
>>> if ( is_pv_32on64_domain(dom0) )
>>>   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
>> This isn't a mistake but a deliberate improvement to the old interface.
> I am still not convinced.
>
>> It is clearer and more useful for this sub-architecture to be explicitly
>> supplied in the kexec_load call than implicitly through some other
>> side-channel.
> First of all you do not need to pass any info about architecure to
> new kernel or something like that (please check my previous emails).

Yes - you really do.  Guessing the architecture of a blob of code is
insane, and any current interface which relies on this guessing is
broken by design.

> If any then there is another questions. What do you do if you need
> second or third argument?. You redefine kexec interface once again.
> For what? Additionally, currently there are a lot of stuff passed
> to new kernel via purgatory. And purgatory is called by your
> interface too...
>
>> If we go with what you suggest then you prevent kexec from being used
>> by: a) PVH dom0s; b) suitably privileged service domains; c) 32-bit
> Maybe for PVH should be different check. However,
> until now we do not have it in Xen yet.
>
>> guests wanting to load an image with a 64-bit entry point; and d)
> Once again:
>
> old_kernel (Xen) -> purgatory (native mode) -> new_kernel
>
> purgatory architecture is same as kexec-tools architecture. If you
> use dom0 i386 it means that kexec-tools is (and must be) i386 too.
> We do not support Xen i386 anymore. It means that my condition is
> correct.
>
> Daniel

And what happens when kexec-tools is using ia32-libs under a 64bit dom0?
That would also break.

The logic is very simple.

If the blob passed in kexec_load claims to be 32bit, Xen will switch
into 32bit mode before executing it.  If the blob claims to be 64bit
then Xen will stay in 64bit mode to execute it.

This way, purgatory from any kind of multi-arch setup in dom0 will work,
as well as all of the other usecases which your suggestion breaks.

~Andrew

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-11 14:27                             ` Andrew Cooper
  2013-03-11 20:45                               ` Daniel Kiper
@ 2013-03-11 20:45                               ` Daniel Kiper
  1 sibling, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-11 20:45 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: kexec, David Vrabel, xen-devel

On Mon, Mar 11, 2013 at 02:27:26PM +0000, Andrew Cooper wrote:
> On 11/03/13 14:13, Daniel Kiper wrote:
> > On Mon, Mar 11, 2013 at 01:43:02PM +0000, David Vrabel wrote:
> >> On 11/03/13 13:30, Daniel Kiper wrote:
> >>> On Mon, Mar 11, 2013 at 01:21:30PM +0000, David Vrabel wrote:
> >>>> On 11/03/13 11:17, Daniel Kiper wrote:
> >>>>> Heh... It looks that there is a misunderstanding. At first I thought
> >>>>> that David was going to replace purgatory functionality by switching
> >>>>> from 64-bit to 32-bit in kexec_reloc. But later I realized that
> >>>>> I missed Xen 64-bit/dom 32-bit case. Now I agree that this switch
> >>>>> must stay as is. However, now I think that there is another
> >>>>> small mistake which should be fixed. Please look above.
> >>>> Which mistake?  I'm not sure what you're referring to.
> >>> I thought about that:
> >>>
> >>> if ( image->arch == EM_386 )
> >>>   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
> >>>
> >>> It should be change to:
> >>>
> >>> if ( is_pv_32on64_domain(dom0) )
> >>>   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
> >> This isn't a mistake but a deliberate improvement to the old interface.
> > I am still not convinced.
> >
> >> It is clearer and more useful for this sub-architecture to be explicitly
> >> supplied in the kexec_load call than implicitly through some other
> >> side-channel.
> > First of all you do not need to pass any info about architecure to
> > new kernel or something like that (please check my previous emails).
>
> Yes - you really do.  Guessing the architecture of a blob of code is
> insane, and any current interface which relies on this guessing is
> broken by design.

Which interface do you mean? Old Xen? kexec-tools? purgatory?

Why do you need to enforce the architecture? purgatory starts the new
kernel image just as the BIOS does. What is wrong with that? Do you set
something in the BIOS to differentiate between 32-bit and 64-bit systems?

> > If any then there is another questions. What do you do if you need
> > second or third argument?. You redefine kexec interface once again.
> > For what? Additionally, currently there are a lot of stuff passed
> > to new kernel via purgatory. And purgatory is called by your
> > interface too...
> >
> >> If we go with what you suggest then you prevent kexec from being used
> >> by: a) PVH dom0s; b) suitably privileged service domains; c) 32-bit
> > Maybe for PVH should be different check. However,
> > until now we do not have it in Xen yet.
> >
> >> guests wanting to load an image with a 64-bit entry point; and d)
> > Once again:
> >
> > old_kernel (Xen) -> purgatory (native mode) -> new_kernel
> >
> > purgatory architecture is same as kexec-tools architecture. If you
> > use dom0 i386 it means that kexec-tools is (and must be) i386 too.
> > We do not support Xen i386 anymore. It means that my condition is
> > correct.
> >
> > Daniel
>
> And what happens when kexec-tools is using ia32-libs under a 64bit dom0?
> That would also break.

Hmmm... I once ran some tests with hypercalls issued from 32-bit binaries
on a 64-bit system. I was not able to get them to work properly. There
is probably an issue with the arguments. However, I have not had time
to look into it more deeply. Please correct me if I am wrong. If I am
wrong, then the condition should be changed to something different. In
that case neither your proposal nor mine is correct for all cases.

> The logic is very simple.
>
> If the blob passed in kexec_load claims to be 32bit, Xen will switch
> into 32bit mode before executing it.  If the blob claims to be 64 bit

David's condition is wrong for the case where Xen, dom0 and the binaries
are all 64-bit but the kernel image is 32-bit. purgatory will crash or
something like that, because it is 64-bit and it was called in 32-bit mode.

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Xen-devel] [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-11 14:27                             ` Andrew Cooper
@ 2013-03-11 20:45                               ` Daniel Kiper
  2013-03-11 21:18                                 ` Andrew Cooper
  2013-03-11 21:18                                 ` Andrew Cooper
  2013-03-11 20:45                               ` Daniel Kiper
  1 sibling, 2 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-11 20:45 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: kexec, David Vrabel, xen-devel

On Mon, Mar 11, 2013 at 02:27:26PM +0000, Andrew Cooper wrote:
> On 11/03/13 14:13, Daniel Kiper wrote:
> > On Mon, Mar 11, 2013 at 01:43:02PM +0000, David Vrabel wrote:
> >> On 11/03/13 13:30, Daniel Kiper wrote:
> >>> On Mon, Mar 11, 2013 at 01:21:30PM +0000, David Vrabel wrote:
> >>>> On 11/03/13 11:17, Daniel Kiper wrote:
> >>>>> Heh... It looks that there is a misunderstanding. At first I thought
> >>>>> that David was going to replace purgatory functionality by switching
> >>>>> from 64-bit to 32-bit in kexec_reloc. But later I realized that
> >>>>> I missed Xen 64-bit/dom 32-bit case. Now I agree that this switch
> >>>>> must stay as is. However, now I think that there is another
> >>>>> small mistake which should be fixed. Please look above.
> >>>> Which mistake?  I'm not sure what you're referring to.
> >>> I thought about that:
> >>>
> >>> if ( image->arch == EM_386 )
> >>>   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
> >>>
> >>> It should be change to:
> >>>
> >>> if ( is_pv_32on64_domain(dom0) )
> >>>   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
> >> This isn't a mistake but a deliberate improvement to the old interface.
> > I am still not convinced.
> >
> >> It is clearer and more useful for this sub-architecture to be explicitly
> >> supplied in the kexec_load call than implicitly through some other
> >> side-channel.
> > First of all you do not need to pass any info about architecure to
> > new kernel or something like that (please check my previous emails).
>
> Yes - you really do.  Guessing the architecture of a blob of code is
> insane, and any current interface which relies on this guessing is
> broken by design.

Which interface do you mean? Old Xen? kexec-tools? purgatory?

Why do you need to enforce the architecture? purgatory starts the new
kernel image just as the BIOS does. What is wrong with that? Do you set
something in the BIOS to differentiate between 32-bit and 64-bit systems?

> > If any then there is another questions. What do you do if you need
> > second or third argument?. You redefine kexec interface once again.
> > For what? Additionally, currently there are a lot of stuff passed
> > to new kernel via purgatory. And purgatory is called by your
> > interface too...
> >
> >> If we go with what you suggest then you prevent kexec from being used
> >> by: a) PVH dom0s; b) suitably privileged service domains; c) 32-bit
> > Maybe for PVH should be different check. However,
> > until now we do not have it in Xen yet.
> >
> >> guests wanting to load an image with a 64-bit entry point; and d)
> > Once again:
> >
> > old_kernel (Xen) -> purgatory (native mode) -> new_kernel
> >
> > purgatory architecture is same as kexec-tools architecture. If you
> > use dom0 i386 it means that kexec-tools is (and must be) i386 too.
> > We do not support Xen i386 anymore. It means that my condition is
> > correct.
> >
> > Daniel
>
> And what happens when kexec-tools is using ia32-libs under a 64bit dom0?
> That would also break.

Hmmm... I once ran some tests with hypercalls issued from 32-bit binaries
on a 64-bit system. I was not able to get them to work properly. There
is probably an issue with the arguments. However, I have not had time
to look into it more deeply. Please correct me if I am wrong. If I am
wrong, then the condition should be changed to something different. In
that case neither your proposal nor mine is correct for all cases.

> The logic is very simple.
>
> If the blob passed in kexec_load claims to be 32bit, Xen will switch
> into 32bit mode before executing it.  If the blob claims to be 64 bit

David's condition is wrong for the case where Xen, dom0 and the binaries
are all 64-bit but the kernel image is 32-bit. purgatory will crash or
something like that, because it is 64-bit and it was called in 32-bit mode.

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-11 20:45                               ` Daniel Kiper
  2013-03-11 21:18                                 ` Andrew Cooper
@ 2013-03-11 21:18                                 ` Andrew Cooper
  1 sibling, 0 replies; 106+ messages in thread
From: Andrew Cooper @ 2013-03-11 21:18 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: kexec, David Vrabel, xen-devel

On 11/03/13 20:45, Daniel Kiper wrote:
> On Mon, Mar 11, 2013 at 02:27:26PM +0000, Andrew Cooper wrote:
>> On 11/03/13 14:13, Daniel Kiper wrote:
>>> On Mon, Mar 11, 2013 at 01:43:02PM +0000, David Vrabel wrote:
>>>> On 11/03/13 13:30, Daniel Kiper wrote:
>>>>> On Mon, Mar 11, 2013 at 01:21:30PM +0000, David Vrabel wrote:
>>>>>> On 11/03/13 11:17, Daniel Kiper wrote:
>>>>>>> Heh... It looks that there is a misunderstanding. At first I thought
>>>>>>> that David was going to replace purgatory functionality by switching
>>>>>>> from 64-bit to 32-bit in kexec_reloc. But later I realized that
>>>>>>> I missed Xen 64-bit/dom 32-bit case. Now I agree that this switch
>>>>>>> must stay as is. However, now I think that there is another
>>>>>>> small mistake which should be fixed. Please look above.
>>>>>> Which mistake?  I'm not sure what you're referring to.
>>>>> I thought about that:
>>>>>
>>>>> if ( image->arch == EM_386 )
>>>>>   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
>>>>>
>>>>> It should be change to:
>>>>>
>>>>> if ( is_pv_32on64_domain(dom0) )
>>>>>   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
>>>> This isn't a mistake but a deliberate improvement to the old interface.
>>> I am still not convinced.
>>>
>>>> It is clearer and more useful for this sub-architecture to be explicitly
>>>> supplied in the kexec_load call than implicitly through some other
>>>> side-channel.
>>> First of all you do not need to pass any info about architecure to
>>> new kernel or something like that (please check my previous emails).
>> Yes - you really do.  Guessing the architecture of a blob of code is
>> insane, and any current interface which relies on this guessing is
>> broken by design.
> Which interface do you mean? Old Xen? kexec-tools? purgatory?
>
> Why do you need to enforce architecture? purgatory starts new kernel
> image like BIOS does it. What is wrong with that? Do you set something
> in BIOS to differentiate between 32-bit and 64-bit system?

Fine, but that is irrelevant.  Purgatory can do whatever it wants as
soon as it is running.

What Xen cares about is entering into the binary blob in the correct
operating mode.  This binary blob will usually be purgatory but can be
any executable image loaded using the new interface.

If that image claims to be a 32bit image, Xen will enter it in 32bit
mode.  If it claims to be 64bit, Xen will enter it in 64bit mode.  If it
is being stupid and claiming to be 32bit when it is in fact 64bit (or
vice versa) then it has only itself to blame.

What is absolutely unacceptable (which is the case in the current
interface) is for Xen to assume that the entry point for the blob is the
same operating mode as dom0.

Just because "this happens to be the case when using kexec-tools at the
moment" is no reason nor excuse to functionally break the interface.
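
To make the difference between the two conditions concrete, here is a
small sketch that models both and reports when they would pick different
entry modes.  The struct and function names are hypothetical and exist
only for illustration; dom0_is_32bit stands in for whatever
is_pv_32on64_domain(dom0) would report.

```c
#include <stdbool.h>
#include <stdint.h>

/* ELF machine identifiers (values from the ELF specification). */
#define EM_386     3
#define EM_X86_64 62

/* Hypothetical model of a single load request. */
struct load_case {
    bool     dom0_is_32bit;  /* stand-in for is_pv_32on64_domain(dom0) */
    uint16_t image_arch;     /* architecture the caller declares */
};

/* Implicit rule: the entry mode follows dom0. */
static bool compat_from_dom0(const struct load_case *c)
{
    return c->dom0_is_32bit;
}

/* Explicit rule: the entry mode follows the declared image architecture. */
static bool compat_from_image(const struct load_case *c)
{
    return c->image_arch == EM_386;
}

/* True when the two rules would enter the image in different modes. */
static bool rules_disagree(const struct load_case *c)
{
    return compat_from_dom0(c) != compat_from_image(c);
}
```

The rules agree whenever the image matches dom0, which is the
kexec-tools case today.  They diverge for a 64bit dom0 loading an image
declared as EM_386, or a 32bit dom0 loading one declared as EM_X86_64.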

~Andrew

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-03-11 21:18                                 ` Andrew Cooper
@ 2013-03-12 11:17                                   ` Daniel Kiper
  2013-03-12 11:17                                   ` [Xen-devel] " Daniel Kiper
  1 sibling, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-12 11:17 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: kexec, David Vrabel, xen-devel

On Mon, Mar 11, 2013 at 09:18:54PM +0000, Andrew Cooper wrote:
> On 11/03/13 20:45, Daniel Kiper wrote:
> > On Mon, Mar 11, 2013 at 02:27:26PM +0000, Andrew Cooper wrote:
> >> On 11/03/13 14:13, Daniel Kiper wrote:
> >>> On Mon, Mar 11, 2013 at 01:43:02PM +0000, David Vrabel wrote:
> >>>> On 11/03/13 13:30, Daniel Kiper wrote:
> >>>>> On Mon, Mar 11, 2013 at 01:21:30PM +0000, David Vrabel wrote:
> >>>>>> On 11/03/13 11:17, Daniel Kiper wrote:
> > >>>>>>> Heh... It looks like there is a misunderstanding. At first I thought
> >>>>>>> that David was going to replace purgatory functionality by switching
> >>>>>>> from 64-bit to 32-bit in kexec_reloc. But later I realized that
> >>>>>>> I missed Xen 64-bit/dom 32-bit case. Now I agree that this switch
> >>>>>>> must stay as is. However, now I think that there is another
> >>>>>>> small mistake which should be fixed. Please look above.
> >>>>>> Which mistake?  I'm not sure what you're referring to.
> >>>>> I thought about that:
> >>>>>
> >>>>> if ( image->arch == EM_386 )
> >>>>>   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
> >>>>>
> > >>>>> It should be changed to:
> >>>>>
> >>>>> if ( is_pv_32on64_domain(dom0) )
> >>>>>   reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
> >>>> This isn't a mistake but a deliberate improvement to the old interface.
> >>> I am still not convinced.
> >>>
> >>>> It is clearer and more useful for this sub-architecture to be explicitly
> >>>> supplied in the kexec_load call than implicitly through some other
> >>>> side-channel.
> > >>> First of all you do not need to pass any info about architecture to
> >>> new kernel or something like that (please check my previous emails).
> >> Yes - you really do.  Guessing the architecture of a blob of code is
> >> insane, and any current interface which relies on this guessing is
> >> broken by design.
> > Which interface do you mean? Old Xen? kexec-tools? purgatory?
> >
> > Why do you need to enforce architecture? purgatory starts new kernel
> > image like BIOS does it. What is wrong with that? Do you set something
> > in BIOS to differentiate between 32-bit and 64-bit system?
>
> Fine, but that is irrelevant.  Purgatory can do whatever it wants as
> soon as it is running.
>
> What Xen cares about is entering into the binary blob in the correct
> operating mode.  This binary blob will usually be purgatory but can be
> any executable image loaded using the new interface.

Now it is clear. If you want to enforce the architecture of the blob as
a whole, rather than that of the kernel image itself, then David's
condition is OK.

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-02-21 17:48 ` David Vrabel
                     ` (6 preceding siblings ...)
  2013-03-12 11:36   ` Daniel Kiper
@ 2013-03-12 11:36   ` Daniel Kiper
  7 siblings, 0 replies; 106+ messages in thread
From: Daniel Kiper @ 2013-03-12 11:36 UTC (permalink / raw)
  To: David Vrabel; +Cc: kexec, xen-devel

On Thu, Feb 21, 2013 at 05:48:11PM +0000, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> In the existing kexec hypercall, the load and unload ops depend on
> internals of the Linux kernel (the page list and code page provided by
> the kernel).  The code page is used to transition between Xen context
> and the image so using kernel code doesn't make sense and will not
> work for PVH guests.
>
> Add replacement KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload ops
> that no longer require a code page to be provided by the guest -- Xen
> now provides the code for calling the image directly.
>
> The new load op looks similar to the Linux kexec_load system call and
> allows the guest to provide the image data to be loaded.  The guest
> specifies the architecture of the image which may be a 32-bit subarch
> of the hypervisor's architecture (i.e., an EM_386 image on an
> EM_X86_64 hypervisor).
>
> The toolstack can now load images without kernel involvement.  This is
> required for supporting kexec when using a dom0 with an upstream
> kernel.
>
> Crash images are copied directly into the crash region on load.
> Default images are copied into Xen heap pages and a list of source and
> destination machine addresses is created.  This list is used in
> kexec_reloc() to relocate the image to its destination.
>
> The old load and unload sub-ops are still available (as
> KEXEC_CMD_load_v1 and KEXEC_CMD_unload_v1) and are implemented on top
> of the new infrastructure.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
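
For readers following the description above, the source/destination list idea can be sketched in plain C as below. This is only an illustration under stated assumptions: the entry layout and both names are hypothetical, ordinary pointers stand in for machine addresses, and the real kexec_reloc() in the patch is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list entry: where a chunk currently lives and where it must end up. */
struct reloc_entry_sketch {
    const void *src;    /* staging copy (Xen heap page in the description above) */
    void       *dest;   /* final destination of the chunk */
    size_t      len;    /* number of bytes to move */
};

/* Walk the list and move every chunk to its destination. */
static void relocate_sketch(const struct reloc_entry_sketch *list, size_t n)
{
    size_t i;

    assert(list != NULL || n == 0);

    for ( i = 0; i < n; i++ )
        memmove(list[i].dest, list[i].src, list[i].len);
}
```

Crash images would not need such a list under the scheme described, since they are copied straight into the crash region at load time.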

[...]

> diff --git a/xen/common/kexec.c b/xen/common/kexec.c

[...]

> -static int kexec_load_unload_compat(unsigned long op,
> -                                    XEN_GUEST_HANDLE_PARAM(void) uarg)
> +static int kexec_load_v1_compat(XEN_GUEST_HANDLE_PARAM(void) uarg)
>  {
>  #ifdef CONFIG_COMPAT
>      compat_kexec_load_v1_t compat_load;
> @@ -807,49 +990,113 @@ static int kexec_load_unload_compat(unsigned long op,
>      load.type = compat_load.type;
>      XLAT_kexec_image(&load.image, &compat_load.image);
>
> -    return kexec_load_unload_internal(op, &load);
> -#else /* CONFIG_COMPAT */
> +    return kexec_do_load_v1(&load);
> +#else
>      return 0;

-ENOSYS?

> -#endif /* CONFIG_COMPAT */
> +#endif
>  }

[...]

> +static int kexec_unload_v1_compat(XEN_GUEST_HANDLE_PARAM(void) uarg)
> +{
> +#ifdef CONFIG_COMPAT
> +    compat_kexec_load_v1_t compat_load;
> +    xen_kexec_unload_t unload;
> +
> +    if ( copy_from_guest(&compat_load, uarg, 1) )
> +        return -EFAULT;
> +
> +    unload.type = compat_load.type;
> +    return kexec_do_unload(&unload);
> +#else
> +    return 0;

-ENOSYS?

> +#endif
> +}

...and in other similar places...
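
To make the suggestion concrete, here is a small sketch of a compat stub that reports the operation as unimplemented instead of silently succeeding when compat support is compiled out. All names, including the CONFIG_COMPAT_SKETCH macro, are hypothetical; this is not the code from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifdef CONFIG_COMPAT_SKETCH
/* Hypothetical stand-in for the native unload path. */
static int do_unload_sketch(int type)
{
    return type >= 0 ? 0 : -EINVAL;
}
#endif

/* Compat entry point: translate and forward, or fail loudly if compiled out. */
static int unload_v1_compat_sketch(const int *type)
{
#ifdef CONFIG_COMPAT_SKETCH
    assert(type != NULL);
    return do_unload_sketch(*type);
#else
    (void)type;
    return -ENOSYS;   /* a compat caller must not see success here */
#endif
}
```

Returning 0 in the #else branch would tell the caller that the image was unloaded when nothing happened, which is the concern behind the "-ENOSYS?" remarks.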

Daniel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Xen-devel] [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-04-17 10:11     ` David Vrabel
@ 2013-04-17 10:20       ` Jan Beulich
  0 siblings, 0 replies; 106+ messages in thread
From: Jan Beulich @ 2013-04-17 10:20 UTC (permalink / raw)
  To: David Vrabel; +Cc: Keir (Xen.org), Daniel Kiper, kexec, xen-devel

>>> On 17.04.13 at 12:11, David Vrabel <david.vrabel@citrix.com> wrote:
> On 17/04/13 09:55, Jan Beulich wrote:
>>>>> On 16.04.13 at 19:13, David Vrabel <david.vrabel@citrix.com> wrote:
>>> -static int kexec_exec(XEN_GUEST_HANDLE_PARAM(void) uarg)
>>> +static int kexec_load(XEN_GUEST_HANDLE_PARAM(void) uarg)
>>>  {
>>> -    xen_kexec_exec_t exec;
>>> -    xen_kexec_image_t *image;
>>> -    int base, bit, pos, ret = -EINVAL;
>>> +    xen_kexec_load_t load;
>>> +    xen_kexec_segment_t *segments;
>>> +    struct kexec_image *kimage = NULL;
>>> +    int ret;
>>>  
>>> -    if ( unlikely(copy_from_guest(&exec, uarg, 1)) )
>>> +    if ( copy_from_guest(&load, uarg, 1) )
>>>          return -EFAULT;
>>>  
>>> -    if ( kexec_load_get_bits(exec.type, &base, &bit) )
>>> +    if ( load.nr_segments >= KEXEC_SEGMENT_MAX )
>>>          return -EINVAL;
>> 
>> Especially since you named the padding field _rsvd, you ought
>> to verify it to be zero somewhere here. Or if you're really sure
>> that nobody will ever want to make use of the field, name it
>> _pad instead.
> 
> 8 bits isn't likely to be useful and the interface can always be
> extended by adding a new sub-op.  I'll rename it to _pad.
> 
> Do you want to put your acked-by on this patch (with this change) or on
> any of the other patches?

No, I didn't look at this closely enough to do so.

Jan


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Xen-devel] [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-04-17  8:55   ` [Xen-devel] " Jan Beulich
@ 2013-04-17 10:11     ` David Vrabel
  2013-04-17 10:20       ` Jan Beulich
  0 siblings, 1 reply; 106+ messages in thread
From: David Vrabel @ 2013-04-17 10:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir (Xen.org), Daniel Kiper, kexec, xen-devel

On 17/04/13 09:55, Jan Beulich wrote:
>>>> On 16.04.13 at 19:13, David Vrabel <david.vrabel@citrix.com> wrote:
>> -static int kexec_exec(XEN_GUEST_HANDLE_PARAM(void) uarg)
>> +static int kexec_load(XEN_GUEST_HANDLE_PARAM(void) uarg)
>>  {
>> -    xen_kexec_exec_t exec;
>> -    xen_kexec_image_t *image;
>> -    int base, bit, pos, ret = -EINVAL;
>> +    xen_kexec_load_t load;
>> +    xen_kexec_segment_t *segments;
>> +    struct kexec_image *kimage = NULL;
>> +    int ret;
>>  
>> -    if ( unlikely(copy_from_guest(&exec, uarg, 1)) )
>> +    if ( copy_from_guest(&load, uarg, 1) )
>>          return -EFAULT;
>>  
>> -    if ( kexec_load_get_bits(exec.type, &base, &bit) )
>> +    if ( load.nr_segments >= KEXEC_SEGMENT_MAX )
>>          return -EINVAL;
> 
> Especially since you named the padding field _rsvd, you ought
> to verify it to be zero somewhere here. Or if you're really sure
> that nobody will ever want to make use of the field, name it
> _pad instead.

8 bits isn't likely to be useful and the interface can always be
extended by adding a new sub-op.  I'll rename it to _pad.

Do you want to put your acked-by on this patch (with this change) or on
any of the other patches?

David

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [Xen-devel] [PATCH 5/8] kexec: extend hypercall with improved load/unload ops
  2013-04-16 17:13 ` [PATCH 5/8] kexec: extend hypercall with improved load/unload ops David Vrabel
@ 2013-04-17  8:55   ` Jan Beulich
  2013-04-17 10:11     ` David Vrabel
  0 siblings, 1 reply; 106+ messages in thread
From: Jan Beulich @ 2013-04-17  8:55 UTC (permalink / raw)
  To: David Vrabel; +Cc: Keir Fraser, Daniel Kiper, kexec, xen-devel

>>> On 16.04.13 at 19:13, David Vrabel <david.vrabel@citrix.com> wrote:
> -static int kexec_exec(XEN_GUEST_HANDLE_PARAM(void) uarg)
> +static int kexec_load(XEN_GUEST_HANDLE_PARAM(void) uarg)
>  {
> -    xen_kexec_exec_t exec;
> -    xen_kexec_image_t *image;
> -    int base, bit, pos, ret = -EINVAL;
> +    xen_kexec_load_t load;
> +    xen_kexec_segment_t *segments;
> +    struct kexec_image *kimage = NULL;
> +    int ret;
>  
> -    if ( unlikely(copy_from_guest(&exec, uarg, 1)) )
> +    if ( copy_from_guest(&load, uarg, 1) )
>          return -EFAULT;
>  
> -    if ( kexec_load_get_bits(exec.type, &base, &bit) )
> +    if ( load.nr_segments >= KEXEC_SEGMENT_MAX )
>          return -EINVAL;

Especially since you named the padding field _rsvd, you ought
to verify it to be zero somewhere here. Or if you're really sure
that nobody will ever want to make use of the field, name it
_pad instead.
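
The first option could look roughly like the sketch below: reject the call when the reserved field is non-zero, so the field can be given a meaning later without breaking old callers. The struct layout, the limit value and the function name are hypothetical; only the nr_segments comparison is taken from the quoted hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define KEXEC_SEGMENT_MAX_SKETCH 16   /* hypothetical limit */

/* Hypothetical, cut-down stand-in for the load arguments. */
struct load_sketch {
    uint8_t  type;
    uint8_t  _rsvd;         /* reserved: callers must pass zero */
    uint16_t arch;
    uint32_t nr_segments;
};

/* Validate the caller-supplied arguments before doing any work. */
static int check_load_args(const struct load_sketch *load)
{
    assert(load != NULL);

    if ( load->_rsvd != 0 )
        return -EINVAL;   /* keeps the field usable for future extensions */

    if ( load->nr_segments >= KEXEC_SEGMENT_MAX_SKETCH )
        return -EINVAL;

    return 0;
}
```

If the field is instead renamed to _pad, no such check is implied and the field simply carries no meaning.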

Jan


^ permalink raw reply	[flat|nested] 106+ messages in thread

end of thread, other threads:[~2013-04-17 10:19 UTC | newest]

Thread overview: 106+ messages
2013-02-21 17:48 [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels David Vrabel
2013-02-21 17:48 ` [PATCH 1/8] x86: give FIX_EFI_MPF its own fixmap entry David Vrabel
2013-02-21 17:48 ` David Vrabel
2013-02-21 17:48 ` [PATCH 2/8] xen: make GUEST_HANDLE_64() and uint64_aligned_t available everywhere David Vrabel
2013-02-21 17:48 ` David Vrabel
2013-02-21 17:48 ` [PATCH 3/8] kexec: add public interface for improved load/unload sub-ops David Vrabel
2013-02-21 17:48 ` David Vrabel
2013-02-21 22:29   ` Daniel Kiper
2013-02-21 22:29   ` Daniel Kiper
2013-02-22 11:49     ` David Vrabel
2013-02-22 11:49     ` David Vrabel
2013-02-22  8:33   ` [Xen-devel] " Jan Beulich
2013-02-22 11:50     ` David Vrabel
2013-02-22 11:50     ` [Xen-devel] " David Vrabel
2013-02-22 13:09       ` Jan Beulich
2013-02-22 13:09       ` [Xen-devel] " Jan Beulich
2013-02-22  8:33   ` Jan Beulich
2013-03-08 10:50   ` Daniel Kiper
2013-03-08 10:50   ` Daniel Kiper
2013-03-08 11:52     ` David Vrabel
2013-03-08 12:28       ` Daniel Kiper
2013-03-08 12:28       ` Daniel Kiper
2013-03-08 12:36         ` [Xen-devel] " Jan Beulich
2013-03-08 15:34           ` Daniel Kiper
2013-03-08 15:34           ` Daniel Kiper
2013-03-08 12:36         ` Jan Beulich
2013-03-08 11:52     ` David Vrabel
2013-02-21 17:48 ` [PATCH 4/8] kexec: add infrastructure for handling kexec images David Vrabel
2013-03-08 11:37   ` Daniel Kiper
2013-03-08 11:42     ` David Vrabel
2013-03-08 11:58       ` Daniel Kiper
2013-03-08 11:58       ` Daniel Kiper
2013-03-08 11:42     ` David Vrabel
2013-03-08 11:37   ` Daniel Kiper
2013-02-21 17:48 ` David Vrabel
2013-02-21 17:48 ` [PATCH 5/8] kexec: extend hypercall with improved load/unload ops David Vrabel
2013-02-21 17:48 ` David Vrabel
2013-02-21 22:41   ` Daniel Kiper
2013-02-21 22:41   ` Daniel Kiper
2013-02-22  8:42   ` [Xen-devel] " Jan Beulich
2013-02-22 11:54     ` David Vrabel
2013-02-22 13:11       ` Jan Beulich
2013-02-22 13:11       ` Jan Beulich
2013-02-22 11:54     ` David Vrabel
2013-02-22  8:42   ` Jan Beulich
2013-03-08 11:23   ` Daniel Kiper
2013-03-08 11:23   ` Daniel Kiper
2013-03-08 11:40     ` David Vrabel
2013-03-08 12:21       ` Daniel Kiper
2013-03-08 14:01         ` David Vrabel
2013-03-08 15:23           ` Daniel Kiper
2013-03-08 15:23             ` Daniel Kiper
2013-03-08 17:29             ` Andrew Cooper
2013-03-08 17:29               ` [Xen-devel] " Andrew Cooper
2013-03-08 21:45               ` Daniel Kiper
2013-03-08 23:38                 ` Andrew Cooper
2013-03-08 23:38                 ` [Xen-devel] " Andrew Cooper
2013-03-11 11:17                   ` Daniel Kiper
2013-03-11 11:17                   ` [Xen-devel] " Daniel Kiper
2013-03-11 13:21                     ` David Vrabel
2013-03-11 13:21                     ` [Xen-devel] " David Vrabel
2013-03-11 13:30                       ` Daniel Kiper
2013-03-11 13:43                         ` David Vrabel
2013-03-11 13:43                         ` [Xen-devel] " David Vrabel
2013-03-11 14:13                           ` Daniel Kiper
2013-03-11 14:13                           ` [Xen-devel] " Daniel Kiper
2013-03-11 14:27                             ` Andrew Cooper
2013-03-11 20:45                               ` Daniel Kiper
2013-03-11 21:18                                 ` Andrew Cooper
2013-03-12 11:17                                   ` Daniel Kiper
2013-03-12 11:17                                   ` [Xen-devel] " Daniel Kiper
2013-03-11 21:18                                 ` Andrew Cooper
2013-03-11 20:45                               ` Daniel Kiper
2013-03-11 14:27                             ` Andrew Cooper
2013-03-11 13:30                       ` Daniel Kiper
2013-03-08 21:45               ` Daniel Kiper
2013-03-08 14:01         ` David Vrabel
2013-03-08 12:21       ` Daniel Kiper
2013-03-08 11:40     ` David Vrabel
2013-03-12 11:36   ` Daniel Kiper
2013-03-12 11:36   ` Daniel Kiper
2013-02-21 17:48 ` [PATCH 6/8] xen: kexec crash image when dom0 crashes David Vrabel
2013-02-21 17:48 ` David Vrabel
2013-02-21 17:48 ` [PATCH 7/8] libxc: add hypercall buffer arrays David Vrabel
2013-02-21 17:48 ` David Vrabel
2013-03-06 14:25   ` Ian Jackson
2013-03-06 14:25   ` [Xen-devel] " Ian Jackson
2013-03-07  2:44   ` Ian Campbell
2013-03-07  2:44   ` Ian Campbell
2013-02-21 17:48 ` [PATCH 8/8] libxc: add API for kexec hypercall David Vrabel
2013-02-21 17:48 ` David Vrabel
2013-03-07  2:46   ` Ian Campbell
2013-03-07  2:46   ` [Xen-devel] " Ian Campbell
2013-02-21 22:47 ` [PATCH 0/8] kexec: extended kexec hypercall for use with pv-ops kernels Daniel Kiper
2013-02-21 22:47 ` Daniel Kiper
2013-02-22  8:17 ` Jan Beulich
2013-02-22  8:17 ` [Xen-devel] " Jan Beulich
2013-02-22 11:56   ` David Vrabel
2013-02-22 11:56   ` David Vrabel
2013-02-26 13:58 ` Don Slutz
2013-02-26 13:58 ` [Xen-devel] " Don Slutz
2013-03-05 11:04 ` David Vrabel
2013-03-05 11:04 ` [Xen-devel] " David Vrabel
2013-04-16 17:13 [PATCHv4 0/8] kexec: extend " David Vrabel
2013-04-16 17:13 ` [PATCH 5/8] kexec: extend hypercall with improved load/unload ops David Vrabel
2013-04-17  8:55   ` [Xen-devel] " Jan Beulich
2013-04-17 10:11     ` David Vrabel
2013-04-17 10:20       ` Jan Beulich
