* [PATCH RFCv3 0/8] toolstack-based approach to pvhvm guest kexec
@ 2014-10-07 13:10 Vitaly Kuznetsov
From: Vitaly Kuznetsov @ 2014-10-07 13:10 UTC
  To: xen-devel
  Cc: Andrew Jones, Ian Campbell, Andrew Cooper, David Vrabel, Jan Beulich

Changes from RFC/WIPv2:

Here is a slightly different approach to memory reassignment. Instead of
introducing a new (and rather dubious) XENMEM_transfer operation, introduce a
simple XEN_DOMCTL_set_recipient operation and do everything in the
free_domheap_pages() handler, utilizing the normal domain destroy path. This
is better because:
- The approach is general enough
- All memory pages are normally freed when the domain is destroyed
- No special grants handling is required
- Better supportability

With regards to PV:
Though XEN_DOMCTL_set_recipient works for both PV and HVM, this patchset does
not bring PV kexec/kdump support. xc_domain_soft_reset() is limited to HVM
domains only. The main reason is that while it is (in theory) possible to save
the p2m and rebuild it for the new domain, that would only allow us to resume
execution from where we stopped. If we want to execute a new kernel we need to
build the same kernel/initrd/bootstrap_pagetables/... structure we build when
booting a PV domain initially. That, however, would destroy the original
domain's memory, making kdump impossible. Making everything work would require
additional support from kexec userspace and the Linux kernel, and I'm not sure
it makes sense to implement all this in light of PVH.

Original description:

When a PVHVM Linux guest performs kexec there are many things which
require taking care of:
- shared info, vcpu_info
- grants
- event channels
- ...
Instead of taking care of each of these individually, we can rebuild the
domain performing kexec from scratch, doing a so-called soft reset.

The idea was suggested by David Vrabel, Jan Beulich, and Konrad Rzeszutek Wilk.
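
As a reference, a minimal sketch of how a guest asks for such a reset (this
is what the xen_reboot(SHUTDOWN_soft_reset) calls in the diff below boil down
to; SHUTDOWN_soft_reset itself is introduced in patch 1):

    /* minimal sketch, assuming a Linux guest with the diff below applied */
    struct sched_shutdown shutdown = { .reason = SHUTDOWN_soft_reset };
    HYPERVISOR_sched_op(SCHEDOP_shutdown, &shutdown);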

P.S. The patch series can be tested with a PVHVM Linux guest with the following
modifications:

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index c0cb11f..33c5cdd 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -33,6 +33,10 @@
 #include <linux/memblock.h>
 #include <linux/edd.h>

+#ifdef CONFIG_KEXEC
+#include <linux/kexec.h>
+#endif
+
 #include <xen/xen.h>
 #include <xen/events.h>
 #include <xen/interface/xen.h>
@@ -1810,6 +1814,22 @@ static struct notifier_block xen_hvm_cpu_notifier = {
   .notifier_call   = xen_hvm_cpu_notify,
 };

+#ifdef CONFIG_KEXEC
+static void xen_pvhvm_kexec_shutdown(void)
+{
+	native_machine_shutdown();
+	if (kexec_in_progress) {
+		xen_reboot(SHUTDOWN_soft_reset);
+	}
+}
+
+static void xen_pvhvm_crash_shutdown(struct pt_regs *regs)
+{
+	native_machine_crash_shutdown(regs);
+	xen_reboot(SHUTDOWN_soft_reset);
+}
+#endif
+
 static void __init xen_hvm_guest_init(void)
 {
	init_hvm_pv_info();
@@ -1826,6 +1846,10 @@ static void __init xen_hvm_guest_init(void)
   x86_init.irqs.intr_init = xen_init_IRQ;
   xen_hvm_init_time_ops();
   xen_hvm_init_mmu_ops();
+#ifdef CONFIG_KEXEC
+	machine_ops.shutdown = xen_pvhvm_kexec_shutdown;
+	machine_ops.crash_shutdown = xen_pvhvm_crash_shutdown;
+#endif
 }

 static bool xen_nopv = false;
diff --git a/include/xen/interface/sched.h b/include/xen/interface/sched.h
index 9ce0839..b5942a8 100644
--- a/include/xen/interface/sched.h
+++ b/include/xen/interface/sched.h
@@ -107,5 +107,6 @@ struct sched_watchdog {
 #define SHUTDOWN_suspend    2  /* Clean up, save suspend info, kill.         */
 #define SHUTDOWN_crash      3  /* Tell controller we've crashed.             */
 #define SHUTDOWN_watchdog   4  /* Restart because watchdog time expired.     */
+#define SHUTDOWN_soft_reset 5  /* Soft-reset for kexec.                      */

 #endif /* __XEN_PUBLIC_SCHED_H__ */

Vitaly Kuznetsov (8):
  xen: introduce SHUTDOWN_soft_reset shutdown reason
  libxl: support SHUTDOWN_soft_reset shutdown reason
  xen: introduce XEN_DOMCTL_set_recipient
  libxc: support XEN_DOMCTL_set_recipient
  libxl: add libxl__domain_soft_reset_destroy_old()
  libxc: introduce soft reset for HVM domains
  libxl: soft reset support
  xsm: add XEN_DOMCTL_set_recipient support

 tools/libxc/Makefile                |   1 +
 tools/libxc/xc_domain.c             |  10 +
 tools/libxc/xc_domain_soft_reset.c  | 394 ++++++++++++++++++++++++++++++++++++
 tools/libxc/xenctrl.h               |  14 ++
 tools/libxc/xenguest.h              |  20 ++
 tools/libxl/libxl.c                 |  32 ++-
 tools/libxl/libxl.h                 |   6 +
 tools/libxl/libxl_create.c          | 103 +++++++++-
 tools/libxl/libxl_internal.h        |   8 +
 tools/libxl/libxl_types.idl         |   1 +
 tools/libxl/xl_cmdimpl.c            |  24 ++-
 tools/python/xen/lowlevel/xl/xl.c   |   1 +
 xen/common/domain.c                 |   3 +
 xen/common/domctl.c                 |  35 ++++
 xen/common/page_alloc.c             |  26 ++-
 xen/common/shutdown.c               |   7 +
 xen/include/public/domctl.h         |  19 ++
 xen/include/public/sched.h          |   3 +-
 xen/include/xen/sched.h             |   1 +
 xen/include/xsm/dummy.h             |   6 +
 xen/include/xsm/xsm.h               |   6 +
 xen/xsm/dummy.c                     |   1 +
 xen/xsm/flask/hooks.c               |  17 ++
 xen/xsm/flask/policy/access_vectors |  10 +
 24 files changed, 726 insertions(+), 22 deletions(-)
 create mode 100644 tools/libxc/xc_domain_soft_reset.c

-- 
1.9.3

* [PATCH RFCv3 1/8] xen: introduce SHUTDOWN_soft_reset shutdown reason
From: Vitaly Kuznetsov @ 2014-10-07 13:10 UTC
  To: xen-devel
  Cc: Andrew Jones, Ian Campbell, Andrew Cooper, David Vrabel, Jan Beulich

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 xen/common/shutdown.c      | 7 +++++++
 xen/include/public/sched.h | 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/xen/common/shutdown.c b/xen/common/shutdown.c
index 94d4c53..5c3a158 100644
--- a/xen/common/shutdown.c
+++ b/xen/common/shutdown.c
@@ -71,6 +71,13 @@ void hwdom_shutdown(u8 reason)
         break; /* not reached */
     }
 
+    case SHUTDOWN_soft_reset:
+    {
+        printk("Domain 0 did soft reset but it is unsupported, rebooting.\n");
+        machine_restart(0);
+        break; /* not reached */
+    }
+
     default:
     {
         printk("Domain 0 shutdown (unknown reason %u): ", reason);
diff --git a/xen/include/public/sched.h b/xen/include/public/sched.h
index 4000ac9..800c808 100644
--- a/xen/include/public/sched.h
+++ b/xen/include/public/sched.h
@@ -159,7 +159,8 @@ DEFINE_XEN_GUEST_HANDLE(sched_watchdog_t);
 #define SHUTDOWN_suspend    2  /* Clean up, save suspend info, kill.         */
 #define SHUTDOWN_crash      3  /* Tell controller we've crashed.             */
 #define SHUTDOWN_watchdog   4  /* Restart because watchdog time expired.     */
-#define SHUTDOWN_MAX        4  /* Maximum valid shutdown reason.             */
+#define SHUTDOWN_soft_reset 5  /* Soft reset, rebuild keeping memory content */
+#define SHUTDOWN_MAX        5  /* Maximum valid shutdown reason.             */
 /* ` } */
 
 #endif /* __XEN_PUBLIC_SCHED_H__ */
-- 
1.9.3

* [PATCH RFCv3 2/8] libxl: support SHUTDOWN_soft_reset shutdown reason
From: Vitaly Kuznetsov @ 2014-10-07 13:10 UTC
  To: xen-devel
  Cc: Andrew Jones, Ian Campbell, Andrew Cooper, David Vrabel, Jan Beulich

Use the letter 't' to indicate a domain in this state in xl list output.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/libxl/libxl_types.idl       | 1 +
 tools/libxl/xl_cmdimpl.c          | 2 +-
 tools/python/xen/lowlevel/xl/xl.c | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index f1fcbc3..8077415 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -166,6 +166,7 @@ libxl_shutdown_reason = Enumeration("shutdown_reason", [
     (2, "suspend"),
     (3, "crash"),
     (4, "watchdog"),
+    (5, "soft_reset"),
     ], init_val = "LIBXL_SHUTDOWN_REASON_UNKNOWN")
 
 libxl_vga_interface_type = Enumeration("vga_interface_type", [
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 698b3bc..b40ad50 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -3279,7 +3279,7 @@ static void list_domains(int verbose, int context, int claim, int numa,
                          const libxl_dominfo *info, int nb_domain)
 {
     int i;
-    static const char shutdown_reason_letters[]= "-rscw";
+    static const char shutdown_reason_letters[]= "-rscwt";
     libxl_bitmap nodemap;
     libxl_physinfo physinfo;
 
diff --git a/tools/python/xen/lowlevel/xl/xl.c b/tools/python/xen/lowlevel/xl/xl.c
index 32f982a..7c61160 100644
--- a/tools/python/xen/lowlevel/xl/xl.c
+++ b/tools/python/xen/lowlevel/xl/xl.c
@@ -784,6 +784,7 @@ PyMODINIT_FUNC initxl(void)
     _INT_CONST_LIBXL(m, SHUTDOWN_REASON_SUSPEND);
     _INT_CONST_LIBXL(m, SHUTDOWN_REASON_CRASH);
     _INT_CONST_LIBXL(m, SHUTDOWN_REASON_WATCHDOG);
+    _INT_CONST_LIBXL(m, SHUTDOWN_REASON_SOFT_RESET);
 
     genwrap__init(m);
 }
-- 
1.9.3

* [PATCH RFCv3 3/8] xen: introduce XEN_DOMCTL_set_recipient
From: Vitaly Kuznetsov @ 2014-10-07 13:10 UTC
  To: xen-devel
  Cc: Andrew Jones, Ian Campbell, Andrew Cooper, David Vrabel, Jan Beulich

The new operation sets a 'recipient' domain which will receive all
memory pages from a particular domain when these pages are freed.
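
A hedged sketch of the raw domctl being added (field names as defined in the
public header below; source_domid/dest_domid are assumed caller variables):

    struct xen_domctl domctl = {
        .cmd = XEN_DOMCTL_set_recipient,
        .domain = source_domid,
        /* DOMID_INVALID here instead detaches the recipient again */
        .u.recipient.recipient = dest_domid,
    };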

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 xen/common/domain.c         |  3 +++
 xen/common/domctl.c         | 27 +++++++++++++++++++++++++++
 xen/common/page_alloc.c     | 26 ++++++++++++++++++++++----
 xen/include/public/domctl.h | 19 +++++++++++++++++++
 xen/include/xen/sched.h     |  1 +
 5 files changed, 72 insertions(+), 4 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 62514b0..6bceb64 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -822,6 +822,9 @@ static void complete_domain_destroy(struct rcu_head *head)
     if ( d->target != NULL )
         put_domain(d->target);
 
+    if ( d->recipient != NULL )
+        put_domain(d->recipient);
+
     evtchn_destroy_final(d);
 
     radix_tree_destroy(&d->pirq_tree, free_pirq_struct);
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 1ad0729..38cd10a 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -1152,6 +1152,33 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     }
     break;
 
+    case XEN_DOMCTL_set_recipient:
+    {
+        struct domain *recipient_dom;
+
+        if ( op->u.recipient.recipient == DOMID_INVALID )
+        {
+            if ( d->recipient )
+            {
+                put_domain(d->recipient);
+            }
+            d->recipient = NULL;
+            break;
+        }
+
+        recipient_dom = get_domain_by_id(op->u.recipient.recipient);
+        if ( recipient_dom == NULL )
+        {
+            ret = -ESRCH;
+            break;
+        }
+        else
+        {
+            d->recipient = recipient_dom;
+        }
+    }
+    break;
+
     default:
         ret = arch_do_domctl(op, d, u_domctl);
         break;
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 7b4092d..abb2ef3 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1707,6 +1707,7 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
 {
     struct domain *d = page_get_owner(pg);
     unsigned int i;
+    unsigned long mfn, gmfn;
     bool_t drop_dom_ref;
 
     ASSERT(!in_irq());
@@ -1764,11 +1765,28 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
             scrub = 1;
         }
 
-        if ( unlikely(scrub) )
-            for ( i = 0; i < (1 << order); i++ )
-                scrub_one_page(&pg[i]);
+        if ( !d || !d->recipient || d->recipient->is_dying )
+        {
+            if ( unlikely(scrub) )
+                for ( i = 0; i < (1 << order); i++ )
+                    scrub_one_page(&pg[i]);
 
-        free_heap_pages(pg, order);
+            free_heap_pages(pg, order);
+        }
+        else
+        {
+            mfn = page_to_mfn(pg);
+            gmfn = mfn_to_gmfn(d, mfn);
+
+            page_set_owner(pg, NULL);
+            assign_pages(d->recipient, pg, order, 0);
+
+            if ( guest_physmap_add_page(d->recipient, gmfn, mfn, order) )
+            {
+                gdprintk(XENLOG_INFO, "Failed to add page to domain's physmap"
+                         " mfn: %lx\n", mfn);
+            }
+        }
     }
 
     if ( drop_dom_ref )
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index cfa39b3..7fcb6db 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -965,6 +965,23 @@ struct xen_domctl_vnuma {
 typedef struct xen_domctl_vnuma xen_domctl_vnuma_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_vnuma_t);
 
+/*
+ * XEN_DOMCTL_set_recipient - sets the 'recipient' domain which will receive
+ * all the domain's memory after its death or whenever a memory page from
+ * the domain's heap is freed. Pages from the Xen heap belonging to the
+ * domain are not copied. An additional reference to the recipient domain is
+ * taken to prevent it from dying.
+ * When DOMID_INVALID is supplied as the recipient, the source domain returns
+ * to its original state: memory pages are freed normally and the reference
+ * to the recipient domain is dropped.
+ */
+struct xen_domctl_set_recipient {
+    domid_t recipient;
+};
+typedef struct xen_domctl_set_recipient xen_domctl_set_recipient_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_recipient_t);
+
+
 struct xen_domctl {
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain                   1
@@ -1038,6 +1055,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_get_vcpu_msrs                 72
 #define XEN_DOMCTL_set_vcpu_msrs                 73
 #define XEN_DOMCTL_setvnumainfo                  74
+#define XEN_DOMCTL_set_recipient                 75
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1099,6 +1117,7 @@ struct xen_domctl {
         struct xen_domctl_gdbsx_pauseunp_vcpu gdbsx_pauseunp_vcpu;
         struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
         struct xen_domctl_vnuma             vnuma;
+        struct xen_domctl_set_recipient     recipient;
         uint8_t                             pad[128];
     } u;
 };
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index c5157e6..d00d655 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -366,6 +366,7 @@ struct domain
     bool_t           is_privileged;
     /* Which guest this guest has privileges on */
     struct domain   *target;
+    struct domain   *recipient;
     /* Is this guest being debugged by dom0? */
     bool_t           debugger_attached;
     /* Is this guest dying (i.e., a zombie)? */
-- 
1.9.3

* [PATCH RFCv3 4/8] libxc: support XEN_DOMCTL_set_recipient
From: Vitaly Kuznetsov @ 2014-10-07 13:10 UTC
  To: xen-devel
  Cc: Andrew Jones, Ian Campbell, Andrew Cooper, David Vrabel, Jan Beulich

Introduce a new xc_domain_set_recipient() function to support
XEN_DOMCTL_set_recipient.
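
A hedged usage sketch (the domain ids and the perror() error path are
illustrative assumptions):

    xc_interface *xch = xc_interface_open(NULL, NULL, 0);

    /* pages freed by domain 5 now get reassigned to domain 6 */
    if ( xc_domain_set_recipient(xch, 5, 6) )
        perror("xc_domain_set_recipient");

    /* ... destroy domain 5; its memory flows to domain 6 ... */

    /* detach: freed pages go back to the heap as usual */
    xc_domain_set_recipient(xch, 5, DOMID_INVALID);

    xc_interface_close(xch);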

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/libxc/xc_domain.c | 10 ++++++++++
 tools/libxc/xenctrl.h   | 14 ++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 14f4666..2c69087 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -92,6 +92,16 @@ int xc_domain_destroy(xc_interface *xch,
     return ret;
 }
 
+int xc_domain_set_recipient(xc_interface *xch,
+                            uint32_t domid, uint32_t recipient)
+{
+    DECLARE_DOMCTL;
+    domctl.cmd = XEN_DOMCTL_set_recipient;
+    domctl.domain = (domid_t)domid;
+    domctl.u.recipient.recipient = (domid_t)recipient;
+    return do_domctl(xch, &domctl);
+}
+
 int xc_domain_shutdown(xc_interface *xch,
                        uint32_t domid,
                        int reason)
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 514b241..e59c718 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -551,6 +551,20 @@ int xc_domain_unpause(xc_interface *xch,
 int xc_domain_destroy(xc_interface *xch,
                       uint32_t domid);
 
+/**
+ * This function sets a 'recipient' domain for a domain: when the source
+ * domain releases memory it is reassigned to the recipient domain instead of
+ * being freed. The recipient domain is supposed to have a large enough
+ * max_mem. Supplying DOMID_INVALID as the recipient detaches it again.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid the source domain id
+ * @parm recipient the destination domain id
+ * @return 0 on success, -1 on failure
+ */
+int xc_domain_set_recipient(xc_interface *xch,
+                            uint32_t domid, uint32_t recipient);
+
 
 /**
  * This function resumes a suspended domain. The domain should have
-- 
1.9.3

* [PATCH RFCv3 5/8] libxl: add libxl__domain_soft_reset_destroy_old()
From: Vitaly Kuznetsov @ 2014-10-07 13:10 UTC
  To: xen-devel
  Cc: Andrew Jones, Ian Campbell, Andrew Cooper, David Vrabel, Jan Beulich

The new libxl__domain_soft_reset_destroy_old() is an internal-only
version of libxl_domain_destroy() which follows the same domain
destroy path, with one difference: xc_domain_destroy() is skipped,
so the domain itself is not actually destroyed.

Add a soft_reset flag to the libxl__domain_destroy_state structure
to support the change.

The original libxl_domain_destroy() function could easily be
modified to support the new flag, but I'm trying to avoid that as
it is part of the public API.
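
A hedged usage sketch (synchronous invocation; the NULL ao_how is an
assumption, and this mirrors how the soft reset path in a later patch
calls it):

    /* tear down devices, xenstore state and the device model, but keep
     * the domain and its memory in place */
    libxl__domain_soft_reset_destroy_old(ctx, domid_old, NULL);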

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/libxl/libxl.c          | 32 +++++++++++++++++++++++++++-----
 tools/libxl/libxl_internal.h |  4 ++++
 2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 77672d0..a38019b 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -1351,6 +1351,23 @@ int libxl_domain_destroy(libxl_ctx *ctx, uint32_t domid,
     return AO_INPROGRESS;
 }
 
+int libxl__domain_soft_reset_destroy_old(libxl_ctx *ctx, uint32_t domid,
+                                         const libxl_asyncop_how *ao_how)
+{
+    AO_CREATE(ctx, domid, ao_how);
+    libxl__domain_destroy_state *dds;
+
+    GCNEW(dds);
+    dds->ao = ao;
+    dds->domid = domid;
+    dds->callback = domain_destroy_cb;
+    dds->soft_reset = 1;
+    libxl__domain_destroy(egc, dds);
+
+    return AO_INPROGRESS;
+}
+
+
 static void domain_destroy_cb(libxl__egc *egc, libxl__domain_destroy_state *dds,
                               int rc)
 {
@@ -1526,6 +1543,7 @@ static void devices_destroy_cb(libxl__egc *egc,
 {
     STATE_AO_GC(drs->ao);
     libxl__destroy_domid_state *dis = CONTAINER_OF(drs, *dis, drs);
+    libxl__domain_destroy_state *dds = CONTAINER_OF(dis, *dds, domain);
     libxl_ctx *ctx = CTX;
     uint32_t domid = dis->domid;
     char *dom_path;
@@ -1564,11 +1582,15 @@ static void devices_destroy_cb(libxl__egc *egc,
     }
     libxl__userdata_destroyall(gc, domid);
 
-    rc = xc_domain_destroy(ctx->xch, domid);
-    if (rc < 0) {
-        LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_domain_destroy failed for %d", domid);
-        rc = ERROR_FAIL;
-        goto out;
+    if (!dds->soft_reset)
+    {
+        rc = xc_domain_destroy(ctx->xch, domid);
+        if (rc < 0) {
+            LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc,
+                                "xc_domain_destroy failed for %d", domid);
+            rc = ERROR_FAIL;
+            goto out;
+        }
     }
     rc = 0;
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index f61673c..b43f2e3 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2762,6 +2762,7 @@ struct libxl__domain_destroy_state {
     int stubdom_finished;
     libxl__destroy_domid_state domain;
     int domain_finished;
+    int soft_reset;
 };
 
 /*
@@ -2925,6 +2926,9 @@ _hidden void libxl__domain_save_device_model(libxl__egc *egc,
 
 _hidden const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t domid);
 
+_hidden int libxl__domain_soft_reset_destroy_old(libxl_ctx *ctx, uint32_t domid,
+                                                 const libxl_asyncop_how *ao_how);
+
 
 /*
  * Convenience macros.
-- 
1.9.3

* [PATCH RFCv3 6/8] libxc: introduce soft reset for HVM domains
From: Vitaly Kuznetsov @ 2014-10-07 13:10 UTC
  To: xen-devel
  Cc: Andrew Jones, Ian Campbell, Andrew Cooper, David Vrabel, Jan Beulich

Add a new xc_domain_soft_reset() function which performs a so-called 'soft
reset' of an HVM domain. It is performed in the following way:
- Save the HVM context and all HVM params;
- Set a recipient domain (an empty one with the same config) with
  XEN_DOMCTL_set_recipient;
- Destroy the source domain;
- If everything goes well, all memory gets reassigned to the new domain;
- If there are leftover pages (which means something went wrong during the
  destroy), copy all remaining pages by hand;
- Restore the HVM context and HVM params, and seed the grant table.

After that the domain resumes execution from where SHUTDOWN_soft_reset was
called.
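
A hedged caller sketch (assuming dest_dom was created beforehand with the
same configuration as source_dom; error handling elided):

    unsigned long store_mfn, console_mfn;

    /* on failure the function destroys dest_dom itself */
    if ( xc_domain_soft_reset(xch, source_dom, dest_dom, console_domid,
                              &console_mfn, store_domid, &store_mfn) )
        fprintf(stderr, "soft reset failed\n");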

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/libxc/Makefile               |   1 +
 tools/libxc/xc_domain_soft_reset.c | 394 +++++++++++++++++++++++++++++++++++++
 tools/libxc/xenguest.h             |  20 ++
 3 files changed, 415 insertions(+)
 create mode 100644 tools/libxc/xc_domain_soft_reset.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 3b04027..b5d4b60 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -49,6 +49,7 @@ GUEST_SRCS-y += xc_offline_page.c xc_compression.c
 else
 GUEST_SRCS-y += xc_nomigrate.c
 endif
+GUEST_SRCS-y += xc_domain_soft_reset.c
 
 vpath %.c ../../xen/common/libelf
 CFLAGS += -I../../xen/common/libelf
diff --git a/tools/libxc/xc_domain_soft_reset.c b/tools/libxc/xc_domain_soft_reset.c
new file mode 100644
index 0000000..495b7e6
--- /dev/null
+++ b/tools/libxc/xc_domain_soft_reset.c
@@ -0,0 +1,394 @@
+/******************************************************************************
+ * xc_domain_soft_reset.c
+ *
+ * Do soft reset.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include <inttypes.h>
+#include <time.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/time.h>
+
+#include "xc_private.h"
+#include "xc_core.h"
+#include "xc_bitops.h"
+#include "xc_dom.h"
+#include "xg_private.h"
+#include "xg_save_restore.h"
+
+#include <xen/hvm/params.h>
+
+#define SLEEP_INT 5
+
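+/*
+ * Fill pfn_buf with the type of every gpfn up to max_gpfn, querying the
+ * hypervisor in batches of 1024, so the caller can tell which pages are
+ * present in the domain.
+ */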
+static int get_pfn_types(xc_interface *xch, uint32_t dom, uint64_t *pfn_buf,
+                         unsigned long max_gpfn)
+{
+    int size;
+    unsigned long pfn, pfn2;
+    int rc = -1;
+
+    for (pfn = 0; pfn < max_gpfn + 1; pfn += 1024)
+    {
+        if (pfn + 1024 <= max_gpfn + 1)
+            size = 1024;
+        else
+            size = max_gpfn - pfn + 1;
+
+        for (pfn2 = pfn; pfn2 < pfn + size; pfn2++)
+            pfn_buf[pfn2] = pfn2;
+
+        if ( xc_get_pfn_type_batch(xch, dom, size, &pfn_buf[pfn]) )
+        {
+            PERROR("xc_get_pfn_type_batch failed");
+            goto out;
+        }
+    }
+    rc = 0;
+out:
+    return rc;
+}
+
+int xc_domain_soft_reset(xc_interface *xch, uint32_t source_dom,
+                         uint32_t dest_dom, domid_t console_domid,
+                         unsigned long *console_mfn, domid_t store_domid,
+                         unsigned long *store_mfn)
+{
+    xc_dominfo_t old_info, new_info;
+    int rc = 1;
+
+    uint32_t hvm_buf_size = 0;
+    uint8_t *hvm_buf = NULL;
+    unsigned long console_pfn, store_pfn, io_pfn, buffio_pfn;
+    unsigned long pfn, max_gpfn, num_pages, num_pages_new;
+    uint64_t hvm_params[HVM_NR_PARAMS];
+    uint64_t *pfn_buf_old = NULL,*pfn_buf_new = NULL;
+    void *source_pg, *dest_pg;
+
+    DPRINTF("%s: soft reset domid %u -> %u", __func__, source_dom, dest_dom);
+
+    if ( xc_domain_getinfo(xch, source_dom, 1, &old_info) != 1 )
+    {
+        PERROR("Could not get old domain info");
+        return 1;
+    }
+
+    if ( xc_domain_getinfo(xch, dest_dom, 1, &new_info) != 1 )
+    {
+        PERROR("Could not get new domain info");
+        return 1;
+    }
+
+    if ( !old_info.hvm || !new_info.hvm )
+    {
+        PERROR("Soft reset is supported for HVM only");
+        return 1;
+    }
+
+    max_gpfn = xc_domain_maximum_gpfn(xch, source_dom);
+    pfn_buf_old = (uint64_t *)malloc((max_gpfn + 1) * sizeof(uint64_t));
+
+    if ( !pfn_buf_old  )
+    {
+        ERROR("Couldn't allocate memory");
+        goto out;
+    }
+
+    if ( get_pfn_types(xch, source_dom, pfn_buf_old, max_gpfn) )
+    {
+        goto out;
+    }
+
+    hvm_buf_size = xc_domain_hvm_getcontext(xch, source_dom, 0, 0);
+    if ( hvm_buf_size == -1 )
+    {
+        PERROR("Couldn't get HVM context size from Xen");
+        goto out;
+    }
+
+    hvm_buf = malloc(hvm_buf_size);
+    if ( !hvm_buf )
+    {
+        ERROR("Couldn't allocate memory");
+        goto out;
+    }
+
+    if ( xc_domain_hvm_getcontext(xch, source_dom, hvm_buf,
+                                  hvm_buf_size) == -1 )
+    {
+        PERROR("HVM:Could not get hvm buffer");
+        goto out;
+    }
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_STORE_PFN,
+                     &hvm_params[HVM_PARAM_STORE_PFN]);
+    store_pfn = hvm_params[HVM_PARAM_STORE_PFN];
+    *store_mfn = store_pfn;
+
+    xc_hvm_param_get(xch, source_dom,
+                     HVM_PARAM_CONSOLE_PFN,
+                     &hvm_params[HVM_PARAM_CONSOLE_PFN]);
+    console_pfn = hvm_params[HVM_PARAM_CONSOLE_PFN];
+    *console_mfn = console_pfn;
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_BUFIOREQ_PFN,
+                     &hvm_params[HVM_PARAM_BUFIOREQ_PFN]);
+    buffio_pfn = hvm_params[HVM_PARAM_BUFIOREQ_PFN];
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_IOREQ_PFN,
+                     &hvm_params[HVM_PARAM_IOREQ_PFN]);
+    io_pfn = hvm_params[HVM_PARAM_IOREQ_PFN];
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_IDENT_PT,
+                     &hvm_params[HVM_PARAM_IDENT_PT]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_PAGING_RING_PFN,
+                     &hvm_params[HVM_PARAM_PAGING_RING_PFN]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_ACCESS_RING_PFN,
+                     &hvm_params[HVM_PARAM_ACCESS_RING_PFN]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_VM86_TSS,
+                     &hvm_params[HVM_PARAM_VM86_TSS]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_ACPI_IOPORTS_LOCATION,
+                     &hvm_params[HVM_PARAM_ACPI_IOPORTS_LOCATION]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_VIRIDIAN,
+                     &hvm_params[HVM_PARAM_VIRIDIAN]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_PAE_ENABLED,
+                     &hvm_params[HVM_PARAM_PAE_ENABLED]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_STORE_EVTCHN,
+                     &hvm_params[HVM_PARAM_STORE_EVTCHN]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_IOREQ_SERVER_PFN,
+                     &hvm_params[HVM_PARAM_IOREQ_SERVER_PFN]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
+                     &hvm_params[HVM_PARAM_NR_IOREQ_SERVER_PAGES]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_VM_GENERATION_ID_ADDR,
+                     &hvm_params[HVM_PARAM_VM_GENERATION_ID_ADDR]);
+
+    rc = xc_domain_set_recipient(xch, source_dom, dest_dom);
+    if ( rc != 0 )
+    {
+        PERROR("failed to set recipient, rc=%d\n", rc);
+        goto out;
+    }
+
+    rc = xc_domain_destroy(xch, source_dom);
+    if ( rc != 0 )
+    {
+        PERROR("failed to destroy source domain, rc=%d\n", rc);
+        goto out;
+    }
+
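+    /*
+     * Wait for Xen to finish handing pages over on the destroy path: poll
+     * until the recipient's page count stops growing.
+     */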
+    num_pages = 0;
+    while ( 1 )
+    {
+        sleep(SLEEP_INT);
+        num_pages_new = xc_get_tot_pages(xch, dest_dom);
+        if ( num_pages_new == num_pages )
+        {
+            DPRINTF("Stopped at %lu pages, %lu pages left to copy", num_pages,
+                    old_info.nr_pages - num_pages);
+            break;
+        }
+        num_pages = num_pages_new;
+    }
+
+    if ( num_pages == old_info.nr_pages )
+    {
+        DPRINTF("All pages were transferred");
+    }
+    else
+    {
+        pfn_buf_new = (uint64_t *)malloc((max_gpfn + 1) * sizeof(uint64_t));
+
+        if ( !pfn_buf_new  )
+        {
+            ERROR("Couldn't allocate memory");
+            goto out;
+        }
+
+        /* Stop sending pages to the destination domain */
+        rc = xc_domain_set_recipient(xch, source_dom, DOMID_INVALID);
+        if ( rc != 0 )
+        {
+            /* Don't fail here -- source domain may have just died */
+            PERROR("failed to set recipient, rc=%d\n", rc);
+        }
+
+        if ( get_pfn_types(xch, dest_dom, pfn_buf_new, max_gpfn) )
+        {
+            goto out;
+        }
+
+        for (pfn = 0; pfn < max_gpfn + 1; pfn++)
+        {
+            if ( (pfn_buf_new[pfn] == XEN_DOMCTL_PFINFO_XTAB) &&
+                 (pfn_buf_old[pfn] != XEN_DOMCTL_PFINFO_XTAB) )
+            {
+                if ( xc_domain_populate_physmap_exact(xch, dest_dom,
+                                                      1, 0, 0, &pfn) )
+                {
+                    /* skip this pfn; pfn_buf_new is still in use below */
+                    PERROR("failed to populate pfn %lx", pfn);
+                    continue;
+                }
+
+                source_pg = xc_map_foreign_pages(xch, source_dom,
+                                                 PROT_READ, &pfn, 1);
+                if ( !source_pg )
+                {
+                    PERROR("failed to map source page %lx", pfn);
+                    continue;
+                }
+
+                dest_pg = xc_map_foreign_pages(xch, dest_dom,
+                                               PROT_READ|PROT_WRITE, &pfn, 1);
+                if ( !dest_pg )
+                {
+                    munmap(source_pg, PAGE_SIZE);
+                    PERROR("failed to map dest page %lx", pfn);
+                    continue;
+                }
+                memcpy(dest_pg, source_pg, PAGE_SIZE);
+                munmap(source_pg, PAGE_SIZE);
+                munmap(dest_pg, PAGE_SIZE);
+            }
+        }
+        free(pfn_buf_new);
+    }
+
+    if ( pfn_buf_old[old_info.shared_info_frame] == XEN_DOMCTL_PFINFO_XTAB )
+    {
+        /*
+         * The shared info frame is removed from the physmap when the guest
+         * maps shared info, so this page is likely XEN_DOMCTL_PFINFO_XTAB
+         * and we need to replace it with an empty page in that case.
+         */
+
+        if ( xc_domain_populate_physmap_exact(xch, dest_dom, 1, 0, 0,
+                                              &old_info.shared_info_frame) )
+        {
+            PERROR("failed to populate pfn %lx (shared info)",
+                   old_info.shared_info_frame);
+            goto out;
+        }
+    }
+
+    if ( xc_domain_hvm_setcontext(xch, dest_dom, hvm_buf,
+                                  hvm_buf_size) == -1 )
+    {
+        PERROR("HVM:Could not set hvm buffer");
+        goto out;
+    }
+
+    if ( store_pfn )
+        xc_clear_domain_page(xch, dest_dom, store_pfn);
+
+    if ( console_pfn )
+        xc_clear_domain_page(xch, dest_dom, console_pfn);
+
+    if ( buffio_pfn )
+        xc_clear_domain_page(xch, dest_dom, buffio_pfn);
+
+    if ( io_pfn )
+        xc_clear_domain_page(xch, dest_dom, io_pfn);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_STORE_PFN,
+                     hvm_params[HVM_PARAM_STORE_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom,
+                     HVM_PARAM_CONSOLE_PFN,
+                     hvm_params[HVM_PARAM_CONSOLE_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_BUFIOREQ_PFN,
+                     hvm_params[HVM_PARAM_BUFIOREQ_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_IOREQ_PFN,
+                     hvm_params[HVM_PARAM_IOREQ_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_IDENT_PT,
+                     hvm_params[HVM_PARAM_IDENT_PT]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_PAGING_RING_PFN,
+                     hvm_params[HVM_PARAM_PAGING_RING_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_ACCESS_RING_PFN,
+                     hvm_params[HVM_PARAM_ACCESS_RING_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_VM86_TSS,
+                     hvm_params[HVM_PARAM_VM86_TSS]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_ACPI_IOPORTS_LOCATION,
+                     hvm_params[HVM_PARAM_ACPI_IOPORTS_LOCATION]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_VIRIDIAN,
+                     hvm_params[HVM_PARAM_VIRIDIAN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_PAE_ENABLED,
+                     hvm_params[HVM_PARAM_PAE_ENABLED]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_STORE_EVTCHN,
+                     hvm_params[HVM_PARAM_STORE_EVTCHN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_IOREQ_SERVER_PFN,
+                     hvm_params[HVM_PARAM_IOREQ_SERVER_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
+                     hvm_params[HVM_PARAM_NR_IOREQ_SERVER_PAGES]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_VM_GENERATION_ID_ADDR,
+                     hvm_params[HVM_PARAM_VM_GENERATION_ID_ADDR]);
+
+    if (xc_dom_gnttab_hvm_seed(xch, dest_dom, console_pfn, store_pfn,
+                               console_domid, store_domid))
+    {
+        PERROR("error seeding hvm grant table");
+        goto out;
+    }
+
+    rc = 0;
+out:
+    free(pfn_buf_old);
+    free(hvm_buf);
+
+    if ( (rc != 0) && (dest_dom != 0) )
+    {
+        PERROR("Failed to perform soft reset, destroying domain %d",
+               dest_dom);
+        xc_domain_destroy(xch, dest_dom);
+    }
+
+    return !!rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
index 40bbac8..770cd10 100644
--- a/tools/libxc/xenguest.h
+++ b/tools/libxc/xenguest.h
@@ -131,6 +131,26 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
  * of the new domain is automatically appended to the filename,
  * separated by a ".".
  */
+
+/**
+ * This function performs a soft reset of a domain. During soft reset all
+ * of the source domain's memory is reassigned to the destination domain,
+ * and the HVM context and HVM params are copied over.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm source_dom the id of the source domain
+ * @parm dest_dom the id of the destination domain
+ * @parm console_domid the id of the domain handling console
+ * @parm console_mfn returned with the mfn of the console page
+ * @parm store_domid the id of the domain handling store
+ * @parm store_mfn returned with the mfn of the store page
+ * @return 0 on success, -1 on failure
+ */
+int xc_domain_soft_reset(xc_interface *xch, uint32_t source_dom,
+                         uint32_t dest_dom, domid_t console_domid,
+                         unsigned long *console_mfn, domid_t store_domid,
+                         unsigned long *store_mfn);
+
 #define XC_DEVICE_MODEL_RESTORE_FILE "/var/lib/xen/qemu-resume"
 
 /**
-- 
1.9.3

* [PATCH RFCv3 7/8] libxl: soft reset support
From: Vitaly Kuznetsov @ 2014-10-07 13:10 UTC
  To: xen-devel
  Cc: Andrew Jones, Ian Campbell, Andrew Cooper, David Vrabel, Jan Beulich

Perform a soft reset when a domain has done SHUTDOWN_soft_reset: migrate the
memory content with xc_domain_soft_reset(), then reload the device model and
the toolstack state.
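
A hedged sketch of the new public entry point (d_config and old_domid are
assumed caller state; NULL async handles make the call synchronous):

    uint32_t new_domid;

    /* d_config is the configuration the old domain was booted with;
     * old_domid is the domain that performed SHUTDOWN_soft_reset */
    libxl_domain_soft_reset(ctx, &d_config, &new_domid, old_domid,
                            NULL /* ao_how */, NULL /* aop_console_how */);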

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/libxl/libxl.h          |   6 +++
 tools/libxl/libxl_create.c   | 103 +++++++++++++++++++++++++++++++++++++++----
 tools/libxl/libxl_internal.h |   4 ++
 tools/libxl/xl_cmdimpl.c     |  22 ++++++++-
 4 files changed, 124 insertions(+), 11 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index bc68cac..24b43c5 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -877,6 +877,12 @@ int static inline libxl_domain_create_restore_0x040200(
 
 #endif
 
+int libxl_domain_soft_reset(libxl_ctx *ctx, libxl_domain_config *d_config,
+                            uint32_t *domid, uint32_t domid_old,
+                            const libxl_asyncop_how *ao_how,
+                            const libxl_asyncprogress_how *aop_console_how)
+                            LIBXL_EXTERNAL_CALLERS_ONLY;
+
   /* A progress report will be made via ao_console_how, of type
    * domain_create_console_available, when the domain's primary
    * console is available and can be connected to.
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 8b82584..544d386 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -24,6 +24,8 @@
 #include <xenguest.h>
 #include <xen/hvm/hvm_info_table.h>
 
+#define INVALID_DOMID (~0U)
+
 int libxl__domain_create_info_setdefault(libxl__gc *gc,
                                          libxl_domain_create_info *c_info)
 {
@@ -885,6 +887,9 @@ static void initiate_domain_create(libxl__egc *egc,
     if (restore_fd >= 0) {
         LOG(DEBUG, "restoring, not running bootloader");
         domcreate_bootloader_done(egc, &dcs->bl, 0);
+    } else if (dcs->domid_soft_reset != INVALID_DOMID) {
+        LOG(DEBUG, "soft reset, not running bootloader");
+        domcreate_bootloader_done(egc, &dcs->bl, 0);
     } else  {
         LOG(DEBUG, "running bootloader");
         dcs->bl.callback = domcreate_bootloader_done;
@@ -933,6 +938,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     libxl_domain_config *const d_config = dcs->guest_config;
     libxl_domain_build_info *const info = &d_config->b_info;
     const int restore_fd = dcs->restore_fd;
+    const uint32_t domid_soft_reset = dcs->domid_soft_reset;
     libxl__domain_build_state *const state = &dcs->build_state;
     libxl__srm_restore_autogen_callbacks *const callbacks =
         &dcs->shs.callbacks.restore.a;
@@ -956,7 +962,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->dmss.dm.callback = domcreate_devmodel_started;
     dcs->dmss.callback = domcreate_devmodel_started;
 
-    if ( restore_fd < 0 ) {
+    if ( (restore_fd < 0) && (domid_soft_reset == INVALID_DOMID) ) {
         rc = libxl__domain_build(gc, d_config, domid, state);
         domcreate_rebuild_done(egc, dcs, rc);
         return;
@@ -986,14 +992,74 @@ static void domcreate_bootloader_done(libxl__egc *egc,
         rc = ERROR_INVAL;
         goto out;
     }
-    libxl__xc_domain_restore(egc, dcs,
-                             hvm, pae, superpages);
+    if ( restore_fd >= 0 ) {
+        libxl__xc_domain_restore(egc, dcs,
+                                 hvm, pae, superpages);
+    } else {
+        libxl__xc_domain_soft_reset(egc, dcs);
+    }
+
     return;
 
  out:
     libxl__xc_domain_restore_done(egc, dcs, rc, 0, 0);
 }
 
+void libxl__xc_domain_soft_reset(libxl__egc *egc,
+                                 libxl__domain_create_state *dcs)
+{
+    STATE_AO_GC(dcs->ao);
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+    const uint32_t domid_soft_reset = dcs->domid_soft_reset;
+    const uint32_t domid = dcs->guest_domid;
+    libxl_domain_config *const d_config = dcs->guest_config;
+    libxl_domain_build_info *const info = &d_config->b_info;
+    uint8_t *buf;
+    uint32_t len;
+    uint32_t console_domid, store_domid;
+    unsigned long store_mfn, console_mfn;
+    int rc;
+    struct libxl__domain_suspend_state *dss;
+
+    GCNEW(dss);
+
+    dss->ao = ao;
+    dss->domid = domid_soft_reset;
+    dss->dm_savefile = GCSPRINTF("/var/lib/xen/qemu-save.%d",
+                                 domid_soft_reset);
+
+    if (info->type == LIBXL_DOMAIN_TYPE_HVM) {
+        rc = libxl__domain_suspend_device_model(gc, dss);
+        if (rc) goto out;
+    }
+
+    console_domid = dcs->build_state.console_domid;
+    store_domid = dcs->build_state.store_domid;
+
+    libxl__domain_soft_reset_destroy_old(ctx, domid_soft_reset, 0);
+
+    rc = xc_domain_soft_reset(ctx->xch, domid_soft_reset, domid, console_domid,
+                              &console_mfn, store_domid, &store_mfn);
+    if (rc) goto out;
+
+    libxl__qmp_cleanup(gc, domid_soft_reset);
+
+    dcs->build_state.store_mfn = store_mfn;
+    dcs->build_state.console_mfn = console_mfn;
+
+    rc = libxl__toolstack_save(domid_soft_reset, &buf, &len, dss);
+    if (rc) goto out;
+
+    rc = libxl__toolstack_restore(domid, buf, len, &dcs->shs);
+    if (rc) goto out;
+out:
+    /*
+     * Now pretend we did normal restore and simply call
+     * libxl__xc_domain_restore_done().
+     */
+    libxl__xc_domain_restore_done(egc, dcs, rc, 0, 0);
+}
+
 void libxl__srm_callout_callback_restore_results(unsigned long store_mfn,
           unsigned long console_mfn, void *user)
 {
@@ -1019,6 +1085,7 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
 
     /* convenience aliases */
     const uint32_t domid = dcs->guest_domid;
+    const uint32_t domid_soft_reset = dcs->domid_soft_reset;
     libxl_domain_config *const d_config = dcs->guest_config;
     libxl_domain_build_info *const info = &d_config->b_info;
     libxl__domain_build_state *const state = &dcs->build_state;
@@ -1071,9 +1138,12 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
     if (ret)
         goto out;
 
-    if (info->type == LIBXL_DOMAIN_TYPE_HVM) {
+    if (info->type == LIBXL_DOMAIN_TYPE_HVM && fd != -1) {
         state->saved_state = GCSPRINTF(
                        XC_DEVICE_MODEL_RESTORE_FILE".%d", domid);
+    } else if (domid_soft_reset != INVALID_DOMID) {
+        state->saved_state = GCSPRINTF(
+                       "/var/lib/xen/qemu-save.%d", domid_soft_reset);
     }
 
 out:
@@ -1082,9 +1152,12 @@ out:
         libxl__file_reference_unmap(&state->pv_ramdisk);
     }
 
-    esave = errno;
-    libxl_fd_set_nonblock(ctx, fd, 0);
-    errno = esave;
+    if ( fd != -1 ) {
+        esave = errno;
+        libxl_fd_set_nonblock(ctx, fd, 0);
+        errno = esave;
+    }
+
     domcreate_rebuild_done(egc, dcs, ret);
 }
 
@@ -1463,6 +1536,7 @@ static void domain_create_cb(libxl__egc *egc,
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
                             uint32_t *domid,
                             int restore_fd, int checkpointed_stream,
+                            uint32_t domid_old,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
@@ -1475,6 +1549,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
     libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
     cdcs->dcs.restore_fd = restore_fd;
+    cdcs->dcs.domid_soft_reset = domid_old;
     cdcs->dcs.callback = domain_create_cb;
     cdcs->dcs.checkpointed_stream = checkpointed_stream;
     libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how);
@@ -1503,7 +1578,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
-    return do_domain_create(ctx, d_config, domid, -1, 0,
+    return do_domain_create(ctx, d_config, domid, -1, 0, INVALID_DOMID,
                             ao_how, aop_console_how);
 }
 
@@ -1514,7 +1589,17 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 const libxl_asyncprogress_how *aop_console_how)
 {
     return do_domain_create(ctx, d_config, domid, restore_fd,
-                            params->checkpointed_stream, ao_how, aop_console_how);
+                            params->checkpointed_stream, INVALID_DOMID,
+                            ao_how, aop_console_how);
+}
+
+int libxl_domain_soft_reset(libxl_ctx *ctx, libxl_domain_config *d_config,
+                            uint32_t *domid, uint32_t domid_old,
+                            const libxl_asyncop_how *ao_how,
+                            const libxl_asyncprogress_how *aop_console_how)
+{
+    return do_domain_create(ctx, d_config, domid, -1, 0, domid_old,
+                            ao_how, aop_console_how);
 }
 
 /*
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index b43f2e3..14b38f5 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2862,6 +2862,7 @@ struct libxl__domain_create_state {
     libxl_domain_config *guest_config;
     libxl_domain_config guest_config_saved; /* vanilla config */
     int restore_fd;
+    uint32_t domid_soft_reset;
     libxl__domain_create_cb *callback;
     libxl_asyncprogress_how aop_console_how;
     /* private to domain_create */
@@ -2916,6 +2917,9 @@ _hidden void libxl__xc_domain_restore(libxl__egc *egc,
  * If rc!=0, retval and errnoval are undefined. */
 _hidden void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
                                            int rc, int retval, int errnoval);
+/* calls libxl__xc_domain_restore_done when done */
+_hidden void libxl__xc_domain_soft_reset(libxl__egc *egc,
+                                         libxl__domain_create_state *dcs);
 
 /* Each time the dm needs to be saved, we must call suspend and then save */
 _hidden int libxl__domain_suspend_device_model(libxl__gc *gc,
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index b40ad50..4307edf 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1818,7 +1818,8 @@ static void reload_domain_config(uint32_t domid,
 }
 
 /* Returns 1 if domain should be restarted,
- * 2 if domain should be renamed then restarted, or 0
+ * 2 if domain should be renamed then restarted,
+ * 3 if domain performed soft reset, or 0
  * Can update r_domid if domain is destroyed etc */
 static int handle_domain_death(uint32_t *r_domid,
                                libxl_event *event,
@@ -1844,6 +1845,9 @@ static int handle_domain_death(uint32_t *r_domid,
     case LIBXL_SHUTDOWN_REASON_WATCHDOG:
         action = d_config->on_watchdog;
         break;
+    case LIBXL_SHUTDOWN_REASON_SOFT_RESET:
+        LOG("Domain performed soft reset.");
+        return 3;
     default:
         LOG("Unknown shutdown reason code %d. Destroying domain.",
             event->u.domain_shutdown.shutdown_reason);
@@ -2067,6 +2071,7 @@ static void evdisable_disk_ejects(libxl_evgen_disk_eject **diskws,
 static uint32_t create_domain(struct domain_create *dom_info)
 {
     uint32_t domid = INVALID_DOMID;
+    uint32_t domid_old = INVALID_DOMID;
 
     libxl_domain_config d_config;
 
@@ -2292,7 +2297,18 @@ start:
          * restore/migrate-receive it again.
          */
         restoring = 0;
-    }else{
+    } else if (domid_old != INVALID_DOMID) {
+        /* Do soft reset */
+        d_config.b_info.nodemap.size = 0;
+        ret = libxl_domain_soft_reset(ctx, &d_config,
+                                      &domid, domid_old,
+                                      0, 0);
+
+        if ( ret ) {
+            goto error_out;
+        }
+        domid_old = INVALID_DOMID;
+    } else {
         ret = libxl_domain_create_new(ctx, &d_config, &domid,
                                       0, autoconnect_console_how);
     }
@@ -2356,6 +2372,8 @@ start:
                 event->u.domain_shutdown.shutdown_reason,
                 event->u.domain_shutdown.shutdown_reason);
             switch (handle_domain_death(&domid, event, &d_config)) {
+            case 3:
+                domid_old = domid;
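+                /* fall through */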
             case 2:
                 if (!preserve_domain(&domid, event, &d_config)) {
                     /* If we fail then exit leaving the old domain in place. */
-- 
1.9.3

* [PATCH RFCv3 8/8] xsm: add XEN_DOMCTL_set_recipient support
From: Vitaly Kuznetsov @ 2014-10-07 13:10 UTC
  To: xen-devel
  Cc: Andrew Jones, Ian Campbell, Andrew Cooper, David Vrabel, Jan Beulich
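
For context, a hedged example of a FLASK policy fragment granting the new
permissions (the dom0_t/domU_t type names are assumptions taken from a
typical policy):

    allow dom0_t domU_t:domain2 { set_as_source set_as_recipient };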

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 xen/common/domctl.c                 | 14 +++++++++++---
 xen/include/xsm/dummy.h             |  6 ++++++
 xen/include/xsm/xsm.h               |  6 ++++++
 xen/xsm/dummy.c                     |  1 +
 xen/xsm/flask/hooks.c               | 17 +++++++++++++++++
 xen/xsm/flask/policy/access_vectors | 10 ++++++++++
 6 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 38cd10a..b456ff2 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -1158,6 +1158,10 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
 
         if ( op->u.recipient.recipient == DOMID_INVALID )
         {
+            ret = xsm_set_recipient(XSM_HOOK, d, NULL);
+            if ( ret )
+                break;
+
             if ( d->recipient )
             {
                 put_domain(d->recipient);
@@ -1172,10 +1176,14 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
             ret = -ESRCH;
             break;
         }
-        else
-        {
-            d->recipient = recipient_dom;
+
+        ret = xsm_set_recipient(XSM_HOOK, d, recipient_dom);
+        if ( ret )
+        {
+            put_domain(recipient_dom);
+            break;
         }
+
+        d->recipient = recipient_dom;
     }
     break;
 
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index df55e70..c037b78 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -113,6 +113,12 @@ static XSM_INLINE int xsm_set_target(XSM_DEFAULT_ARG struct domain *d, struct do
     return xsm_default_action(action, current->domain, NULL);
 }
 
+static XSM_INLINE int xsm_set_recipient(XSM_DEFAULT_ARG struct domain *d, struct domain *e)
+{
+    XSM_ASSERT_ACTION(XSM_HOOK);
+    return xsm_default_action(action, current->domain, NULL);
+}
+
 static XSM_INLINE int xsm_domctl(XSM_DEFAULT_ARG struct domain *d, int cmd)
 {
     XSM_ASSERT_ACTION(XSM_OTHER);
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 6c1c079..818654d 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -58,6 +58,7 @@ struct xsm_operations {
     int (*domctl_scheduler_op) (struct domain *d, int op);
     int (*sysctl_scheduler_op) (int op);
     int (*set_target) (struct domain *d, struct domain *e);
+    int (*set_recipient) (struct domain *d, struct domain *e);
     int (*domctl) (struct domain *d, int cmd);
     int (*sysctl) (int cmd);
     int (*readconsole) (uint32_t clear);
@@ -210,6 +211,11 @@ static inline int xsm_set_target (xsm_default_t def, struct domain *d, struct do
     return xsm_ops->set_target(d, e);
 }
 
+static inline int xsm_set_recipient (xsm_default_t def, struct domain *d, struct domain *r)
+{
+    return xsm_ops->set_recipient(d, r);
+}
+
 static inline int xsm_domctl (xsm_default_t def, struct domain *d, int cmd)
 {
     return xsm_ops->domctl(d, cmd);
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 0826a8b..e38d9c6 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -35,6 +35,7 @@ void xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, domctl_scheduler_op);
     set_to_dummy_if_null(ops, sysctl_scheduler_op);
     set_to_dummy_if_null(ops, set_target);
+    set_to_dummy_if_null(ops, set_recipient);
     set_to_dummy_if_null(ops, domctl);
     set_to_dummy_if_null(ops, sysctl);
     set_to_dummy_if_null(ops, readconsole);
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index df05566..ae16795 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -565,6 +565,21 @@ static int flask_set_target(struct domain *d, struct domain *t)
     return rc;
 }
 
+static int flask_set_recipient(struct domain *d, struct domain *r)
+{
+    int rc;
+
+    rc = current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__SET_AS_SOURCE);
+    if ( rc )
+        return rc;
+
+    /* r == NULL means the recipient is being reset. */
+    if ( r )
+        rc = current_has_perm(r, SECCLASS_DOMAIN2, DOMAIN2__SET_AS_RECIPIENT);
+
+    return rc;
+}
+
 static int flask_domctl(struct domain *d, int cmd)
 {
     switch ( cmd )
@@ -577,6 +592,7 @@ static int flask_domctl(struct domain *d, int cmd)
     case XEN_DOMCTL_iomem_permission:
     case XEN_DOMCTL_memory_mapping:
     case XEN_DOMCTL_set_target:
+    case XEN_DOMCTL_set_recipient:
 #ifdef CONFIG_X86
     /* These have individual XSM hooks (arch/x86/domctl.c) */
     case XEN_DOMCTL_shadow_op:
@@ -1493,6 +1509,7 @@ static struct xsm_operations flask_ops = {
     .domctl_scheduler_op = flask_domctl_scheduler_op,
     .sysctl_scheduler_op = flask_sysctl_scheduler_op,
     .set_target = flask_set_target,
+    .set_recipient = flask_set_recipient,
     .domctl = flask_domctl,
     .sysctl = flask_sysctl,
     .readconsole = flask_readconsole,
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index d279841..5604281 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -132,6 +132,8 @@ class domain
 #  target = the new target domain
 # see also the domain2 make_priv_for and set_as_target checks
     set_target
+# XEN_DOMCTL_set_recipient
+    set_recipient
 # SCHEDOP_remote_shutdown
     shutdown
 # XEN_DOMCTL_set{,_machine}_address_size
@@ -184,6 +186,14 @@ class domain2
 #  source = the domain making the hypercall
 #  target = the new target domain
     set_as_target
+# checked in XEN_DOMCTL_set_recipient:
+#  source = the domain making the hypercall
+#  target = the new source domain
+    set_as_source
+# checked in XEN_DOMCTL_set_recipient:
+#  source = the domain making the hypercall
+#  target = the new recipient domain
+    set_as_recipient
 # XEN_DOMCTL_set_cpuid
     set_cpuid
 # XEN_DOMCTL_gettscinfo
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH RFCv3 0/8] toolstack-based approach to pvhvm guest kexec
  2014-10-07 13:10 [PATCH RFCv3 0/8] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (7 preceding siblings ...)
  2014-10-07 13:10 ` [PATCH RFCv3 8/8] xsm: add XEN_DOMCTL_set_recipient support Vitaly Kuznetsov
@ 2014-10-07 13:28 ` David Vrabel
  2014-11-03 13:21 ` Vitaly Kuznetsov
  9 siblings, 0 replies; 17+ messages in thread
From: David Vrabel @ 2014-10-07 13:28 UTC (permalink / raw)
  To: Vitaly Kuznetsov, xen-devel
  Cc: Andrew Cooper, Andrew Jones, Ian Campbell, Jan Beulich

On 07/10/14 14:10, Vitaly Kuznetsov wrote:
> Changes from RFC/WIPv2:
> 
> Here is a slightly different approach to memory reassignment. Instead of
> introducing new (and very doubtful) XENMEM_transfer operation introduce
> simple XEN_DOMCTL_set_recipient operation and do everything in free_domheap_pages()
> handler utilizing normal domain destroy path. This is better because:
> - The approach is general-enough
> - All memory pages are usually being freed when the domain is destroyed
> - No special grants handling required
> - Better supportability

I like this idea, but this really isn't my area of expertise so you'll
have to wait to see what Jan and Tim say.

> With regards to PV:
> Though XEN_DOMCTL_set_recipient works for both PV and HVM this patchset does not
> bring PV kexec/kdump support. xc_domain_soft_reset() is limited to work with HVM
> domains only. The main reason for that is: it is (in theory) possible to save p2m
> and rebuild them with the new domain but that would only allow us to resume execution
> from where we stopped. If we want to execute new kernel we need to build the same
> kernel/initrd/bootstrap_pagetables/... structure we build to boot PV domain initially.
> That however would destroy the original domain's memory thus making kdump impossible.
> To make everything work additional support from kexec userspace/linux kernel is
> required and I'm not sure it makes sense to implement all this stuff in the light of
> PVH.

I'm fine with limiting this to HVM guests.

David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFCv3 0/8] toolstack-based approach to pvhvm guest kexec
  2014-10-07 13:10 [PATCH RFCv3 0/8] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (8 preceding siblings ...)
  2014-10-07 13:28 ` [PATCH RFCv3 0/8] toolstack-based approach to pvhvm guest kexec David Vrabel
@ 2014-11-03 13:21 ` Vitaly Kuznetsov
  2014-11-03 13:32   ` Jan Beulich
  9 siblings, 1 reply; 17+ messages in thread
From: Vitaly Kuznetsov @ 2014-11-03 13:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Ian Campbell, Andrew Cooper, Tim Deegan,
	David Vrabel, Jan Beulich, Ian Jackson

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> Changes from RFC/WIPv2:
>
> Here is a slightly different approach to memory reassignment. Instead of
> introducing new (and very doubtful) XENMEM_transfer operation introduce
> simple XEN_DOMCTL_set_recipient operation and do everything in free_domheap_pages()
> handler utilizing normal domain destroy path. This is better because:
> - The approach is general-enough
> - All memory pages are usually being freed when the domain is destroyed
> - No special grants handling required
> - Better supportability
>
> With regards to PV:
> Though XEN_DOMCTL_set_recipient works for both PV and HVM this patchset does not
> bring PV kexec/kdump support. xc_domain_soft_reset() is limited to work with HVM
> domains only. The main reason for that is: it is (in theory) possible to save p2m
> and rebuild them with the new domain but that would only allow us to resume execution
> from where we stopped. If we want to execute new kernel we need to build the same
> kernel/initrd/bootstrap_pagetables/... structure we build to boot PV domain initially.
> That however would destroy the original domain's memory thus making kdump impossible.
> To make everything work additional support from kexec userspace/linux kernel is
> required and I'm not sure it makes sense to implement all this stuff in the light of
> PVH.

I don't want to create noise, but I didn't get many comments on this RFC
(the only reply was David's). If there are no comments at the RFC
(design) level, I can rebase and send the first non-RFC version.

I understand the 4.5 release is the main focus at the moment, but I
would greatly appreciate some feedback.

Thanks,

>
> Original description:
>
> When a PVHVM linux guest performs kexec there are lots of things which
> require taking care of:
> - shared info, vcpu_info
> - grants
> - event channels
> - ...
> Instead of taking care of all these things we can rebuild the domain
> performing kexec from scratch doing so-called soft-reboot.
>
> The idea was suggested by David Vrabel, Jan Beulich, and Konrad Rzeszutek Wilk.
>
> P.S. The patch series can be tested with PVHVM Linux guest with the following
> modifications:
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index c0cb11f..33c5cdd 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -33,6 +33,10 @@
>  #include <linux/memblock.h>
>  #include <linux/edd.h>
>
> +#ifdef CONFIG_KEXEC
> +#include <linux/kexec.h>
> +#endif
> +
>  #include <xen/xen.h>
>  #include <xen/events.h>
>  #include <xen/interface/xen.h>
> @@ -1810,6 +1814,22 @@ static struct notifier_block xen_hvm_cpu_notifier = {
>    .notifier_call   = xen_hvm_cpu_notify,
>  };
>
> +#ifdef CONFIG_KEXEC
> +static void xen_pvhvm_kexec_shutdown(void)
> +{
> +	native_machine_shutdown();
> +	if (kexec_in_progress) {
> +	   xen_reboot(SHUTDOWN_soft_reset);
> +	   }
> +}
> +
> +static void xen_pvhvm_crash_shutdown(struct pt_regs *regs)
> +{
> +	native_machine_crash_shutdown(regs);
> +	xen_reboot(SHUTDOWN_soft_reset);
> +}
> +#endif
> +
>  static void __init xen_hvm_guest_init(void)
>  {
> 	init_hvm_pv_info();
> @@ -1826,6 +1846,10 @@ static void __init xen_hvm_guest_init(void)
>    x86_init.irqs.intr_init = xen_init_IRQ;
>    xen_hvm_init_time_ops();
>    xen_hvm_init_mmu_ops();
> +#ifdef CONFIG_KEXEC
> +	machine_ops.shutdown = xen_pvhvm_kexec_shutdown;
> +	machine_ops.crash_shutdown = xen_pvhvm_crash_shutdown;
> +#endif
>  }
>
>  static bool xen_nopv = false;
> diff --git a/include/xen/interface/sched.h b/include/xen/interface/sched.h
> index 9ce0839..b5942a8 100644
> --- a/include/xen/interface/sched.h
> +++ b/include/xen/interface/sched.h
> @@ -107,5 +107,6 @@ struct sched_watchdog {
>  #define SHUTDOWN_suspend    2  /* Clean up, save suspend info, kill.         */
>  #define SHUTDOWN_crash      3  /* Tell controller we've crashed.             */
>  #define SHUTDOWN_watchdog   4  /* Restart because watchdog time expired.     */
> +#define SHUTDOWN_soft_reset 5  /* Soft-reset for kexec.                      */
>
>  #endif /* __XEN_PUBLIC_SCHED_H__ */
>
> Vitaly Kuznetsov (8):
>   xen: introduce SHUTDOWN_soft_reset shutdown reason
>   libxl: support SHUTDOWN_soft_reset shutdown reason
>   xen: introduce XEN_DOMCTL_set_recipient
>   libxc: support XEN_DOMCTL_set_recipient
>   libxl: add libxl__domain_soft_reset_destroy_old()
>   libxc: introduce soft reset for HVM domains
>   libxl: soft reset support
>   xsm: add XEN_DOMCTL_set_recipient support
>
>  tools/libxc/Makefile                |   1 +
>  tools/libxc/xc_domain.c             |  10 +
>  tools/libxc/xc_domain_soft_reset.c  | 394 ++++++++++++++++++++++++++++++++++++
>  tools/libxc/xenctrl.h               |  14 ++
>  tools/libxc/xenguest.h              |  20 ++
>  tools/libxl/libxl.c                 |  32 ++-
>  tools/libxl/libxl.h                 |   6 +
>  tools/libxl/libxl_create.c          | 103 +++++++++-
>  tools/libxl/libxl_internal.h        |   8 +
>  tools/libxl/libxl_types.idl         |   1 +
>  tools/libxl/xl_cmdimpl.c            |  24 ++-
>  tools/python/xen/lowlevel/xl/xl.c   |   1 +
>  xen/common/domain.c                 |   3 +
>  xen/common/domctl.c                 |  35 ++++
>  xen/common/page_alloc.c             |  26 ++-
>  xen/common/shutdown.c               |   7 +
>  xen/include/public/domctl.h         |  19 ++
>  xen/include/public/sched.h          |   3 +-
>  xen/include/xen/sched.h             |   1 +
>  xen/include/xsm/dummy.h             |   6 +
>  xen/include/xsm/xsm.h               |   6 +
>  xen/xsm/dummy.c                     |   1 +
>  xen/xsm/flask/hooks.c               |  17 ++
>  xen/xsm/flask/policy/access_vectors |  10 +
>  24 files changed, 726 insertions(+), 22 deletions(-)
>  create mode 100644 tools/libxc/xc_domain_soft_reset.c

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFCv3 0/8] toolstack-based approach to pvhvm guest kexec
  2014-11-03 13:21 ` Vitaly Kuznetsov
@ 2014-11-03 13:32   ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2014-11-03 13:32 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, Ian Campbell, Andrew Cooper, Ian Jackson,
	Tim Deegan, David Vrabel, xen-devel

>>> On 03.11.14 at 14:21, <vkuznets@redhat.com> wrote:
> I don't want to create noise, but I didn't get many comments on this RFC
> (the only reply was David's). If there are no comments at the RFC
> (design) level, I can rebase and send the first non-RFC version.

Afaic, I will likely have time to look at the hypervisor parts of this
only when 4.5 has (at least mostly) settled.

Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFCv3 3/8] xen: introduce XEN_DOMCTL_set_recipient
  2014-10-07 13:10 ` [PATCH RFCv3 3/8] xen: introduce XEN_DOMCTL_set_recipient Vitaly Kuznetsov
@ 2014-11-25 14:51   ` Jan Beulich
  2014-11-25 15:41     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2014-11-25 14:51 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, Ian Campbell, Andrew Cooper, David Vrabel, xen-devel

>>> On 07.10.14 at 15:10, <vkuznets@redhat.com> wrote:
> New operation sets the 'recipient' domain which will receive all
> memory pages from a particular domain when these pages are freed.

> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -1152,6 +1152,33 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>      }
>      break;
>  
> +    case XEN_DOMCTL_set_recipient:
> +    {
> +        struct domain *recipient_dom;
> +
> +        if ( op->u.recipient.recipient == DOMID_INVALID )
> +        {
> +            if ( d->recipient )
> +            {
> +                put_domain(d->recipient);
> +            }

Please drop pointless braces like these and ...

> +            d->recipient = NULL;
> +            break;
> +        }
> +
> +        recipient_dom = get_domain_by_id(op->u.recipient.recipient);
> +        if ( recipient_dom == NULL )
> +        {
> +            ret = -ESRCH;
> +            break;
> +        }
> +        else
> +        {
> +            d->recipient = recipient_dom;
> +        }

... the pointless else-s/break-s like here.

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1707,6 +1707,7 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
>  {
>      struct domain *d = page_get_owner(pg);
>      unsigned int i;
> +    unsigned long mfn, gmfn;
>      bool_t drop_dom_ref;
>  
>      ASSERT(!in_irq());
> @@ -1764,11 +1765,28 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
>              scrub = 1;
>          }
>  
> -        if ( unlikely(scrub) )
> -            for ( i = 0; i < (1 << order); i++ )
> -                scrub_one_page(&pg[i]);
> +        if ( !d || !d->recipient || d->recipient->is_dying )
> +        {
> +            if ( unlikely(scrub) )
> +                for ( i = 0; i < (1 << order); i++ )
> +                    scrub_one_page(&pg[i]);
>  
> -        free_heap_pages(pg, order);
> +            free_heap_pages(pg, order);
> +        }
> +        else
> +        {
> +            mfn = page_to_mfn(pg);
> +            gmfn = mfn_to_gmfn(d, mfn);
> +
> +            page_set_owner(pg, NULL);
> +            assign_pages(d->recipient, pg, order, 0);

This can fail.

> +
> +            if ( guest_physmap_add_page(d->recipient, gmfn, mfn, order) )
> +            {
> +                gdprintk(XENLOG_INFO, "Failed to add page to domain's physmap"
> +                         " mfn: %lx\n", mfn);

The current domain/vCPU don't matter here afaict, hence no need
for gdprintk(). The source and/or recipient domain do matter though,
hence printing either or both would seem desirable. The gMFN may
be relevant for diagnostic purposes too. Hence e.g.

                printk(XENLOG_G_INFO "Failed to add MFN %lx (GFN %lx) to Dom%d's physmap\n",
                       mfn, gmfn, d->recipient->domain_id);

What's worse though is that you don't guard against
XEN_DOMCTL_set_recipient zapping d->recipient. If that really is
necessary to be supported, some synchronization will be needed.

And finally - what's the intended protocol for the tool stack to
know that all pages got transferred (and hence the new domain
can be launched)?

> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -965,6 +965,23 @@ struct xen_domctl_vnuma {
>  typedef struct xen_domctl_vnuma xen_domctl_vnuma_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_vnuma_t);
>  
> +/*
> + * XEN_DOMCTL_set_recipient - sets the 'recipient' domain which will receive
> + * all the domain's memory after its death or when any memory page from
> + * domain's heap is being freed.

So this specifically allows for this to happen on a non-dying domain.
Why, i.e. what's the intended usage scenario? If not really needed,
please remove this and verify in the handling that the source domain
is dying.

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -366,6 +366,7 @@ struct domain
>      bool_t           is_privileged;
>      /* Which guest this guest has privileges on */
>      struct domain   *target;
> +    struct domain   *recipient;

Please add a brief comment similar to the one for "target".

Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFCv3 3/8] xen: introduce XEN_DOMCTL_set_recipient
  2014-11-25 14:51   ` Jan Beulich
@ 2014-11-25 15:41     ` Vitaly Kuznetsov
  2014-11-25 16:32       ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Vitaly Kuznetsov @ 2014-11-25 15:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Jones, Ian Campbell, Andrew Cooper, David Vrabel, xen-devel

"Jan Beulich" <JBeulich@suse.com> writes:

Thanks for the review!

>>>> On 07.10.14 at 15:10, <vkuznets@redhat.com> wrote:
>> New operation sets the 'recipient' domain which will receive all
>> memory pages from a particular domain when these pages are freed.
>
>> --- a/xen/common/domctl.c
>> +++ b/xen/common/domctl.c
>> @@ -1152,6 +1152,33 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>>      }
>>      break;
>>  
>> +    case XEN_DOMCTL_set_recipient:
>> +    {
>> +        struct domain *recipient_dom;
>> +
>> +        if ( op->u.recipient.recipient == DOMID_INVALID )
>> +        {
>> +            if ( d->recipient )
>> +            {
>> +                put_domain(d->recipient);
>> +            }
>
> Please drop pointless braces like these and ...
>
>> +            d->recipient = NULL;
>> +            break;
>> +        }
>> +
>> +        recipient_dom = get_domain_by_id(op->u.recipient.recipient);
>> +        if ( recipient_dom == NULL )
>> +        {
>> +            ret = -ESRCH;
>> +            break;
>> +        }
>> +        else
>> +        {
>> +            d->recipient = recipient_dom;
>> +        }
>
> ... the pointless else-s/break-s like here.
>
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -1707,6 +1707,7 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
>>  {
>>      struct domain *d = page_get_owner(pg);
>>      unsigned int i;
>> +    unsigned long mfn, gmfn;
>>      bool_t drop_dom_ref;
>>  
>>      ASSERT(!in_irq());
>> @@ -1764,11 +1765,28 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
>>              scrub = 1;
>>          }
>>  
>> -        if ( unlikely(scrub) )
>> -            for ( i = 0; i < (1 << order); i++ )
>> -                scrub_one_page(&pg[i]);
>> +        if ( !d || !d->recipient || d->recipient->is_dying )
>> +        {
>> +            if ( unlikely(scrub) )
>> +                for ( i = 0; i < (1 << order); i++ )
>> +                    scrub_one_page(&pg[i]);
>>  
>> -        free_heap_pages(pg, order);
>> +            free_heap_pages(pg, order);
>> +        }
>> +        else
>> +        {
>> +            mfn = page_to_mfn(pg);
>> +            gmfn = mfn_to_gmfn(d, mfn);
>> +
>> +            page_set_owner(pg, NULL);
>> +            assign_pages(d->recipient, pg, order, 0);
>
> This can fail.

... if the recipient is dying or we're over-allocating. This shouldn't
happen (if the toolstack does its job right), but I'll add a check.
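
Something along these lines, i.e. falling back to the normal
scrub-and-free path when assign_pages() fails (untested sketch of the
relevant free_domheap_pages() hunk, with your printk() suggestion
folded in):

            mfn = page_to_mfn(pg);
            gmfn = mfn_to_gmfn(d, mfn);

            page_set_owner(pg, NULL);
            if ( assign_pages(d->recipient, pg, order, 0) )
            {
                /* The recipient can't take the pages (dying or over its
                 * allocation) -- scrub and free them as if no recipient
                 * had been set. */
                if ( unlikely(scrub) )
                    for ( i = 0; i < (1 << order); i++ )
                        scrub_one_page(&pg[i]);
                free_heap_pages(pg, order);
            }
            else if ( guest_physmap_add_page(d->recipient, gmfn, mfn, order) )
                printk(XENLOG_G_INFO
                       "Failed to add MFN %lx (GFN %lx) to Dom%d's physmap\n",
                       mfn, gmfn, d->recipient->domain_id);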

>
>> +
>> +            if ( guest_physmap_add_page(d->recipient, gmfn, mfn, order) )
>> +            {
>> +                gdprintk(XENLOG_INFO, "Failed to add page to domain's physmap"
>> +                         " mfn: %lx\n", mfn);
>
> The current domain/vCPU don't matter here afaict, hence no need
> for gdprintk(). The source and/or recipient domain do matter though,
> hence printing either or both would seem desirable. The gMFN may
> be relevant for diagnostic purposes too. Hence e.g.
>
>                 printk(XENLOG_G_INFO "Failed to add MFN %lx (GFN %lx) to Dom%d's physmap\n",
>                        mfn, gmfn, d->recipient->domain_id);
>
> What's worse though is that you don't guard against
> XEN_DOMCTL_set_recipient zapping d->recipient. If that really is
> necessary to be supported, some synchronization will be needed.
>
> And finally - what's the intended protocol for the tool stack to
> know that all pages got transferred (and hence the new domain
> can be launched)?
>

(hope this answers both questions above)

Now the protocol is:
1) Toolstack queries for all page frames (with xc_get_pfn_type_batch())
for the old domain.
2) Toolstack issues XEN_DOMCTL_set_recipient with the recipient domain.
3) Toolstack kills the source domain.
4) Toolstack waits for a while (a loop keeps checking while new pages
are arriving, plus some timeout).
5) Toolstack issues XEN_DOMCTL_set_recipient with DOMID_INVALID,
resetting the recipient.
6) Toolstack checks if all pages were transferred.
7) If some pages are missing (e.g. the source domain became a zombie
holding something), request new empty pages instead.
8) Toolstack starts the new domain.

So we don't actually need XEN_DOMCTL_set_recipient to switch from one
recipient to the other; we just need a way to reset it.
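
For illustration, steps 2-5 could look roughly like this at the libxc
level (untested sketch; xc_domain_set_recipient() is the wrapper patch
4/8 adds -- I'm assuming its signature here -- and the timeout policy
is made up):

#include <unistd.h>
#include <xenctrl.h>

/* Sketch of steps 2-5 above; error handling trimmed for brevity. */
static int transfer_pages(xc_interface *xch, uint32_t source, uint32_t dest)
{
    xc_dominfo_t info;
    int i;

    /* Step 2: direct the source's freed pages at the new domain. */
    if ( xc_domain_set_recipient(xch, source, dest) )
        return -1;

    /* Step 3: kill the source; its pages now flow to 'dest'. */
    if ( xc_domain_destroy(xch, source) )
        return -1;

    /* Step 4: wait (with an arbitrary timeout) for the pages to drain.
     * xc_domain_getinfo() returns the next existing domain, so check
     * the domid to detect that the source is gone altogether. */
    for ( i = 0; i < 60; i++ )
    {
        if ( xc_domain_getinfo(xch, source, 1, &info) != 1 ||
             info.domid != source || info.nr_pages == 0 )
            break;
        sleep(1);
    }

    /* Step 5: reset the recipient before starting the new domain. */
    return xc_domain_set_recipient(xch, source, DOMID_INVALID);
}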

>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -965,6 +965,23 @@ struct xen_domctl_vnuma {
>>  typedef struct xen_domctl_vnuma xen_domctl_vnuma_t;
>>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_vnuma_t);
>>  
>> +/*
>> + * XEN_DOMCTL_set_recipient - sets the 'recipient' domain which will receive
>> + * all the domain's memory after its death or when any memory page from
>> + * domain's heap is being freed.
>
> So this specifically allows for this to happen on a non-dying domain.
> Why, i.e. what's the intended usage scenario? If not really needed,
> please remove this and verify in the handling that the source domain
> is dying.

Sorry if I didn't get this comment right. We need a way to tell which
domain will receive memory, and XEN_DOMCTL_set_recipient sets (and
resets) this target domain. We call it from the toolstack before we
start killing the old domain, so it is not dying yet. We can't do it
hypervisor-side when our source domain exits doing SHUTDOWN_soft_reset,
as there is no recipient domain created yet.

From a general (hypervisor) point of view we don't actually care if the
domain is dying or not. We just want to receive all freed pages from
this domain (so they go to some other domain instead of being trashed).

>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -366,6 +366,7 @@ struct domain
>>      bool_t           is_privileged;
>>      /* Which guest this guest has privileges on */
>>      struct domain   *target;
>> +    struct domain   *recipient;
>
> Please add a brief comment similar to the one for "target".
>

Thanks again,

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFCv3 3/8] xen: introduce XEN_DOMCTL_set_recipient
  2014-11-25 15:41     ` Vitaly Kuznetsov
@ 2014-11-25 16:32       ` Jan Beulich
  2014-11-25 17:10         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2014-11-25 16:32 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, Ian Campbell, Andrew Cooper, David Vrabel, xen-devel

>>> On 25.11.14 at 16:41, <vkuznets@redhat.com> wrote:
> "Jan Beulich" <JBeulich@suse.com> writes:
>>>>> On 07.10.14 at 15:10, <vkuznets@redhat.com> wrote:
>>> @@ -1764,11 +1765,28 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
>>>              scrub = 1;
>>>          }
>>>  
>>> -        if ( unlikely(scrub) )
>>> -            for ( i = 0; i < (1 << order); i++ )
>>> -                scrub_one_page(&pg[i]);
>>> +        if ( !d || !d->recipient || d->recipient->is_dying )
>>> +        {
>>> +            if ( unlikely(scrub) )
>>> +                for ( i = 0; i < (1 << order); i++ )
>>> +                    scrub_one_page(&pg[i]);
>>>  
>>> -        free_heap_pages(pg, order);
>>> +            free_heap_pages(pg, order);
>>> +        }
>>> +        else
>>> +        {
>>> +            mfn = page_to_mfn(pg);
>>> +            gmfn = mfn_to_gmfn(d, mfn);
>>> +
>>> +            page_set_owner(pg, NULL);
>>> +            assign_pages(d->recipient, pg, order, 0);
>>
>> This can fail.
> 
>> ... if the recipient is dying or we're over-allocating. This shouldn't
>> happen (if the toolstack does its job right), but I'll add a check.

You must not rely on the tool stack doing things right (see XSA-77).

>> What's worse though is that you don't guard against
>> XEN_DOMCTL_set_recipient zapping d->recipient. If that really is
>> necessary to be supported, some synchronization will be needed.
>>
>> And finally - what's the intended protocol for the tool stack to
>> know that all pages got transferred (and hence the new domain
>> can be launched)?
>>
> 
> (hope this answers both questions above)
> 
> Now the protocol is:
> 1) Toolstack queries for all page frames (with xc_get_pfn_type_batch())
> for the old domain.
> 2) Toolstack issues XEN_DOMCTL_set_recipient with the recipient domain.
> 3) Toolstack kills the source domain.
> 4) Toolstack waits for a while (a loop keeps checking while new pages
> are arriving, plus some timeout).
> 5) Toolstack issues XEN_DOMCTL_set_recipient with DOMID_INVALID,
> resetting the recipient.
> 6) Toolstack checks if all pages were transferred.
> 7) If some pages are missing (e.g. the source domain became a zombie
> holding something), request new empty pages instead.
> 8) Toolstack starts the new domain.
> 
> So we don't actually need XEN_DOMCTL_set_recipient to switch from one
> recipient to the other; we just need a way to reset it.

No, this doesn't address either question. For the first one, you again
assume the tool stack behaves correctly, which is wrong nowadays.
And for the second you give a description, but that's not really a
dependable protocol. Nor do I really follow why resetting the recipient
is necessary. Can't the tools e.g. wait for the final death of the domain
if there's no other notification?

>>> --- a/xen/include/public/domctl.h
>>> +++ b/xen/include/public/domctl.h
>>> @@ -965,6 +965,23 @@ struct xen_domctl_vnuma {
>>>  typedef struct xen_domctl_vnuma xen_domctl_vnuma_t;
>>>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_vnuma_t);
>>>  
>>> +/*
>>> + * XEN_DOMCTL_set_recipient - sets the 'recipient' domain which will receive
>>> + * all the domain's memory after its death or when any memory page from
>>> + * domain's heap is being freed.
>>
>> So this specifically allows for this to happen on a non-dying domain.
>> Why, i.e. what's the intended usage scenario? If not really needed,
>> please remove this and verify in the handling that the source domain
>> is dying.
> 
> Sorry if I didn't get this comment right. We need a way to tell which
> domain will receive memory, and XEN_DOMCTL_set_recipient sets (and
> resets) this target domain. We call it from the toolstack before we
> start killing the old domain, so it is not dying yet. We can't do it
> hypervisor-side when our source domain exits doing SHUTDOWN_soft_reset,
> as there is no recipient domain created yet.
> 
> From a general (hypervisor) point of view we don't actually care if the
> domain is dying or not. We just want to receive all freed pages from
> this domain (so they go to some other domain instead of being trashed).

We do care I think, primarily because we want a correct GMFN <->
MFN mapping. Seeing multiple pages for the same GMFN is for
example going to cause the whole process in its current form to fail.
And considering the need for such a correct mapping - is it always
the case that the mapping gets updated after a page got removed
from a guest? (I can see that the mapping doesn't change anymore
for a dying guest, but you explicitly say that you want/need this to
work before the guest is actually marked dying.)

Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFCv3 3/8] xen: introduce XEN_DOMCTL_set_recipient
  2014-11-25 16:32       ` Jan Beulich
@ 2014-11-25 17:10         ` Vitaly Kuznetsov
  2014-11-27  8:49           ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Vitaly Kuznetsov @ 2014-11-25 17:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Jones, Ian Campbell, Andrew Cooper, David Vrabel, xen-devel

"Jan Beulich" <JBeulich@suse.com> writes:

>>>> On 25.11.14 at 16:41, <vkuznets@redhat.com> wrote:
>> "Jan Beulich" <JBeulich@suse.com> writes:
>>>>>> On 07.10.14 at 15:10, <vkuznets@redhat.com> wrote:
>>>> @@ -1764,11 +1765,28 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
>>>>              scrub = 1;
>>>>          }
>>>>  
>>>> -        if ( unlikely(scrub) )
>>>> -            for ( i = 0; i < (1 << order); i++ )
>>>> -                scrub_one_page(&pg[i]);
>>>> +        if ( !d || !d->recipient || d->recipient->is_dying )
>>>> +        {
>>>> +            if ( unlikely(scrub) )
>>>> +                for ( i = 0; i < (1 << order); i++ )
>>>> +                    scrub_one_page(&pg[i]);
>>>>  
>>>> -        free_heap_pages(pg, order);
>>>> +            free_heap_pages(pg, order);
>>>> +        }
>>>> +        else
>>>> +        {
>>>> +            mfn = page_to_mfn(pg);
>>>> +            gmfn = mfn_to_gmfn(d, mfn);
>>>> +
>>>> +            page_set_owner(pg, NULL);
>>>> +            assign_pages(d->recipient, pg, order, 0);
>>>
>>> This can fail.
>> 
>> .. if the recipient is dying or we're over-allocating. Shouldn't happen
>> (if toolstack does its job right) but I'll add a check.
>
> You must not rely on the tool stack doing things right (see XSA-77).
>
>>> What's worse though is that you don't guard against
>>> XEN_DOMCTL_set_recipient zapping d->recipient. If that really is
>>> necessary to be supported, some synchronization will be needed.
>>>
>>> And finally - what's the intended protocol for the tool stack to
>>> know that all pages got transferred (and hence the new domain
>>> can be launched)?
>>>
>> 
>> (hope this answers both questions above)
>> 
>> Now the protocol is:
>> 1) Toolstack queries for all page frames (with xc_get_pfn_type_batch())
>> for the old domain.
>> 2) Toolstack issues XEN_DOMCTL_set_recipient with the recipient domain.
>> 3) Toolstack kills the source domain.
>> 4) Toolstack waits for a while (a loop keeps checking while new pages
>> are arriving, plus some timeout).
>> 5) Toolstack issues XEN_DOMCTL_set_recipient with DOMID_INVALID,
>> resetting the recipient.
>> 6) Toolstack checks if all pages were transferred.
>> 7) If some pages are missing (e.g. the source domain became a zombie
>> holding something), request new empty pages instead.
>> 8) Toolstack starts the new domain.
>> 
>> So we don't actually need XEN_DOMCTL_set_recipient to switch from one
>> recipient to the other; we just need a way to reset it.
>
> No, this doesn't address either question. For the first one, you again
> assume the tool stack behaves correctly, which is wrong nowadays.
> And for the second you give a description, but that's not really a
> dependable protocol. Nor do I really follow why resetting the recipient
> is necessary. Can't the tools e.g. wait for the final death of the domain
> if there's no other notification?

Yes, it's possible and should happen in all normal cases. However, my
idea was that it's possible to start the new domain even if the old one
fails to die while holding several (presumably special) pages, and
we're fine with replacing those with empty pages. If we go for the
'always wait for the original domain to die' solution, resetting
XEN_DOMCTL_set_recipient is not necessary (it is necessary if we
don't, as we could receive a page after we have already requested a
new one instead).

>
>>>> --- a/xen/include/public/domctl.h
>>>> +++ b/xen/include/public/domctl.h
>>>> @@ -965,6 +965,23 @@ struct xen_domctl_vnuma {
>>>>  typedef struct xen_domctl_vnuma xen_domctl_vnuma_t;
>>>>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_vnuma_t);
>>>>  
>>>> +/*
>>>> + * XEN_DOMCTL_set_recipient - sets the 'recipient' domain which will receive
>>>> + * all the domain's memory after its death or when any memory page from
>>>> + * domain's heap is being freed.
>>>
>>> So this specifically allows for this to happen on a non-dying domain.
>>> Why, i.e. what's the intended usage scenario? If not really needed,
>>> please remove this and verify in the handling that the source domain
>>> is dying.
>> 
>> Sorry if I didn't get this comment right. We need a way to tell which
>> domain will receive memory, and XEN_DOMCTL_set_recipient sets (and
>> resets) this target domain. We call it from the toolstack before we
>> start killing the old domain, so it is not dying yet. We can't do it
>> hypervisor-side when our source domain exits doing SHUTDOWN_soft_reset,
>> as there is no recipient domain created yet.
>> 
>> From a general (hypervisor) point of view we don't actually care if the
>> domain is dying or not. We just want to receive all freed pages from
>> this domain (so they go to some other domain instead of being trashed).
>
> We do care I think, primarily because we want a correct GMFN <->
> MFN mapping. Seeing multiple pages for the same GMFN is for
> example going to cause the whole process in its current form to fail.

Could adding a check that nothing is mapped at the GMFN before mapping
the new MFN there be a solution?
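
E.g. something like this at the transfer site (x86-only sketch, not
verified -- free_domheap_pages() is common code, so a real version
would need an arch-neutral wrapper; 'transfer' is just an illustrative
local flag):

            p2m_type_t t;
            bool_t transfer = 1;

            /* Skip the transfer if the recipient already has a page at
             * this GFN; fall back to freeing the page normally. */
            if ( mfn_valid(mfn_x(get_gfn_query_unlocked(d->recipient,
                                                        gmfn, &t))) )
                transfer = 0;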

> And considering the need for such a correct mapping - is it always
> the case that the mapping gets updated after a page got removed
> from a guest? (I can see that the mapping doesn't change anymore
> for a dying guest, but you explicitly say that you want/need this to
> work before the guest is actually marked dying.)

Actual reassignment here happens for a dying guest only, as
XEN_DOMCTL_set_recipient by itself does nothing. If you think it's safer
to depend on the dying flag always being set, we can couple
XEN_DOMCTL_set_recipient with XEN_DOMCTL_destroydomain in one call. That
is possible if we go for the 'always wait for the original domain to
die' solution (so no reset is required).

>
> Jan

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFCv3 3/8] xen: introduce XEN_DOMCTL_set_recipient
  2014-11-25 17:10         ` Vitaly Kuznetsov
@ 2014-11-27  8:49           ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2014-11-27  8:49 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, Ian Campbell, Andrew Cooper, David Vrabel, xen-devel

>>> On 25.11.14 at 18:10, <vkuznets@redhat.com> wrote:
> "Jan Beulich" <JBeulich@suse.com> writes:
> 
>>>>> On 25.11.14 at 16:41, <vkuznets@redhat.com> wrote:
>>> "Jan Beulich" <JBeulich@suse.com> writes:
>>>>>>> On 07.10.14 at 15:10, <vkuznets@redhat.com> wrote:
>>>>> @@ -1764,11 +1765,28 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
>>>>>              scrub = 1;
>>>>>          }
>>>>>  
>>>>> -        if ( unlikely(scrub) )
>>>>> -            for ( i = 0; i < (1 << order); i++ )
>>>>> -                scrub_one_page(&pg[i]);
>>>>> +        if ( !d || !d->recipient || d->recipient->is_dying )
>>>>> +        {
>>>>> +            if ( unlikely(scrub) )
>>>>> +                for ( i = 0; i < (1 << order); i++ )
>>>>> +                    scrub_one_page(&pg[i]);
>>>>>  
>>>>> -        free_heap_pages(pg, order);
>>>>> +            free_heap_pages(pg, order);
>>>>> +        }
>>>>> +        else
>>>>> +        {
>>>>> +            mfn = page_to_mfn(pg);
>>>>> +            gmfn = mfn_to_gmfn(d, mfn);
>>>>> +
>>>>> +            page_set_owner(pg, NULL);
>>>>> +            assign_pages(d->recipient, pg, order, 0);
>>>>
>>>> This can fail.
>>> 
>>> ... if the recipient is dying or we're over-allocating. This shouldn't
>>> happen (if the toolstack does its job right), but I'll add a check.
>>
>> You must not rely on the tool stack doing things right (see XSA-77).
>>
>>>> What's worse though is that you don't guard against
>>>> XEN_DOMCTL_set_recipient zapping d->recipient. If that really is
>>>> necessary to be supported, some synchronization will be needed.
>>>>
>>>> And finally - what's the intended protocol for the tool stack to
>>>> know that all pages got transferred (and hence the new domain
>>>> can be launched)?
>>>>
>>> 
>>> (hope this answers both questions above)
>>> 
>>> Now the protocol is:
>>> 1) Toolstack queries for all page frames (with xc_get_pfn_type_batch())
>>> for the old domain.
>>> 2) Toolstack issues XEN_DOMCTL_set_recipient with the recipient domain.
>>> 3) Toolstack kills the source domain.
>>> 4) Toolstack waits for a while (a loop keeps checking while new pages
>>> are arriving, plus some timeout).
>>> 5) Toolstack issues XEN_DOMCTL_set_recipient with DOMID_INVALID,
>>> resetting the recipient.
>>> 6) Toolstack checks if all pages were transferred.
>>> 7) If some pages are missing (e.g. the source domain became a zombie
>>> holding something), request new empty pages instead.
>>> 8) Toolstack starts the new domain.
>>> 
>>> So we don't actually need XEN_DOMCTL_set_recipient to switch from one
>>> recipient to the other; we just need a way to reset it.
>>
>> No, this doesn't address either question. For the first one, you again
>> assume the tool stack behaves correctly, which is wrong nowadays.
>> And for the second you give a description, but that's not really a
>> dependable protocol. Nor do I really follow why resetting the recipient
>> is necessary. Can't the tools e.g. wait for the final death of the domain
>> if there's no other notification?
> 
> Yes, it's possible and should happen in all normal cases. However, my
> idea was that it's possible to start the new domain even if the old one
> fails to die while holding several (presumably special) pages, and
> we're fine with replacing those with empty pages. If we go for the
> 'always wait for the original domain to die' solution, resetting
> XEN_DOMCTL_set_recipient is not necessary (it is necessary if we
> don't, as we could receive a page after we have already requested a
> new one instead).

I think this "always wait" is the only reasonable solution: how
would replacing some pages with empty ones fit the kexec/kdump
purposes? However, it may not be necessary to wait for the
domain to completely disappear; it may be sufficient to wait for
its d->tot_pages to become zero.
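
Something like the following on the tools side ought to suffice
(sketch only; iirc nr_pages in xc_dominfo_t reflects the domain's
tot_pages):

#include <unistd.h>
#include <xenctrl.h>

/* Poll until the dying source domain has given up all of its pages,
 * or has disappeared entirely. No timeout handling here. */
static void wait_for_empty(xc_interface *xch, uint32_t domid)
{
    xc_dominfo_t info;

    while ( xc_domain_getinfo(xch, domid, 1, &info) == 1 &&
            info.domid == domid && info.nr_pages > 0 )
        usleep(100000);
}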

>>> From a general (hypervisor) point of view we don't actually care if the
>>> domain is dying or not. We just want to receive all freed pages from
>>> this domain (so they go to some other domain instead of being trashed).
>>
>> We do care I think, primarily because we want a correct GMFN <->
>> MFN mapping. Seeing multiple pages for the same GMFN is for
>> example going to cause the whole process in its current form to fail.
> 
> Could adding a check that nothing is mapped at the GMFN before mapping
> the new MFN there be a solution?

Not checking for this is presumably the better approach - the new
domain would get what the old one last had at a given GMFN. What
it may not get are pages that were moved around in the interim.
But again, all that is relevant only if the old domain can still alter
its memory layout while the transfer is already in progress, which (as
said before) I think should be avoided.

>> And considering the need for such a correct mapping - is it always
>> the case that the mapping gets updated after a page got removed
>> from a guest? (I can see that the mapping doesn't change anymore
>> for a dying guest, but you explicitly say that you want/need this to
>> work before the guest is actually marked dying.)
> 
> Actual reassignment here happens for a dying guest only, as
> XEN_DOMCTL_set_recipient by itself does nothing. If you think it's safer
> to depend on the dying flag always being set, we can couple
> XEN_DOMCTL_set_recipient with XEN_DOMCTL_destroydomain in one call. That
> is possible if we go for the 'always wait for the original domain to
> die' solution (so no reset is required).

I indeed think that's the safer and more correct route.

Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread

Thread overview: 17+ messages
2014-10-07 13:10 [PATCH RFCv3 0/8] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
2014-10-07 13:10 ` [PATCH RFCv3 1/8] xen: introduce SHUTDOWN_soft_reset shutdown reason Vitaly Kuznetsov
2014-10-07 13:10 ` [PATCH RFCv3 2/8] libxl: support " Vitaly Kuznetsov
2014-10-07 13:10 ` [PATCH RFCv3 3/8] xen: introduce XEN_DOMCTL_set_recipient Vitaly Kuznetsov
2014-11-25 14:51   ` Jan Beulich
2014-11-25 15:41     ` Vitaly Kuznetsov
2014-11-25 16:32       ` Jan Beulich
2014-11-25 17:10         ` Vitaly Kuznetsov
2014-11-27  8:49           ` Jan Beulich
2014-10-07 13:10 ` [PATCH RFCv3 4/8] libxc: support XEN_DOMCTL_set_recipient Vitaly Kuznetsov
2014-10-07 13:10 ` [PATCH RFCv3 5/8] libxl: add libxl__domain_soft_reset_destroy_old() Vitaly Kuznetsov
2014-10-07 13:10 ` [PATCH RFCv3 6/8] libxc: introduce soft reset for HVM domains Vitaly Kuznetsov
2014-10-07 13:10 ` [PATCH RFCv3 7/8] libxl: soft reset support Vitaly Kuznetsov
2014-10-07 13:10 ` [PATCH RFCv3 8/8] xsm: add XEN_DOMCTL_set_recipient support Vitaly Kuznetsov
2014-10-07 13:28 ` [PATCH RFCv3 0/8] toolstack-based approach to pvhvm guest kexec David Vrabel
2014-11-03 13:21 ` Vitaly Kuznetsov
2014-11-03 13:32   ` Jan Beulich
