* [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
@ 2014-12-11 13:45 Vitaly Kuznetsov
  2014-12-11 13:45 ` [PATCH v5 1/9] xen: introduce DOMDYING_locked state Vitaly Kuznetsov
                   ` (9 more replies)
  0 siblings, 10 replies; 39+ messages in thread
From: Vitaly Kuznetsov @ 2014-12-11 13:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu

This patch series provides x86 PVHVM domains with the ability to perform
kexec/kdump.

Changes from v4:
- the "on_soft_reset" option was introduced; it's now possible to specify the
  behavior on soft reset [Wei Liu]
- renamed libxl__domain_soft_reset_destroy_old to libxl__domain_soft_reset_destroy
- libxl__domain_soft_reset_destroy now takes gc instead of ctx, coding style fix
  [Wei Liu]
- remove forgotten 'd_config.b_info.nodemap.size = 0' hackaround, add
  reload_domain_config() on soft reset path [Wei Liu]
- add whole procedure description to libxl_internal.h [Wei Liu]
- reword 'recipient' description in struct domain [Julien Grall]
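For reference, the new option slots in beside the existing on_poweroff/
on_reboot/on_crash event actions; a hypothetical xl.cfg fragment (exact syntax
and accepted values are defined by the docs/man/xl.cfg.pod.5 change in this
series):

```
# Action to take when the guest requests a soft reset (e.g. for kexec).
# "soft-reset" rebuilds the domain while preserving memory contents.
on_soft_reset="soft-reset"
```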

Nothing was done with regards to ARM. It seems that a correct implementation of
mfn_to_gmfn() returning the proper gmfn for DomUs is the desirable solution in
the long run; however, any additional thoughts here are more than welcome!

Changes from RFCv3:
This is the first non-RFC series as no major concerns were expressed. I'm trying
to address Jan's comments. Changes are:
- Move from XEN_DOMCTL_set_recipient to XEN_DOMCTL_devour (I don't really like
  the name but nothing more appropriate came to my mind), which incorporates the
  former XEN_DOMCTL_set_recipient and XEN_DOMCTL_destroydomain to prevent the
  original domain from changing its allocations during the transfer procedure.
- Check in free_domheap_pages() that assign_pages() succeeded.
- Change printk() in free_domheap_pages().
- DOMDYING_locked state was introduced to support XEN_DOMCTL_devour.
- xc_domain_soft_reset() got simplified a bit. Now we just wait for the original
  domain to die or lose all its pages.
- rebased on top of current master branch.

Changes from RFC/WIPv2:

Here is a slightly different approach to memory reassignment. Instead of
introducing a new (and very doubtful) XENMEM_transfer operation, introduce a
simple XEN_DOMCTL_set_recipient operation and do everything in the
free_domheap_pages() handler, utilizing the normal domain destroy path. This is
better because:
- The approach is general-enough
- All memory pages are usually freed when the domain is destroyed
- No special grants handling required
- Better supportability

With regards to PV:
Though XEN_DOMCTL_set_recipient works for both PV and HVM, this patchset does
not bring PV kexec/kdump support. xc_domain_soft_reset() is limited to HVM
domains only. The main reason is that, while it is (in theory) possible to save
the p2m and rebuild it in the new domain, that would only allow us to resume
execution from where we stopped. If we want to execute a new kernel we need to
build the same kernel/initrd/bootstrap_pagetables/... structure we build to boot
a PV domain initially. That, however, would destroy the original domain's
memory, making kdump impossible. To make everything work, additional support
from kexec userspace/the Linux kernel is required, and I'm not sure it makes
sense to implement all this in the light of PVH.

Original description:

When a PVHVM Linux guest performs kexec there are lots of things which
require taking care of:
- shared info, vcpu_info
- grants
- event channels
- ...
Instead of taking care of all these things we can rebuild the domain
performing kexec from scratch, doing a so-called soft reboot.

The idea was suggested by David Vrabel, Jan Beulich, and Konrad Rzeszutek Wilk.

Previous discussions:
This patch series:
http://lists.xen.org/archives/html/xen-devel/2014-12/msg00432.html
http://lists.xen.org/archives/html/xen-devel/2014-10/msg00764.html
http://lists.xen.org/archives/html/xen-devel/2014-09/msg03623.html
http://lists.xen.org/archives/html/xen-devel/2014-08/msg02309.html

on resetting VCPU_info (and that's where 'rebuild everything with the
toolstack solution' was suggested):
http://lists.xen.org/archives/html/xen-devel/2014-08/msg01869.html
Previous versions:
http://lists.xen.org/archives/html/xen-devel/2014-08/msg01630.html
http://lists.xen.org/archives/html/xen-devel/2014-08/msg00603.html

EVTCHNOP_reset (got merged):
http://lists.xen.org/archives/html/xen-devel/2014-07/msg03979.html
Previous:
http://lists.xen.org/archives/html/xen-devel/2014-07/msg03925.html
http://lists.xen.org/archives/html/xen-devel/2014-07/msg03322.html
http://lists.xen.org/archives/html/xen-devel/2014-07/msg02500.html

P.S. The patch series can be tested with a PVHVM Linux guest with the following
modifications:

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index c0cb11f..33c5cdd 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -33,6 +33,10 @@
 #include <linux/memblock.h>
 #include <linux/edd.h>

+#ifdef CONFIG_KEXEC
+#include <linux/kexec.h>
+#endif
+
 #include <xen/xen.h>
 #include <xen/events.h>
 #include <xen/interface/xen.h>
@@ -1810,6 +1814,22 @@ static struct notifier_block xen_hvm_cpu_notifier = {
   .notifier_call   = xen_hvm_cpu_notify,
 };

+#ifdef CONFIG_KEXEC
+static void xen_pvhvm_kexec_shutdown(void)
+{
+	native_machine_shutdown();
+	if (kexec_in_progress) {
+		xen_reboot(SHUTDOWN_soft_reset);
+	}
+}
+
+static void xen_pvhvm_crash_shutdown(struct pt_regs *regs)
+{
+	native_machine_crash_shutdown(regs);
+	xen_reboot(SHUTDOWN_soft_reset);
+}
+#endif
+
 static void __init xen_hvm_guest_init(void)
 {
	init_hvm_pv_info();
@@ -1826,6 +1846,10 @@ static void __init xen_hvm_guest_init(void)
   x86_init.irqs.intr_init = xen_init_IRQ;
   xen_hvm_init_time_ops();
   xen_hvm_init_mmu_ops();
+#ifdef CONFIG_KEXEC
+	machine_ops.shutdown = xen_pvhvm_kexec_shutdown;
+	machine_ops.crash_shutdown = xen_pvhvm_crash_shutdown;
+#endif
 }

 static bool xen_nopv = false;
diff --git a/include/xen/interface/sched.h b/include/xen/interface/sched.h
index 9ce0839..b5942a8 100644
--- a/include/xen/interface/sched.h
+++ b/include/xen/interface/sched.h
@@ -107,5 +107,6 @@ struct sched_watchdog {
 #define SHUTDOWN_suspend    2  /* Clean up, save suspend info, kill.         */
 #define SHUTDOWN_crash      3  /* Tell controller we've crashed.             */
 #define SHUTDOWN_watchdog   4  /* Restart because watchdog time expired.     */
+#define SHUTDOWN_soft_reset 5  /* Soft-reset for kexec.                      */

 #endif /* __XEN_PUBLIC_SCHED_H__ */

Vitaly Kuznetsov (9):
  xen: introduce DOMDYING_locked state
  xen: introduce SHUTDOWN_soft_reset shutdown reason
  libxl: support SHUTDOWN_soft_reset shutdown reason
  xen: introduce XEN_DOMCTL_devour
  libxc: support XEN_DOMCTL_devour
  libxl: add libxl__domain_soft_reset_destroy()
  libxc: introduce soft reset for HVM domains
  libxl: soft reset support
  xsm: add XEN_DOMCTL_devour support

 docs/man/xl.cfg.pod.5               |  12 ++
 tools/libxc/Makefile                |   1 +
 tools/libxc/include/xenctrl.h       |  14 ++
 tools/libxc/include/xenguest.h      |  20 +++
 tools/libxc/xc_domain.c             |  13 ++
 tools/libxc/xc_domain_soft_reset.c  | 282 ++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl.c                 |  33 ++++-
 tools/libxl/libxl.h                 |   6 +
 tools/libxl/libxl_create.c          | 103 +++++++++++--
 tools/libxl/libxl_internal.h        |  30 ++++
 tools/libxl/libxl_types.idl         |   4 +
 tools/libxl/xl_cmdimpl.c            |  37 ++++-
 tools/python/xen/lowlevel/xl/xl.c   |   1 +
 xen/common/domain.c                 |   4 +
 xen/common/domctl.c                 |  39 +++++
 xen/common/page_alloc.c             |  28 +++-
 xen/common/shutdown.c               |   7 +
 xen/include/public/domctl.h         |  15 ++
 xen/include/public/sched.h          |   3 +-
 xen/include/xen/sched.h             |   5 +-
 xen/include/xsm/dummy.h             |   6 +
 xen/include/xsm/xsm.h               |   6 +
 xen/xsm/dummy.c                     |   1 +
 xen/xsm/flask/hooks.c               |  17 +++
 xen/xsm/flask/policy/access_vectors |  10 ++
 25 files changed, 674 insertions(+), 23 deletions(-)
 create mode 100644 tools/libxc/xc_domain_soft_reset.c

-- 
1.9.3

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v5 1/9] xen: introduce DOMDYING_locked state
  2014-12-11 13:45 [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
@ 2014-12-11 13:45 ` Vitaly Kuznetsov
  2014-12-18 13:23   ` Jan Beulich
  2014-12-11 13:45 ` [PATCH v5 2/9] xen: introduce SHUTDOWN_soft_reset shutdown reason Vitaly Kuznetsov
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: Vitaly Kuznetsov @ 2014-12-11 13:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu

A new dying state is required to indicate that a particular domain
is dying but the cleanup procedure hasn't started yet. This state can be
set from outside of domain_kill().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 xen/common/domain.c     | 1 +
 xen/include/xen/sched.h | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 4a62c1d..c13a7cf 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -603,6 +603,7 @@ int domain_kill(struct domain *d)
     switch ( d->is_dying )
     {
     case DOMDYING_alive:
+    case DOMDYING_locked:
         domain_pause(d);
         d->is_dying = DOMDYING_dying;
         spin_barrier(&d->domain_lock);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 46fc6e3..a42d0b8 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -369,7 +369,8 @@ struct domain
     /* Is this guest being debugged by dom0? */
     bool_t           debugger_attached;
     /* Is this guest dying (i.e., a zombie)? */
-    enum { DOMDYING_alive, DOMDYING_dying, DOMDYING_dead } is_dying;
+    enum { DOMDYING_alive, DOMDYING_locked, DOMDYING_dying, DOMDYING_dead }
+        is_dying;
     /* Domain is paused by controller software? */
     int              controller_pause_count;
     /* Domain's VCPUs are pinned 1:1 to physical CPUs? */
-- 
1.9.3


* [PATCH v5 2/9] xen: introduce SHUTDOWN_soft_reset shutdown reason
  2014-12-11 13:45 [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
  2014-12-11 13:45 ` [PATCH v5 1/9] xen: introduce DOMDYING_locked state Vitaly Kuznetsov
@ 2014-12-11 13:45 ` Vitaly Kuznetsov
  2014-12-18 13:28   ` Jan Beulich
  2015-01-13 12:20   ` Ian Campbell
  2014-12-11 13:45 ` [PATCH v5 3/9] libxl: support " Vitaly Kuznetsov
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 39+ messages in thread
From: Vitaly Kuznetsov @ 2014-12-11 13:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 xen/common/shutdown.c      | 7 +++++++
 xen/include/public/sched.h | 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/xen/common/shutdown.c b/xen/common/shutdown.c
index 94d4c53..5c3a158 100644
--- a/xen/common/shutdown.c
+++ b/xen/common/shutdown.c
@@ -71,6 +71,13 @@ void hwdom_shutdown(u8 reason)
         break; /* not reached */
     }
 
+    case SHUTDOWN_soft_reset:
+    {
+        printk("Domain 0 did soft reset but it is unsupported, rebooting.\n");
+        machine_restart(0);
+        break; /* not reached */
+    }
+
     default:
     {
         printk("Domain 0 shutdown (unknown reason %u): ", reason);
diff --git a/xen/include/public/sched.h b/xen/include/public/sched.h
index 4000ac9..800c808 100644
--- a/xen/include/public/sched.h
+++ b/xen/include/public/sched.h
@@ -159,7 +159,8 @@ DEFINE_XEN_GUEST_HANDLE(sched_watchdog_t);
 #define SHUTDOWN_suspend    2  /* Clean up, save suspend info, kill.         */
 #define SHUTDOWN_crash      3  /* Tell controller we've crashed.             */
 #define SHUTDOWN_watchdog   4  /* Restart because watchdog time expired.     */
-#define SHUTDOWN_MAX        4  /* Maximum valid shutdown reason.             */
+#define SHUTDOWN_soft_reset 5  /* Soft reset, rebuild keeping memory content */
+#define SHUTDOWN_MAX        5  /* Maximum valid shutdown reason.             */
 /* ` } */
 
 #endif /* __XEN_PUBLIC_SCHED_H__ */
-- 
1.9.3


* [PATCH v5 3/9] libxl: support SHUTDOWN_soft_reset shutdown reason
  2014-12-11 13:45 [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
  2014-12-11 13:45 ` [PATCH v5 1/9] xen: introduce DOMDYING_locked state Vitaly Kuznetsov
  2014-12-11 13:45 ` [PATCH v5 2/9] xen: introduce SHUTDOWN_soft_reset shutdown reason Vitaly Kuznetsov
@ 2014-12-11 13:45 ` Vitaly Kuznetsov
  2015-01-13 12:22   ` Ian Campbell
  2015-01-13 12:23   ` Ian Campbell
  2014-12-11 13:45 ` [PATCH v5 4/9] xen: introduce XEN_DOMCTL_devour Vitaly Kuznetsov
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 39+ messages in thread
From: Vitaly Kuznetsov @ 2014-12-11 13:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu

Use letter 't' to indicate a domain in such state.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/libxl/libxl_types.idl       | 1 +
 tools/libxl/xl_cmdimpl.c          | 2 +-
 tools/python/xen/lowlevel/xl/xl.c | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index f7fc695..4a0e2be 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -175,6 +175,7 @@ libxl_shutdown_reason = Enumeration("shutdown_reason", [
     (2, "suspend"),
     (3, "crash"),
     (4, "watchdog"),
+    (5, "soft_reset"),
     ], init_val = "LIBXL_SHUTDOWN_REASON_UNKNOWN")
 
 libxl_vga_interface_type = Enumeration("vga_interface_type", [
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 3737c7e..b193c3c 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -3506,7 +3506,7 @@ static void list_domains(int verbose, int context, int claim, int numa,
                          const libxl_dominfo *info, int nb_domain)
 {
     int i;
-    static const char shutdown_reason_letters[]= "-rscw";
+    static const char shutdown_reason_letters[]= "-rscwt";
     libxl_bitmap nodemap;
     libxl_physinfo physinfo;
 
diff --git a/tools/python/xen/lowlevel/xl/xl.c b/tools/python/xen/lowlevel/xl/xl.c
index 32f982a..7c61160 100644
--- a/tools/python/xen/lowlevel/xl/xl.c
+++ b/tools/python/xen/lowlevel/xl/xl.c
@@ -784,6 +784,7 @@ PyMODINIT_FUNC initxl(void)
     _INT_CONST_LIBXL(m, SHUTDOWN_REASON_SUSPEND);
     _INT_CONST_LIBXL(m, SHUTDOWN_REASON_CRASH);
     _INT_CONST_LIBXL(m, SHUTDOWN_REASON_WATCHDOG);
+    _INT_CONST_LIBXL(m, SHUTDOWN_REASON_SOFT_RESET);
 
     genwrap__init(m);
 }
-- 
1.9.3


* [PATCH v5 4/9] xen: introduce XEN_DOMCTL_devour
  2014-12-11 13:45 [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (2 preceding siblings ...)
  2014-12-11 13:45 ` [PATCH v5 3/9] libxl: support " Vitaly Kuznetsov
@ 2014-12-11 13:45 ` Vitaly Kuznetsov
  2014-12-18 13:57   ` Jan Beulich
  2015-01-13 13:53   ` Ian Campbell
  2014-12-11 13:45 ` [PATCH v5 5/9] libxc: support XEN_DOMCTL_devour Vitaly Kuznetsov
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 39+ messages in thread
From: Vitaly Kuznetsov @ 2014-12-11 13:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu

The new operation sets a 'recipient' domain which will receive all
memory pages from a particular domain, and then kills the original domain.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 xen/common/domain.c         |  3 +++
 xen/common/domctl.c         | 33 +++++++++++++++++++++++++++++++++
 xen/common/page_alloc.c     | 28 ++++++++++++++++++++++++----
 xen/include/public/domctl.h | 15 +++++++++++++++
 xen/include/xen/sched.h     |  2 ++
 5 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index c13a7cf..f26267a 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -825,6 +825,9 @@ static void complete_domain_destroy(struct rcu_head *head)
     if ( d->target != NULL )
         put_domain(d->target);
 
+    if ( d->recipient != NULL )
+        put_domain(d->recipient);
+
     evtchn_destroy_final(d);
 
     radix_tree_destroy(&d->pirq_tree, free_pirq_struct);
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index f15dcfe..7e7fb47 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -1177,6 +1177,39 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     }
     break;
 
+    case XEN_DOMCTL_devour:
+    {
+        struct domain *recipient_dom;
+
+        if ( !d->recipient )
+        {
+            recipient_dom = get_domain_by_id(op->u.devour.recipient);
+            if ( recipient_dom == NULL )
+            {
+                ret = -ESRCH;
+                break;
+            }
+
+            if ( recipient_dom->tot_pages != 0 )
+            {
+                put_domain(recipient_dom);
+                ret = -EINVAL;
+                break;
+            }
+            /*
+             * Make sure no allocation/remapping is ongoing and set is_dying
+             * flag to prevent such actions in future.
+             */
+            spin_lock(&d->page_alloc_lock);
+            d->is_dying = DOMDYING_locked;
+            d->recipient = recipient_dom;
+            smp_wmb(); /* make sure recipient was set before domain_kill() */
+            spin_unlock(&d->page_alloc_lock);
+        }
+        ret = domain_kill(d);
+    }
+    break;
+
     default:
         ret = arch_do_domctl(op, d, u_domctl);
         break;
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 7b4092d..7eb4404 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1707,6 +1707,7 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
 {
     struct domain *d = page_get_owner(pg);
     unsigned int i;
+    unsigned long mfn, gmfn;
     bool_t drop_dom_ref;
 
     ASSERT(!in_irq());
@@ -1764,13 +1765,32 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
             scrub = 1;
         }
 
-        if ( unlikely(scrub) )
-            for ( i = 0; i < (1 << order); i++ )
-                scrub_one_page(&pg[i]);
+        if ( !d || !d->recipient || d->recipient->is_dying )
+        {
+            if ( unlikely(scrub) )
+                for ( i = 0; i < (1 << order); i++ )
+                    scrub_one_page(&pg[i]);
 
-        free_heap_pages(pg, order);
+            free_heap_pages(pg, order);
+        }
+        else
+        {
+            mfn = page_to_mfn(pg);
+            gmfn = mfn_to_gmfn(d, mfn);
+
+            page_set_owner(pg, NULL);
+            if ( assign_pages(d->recipient, pg, order, 0) )
+                /* assign_pages reports the error by itself */
+                goto out;
+
+            if ( guest_physmap_add_page(d->recipient, gmfn, mfn, order) )
+                printk(XENLOG_G_INFO
+                       "Failed to add MFN %lx (GFN %lx) to Dom%d's physmap\n",
+                       mfn, gmfn, d->recipient->domain_id);
+        }
     }
 
+out:
     if ( drop_dom_ref )
         put_domain(d);
 }
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 57e2ed7..871fa5e 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -995,6 +995,19 @@ struct xen_domctl_psr_cmt_op {
 typedef struct xen_domctl_psr_cmt_op xen_domctl_psr_cmt_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_psr_cmt_op_t);
 
+/*
+ * XEN_DOMCTL_devour - kills the domain reassigning all of its domheap pages
+ * to the 'recipient' domain. Pages from xen heap belonging to the domain
+ * are not copied. Reassigned pages are mapped to the same GMFNs in the
+ * recipient domain as they were mapped in the original. The recipient domain
+ * is supposed to not have any domheap pages to avoid MFN-GMFN collisions.
+ */
+struct xen_domctl_devour {
+    domid_t recipient;
+};
+typedef struct xen_domctl_devour xen_domctl_devour_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_devour_t);
+
 struct xen_domctl {
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain                   1
@@ -1070,6 +1083,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_setvnumainfo                  74
 #define XEN_DOMCTL_psr_cmt_op                    75
 #define XEN_DOMCTL_arm_configure_domain          76
+#define XEN_DOMCTL_devour                        77
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1135,6 +1149,7 @@ struct xen_domctl {
         struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
         struct xen_domctl_vnuma             vnuma;
         struct xen_domctl_psr_cmt_op        psr_cmt_op;
+        struct xen_domctl_devour            devour;
         uint8_t                             pad[128];
     } u;
 };
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index a42d0b8..bbb0505 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -366,6 +366,8 @@ struct domain
     bool_t           is_privileged;
     /* Which guest this guest has privileges on */
     struct domain   *target;
+    /* Newly created guest which receives freed memory pages (soft reset) */
+    struct domain   *recipient;
     /* Is this guest being debugged by dom0? */
     bool_t           debugger_attached;
     /* Is this guest dying (i.e., a zombie)? */
-- 
1.9.3


* [PATCH v5 5/9] libxc: support XEN_DOMCTL_devour
  2014-12-11 13:45 [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (3 preceding siblings ...)
  2014-12-11 13:45 ` [PATCH v5 4/9] xen: introduce XEN_DOMCTL_devour Vitaly Kuznetsov
@ 2014-12-11 13:45 ` Vitaly Kuznetsov
  2014-12-11 13:45 ` [PATCH v5 6/9] libxl: add libxl__domain_soft_reset_destroy() Vitaly Kuznetsov
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 39+ messages in thread
From: Vitaly Kuznetsov @ 2014-12-11 13:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu

Introduce a new xc_domain_devour() function to support XEN_DOMCTL_devour.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/libxc/include/xenctrl.h | 14 ++++++++++++++
 tools/libxc/xc_domain.c       | 13 +++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 0ad8b8d..a789de3 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -558,6 +558,20 @@ int xc_domain_unpause(xc_interface *xch,
 int xc_domain_destroy(xc_interface *xch,
                       uint32_t domid);
 
+/**
+ * This function sets a 'recipient' domain for a domain (when the source domain
+ * releases memory it is reassigned to the recipient domain instead of being
+ * freed) and kills the original domain. The destination domain is supposed to
+ * have enough max_mem and no pages assigned.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid the source domain id
+ * @parm recipient the destination domain id
+ * @return 0 on success, -1 on failure
+ */
+int xc_domain_devour(xc_interface *xch,
+                     uint32_t domid, uint32_t recipient);
+
 
 /**
  * This function resumes a suspended domain. The domain should have
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index b864872..5949725 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -122,6 +122,19 @@ int xc_domain_destroy(xc_interface *xch,
     return ret;
 }
 
+int xc_domain_devour(xc_interface *xch, uint32_t domid, uint32_t recipient)
+{
+    int ret;
+    DECLARE_DOMCTL;
+    domctl.cmd = XEN_DOMCTL_devour;
+    domctl.domain = (domid_t)domid;
+    domctl.u.devour.recipient = (domid_t)recipient;
+    do {
+        ret = do_domctl(xch, &domctl);
+    } while ( ret && (errno == EAGAIN) );
+    return ret;
+}
+
 int xc_domain_shutdown(xc_interface *xch,
                        uint32_t domid,
                        int reason)
-- 
1.9.3


* [PATCH v5 6/9] libxl: add libxl__domain_soft_reset_destroy()
  2014-12-11 13:45 [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (4 preceding siblings ...)
  2014-12-11 13:45 ` [PATCH v5 5/9] libxc: support XEN_DOMCTL_devour Vitaly Kuznetsov
@ 2014-12-11 13:45 ` Vitaly Kuznetsov
  2015-01-13 13:58   ` Ian Campbell
  2014-12-11 13:45 ` [PATCH v5 7/9] libxc: introduce soft reset for HVM domains Vitaly Kuznetsov
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: Vitaly Kuznetsov @ 2014-12-11 13:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu

The new libxl__domain_soft_reset_destroy() is an internal-only
version of libxl_domain_destroy() which follows the same domain
destroy path, with one difference: xc_domain_destroy() is
skipped, so the domain is not actually destroyed.

Add a soft_reset flag to the libxl__domain_destroy_state structure
to support the change.

The original libxl_domain_destroy() function could easily be
modified to support the new flag, but I'm trying to avoid that as
it is part of the public API.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/libxl/libxl.c          | 33 ++++++++++++++++++++++++++++-----
 tools/libxl/libxl_internal.h |  4 ++++
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 74c00dc..a232db1 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -1444,6 +1444,24 @@ int libxl_domain_destroy(libxl_ctx *ctx, uint32_t domid,
     return AO_INPROGRESS;
 }
 
+int libxl__domain_soft_reset_destroy(libxl__gc *gc_in, uint32_t domid,
+                                     const libxl_asyncop_how *ao_how)
+{
+    libxl_ctx *ctx = libxl__gc_owner(gc_in);
+    AO_CREATE(ctx, domid, ao_how);
+    libxl__domain_destroy_state *dds;
+
+    GCNEW(dds);
+    dds->ao = ao;
+    dds->domid = domid;
+    dds->callback = domain_destroy_cb;
+    dds->soft_reset = 1;
+    libxl__domain_destroy(egc, dds);
+
+    return AO_INPROGRESS;
+}
+
+
 static void domain_destroy_cb(libxl__egc *egc, libxl__domain_destroy_state *dds,
                               int rc)
 {
@@ -1619,6 +1637,7 @@ static void devices_destroy_cb(libxl__egc *egc,
 {
     STATE_AO_GC(drs->ao);
     libxl__destroy_domid_state *dis = CONTAINER_OF(drs, *dis, drs);
+    libxl__domain_destroy_state *dds = CONTAINER_OF(dis, *dds, domain);
     libxl_ctx *ctx = CTX;
     uint32_t domid = dis->domid;
     char *dom_path;
@@ -1657,11 +1676,15 @@ static void devices_destroy_cb(libxl__egc *egc,
     }
     libxl__userdata_destroyall(gc, domid);
 
-    rc = xc_domain_destroy(ctx->xch, domid);
-    if (rc < 0) {
-        LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_domain_destroy failed for %d", domid);
-        rc = ERROR_FAIL;
-        goto out;
+    if (!dds->soft_reset)
+    {
+        rc = xc_domain_destroy(ctx->xch, domid);
+        if (rc < 0) {
+            LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc,
+                                "xc_domain_destroy failed for %d", domid);
+            rc = ERROR_FAIL;
+            goto out;
+        }
     }
     rc = 0;
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index a38f695..d58f08a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2969,6 +2969,7 @@ struct libxl__domain_destroy_state {
     int stubdom_finished;
     libxl__destroy_domid_state domain;
     int domain_finished;
+    int soft_reset;
 };
 
 /*
@@ -3132,6 +3133,9 @@ _hidden void libxl__domain_save_device_model(libxl__egc *egc,
 
 _hidden const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t domid);
 
+_hidden int libxl__domain_soft_reset_destroy(libxl__gc *gc, uint32_t domid,
+                                             const libxl_asyncop_how *ao_how);
+
 
 /*
  * Convenience macros.
-- 
1.9.3


* [PATCH v5 7/9] libxc: introduce soft reset for HVM domains
  2014-12-11 13:45 [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (5 preceding siblings ...)
  2014-12-11 13:45 ` [PATCH v5 6/9] libxl: add libxl__domain_soft_reset_destroy() Vitaly Kuznetsov
@ 2014-12-11 13:45 ` Vitaly Kuznetsov
  2015-01-13 14:08   ` Ian Campbell
  2014-12-11 13:45 ` [PATCH v5 8/9] libxl: soft reset support Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: Vitaly Kuznetsov @ 2014-12-11 13:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu

Add a new xc_domain_soft_reset() function which performs a so-called 'soft
reset' of an HVM domain. It is performed in the following way:
- Save the HVM context and all HVM params;
- Devour the original domain with XEN_DOMCTL_devour;
- Wait till the original domain dies or has no pages left;
- Restore the HVM context and HVM params, and seed the grant table.

After that the domain resumes execution from where SHUTDOWN_soft_reset was
called.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/libxc/Makefile               |   1 +
 tools/libxc/include/xenguest.h     |  20 +++
 tools/libxc/xc_domain_soft_reset.c | 282 +++++++++++++++++++++++++++++++++++++
 3 files changed, 303 insertions(+)
 create mode 100644 tools/libxc/xc_domain_soft_reset.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index bd2ca6c..8f8abd6 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -52,6 +52,7 @@ GUEST_SRCS-y += xc_offline_page.c xc_compression.c
 else
 GUEST_SRCS-y += xc_nomigrate.c
 endif
+GUEST_SRCS-y += xc_domain_soft_reset.c
 
 vpath %.c ../../xen/common/libelf
 CFLAGS += -I../../xen/common/libelf
diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 40bbac8..770cd10 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -131,6 +131,26 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
  * of the new domain is automatically appended to the filename,
  * separated by a ".".
  */
+
+/**
+ * This function performs a soft reset of a domain. During soft reset all of
+ * the source domain's memory is reassigned to the destination domain, and
+ * the HVM context and HVM params are copied.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm source_dom the id of the source domain
+ * @parm dest_dom the id of the destination domain
+ * @parm console_domid the id of the domain handling console
+ * @parm console_mfn returned with the mfn of the console page
+ * @parm store_domid the id of the domain handling store
+ * @parm store_mfn returned with the mfn of the store page
+ * @return 0 on success, -1 on failure
+ */
+int xc_domain_soft_reset(xc_interface *xch, uint32_t source_dom,
+                         uint32_t dest_dom, domid_t console_domid,
+                         unsigned long *console_mfn, domid_t store_domid,
+                         unsigned long *store_mfn);
+
 #define XC_DEVICE_MODEL_RESTORE_FILE "/var/lib/xen/qemu-resume"
 
 /**
diff --git a/tools/libxc/xc_domain_soft_reset.c b/tools/libxc/xc_domain_soft_reset.c
new file mode 100644
index 0000000..24d0b48
--- /dev/null
+++ b/tools/libxc/xc_domain_soft_reset.c
@@ -0,0 +1,282 @@
+/******************************************************************************
+ * xc_domain_soft_reset.c
+ *
+ * Do soft reset.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include <inttypes.h>
+#include <time.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/time.h>
+
+#include "xc_private.h"
+#include "xc_core.h"
+#include "xc_bitops.h"
+#include "xc_dom.h"
+#include "xg_private.h"
+#include "xg_save_restore.h"
+
+#include <xen/hvm/params.h>
+
+#define SLEEP_INT 1
+
+int xc_domain_soft_reset(xc_interface *xch, uint32_t source_dom,
+                         uint32_t dest_dom, domid_t console_domid,
+                         unsigned long *console_mfn, domid_t store_domid,
+                         unsigned long *store_mfn)
+{
+    xc_dominfo_t old_info, new_info;
+    int rc = 1;
+
+    uint32_t hvm_buf_size = 0;
+    uint8_t *hvm_buf = NULL;
+    unsigned long console_pfn, store_pfn, io_pfn, buffio_pfn;
+    unsigned long max_gpfn;
+    uint64_t hvm_params[HVM_NR_PARAMS];
+    xen_pfn_t sharedinfo_pfn;
+
+    DPRINTF("%s: soft reset domid %u -> %u", __func__, source_dom, dest_dom);
+
+    if ( xc_domain_getinfo(xch, source_dom, 1, &old_info) != 1 )
+    {
+        PERROR("Could not get old domain info");
+        return 1;
+    }
+
+    if ( xc_domain_getinfo(xch, dest_dom, 1, &new_info) != 1 )
+    {
+        PERROR("Could not get new domain info");
+        return 1;
+    }
+
+    if ( !old_info.hvm || !new_info.hvm )
+    {
+        PERROR("Soft reset is supported for HVM only");
+        return 1;
+    }
+
+    max_gpfn = xc_domain_maximum_gpfn(xch, source_dom);
+
+    sharedinfo_pfn = old_info.shared_info_frame;
+    if ( xc_get_pfn_type_batch(xch, source_dom, 1, &sharedinfo_pfn) )
+    {
+        PERROR("xc_get_pfn_type_batch failed");
+        goto out;
+    }
+
+    hvm_buf_size = xc_domain_hvm_getcontext(xch, source_dom, 0, 0);
+    if ( hvm_buf_size == -1 )
+    {
+        PERROR("Couldn't get HVM context size from Xen");
+        goto out;
+    }
+
+    hvm_buf = malloc(hvm_buf_size);
+    if ( !hvm_buf )
+    {
+        ERROR("Couldn't allocate memory");
+        goto out;
+    }
+
+    if ( xc_domain_hvm_getcontext(xch, source_dom, hvm_buf,
+                                  hvm_buf_size) == -1 )
+    {
+        PERROR("HVM:Could not get hvm buffer");
+        goto out;
+    }
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_STORE_PFN,
+                     &hvm_params[HVM_PARAM_STORE_PFN]);
+    store_pfn = hvm_params[HVM_PARAM_STORE_PFN];
+    *store_mfn = store_pfn;
+
+    xc_hvm_param_get(xch, source_dom,
+                     HVM_PARAM_CONSOLE_PFN,
+                     &hvm_params[HVM_PARAM_CONSOLE_PFN]);
+    console_pfn = hvm_params[HVM_PARAM_CONSOLE_PFN];
+    *console_mfn = console_pfn;
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_BUFIOREQ_PFN,
+                     &hvm_params[HVM_PARAM_BUFIOREQ_PFN]);
+    buffio_pfn = hvm_params[HVM_PARAM_BUFIOREQ_PFN];
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_IOREQ_PFN,
+                     &hvm_params[HVM_PARAM_IOREQ_PFN]);
+    io_pfn = hvm_params[HVM_PARAM_IOREQ_PFN];
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_IDENT_PT,
+                     &hvm_params[HVM_PARAM_IDENT_PT]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_PAGING_RING_PFN,
+                     &hvm_params[HVM_PARAM_PAGING_RING_PFN]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_ACCESS_RING_PFN,
+                     &hvm_params[HVM_PARAM_ACCESS_RING_PFN]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_VM86_TSS,
+                     &hvm_params[HVM_PARAM_VM86_TSS]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_ACPI_IOPORTS_LOCATION,
+                     &hvm_params[HVM_PARAM_ACPI_IOPORTS_LOCATION]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_VIRIDIAN,
+                     &hvm_params[HVM_PARAM_VIRIDIAN]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_PAE_ENABLED,
+                     &hvm_params[HVM_PARAM_PAE_ENABLED]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_STORE_EVTCHN,
+                     &hvm_params[HVM_PARAM_STORE_EVTCHN]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_IOREQ_SERVER_PFN,
+                     &hvm_params[HVM_PARAM_IOREQ_SERVER_PFN]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
+                     &hvm_params[HVM_PARAM_NR_IOREQ_SERVER_PAGES]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_VM_GENERATION_ID_ADDR,
+                     &hvm_params[HVM_PARAM_VM_GENERATION_ID_ADDR]);
+
+    rc = xc_domain_devour(xch, source_dom, dest_dom);
+    if ( rc != 0 )
+    {
+        PERROR("failed to devour original domain, rc=%d", rc);
+        goto out;
+    }
+
+    while ( 1 )
+    {
+        sleep(SLEEP_INT);
+        if ( xc_get_tot_pages(xch, source_dom) <= 0 )
+        {
+            DPRINTF("All pages were transferred");
+            break;
+        }
+    }
+
+
+    if ( sharedinfo_pfn == XEN_DOMCTL_PFINFO_XTAB )
+    {
+        /*
+         * The shared info frame is removed from the physmap when the guest
+         * maps shared info, so this page is likely XEN_DOMCTL_PFINFO_XTAB;
+         * in that case we need to replace it with an empty page.
+         */
+
+        if ( xc_domain_populate_physmap_exact(xch, dest_dom, 1, 0, 0,
+                                              &old_info.shared_info_frame) )
+        {
+            PERROR("failed to populate pfn %lx (shared info)", old_info.shared_info_frame);
+            goto out;
+        }
+    }
+
+    if ( xc_domain_hvm_setcontext(xch, dest_dom, hvm_buf,
+                                  hvm_buf_size) == -1 )
+    {
+        PERROR("HVM:Could not set hvm buffer");
+        goto out;
+    }
+
+    if ( store_pfn )
+        xc_clear_domain_page(xch, dest_dom, store_pfn);
+
+    if ( console_pfn )
+        xc_clear_domain_page(xch, dest_dom, console_pfn);
+
+    if ( buffio_pfn )
+        xc_clear_domain_page(xch, dest_dom, buffio_pfn);
+
+    if ( io_pfn )
+        xc_clear_domain_page(xch, dest_dom, io_pfn);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_STORE_PFN,
+                     hvm_params[HVM_PARAM_STORE_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom,
+                     HVM_PARAM_CONSOLE_PFN,
+                     hvm_params[HVM_PARAM_CONSOLE_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_BUFIOREQ_PFN,
+                     hvm_params[HVM_PARAM_BUFIOREQ_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_IOREQ_PFN,
+                     hvm_params[HVM_PARAM_IOREQ_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_IDENT_PT,
+                     hvm_params[HVM_PARAM_IDENT_PT]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_PAGING_RING_PFN,
+                     hvm_params[HVM_PARAM_PAGING_RING_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_ACCESS_RING_PFN,
+                     hvm_params[HVM_PARAM_ACCESS_RING_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_VM86_TSS,
+                     hvm_params[HVM_PARAM_VM86_TSS]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_ACPI_IOPORTS_LOCATION,
+                     hvm_params[HVM_PARAM_ACPI_IOPORTS_LOCATION]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_VIRIDIAN,
+                     hvm_params[HVM_PARAM_VIRIDIAN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_PAE_ENABLED,
+                     hvm_params[HVM_PARAM_PAE_ENABLED]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_STORE_EVTCHN,
+                     hvm_params[HVM_PARAM_STORE_EVTCHN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_IOREQ_SERVER_PFN,
+                     hvm_params[HVM_PARAM_IOREQ_SERVER_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
+                     hvm_params[HVM_PARAM_NR_IOREQ_SERVER_PAGES]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_VM_GENERATION_ID_ADDR,
+                     hvm_params[HVM_PARAM_VM_GENERATION_ID_ADDR]);
+
+    if (xc_dom_gnttab_hvm_seed(xch, dest_dom, console_pfn, store_pfn,
+                               console_domid, store_domid))
+    {
+        PERROR("error seeding hvm grant table");
+        goto out;
+    }
+
+    rc = 0;
+out:
+    if (hvm_buf) free(hvm_buf);
+
+    if ( (rc != 0) && (dest_dom != 0) ) {
+        PERROR("Failed to perform soft reset, destroying domain %d",
+               dest_dom);
+        xc_domain_destroy(xch, dest_dom);
+    }
+
+    return !!rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.9.3


* [PATCH v5 8/9] libxl: soft reset support
  2014-12-11 13:45 [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (6 preceding siblings ...)
  2014-12-11 13:45 ` [PATCH v5 7/9] libxc: introduce soft reset for HVM domains Vitaly Kuznetsov
@ 2014-12-11 13:45 ` Vitaly Kuznetsov
  2015-01-13 14:21   ` Ian Campbell
  2014-12-11 13:45 ` [PATCH v5 9/9] xsm: add XEN_DOMCTL_devour support Vitaly Kuznetsov
  2015-01-05 12:46 ` [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec Wei Liu
  9 siblings, 1 reply; 39+ messages in thread
From: Vitaly Kuznetsov @ 2014-12-11 13:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu

Perform a soft reset when a domain requests SHUTDOWN_soft_reset. Migrate the
memory contents with xc_domain_soft_reset(), then reload the device model and
the toolstack state.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 docs/man/xl.cfg.pod.5        |  12 +++++
 tools/libxl/libxl.h          |   6 +++
 tools/libxl/libxl_create.c   | 103 +++++++++++++++++++++++++++++++++++++++----
 tools/libxl/libxl_internal.h |  26 +++++++++++
 tools/libxl/libxl_types.idl  |   3 ++
 tools/libxl/xl_cmdimpl.c     |  35 ++++++++++++++-
 6 files changed, 174 insertions(+), 11 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 622ea53..8b57643 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -306,6 +306,13 @@ destroy the domain.
 write a "coredump" of the domain to F</var/xen/dump/NAME> and then
 restart the domain.
 
+=item B<soft-reset>
+
+create a new domain with the same configuration, reassign all the domain's
+memory to this new domain, kill the original domain, and continue execution
+of the new domain from where the action was triggered. Supported for HVM
+guests only.
+
 =back
 
 The default for C<on_poweroff> is C<destroy>.
@@ -324,6 +331,11 @@ Default is C<destroy>.
 
 Action to take if the domain crashes.  Default is C<destroy>.
 
+=item B<on_soft_reset="ACTION">
+
+Action to take if the domain performs a 'soft reset' (e.g. via kexec).
+Default is C<soft-reset>.
+
 =back
 
 =head3 Direct Kernel Boot
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 0a123f1..710dc0e 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -929,6 +929,12 @@ int static inline libxl_domain_create_restore_0x040200(
 
 #endif
 
+int libxl_domain_soft_reset(libxl_ctx *ctx, libxl_domain_config *d_config,
+                            uint32_t *domid, uint32_t domid_old,
+                            const libxl_asyncop_how *ao_how,
+                            const libxl_asyncprogress_how *aop_console_how)
+                            LIBXL_EXTERNAL_CALLERS_ONLY;
+
   /* A progress report will be made via ao_console_how, of type
    * domain_create_console_available, when the domain's primary
    * console is available and can be connected to.
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 1198225..0a840c9 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -25,6 +25,8 @@
 #include <xen/hvm/hvm_info_table.h>
 #include <xen/hvm/e820.h>
 
+#define INVALID_DOMID ~0
+
 int libxl__domain_create_info_setdefault(libxl__gc *gc,
                                          libxl_domain_create_info *c_info)
 {
@@ -903,6 +905,9 @@ static void initiate_domain_create(libxl__egc *egc,
     if (restore_fd >= 0) {
         LOG(DEBUG, "restoring, not running bootloader");
         domcreate_bootloader_done(egc, &dcs->bl, 0);
+    } else if (dcs->domid_soft_reset != INVALID_DOMID) {
+        LOG(DEBUG, "soft reset, not running bootloader");
+        domcreate_bootloader_done(egc, &dcs->bl, 0);
     } else  {
         LOG(DEBUG, "running bootloader");
         dcs->bl.callback = domcreate_bootloader_done;
@@ -951,6 +956,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     libxl_domain_config *const d_config = dcs->guest_config;
     libxl_domain_build_info *const info = &d_config->b_info;
     const int restore_fd = dcs->restore_fd;
+    const uint32_t domid_soft_reset = dcs->domid_soft_reset;
     libxl__domain_build_state *const state = &dcs->build_state;
     libxl__srm_restore_autogen_callbacks *const callbacks =
         &dcs->shs.callbacks.restore.a;
@@ -974,7 +980,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->dmss.dm.callback = domcreate_devmodel_started;
     dcs->dmss.callback = domcreate_devmodel_started;
 
-    if ( restore_fd < 0 ) {
+    if ( (restore_fd < 0) && (domid_soft_reset == INVALID_DOMID) ) {
         rc = libxl__domain_build(gc, d_config, domid, state);
         domcreate_rebuild_done(egc, dcs, rc);
         return;
@@ -1004,14 +1010,74 @@ static void domcreate_bootloader_done(libxl__egc *egc,
         rc = ERROR_INVAL;
         goto out;
     }
-    libxl__xc_domain_restore(egc, dcs,
-                             hvm, pae, superpages);
+    if ( restore_fd >= 0 ) {
+        libxl__xc_domain_restore(egc, dcs,
+                                 hvm, pae, superpages);
+    } else {
+        libxl__xc_domain_soft_reset(egc, dcs);
+    }
+
     return;
 
  out:
     libxl__xc_domain_restore_done(egc, dcs, rc, 0, 0);
 }
 
+void libxl__xc_domain_soft_reset(libxl__egc *egc,
+                                 libxl__domain_create_state *dcs)
+{
+    STATE_AO_GC(dcs->ao);
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+    const uint32_t domid_soft_reset = dcs->domid_soft_reset;
+    const uint32_t domid = dcs->guest_domid;
+    libxl_domain_config *const d_config = dcs->guest_config;
+    libxl_domain_build_info *const info = &d_config->b_info;
+    uint8_t *buf;
+    uint32_t len;
+    uint32_t console_domid, store_domid;
+    unsigned long store_mfn, console_mfn;
+    int rc;
+    struct libxl__domain_suspend_state *dss;
+
+    GCNEW(dss);
+
+    dss->ao = ao;
+    dss->domid = domid_soft_reset;
+    dss->dm_savefile = GCSPRINTF("/var/lib/xen/qemu-save.%d",
+                                 domid_soft_reset);
+
+    if (info->type == LIBXL_DOMAIN_TYPE_HVM) {
+        rc = libxl__domain_suspend_device_model(gc, dss);
+        if (rc) goto out;
+    }
+
+    console_domid = dcs->build_state.console_domid;
+    store_domid = dcs->build_state.store_domid;
+
+    libxl__domain_soft_reset_destroy(gc, domid_soft_reset, 0);
+
+    rc = xc_domain_soft_reset(ctx->xch, domid_soft_reset, domid, console_domid,
+                              &console_mfn, store_domid, &store_mfn);
+    if (rc) goto out;
+
+    libxl__qmp_cleanup(gc, domid_soft_reset);
+
+    dcs->build_state.store_mfn = store_mfn;
+    dcs->build_state.console_mfn = console_mfn;
+
+    rc = libxl__toolstack_save(domid_soft_reset, &buf, &len, dss);
+    if (rc) goto out;
+
+    rc = libxl__toolstack_restore(domid, buf, len, &dcs->shs);
+    if (rc) goto out;
+out:
+    /*
+     * Now pretend we did normal restore and simply call
+     * libxl__xc_domain_restore_done().
+     */
+    libxl__xc_domain_restore_done(egc, dcs, rc, 0, 0);
+}
+
 void libxl__srm_callout_callback_restore_results(unsigned long store_mfn,
           unsigned long console_mfn, void *user)
 {
@@ -1037,6 +1103,7 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
 
     /* convenience aliases */
     const uint32_t domid = dcs->guest_domid;
+    const uint32_t domid_soft_reset = dcs->domid_soft_reset;
     libxl_domain_config *const d_config = dcs->guest_config;
     libxl_domain_build_info *const info = &d_config->b_info;
     libxl__domain_build_state *const state = &dcs->build_state;
@@ -1089,9 +1156,12 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
     if (ret)
         goto out;
 
-    if (info->type == LIBXL_DOMAIN_TYPE_HVM) {
+    if (info->type == LIBXL_DOMAIN_TYPE_HVM && fd != -1) {
         state->saved_state = GCSPRINTF(
                        XC_DEVICE_MODEL_RESTORE_FILE".%d", domid);
+    } else if (domid_soft_reset != INVALID_DOMID) {
+        state->saved_state = GCSPRINTF(
+                       "/var/lib/xen/qemu-save.%d", domid_soft_reset);
     }
 
 out:
@@ -1100,9 +1170,12 @@ out:
         libxl__file_reference_unmap(&state->pv_ramdisk);
     }
 
-    esave = errno;
-    libxl_fd_set_nonblock(ctx, fd, 0);
-    errno = esave;
+    if ( fd != -1 ) {
+        esave = errno;
+        libxl_fd_set_nonblock(ctx, fd, 0);
+        errno = esave;
+    }
+
     domcreate_rebuild_done(egc, dcs, ret);
 }
 
@@ -1495,6 +1568,7 @@ static void domain_create_cb(libxl__egc *egc,
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
                             uint32_t *domid,
                             int restore_fd, int checkpointed_stream,
+                            uint32_t domid_old,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
@@ -1507,6 +1581,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
     libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
     cdcs->dcs.restore_fd = restore_fd;
+    cdcs->dcs.domid_soft_reset = domid_old;
     cdcs->dcs.callback = domain_create_cb;
     cdcs->dcs.checkpointed_stream = checkpointed_stream;
     libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how);
@@ -1535,7 +1610,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
-    return do_domain_create(ctx, d_config, domid, -1, 0,
+    return do_domain_create(ctx, d_config, domid, -1, 0, INVALID_DOMID,
                             ao_how, aop_console_how);
 }
 
@@ -1546,7 +1621,17 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 const libxl_asyncprogress_how *aop_console_how)
 {
     return do_domain_create(ctx, d_config, domid, restore_fd,
-                            params->checkpointed_stream, ao_how, aop_console_how);
+                            params->checkpointed_stream, INVALID_DOMID,
+                            ao_how, aop_console_how);
+}
+
+int libxl_domain_soft_reset(libxl_ctx *ctx, libxl_domain_config *d_config,
+                            uint32_t *domid, uint32_t domid_old,
+                            const libxl_asyncop_how *ao_how,
+                            const libxl_asyncprogress_how *aop_console_how)
+{
+    return do_domain_create(ctx, d_config, domid, -1, 0, domid_old,
+                            ao_how, aop_console_how);
 }
 
 /*
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index d58f08a..b149c51 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3069,6 +3069,7 @@ struct libxl__domain_create_state {
     libxl_domain_config *guest_config;
     libxl_domain_config guest_config_saved; /* vanilla config */
     int restore_fd;
+    uint32_t domid_soft_reset;
     libxl__domain_create_cb *callback;
     libxl_asyncprogress_how aop_console_how;
     /* private to domain_create */
@@ -3133,6 +3134,31 @@ _hidden void libxl__domain_save_device_model(libxl__egc *egc,
 
 _hidden const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t domid);
 
+/*
+ * Soft reset is a special type of reset: the new domain is built with the
+ * memory contents and vCPU contexts of the original domain, allowing
+ * execution to continue from the point where the hypercall was made. This is
+ * supported for HVM domains only and is implemented as a modification of the
+ * domain create / restore path within xl:
+ * domcreate_bootloader_done()
+ *     libxl__xc_domain_soft_reset()
+ *         for the original domain:
+ *           libxl__domain_suspend_device_model()
+ *           libxl__domain_soft_reset_destroy()
+ *         xc_domain_soft_reset(), which saves the HVM context and HVM params,
+ *             calls XEN_DOMCTL_devour to destroy the original domain, waits
+ *             until all memory has migrated to the new domain, and restores
+ *             the HVM context and HVM params for the new domain
+ *         for the new domain:
+ *           libxl__toolstack_save()
+ *           libxl__toolstack_restore()
+ *         libxl__xc_domain_restore_done() and following the standard 'restore'
+ *             path.
+ */
+
+_hidden void libxl__xc_domain_soft_reset(libxl__egc *egc,
+                                         libxl__domain_create_state *dcs);
+
 _hidden int libxl__domain_soft_reset_destroy(libxl__gc *gc, uint32_t domid,
                                              const libxl_asyncop_how *ao_how);
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 4a0e2be..10ef652 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -121,6 +121,8 @@ libxl_action_on_shutdown = Enumeration("action_on_shutdown", [
 
     (5, "COREDUMP_DESTROY"),
     (6, "COREDUMP_RESTART"),
+
+    (7, "SOFT_RESET"),
     ], init_val = "LIBXL_ACTION_ON_SHUTDOWN_DESTROY")
 
 libxl_trigger = Enumeration("trigger", [
@@ -558,6 +560,7 @@ libxl_domain_config = Struct("domain_config", [
     ("on_reboot", libxl_action_on_shutdown),
     ("on_watchdog", libxl_action_on_shutdown),
     ("on_crash", libxl_action_on_shutdown),
+    ("on_soft_reset", libxl_action_on_shutdown),
     ], dir=DIR_IN)
 
 libxl_diskinfo = Struct("diskinfo", [
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index b193c3c..d3ce409 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -130,6 +130,8 @@ static const char *action_on_shutdown_names[] = {
 
     [LIBXL_ACTION_ON_SHUTDOWN_COREDUMP_DESTROY] = "coredump-destroy",
     [LIBXL_ACTION_ON_SHUTDOWN_COREDUMP_RESTART] = "coredump-restart",
+
+    [LIBXL_ACTION_ON_SHUTDOWN_SOFT_RESET] = "soft-reset",
 };
 
 /* Optional data, in order:
@@ -1067,6 +1069,13 @@ static void parse_config_data(const char *config_source,
         exit(1);
     }
 
+    if (xlu_cfg_get_string (config, "on_soft_reset", &buf, 0))
+        buf = "soft-reset";
+    if (!parse_action_on_shutdown(buf, &d_config->on_soft_reset)) {
+        fprintf(stderr, "Unknown on_soft_reset action \"%s\" specified\n", buf);
+        exit(1);
+    }
+
     /* libxl_get_required_shadow_memory() must be called after final values
      * (default or specified) for vcpus and memory are set, because the
      * calculation depends on those values. */
@@ -2052,7 +2061,8 @@ static void reload_domain_config(uint32_t domid,
 }
 
 /* Returns 1 if domain should be restarted,
- * 2 if domain should be renamed then restarted, or 0
+ * 2 if domain should be renamed then restarted,
+ * 3 if domain performed soft reset, or 0
  * Can update r_domid if domain is destroyed etc */
 static int handle_domain_death(uint32_t *r_domid,
                                libxl_event *event,
@@ -2078,6 +2088,9 @@ static int handle_domain_death(uint32_t *r_domid,
     case LIBXL_SHUTDOWN_REASON_WATCHDOG:
         action = d_config->on_watchdog;
         break;
+    case LIBXL_SHUTDOWN_REASON_SOFT_RESET:
+        action = d_config->on_soft_reset;
+        break;
     default:
         LOG("Unknown shutdown reason code %d. Destroying domain.",
             event->u.domain_shutdown.shutdown_reason);
@@ -2128,6 +2141,11 @@ static int handle_domain_death(uint32_t *r_domid,
         *r_domid = INVALID_DOMID;
         break;
 
+    case LIBXL_ACTION_ON_SHUTDOWN_SOFT_RESET:
+        reload_domain_config(*r_domid, d_config);
+        restart = 3;
+        break;
+
     case LIBXL_ACTION_ON_SHUTDOWN_COREDUMP_DESTROY:
     case LIBXL_ACTION_ON_SHUTDOWN_COREDUMP_RESTART:
         /* Already handled these above. */
@@ -2294,6 +2312,7 @@ static void evdisable_disk_ejects(libxl_evgen_disk_eject **diskws,
 static uint32_t create_domain(struct domain_create *dom_info)
 {
     uint32_t domid = INVALID_DOMID;
+    uint32_t domid_old = INVALID_DOMID;
 
     libxl_domain_config d_config;
 
@@ -2519,7 +2538,17 @@ start:
          * restore/migrate-receive it again.
          */
         restoring = 0;
-    }else{
+    } else if (domid_old != INVALID_DOMID) {
+        /* Do soft reset */
+        ret = libxl_domain_soft_reset(ctx, &d_config,
+                                      &domid, domid_old,
+                                      0, 0);
+
+        if (ret)
+            goto error_out;
+
+        domid_old = INVALID_DOMID;
+    } else {
         ret = libxl_domain_create_new(ctx, &d_config, &domid,
                                       0, autoconnect_console_how);
     }
@@ -2583,6 +2612,8 @@ start:
                 event->u.domain_shutdown.shutdown_reason,
                 event->u.domain_shutdown.shutdown_reason);
             switch (handle_domain_death(&domid, event, &d_config)) {
+            case 3:
+                domid_old = domid; /* fall through */
             case 2:
                 if (!preserve_domain(&domid, event, &d_config)) {
                     /* If we fail then exit leaving the old domain in place. */
-- 
1.9.3


* [PATCH v5 9/9] xsm: add XEN_DOMCTL_devour support
  2014-12-11 13:45 [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (7 preceding siblings ...)
  2014-12-11 13:45 ` [PATCH v5 8/9] libxl: soft reset support Vitaly Kuznetsov
@ 2014-12-11 13:45 ` Vitaly Kuznetsov
  2014-12-18 13:59   ` Jan Beulich
  2015-01-05 12:46 ` [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec Wei Liu
  9 siblings, 1 reply; 39+ messages in thread
From: Vitaly Kuznetsov @ 2014-12-11 13:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 xen/common/domctl.c                 |  6 ++++++
 xen/include/xsm/dummy.h             |  6 ++++++
 xen/include/xsm/xsm.h               |  6 ++++++
 xen/xsm/dummy.c                     |  1 +
 xen/xsm/flask/hooks.c               | 17 +++++++++++++++++
 xen/xsm/flask/policy/access_vectors | 10 ++++++++++
 6 files changed, 46 insertions(+)

diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 7e7fb47..7c22e35 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -1190,6 +1190,12 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
                 break;
             }
 
+            ret = xsm_devour(XSM_HOOK, d, recipient_dom);
+            if ( ret ) {
+                put_domain(recipient_dom);
+                break;
+            }
+
             if ( recipient_dom->tot_pages != 0 )
             {
                 put_domain(recipient_dom);
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index f20e89c..6e9e38b 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -113,6 +113,12 @@ static XSM_INLINE int xsm_set_target(XSM_DEFAULT_ARG struct domain *d, struct do
     return xsm_default_action(action, current->domain, NULL);
 }
 
+static XSM_INLINE int xsm_devour(XSM_DEFAULT_ARG struct domain *d, struct domain *e)
+{
+    XSM_ASSERT_ACTION(XSM_HOOK);
+    return xsm_default_action(action, current->domain, NULL);
+}
+
 static XSM_INLINE int xsm_domctl(XSM_DEFAULT_ARG struct domain *d, int cmd)
 {
     XSM_ASSERT_ACTION(XSM_OTHER);
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 4ce089f..7db7433 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -58,6 +58,7 @@ struct xsm_operations {
     int (*domctl_scheduler_op) (struct domain *d, int op);
     int (*sysctl_scheduler_op) (int op);
     int (*set_target) (struct domain *d, struct domain *e);
+    int (*devour) (struct domain *d, struct domain *e);
     int (*domctl) (struct domain *d, int cmd);
     int (*sysctl) (int cmd);
     int (*readconsole) (uint32_t clear);
@@ -213,6 +214,11 @@ static inline int xsm_set_target (xsm_default_t def, struct domain *d, struct do
     return xsm_ops->set_target(d, e);
 }
 
+static inline int xsm_devour (xsm_default_t def, struct domain *d, struct domain *r)
+{
+    return xsm_ops->devour(d, r);
+}
+
 static inline int xsm_domctl (xsm_default_t def, struct domain *d, int cmd)
 {
     return xsm_ops->domctl(d, cmd);
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 8eb3050..f3c2f9e 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -35,6 +35,7 @@ void xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, domctl_scheduler_op);
     set_to_dummy_if_null(ops, sysctl_scheduler_op);
     set_to_dummy_if_null(ops, set_target);
+    set_to_dummy_if_null(ops, devour);
     set_to_dummy_if_null(ops, domctl);
     set_to_dummy_if_null(ops, sysctl);
     set_to_dummy_if_null(ops, readconsole);
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index d48463f..097c8c2 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -565,6 +565,21 @@ static int flask_set_target(struct domain *d, struct domain *t)
     return rc;
 }
 
+static int flask_devour(struct domain *d, struct domain *r)
+{
+    int rc;
+
+    rc = current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__SET_AS_SOURCE);
+    if ( rc )
+        return rc;
+
+    /* r may be NULL when no recipient is set. */
+    if ( r )
+        rc = current_has_perm(r, SECCLASS_DOMAIN2, DOMAIN2__SET_AS_RECIPIENT);
+
+    return rc;
+}
+
 static int flask_domctl(struct domain *d, int cmd)
 {
     switch ( cmd )
@@ -580,6 +595,7 @@ static int flask_domctl(struct domain *d, int cmd)
 #ifdef HAS_MEM_ACCESS
     case XEN_DOMCTL_mem_event_op:
 #endif
+    case XEN_DOMCTL_devour:
 #ifdef CONFIG_X86
     /* These have individual XSM hooks (arch/x86/domctl.c) */
     case XEN_DOMCTL_shadow_op:
@@ -1512,6 +1528,7 @@ static struct xsm_operations flask_ops = {
     .domctl_scheduler_op = flask_domctl_scheduler_op,
     .sysctl_scheduler_op = flask_sysctl_scheduler_op,
     .set_target = flask_set_target,
+    .devour = flask_devour,
     .domctl = flask_domctl,
     .sysctl = flask_sysctl,
     .readconsole = flask_readconsole,
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 1da9f63..64c3424 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -142,6 +142,8 @@ class domain
 #  target = the new target domain
 # see also the domain2 make_priv_for and set_as_target checks
     set_target
+# XEN_DOMCTL_devour
+    devour
 # SCHEDOP_remote_shutdown
     shutdown
 # XEN_DOMCTL_set{,_machine}_address_size
@@ -196,6 +198,14 @@ class domain2
 #  source = the domain making the hypercall
 #  target = the new target domain
     set_as_target
+# checked in XEN_DOMCTL_devour:
+#  source = the domain making the hypercall
+#  target = the new source domain
+    set_as_source
+# checked in XEN_DOMCTL_devour:
+#  source = the domain making the hypercall
+#  target = the new recipient domain
+    set_as_recipient
 # XEN_DOMCTL_set_cpuid
     set_cpuid
 # XEN_DOMCTL_gettscinfo
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 1/9] xen: introduce DOMDYING_locked state
  2014-12-11 13:45 ` [PATCH v5 1/9] xen: introduce DOMDYING_locked state Vitaly Kuznetsov
@ 2014-12-18 13:23   ` Jan Beulich
  0 siblings, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2014-12-18 13:23 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	Tim Deegan, David Vrabel, xen-devel, Keir Fraser

>>> On 11.12.14 at 14:45, <vkuznets@redhat.com> wrote:
> New dying state is required to indicate that a particular domain
> is dying but cleanup procedure wasn't started. This state can be
> set from outside of domain_kill().

Without any user of the new state (yet), please be more verbose
here to explain what the intended use is (also allowing to
understand the naming choice).

Jan

* Re: [PATCH v5 2/9] xen: introduce SHUTDOWN_soft_reset shutdown reason
  2014-12-11 13:45 ` [PATCH v5 2/9] xen: introduce SHUTDOWN_soft_reset shutdown reason Vitaly Kuznetsov
@ 2014-12-18 13:28   ` Jan Beulich
  2015-01-13 12:20   ` Ian Campbell
  1 sibling, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2014-12-18 13:28 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	Tim Deegan, David Vrabel, xen-devel, Keir Fraser

>>> On 11.12.14 at 14:45, <vkuznets@redhat.com> wrote:
> --- a/xen/common/shutdown.c
> +++ b/xen/common/shutdown.c
> @@ -71,6 +71,13 @@ void hwdom_shutdown(u8 reason)
>          break; /* not reached */
>      }
>  
> +    case SHUTDOWN_soft_reset:
> +    {
> +        printk("Domain 0 did soft reset but it is unsupported, rebooting.\n");

Please shorten the message by e.g. dropping "did" and "but it is",
and please also don't add pointless braces around this code (no
matter that other code above/below does so - actually the whole
function could use an overhaul, dropping these braces and
replacing the mention of "Domain 0" with either "Hardware domain"
or "Dom%d" using hwdom's correct ID).

Jan

* Re: [PATCH v5 4/9] xen: introduce XEN_DOMCTL_devour
  2014-12-11 13:45 ` [PATCH v5 4/9] xen: introduce XEN_DOMCTL_devour Vitaly Kuznetsov
@ 2014-12-18 13:57   ` Jan Beulich
  2015-01-13 12:26     ` Ian Campbell
  2015-01-13 13:53   ` Ian Campbell
  1 sibling, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2014-12-18 13:57 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	Tim Deegan, David Vrabel, xen-devel, Keir Fraser

>>> On 11.12.14 at 14:45, <vkuznets@redhat.com> wrote:
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -1177,6 +1177,39 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>      }
>      break;
>  
> +    case XEN_DOMCTL_devour:
> +    {
> +        struct domain *recipient_dom;
> +
> +        if ( !d->recipient )
> +        {
> +            recipient_dom = get_domain_by_id(op->u.devour.recipient);
> +            if ( recipient_dom == NULL )
> +            {
> +                ret = -ESRCH;
> +                break;
> +            }
> +
> +            if ( recipient_dom->tot_pages != 0 )
> +            {
> +                put_domain(recipient_dom);
> +                ret = -EINVAL;
> +                break;
> +            }
> +            /*
> +             * Make sure no allocation/remapping is ongoing and set is_dying
> +             * flag to prevent such actions in future.
> +             */
> +            spin_lock(&d->page_alloc_lock);
> +            d->is_dying = DOMDYING_locked;
> +            d->recipient = recipient_dom;

Is d == recipient_dom a valid case (not leading to any issues)?

> +            smp_wmb(); /* make sure recipient was set before domain_kill() */

The spin_unlock() guarantees ordering already.

> +            spin_unlock(&d->page_alloc_lock);
> +        }
> +        ret = domain_kill(d);

Do you really mean to simply kill the domain when d->recipient was
already set on entry?

Also I would strongly suggest having this fall through into the
XEN_DOMCTL_destroydomain case - just look there for what code
you're missing right now. Of course the continuation then
shouldn't go through the whole if() above anymore, i.e. you may
want to permit the operation to succeed when d->recipient ==
recipient_dom.

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1707,6 +1707,7 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
>  {
>      struct domain *d = page_get_owner(pg);
>      unsigned int i;
> +    unsigned long mfn, gmfn;

Please add these variables in the scope you need them in.

> @@ -1764,13 +1765,32 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
>              scrub = 1;
>          }
>  
> -        if ( unlikely(scrub) )
> -            for ( i = 0; i < (1 << order); i++ )
> -                scrub_one_page(&pg[i]);
> +        if ( !d || !d->recipient || d->recipient->is_dying )
> +        {
> +            if ( unlikely(scrub) )
> +                for ( i = 0; i < (1 << order); i++ )
> +                    scrub_one_page(&pg[i]);
>  
> -        free_heap_pages(pg, order);
> +            free_heap_pages(pg, order);
> +        }
> +        else
> +        {
> +            mfn = page_to_mfn(pg);
> +            gmfn = mfn_to_gmfn(d, mfn);
> +
> +            page_set_owner(pg, NULL);

This needs to be done more than once when order > 0, or else
you may trigger an ASSERT() in assign_pages().
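
Jan's per-page point can be illustrated with a standalone mock (struct
mock_page and clear_owner_run are invented for this sketch and are not
Xen's real types):

```c
#include <assert.h>
#include <stddef.h>

struct mock_page { void *owner; };

static void page_set_owner(struct mock_page *pg, void *owner)
{
    pg->owner = owner;
}

static void clear_owner_run(struct mock_page *pg, unsigned int order)
{
    /* With order > 0 the allocation covers 1 << order contiguous
     * pages, so every page in the run must have its owner cleared
     * before the whole run is handed to assign_pages(). */
    for ( unsigned long i = 0; i < (1UL << order); i++ )
        page_set_owner(&pg[i], NULL);
}
```

Clearing only pg[0], as the hunk above effectively does, would leave
pages 1..(1 << order)-1 with a stale owner.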

> +            if ( assign_pages(d->recipient, pg, order, 0) )
> +                /* assign_pages reports the error by itself */
> +                goto out;
> +
> +            if ( guest_physmap_add_page(d->recipient, gmfn, mfn, order) )
> +                printk(XENLOG_G_INFO
> +                       "Failed to add MFN %lx (GFN %lx) to Dom%d's physmap\n",
> +                       mfn, gmfn, d->recipient->domain_id);

And that's it? I understand that you can't propagate the error, but
wouldn't you better crash the recipient domain instead of leaving it
in an inconsistent state?

Also the message would better be more specific - e.g.

"Failed to re-add Dom%d's GFN %lx (MFN %lx) to Dom%d\n"

> +        }
>      }
>  
> +out:

You could do well without this label (in particular I think what you add
as else path would better move into to if() case several lines up, in
which case the use of a label may then indeed be warranted), but if
you really need one, please make sure it is indented by at least one
space.

Jan

* Re: [PATCH v5 9/9] xsm: add XEN_DOMCTL_devour support
  2014-12-11 13:45 ` [PATCH v5 9/9] xsm: add XEN_DOMCTL_devour support Vitaly Kuznetsov
@ 2014-12-18 13:59   ` Jan Beulich
  0 siblings, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2014-12-18 13:59 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	Tim Deegan, David Vrabel, xen-devel, Keir Fraser

>>> On 11.12.14 at 14:45, <vkuznets@redhat.com> wrote:
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -1190,6 +1190,12 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>                  break;
>              }
>  
> +            ret = xsm_devour(XSM_HOOK, d, recipient_dom);
> +            if ( ret ) {

Provided such a hook is needed in the first place, please move the
opening brace on its own line.

Jan

* Re: [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
  2014-12-11 13:45 [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (8 preceding siblings ...)
  2014-12-11 13:45 ` [PATCH v5 9/9] xsm: add XEN_DOMCTL_devour support Vitaly Kuznetsov
@ 2015-01-05 12:46 ` Wei Liu
  2015-01-05 13:00   ` Vitaly Kuznetsov
  9 siblings, 1 reply; 39+ messages in thread
From: Wei Liu @ 2015-01-05 12:46 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Wei Liu, Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, xen-devel

Olaf mentioned his concern about handling ballooned pages in
<20141211153029.GA1772@aepfle.de>. Is that point moot now?

Wei.

* Re: [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
  2015-01-05 12:46 ` [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec Wei Liu
@ 2015-01-05 13:00   ` Vitaly Kuznetsov
  2015-01-07  9:10     ` Olaf Hering
  0 siblings, 1 reply; 39+ messages in thread
From: Vitaly Kuznetsov @ 2015-01-05 13:00 UTC (permalink / raw)
  To: Wei Liu
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, xen-devel

Wei Liu <wei.liu2@citrix.com> writes:

> Olaf mentioned his concern about handling ballooned pages in
> <20141211153029.GA1772@aepfle.de>. Is that point moot now?

Well, the limitation is real and some guest-side handling will be
required in case we want to support kexec with ballooning. But as David
validly mentioned "It's the responsibility of the guest to ensure it
either doesn't kexec when it is ballooned or that the kexec kernel can
handle this". Not sure if we can (and need to) do anything hypervisor- or
toolstack-side.

Anyway, I have to address Jan's comments to my v5 series and I wanted to
play with ballooning a bit before sending v6. This work is pending.

-- 
  Vitaly

* Re: [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
  2015-01-05 13:00   ` Vitaly Kuznetsov
@ 2015-01-07  9:10     ` Olaf Hering
  2015-01-07 10:41       ` David Vrabel
  2015-01-07 10:49       ` Vitaly Kuznetsov
  0 siblings, 2 replies; 39+ messages in thread
From: Olaf Hering @ 2015-01-07  9:10 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, Julien Grall, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Tim Deegan,
	David Vrabel, Jan Beulich, xen-devel, Keir Fraser

On Mon, Jan 05, Vitaly Kuznetsov wrote:

> Wei Liu <wei.liu2@citrix.com> writes:
> 
> > Olaf mentioned his concern about handling ballooned pages in
> > <20141211153029.GA1772@aepfle.de>. Is that point moot now?
> 
> Well, the limitation is real and some guest-side handling will be
> required in case we want to support kexec with ballooning. But as David
> validly mentioned "It's the responsibility of the guest to ensure it
> either doesn't kexec when it is ballooned or that the kexec kernel can
> handle this". Not sure if we can (and need to) do anything hypervisor- or
> toolstack-side.

One approach would be to mark all pages as some sort of
populate-on-demand first. Then copy the existing assigned pages from
domA to domB and update the page type. The remaining pages are likely
ballooned. Once the guest tries to access them this should give the
hypervisor and/or toolstack a chance to assign a real RAM page to them.

I mean, if a host-assisted approach for kexec is implemented then this
approach must also cover ballooning.
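
The three steps Olaf describes (mark everything populate-on-demand,
copy the assigned pages, fault the rest in on first access) can be
modelled with a toy standalone sketch. Every name in it is invented
for illustration; Xen's real PoD code (arch/x86/mm/p2m-pod.c) works
quite differently:

```c
#include <assert.h>
#include <stdbool.h>

enum pod_type { P2M_POD, P2M_RAM };

struct toy_p2m { enum pod_type type[8]; int populated; };

static void toy_init(struct toy_p2m *p2m)
{
    for ( int g = 0; g < 8; g++ )
        p2m->type[g] = P2M_POD;       /* step 1: everything on-demand */
    p2m->populated = 0;
}

static void toy_copy_assigned(struct toy_p2m *p2m, const bool *assigned)
{
    for ( int g = 0; g < 8; g++ )
        if ( assigned[g] )            /* step 2: copy domA's real pages */
        {
            p2m->type[g] = P2M_RAM;
            p2m->populated++;
        }
}

static void toy_access(struct toy_p2m *p2m, int gfn)
{
    if ( p2m->type[gfn] == P2M_POD )  /* step 3: fault in ballooned GFNs */
    {
        p2m->type[gfn] = P2M_RAM;
        p2m->populated++;
    }
}
```

The GFNs never copied in step 2 are exactly the likely-ballooned ones;
only a later access (step 3) forces a real RAM page behind them.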


Olaf

* Re: [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
  2015-01-07  9:10     ` Olaf Hering
@ 2015-01-07 10:41       ` David Vrabel
  2015-01-07 10:54         ` Jan Beulich
                           ` (2 more replies)
  2015-01-07 10:49       ` Vitaly Kuznetsov
  1 sibling, 3 replies; 39+ messages in thread
From: David Vrabel @ 2015-01-07 10:41 UTC (permalink / raw)
  To: Olaf Hering, Vitaly Kuznetsov
  Cc: Andrew Jones, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Tim Deegan,
	David Vrabel, Jan Beulich, xen-devel, Keir Fraser

On 07/01/15 09:10, Olaf Hering wrote:
> On Mon, Jan 05, Vitaly Kuznetsov wrote:
> 
>> Wei Liu <wei.liu2@citrix.com> writes:
>>
>>> Olaf mentioned his concern about handling ballooned pages in
>>> <20141211153029.GA1772@aepfle.de>. Is that point moot now?
>>
>> Well, the limitation is real and some guest-side handling will be
>> required in case we want to support kexec with ballooning. But as David
>> validly mentioned "It's the responsibility of the guest to ensure it
>> either doesn't kexec when it is ballooned or that the kexec kernel can
>> handle this". Not sure if we can (and need to) do anything hypervisor- or
>> toolstack-side.
> 
> One approach would be to mark all pages as some sort of
> populate-on-demand first. Then copy the existing assigned pages from
> domA to domB and update the page type. The remaining pages are likely
> ballooned. Once the guest tries to access them this should give the
> hypervisor and/or toolstack a chance to assign a real RAM page to them.
> 
> I mean, if a host-assisted approach for kexec is implemented then this
> approach must also cover ballooning.

It is not possible for the hypervisor or toolstack to do what you want
because there may not be enough free memory to repopulate the new domain.

The guest can handle this by:

1. Not ballooning (this is common in cloud environments).
2. Reducing the balloon prior to kexec.
3. Running the kexec'd image in a reserved chunk of memory (the crash
kernel case).
4. Providing balloon information to the kexec'd image.

None of these require any additional hypervisor or toolstack support and
1-3 are trivial for a guest to implement.
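
Options 1 and 2 amount to a simple guest-side policy, sketched here
outside any real kernel (struct guest and balloon_reclaim_fn are
hypothetical stand-ins for the guest balloon driver):

```c
#include <assert.h>
#include <stdbool.h>

struct guest { long ballooned_pages; };

/* Stand-in for the balloon driver: returns how many of the requested
 * pages the hypervisor could actually give back. */
typedef long (*balloon_reclaim_fn)(struct guest *g, long want);

static bool kexec_allowed(struct guest *g, balloon_reclaim_fn reclaim)
{
    if ( g->ballooned_pages == 0 )
        return true;                     /* option 1: never ballooned */
    g->ballooned_pages -= reclaim(g, g->ballooned_pages);
    return g->ballooned_pages == 0;      /* option 2: deflate or refuse */
}
```

If the host cannot return every ballooned page, the guest refuses the
kexec instead of handing the new kernel an inconsistent memory map.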

David

* Re: [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
  2015-01-07  9:10     ` Olaf Hering
  2015-01-07 10:41       ` David Vrabel
@ 2015-01-07 10:49       ` Vitaly Kuznetsov
  2015-01-07 11:03         ` Olaf Hering
  1 sibling, 1 reply; 39+ messages in thread
From: Vitaly Kuznetsov @ 2015-01-07 10:49 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Andrew Jones, Julien Grall, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Tim Deegan,
	David Vrabel, Jan Beulich, xen-devel, Keir Fraser

Olaf Hering <olaf@aepfle.de> writes:

> On Mon, Jan 05, Vitaly Kuznetsov wrote:
>
>> Wei Liu <wei.liu2@citrix.com> writes:
>> 
>> > Olaf mentioned his concern about handling ballooned pages in
>> > <20141211153029.GA1772@aepfle.de>. Is that point moot now?
>> 
>> Well, the limitation is real and some guest-side handling will be
>> required in case we want to support kexec with ballooning. But as David
>> validly mentioned "It's the responsibility of the guest to ensure it
>> either doesn't kexec when it is ballooned or that the kexec kernel can
>> handle this". Not sure if we can (and need to) do anything hypervisor- or
>> toolstack-side.
>
> One approach would be to mark all pages as some sort of
> populate-on-demand first. Then copy the existing assigned pages from
> domA to domB and update the page type. The remaining pages are likely
> ballooned. Once the guest tries to access them this should give the
> hypervisor and/or toolstack a chance to assign a real RAM page to
> them.

The thing is .. we don't have these pages when kexec is being performed,
they are already ballooned out and the hypervisor doesn't have the
knowledge of which GFNs should be re-populated. I think it is possible
to keep track of all pages the guest balloons out for this purpose, but 
..

>
> I mean, if a host-assisted approach for kexec is implemented then this
> approach must also cover ballooning.

I don't see why solving the issue hypervisor-side is a must. When the
guest performs kdump we don't care about the ballooning as we have a
separate memory area which is supposed to have no ballooned-out
pages. When we do kexec nothing stops us from asking the balloon driver
to bring everything back; it is fine to perform non-trivial work before
kexec (e.g. we shut down all the devices).

But, as I said, I'll try playing with ballooning to make these thoughts
not purely theoretical.

-- 
  Vitaly

* Re: [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
  2015-01-07 10:41       ` David Vrabel
@ 2015-01-07 10:54         ` Jan Beulich
  2015-01-07 11:59           ` Vitaly Kuznetsov
  2015-01-07 11:01         ` Olaf Hering
  2015-01-13 12:18         ` Ian Campbell
  2 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2015-01-07 10:54 UTC (permalink / raw)
  To: Olaf Hering, David Vrabel, Vitaly Kuznetsov
  Cc: Andrew Jones, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones, Tim Deegan, xen-devel,
	Keir Fraser

>>> On 07.01.15 at 11:41, <david.vrabel@citrix.com> wrote:
> On 07/01/15 09:10, Olaf Hering wrote:
>> On Mon, Jan 05, Vitaly Kuznetsov wrote:
>> 
>>> Wei Liu <wei.liu2@citrix.com> writes:
>>>
>>>> Olaf mentioned his concern about handling ballooned pages in
>>>> <20141211153029.GA1772@aepfle.de>. Is that point moot now?
>>>
>>> Well, the limitation is real and some guest-side handling will be
>>> required in case we want to support kexec with ballooning. But as David
>>> validly mentioned "It's the responsibility of the guest to ensure it
>>> either doesn't kexec when it is ballooned or that the kexec kernel can
>>> handle this". Not sure if we can (and need to) do anything hypervisor- or
>>> toolstack-side.
>> 
>> One approach would be to mark all pages as some sort of
>> populate-on-demand first. Then copy the existing assigned pages from
>> domA to domB and update the page type. The remaining pages are likely
>> ballooned. Once the guest tries to access them this should give the
>> hypervisor and/or toolstack a chance to assign a real RAM page to them.
>> 
>> I mean, if a host-assisted approach for kexec is implemented then this
>> approach must also cover ballooning.
> 
> It is not possible for the hypervisor or toolstack to do what you want
> because there may not be enough free memory to repopulate the new domain.
> 
> The guest can handle this by:
> 
> 1. Not ballooning (this is common in cloud environments).
> 2. Reducing the balloon prior to kexec.

Which may fail because again there may not be enough memory to
claim back from the hypervisor.

Jan

> 3. Running the kexec'd image in a reserved chunk of memory (the crash
> kernel case).
> 4. Providing balloon information to the kexec'd image.
> 
> None of these require any additional hypervisor or toolstack support and
> 1-3 are trivial for a guest to implement.
> 
> David

* Re: [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
  2015-01-07 10:41       ` David Vrabel
  2015-01-07 10:54         ` Jan Beulich
@ 2015-01-07 11:01         ` Olaf Hering
  2015-01-13 12:18         ` Ian Campbell
  2 siblings, 0 replies; 39+ messages in thread
From: Olaf Hering @ 2015-01-07 11:01 UTC (permalink / raw)
  To: David Vrabel
  Cc: Andrew Jones, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Tim Deegan,
	Jan Beulich, xen-devel, Vitaly Kuznetsov, Keir Fraser

On Wed, Jan 07, David Vrabel wrote:

> 2. Reducing the balloon prior to kexec.

We carry a patch for kexec(1) which does balloon up before doing the
actual kexec call. I propose to get such a change into the upstream kexec
tools if that is indeed the way to go. The benefit is that the guest
waits until every ballooned page is populated. If the host is short on
memory then the guest will hang instead of crash after kexec.

https://build.opensuse.org/package/view_file/Kernel:kdump/kexec-tools/kexec-tools-xen-balloon-up.patch

Olaf

* Re: [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
  2015-01-07 10:49       ` Vitaly Kuznetsov
@ 2015-01-07 11:03         ` Olaf Hering
  2015-01-14 11:06           ` George Dunlap
  0 siblings, 1 reply; 39+ messages in thread
From: Olaf Hering @ 2015-01-07 11:03 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, Julien Grall, Wei Liu, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Tim Deegan,
	David Vrabel, Jan Beulich, xen-devel, Keir Fraser

On Wed, Jan 07, Vitaly Kuznetsov wrote:

> The thing is .. we don't have these pages when kexec is being performed,
> they are already ballooned out and the hypervisor doesn't have the
> knowledge of which GFNs should be re-populated. I think it is possible
> to keep track of all pages the guest balloons out for this purpose, but 
> ..

Have you tried to make the new guest a PoD guest? That way it may work
out of the box already.

Olaf

* Re: [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
  2015-01-07 10:54         ` Jan Beulich
@ 2015-01-07 11:59           ` Vitaly Kuznetsov
  0 siblings, 0 replies; 39+ messages in thread
From: Vitaly Kuznetsov @ 2015-01-07 11:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	Tim Deegan, David Vrabel, xen-devel, Keir Fraser

"Jan Beulich" <JBeulich@suse.com> writes:

>>>> On 07.01.15 at 11:41, <david.vrabel@citrix.com> wrote:
>> On 07/01/15 09:10, Olaf Hering wrote:
>>> On Mon, Jan 05, Vitaly Kuznetsov wrote:
>>> 
>>>> Wei Liu <wei.liu2@citrix.com> writes:
>>>>
>>>>> Olaf mentioned his concern about handling ballooned pages in
>>>>> <20141211153029.GA1772@aepfle.de>. Is that point moot now?
>>>>
>>>> Well, the limitation is real and some guest-side handling will be
>>>> required in case we want to support kexec with ballooning. But as David
>>>> validly mentioned "It's the responsibility of the guest to ensure it
>>>> either doesn't kexec when it is ballooned or that the kexec kernel can
>>>> handle this". Not sure if we can (and need to) do anything hypervisor- or
>>>> toolstack-side.
>>> 
>>> One approach would be to mark all pages as some sort of
>>> populate-on-demand first. Then copy the existing assigned pages from
>>> domA to domB and update the page type. The remaining pages are likely
>>> ballooned. Once the guest tries to access them this should give the
>>> hypervisor and/or toolstack a chance to assign a real RAM page to them.
>>> 
>>> I mean, if a host-assisted approach for kexec is implemented then this
>>> approach must also cover ballooning.
>> 
>> It is not possible for the hypervisor or toolstack to do what you want
>> because there may not be enough free memory to repopulate the new domain.
>> 
>> The guest can handle this by:
>> 
>> 1. Not ballooning (this is common in cloud environments).
>> 2. Reducing the balloon prior to kexec.
>
> Which may fail because again there may not be enough memory to
> claim back from the hypervisor.
>

Yes, but it may be better to cancel kexec at this point.

-- 
  Vitaly

* Re: [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
  2015-01-07 10:41       ` David Vrabel
  2015-01-07 10:54         ` Jan Beulich
  2015-01-07 11:01         ` Olaf Hering
@ 2015-01-13 12:18         ` Ian Campbell
  2 siblings, 0 replies; 39+ messages in thread
From: Ian Campbell @ 2015-01-13 12:18 UTC (permalink / raw)
  To: David Vrabel
  Cc: Olaf Hering, Wei Liu, Stefano Stabellini, Andrew Cooper,
	Julien Grall, Ian Jackson, Andrew Jones, Tim Deegan, Jan Beulich,
	xen-devel, Vitaly Kuznetsov, Keir Fraser

On Wed, 2015-01-07 at 10:41 +0000, David Vrabel wrote:
> On 07/01/15 09:10, Olaf Hering wrote:
> > On Mon, Jan 05, Vitaly Kuznetsov wrote:
> > 
> >> Wei Liu <wei.liu2@citrix.com> writes:
> >>
> >>> Olaf mentioned his concern about handling ballooned pages in
> >>> <20141211153029.GA1772@aepfle.de>. Is that point moot now?
> >>
> >> Well, the limitation is real and some guest-side handling will be
> >> required in case we want to support kexec with ballooning. But as David
> >> validly mentioned "It's the responsibility of the guest to ensure it
> >> either doesn't kexec when it is ballooned or that the kexec kernel can
> >> handle this". Not sure if we can (and need to) do anything hypervisor- or
> >> toolstack-side.
> > 
> > One approach would be to mark all pages as some sort of
> > populate-on-demand first. Then copy the existing assigned pages from
> > domA to domB and update the page type. The remaining pages are likely
> > ballooned. Once the guest tries to access them this should give the
> > hypervisor and/or toolstack a chance to assign a real RAM page to them.
> > 
> > I mean, if a host-assisted approach for kexec is implemented then this
> > approach must also cover ballooning.
> 
> It is not possible for the hypervisor or toolstack to do what you want
> because there may not be enough free memory to repopulate the new domain.
> 
> The guest can handle this by:
> 
> 1. Not ballooning (this is common in cloud environments).
> 2. Reducing the balloon prior to kexec.
> 3. Running the kexec'd image in a reserved chunk of memory (the crash
> kernel case).
> 4. Providing balloon information to the kexec'd image.

Is #4 not just some sort of special case of "provide a memory map to the
kexec'd kernel"? I suppose it's significantly more fine grained and
therefore doesn't fit into the usual data structures which kexec might
already support throwing over the fence.

(Olaf's suggestion to use PoD in a different subthread was interesting
too).

> None of these require any additional hypervisor or toolstack support and
> 1-3 are trivial for a guest to implement.
> 
> David

* Re: [PATCH v5 2/9] xen: introduce SHUTDOWN_soft_reset shutdown reason
  2014-12-11 13:45 ` [PATCH v5 2/9] xen: introduce SHUTDOWN_soft_reset shutdown reason Vitaly Kuznetsov
  2014-12-18 13:28   ` Jan Beulich
@ 2015-01-13 12:20   ` Ian Campbell
  1 sibling, 0 replies; 39+ messages in thread
From: Ian Campbell @ 2015-01-13 12:20 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Wei Liu, Andrew Jones, Julien Grall, Keir Fraser,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, xen-devel

On Thu, 2014-12-11 at 14:45 +0100, Vitaly Kuznetsov wrote:
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  xen/common/shutdown.c      | 7 +++++++
>  xen/include/public/sched.h | 3 ++-
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/common/shutdown.c b/xen/common/shutdown.c
> index 94d4c53..5c3a158 100644
> --- a/xen/common/shutdown.c
> +++ b/xen/common/shutdown.c
> @@ -71,6 +71,13 @@ void hwdom_shutdown(u8 reason)
>          break; /* not reached */
>      }
>  
> +    case SHUTDOWN_soft_reset:
> +    {
> +        printk("Domain 0 did soft reset but it is unsupported, rebooting.\n");
> +        machine_restart(0);
> +        break; /* not reached */

Would a host kexec be an appropriate response to this situation,
assuming a kernel was loaded? (My gut says no, but thought I would
mention it)

Ian.

* Re: [PATCH v5 3/9] libxl: support SHUTDOWN_soft_reset shutdown reason
  2014-12-11 13:45 ` [PATCH v5 3/9] libxl: support " Vitaly Kuznetsov
@ 2015-01-13 12:22   ` Ian Campbell
  2015-01-13 12:23   ` Ian Campbell
  1 sibling, 0 replies; 39+ messages in thread
From: Ian Campbell @ 2015-01-13 12:22 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Wei Liu, Andrew Jones, Julien Grall, Keir Fraser,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, xen-devel

On Thu, 2014-12-11 at 14:45 +0100, Vitaly Kuznetsov wrote:
> Use letter 't' to indicate a domain in such state.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  tools/libxl/libxl_types.idl       | 1 +
>  tools/libxl/xl_cmdimpl.c          | 2 +-
>  tools/python/xen/lowlevel/xl/xl.c | 1 +
>  3 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index f7fc695..4a0e2be 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -175,6 +175,7 @@ libxl_shutdown_reason = Enumeration("shutdown_reason", [
>      (2, "suspend"),
>      (3, "crash"),
>      (4, "watchdog"),
> +    (5, "soft_reset"),

Please add a LIBXL_HAVE_... define to libxl.h
(LIBXL_HAVE_SHUTDOWN_REASON_SOFT_RESET, I think). There are examples in
there to copy (and a comment describing why etc).

NB: xl and the python bindings are in tree and therefore you don't need
to actually use this #define in the other two hunks; it's for 3rd-party
uses of libxl.

* Re: [PATCH v5 3/9] libxl: support SHUTDOWN_soft_reset shutdown reason
  2014-12-11 13:45 ` [PATCH v5 3/9] libxl: support " Vitaly Kuznetsov
  2015-01-13 12:22   ` Ian Campbell
@ 2015-01-13 12:23   ` Ian Campbell
  1 sibling, 0 replies; 39+ messages in thread
From: Ian Campbell @ 2015-01-13 12:23 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Wei Liu, Andrew Jones, Julien Grall, Keir Fraser,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, xen-devel

On Thu, 2014-12-11 at 14:45 +0100, Vitaly Kuznetsov wrote:
> Use letter 't' to indicate a domain in such state.

For rese(t) I suppose? How about "(S)oft reset" or "soft (R)eset"?
(depending on how closely you think this operation aligns as a special
case of reset).

* Re: [PATCH v5 4/9] xen: introduce XEN_DOMCTL_devour
  2014-12-18 13:57   ` Jan Beulich
@ 2015-01-13 12:26     ` Ian Campbell
  0 siblings, 0 replies; 39+ messages in thread
From: Ian Campbell @ 2015-01-13 12:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Olaf Hering, Wei Liu, Stefano Stabellini, Andrew Cooper,
	Julien Grall, Ian Jackson, Andrew Jones, Tim Deegan,
	David Vrabel, xen-devel, Vitaly Kuznetsov, Keir Fraser

On Thu, 2014-12-18 at 13:57 +0000, Jan Beulich wrote:
> >>> On 11.12.14 at 14:45, <vkuznets@redhat.com> wrote:
> > --- a/xen/common/domctl.c
> > +++ b/xen/common/domctl.c
> > @@ -1177,6 +1177,39 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> >      }
> >      break;
> >  
> > +    case XEN_DOMCTL_devour:
> > +    {
> > +        struct domain *recipient_dom;
> > +
> > +        if ( !d->recipient )
> > +        {
> > +            recipient_dom = get_domain_by_id(op->u.devour.recipient);
> > +            if ( recipient_dom == NULL )
> > +            {
> > +                ret = -ESRCH;
> > +                break;
> > +            }
> > +
> > +            if ( recipient_dom->tot_pages != 0 )
> > +            {
> > +                put_domain(recipient_dom);
> > +                ret = -EINVAL;
> > +                break;
> > +            }
> > +            /*
> > +             * Make sure no allocation/remapping is ongoing and set is_dying
> > +             * flag to prevent such actions in future.
> > +             */
> > +            spin_lock(&d->page_alloc_lock);
> > +            d->is_dying = DOMDYING_locked;
> > +            d->recipient = recipient_dom;
> 
> Is d == recipient_dom a valid case (not leading to any issues)?

I suspect not, due to the restriction that the recipient not have any
domheap pages, if d didn't have any domheap pages this whole dance is a
bit unnecessary.

In any case it doesn't seem very useful, so it's much easier to just
avoid the issue by outlawing it.
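A minimal model of what outlawing it might look like in the domctl handler (the struct and error values below are stand-ins, not the real Xen definitions):

```c
#include <assert.h>
#include <stddef.h>

#define XERR_EINVAL (-22)
#define XERR_ESRCH  (-3)

/* Toy stand-in for Xen's struct domain; only the fields the check needs. */
struct domain {
    int domain_id;
    struct domain *recipient;
};

/* Sketch of the XEN_DOMCTL_devour entry path with d == recipient rejected. */
int devour_set_recipient(struct domain *d, struct domain *recipient_dom)
{
    if (recipient_dom == NULL)
        return XERR_ESRCH;           /* no such recipient domain */
    if (d == recipient_dom)
        return XERR_EINVAL;          /* self-devour: outlawed outright */
    d->recipient = recipient_dom;
    return 0;
}
```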

Ian.

* Re: [PATCH v5 4/9] xen: introduce XEN_DOMCTL_devour
  2014-12-11 13:45 ` [PATCH v5 4/9] xen: introduce XEN_DOMCTL_devour Vitaly Kuznetsov
  2014-12-18 13:57   ` Jan Beulich
@ 2015-01-13 13:53   ` Ian Campbell
  2015-01-13 14:48     ` Tim Deegan
  2015-01-13 16:17     ` Vitaly Kuznetsov
  1 sibling, 2 replies; 39+ messages in thread
From: Ian Campbell @ 2015-01-13 13:53 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Wei Liu, Andrew Jones, Julien Grall, Keir Fraser,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, xen-devel

On Thu, 2014-12-11 at 14:45 +0100, Vitaly Kuznetsov wrote:
> +            gmfn = mfn_to_gmfn(d, mfn);

(I haven't thought about it super hard, but I'm taking it as given that
this approach to kexec is going to be needed for ARM too, since that
seems likely)

mfn_to_gmfn is going to be a bit pricey on ARM, we don't have an m2p to
refer to, I'm not sure what we would do instead, walking the p2m looking
for mfns surely won't be a good idea!

An alternative approach to this might be to walk the guest p2m (with
appropriate continuations) and move each domheap page (this would also
help us preserve super page mappings). It would also have the advantage
of not needing additional stages in the destroy path and state in struct
domain etc, since all the action would be constrained to the one
hypercall.
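A toy model of that shape — do a bounded batch of work per invocation and resume from a saved cursor, the way a hypercall continuation would (the flat array "p2m" and all names here are illustrative, not real Xen code):

```c
#include <assert.h>

#define P2M_SIZE   10
#define BATCH      4
#define OWNER_OLD  1
#define OWNER_NEW  2

/* Toy p2m: index = gfn, value = current owner of the backing domheap page. */
static int page_owner[P2M_SIZE];

/* Move up to BATCH pages per call, recording progress in *gfn_cursor.
 * Returns 1 if a "continuation" is needed, 0 when the walk is complete. */
int devour_walk(int *gfn_cursor)
{
    int done = 0;
    while (*gfn_cursor < P2M_SIZE) {
        if (done == BATCH)
            return 1;                         /* continuation point */
        page_owner[*gfn_cursor] = OWNER_NEW;  /* reassign this page */
        (*gfn_cursor)++;
        done++;
    }
    return 0;
}
```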

x86 folks, would that work for your p2m too?

Ian.

* Re: [PATCH v5 6/9] libxl: add libxl__domain_soft_reset_destroy()
  2014-12-11 13:45 ` [PATCH v5 6/9] libxl: add libxl__domain_soft_reset_destroy() Vitaly Kuznetsov
@ 2015-01-13 13:58   ` Ian Campbell
  0 siblings, 0 replies; 39+ messages in thread
From: Ian Campbell @ 2015-01-13 13:58 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Wei Liu, Andrew Jones, Julien Grall, Keir Fraser,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, xen-devel

On Thu, 2014-12-11 at 14:45 +0100, Vitaly Kuznetsov wrote:
> New libxl__domain_soft_reset_destroy() is an internal-only
> version of libxl_domain_destroy() which follows the same domain
> destroy path with the only difference: xc_domain_destroy() is
> being avoided so the domain is not actually being destroyed.

Rather than duplicating the bulk of libxl_domain_destroy, please make
this libxl__domain_destroy taking a flag and turn libxl_domain_destroy
into a thin wrapper around the new internal version.
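The wrapper pattern being asked for might look roughly like this (a sketch only; the real libxl signatures differ, and the toy state variable stands in for actually calling xc_domain_destroy()):

```c
#include <assert.h>

static int domain_destroyed;   /* toy observable state for illustration */

/* Internal destroy path: shared teardown, with the final
 * xc_domain_destroy() skipped when soft_reset is set. */
int libxl__domain_destroy(int domid, int soft_reset)
{
    /* ... shared teardown of devices, xenstore, qemu would go here ... */
    if (!soft_reset)
        domain_destroyed = 1;  /* stands in for xc_domain_destroy() */
    return 0;
}

/* The public function becomes a thin wrapper, avoiding duplication. */
int libxl_domain_destroy(int domid)
{
    return libxl__domain_destroy(domid, /* soft_reset = */ 0);
}
```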

> Add soft_reset flag to libxl__domain_destroy_state structure
> to support the change.
> 
> The original libxl_domain_destroy() function could be easily
> modified to support new flag but I'm trying to avoid that as
> it is part of public API.

There are mechanisms which could be used here to rev the API if it was
desirable to expose this flag to the calling toolstack for some reason,
e.g. checkout the uses of LIBXL_API_VERSION in libxl.h.

Ian.

* Re: [PATCH v5 7/9] libxc: introduce soft reset for HVM domains
  2014-12-11 13:45 ` [PATCH v5 7/9] libxc: introduce soft reset for HVM domains Vitaly Kuznetsov
@ 2015-01-13 14:08   ` Ian Campbell
  0 siblings, 0 replies; 39+ messages in thread
From: Ian Campbell @ 2015-01-13 14:08 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Wei Liu, Andrew Jones, Julien Grall, Keir Fraser,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, xen-devel

On Thu, 2014-12-11 at 14:45 +0100, Vitaly Kuznetsov wrote:
> Add new xc_domain_soft_reset() function which performs so-called 'soft reset'
> for an HVM domain. It is being performed in the following way:
> - Save HVM context and all HVM params;
> - Devour original domain with XEN_DOMCTL_devour;
> - Wait till original domain dies or has no pages left;
> - Restore HVM context, HVM params, seed grant table.

Are any of these operations "slow", per the definition under 'Machinery
for asynchronous operations ("ao")' in libxl_internal.h? "Wait till
original domain dies" sounds like it might be.

That might have implications for the use of this functionality from
libxl.

> +    xc_hvm_param_get(xch, source_dom, HVM_PARAM_IDENT_PT,
> +                     &hvm_params[HVM_PARAM_IDENT_PT]);

There's quite a risk of the set of HVM parameters retrieved getting out
of sync, either with the hypervisor or with the sets done below.

I don't know if any part of the migration infrastructure (specifically
Andy Cooper's v2 stuff, or some of the underlying hypercalls) could be
reused here to pickle/unpickle the state?

Other possibilities:

A new hypercall pair to get/set all hvm params.

A list of params to save/restore locally here, which would at least
stop the get/set parts getting out of sync, but doesn't help with the
hypervisor getting out of sync (and therefore would not be my preferred
solution).
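One way to keep the two sides in lockstep under that option (a sketch: the arrays stand in for xc_hvm_param_get/set, and the param indices are made up) is to drive both save and restore from a single table, so a param cannot be fetched without also being set:

```c
#include <assert.h>
#include <stddef.h>

#define NR_PARAMS 64

/* Toy per-domain param stores standing in for xc_hvm_param_get/set. */
static unsigned long params_src[NR_PARAMS];
static unsigned long params_dst[NR_PARAMS];

/* Single list used by BOTH directions; adding a param here makes it
 * saved and restored automatically. Indices are illustrative only. */
static const int soft_reset_params[] = { 1, 5, 12, 27 };
#define NR_SAVED (sizeof(soft_reset_params) / sizeof(soft_reset_params[0]))

void soft_reset_copy_params(void)
{
    unsigned long saved[NR_SAVED];
    size_t i;

    for (i = 0; i < NR_SAVED; i++)              /* save side */
        saved[i] = params_src[soft_reset_params[i]];
    for (i = 0; i < NR_SAVED; i++)              /* restore side */
        params_dst[soft_reset_params[i]] = saved[i];
}
```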

Also this function needs to take arch specifics into account.

> +    while ( 1 )
> +    {
> +        sleep(SLEEP_INT);
> +        if ( xc_get_tot_pages(xch, source_dom) <= 0 )
> +        {
> +            DPRINTF("All pages were transferred");
> +            break;
> +        }
> +    }

I think we are going to need to find a better solution than this.

Changing the nature of the hypercall as I suggested in a previous reply
would also remove this, so I'll wait for a verdict on that before
worrying about this bit any further.
> [...]

> +            PERROR("Faled to perform soft reset, destroying domain %d",

"Failed"

Ian.

* Re: [PATCH v5 8/9] libxl: soft reset support
  2014-12-11 13:45 ` [PATCH v5 8/9] libxl: soft reset support Vitaly Kuznetsov
@ 2015-01-13 14:21   ` Ian Campbell
  0 siblings, 0 replies; 39+ messages in thread
From: Ian Campbell @ 2015-01-13 14:21 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Wei Liu, Andrew Jones, Julien Grall, Keir Fraser,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, xen-devel

On Thu, 2014-12-11 at 14:45 +0100, Vitaly Kuznetsov wrote:
> Supported for HVM guests only.

Is it specifically PVHVM guests, or are unaware HVM guests also
supported? (I think the answer is that an unaware HVM guest has no way
to trigger a soft reset, so maybe it's moot...)

> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 0a123f1..710dc0e 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -929,6 +929,12 @@ int static inline libxl_domain_create_restore_0x040200(
>  
>  #endif
>  
> +int libxl_domain_soft_reset(libxl_ctx *ctx, libxl_domain_config *d_config,
> +                            uint32_t *domid, uint32_t domid_old,
> +                            const libxl_asyncop_how *ao_how,
> +                            const libxl_asyncprogress_how *aop_console_how)
> +                            LIBXL_EXTERNAL_CALLERS_ONLY;
> +
>    /* A progress report will be made via ao_console_how, of type
>     * domain_create_console_available, when the domain's primary
>     * console is available and can be connected to.
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 1198225..0a840c9 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -25,6 +25,8 @@
>  #include <xen/hvm/hvm_info_table.h>
>  #include <xen/hvm/e820.h>
>  
> +#define INVALID_DOMID ~0

Is this completely internal to this file, or are you requiring that it
matches the one in xl_cmdimpl.c (i.e does it cross the library
interface)?

> +
> +void libxl__xc_domain_soft_reset(libxl__egc *egc,
> +                                 libxl__domain_create_state *dcs)
> +{
> +    STATE_AO_GC(dcs->ao);
> +    libxl_ctx *ctx = libxl__gc_owner(gc);
> +    const uint32_t domid_soft_reset = dcs->domid_soft_reset;
> +    const uint32_t domid = dcs->guest_domid;
> +    libxl_domain_config *const d_config = dcs->guest_config;
> +    libxl_domain_build_info *const info = &d_config->b_info;
> +    uint8_t *buf;
> +    uint32_t len;
> +    uint32_t console_domid, store_domid;
> +    unsigned long store_mfn, console_mfn;
> +    int rc;
> +    struct libxl__domain_suspend_state *dss;
> +
> +    GCNEW(dss);
> +
> +    dss->ao = ao;
> +    dss->domid = domid_soft_reset;
> +    dss->dm_savefile = GCSPRINTF("/var/lib/xen/qemu-save.%d",
> +                                 domid_soft_reset);
> +
> +    if (info->type == LIBXL_DOMAIN_TYPE_HVM) {

I thought the alternative (PV) wasn't possible?

> +        rc = libxl__domain_suspend_device_model(gc, dss);
> +        if (rc) goto out;
> +    }
> +
> +    console_domid = dcs->build_state.console_domid;
> +    store_domid = dcs->build_state.store_domid;
[...]
> +    rc = xc_domain_soft_reset(ctx->xch, domid_soft_reset, domid, console_domid,
> +                              &console_mfn, store_domid, &store_mfn);
> +    if (rc) goto out;
[..]
> +    dcs->build_state.store_mfn = store_mfn;
> +    dcs->build_state.console_mfn = console_mfn;

Are you trying to avoid passing &dcs->build_state.store_mfn to the xc
function directly for some reason?

> +
> +    rc = libxl__toolstack_save(domid_soft_reset, &buf, &len, dss);
> +    if (rc) goto out;
> +
> +    rc = libxl__toolstack_restore(domid, buf, len, &dcs->shs);
> +    if (rc) goto out;
> +out:
> +    /*
> +     * Now pretend we did normal restore and simply call
> +     * libxl__xc_domain_restore_done().
> +     */
> +    libxl__xc_domain_restore_done(egc, dcs, rc, 0, 0);
> +}
> +
>  void libxl__srm_callout_callback_restore_results(unsigned long store_mfn,
>            unsigned long console_mfn, void *user)
>  {
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 4a0e2be..10ef652 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -121,6 +121,8 @@ libxl_action_on_shutdown = Enumeration("action_on_shutdown", [
>  
>      (5, "COREDUMP_DESTROY"),
>      (6, "COREDUMP_RESTART"),
> +
> +    (7, "SOFT_RESET"),

I mentioned a LIBXL_HAVE #define earlier on; since they are all
related I think you can have a single one for the overall feature rather
than one for each new enum value, function etc. Probably
LIBXL_HAVE_DOMAIN_SOFT_RESET fits best?

> @@ -2519,7 +2538,17 @@ start:
>           * restore/migrate-receive it again.
>           */
>          restoring = 0;
> -    }else{
> +    } else if (domid_old != INVALID_DOMID) {
> +        /* Do soft reset */
> +        ret = libxl_domain_soft_reset(ctx, &d_config,
> +                                      &domid, domid_old,
> +                                      0, 0);
> +
> +        if ( ret ) {
> +            goto error_out;
> +        }
> +        domid_old = INVALID_DOMID;
> +    } else {
>          ret = libxl_domain_create_new(ctx, &d_config, &domid,
>                                        0, autoconnect_console_how);
>      }
> @@ -2583,6 +2612,8 @@ start:
>                  event->u.domain_shutdown.shutdown_reason,
>                  event->u.domain_shutdown.shutdown_reason);
>              switch (handle_domain_death(&domid, event, &d_config)) {
> +            case 3:
> +                domid_old = domid;

Please comment when falling through is deliberate.

I think we've now passed the point where the raw numbers are tolerable;
please could you convert them to an enum and then add the new
value.

>              case 2:
>                  if (!preserve_domain(&domid, event, &d_config)) {
>                      /* If we fail then exit leaving the old domain in place. */
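The requested conversion might look something like this (the enum and value names are made up here, as is the toy handler; only SOFT_RESET mapping to the new shutdown reason follows the patch):

```c
#include <assert.h>

/* Named outcomes replacing the raw 0/1/2/3 return values. */
typedef enum {
    DOMDEATH_DONE = 0,
    DOMDEATH_RESTART,
    DOMDEATH_PRESERVE,
    DOMDEATH_SOFT_RESET,
} domdeath_action;

domdeath_action handle_domain_death_sketch(int shutdown_reason)
{
    /* 5 == the new soft_reset shutdown reason in the patched idl */
    return shutdown_reason == 5 ? DOMDEATH_SOFT_RESET : DOMDEATH_DONE;
}
```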

Ian.

* Re: [PATCH v5 4/9] xen: introduce XEN_DOMCTL_devour
  2015-01-13 13:53   ` Ian Campbell
@ 2015-01-13 14:48     ` Tim Deegan
  2015-01-13 16:17     ` Vitaly Kuznetsov
  1 sibling, 0 replies; 39+ messages in thread
From: Tim Deegan @ 2015-01-13 14:48 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Wei Liu, Andrew Jones, Julien Grall, Keir Fraser,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	David Vrabel, Jan Beulich, xen-devel, Vitaly Kuznetsov

At 13:53 +0000 on 13 Jan (1421153637), Ian Campbell wrote:
> On Thu, 2014-12-11 at 14:45 +0100, Vitaly Kuznetsov wrote:
> > +            gmfn = mfn_to_gmfn(d, mfn);
> 
> (I haven't thought about it super hard, but I'm taking it as given that
> this approach to kexec is going to be needed for ARM too, since that
> seems likely)
> 
> mfn_to_gmfn is going to be a bit pricey on ARM, we don't have an m2p to
> refer to, I'm not sure what we would do instead, walking the p2m looking
> for mfns surely won't be a good idea!
> 
> An alternative approach to this might be to walk the guest p2m (with
> appropriate continuations) and move each domheap page (this would also
> help us preserve super page mappings). It would also have the advantage
> of not needing additional stages in the destroy path and state in struct
> domain etc, since all the action would be constrained to the one
> hypercall.
> 
> x86 folks, would that work for your p2m too?

Without having looked at the details, it sounds plausible to me.

Tim.

* Re: [PATCH v5 4/9] xen: introduce XEN_DOMCTL_devour
  2015-01-13 13:53   ` Ian Campbell
  2015-01-13 14:48     ` Tim Deegan
@ 2015-01-13 16:17     ` Vitaly Kuznetsov
  2015-01-13 16:24       ` Jan Beulich
  2015-01-13 16:41       ` Ian Campbell
  1 sibling, 2 replies; 39+ messages in thread
From: Vitaly Kuznetsov @ 2015-01-13 16:17 UTC (permalink / raw)
  To: Ian Campbell, Jan Beulich
  Cc: Wei Liu, Andrew Jones, Julien Grall, Keir Fraser,
	Stefano Stabellini, Ian Jackson, Tim Deegan, Olaf Hering,
	David Vrabel, Andrew Cooper, xen-devel

Ian Campbell <Ian.Campbell@citrix.com> writes:

> On Thu, 2014-12-11 at 14:45 +0100, Vitaly Kuznetsov wrote:
>> +            gmfn = mfn_to_gmfn(d, mfn);
>
> (I haven't thought about it super hard, but I'm taking it as given that
> this approach to kexec is going to be needed for ARM too, since that
> seems likely)
>
> mfn_to_gmfn is going to be a bit pricey on ARM, we don't have an m2p to
> refer to, I'm not sure what we would do instead, walking the p2m looking
> for mfns surely won't be a good idea!

Can we form a 'temporary m2p' table by walking p2m once? Our domain is
dying and mappings don't change.
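A toy model of that one-pass inversion (array-backed for illustration; a real implementation would walk the actual p2m levels):

```c
#include <assert.h>

#define MAX_MFN 16
#define INVALID (-1)

/* Toy p2m: index = gfn, value = mfn backing it (INVALID for a hole). */
static const int p2m[] = { 7, 3, INVALID, 12, 0 };
#define NR_GFNS ((int)(sizeof(p2m) / sizeof(p2m[0])))

static int m2p_tmp[MAX_MFN];

/* One pass over the (frozen) p2m builds a temporary m2p, giving
 * O(1) mfn -> gfn lookups afterwards. */
void build_temp_m2p(void)
{
    int mfn, gfn;
    for (mfn = 0; mfn < MAX_MFN; mfn++)
        m2p_tmp[mfn] = INVALID;
    for (gfn = 0; gfn < NR_GFNS; gfn++)
        if (p2m[gfn] != INVALID)
            m2p_tmp[p2m[gfn]] = gfn;
}
```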

>
> An alternative approach to this might be to walk the guest p2m (with
> appropriate continuations) and move each domheap page (this would also
> help us preserve super page mappings). It would also have the advantage
> of not needing additional stages in the destroy path and state in struct
> domain etc, since all the action would be constrained to the one
> hypercall.

Something like that (but not exactly) was in my RFC/WIPv2 series:
http://lists.xen.org/archives/html/xen-devel/2014-09/msg03624.html

The drawback of such approach is the necessity of copying all mapped
more than once pages (granted pages, qemu-mapped pages, ...) or at least
providing blank pages instead of them. 

Jan, may I also explicitly ask your opinion?

-- 
  Vitaly

* Re: [PATCH v5 4/9] xen: introduce XEN_DOMCTL_devour
  2015-01-13 16:17     ` Vitaly Kuznetsov
@ 2015-01-13 16:24       ` Jan Beulich
  2015-01-13 16:45         ` Vitaly Kuznetsov
  2015-01-13 16:41       ` Ian Campbell
  1 sibling, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2015-01-13 16:24 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	Tim Deegan, David Vrabel, xen-devel, Keir Fraser

>>> On 13.01.15 at 17:17, <vkuznets@redhat.com> wrote:
> Ian Campbell <Ian.Campbell@citrix.com> writes:
>> An alternative approach to this might be to walk the guest p2m (with
>> appropriate continuations) and move each domheap page (this would also
>> help us preserve super page mappings). It would also have the advantage
>> of not needing additional stages in the destroy path and state in struct
>> domain etc, since all the action would be constrained to the one
>> hypercall.
> 
> Something like that (but not exactly) was in my RFC/WIPv2 series:
> http://lists.xen.org/archives/html/xen-devel/2014-09/msg03624.html 
> 
> The drawback of such approach is the necessity of copying all mapped
> more than once pages (granted pages, qemu-mapped pages, ...) or at least
> providing blank pages instead of them. 

Why would that be necessary only in that alternative model? What
gets done with pages used by other than _just_ the dying domain
shouldn't depend on how the MFN/GFN relationship gets determined.

Jan

* Re: [PATCH v5 4/9] xen: introduce XEN_DOMCTL_devour
  2015-01-13 16:17     ` Vitaly Kuznetsov
  2015-01-13 16:24       ` Jan Beulich
@ 2015-01-13 16:41       ` Ian Campbell
  1 sibling, 0 replies; 39+ messages in thread
From: Ian Campbell @ 2015-01-13 16:41 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, Keir Fraser, xen-devel, Stefano Stabellini,
	Tim Deegan, Julien Grall, Ian Jackson, Olaf Hering, David Vrabel,
	Jan Beulich, Andrew Cooper, Wei Liu

On Tue, 2015-01-13 at 17:17 +0100, Vitaly Kuznetsov wrote:
> Ian Campbell <Ian.Campbell@citrix.com> writes:
> 
> > On Thu, 2014-12-11 at 14:45 +0100, Vitaly Kuznetsov wrote:
> >> +            gmfn = mfn_to_gmfn(d, mfn);
> >
> > (I haven't thought about it super hard, but I'm taking it as given that
> > this approach to kexec is going to be needed for ARM too, since that
> > seems likely)
> >
> > mfn_to_gmfn is going to be a bit pricey on ARM, we don't have an m2p to
> > refer to, I'm not sure what we would do instead, walking the p2m looking
> > for mfns surely won't be a good idea!
> 
> Can we form a 'temporary m2p' table by walking p2m once? Our domain is
> dying and mappings don't change.

That could work.

> > An alternative approach to this might be to walk the guest p2m (with
> > appropriate continuations) and move each domheap page (this would also
> > help us preserve super page mappings). It would also have the advantage
> > of not needing additional stages in the destroy path and state in struct
> > domain etc, since all the action would be constrained to the one
> > hypercall.
> 
> Something like that (but not exactly) was in my RFC/WIPv2 series:
> http://lists.xen.org/archives/html/xen-devel/2014-09/msg03624.html
> 
> The drawback of such approach is the necessity of copying all mapped
> more than once pages (granted pages, qemu-mapped pages, ...) or at least
> providing blank pages instead of them. 

Hrm, I'd have thought that the sequencing requirements of this call
having to happen between the domain killing itself and the final destroy
would avoid that sort of thing, or else it would be, as Jan says,
independent of the mechanism for doing the m to p translation.

Ian.

* Re: [PATCH v5 4/9] xen: introduce XEN_DOMCTL_devour
  2015-01-13 16:24       ` Jan Beulich
@ 2015-01-13 16:45         ` Vitaly Kuznetsov
  2015-01-13 16:56           ` Jan Beulich
  0 siblings, 1 reply; 39+ messages in thread
From: Vitaly Kuznetsov @ 2015-01-13 16:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	Tim Deegan, David Vrabel, xen-devel, Keir Fraser

"Jan Beulich" <JBeulich@suse.com> writes:

>>>> On 13.01.15 at 17:17, <vkuznets@redhat.com> wrote:
>> Ian Campbell <Ian.Campbell@citrix.com> writes:
>>> An alternative approach to this might be to walk the guest p2m (with
>>> appropriate continuations) and move each domheap page (this would also
>>> help us preserve super page mappings). It would also have the advantage
>>> of not needing additional stages in the destroy path and state in struct
>>> domain etc, since all the action would be constrained to the one
>>> hypercall.
>> 
>> Something like that (but not exactly) was in my RFC/WIPv2 series:
>> http://lists.xen.org/archives/html/xen-devel/2014-09/msg03624.html 
>> 
>> The drawback of such approach is the necessity of copying all mapped
>> more than once pages (granted pages, qemu-mapped pages, ...) or at least
>> providing blank pages instead of them. 
>
> Why would that be necessary only in that alternative model? What
> gets done with pages used by other than _just_ the dying domain
> shouldn't depend on how the MFN/GFN relationship gets determined.

The difference comes from the fact that in the current model (when we
have a hook in free_domheap_pages()) we don't care who frees the
particular page first - our dying domain or e.g. a backend from dom0,
eventually all pages are freed. In the alternative model (one hypercall
reassigning all pages) we need to go through all domheap pages before
we start domain cleanup (unless we modify the domain cleanup path) or
some or all pages may get lost.
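The hook model can be pictured like this (a standalone toy; in Xen the hook sits in free_domheap_pages() and the reassignment goes through assign_pages()):

```c
#include <assert.h>
#include <stddef.h>

struct domain {
    int tot_pages;
    struct domain *recipient;
};

/* Toy free_domheap_pages(): it does not matter whether the dying guest
 * or a dom0 backend drops the last reference -- the hook hands the page
 * to the recipient instead of returning it to the heap. */
void free_domheap_page(struct domain *owner)
{
    owner->tot_pages--;
    if (owner->recipient != NULL)
        owner->recipient->tot_pages++;
}
```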

-- 
  Vitaly

* Re: [PATCH v5 4/9] xen: introduce XEN_DOMCTL_devour
  2015-01-13 16:45         ` Vitaly Kuznetsov
@ 2015-01-13 16:56           ` Jan Beulich
  0 siblings, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2015-01-13 16:56 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	Tim Deegan, David Vrabel, xen-devel, Keir Fraser

>>> On 13.01.15 at 17:45, <vkuznets@redhat.com> wrote:
> "Jan Beulich" <JBeulich@suse.com> writes:
> 
>>>>> On 13.01.15 at 17:17, <vkuznets@redhat.com> wrote:
>>> Ian Campbell <Ian.Campbell@citrix.com> writes:
>>>> An alternative approach to this might be to walk the guest p2m (with
>>>> appropriate continuations) and move each domheap page (this would also
>>>> help us preserve super page mappings). It would also have the advantage
>>>> of not needing additional stages in the destroy path and state in struct
>>>> domain etc, since all the action would be constrained to the one
>>>> hypercall.
>>> 
>>> Something like that (but not exactly) was in my RFC/WIPv2 series:
>>> http://lists.xen.org/archives/html/xen-devel/2014-09/msg03624.html 
>>> 
>>> The drawback of such approach is the necessity of copying all mapped
>>> more than once pages (granted pages, qemu-mapped pages, ...) or at least
>>> providing blank pages instead of them. 
>>
>> Why would that be necessary only in that alternative model? What
>> gets done with pages used by other than _just_ the dying domain
>> shouldn't depend on how the MFN/GFN relationship gets determined.
> 
> The difference comes from the fact that in the current model (when we
> have a hook in free_domheap_pages()) we don't care who frees the
> particular page first - our dying domain or e.g. a backend from dom0,
> eventually all pages are freed. In the alternative model (one hypercall
> reassigning all pages) we need to go through all domheap pages before
> we start domain cleanup (unless we modify the domain cleanup path) or
> some or all pages may get lost.

Which then looks like an argument against that alternative model.

Jan

* Re: [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec
  2015-01-07 11:03         ` Olaf Hering
@ 2015-01-14 11:06           ` George Dunlap
  0 siblings, 0 replies; 39+ messages in thread
From: George Dunlap @ 2015-01-14 11:06 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Andrew Jones, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Tim Deegan,
	David Vrabel, Jan Beulich, xen-devel, Vitaly Kuznetsov,
	Keir Fraser

On Wed, Jan 7, 2015 at 11:03 AM, Olaf Hering <olaf@aepfle.de> wrote:
> On Wed, Jan 07, Vitaly Kuznetsov wrote:
>
>> The thing is .. we don't have these pages when kexec is being performed,
>> they are already ballooned out and the hypervisor doesn't have the
>> knowledge of which GFNs should be re-populated. I think it is possible
>> to keep track of all pages the guest balloons out for this purpose, but
>> ..
>
> Have you tried to make the new guest a PoD guest? That way it may work
> out of the box already.

There's no difference between a PoD guest and a normal guest, except
that the PoD guest starts with some PoD entries.  At the moment, when
the balloon driver hands a page back to Xen, it puts a hole in the
p2m.  Once the balloon driver loads and inflates the balloon,
eventually all the pages will be touched, and there will be no more
PoD entries; at which point it will look exactly like any other guest.

An interesting idea to explore, however, would be to make it policy,
when ballooning, to put a PoD entry in the p2m table rather than a
hole.  Then when we kexec, it will look pretty much
pre-populated.  As long as there are enough zero pages lying around,
the PoD system should be able shuffle memory around enough to allow
the kexec'd guest to run until the balloon driver can come up and
inflate the balloon, similar to the way it does in the PoD case.

 -George

end of thread, other threads:[~2015-01-14 11:06 UTC | newest]

Thread overview: 39+ messages
2014-12-11 13:45 [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
2014-12-11 13:45 ` [PATCH v5 1/9] xen: introduce DOMDYING_locked state Vitaly Kuznetsov
2014-12-18 13:23   ` Jan Beulich
2014-12-11 13:45 ` [PATCH v5 2/9] xen: introduce SHUTDOWN_soft_reset shutdown reason Vitaly Kuznetsov
2014-12-18 13:28   ` Jan Beulich
2015-01-13 12:20   ` Ian Campbell
2014-12-11 13:45 ` [PATCH v5 3/9] libxl: support " Vitaly Kuznetsov
2015-01-13 12:22   ` Ian Campbell
2015-01-13 12:23   ` Ian Campbell
2014-12-11 13:45 ` [PATCH v5 4/9] xen: introduce XEN_DOMCTL_devour Vitaly Kuznetsov
2014-12-18 13:57   ` Jan Beulich
2015-01-13 12:26     ` Ian Campbell
2015-01-13 13:53   ` Ian Campbell
2015-01-13 14:48     ` Tim Deegan
2015-01-13 16:17     ` Vitaly Kuznetsov
2015-01-13 16:24       ` Jan Beulich
2015-01-13 16:45         ` Vitaly Kuznetsov
2015-01-13 16:56           ` Jan Beulich
2015-01-13 16:41       ` Ian Campbell
2014-12-11 13:45 ` [PATCH v5 5/9] libxc: support XEN_DOMCTL_devour Vitaly Kuznetsov
2014-12-11 13:45 ` [PATCH v5 6/9] libxl: add libxl__domain_soft_reset_destroy() Vitaly Kuznetsov
2015-01-13 13:58   ` Ian Campbell
2014-12-11 13:45 ` [PATCH v5 7/9] libxc: introduce soft reset for HVM domains Vitaly Kuznetsov
2015-01-13 14:08   ` Ian Campbell
2014-12-11 13:45 ` [PATCH v5 8/9] libxl: soft reset support Vitaly Kuznetsov
2015-01-13 14:21   ` Ian Campbell
2014-12-11 13:45 ` [PATCH v5 9/9] xsm: add XEN_DOMCTL_devour support Vitaly Kuznetsov
2014-12-18 13:59   ` Jan Beulich
2015-01-05 12:46 ` [PATCH v5 0/9] toolstack-based approach to pvhvm guest kexec Wei Liu
2015-01-05 13:00   ` Vitaly Kuznetsov
2015-01-07  9:10     ` Olaf Hering
2015-01-07 10:41       ` David Vrabel
2015-01-07 10:54         ` Jan Beulich
2015-01-07 11:59           ` Vitaly Kuznetsov
2015-01-07 11:01         ` Olaf Hering
2015-01-13 12:18         ` Ian Campbell
2015-01-07 10:49       ` Vitaly Kuznetsov
2015-01-07 11:03         ` Olaf Hering
2015-01-14 11:06           ` George Dunlap
