All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 00/10] toolstack-based approach to pvhvm guest kexec
@ 2015-05-13  9:49 Vitaly Kuznetsov
  2015-05-13  9:49 ` [PATCH v6 01/10] xen: introduce SHUTDOWN_soft_reset shutdown reason Vitaly Kuznetsov
                   ` (9 more replies)
  0 siblings, 10 replies; 31+ messages in thread
From: Vitaly Kuznetsov @ 2015-05-13  9:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu, Daniel De Graaf

This patch series provides x86 PVHVM domains with an ability to perform
kexec/kdump.

The main change in v6 is the XEN_DOMCTL_devour -> XENMEM_soft_reset change of
the newly introduced hypercall. This change was made because of two main
reasons:
1) Make it ARM-friendly by eliminating the mfn_to_gmfn() call (as m2p does not
   exist on ARM).
2) To support PoD guests we need to set some entries as PoD in the p2m and this
   is impossible with the 'XEN_DOMCTL_devour' approach when we reassign pages
   when they are freed as we don't free non-existent pages.

Changes/comments grouped by the reviewer name:

Olaf Hering:
- PoD guests now work and no special guest code is required. Ballooned out
  pages are not supposed to be a problem for kexec any more.

Ian Campbell:
- Changing the nature of the hypercall to make it ARM-friendly and eliminate
  'wait for the original domain to die' from xc_domain_soft_reset().
- Add commit message to the "xen: introduce SHUTDOWN_soft_reset shutdown
  reason" patch.
- Add LIBXL_HAVE_SHUTDWON_REASON_SOFT_RESET
- Use 'S' instead of 't' to indicate a domain in the 'soft reset' state.
- libxl__domain_soft_reset_destroy() is gone.
- INVALID_DOMID define migrated to libxl_internal.h. We don't need it to be the
  same with the one from xl_cmdimpl.c and it doesn't cross libxl library
  boundary, it is just a convenience.
- introduce enum domain_restart_type.
- other minor fixes.
- [not fixed] xc_domain_soft_reset() still gets and sets all HVM params
  sequentially. This is in line with current migration code and Andrew's
  'migration v2' work. In general I agree we can do better by introducing
  a new pair of hypercalls to get/set all HVM params at once but I'd like
  to suggest we leave this change out of scope of this series. Using the
  sequential approach has definitely lower risk for soft reset than for
  migration as we can migrate live and the domain performing soft reset is
  almost dead. Please let me know your thoughts.

Jan Beulich:
- Rewrite commit message describing DOMDYING_locked state.
- Rewrite the message in hwdom_shutdown()

Since the change is significant some of the review comments from v5 are now
irrelevant and the overall quality of the series could have dropped. It's also
been awhile since the v5. I'd like to apologize for all that.

v5 and all the history of the series is available here:
http://lists.xenproject.org/archives/html/xen-devel/2014-12/msg01312.html

Vitaly Kuznetsov (10):
  xen: introduce SHUTDOWN_soft_reset shutdown reason
  libxl: support SHUTDOWN_soft_reset shutdown reason
  xen: introduce DOMDYING_locked state
  xen: Introduce XENMEM_soft_reset operation
  xsm: add XENMEM_soft_reset support
  libxc: support XENMEM_soft_reset operation
  libxc: introduce soft reset for HVM domains
  xl: introduce enum domain_restart_type
  libxc: add XC_DEVICE_MODEL_SAVE_FILE
  (lib)xl: soft reset support

 docs/man/xl.cfg.pod.5                        |  12 ++
 tools/flask/policy/policy/modules/xen/xen.if |   2 +-
 tools/libxc/Makefile                         |   1 +
 tools/libxc/include/xenctrl.h                |   5 +
 tools/libxc/include/xenguest.h               |  21 +++
 tools/libxc/xc_domain.c                      |  18 ++
 tools/libxc/xc_domain_soft_reset.c           | 272 +++++++++++++++++++++++++++
 tools/libxl/libxl.c                          |   4 +
 tools/libxl/libxl.h                          |  13 ++
 tools/libxl/libxl_create.c                   | 121 +++++++++++-
 tools/libxl/libxl_dm.c                       |   2 +-
 tools/libxl/libxl_internal.h                 |  26 +++
 tools/libxl/libxl_types.idl                  |   4 +
 tools/libxl/xl.h                             |   7 +
 tools/libxl/xl_cmdimpl.c                     |  58 ++++--
 tools/python/xen/lowlevel/xl/xl.c            |   1 +
 xen/common/domain.c                          |   1 +
 xen/common/memory.c                          | 236 +++++++++++++++++++++++
 xen/common/shutdown.c                        |   6 +
 xen/include/public/memory.h                  |  34 +++-
 xen/include/public/sched.h                   |   3 +-
 xen/include/xen/sched.h                      |   3 +-
 xen/include/xsm/dummy.h                      |   7 +
 xen/include/xsm/xsm.h                        |   7 +
 xen/xsm/dummy.c                              |   1 +
 xen/xsm/flask/hooks.c                        |  12 ++
 xen/xsm/flask/policy/access_vectors          |   4 +
 27 files changed, 851 insertions(+), 30 deletions(-)
 create mode 100644 tools/libxc/xc_domain_soft_reset.c

-- 
1.9.3

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v6 01/10] xen: introduce SHUTDOWN_soft_reset shutdown reason
  2015-05-13  9:49 [PATCH v6 00/10] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
@ 2015-05-13  9:49 ` Vitaly Kuznetsov
  2015-05-13  9:49 ` [PATCH v6 02/10] libxl: support " Vitaly Kuznetsov
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Vitaly Kuznetsov @ 2015-05-13  9:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu, Daniel De Graaf

This special type of shutdown is supposed to be used by PVHVM guests when
they want to perform some sort of kexec/kdump. Toolstack will have to build
a new domain with the memory content of the original domain and start executing
the new one from the point where SHUTDOWN_soft_reset was called. This operation
is not supported for PV domains including the hardware domain.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 xen/common/shutdown.c      | 6 ++++++
 xen/include/public/sched.h | 3 ++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/common/shutdown.c b/xen/common/shutdown.c
index 9cfbf7a..03a8641 100644
--- a/xen/common/shutdown.c
+++ b/xen/common/shutdown.c
@@ -66,6 +66,12 @@ void hwdom_shutdown(u8 reason)
         machine_restart(0);
         break; /* not reached */
 
+    case SHUTDOWN_soft_reset:
+        printk("Hardware domain %d did unsupported soft reset, rebooting.\n",
+               hardware_domain->domain_id);
+        machine_restart(0);
+        break; /* not reached */
+
     default:
         printk("Hardware Dom%u shutdown (unknown reason %u): ",
                hardware_domain->domain_id, reason);
diff --git a/xen/include/public/sched.h b/xen/include/public/sched.h
index 4000ac9..800c808 100644
--- a/xen/include/public/sched.h
+++ b/xen/include/public/sched.h
@@ -159,7 +159,8 @@ DEFINE_XEN_GUEST_HANDLE(sched_watchdog_t);
 #define SHUTDOWN_suspend    2  /* Clean up, save suspend info, kill.         */
 #define SHUTDOWN_crash      3  /* Tell controller we've crashed.             */
 #define SHUTDOWN_watchdog   4  /* Restart because watchdog time expired.     */
-#define SHUTDOWN_MAX        4  /* Maximum valid shutdown reason.             */
+#define SHUTDOWN_soft_reset 5  /* Soft reset, rebuild keeping memory content */
+#define SHUTDOWN_MAX        5  /* Maximum valid shutdown reason.             */
 /* ` } */
 
 #endif /* __XEN_PUBLIC_SCHED_H__ */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 02/10] libxl: support SHUTDOWN_soft_reset shutdown reason
  2015-05-13  9:49 [PATCH v6 00/10] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
  2015-05-13  9:49 ` [PATCH v6 01/10] xen: introduce SHUTDOWN_soft_reset shutdown reason Vitaly Kuznetsov
@ 2015-05-13  9:49 ` Vitaly Kuznetsov
  2015-05-20  9:44   ` Wei Liu
  2015-05-13  9:49 ` [PATCH v6 03/10] xen: introduce DOMDYING_locked state Vitaly Kuznetsov
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Vitaly Kuznetsov @ 2015-05-13  9:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu, Daniel De Graaf

Use letter 'S' to indicate a domain in such state.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/libxl/libxl.h               | 7 +++++++
 tools/libxl/libxl_types.idl       | 1 +
 tools/libxl/xl_cmdimpl.c          | 2 +-
 tools/python/xen/lowlevel/xl/xl.c | 1 +
 4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index efc0617..8be0840 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -192,6 +192,13 @@
  * is not present, instead of ERROR_INVAL.
  */
 #define LIBXL_HAVE_ERROR_DOMAIN_NOTFOUND 1
+
+/*
+ * LIBXL_HAVE_SHUTDWON_REASON_SOFT_RESET indicates that libxl supports
+ * 'soft reset' shutdown reason in enum libxl_shutdown_reason.
+ */
+#define LIBXL_HAVE_SHUTDWON_REASON_SOFT_RESET 1
+
 /*
  * libxl ABI compatibility
  *
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 023b21e..986c4d3 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -177,6 +177,7 @@ libxl_shutdown_reason = Enumeration("shutdown_reason", [
     (2, "suspend"),
     (3, "crash"),
     (4, "watchdog"),
+    (5, "soft_reset"),
     ], init_val = "LIBXL_SHUTDOWN_REASON_UNKNOWN")
 
 libxl_vga_interface_type = Enumeration("vga_interface_type", [
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 526a1f6..c8cdf6e 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -3704,7 +3704,7 @@ static void list_domains(bool verbose, bool context, bool claim, bool numa,
                          bool cpupool, const libxl_dominfo *info, int nb_domain)
 {
     int i;
-    static const char shutdown_reason_letters[]= "-rscw";
+    static const char shutdown_reason_letters[]= "-rscwS";
     libxl_bitmap nodemap;
     libxl_physinfo physinfo;
 
diff --git a/tools/python/xen/lowlevel/xl/xl.c b/tools/python/xen/lowlevel/xl/xl.c
index 32f982a..7c61160 100644
--- a/tools/python/xen/lowlevel/xl/xl.c
+++ b/tools/python/xen/lowlevel/xl/xl.c
@@ -784,6 +784,7 @@ PyMODINIT_FUNC initxl(void)
     _INT_CONST_LIBXL(m, SHUTDOWN_REASON_SUSPEND);
     _INT_CONST_LIBXL(m, SHUTDOWN_REASON_CRASH);
     _INT_CONST_LIBXL(m, SHUTDOWN_REASON_WATCHDOG);
+    _INT_CONST_LIBXL(m, SHUTDOWN_REASON_SOFT_RESET);
 
     genwrap__init(m);
 }
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 03/10] xen: introduce DOMDYING_locked state
  2015-05-13  9:49 [PATCH v6 00/10] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
  2015-05-13  9:49 ` [PATCH v6 01/10] xen: introduce SHUTDOWN_soft_reset shutdown reason Vitaly Kuznetsov
  2015-05-13  9:49 ` [PATCH v6 02/10] libxl: support " Vitaly Kuznetsov
@ 2015-05-13  9:49 ` Vitaly Kuznetsov
  2015-05-13  9:49 ` [PATCH v6 04/10] xen: Introduce XENMEM_soft_reset operation Vitaly Kuznetsov
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Vitaly Kuznetsov @ 2015-05-13  9:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu, Daniel De Graaf

The new state is required as a prerequisite to implementing soft reset
operation for domains. In this operation we need to make sure that source
domain's mappings don't change while we are reassigning all its memory to
the destination domain. This new state indicates that the particular domain is
dying (so its mappings are locked and do not change) but the cleanup procedure
has not being started yet. It will be set from outside of the domain_kill()
function.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 xen/common/domain.c     | 1 +
 xen/include/xen/sched.h | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 6803c4d..7825c56 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -605,6 +605,7 @@ int domain_kill(struct domain *d)
     switch ( d->is_dying )
     {
     case DOMDYING_alive:
+    case DOMDYING_locked:
         domain_pause(d);
         d->is_dying = DOMDYING_dying;
         spin_barrier(&d->domain_lock);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 80c6f62..f53b91d 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -354,7 +354,8 @@ struct domain
     enum guest_type guest_type;
 
     /* Is this guest dying (i.e., a zombie)? */
-    enum { DOMDYING_alive, DOMDYING_dying, DOMDYING_dead } is_dying;
+    enum { DOMDYING_alive, DOMDYING_locked, DOMDYING_dying, DOMDYING_dead }
+        is_dying;
 
     /* Domain is paused by controller software? */
     int              controller_pause_count;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 04/10] xen: Introduce XENMEM_soft_reset operation
  2015-05-13  9:49 [PATCH v6 00/10] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (2 preceding siblings ...)
  2015-05-13  9:49 ` [PATCH v6 03/10] xen: introduce DOMDYING_locked state Vitaly Kuznetsov
@ 2015-05-13  9:49 ` Vitaly Kuznetsov
  2015-05-22  9:38   ` Jan Beulich
  2015-05-13  9:49 ` [PATCH v6 05/10] xsm: add XENMEM_soft_reset support Vitaly Kuznetsov
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Vitaly Kuznetsov @ 2015-05-13  9:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu, Daniel De Graaf

New operation reassigns all memory pages from source domain to the destination
domain mapping them at exactly the same GFNs. Pages mapped more than once (e.g.
grants) are being copied.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 xen/common/memory.c         | 232 ++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/memory.h |  34 ++++++-
 2 files changed, 265 insertions(+), 1 deletion(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 063a1c5..385218c 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -580,6 +580,234 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
     return rc;
 }
 
+static long memory_soft_reset(XEN_GUEST_HANDLE_PARAM(xen_memory_soft_reset_t) arg)
+{
+    long rc = 0;
+    struct xen_memory_soft_reset softr;
+    struct domain *source_d, *dest_d;
+    unsigned long mfn, mfn_new, gmfn, last_gmfn, count;
+    unsigned int order;
+    p2m_type_t p2mt;
+    struct page_info *page, *new_page, *next_page;
+    int drop_dom_ref;
+
+    if ( copy_from_guest(&softr, arg, 1) )
+        return -EFAULT;
+
+    if ( softr.source_domid == softr.dest_domid )
+        return -EINVAL;
+
+    source_d = rcu_lock_domain_by_any_id(softr.source_domid);
+    if ( source_d == NULL )
+    {
+        rc = -ESRCH;
+        goto fail_early;
+    }
+
+    dest_d = rcu_lock_domain_by_any_id(softr.dest_domid);
+    if ( dest_d == NULL )
+    {
+        rc = -ESRCH;
+        rcu_unlock_domain(source_d);
+        goto fail_early;
+    }
+
+    if ( dest_d->is_dying )
+    {
+        rc = -EINVAL;
+        goto fail;
+    }
+
+    if ( !source_d->is_dying )
+    {
+        /*
+         * Make sure no allocation/remapping for the source domain is ongoing
+         * and set is_dying flag to prevent such actions in future.
+         */
+        spin_lock(&source_d->page_alloc_lock);
+        source_d->is_dying = DOMDYING_locked;
+        spin_unlock(&source_d->page_alloc_lock);
+    }
+
+    last_gmfn = domain_get_maximum_gpfn(source_d);
+    gmfn = softr.gmfn_start;
+    while ( gmfn <= last_gmfn )
+    {
+        page = get_page_from_gfn(source_d, gmfn, &p2mt, 0);
+        if ( unlikely(page == NULL) )
+        {
+#ifdef CONFIG_X86
+            count = 0;
+            while ( p2m_is_pod(p2mt) )
+            {
+                count++;
+                if ( gmfn + count > last_gmfn )
+                    break;
+                page = get_page_from_gfn(source_d, gmfn + count, &p2mt, 0);
+                if ( page )
+                {
+                    put_page(page);
+                    page = NULL;
+                    break;
+                }
+            }
+
+            if ( !count )
+                gmfn++;
+
+            while ( count )
+            {
+                order = get_order_from_pages(count);
+                if ( (1ul << order) > count )
+                    order -= 1;
+                rc = guest_physmap_mark_populate_on_demand(dest_d, gmfn, order);
+                if ( rc )
+                    goto fail;
+                count -= 1ul << order;
+                gmfn += 1ul << order;
+            }
+#else
+            gmfn++;
+#endif
+            goto preempt_check;
+        }
+
+        mfn = page_to_mfn(page);
+        if ( unlikely(!mfn_valid(mfn)) )
+        {
+            put_page(page);
+            gmfn++;
+            goto preempt_check;
+        }
+
+        next_page = page;
+
+        /*
+         * A normal page is supposed to have count_info = 2 (1 from the domain
+         * and 1 from get_page_from_gfn() above). All other pages need to be
+         * copied.
+         */
+        count = 0;
+        while ( next_page && !is_xen_heap_page(next_page) &&
+                (next_page->count_info & PGC_count_mask) == 2 &&
+                page_to_mfn(next_page) == mfn + count )
+        {
+            count++;
+            drop_dom_ref = 0;
+            spin_lock(&source_d->page_alloc_lock);
+            page_set_owner(next_page, NULL);
+            page_list_del(next_page, &source_d->page_list);
+            source_d->tot_pages -= 1;
+            if ( unlikely(source_d->tot_pages == 0) )
+                drop_dom_ref = 1;
+            spin_unlock(&source_d->page_alloc_lock);
+            put_page(next_page);
+            if ( drop_dom_ref )
+                put_domain(source_d);
+
+            if ( unlikely(assign_pages(dest_d, next_page, 0, 0)) )
+            {
+                printk(XENLOG_G_INFO "Failed to assign Dom%d's MFN %lx"
+                       " to Dom%d\n", source_d->domain_id, mfn,
+                       dest_d->domain_id);
+                rc = -EFAULT;
+                goto fail;
+            }
+
+            if ( unlikely(gmfn + count > last_gmfn) )
+            {
+                next_page = NULL;
+                break;
+            }
+
+            next_page = get_page_from_gfn(source_d, gmfn + count, &p2mt, 0);
+        }
+
+        if ( next_page && count )
+                put_page(next_page);
+
+        if ( !count && (page->count_info & PGC_count_mask) != 2 )
+        {
+            new_page = alloc_domheap_page(dest_d, 0);
+            if ( unlikely(new_page == NULL) )
+            {
+                printk(XENLOG_G_INFO "Failed to alloc a page to replace"
+                       " Dom%d's GFN %lx (MFN %lx) for Dom %d\n",
+                       source_d->domain_id, gmfn, mfn, dest_d->domain_id);
+                rc = -ENOMEM;
+                put_page(page);
+                goto fail;
+            }
+            mfn_new = page_to_mfn(new_page);
+            copy_domain_page(mfn_new, mfn);
+            mfn = mfn_new;
+            put_page(page);
+            if ( guest_physmap_add_page(dest_d, gmfn, mfn, 0) )
+            {
+                printk(XENLOG_G_INFO "Failed to add new GFN %lx"
+                       " (MFN %lx) to Dom%d\n",
+                       gmfn, mfn, dest_d->domain_id);
+                rc = -EFAULT;
+                goto fail;
+            }
+            gmfn++;
+            softr.nr_transferred++;
+        }
+        else if ( !count )
+        {
+            put_page(page);
+            gmfn++;
+            goto preempt_check;
+        }
+
+        while ( count )
+        {
+            order = get_order_from_pages(count);
+            if ( (1ul << order) > count )
+                order -= 1;
+
+            guest_physmap_remove_page(source_d, gmfn, mfn, order);
+
+            if ( guest_physmap_add_page(dest_d, gmfn, mfn, order) )
+            {
+                printk(XENLOG_G_INFO "Failed to re-add Dom%d's GFN %lx"
+                       " (MFN %lx, order: %u) to Dom%d\n", source_d->domain_id,
+                       gmfn, mfn, order, dest_d->domain_id);
+                rc = -EFAULT;
+                goto fail;
+            }
+
+            softr.nr_transferred += 1ul << order;
+            count -= 1ul << order;
+            gmfn += 1ul << order;
+            mfn += 1ul << order;
+        }
+
+ preempt_check:
+        if ( hypercall_preempt_check() && gmfn <= last_gmfn )
+        {
+            softr.gmfn_start = gmfn;
+            rcu_unlock_domain(source_d);
+            rcu_unlock_domain(dest_d);
+            if ( __copy_field_to_guest(arg, &softr, gmfn_start) )
+                return -EFAULT;
+            if ( __copy_field_to_guest(arg, &softr, nr_transferred) )
+                return -EFAULT;
+            return hypercall_create_continuation(
+                __HYPERVISOR_memory_op, "lh", XENMEM_soft_reset, arg);
+        }
+    }
+
+ fail:
+    rcu_unlock_domain(dest_d);
+    rcu_unlock_domain(source_d);
+ fail_early:
+    if ( __copy_field_to_guest(arg, &softr, nr_transferred) )
+        rc = -EFAULT;
+
+    return rc;
+}
+
 static int xenmem_add_to_physmap(struct domain *d,
                                  struct xen_add_to_physmap *xatp,
                                  unsigned int start)
@@ -828,6 +1056,10 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         rc = memory_exchange(guest_handle_cast(arg, xen_memory_exchange_t));
         break;
 
+    case XENMEM_soft_reset:
+        rc = memory_soft_reset(guest_handle_cast(arg, xen_memory_soft_reset_t));
+        break;
+
     case XENMEM_maximum_ram_page:
         if ( unlikely(start_extent) )
             return -ENOSYS;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 832559a..8117341 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -573,7 +573,39 @@ struct xen_vnuma_topology_info {
 typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
 DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
 
-/* Next available subop number is 27 */
+/*
+ * Transfer all memory pages from one domain to the other. Pages are unmapped
+ * from the source domain and mapped at the same GFNs to the destination
+ * domain. This hypercall has a side-effect of moving the source domain to the
+ * 'dying' state.
+ *
+ * If a particular page is mapped more then once in the source domain a new
+ * empty page is being allocated for the destination domain and the content is
+ * being copied. The original page remains mapped to the source domain.
+ *
+ * The caller has to be priviliged and it is supposed to set gmfn_start to 0,
+ * this field is required for the hypercall continuation.
+ */
+
+#define XENMEM_soft_reset                   27
+struct xen_memory_soft_reset {
+    /*
+     * [IN] Memory transfer details.
+     */
+    domid_t source_domid; /* steal pages from */
+    domid_t dest_domid;   /* assign pages to */
+
+    xen_pfn_t gmfn_start; /* start from gmfn */
+
+    /*
+     * [OUT] Number of transfered pages including new allocations.
+     */
+    xen_ulong_t nr_transferred;
+};
+typedef struct xen_memory_soft_reset xen_memory_soft_reset_t;
+DEFINE_XEN_GUEST_HANDLE(xen_memory_soft_reset_t);
+
+/* Next available subop number is 28 */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 05/10] xsm: add XENMEM_soft_reset support
  2015-05-13  9:49 [PATCH v6 00/10] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (3 preceding siblings ...)
  2015-05-13  9:49 ` [PATCH v6 04/10] xen: Introduce XENMEM_soft_reset operation Vitaly Kuznetsov
@ 2015-05-13  9:49 ` Vitaly Kuznetsov
  2015-05-20 23:30   ` Daniel De Graaf
  2015-05-22  9:40   ` Jan Beulich
  2015-05-13  9:49 ` [PATCH v6 06/10] libxc: support XENMEM_soft_reset operation Vitaly Kuznetsov
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 31+ messages in thread
From: Vitaly Kuznetsov @ 2015-05-13  9:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu, Daniel De Graaf

Dummy policy just checks that the current domain is privileged,
in flask policy soft_reset is added to create_domain.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/flask/policy/policy/modules/xen/xen.if |  2 +-
 xen/common/memory.c                          |  4 ++++
 xen/include/xsm/dummy.h                      |  7 +++++++
 xen/include/xsm/xsm.h                        |  7 +++++++
 xen/xsm/dummy.c                              |  1 +
 xen/xsm/flask/hooks.c                        | 12 ++++++++++++
 xen/xsm/flask/policy/access_vectors          |  4 ++++
 7 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.if b/tools/flask/policy/policy/modules/xen/xen.if
index 620d151..05e1c86 100644
--- a/tools/flask/policy/policy/modules/xen/xen.if
+++ b/tools/flask/policy/policy/modules/xen/xen.if
@@ -54,7 +54,7 @@ define(`create_domain_common', `
 			psr_cmt_op };
 	allow $1 $2:security check_context;
 	allow $1 $2:shadow enable;
-	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op updatemp };
+	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op updatemp soft_reset };
 	allow $1 $2:grant setup;
 	allow $1 $2:hvm { cacheattr getparam hvmctl irqlevel pciroute sethvmc
 			setparam pcilevel trackdirtyvram nested };
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 385218c..0d163d6 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -618,6 +618,10 @@ static long memory_soft_reset(XEN_GUEST_HANDLE_PARAM(xen_memory_soft_reset_t) ar
         goto fail;
     }
 
+    rc = xsm_memory_soft_reset(XSM_PRIV, source_d, dest_d);
+    if ( rc )
+        goto fail;
+
     if ( !source_d->is_dying )
     {
         /*
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index f044c0f..3aa5ba2 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -193,6 +193,13 @@ static XSM_INLINE int xsm_memory_exchange(XSM_DEFAULT_ARG struct domain *d)
     return xsm_default_action(action, current->domain, d);
 }
 
+static XSM_INLINE int xsm_memory_soft_reset(XSM_DEFAULT_ARG struct domain *d1,
+					    struct domain *d2)
+{
+    XSM_ASSERT_ACTION(XSM_PRIV);
+    return xsm_default_action(action, current->domain, NULL);
+}
+
 static XSM_INLINE int xsm_memory_adjust_reservation(XSM_DEFAULT_ARG struct domain *d1,
                                                             struct domain *d2)
 {
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index c872d44..872b58e 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -87,6 +87,7 @@ struct xsm_operations {
     int (*get_pod_target) (struct domain *d);
     int (*set_pod_target) (struct domain *d);
     int (*memory_exchange) (struct domain *d);
+    int (*memory_soft_reset) (struct domain *d1, struct domain *d2);
     int (*memory_adjust_reservation) (struct domain *d1, struct domain *d2);
     int (*memory_stat_reservation) (struct domain *d1, struct domain *d2);
     int (*memory_pin_page) (struct domain *d1, struct domain *d2, struct page_info *page);
@@ -351,6 +352,12 @@ static inline int xsm_memory_exchange (xsm_default_t def, struct domain *d)
     return xsm_ops->memory_exchange(d);
 }
 
+static inline int xsm_memory_soft_reset (xsm_default_t def, struct domain *d1,
+					 struct domain *d2)
+{
+    return xsm_ops->memory_soft_reset(d1, d2);
+}
+
 static inline int xsm_memory_adjust_reservation (xsm_default_t def, struct domain *d1, struct
                                                                     domain *d2)
 {
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index e84b0e4..98b47aa 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -64,6 +64,7 @@ void xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, set_pod_target);
 
     set_to_dummy_if_null(ops, memory_exchange);
+    set_to_dummy_if_null(ops, memory_soft_reset);
     set_to_dummy_if_null(ops, memory_adjust_reservation);
     set_to_dummy_if_null(ops, memory_stat_reservation);
     set_to_dummy_if_null(ops, memory_pin_page);
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 11b7453..a22b646 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -383,6 +383,17 @@ static int flask_memory_exchange(struct domain *d)
     return current_has_perm(d, SECCLASS_MMU, MMU__EXCHANGE);
 }
 
+static int flask_memory_soft_reset(struct domain *d1, struct domain *d2)
+{
+    int rc;
+
+    rc = current_has_perm(d1, SECCLASS_MMU, MMU__SOFT_RESET);
+    if (rc)
+        return rc;
+
+    return current_has_perm(d2, SECCLASS_MMU, MMU__SOFT_RESET);
+}
+
 static int flask_memory_adjust_reservation(struct domain *d1, struct domain *d2)
 {
     return domain_has_perm(d1, d2, SECCLASS_MMU, MMU__ADJUST);
@@ -1617,6 +1628,7 @@ static struct xsm_operations flask_ops = {
     .get_pod_target = flask_get_pod_target,
     .set_pod_target = flask_set_pod_target,
     .memory_exchange = flask_memory_exchange,
+    .memory_soft_reset = flask_memory_soft_reset,
     .memory_adjust_reservation = flask_memory_adjust_reservation,
     .memory_stat_reservation = flask_memory_stat_reservation,
     .memory_pin_page = flask_memory_pin_page,
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index ea556df..6b45b05 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -366,6 +366,10 @@ class mmu
 #  source = domain making the hypercall
 #  target = domain whose pages are being exchanged
     exchange
+# XENMEM_soft_reset:
+#  source = source soft reset domain
+#  target = destination soft reset domain
+    soft_reset
 # Allow a privileged domain to install a map of a page it does not own.  Used
 # for stub domain device models with the PV framebuffer.
     target_hack
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 06/10] libxc: support XENMEM_soft_reset operation
  2015-05-13  9:49 [PATCH v6 00/10] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (4 preceding siblings ...)
  2015-05-13  9:49 ` [PATCH v6 05/10] xsm: add XENMEM_soft_reset support Vitaly Kuznetsov
@ 2015-05-13  9:49 ` Vitaly Kuznetsov
  2015-05-13  9:49 ` [PATCH v6 07/10] libxc: introduce soft reset for HVM domains Vitaly Kuznetsov
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Vitaly Kuznetsov @ 2015-05-13  9:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu, Daniel De Graaf

Introduce xc_domain_memory_soft_reset() function supporting XENMEM_soft_reset.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/libxc/include/xenctrl.h |  5 +++++
 tools/libxc/xc_domain.c       | 18 ++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index a689caf..cb1bb15 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1381,6 +1381,11 @@ int xc_domain_claim_pages(xc_interface *xch,
                                uint32_t domid,
                                unsigned long nr_pages);
 
+int xc_domain_memory_soft_reset(xc_interface *xch,
+                                uint32_t source_domid,
+                                uint32_t dest_domid,
+                                uint64_t *nr_trans);
+
 int xc_domain_memory_exchange_pages(xc_interface *xch,
                                     int domid,
                                     unsigned long nr_in_extents,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index a7079a1..be39987 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -976,6 +976,24 @@ int xc_domain_claim_pages(xc_interface *xch,
     return err;
 }
 
+int xc_domain_memory_soft_reset(xc_interface *xch,
+                                uint32_t source_domid,
+                                uint32_t dest_domid,
+                                uint64_t *nr_trans)
+{
+    int err;
+    struct xen_memory_soft_reset softr = {
+        .source_domid   = source_domid,
+        .dest_domid     = dest_domid,
+        .gmfn_start     = 0,
+        .nr_transferred = 0
+    };
+
+    err = do_memory_op(xch, XENMEM_soft_reset, &softr, sizeof(softr));
+    *nr_trans = softr.nr_transferred;
+    return err;
+}
+
 int xc_domain_populate_physmap(xc_interface *xch,
                                uint32_t domid,
                                unsigned long nr_extents,
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 07/10] libxc: introduce soft reset for HVM domains
  2015-05-13  9:49 [PATCH v6 00/10] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (5 preceding siblings ...)
  2015-05-13  9:49 ` [PATCH v6 06/10] libxc: support XENMEM_soft_reset operation Vitaly Kuznetsov
@ 2015-05-13  9:49 ` Vitaly Kuznetsov
  2015-05-20 15:10   ` Julien Grall
  2015-05-13  9:49 ` [PATCH v6 08/10] xl: introduce enum domain_restart_type Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Vitaly Kuznetsov @ 2015-05-13  9:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu, Daniel De Graaf

Add new xc_domain_soft_reset() function which performs so-called 'soft reset'
for an HVM domain. It is being performed in the following way:
- Save HVM context and all HVM params of the original domain;
- Transfer all the original domain's memory to a new domain with
  XENMEM_soft_reset;
- Destroy the original domain;
- Restore HVM context, HVM params, seed grant table for the new domain.

After that the new domain resumes execution from where SHUTDOWN_soft_reset was
called by the original domain.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/libxc/Makefile               |   1 +
 tools/libxc/include/xenguest.h     |  20 +++
 tools/libxc/xc_domain_soft_reset.c | 272 +++++++++++++++++++++++++++++++++++++
 3 files changed, 293 insertions(+)
 create mode 100644 tools/libxc/xc_domain_soft_reset.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 63878ec..f6c901b 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -69,6 +69,7 @@ $(patsubst %.c,%.opic,$(GUEST_SRCS-y)): CFLAGS += -DXG_LIBXL_HVM_COMPAT
 else
 GUEST_SRCS-y += xc_nomigrate.c
 endif
+GUEST_SRCS-y += xc_domain_soft_reset.c
 
 vpath %.c ../../xen/common/libelf
 CFLAGS += -I../../xen/common/libelf
diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 8e39075..d1227f7 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -146,6 +146,26 @@ int xc_domain_restore2(xc_interface *xch, int io_fd, uint32_t dom,
  * of the new domain is automatically appended to the filename,
  * separated by a ".".
  */
+
+/**
+ * This function does soft reset for a domain. During soft reset all
+ * source domain's memory is being reassigned to the destination domain,
+ * HVM context and HVM params are being copied.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm source_dom the id of the source domain
+ * @parm dest_dom the id of the destination domain
+ * @parm console_domid the id of the domain handling console
+ * @parm console_mfn returned with the mfn of the console page
+ * @parm store_domid the id of the domain handling store
+ * @parm store_mfn returned with the mfn of the store page
+ * @return 0 on success, -1 on failure
+ */
+int xc_domain_soft_reset(xc_interface *xch, uint32_t source_dom,
+                         uint32_t dest_dom, domid_t console_domid,
+                         unsigned long *console_mfn, domid_t store_domid,
+                         unsigned long *store_mfn);
+
 #define XC_DEVICE_MODEL_RESTORE_FILE "/var/lib/xen/qemu-resume"
 
 /**
diff --git a/tools/libxc/xc_domain_soft_reset.c b/tools/libxc/xc_domain_soft_reset.c
new file mode 100644
index 0000000..7012235
--- /dev/null
+++ b/tools/libxc/xc_domain_soft_reset.c
@@ -0,0 +1,272 @@
+/******************************************************************************
+ * xc_domain_soft_reset.c
+ *
+ * Do soft reset.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include <inttypes.h>
+#include <time.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/time.h>
+
+#include "xc_private.h"
+#include "xc_core.h"
+#include "xc_bitops.h"
+#include "xc_dom.h"
+#include "xg_private.h"
+#include "xg_save_restore.h"
+
+#include <xen/hvm/params.h>
+
+int xc_domain_soft_reset(xc_interface *xch, uint32_t source_dom,
+                         uint32_t dest_dom, domid_t console_domid,
+                         unsigned long *console_mfn, domid_t store_domid,
+                         unsigned long *store_mfn)
+{
+    xc_dominfo_t old_info, new_info;
+    int rc = 1;
+
+    uint32_t hvm_buf_size = 0;
+    uint64_t nr_trans;
+    uint8_t *hvm_buf = NULL;
+    unsigned long console_pfn, store_pfn, io_pfn, buffio_pfn;
+    uint64_t hvm_params[HVM_NR_PARAMS];
+    xen_pfn_t sharedinfo_pfn;
+
+    DPRINTF("%s: soft reset domid %u -> %u", __func__, source_dom, dest_dom);
+
+    if ( xc_domain_getinfo(xch, source_dom, 1, &old_info) != 1 )
+    {
+        PERROR("Could not get old domain info");
+        return 1;
+    }
+
+    if ( xc_domain_getinfo(xch, dest_dom, 1, &new_info) != 1 )
+    {
+        PERROR("Could not get new domain info");
+        return 1;
+    }
+
+    if ( !old_info.hvm || !new_info.hvm )
+    {
+        PERROR("Soft reset is supported for HVM only");
+        return 1;
+    }
+
+    sharedinfo_pfn = old_info.shared_info_frame;
+    if ( xc_get_pfn_type_batch(xch, source_dom, 1, &sharedinfo_pfn) )
+    {
+        PERROR("xc_get_pfn_type_batch failed");
+        goto out;
+    }
+
+    hvm_buf_size = xc_domain_hvm_getcontext(xch, source_dom, 0, 0);
+    if ( hvm_buf_size == -1 )
+    {
+        PERROR("Couldn't get HVM context size from Xen");
+        goto out;
+    }
+
+    hvm_buf = malloc(hvm_buf_size);
+    if ( !hvm_buf )
+    {
+        ERROR("Couldn't allocate memory");
+        goto out;
+    }
+
+    if ( xc_domain_hvm_getcontext(xch, source_dom, hvm_buf,
+                                  hvm_buf_size) == -1 )
+    {
+        PERROR("HVM:Could not get hvm buffer");
+        goto out;
+    }
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_STORE_PFN,
+                     &hvm_params[HVM_PARAM_STORE_PFN]);
+    store_pfn = hvm_params[HVM_PARAM_STORE_PFN];
+    *store_mfn = store_pfn;
+
+    xc_hvm_param_get(xch, source_dom,
+                     HVM_PARAM_CONSOLE_PFN,
+                     &hvm_params[HVM_PARAM_CONSOLE_PFN]);
+    console_pfn = hvm_params[HVM_PARAM_CONSOLE_PFN];
+    *console_mfn = console_pfn;
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_BUFIOREQ_PFN,
+                     &hvm_params[HVM_PARAM_BUFIOREQ_PFN]);
+    buffio_pfn = hvm_params[HVM_PARAM_BUFIOREQ_PFN];
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_IOREQ_PFN,
+                     &hvm_params[HVM_PARAM_IOREQ_PFN]);
+    io_pfn = hvm_params[HVM_PARAM_IOREQ_PFN];
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_IDENT_PT,
+                     &hvm_params[HVM_PARAM_IDENT_PT]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_PAGING_RING_PFN,
+                     &hvm_params[HVM_PARAM_PAGING_RING_PFN]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_VM86_TSS,
+                     &hvm_params[HVM_PARAM_VM86_TSS]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_ACPI_IOPORTS_LOCATION,
+                     &hvm_params[HVM_PARAM_ACPI_IOPORTS_LOCATION]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_VIRIDIAN,
+                     &hvm_params[HVM_PARAM_VIRIDIAN]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_PAE_ENABLED,
+                     &hvm_params[HVM_PARAM_PAE_ENABLED]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_STORE_EVTCHN,
+                     &hvm_params[HVM_PARAM_STORE_EVTCHN]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_IOREQ_SERVER_PFN,
+                     &hvm_params[HVM_PARAM_IOREQ_SERVER_PFN]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
+                     &hvm_params[HVM_PARAM_NR_IOREQ_SERVER_PAGES]);
+
+    xc_hvm_param_get(xch, source_dom, HVM_PARAM_VM_GENERATION_ID_ADDR,
+                     &hvm_params[HVM_PARAM_VM_GENERATION_ID_ADDR]);
+
+    rc = xc_domain_memory_soft_reset(xch, source_dom, dest_dom, &nr_trans);
+    if ( rc != 0 )
+    {
+        PERROR("Failed to devour original domain, rc=%d\n", rc);
+        goto out;
+    }
+
+    /*
+     * Shared info frame is being removed when guest maps shared info so this
+     * page is likely XEN_DOMCTL_PFINFO_XTAB but we need to replace it with an
+     * empty page in that case.
+     */
+    if ( sharedinfo_pfn == XEN_DOMCTL_PFINFO_XTAB &&
+         xc_domain_populate_physmap_exact(xch, dest_dom, 1, 0, 0,
+                                          &old_info.shared_info_frame) )
+    {
+        PERROR("Failed to populate pfn %lx (shared info)",
+               old_info.shared_info_frame);
+        goto out;
+    }
+
+    if ( xc_domain_hvm_setcontext(xch, dest_dom, hvm_buf,
+                                  hvm_buf_size) == -1 )
+    {
+        PERROR("HVM:Could not set hvm buffer");
+        goto out;
+    }
+
+    if ( store_pfn )
+        xc_clear_domain_page(xch, dest_dom, store_pfn);
+
+    if ( console_pfn )
+        xc_clear_domain_page(xch, dest_dom, console_pfn);
+
+    if ( buffio_pfn )
+        xc_clear_domain_page(xch, dest_dom, buffio_pfn);
+
+    if ( io_pfn )
+        xc_clear_domain_page(xch, dest_dom, io_pfn);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_STORE_PFN,
+                     hvm_params[HVM_PARAM_STORE_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom,
+                     HVM_PARAM_CONSOLE_PFN,
+                     hvm_params[HVM_PARAM_CONSOLE_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_BUFIOREQ_PFN,
+                     hvm_params[HVM_PARAM_BUFIOREQ_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_IOREQ_PFN,
+                     hvm_params[HVM_PARAM_IOREQ_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_IDENT_PT,
+                     hvm_params[HVM_PARAM_IDENT_PT]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_PAGING_RING_PFN,
+                     hvm_params[HVM_PARAM_PAGING_RING_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_VM86_TSS,
+                     hvm_params[HVM_PARAM_VM86_TSS]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_ACPI_IOPORTS_LOCATION,
+                     hvm_params[HVM_PARAM_ACPI_IOPORTS_LOCATION]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_VIRIDIAN,
+                     hvm_params[HVM_PARAM_VIRIDIAN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_PAE_ENABLED,
+                     hvm_params[HVM_PARAM_PAE_ENABLED]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_STORE_EVTCHN,
+                     hvm_params[HVM_PARAM_STORE_EVTCHN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_IOREQ_SERVER_PFN,
+                     hvm_params[HVM_PARAM_IOREQ_SERVER_PFN]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
+                     hvm_params[HVM_PARAM_NR_IOREQ_SERVER_PAGES]);
+
+    xc_hvm_param_set(xch, dest_dom, HVM_PARAM_VM_GENERATION_ID_ADDR,
+                     hvm_params[HVM_PARAM_VM_GENERATION_ID_ADDR]);
+
+    if ( xc_dom_gnttab_hvm_seed(xch, dest_dom, console_pfn, store_pfn,
+                               console_domid, store_domid) )
+    {
+        PERROR("Error seeding hvm grant table");
+        goto out;
+    }
+
+    xc_domain_destroy(xch, source_dom);
+
+    /* Check if we need to add pages to the PoD cache of the new domain */
+    if ( (nr_trans < old_info.nr_pages) &&
+         xc_domain_set_pod_target(xch, dest_dom, old_info.nr_pages +
+                                  (sharedinfo_pfn == XEN_DOMCTL_PFINFO_XTAB ?
+                                   1 : 0), NULL, NULL, NULL) )
+    {
+        PERROR("Failed to set POD target for new domain");
+        goto out;
+    }
+
+    rc = 0;
+ out:
+    if ( hvm_buf )
+        free(hvm_buf);
+
+    if ( (rc != 0) && (dest_dom != 0) ) {
+            PERROR("Failed to perform soft reset, destroying domain %d",
+                   dest_dom);
+	    xc_domain_destroy(xch, dest_dom);
+    }
+
+    return !!rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 08/10] xl: introduce enum domain_restart_type
  2015-05-13  9:49 [PATCH v6 00/10] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (6 preceding siblings ...)
  2015-05-13  9:49 ` [PATCH v6 07/10] libxc: introduce soft reset for HVM domains Vitaly Kuznetsov
@ 2015-05-13  9:49 ` Vitaly Kuznetsov
  2015-05-13  9:49 ` [PATCH v6 09/10] libxc: add XC_DEVICE_MODEL_SAVE_FILE Vitaly Kuznetsov
  2015-05-13  9:49 ` [PATCH v6 10/10] (lib)xl: soft reset support Vitaly Kuznetsov
  9 siblings, 0 replies; 31+ messages in thread
From: Vitaly Kuznetsov @ 2015-05-13  9:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu, Daniel De Graaf

As a preparation before adding new restart type (soft reset) put all
restart types into an enum.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/libxl/xl.h         |  6 ++++++
 tools/libxl/xl_cmdimpl.c | 23 ++++++++++-------------
 2 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 5bc138c..60045d4 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -186,6 +186,12 @@ enum output_format {
 };
 extern enum output_format default_output_format;
 
+typedef enum {
+    DOMAIN_RESTART_NONE = 0,     /* No domain restart */
+    DOMAIN_RESTART_NORMAL,       /* Domain should be restarter */
+    DOMAIN_RESTART_RENAME,       /* Domain should be renamed and restarted */
+} domain_restart_type;
+
 extern void printf_info_sexp(int domid, libxl_domain_config *d_config, FILE *fh);
 
 #define XL_GLOBAL_CONFIG XEN_CONFIG_DIR "/xl.conf"
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index c8cdf6e..a757023 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -2262,15 +2262,12 @@ static void reload_domain_config(uint32_t domid,
     }
 }
 
-/* Returns 1 if domain should be restarted,
- * 2 if domain should be renamed then restarted, or 0
- * Can update r_domid if domain is destroyed etc */
-static int handle_domain_death(uint32_t *r_domid,
-                               libxl_event *event,
-                               libxl_domain_config *d_config)
-
+/* Can update r_domid if domain is destroyed */
+static domain_restart_type handle_domain_death(uint32_t *r_domid,
+                                               libxl_event *event,
+                                               libxl_domain_config *d_config)
 {
-    int restart = 0;
+    domain_restart_type restart = DOMAIN_RESTART_NONE;
     libxl_action_on_shutdown action;
 
     switch (event->u.domain_shutdown.shutdown_reason) {
@@ -2325,12 +2322,12 @@ static int handle_domain_death(uint32_t *r_domid,
 
     case LIBXL_ACTION_ON_SHUTDOWN_RESTART_RENAME:
         reload_domain_config(*r_domid, d_config);
-        restart = 2;
+        restart = DOMAIN_RESTART_RENAME;
         break;
 
     case LIBXL_ACTION_ON_SHUTDOWN_RESTART:
         reload_domain_config(*r_domid, d_config);
-        restart = 1;
+        restart = DOMAIN_RESTART_NORMAL;
         /* fall-through */
     case LIBXL_ACTION_ON_SHUTDOWN_DESTROY:
         LOG("Domain %d needs to be cleaned up: destroying the domain",
@@ -2778,7 +2775,7 @@ start:
                 event->u.domain_shutdown.shutdown_reason,
                 event->u.domain_shutdown.shutdown_reason);
             switch (handle_domain_death(&domid, event, &d_config)) {
-            case 2:
+            case DOMAIN_RESTART_RENAME:
                 if (!preserve_domain(&domid, event, &d_config)) {
                     /* If we fail then exit leaving the old domain in place. */
                     ret = -1;
@@ -2786,7 +2783,7 @@ start:
                 }
 
                 /* Otherwise fall through and restart. */
-            case 1:
+            case DOMAIN_RESTART_NORMAL:
                 libxl_event_free(ctx, event);
                 libxl_evdisable_domain_death(ctx, deathw);
                 deathw = NULL;
@@ -2822,7 +2819,7 @@ start:
                 sleep(2);
                 goto start;
 
-            case 0:
+            case DOMAIN_RESTART_NONE:
                 LOG("Done. Exiting now");
                 libxl_event_free(ctx, event);
                 ret = 0;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 09/10] libxc: add XC_DEVICE_MODEL_SAVE_FILE
  2015-05-13  9:49 [PATCH v6 00/10] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (7 preceding siblings ...)
  2015-05-13  9:49 ` [PATCH v6 08/10] xl: introduce enum domain_restart_type Vitaly Kuznetsov
@ 2015-05-13  9:49 ` Vitaly Kuznetsov
  2015-05-13  9:49 ` [PATCH v6 10/10] (lib)xl: soft reset support Vitaly Kuznetsov
  9 siblings, 0 replies; 31+ messages in thread
From: Vitaly Kuznetsov @ 2015-05-13  9:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu, Daniel De Graaf

Use this in libxl_dm instead of hard-coding.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/libxc/include/xenguest.h | 1 +
 tools/libxl/libxl_dm.c         | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index d1227f7..42948df 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -166,6 +166,7 @@ int xc_domain_soft_reset(xc_interface *xch, uint32_t source_dom,
                          unsigned long *console_mfn, domid_t store_domid,
                          unsigned long *store_mfn);
 
+#define XC_DEVICE_MODEL_SAVE_FILE "/var/lib/xen/qemu-save"
 #define XC_DEVICE_MODEL_RESTORE_FILE "/var/lib/xen/qemu-resume"
 
 /**
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 0c6408d..869f4cd 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -31,7 +31,7 @@ static const char *libxl_tapif_script(libxl__gc *gc)
 
 const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t domid)
 {
-    return libxl__sprintf(gc, "/var/lib/xen/qemu-save.%d", domid);
+    return libxl__sprintf(gc, XC_DEVICE_MODEL_SAVE_FILE".%d", domid);
 }
 
 static const char *qemu_xen_path(libxl__gc *gc)
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v6 10/10] (lib)xl: soft reset support
  2015-05-13  9:49 [PATCH v6 00/10] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (8 preceding siblings ...)
  2015-05-13  9:49 ` [PATCH v6 09/10] libxc: add XC_DEVICE_MODEL_SAVE_FILE Vitaly Kuznetsov
@ 2015-05-13  9:49 ` Vitaly Kuznetsov
  2015-05-22 14:55   ` Vitaly Kuznetsov
  9 siblings, 1 reply; 31+ messages in thread
From: Vitaly Kuznetsov @ 2015-05-13  9:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu, Daniel De Graaf

Perform soft reset when a domain did SHUTDOWN_soft_reset. Migrate the
content with xc_domain_soft_reset(), reload dm and toolstack.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 docs/man/xl.cfg.pod.5        |  12 +++++
 tools/libxl/libxl.c          |   4 ++
 tools/libxl/libxl.h          |   6 +++
 tools/libxl/libxl_create.c   | 121 +++++++++++++++++++++++++++++++++++++++----
 tools/libxl/libxl_internal.h |  26 ++++++++++
 tools/libxl/libxl_types.idl  |   3 ++
 tools/libxl/xl.h             |   1 +
 tools/libxl/xl_cmdimpl.c     |  33 +++++++++++-
 8 files changed, 195 insertions(+), 11 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 8e4154f..ced5cc0 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -368,6 +368,13 @@ destroy the domain.
 write a "coredump" of the domain to F</var/xen/dump/NAME> and then
 restart the domain.
 
+=item B<soft-reset>
+
+create a new domain with the same configuration, reassign all the domain's
+memory to this new domain, kill the original domain, and continue execution
+of the new domain from where the action was triggered. Supported for HVM
+guests only.
+
 =back
 
 The default for C<on_poweroff> is C<destroy>.
@@ -386,6 +393,11 @@ Default is C<destroy>.
 
 Action to take if the domain crashes.  Default is C<destroy>.
 
+=item B<on_soft_reset="ACTION">
+
+Action to take if the domain performs 'soft reset' (e.g. does kexec).
+Default is C<soft-reset>.
+
 =back
 
 =head3 Direct Kernel Boot
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 516713e..2c31456 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -1492,6 +1492,7 @@ void libxl__domain_destroy(libxl__egc *egc, libxl__domain_destroy_state *dds)
         dds->stubdom.ao = ao;
         dds->stubdom.domid = stubdomid;
         dds->stubdom.callback = stubdom_destroy_callback;
+        dds->stubdom.soft_reset = false;
         libxl__destroy_domid(egc, &dds->stubdom);
     } else {
         dds->stubdom_finished = 1;
@@ -1500,6 +1501,7 @@ void libxl__domain_destroy(libxl__egc *egc, libxl__domain_destroy_state *dds)
     dds->domain.ao = ao;
     dds->domain.domid = dds->domid;
     dds->domain.callback = domain_destroy_callback;
+    dds->domain.soft_reset = dds->soft_reset;
     libxl__destroy_domid(egc, &dds->domain);
 }
 
@@ -1678,6 +1680,8 @@ static void devices_destroy_cb(libxl__egc *egc,
 
     libxl__unlock_domain_userdata(lock);
 
+    if (dis->soft_reset) goto out;
+
     rc = libxl__ev_child_fork(gc, &dis->destroyer, domain_destroy_domid_cb);
     if (rc < 0) goto out;
     if (!rc) { /* child */
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 8be0840..eb27031 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1004,6 +1004,12 @@ int static inline libxl_domain_create_restore_0x040200(
 
 #endif
 
+int libxl_domain_soft_reset(libxl_ctx *ctx, libxl_domain_config *d_config,
+                            uint32_t *domid, uint32_t domid_soft_reset,
+                            const libxl_asyncop_how *ao_how,
+                            const libxl_asyncprogress_how *aop_console_how)
+                            LIBXL_EXTERNAL_CALLERS_ONLY;
+
   /* A progress report will be made via ao_console_how, of type
    * domain_create_console_available, when the domain's primary
    * console is available and can be connected to.
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f0da7dc..7c9fcd6 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -945,6 +945,9 @@ static void initiate_domain_create(libxl__egc *egc,
     if (restore_fd >= 0) {
         LOG(DEBUG, "restoring, not running bootloader");
         domcreate_bootloader_done(egc, &dcs->bl, 0);
+    } else if (dcs->domid_soft_reset != INVALID_DOMID) {
+        LOG(DEBUG, "soft reset, not running bootloader\n");
+        domcreate_bootloader_done(egc, &dcs->bl, 0);
     } else  {
         LOG(DEBUG, "running bootloader");
         dcs->bl.callback = domcreate_bootloader_done;
@@ -993,6 +996,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     libxl_domain_config *const d_config = dcs->guest_config;
     libxl_domain_build_info *const info = &d_config->b_info;
     const int restore_fd = dcs->restore_fd;
+    const uint32_t domid_soft_reset = dcs->domid_soft_reset;
     libxl__domain_build_state *const state = &dcs->build_state;
     libxl__srm_restore_autogen_callbacks *const callbacks =
         &dcs->shs.callbacks.restore.a;
@@ -1016,7 +1020,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->dmss.dm.callback = domcreate_devmodel_started;
     dcs->dmss.callback = domcreate_devmodel_started;
 
-    if ( restore_fd < 0 ) {
+    if ( (restore_fd < 0) && (domid_soft_reset == INVALID_DOMID) ) {
         rc = libxl__domain_build(gc, d_config, domid, state);
         domcreate_rebuild_done(egc, dcs, rc);
         return;
@@ -1046,14 +1050,55 @@ static void domcreate_bootloader_done(libxl__egc *egc,
         rc = ERROR_INVAL;
         goto out;
     }
-    libxl__xc_domain_restore(egc, dcs,
-                             hvm, pae, superpages);
+    if ( restore_fd >= 0 ) {
+        libxl__xc_domain_restore(egc, dcs,
+                                 hvm, pae, superpages);
+    } else {
+        libxl__xc_domain_soft_reset(egc, dcs);
+    }
+
     return;
 
  out:
     libxl__xc_domain_restore_done(egc, dcs, rc, 0, 0);
 }
 
+void libxl__xc_domain_soft_reset(libxl__egc *egc,
+                                 libxl__domain_create_state *dcs)
+{
+    STATE_AO_GC(dcs->ao);
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+    const uint32_t domid_soft_reset = dcs->domid_soft_reset;
+    const uint32_t domid = dcs->guest_domid;
+    uint8_t *buf;
+    uint32_t len;
+    int rc;
+    libxl__domain_suspend_state *dss;
+
+    GCNEW(dss);
+    dss->ao = ao;
+    dss->domid = domid_soft_reset;
+
+    rc = libxl__toolstack_save(domid_soft_reset, &buf, &len, dss);
+    if (rc) goto out;
+
+    rc = xc_domain_soft_reset(ctx->xch, domid_soft_reset, domid,
+                              dcs->build_state.console_domid,
+                              &dcs->build_state.console_mfn,
+                              dcs->build_state.store_domid,
+                              &dcs->build_state.store_mfn);
+    if (rc) goto out;
+
+    rc = libxl__toolstack_restore(domid, buf, len, &dcs->shs);
+    if (rc) goto out;
+out:
+    /*
+     * Now pretend we did normal restore and simply call
+     * libxl__xc_domain_restore_done().
+     */
+    libxl__xc_domain_restore_done(egc, dcs, rc, 0, 0);
+}
+
 void libxl__srm_callout_callback_restore_results(unsigned long store_mfn,
           unsigned long console_mfn, void *user)
 {
@@ -1079,6 +1124,7 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
 
     /* convenience aliases */
     const uint32_t domid = dcs->guest_domid;
+    const uint32_t domid_soft_reset = dcs->domid_soft_reset;
     libxl_domain_config *const d_config = dcs->guest_config;
     libxl_domain_build_info *const info = &d_config->b_info;
     libxl__domain_build_state *const state = &dcs->build_state;
@@ -1131,9 +1177,12 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
     if (ret)
         goto out;
 
-    if (info->type == LIBXL_DOMAIN_TYPE_HVM) {
+    if (info->type == LIBXL_DOMAIN_TYPE_HVM && fd != -1) {
         state->saved_state = GCSPRINTF(
                        XC_DEVICE_MODEL_RESTORE_FILE".%d", domid);
+    } else if (domid_soft_reset != INVALID_DOMID) {
+        state->saved_state = GCSPRINTF(
+                       XC_DEVICE_MODEL_SAVE_FILE".%d", domid_soft_reset);
     }
 
 out:
@@ -1142,9 +1191,12 @@ out:
         libxl__file_reference_unmap(&state->pv_ramdisk);
     }
 
-    esave = errno;
-    libxl_fd_set_nonblock(ctx, fd, 0);
-    errno = esave;
+    if ( fd != -1 ) {
+        esave = errno;
+        libxl_fd_set_nonblock(ctx, fd, 0);
+        errno = esave;
+    }
+
     domcreate_rebuild_done(egc, dcs, ret);
 }
 
@@ -1529,20 +1581,37 @@ static void domcreate_destruction_cb(libxl__egc *egc,
 typedef struct {
     libxl__domain_create_state dcs;
     uint32_t *domid_out;
+    libxl__domain_destroy_state dds;
 } libxl__app_domain_create_state;
 
 static void domain_create_cb(libxl__egc *egc,
                              libxl__domain_create_state *dcs,
                              int rc, uint32_t domid);
 
+static void domain_soft_reset_cb(libxl__egc *egc,
+                                 libxl__domain_destroy_state *dds,
+                                 int rc)
+{
+    STATE_AO_GC(dds->ao);
+    libxl__app_domain_create_state *cdcs = CONTAINER_OF(dds, *cdcs, dds);
+
+    if (rc)
+        LOG(ERROR, "destruction of domain %u failed", dds->domid);
+
+    initiate_domain_create(egc, &cdcs->dcs);
+}
+
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
                             uint32_t *domid,
                             int restore_fd, int checkpointed_stream,
+                            uint32_t domid_soft_reset,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
     AO_CREATE(ctx, 0, ao_how);
     libxl__app_domain_create_state *cdcs;
+    libxl__domain_suspend_state *dss;
+    int rc;
 
     GCNEW(cdcs);
     cdcs->dcs.ao = ao;
@@ -1550,12 +1619,30 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
     libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
     cdcs->dcs.restore_fd = restore_fd;
+    cdcs->dcs.domid_soft_reset = domid_soft_reset;
     cdcs->dcs.callback = domain_create_cb;
     cdcs->dcs.checkpointed_stream = checkpointed_stream;
     libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how);
     cdcs->domid_out = domid;
 
-    initiate_domain_create(egc, &cdcs->dcs);
+    if (domid_soft_reset != INVALID_DOMID) {
+        GCNEW(dss);
+        dss->ao = ao;
+        dss->domid = domid_soft_reset;
+        dss->dm_savefile = GCSPRINTF(XC_DEVICE_MODEL_SAVE_FILE".%d",
+                                     domid_soft_reset);
+
+        rc = libxl__domain_suspend_device_model(gc, dss);
+        if (rc) return AO_ABORT(rc);
+
+        cdcs->dds.ao = ao;
+        cdcs->dds.domid = domid_soft_reset;
+        cdcs->dds.callback = domain_soft_reset_cb;
+        cdcs->dds.soft_reset = true;
+        libxl__domain_destroy(egc, &cdcs->dds);
+    }
+    else
+        initiate_domain_create(egc, &cdcs->dcs);
 
     return AO_INPROGRESS;
 }
@@ -1578,7 +1665,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
-    return do_domain_create(ctx, d_config, domid, -1, 0,
+    return do_domain_create(ctx, d_config, domid, -1, 0, INVALID_DOMID,
                             ao_how, aop_console_how);
 }
 
@@ -1589,7 +1676,21 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 const libxl_asyncprogress_how *aop_console_how)
 {
     return do_domain_create(ctx, d_config, domid, restore_fd,
-                            params->checkpointed_stream, ao_how, aop_console_how);
+                            params->checkpointed_stream, INVALID_DOMID,
+                            ao_how, aop_console_how);
+}
+
+int libxl_domain_soft_reset(libxl_ctx *ctx, libxl_domain_config *d_config,
+                            uint32_t *domid, uint32_t domid_soft_reset,
+                            const libxl_asyncop_how *ao_how,
+                            const libxl_asyncprogress_how *aop_console_how)
+{
+    libxl_domain_build_info *const info = &d_config->b_info;
+
+    if (info->type != LIBXL_DOMAIN_TYPE_HVM) return ERROR_INVAL;
+
+    return do_domain_create(ctx, d_config, domid, -1, 0, domid_soft_reset,
+                            ao_how, aop_console_how);
 }
 
 /*
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 8eb38aa..b950134 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -106,6 +106,7 @@
 #define TAP_DEVICE_SUFFIX "-emu"
 #define DISABLE_UDEV_PATH "libxl/disable_udev"
 #define DOMID_XS_PATH "domid"
+#define INVALID_DOMID ~0
 
 #define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
 
@@ -2997,6 +2998,7 @@ struct libxl__destroy_domid_state {
     /* private to implementation */
     libxl__devices_remove_state drs;
     libxl__ev_child destroyer;
+    bool soft_reset;
 };
 
 struct libxl__domain_destroy_state {
@@ -3011,6 +3013,7 @@ struct libxl__domain_destroy_state {
     int stubdom_finished;
     libxl__destroy_domid_state domain;
     int domain_finished;
+    bool soft_reset;
 };
 
 /*
@@ -3111,6 +3114,7 @@ struct libxl__domain_create_state {
     libxl_domain_config *guest_config;
     libxl_domain_config guest_config_saved; /* vanilla config */
     int restore_fd;
+    uint32_t domid_soft_reset;
     libxl__domain_create_cb *callback;
     libxl_asyncprogress_how aop_console_how;
     /* private to domain_create */
@@ -3175,6 +3179,28 @@ _hidden void libxl__domain_save_device_model(libxl__egc *egc,
 
 _hidden const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t domid);
 
+/*
+ * Soft reset is a special type of reset when the new domain is being built
+ * with the memory contents and vCPU contexts of the original domain thus
+ * allowing to continue execution from where the hypercall was done. This is
+ * currently supported for HVM domains only and is done as a modification for
+ * the domain create/restore path within libxl:
+ * do_domain_create()
+ *    libxl__domain_suspend_device_model()
+ *    libxl__domain_destroy()->domain_soft_reset_cb()->initiate_domain_create()
+ *    ... regular domain create path ...
+ *    domcreate_bootloader_done()
+ *       libxl__toolstack_save()
+ *       libxl__xc_domain_soft_reset()
+ *           xc_domain_soft_reset() which saves HVM context and HVM params,
+ *             calls XENMEM_soft_reset to transfer the domain's memory,
+ *             restores HVM context and HVM params for the new domain.
+ *       libxl__toolstack_restore()
+ *       libxl__xc_domain_restore_done() and follow the standard restore path.
+ */
+
+_hidden void libxl__xc_domain_soft_reset(libxl__egc *egc,
+                                         libxl__domain_create_state *dcs);
 
 /*
  * Convenience macros.
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 986c4d3..0fa076c 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -123,6 +123,8 @@ libxl_action_on_shutdown = Enumeration("action_on_shutdown", [
 
     (5, "COREDUMP_DESTROY"),
     (6, "COREDUMP_RESTART"),
+
+    (7, "SOFT_RESET"),
     ], init_val = "LIBXL_ACTION_ON_SHUTDOWN_DESTROY")
 
 libxl_trigger = Enumeration("trigger", [
@@ -573,6 +575,7 @@ libxl_domain_config = Struct("domain_config", [
     ("on_reboot", libxl_action_on_shutdown),
     ("on_watchdog", libxl_action_on_shutdown),
     ("on_crash", libxl_action_on_shutdown),
+    ("on_soft_reset", libxl_action_on_shutdown),
     ], dir=DIR_IN)
 
 libxl_diskinfo = Struct("diskinfo", [
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 60045d4..3abb7d5 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -190,6 +190,7 @@ typedef enum {
     DOMAIN_RESTART_NONE = 0,     /* No domain restart */
     DOMAIN_RESTART_NORMAL,       /* Domain should be restarter */
     DOMAIN_RESTART_RENAME,       /* Domain should be renamed and restarted */
+    DOMAIN_RESTART_SOFT_RESET,   /* Soft reset should be performed */
 } domain_restart_type;
 
 extern void printf_info_sexp(int domid, libxl_domain_config *d_config, FILE *fh);
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index a757023..09f4c94 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -130,6 +130,8 @@ static const char *action_on_shutdown_names[] = {
 
     [LIBXL_ACTION_ON_SHUTDOWN_COREDUMP_DESTROY] = "coredump-destroy",
     [LIBXL_ACTION_ON_SHUTDOWN_COREDUMP_RESTART] = "coredump-restart",
+
+    [LIBXL_ACTION_ON_SHUTDOWN_SOFT_RESET] = "soft-reset",
 };
 
 /* Optional data, in order:
@@ -1329,6 +1331,13 @@ static void parse_config_data(const char *config_source,
         exit(1);
     }
 
+    if (xlu_cfg_get_string (config, "on_soft_reset", &buf, 0))
+        buf = "soft-reset";
+    if (!parse_action_on_shutdown(buf, &d_config->on_soft_reset)) {
+        fprintf(stderr, "Unknown on_soft_reset action \"%s\" specified\n", buf);
+        exit(1);
+    }
+
     /* libxl_get_required_shadow_memory() must be called after final values
      * (default or specified) for vcpus and memory are set, because the
      * calculation depends on those values. */
@@ -2286,6 +2295,9 @@ static domain_restart_type handle_domain_death(uint32_t *r_domid,
     case LIBXL_SHUTDOWN_REASON_WATCHDOG:
         action = d_config->on_watchdog;
         break;
+    case LIBXL_SHUTDOWN_REASON_SOFT_RESET:
+        action = d_config->on_soft_reset;
+        break;
     default:
         LOG("Unknown shutdown reason code %d. Destroying domain.",
             event->u.domain_shutdown.shutdown_reason);
@@ -2336,6 +2348,11 @@ static domain_restart_type handle_domain_death(uint32_t *r_domid,
         *r_domid = INVALID_DOMID;
         break;
 
+    case LIBXL_ACTION_ON_SHUTDOWN_SOFT_RESET:
+        reload_domain_config(*r_domid, d_config);
+        restart = DOMAIN_RESTART_SOFT_RESET;
+        break;
+
     case LIBXL_ACTION_ON_SHUTDOWN_COREDUMP_DESTROY:
     case LIBXL_ACTION_ON_SHUTDOWN_COREDUMP_RESTART:
         /* Already handled these above. */
@@ -2485,6 +2502,7 @@ static void evdisable_disk_ejects(libxl_evgen_disk_eject **diskws,
 static uint32_t create_domain(struct domain_create *dom_info)
 {
     uint32_t domid = INVALID_DOMID;
+    uint32_t domid_old = INVALID_DOMID;
 
     libxl_domain_config d_config;
 
@@ -2711,7 +2729,17 @@ start:
          * restore/migrate-receive it again.
          */
         restoring = 0;
-    }else{
+    } else if (domid_old != INVALID_DOMID) {
+        /* Do soft reset */
+        ret = libxl_domain_soft_reset(ctx, &d_config,
+                                      &domid, domid_old,
+                                      0, autoconnect_console_how);
+
+        if ( ret )
+            goto error_out;
+
+        domid_old = INVALID_DOMID;
+    } else {
         ret = libxl_domain_create_new(ctx, &d_config, &domid,
                                       0, autoconnect_console_how);
     }
@@ -2775,6 +2803,9 @@ start:
                 event->u.domain_shutdown.shutdown_reason,
                 event->u.domain_shutdown.shutdown_reason);
             switch (handle_domain_death(&domid, event, &d_config)) {
+            case DOMAIN_RESTART_SOFT_RESET:
+                domid_old = domid;
+                /* fall through */
             case DOMAIN_RESTART_RENAME:
                 if (!preserve_domain(&domid, event, &d_config)) {
                     /* If we fail then exit leaving the old domain in place. */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 02/10] libxl: support SHUTDOWN_soft_reset shutdown reason
  2015-05-13  9:49 ` [PATCH v6 02/10] libxl: support " Vitaly Kuznetsov
@ 2015-05-20  9:44   ` Wei Liu
  0 siblings, 0 replies; 31+ messages in thread
From: Wei Liu @ 2015-05-20  9:44 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Wei Liu, Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, xen-devel,
	Daniel De Graaf

On Wed, May 13, 2015 at 11:49:37AM +0200, Vitaly Kuznetsov wrote:
> Use letter 'S' to indicate a domain in such state.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  tools/libxl/libxl.h               | 7 +++++++
>  tools/libxl/libxl_types.idl       | 1 +
>  tools/libxl/xl_cmdimpl.c          | 2 +-
>  tools/python/xen/lowlevel/xl/xl.c | 1 +
>  4 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index efc0617..8be0840 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -192,6 +192,13 @@
>   * is not present, instead of ERROR_INVAL.
>   */
>  #define LIBXL_HAVE_ERROR_DOMAIN_NOTFOUND 1
> +
> +/*
> + * LIBXL_HAVE_SHUTDWON_REASON_SOFT_RESET indicates that libxl supports
> + * 'soft reset' shutdown reason in enum libxl_shutdown_reason.
> + */
> +#define LIBXL_HAVE_SHUTDWON_REASON_SOFT_RESET 1
> +

This #define is too fine-grained.  And you didn't have any #define in
your last patch to advertise the new functionality.

You can just use LIBXL_HAVE_SOFT_RESET in your last patch.

Wei.

>  /*
>   * libxl ABI compatibility
>   *
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 023b21e..986c4d3 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -177,6 +177,7 @@ libxl_shutdown_reason = Enumeration("shutdown_reason", [
>      (2, "suspend"),
>      (3, "crash"),
>      (4, "watchdog"),
> +    (5, "soft_reset"),
>      ], init_val = "LIBXL_SHUTDOWN_REASON_UNKNOWN")
>  
>  libxl_vga_interface_type = Enumeration("vga_interface_type", [
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 526a1f6..c8cdf6e 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -3704,7 +3704,7 @@ static void list_domains(bool verbose, bool context, bool claim, bool numa,
>                           bool cpupool, const libxl_dominfo *info, int nb_domain)
>  {
>      int i;
> -    static const char shutdown_reason_letters[]= "-rscw";
> +    static const char shutdown_reason_letters[]= "-rscwS";
>      libxl_bitmap nodemap;
>      libxl_physinfo physinfo;
>  
> diff --git a/tools/python/xen/lowlevel/xl/xl.c b/tools/python/xen/lowlevel/xl/xl.c
> index 32f982a..7c61160 100644
> --- a/tools/python/xen/lowlevel/xl/xl.c
> +++ b/tools/python/xen/lowlevel/xl/xl.c
> @@ -784,6 +784,7 @@ PyMODINIT_FUNC initxl(void)
>      _INT_CONST_LIBXL(m, SHUTDOWN_REASON_SUSPEND);
>      _INT_CONST_LIBXL(m, SHUTDOWN_REASON_CRASH);
>      _INT_CONST_LIBXL(m, SHUTDOWN_REASON_WATCHDOG);
> +    _INT_CONST_LIBXL(m, SHUTDOWN_REASON_SOFT_RESET);
>  
>      genwrap__init(m);
>  }
> -- 
> 1.9.3

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 07/10] libxc: introduce soft reset for HVM domains
  2015-05-13  9:49 ` [PATCH v6 07/10] libxc: introduce soft reset for HVM domains Vitaly Kuznetsov
@ 2015-05-20 15:10   ` Julien Grall
  2015-05-20 15:20     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 31+ messages in thread
From: Julien Grall @ 2015-05-20 15:10 UTC (permalink / raw)
  To: Vitaly Kuznetsov, xen-devel
  Cc: Andrew Jones, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu, Daniel De Graaf

Hi Vitaly,

On 13/05/15 10:49, Vitaly Kuznetsov wrote:
> +int xc_domain_soft_reset(xc_interface *xch, uint32_t source_dom,
> +                         uint32_t dest_dom, domid_t console_domid,
> +                         unsigned long *console_mfn, domid_t store_domid,
> +                         unsigned long *store_mfn)
> +{
> +    xc_dominfo_t old_info, new_info;
> +    int rc = 1;
> +
> +    uint32_t hvm_buf_size = 0;
> +    uint64_t nr_trans;
> +    uint8_t *hvm_buf = NULL;
> +    unsigned long console_pfn, store_pfn, io_pfn, buffio_pfn;
> +    uint64_t hvm_params[HVM_NR_PARAMS];
> +    xen_pfn_t sharedinfo_pfn;
> +
> +    DPRINTF("%s: soft reset domid %u -> %u", __func__, source_dom, dest_dom);
> +
> +    if ( xc_domain_getinfo(xch, source_dom, 1, &old_info) != 1 )
> +    {
> +        PERROR("Could not get old domain info");
> +        return 1;
> +    }
> +
> +    if ( xc_domain_getinfo(xch, dest_dom, 1, &new_info) != 1 )
> +    {
> +        PERROR("Could not get new domain info");
> +        return 1;
> +    }

xc_domain_getinfo returns the first domain ID used from dest_dom. If
dest_dom doesn't exist it may return another domain.

Therefore you have to check that the info correspond to the correct domain.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 07/10] libxc: introduce soft reset for HVM domains
  2015-05-20 15:10   ` Julien Grall
@ 2015-05-20 15:20     ` Vitaly Kuznetsov
  2015-05-20 15:28       ` Julien Grall
  0 siblings, 1 reply; 31+ messages in thread
From: Vitaly Kuznetsov @ 2015-05-20 15:20 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Liu, Andrew Jones, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Julien Grall, Ian Jackson,
	Olaf Hering, Tim Deegan, David Vrabel, Jan Beulich, xen-devel,
	Daniel De Graaf

Julien Grall <julien.grall@citrix.com> writes:

> Hi Vitaly,
>
> On 13/05/15 10:49, Vitaly Kuznetsov wrote:
>> +int xc_domain_soft_reset(xc_interface *xch, uint32_t source_dom,
>> +                         uint32_t dest_dom, domid_t console_domid,
>> +                         unsigned long *console_mfn, domid_t store_domid,
>> +                         unsigned long *store_mfn)
>> +{
>> +    xc_dominfo_t old_info, new_info;
>> +    int rc = 1;
>> +
>> +    uint32_t hvm_buf_size = 0;
>> +    uint64_t nr_trans;
>> +    uint8_t *hvm_buf = NULL;
>> +    unsigned long console_pfn, store_pfn, io_pfn, buffio_pfn;
>> +    uint64_t hvm_params[HVM_NR_PARAMS];
>> +    xen_pfn_t sharedinfo_pfn;
>> +
>> +    DPRINTF("%s: soft reset domid %u -> %u", __func__, source_dom, dest_dom);
>> +
>> +    if ( xc_domain_getinfo(xch, source_dom, 1, &old_info) != 1 )
>> +    {
>> +        PERROR("Could not get old domain info");
>> +        return 1;
>> +    }
>> +
>> +    if ( xc_domain_getinfo(xch, dest_dom, 1, &new_info) != 1 )
>> +    {
>> +        PERROR("Could not get new domain info");
>> +        return 1;
>> +    }
>
> xc_domain_getinfo returns the first domain ID used from dest_dom. If
> dest_dom doesn't exist it may return another domain.
>
> Therefore you have to check that the info correspond to the correct domain.

Oh, thanks, I fixed xc_get_tot_pages() a while ago:

commit 5dcd0dcb85941bd92336e01ae3f8c44730099c96
Author: Vitaly Kuznetsov <vkuznets@redhat.com>
Date:   Tue Dec 2 16:18:08 2014 +0100

    libxc: check in xc_get_tot_pages() that the proper domain is
    reported

but I already managed to forget that xc_domain_getinfo() needs
additional check. Its interface is a bit misleading :-(

>
> Regards,


-- 
  Vitaly

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 07/10] libxc: introduce soft reset for HVM domains
  2015-05-20 15:20     ` Vitaly Kuznetsov
@ 2015-05-20 15:28       ` Julien Grall
  0 siblings, 0 replies; 31+ messages in thread
From: Julien Grall @ 2015-05-20 15:28 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Julien Grall
  Cc: Andrew Jones, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, xen-devel, Wei Liu,
	Daniel De Graaf

On 20/05/15 16:20, Vitaly Kuznetsov wrote:
> Julien Grall <julien.grall@citrix.com> writes:
> 
>> Hi Vitaly,
>>
>> On 13/05/15 10:49, Vitaly Kuznetsov wrote:
>>> +int xc_domain_soft_reset(xc_interface *xch, uint32_t source_dom,
>>> +                         uint32_t dest_dom, domid_t console_domid,
>>> +                         unsigned long *console_mfn, domid_t store_domid,
>>> +                         unsigned long *store_mfn)
>>> +{
>>> +    xc_dominfo_t old_info, new_info;
>>> +    int rc = 1;
>>> +
>>> +    uint32_t hvm_buf_size = 0;
>>> +    uint64_t nr_trans;
>>> +    uint8_t *hvm_buf = NULL;
>>> +    unsigned long console_pfn, store_pfn, io_pfn, buffio_pfn;
>>> +    uint64_t hvm_params[HVM_NR_PARAMS];
>>> +    xen_pfn_t sharedinfo_pfn;
>>> +
>>> +    DPRINTF("%s: soft reset domid %u -> %u", __func__, source_dom, dest_dom);
>>> +
>>> +    if ( xc_domain_getinfo(xch, source_dom, 1, &old_info) != 1 )
>>> +    {
>>> +        PERROR("Could not get old domain info");
>>> +        return 1;
>>> +    }
>>> +
>>> +    if ( xc_domain_getinfo(xch, dest_dom, 1, &new_info) != 1 )
>>> +    {
>>> +        PERROR("Could not get new domain info");
>>> +        return 1;
>>> +    }
>>
>> xc_domain_getinfo returns the first domain ID used from dest_dom. If
>> dest_dom doesn't exist it may return another domain.
>>
>> Therefore you have to check that the info correspond to the correct domain.
> 
> Oh, thanks, I fixed xc_get_tot_pages() a while ago:
> 
> commit 5dcd0dcb85941bd92336e01ae3f8c44730099c96
> Author: Vitaly Kuznetsov <vkuznets@redhat.com>
> Date:   Tue Dec 2 16:18:08 2014 +0100
> 
>     libxc: check in xc_get_tot_pages() that the proper domain is
>     reported
> 
> but I already managed to forget that xc_domain_getinfo() needs
> additional check. Its interface is a bit misleading :-(

Agreed. There is multiple place in libxc and other xen related software
where this function is misused.

It would be worth to introduce a new xc function to return an error if
it's not possible to return the info of the specified domid.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 05/10] xsm: add XENMEM_soft_reset support
  2015-05-13  9:49 ` [PATCH v6 05/10] xsm: add XENMEM_soft_reset support Vitaly Kuznetsov
@ 2015-05-20 23:30   ` Daniel De Graaf
  2015-05-21  9:49     ` Vitaly Kuznetsov
  2015-05-22  9:40   ` Jan Beulich
  1 sibling, 1 reply; 31+ messages in thread
From: Daniel De Graaf @ 2015-05-20 23:30 UTC (permalink / raw)
  To: Vitaly Kuznetsov, xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu

On 05/13/2015 05:49 AM, Vitaly Kuznetsov wrote:
> Dummy policy just checks that the current domain is privileged,
> in flask policy soft_reset is added to create_domain.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

I think the FLASK policy should also check that memory can be moved
from d1 to d2, independent of the check that the toolstack can move
the memory of d1 (or d2).  While I would expect that the security
contexts of d1 and d2 would be identical in most cases (and only
allow that in the example policy), there may be reasons to change
the context along with the kexec operation.  The best examples I
can think of are kexec from a bootloader domain of some kind, or
an installation that transitions into an active system that needs
access to a different network or set of peer domains.

For the example, policy, I'd add something like
	allow $2 $2 : mmu reset_transfer;
to the create_domain interface.

[...]
> --- a/xen/xsm/flask/policy/access_vectors
> +++ b/xen/xsm/flask/policy/access_vectors
> @@ -366,6 +366,10 @@ class mmu
>   #  source = domain making the hypercall
>   #  target = domain whose pages are being exchanged
>       exchange
> +# XENMEM_soft_reset:
> +#  source = source soft reset domain
> +#  target = destination soft reset domain
> +    soft_reset

These comments are a bit ambiguous.  I would suggest something like:
#  source = domain making the hypercall
#  target = domain being reset (source or destination)

-- 
Daniel De Graaf
National Security Agency

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 05/10] xsm: add XENMEM_soft_reset support
  2015-05-20 23:30   ` Daniel De Graaf
@ 2015-05-21  9:49     ` Vitaly Kuznetsov
  2015-05-21 14:25       ` Daniel De Graaf
  0 siblings, 1 reply; 31+ messages in thread
From: Vitaly Kuznetsov @ 2015-05-21  9:49 UTC (permalink / raw)
  To: Daniel De Graaf
  Cc: Wei Liu, Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, xen-devel

Daniel De Graaf <dgdegra@tycho.nsa.gov> writes:

> On 05/13/2015 05:49 AM, Vitaly Kuznetsov wrote:
>> Dummy policy just checks that the current domain is privileged,
>> in flask policy soft_reset is added to create_domain.
>>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> I think the FLASK policy should also check that memory can be moved
> from d1 to d2, independent of the check that the toolstack can move
> the memory of d1 (or d2).  While I would expect that the security
> contexts of d1 and d2 would be identical in most cases (and only
> allow that in the example policy), there may be reasons to change
> the context along with the kexec operation.  The best examples I
> can think of are kexec from a bootloader domain of some kind, or
> an installation that transitions into an active system that needs
> access to a different network or set of peer domains.
>
> For the example, policy, I'd add something like
> 	allow $2 $2 : mmu reset_transfer;
> to the create_domain interface.

Hi Daniel, thank you for your review!

Did I get you right and you suggest we use two new vectors in MMU class
for soft reset: the first one to check that the domain making the
hypercall is allowed to do it and the second one to check that that
memory can be moved from d1 to d2? In that case the FLASK-related part
of the patch would look like that I suppose:

diff --git a/tools/flask/policy/policy/modules/xen/xen.if b/tools/flask/policy/policy/modules/xen/xen.if
index 620d151..ab4fe7d 100644
--- a/tools/flask/policy/policy/modules/xen/xen.if
+++ b/tools/flask/policy/policy/modules/xen/xen.if
@@ -54,10 +54,12 @@ define(`create_domain_common', `
 			psr_cmt_op };
 	allow $1 $2:security check_context;
 	allow $1 $2:shadow enable;
-	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op updatemp };
+	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage
+			mmuext_op updatemp soft_reset };
 	allow $1 $2:grant setup;
 	allow $1 $2:hvm { cacheattr getparam hvmctl irqlevel pciroute sethvmc
 			setparam pcilevel trackdirtyvram nested };
+	allow $2 $2:mmu reset_transfer;
 ')
 
 # create_domain(priv, target)
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 11b7453..547d55c 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -383,6 +383,21 @@ static int flask_memory_exchange(struct domain *d)
     return current_has_perm(d, SECCLASS_MMU, MMU__EXCHANGE);
 }
 
+static int flask_memory_soft_reset(struct domain *d1, struct domain *d2)
+{
+    int rc;
+
+    rc = current_has_perm(d1, SECCLASS_MMU, MMU__SOFT_RESET);
+    if (rc)
+        return rc;
+
+    rc = current_has_perm(d2, SECCLASS_MMU, MMU__SOFT_RESET);
+    if (rc)
+        return rc;
+
+    return domain_has_perm(d1, d2, SECCLASS_MMU, MMU__RESET_TRANSFER);
+}
+
 static int flask_memory_adjust_reservation(struct domain *d1, struct domain *d2)
 {
     return domain_has_perm(d1, d2, SECCLASS_MMU, MMU__ADJUST);
@@ -1617,6 +1632,7 @@ static struct xsm_operations flask_ops = {
     .get_pod_target = flask_get_pod_target,
     .set_pod_target = flask_set_pod_target,
     .memory_exchange = flask_memory_exchange,
+    .memory_soft_reset = flask_memory_soft_reset,
     .memory_adjust_reservation = flask_memory_adjust_reservation,
     .memory_stat_reservation = flask_memory_stat_reservation,
     .memory_pin_page = flask_memory_pin_page,
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index ea556df..6872c1a 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -366,6 +366,13 @@ class mmu
 #  source = domain making the hypercall
 #  target = domain whose pages are being exchanged
     exchange
+# XENMEM_soft_reset:
+#  source = domain making the hypercall
+#  target = domain being reset (source or destination)
+    soft_reset
+#  source = source domain being reset
+#  target = destination domain being reset
+    reset_transfer
 # Allow a privileged domain to install a map of a page it does not own.  Used
 # for stub domain device models with the PV framebuffer.
     target_hack

[...]

-- 
  Vitaly

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 05/10] xsm: add XENMEM_soft_reset support
  2015-05-21  9:49     ` Vitaly Kuznetsov
@ 2015-05-21 14:25       ` Daniel De Graaf
  0 siblings, 0 replies; 31+ messages in thread
From: Daniel De Graaf @ 2015-05-21 14:25 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Wei Liu, Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, xen-devel

On 05/21/2015 05:49 AM, Vitaly Kuznetsov wrote:
> Daniel De Graaf <dgdegra@tycho.nsa.gov> writes:
>
>> On 05/13/2015 05:49 AM, Vitaly Kuznetsov wrote:
>>> Dummy policy just checks that the current domain is privileged,
>>> in flask policy soft_reset is added to create_domain.
>>>
>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>
>> I think the FLASK policy should also check that memory can be moved
>> from d1 to d2, independent of the check that the toolstack can move
>> the memory of d1 (or d2).  While I would expect that the security
>> contexts of d1 and d2 would be identical in most cases (and only
>> allow that in the example policy), there may be reasons to change
>> the context along with the kexec operation.  The best examples I
>> can think of are kexec from a bootloader domain of some kind, or
>> an installation that transitions into an active system that needs
>> access to a different network or set of peer domains.
>>
>> For the example, policy, I'd add something like
>> 	allow $2 $2 : mmu reset_transfer;
>> to the create_domain interface.
>
> Hi Daniel, thank you for your review!
>
> Did I get you right and you suggest we use two new vectors in MMU class
> for soft reset: the first one to check that the domain making the
> hypercall is allowed to do it and the second one to check that that
> memory can be moved from d1 to d2? In that case the FLASK-related part
> of the patch would look like that I suppose:
>
> diff --git a/tools/flask/policy/policy/modules/xen/xen.if b/tools/flask/policy/policy/modules/xen/xen.if
> index 620d151..ab4fe7d 100644
> --- a/tools/flask/policy/policy/modules/xen/xen.if
> +++ b/tools/flask/policy/policy/modules/xen/xen.if
> @@ -54,10 +54,12 @@ define(`create_domain_common', `
>   			psr_cmt_op };
>   	allow $1 $2:security check_context;
>   	allow $1 $2:shadow enable;
> -	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op updatemp };
> +	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage
> +			mmuext_op updatemp soft_reset };
>   	allow $1 $2:grant setup;
>   	allow $1 $2:hvm { cacheattr getparam hvmctl irqlevel pciroute sethvmc
>   			setparam pcilevel trackdirtyvram nested };
> +	allow $2 $2:mmu reset_transfer;
>   ')
>
>   # create_domain(priv, target)
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index 11b7453..547d55c 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -383,6 +383,21 @@ static int flask_memory_exchange(struct domain *d)
>       return current_has_perm(d, SECCLASS_MMU, MMU__EXCHANGE);
>   }
>
> +static int flask_memory_soft_reset(struct domain *d1, struct domain *d2)
> +{
> +    int rc;
> +
> +    rc = current_has_perm(d1, SECCLASS_MMU, MMU__SOFT_RESET);
> +    if (rc)
> +        return rc;
> +
> +    rc = current_has_perm(d2, SECCLASS_MMU, MMU__SOFT_RESET);
> +    if (rc)
> +        return rc;
> +
> +    return domain_has_perm(d1, d2, SECCLASS_MMU, MMU__RESET_TRANSFER);
> +}
> +
>   static int flask_memory_adjust_reservation(struct domain *d1, struct domain *d2)
>   {
>       return domain_has_perm(d1, d2, SECCLASS_MMU, MMU__ADJUST);
> @@ -1617,6 +1632,7 @@ static struct xsm_operations flask_ops = {
>       .get_pod_target = flask_get_pod_target,
>       .set_pod_target = flask_set_pod_target,
>       .memory_exchange = flask_memory_exchange,
> +    .memory_soft_reset = flask_memory_soft_reset,
>       .memory_adjust_reservation = flask_memory_adjust_reservation,
>       .memory_stat_reservation = flask_memory_stat_reservation,
>       .memory_pin_page = flask_memory_pin_page,
> diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
> index ea556df..6872c1a 100644
> --- a/xen/xsm/flask/policy/access_vectors
> +++ b/xen/xsm/flask/policy/access_vectors
> @@ -366,6 +366,13 @@ class mmu
>   #  source = domain making the hypercall
>   #  target = domain whose pages are being exchanged
>       exchange
> +# XENMEM_soft_reset:
> +#  source = domain making the hypercall
> +#  target = domain being reset (source or destination)
> +    soft_reset
> +#  source = source domain being reset
> +#  target = destination domain being reset
> +    reset_transfer
>   # Allow a privileged domain to install a map of a page it does not own.  Used
>   # for stub domain device models with the PV framebuffer.
>       target_hack
>
> [...]

Yes, this is what I was looking for.  When combined with the rest of the patch:

Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>

-- 
Daniel De Graaf
National Security Agency

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 04/10] xen: Introduce XENMEM_soft_reset operation
  2015-05-13  9:49 ` [PATCH v6 04/10] xen: Introduce XENMEM_soft_reset operation Vitaly Kuznetsov
@ 2015-05-22  9:38   ` Jan Beulich
  2015-05-22 15:36     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2015-05-22  9:38 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	Tim Deegan, David Vrabel, xen-devel, Daniel De Graaf,
	Keir Fraser

>>> On 13.05.15 at 11:49, <vkuznets@redhat.com> wrote:
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -580,6 +580,234 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>      return rc;
>  }
>  
> +static long memory_soft_reset(XEN_GUEST_HANDLE_PARAM(xen_memory_soft_reset_t) arg)
> +{
> +    long rc = 0;

The way it's coded right now, I don't see the need for this or the
function return type being "long" (but see the comments on the
public header change).

> +    struct xen_memory_soft_reset softr;
> +    struct domain *source_d, *dest_d;
> +    unsigned long mfn, mfn_new, gmfn, last_gmfn, count;
> +    unsigned int order;
> +    p2m_type_t p2mt;
> +    struct page_info *page, *new_page, *next_page;
> +    int drop_dom_ref;
> +
> +    if ( copy_from_guest(&softr, arg, 1) )
> +        return -EFAULT;
> +
> +    if ( softr.source_domid == softr.dest_domid )
> +        return -EINVAL;
> +
> +    source_d = rcu_lock_domain_by_any_id(softr.source_domid);
> +    if ( source_d == NULL )
> +    {
> +        rc = -ESRCH;
> +        goto fail_early;
> +    }
> +
> +    dest_d = rcu_lock_domain_by_any_id(softr.dest_domid);
> +    if ( dest_d == NULL )
> +    {
> +        rc = -ESRCH;
> +        rcu_unlock_domain(source_d);
> +        goto fail_early;
> +    }
> +
> +    if ( dest_d->is_dying )
> +    {
> +        rc = -EINVAL;
> +        goto fail;
> +    }
> +
> +    if ( !source_d->is_dying )
> +    {
> +        /*
> +         * Make sure no allocation/remapping for the source domain is ongoing
> +         * and set is_dying flag to prevent such actions in future.
> +         */
> +        spin_lock(&source_d->page_alloc_lock);
> +        source_d->is_dying = DOMDYING_locked;

You need to repeat the is_dying check with the lock held, and act
accordingly if it isn't zero anymore.

Furthermore I don't see how this prevents there being further
changes to the GFN <-> MFN mappings for the guest (yet you rely
on them to be stable below afaict).

> +        spin_unlock(&source_d->page_alloc_lock);
> +    }
> +
> +    last_gmfn = domain_get_maximum_gpfn(source_d);
> +    gmfn = softr.gmfn_start;
> +    while ( gmfn <= last_gmfn )

By now you should have done a privilege check.

> +    {
> +        page = get_page_from_gfn(source_d, gmfn, &p2mt, 0);
> +        if ( unlikely(page == NULL) )
> +        {
> +#ifdef CONFIG_X86
> +            count = 0;
> +            while ( p2m_is_pod(p2mt) )
> +            {
> +                count++;
> +                if ( gmfn + count > last_gmfn )
> +                    break;
> +                page = get_page_from_gfn(source_d, gmfn + count, &p2mt, 0);
> +                if ( page )
> +                {
> +                    put_page(page);
> +                    page = NULL;

Why? You don't use the value further down.

> +                    break;
> +                }
> +            }
> +
> +            if ( !count )
> +                gmfn++;
> +
> +            while ( count )
> +            {
> +                order = get_order_from_pages(count);
> +                if ( (1ul << order) > count )
> +                    order -= 1;

--

> +                rc = guest_physmap_mark_populate_on_demand(dest_d, gmfn, order);
> +                if ( rc )
> +                    goto fail;
> +                count -= 1ul << order;
> +                gmfn += 1ul << order;
> +            }
> +#else
> +            gmfn++;
> +#endif
> +            goto preempt_check;
> +        }
> +
> +        mfn = page_to_mfn(page);
> +        if ( unlikely(!mfn_valid(mfn)) )
> +        {
> +            put_page(page);
> +            gmfn++;
> +            goto preempt_check;

I guess this should never happen, and hence issuing a warning may be
warranted?

> +        }
> +
> +        next_page = page;
> +
> +        /*
> +         * A normal page is supposed to have count_info = 2 (1 from the domain
> +         * and 1 from get_page_from_gfn() above). All other pages need to be
> +         * copied.
> +         */

I'm afraid this isn't precise enough: _PGC_allocated implies one
refcount, i.e. if that flag turns out to be clear, you should expect
just 1. And then I'd wonder whether a !_PGC_allocated page
needs any handling here in the first place (or perhaps, once the
GFN <-> MFN mapping is ensured to be stable, this case can't
happen anyway).

> +        count = 0;
> +        while ( next_page && !is_xen_heap_page(next_page) &&
> +                (next_page->count_info & PGC_count_mask) == 2 &&
> +                page_to_mfn(next_page) == mfn + count )
> +        {
> +            count++;
> +            drop_dom_ref = 0;
> +            spin_lock(&source_d->page_alloc_lock);
> +            page_set_owner(next_page, NULL);
> +            page_list_del(next_page, &source_d->page_list);
> +            source_d->tot_pages -= 1;

--

> +            if ( unlikely(source_d->tot_pages == 0) )
> +                drop_dom_ref = 1;
> +            spin_unlock(&source_d->page_alloc_lock);
> +            put_page(next_page);
> +            if ( drop_dom_ref )
> +                put_domain(source_d);
> +
> +            if ( unlikely(assign_pages(dest_d, next_page, 0, 0)) )
> +            {
> +                printk(XENLOG_G_INFO "Failed to assign Dom%d's MFN %lx"
> +                       " to Dom%d\n", source_d->domain_id, mfn,
> +                       dest_d->domain_id);
> +                rc = -EFAULT;

A better (more distinct) error code should be used here. And I
suppose printing the GFN would be more helpful for debugging
possible issues than printing the MFN.

> +                goto fail;
> +            }
> +
> +            if ( unlikely(gmfn + count > last_gmfn) )
> +            {
> +                next_page = NULL;
> +                break;
> +            }
> +
> +            next_page = get_page_from_gfn(source_d, gmfn + count, &p2mt, 0);
> +        }
> +
> +        if ( next_page && count )
> +                put_page(next_page);
> +
> +        if ( !count && (page->count_info & PGC_count_mask) != 2 )

Re-reading the field seems wrong here: What if it was > 2 above,
but is == 2 now?

Also together with the "else if" below I think this would better be
structured

        if ( count )
        {
            if ( next_page && count )
                put_page(next_page);
        }
        else if ( (page->count_info & PGC_count_mask) != 2 )
        ...
        else

> +        {
> +            new_page = alloc_domheap_page(dest_d, 0);
> +            if ( unlikely(new_page == NULL) )
> +            {
> +                printk(XENLOG_G_INFO "Failed to alloc a page to replace"
> +                       " Dom%d's GFN %lx (MFN %lx) for Dom %d\n",
> +                       source_d->domain_id, gmfn, mfn, dest_d->domain_id);
> +                rc = -ENOMEM;
> +                put_page(page);
> +                goto fail;
> +            }
> +            mfn_new = page_to_mfn(new_page);
> +            copy_domain_page(mfn_new, mfn);
> +            mfn = mfn_new;
> +            put_page(page);
> +            if ( guest_physmap_add_page(dest_d, gmfn, mfn, 0) )
> +            {
> +                printk(XENLOG_G_INFO "Failed to add new GFN %lx"
> +                       " (MFN %lx) to Dom%d\n",
> +                       gmfn, mfn, dest_d->domain_id);
> +                rc = -EFAULT;
> +                goto fail;
> +            }
> +            gmfn++;
> +            softr.nr_transferred++;
> +        }
> +        else if ( !count )
> +        {
> +            put_page(page);
> +            gmfn++;
> +            goto preempt_check;
> +        }
> +
> +        while ( count )
> +        {
> +            order = get_order_from_pages(count);
> +            if ( (1ul << order) > count )
> +                order -= 1;
> +
> +            guest_physmap_remove_page(source_d, gmfn, mfn, order);
> +
> +            if ( guest_physmap_add_page(dest_d, gmfn, mfn, order) )

Considering the non-atomicity between the assign_pages() much
further up and this call, shouldn't the domain be paused while its
state is inconsistent?

> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -573,7 +573,39 @@ struct xen_vnuma_topology_info {
>  typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
>  
> -/* Next available subop number is 27 */
> +/*
> + * Transfer all memory pages from one domain to the other. Pages are unmapped
> + * from the source domain and mapped at the same GFNs to the destination
> + * domain. This hypercall has a side-effect of moving the source domain to the
> + * 'dying' state.

Namely due to this, shouldn't this rather be a domctl?

> + * If a particular page is mapped more then once in the source domain a new
> + * empty page is being allocated for the destination domain and the content is
> + * being copied. The original page remains mapped to the source domain.
> + *
> + * The caller has to be priviliged and it is supposed to set gmfn_start to 0,
> + * this field is required for the hypercall continuation.

If it was kept to be a mem-op - mem-ops have a mechanism to be
continued without custom structure fields.

> +#define XENMEM_soft_reset                   27
> +struct xen_memory_soft_reset {
> +    /*
> +     * [IN] Memory transfer details.
> +     */
> +    domid_t source_domid; /* steal pages from */
> +    domid_t dest_domid;   /* assign pages to */
> +
> +    xen_pfn_t gmfn_start; /* start from gmfn */
> +
> +    /*
> +     * [OUT] Number of transfered pages including new allocations.
> +     */
> +    xen_ulong_t nr_transferred;

It's not really clear to me what use this is.

Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 05/10] xsm: add XENMEM_soft_reset support
  2015-05-13  9:49 ` [PATCH v6 05/10] xsm: add XENMEM_soft_reset support Vitaly Kuznetsov
  2015-05-20 23:30   ` Daniel De Graaf
@ 2015-05-22  9:40   ` Jan Beulich
  2015-05-22 14:58     ` Daniel De Graaf
  2015-05-22 14:59     ` Vitaly Kuznetsov
  1 sibling, 2 replies; 31+ messages in thread
From: Jan Beulich @ 2015-05-22  9:40 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	Tim Deegan, David Vrabel, xen-devel, Daniel De Graaf,
	Keir Fraser

>>> On 13.05.15 at 11:49, <vkuznets@redhat.com> wrote:
> --- a/xen/include/xsm/dummy.h
> +++ b/xen/include/xsm/dummy.h
> @@ -193,6 +193,13 @@ static XSM_INLINE int xsm_memory_exchange(XSM_DEFAULT_ARG struct domain *d)
>      return xsm_default_action(action, current->domain, d);
>  }
>  
> +static XSM_INLINE int xsm_memory_soft_reset(XSM_DEFAULT_ARG struct domain *d1,
> +					    struct domain *d2)
> +{
> +    XSM_ASSERT_ACTION(XSM_PRIV);
> +    return xsm_default_action(action, current->domain, NULL);
> +}

Why XSM_PRIV instead of XSM_TARGET against _both_ domains?

Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 10/10] (lib)xl: soft reset support
  2015-05-13  9:49 ` [PATCH v6 10/10] (lib)xl: soft reset support Vitaly Kuznetsov
@ 2015-05-22 14:55   ` Vitaly Kuznetsov
  0 siblings, 0 replies; 31+ messages in thread
From: Vitaly Kuznetsov @ 2015-05-22 14:55 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Julien Grall, Keir Fraser, Ian Campbell,
	Stefano Stabellini, Andrew Cooper, Ian Jackson, Olaf Hering,
	Tim Deegan, David Vrabel, Jan Beulich, Wei Liu, Daniel De Graaf

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> Perform soft reset when a domain did SHUTDOWN_soft_reset. Migrate the
> content with xc_domain_soft_reset(), reload dm and toolstack.
>

..skip..

> +void libxl__xc_domain_soft_reset(libxl__egc *egc,
> +                                 libxl__domain_create_state *dcs)
> +{
> +    STATE_AO_GC(dcs->ao);
> +    libxl_ctx *ctx = libxl__gc_owner(gc);
> +    const uint32_t domid_soft_reset = dcs->domid_soft_reset;
> +    const uint32_t domid = dcs->guest_domid;
> +    uint8_t *buf;
> +    uint32_t len;
> +    int rc;
> +    libxl__domain_suspend_state *dss;
> +
> +    GCNEW(dss);
> +    dss->ao = ao;
> +    dss->domid = domid_soft_reset;
> +
> +    rc = libxl__toolstack_save(domid_soft_reset, &buf, &len, dss);
> +    if (rc) goto out;

It turned out being too late, QEMU process for the original domain is
already dead at this point and xenstore in cleanen. Qemu-upstream will
fail when it requires e.g. video ram info. I'll move this call earlier
in the chain in v7.

..skip...

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 05/10] xsm: add XENMEM_soft_reset support
  2015-05-22  9:40   ` Jan Beulich
@ 2015-05-22 14:58     ` Daniel De Graaf
  2015-05-22 15:26       ` Jan Beulich
  2015-05-22 14:59     ` Vitaly Kuznetsov
  1 sibling, 1 reply; 31+ messages in thread
From: Daniel De Graaf @ 2015-05-22 14:58 UTC (permalink / raw)
  To: Jan Beulich, Vitaly Kuznetsov
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	Tim Deegan, David Vrabel, xen-devel, Keir Fraser

On 05/22/2015 05:40 AM, Jan Beulich wrote:
>>>> On 13.05.15 at 11:49, <vkuznets@redhat.com> wrote:
>> --- a/xen/include/xsm/dummy.h
>> +++ b/xen/include/xsm/dummy.h
>> @@ -193,6 +193,13 @@ static XSM_INLINE int xsm_memory_exchange(XSM_DEFAULT_ARG struct domain *d)
>>       return xsm_default_action(action, current->domain, d);
>>   }
>>
>> +static XSM_INLINE int xsm_memory_soft_reset(XSM_DEFAULT_ARG struct domain *d1,
>> +					    struct domain *d2)
>> +{
>> +    XSM_ASSERT_ACTION(XSM_PRIV);
>> +    return xsm_default_action(action, current->domain, NULL);
>> +}
>
> Why XSM_PRIV instead of XSM_TARGET against _both_ domains?
>
> Jan

Unless there is a change in how XSM_TARGET is implemented, the result
is going to be equivalent: it is not possible for a domain to have
more than one target at a time, so if current->domain is not dom0,
then one of the two XSM_TARGET checks will fail.

-- 
Daniel De Graaf
National Security Agency

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 05/10] xsm: add XENMEM_soft_reset support
  2015-05-22  9:40   ` Jan Beulich
  2015-05-22 14:58     ` Daniel De Graaf
@ 2015-05-22 14:59     ` Vitaly Kuznetsov
  1 sibling, 0 replies; 31+ messages in thread
From: Vitaly Kuznetsov @ 2015-05-22 14:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	Tim Deegan, David Vrabel, xen-devel, Daniel De Graaf,
	Keir Fraser

"Jan Beulich" <JBeulich@suse.com> writes:

>>>> On 13.05.15 at 11:49, <vkuznets@redhat.com> wrote:
>> --- a/xen/include/xsm/dummy.h
>> +++ b/xen/include/xsm/dummy.h
>> @@ -193,6 +193,13 @@ static XSM_INLINE int xsm_memory_exchange(XSM_DEFAULT_ARG struct domain *d)
>>      return xsm_default_action(action, current->domain, d);
>>  }
>>  
>> +static XSM_INLINE int xsm_memory_soft_reset(XSM_DEFAULT_ARG struct domain *d1,
>> +					    struct domain *d2)
>> +{
>> +    XSM_ASSERT_ACTION(XSM_PRIV);
>> +    return xsm_default_action(action, current->domain, NULL);
>> +}
>
> Why XSM_PRIV instead of XSM_TARGET against _both_ domains?
>

No reason to be honest. We agreed on a 2-step check for FLASK with
Daniel (the caller has permission to operate on both domains and the
memory itself can be transfered), XSM_TARGET against both domains in
dummy seems reasonable too.

> Jan

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 05/10] xsm: add XENMEM_soft_reset support
  2015-05-22 14:58     ` Daniel De Graaf
@ 2015-05-22 15:26       ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2015-05-22 15:26 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Daniel De Graaf
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	Tim Deegan, David Vrabel, xen-devel, Keir Fraser

>>> On 22.05.15 at 16:58, <dgdegra@tycho.nsa.gov> wrote:
> On 05/22/2015 05:40 AM, Jan Beulich wrote:
>>>>> On 13.05.15 at 11:49, <vkuznets@redhat.com> wrote:
>>> --- a/xen/include/xsm/dummy.h
>>> +++ b/xen/include/xsm/dummy.h
>>> @@ -193,6 +193,13 @@ static XSM_INLINE int 
> xsm_memory_exchange(XSM_DEFAULT_ARG struct domain *d)
>>>       return xsm_default_action(action, current->domain, d);
>>>   }
>>>
>>> +static XSM_INLINE int xsm_memory_soft_reset(XSM_DEFAULT_ARG struct domain 
> *d1,
>>> +					    struct domain *d2)
>>> +{
>>> +    XSM_ASSERT_ACTION(XSM_PRIV);
>>> +    return xsm_default_action(action, current->domain, NULL);
>>> +}
>>
>> Why XSM_PRIV instead of XSM_TARGET against _both_ domains?
> 
> Unless there is a change in how XSM_TARGET is implemented, the result
> is going to be equivalent: it is not possible for a domain to have
> more than one target at a time, so if current->domain is not dom0,
> then one of the two XSM_TARGET checks will fail.

Ah, right. But the thing here wants that expressed. But I guess this
should be a separate change later on. Vitaly - please leave a
respective comment around.

Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 04/10] xen: Introduce XENMEM_soft_reset operation
  2015-05-22  9:38   ` Jan Beulich
@ 2015-05-22 15:36     ` Vitaly Kuznetsov
  2015-05-22 16:26       ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Vitaly Kuznetsov @ 2015-05-22 15:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	Tim Deegan, David Vrabel, xen-devel, Daniel De Graaf,
	Keir Fraser

"Jan Beulich" <JBeulich@suse.com> writes:

>>>> On 13.05.15 at 11:49, <vkuznets@redhat.com> wrote:
>> --- a/xen/common/memory.c
>> +++ b/xen/common/memory.c
>> @@ -580,6 +580,234 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>>      return rc;
>>  }
>>  
>> +static long memory_soft_reset(XEN_GUEST_HANDLE_PARAM(xen_memory_soft_reset_t) arg)
>> +{
>> +    long rc = 0;
>
> The way it's coded right now, I don't see the need for this or the
> function return type being "long" (but see the comments on the
> public header change).
>
>> +    struct xen_memory_soft_reset softr;
>> +    struct domain *source_d, *dest_d;
>> +    unsigned long mfn, mfn_new, gmfn, last_gmfn, count;
>> +    unsigned int order;
>> +    p2m_type_t p2mt;
>> +    struct page_info *page, *new_page, *next_page;
>> +    int drop_dom_ref;
>> +
>> +    if ( copy_from_guest(&softr, arg, 1) )
>> +        return -EFAULT;
>> +
>> +    if ( softr.source_domid == softr.dest_domid )
>> +        return -EINVAL;
>> +
>> +    source_d = rcu_lock_domain_by_any_id(softr.source_domid);
>> +    if ( source_d == NULL )
>> +    {
>> +        rc = -ESRCH;
>> +        goto fail_early;
>> +    }
>> +
>> +    dest_d = rcu_lock_domain_by_any_id(softr.dest_domid);
>> +    if ( dest_d == NULL )
>> +    {
>> +        rc = -ESRCH;
>> +        rcu_unlock_domain(source_d);
>> +        goto fail_early;
>> +    }
>> +
>> +    if ( dest_d->is_dying )
>> +    {
>> +        rc = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    if ( !source_d->is_dying )
>> +    {
>> +        /*
>> +         * Make sure no allocation/remapping for the source domain is ongoing
>> +         * and set is_dying flag to prevent such actions in future.
>> +         */
>> +        spin_lock(&source_d->page_alloc_lock);
>> +        source_d->is_dying = DOMDYING_locked;
>
> You need to repeat the is_dying check with the lock held, and act
> accordingly if it isn't zero anymore.

Yep, missed that.

>
> Furthermore I don't see how this prevents there being further
> changes to the GFN <-> MFN mappings for the guest (yet you rely
> on them to be stable below afaict).
>

As you suggested below we can complement that by pausing both source and
destination domains here. In that case these domains won't be changing
their mappings by themselves but it would still be possible for the
hypervisor to change something. We do have !d->is_dying check in many
places though ... In theory we could have taken the p2m_lock() for both
domains but I'm not sure all stuff I need here will cope with p2m_lock()
already being held, this lock is x86-specific and I'm not sure we want
it in the first place. I'd be very grateful for some additional ideas on
how to make it safer.

>> +        spin_unlock(&source_d->page_alloc_lock);
>> +    }
>> +
>> +    last_gmfn = domain_get_maximum_gpfn(source_d);
>> +    gmfn = softr.gmfn_start;
>> +    while ( gmfn <= last_gmfn )
>
> By now you should have done a privilege check.
>

I'm adding xsm_memory_soft_reset() in PATCH 05/10, hope it's sufficient.

>> +    {
>> +        page = get_page_from_gfn(source_d, gmfn, &p2mt, 0);
>> +        if ( unlikely(page == NULL) )
>> +        {
>> +#ifdef CONFIG_X86
>> +            count = 0;
>> +            while ( p2m_is_pod(p2mt) )
>> +            {
>> +                count++;
>> +                if ( gmfn + count > last_gmfn )
>> +                    break;
>> +                page = get_page_from_gfn(source_d, gmfn + count, &p2mt, 0);
>> +                if ( page )
>> +                {
>> +                    put_page(page);
>> +                    page = NULL;
>
> Why? You don't use the value further down.
>

Leftovers, will get rid of it in v7.

>> +                    break;
>> +                }
>> +            }
>> +
>> +            if ( !count )
>> +                gmfn++;
>> +
>> +            while ( count )
>> +            {
>> +                order = get_order_from_pages(count);
>> +                if ( (1ul << order) > count )
>> +                    order -= 1;
>
> --
>
>> +                rc = guest_physmap_mark_populate_on_demand(dest_d, gmfn, order);
>> +                if ( rc )
>> +                    goto fail;
>> +                count -= 1ul << order;
>> +                gmfn += 1ul << order;
>> +            }
>> +#else
>> +            gmfn++;
>> +#endif
>> +            goto preempt_check;
>> +        }
>> +
>> +        mfn = page_to_mfn(page);
>> +        if ( unlikely(!mfn_valid(mfn)) )
>> +        {
>> +            put_page(page);
>> +            gmfn++;
>> +            goto preempt_check;
>
> I guess this should never happen, and hence issuing a warning may be
> warranted?
>

Yea, not sure what should is be though. "page with invalid MFN!" or something.

>> +        }
>> +
>> +        next_page = page;
>> +
>> +        /*
>> +         * A normal page is supposed to have count_info = 2 (1 from the domain
>> +         * and 1 from get_page_from_gfn() above). All other pages need to be
>> +         * copied.
>> +         */
>
> I'm afraid this isn't precise enough: _PGC_allocated implies one
> refcount, i.e. if that flag turns out to be clear, you should expect
> just 1. And then I'd wonder whether a !_PGC_allocated page
> needs any handling here in the first place (or perhaps, once the
> GFN <-> MFN mapping is ensured to be stable, this case can't
> happen anyway).

A !_PGC_allocated page should never happen here I suppose but to be on
the safe side we can add _PGC_allocated check here, I think we need to
skip a page or even fail the hypercall in case this check fails.

>
>> +        count = 0;
>> +        while ( next_page && !is_xen_heap_page(next_page) &&
>> +                (next_page->count_info & PGC_count_mask) == 2 &&
>> +                page_to_mfn(next_page) == mfn + count )
>> +        {
>> +            count++;
>> +            drop_dom_ref = 0;
>> +            spin_lock(&source_d->page_alloc_lock);
>> +            page_set_owner(next_page, NULL);
>> +            page_list_del(next_page, &source_d->page_list);
>> +            source_d->tot_pages -= 1;
>
> --
>
>> +            if ( unlikely(source_d->tot_pages == 0) )
>> +                drop_dom_ref = 1;
>> +            spin_unlock(&source_d->page_alloc_lock);
>> +            put_page(next_page);
>> +            if ( drop_dom_ref )
>> +                put_domain(source_d);
>> +
>> +            if ( unlikely(assign_pages(dest_d, next_page, 0, 0)) )
>> +            {
>> +                printk(XENLOG_G_INFO "Failed to assign Dom%d's MFN %lx"
>> +                       " to Dom%d\n", source_d->domain_id, mfn,
>> +                       dest_d->domain_id);
>> +                rc = -EFAULT;
>
> A better (more distinct) error code should be used here. And I
> suppose printing the GFN would be more helpful for debugging
> possible issues than printing the MFN.

As I can see assign_pages() fails in two cases:
1) Over-allocating
2) Domain is dying (this is never supposed to happen as we're holding
the lock).
So actually both MFN and GFN are a bit irrelevant here.

>> +                goto fail;
>> +            }
>> +
>> +            if ( unlikely(gmfn + count > last_gmfn) )
>> +            {
>> +                next_page = NULL;
>> +                break;
>> +            }
>> +
>> +            next_page = get_page_from_gfn(source_d, gmfn + count, &p2mt, 0);
>> +        }
>> +
>> +        if ( next_page && count )
>> +                put_page(next_page);
>> +
>> +        if ( !count && (page->count_info & PGC_count_mask) != 2 )
>
> Re-reading the field seems wrong here: What if it was > 2 above,
> but is == 2 now?
>
> Also together with the "else if" below I think this would better be
> structured
>
>         if ( count )
>         {
>             if ( next_page && count )
>                 put_page(next_page);
>         }
>         else if ( (page->count_info & PGC_count_mask) != 2 )
>         ...
>         else
>
>> +        {
>> +            new_page = alloc_domheap_page(dest_d, 0);
>> +            if ( unlikely(new_page == NULL) )
>> +            {
>> +                printk(XENLOG_G_INFO "Failed to alloc a page to replace"
>> +                       " Dom%d's GFN %lx (MFN %lx) for Dom %d\n",
>> +                       source_d->domain_id, gmfn, mfn, dest_d->domain_id);
>> +                rc = -ENOMEM;
>> +                put_page(page);
>> +                goto fail;
>> +            }
>> +            mfn_new = page_to_mfn(new_page);
>> +            copy_domain_page(mfn_new, mfn);
>> +            mfn = mfn_new;
>> +            put_page(page);
>> +            if ( guest_physmap_add_page(dest_d, gmfn, mfn, 0) )
>> +            {
>> +                printk(XENLOG_G_INFO "Failed to add new GFN %lx"
>> +                       " (MFN %lx) to Dom%d\n",
>> +                       gmfn, mfn, dest_d->domain_id);
>> +                rc = -EFAULT;
>> +                goto fail;
>> +            }
>> +            gmfn++;
>> +            softr.nr_transferred++;
>> +        }
>> +        else if ( !count )
>> +        {
>> +            put_page(page);
>> +            gmfn++;
>> +            goto preempt_check;
>> +        }
>> +
>> +        while ( count )
>> +        {
>> +            order = get_order_from_pages(count);
>> +            if ( (1ul << order) > count )
>> +                order -= 1;
>> +
>> +            guest_physmap_remove_page(source_d, gmfn, mfn, order);
>> +
>> +            if ( guest_physmap_add_page(dest_d, gmfn, mfn, order) )
>
> Considering the non-atomicity between the assign_pages() much
> further up and this call, shouldn't the domain be paused while its
> state is inconsistent?

Yes, actually they're already paused by the toolstack but we'd better
enforce this in the hypervisor. Will fix in v7.

>
>> --- a/xen/include/public/memory.h
>> +++ b/xen/include/public/memory.h
>> @@ -573,7 +573,39 @@ struct xen_vnuma_topology_info {
>>  typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
>>  DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
>>  
>> -/* Next available subop number is 27 */
>> +/*
>> + * Transfer all memory pages from one domain to the other. Pages are unmapped
>> + * from the source domain and mapped at the same GFNs to the destination
>> + * domain. This hypercall has a side-effect of moving the source domain to the
>> + * 'dying' state.
>
> Namely due to this, shouldn't this rather be a domctl?
>

Probably, it looked like a memory op in the very beginning but unless we
decide to ensure that domain's mappings don't change by some other
mechanism we'd better make this a domctl destroying the original domain
at the very end.

>> + * If a particular page is mapped more then once in the source domain a new
>> + * empty page is being allocated for the destination domain and the content is
>> + * being copied. The original page remains mapped to the source domain.
>> + *
>> + * The caller has to be priviliged and it is supposed to set gmfn_start to 0,
>> + * this field is required for the hypercall continuation.
>
> If it was kept to be a mem-op - mem-ops have a mechanism to be
> continued without custom structure fields.
>
>> +#define XENMEM_soft_reset                   27
>> +struct xen_memory_soft_reset {
>> +    /*
>> +     * [IN] Memory transfer details.
>> +     */
>> +    domid_t source_domid; /* steal pages from */
>> +    domid_t dest_domid;   /* assign pages to */
>> +
>> +    xen_pfn_t gmfn_start; /* start from gmfn */
>> +
>> +    /*
>> +     * [OUT] Number of transfered pages including new allocations.
>> +     */
>> +    xen_ulong_t nr_transferred;
>
> It's not really clear to me what use this is.

I use it to detect a PoD domain for which I need to populate cache but I
think I can live without it.

>
> Jan

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 04/10] xen: Introduce XENMEM_soft_reset operation
  2015-05-22 15:36     ` Vitaly Kuznetsov
@ 2015-05-22 16:26       ` Jan Beulich
  2015-05-25  9:24         ` Tim Deegan
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2015-05-22 16:26 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	Tim Deegan, David Vrabel, xen-devel, Daniel De Graaf,
	Keir Fraser

>>> On 22.05.15 at 17:36, <vkuznets@redhat.com> wrote:
>>>>> On 13.05.15 at 11:49, <vkuznets@redhat.com> wrote:
>>> +    if ( !source_d->is_dying )
>>> +    {
>>> +        /*
>>> +         * Make sure no allocation/remapping for the source domain is ongoing
>>> +         * and set is_dying flag to prevent such actions in future.
>>> +         */
>>> +        spin_lock(&source_d->page_alloc_lock);
>>> +        source_d->is_dying = DOMDYING_locked;
>>
>> Furthermore I don't see how this prevents there being further
>> changes to the GFN <-> MFN mappings for the guest (yet you rely
>> on them to be stable below afaict).
>>
> 
> As you suggested below we can complement that by pausing both source and
> destination domains here. In that case these domains won't be changing
> their mappings by themselves but it would still be possible for the
> hypervisor to change something. We do have !d->is_dying check in many
> places though ... In theory we could have taken the p2m_lock() for both
> domains but I'm not sure all stuff I need here will cope with p2m_lock()
> already being held, this lock is x86-specific and I'm not sure we want
> it in the first place. I'd be very grateful for some additional ideas on
> how to make it safer.

For whether p2m_lock() might be needed here I'd like to defer to
Tim. As to there being changes by the hypervisor - the hypervisor
is a passive entity, i.e. unless asked by some guest to do
something, it won't do anything. Hence the question is whether
other domains (including the control one) could cause any change
here.

Also the destination domain as I understand it should have never
got un-paused before this operation. 

>>> +            if ( unlikely(source_d->tot_pages == 0) )
>>> +                drop_dom_ref = 1;
>>> +            spin_unlock(&source_d->page_alloc_lock);
>>> +            put_page(next_page);
>>> +            if ( drop_dom_ref )
>>> +                put_domain(source_d);
>>> +
>>> +            if ( unlikely(assign_pages(dest_d, next_page, 0, 0)) )
>>> +            {
>>> +                printk(XENLOG_G_INFO "Failed to assign Dom%d's MFN %lx"
>>> +                       " to Dom%d\n", source_d->domain_id, mfn,
>>> +                       dest_d->domain_id);
>>> +                rc = -EFAULT;
>>
>> A better (more distinct) error code should be used here. And I
>> suppose printing the GFN would be more helpful for debugging
>> possible issues than printing the MFN.
> 
> As I can see assign_pages() fails in two cases:
> 1) Over-allocating
> 2) Domain is dying (this is never supposed to happen as we're holding
> the lock).
> So actually both MFN and GFN are a bit irrelevant here.

Well, perhaps. To decide what to print, simply try to understand
what would be helpful to understand the reasons for the failure
without having to instrument the code right away.

Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 04/10] xen: Introduce XENMEM_soft_reset operation
  2015-05-22 16:26       ` Jan Beulich
@ 2015-05-25  9:24         ` Tim Deegan
  2015-05-25 10:06           ` Vitaly Kuznetsov
  0 siblings, 1 reply; 31+ messages in thread
From: Tim Deegan @ 2015-05-25  9:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	David Vrabel, xen-devel, Vitaly Kuznetsov, Keir Fraser,
	Daniel De Graaf

At 17:26 +0100 on 22 May (1432315574), Jan Beulich wrote:
> >>> On 22.05.15 at 17:36, <vkuznets@redhat.com> wrote:
> >>>>> On 13.05.15 at 11:49, <vkuznets@redhat.com> wrote:
> >>> +    if ( !source_d->is_dying )
> >>> +    {
> >>> +        /*
> >>> +         * Make sure no allocation/remapping for the source domain is ongoing
> >>> +         * and set is_dying flag to prevent such actions in future.
> >>> +         */
> >>> +        spin_lock(&source_d->page_alloc_lock);
> >>> +        source_d->is_dying = DOMDYING_locked;
> >>
> >> Furthermore I don't see how this prevents there being further
> >> changes to the GFN <-> MFN mappings for the guest (yet you rely
> >> on them to be stable below afaict).
> >>
> > 
> > As you suggested below we can complement that by pausing both source and
> > destination domains here. In that case these domains won't be changing
> > their mappings by themselves but it would still be possible for the
> > hypervisor to change something. We do have !d->is_dying check in many
> > places though ... In theory we could have taken the p2m_lock() for both
> > domains but I'm not sure all stuff I need here will cope with p2m_lock()
> > already being held, this lock is x86-specific and I'm not sure we want
> > it in the first place. I'd be very grateful for some additional ideas on
> > how to make it safer.
> 
> For whether p2m_lock() might be needed here I'd like to defer to
> Tim.

I don't think that will work.  Given that you have to make this
preemptable you can't hope to hold the p2m lock for the entire
operation.

If you want to make sure that the p2m mapping doesn't change
underfoot, you can use get_gfn()/put_gfn() around each individual
operation.  If you use get_gfn_type_access() it will also report
superpage mappings, so you can drop the loop that attempts
to detect them in the PoD case.

Incidentally, counting from 0-->max_mapped_gfn is unfortunate (though I
know this is not the first code to do that).  It would be better
to introduce an iterator over the p2m itself, either some sort of
for_each_gfn(dom, callback_fn, callback_arg) or a get_next_gfn(dom,
base_gfn, &found_gfn) that skips unmapped areas.

Tim.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 04/10] xen: Introduce XENMEM_soft_reset operation
  2015-05-25  9:24         ` Tim Deegan
@ 2015-05-25 10:06           ` Vitaly Kuznetsov
  2015-05-25 16:13             ` Tim Deegan
  0 siblings, 1 reply; 31+ messages in thread
From: Vitaly Kuznetsov @ 2015-05-25 10:06 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	David Vrabel, Jan Beulich, xen-devel, Daniel De Graaf,
	Keir Fraser

Tim Deegan <tim@xen.org> writes:

> At 17:26 +0100 on 22 May (1432315574), Jan Beulich wrote:
>> >>> On 22.05.15 at 17:36, <vkuznets@redhat.com> wrote:
>> >>>>> On 13.05.15 at 11:49, <vkuznets@redhat.com> wrote:
>> >>> +    if ( !source_d->is_dying )
>> >>> +    {
>> >>> +        /*
>> >>> +         * Make sure no allocation/remapping for the source domain is ongoing
>> >>> +         * and set is_dying flag to prevent such actions in future.
>> >>> +         */
>> >>> +        spin_lock(&source_d->page_alloc_lock);
>> >>> +        source_d->is_dying = DOMDYING_locked;
>> >>
>> >> Furthermore I don't see how this prevents there being further
>> >> changes to the GFN <-> MFN mappings for the guest (yet you rely
>> >> on them to be stable below afaict).
>> >>
>> > 
>> > As you suggested below we can complement that by pausing both source and
>> > destination domains here. In that case these domains won't be changing
>> > their mappings by themselves but it would still be possible for the
>> > hypervisor to change something. We do have !d->is_dying check in many
>> > places though ... In theory we could have taken the p2m_lock() for both
>> > domains but I'm not sure all stuff I need here will cope with p2m_lock()
>> > already being held, this lock is x86-specific and I'm not sure we want
>> > it in the first place. I'd be very grateful for some additional ideas on
>> > how to make it safer.
>> 
>> For whether p2m_lock() might be needed here I'd like to defer to
>> Tim.
>
> I don't think that will work.  Given that you have to make this
> preemptable you can't hope to hold the p2m lock for the entire
> operation.
>
> If you want to make sure that the p2m mapping doesn't change
> underfoot, you can use get_gfn()/put_gfn() around each individual
> operation.

Thanks, I see... An additional concern from Jan was (I suppose) about
the safeness (or correctness) of this operation as a whole: even when
both source and destination domains are paused their mappings can be
changed by the control domain (especially having possible preemption in
mind). I'm putting the source domain to the dying state so most
hypercalls will fail (if we're not checking is_dying somewhere we
probably should) but the destination domain is alive. I'm not sure this
adds any additional risk as a misbehaving control domain is always
able to screw mappings.

>  If you use get_gfn_type_access() it will also report
> superpage mappings, so you can drop the loop that attempts
> to detect them in the PoD case.

Thanks, I'll have a look. This one is x86-specific but the whole PoD
case is already x86-specific because of p2m_is_pod().

>
> Incidentally, counting from 0-->max_mapped_gfn is unfortunate (though I
> know this is not the first code to do that).  It would be better
> to introduce an iterator over the p2m itself, either some sort of
> for_each_gfn(dom, callback_fn, callback_arg) or a get_next_gfn(dom,
> base_gfn, &found_gfn) that skips unmapped areas.

I agree. Do you see this as a compulsory part of this series?

>
> Tim.

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 04/10] xen: Introduce XENMEM_soft_reset operation
  2015-05-25 10:06           ` Vitaly Kuznetsov
@ 2015-05-25 16:13             ` Tim Deegan
  2015-05-26  8:05               ` Vitaly Kuznetsov
  0 siblings, 1 reply; 31+ messages in thread
From: Tim Deegan @ 2015-05-25 16:13 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	David Vrabel, Jan Beulich, xen-devel, Daniel De Graaf,
	Keir Fraser

At 12:06 +0200 on 25 May (1432555565), Vitaly Kuznetsov wrote:
> Tim Deegan <tim@xen.org> writes:
> 
> > At 17:26 +0100 on 22 May (1432315574), Jan Beulich wrote:
> >> >>> On 22.05.15 at 17:36, <vkuznets@redhat.com> wrote:
> >> >>>>> On 13.05.15 at 11:49, <vkuznets@redhat.com> wrote:
> >> >>> +    if ( !source_d->is_dying )
> >> >>> +    {
> >> >>> +        /*
> >> >>> +         * Make sure no allocation/remapping for the source domain is ongoing
> >> >>> +         * and set is_dying flag to prevent such actions in future.
> >> >>> +         */
> >> >>> +        spin_lock(&source_d->page_alloc_lock);
> >> >>> +        source_d->is_dying = DOMDYING_locked;
> >> >>
> >> >> Furthermore I don't see how this prevents there being further
> >> >> changes to the GFN <-> MFN mappings for the guest (yet you rely
> >> >> on them to be stable below afaict).
> >> >>
> >> > 
> >> > As you suggested below we can complement that by pausing both source and
> >> > destination domains here. In that case these domains won't be changing
> >> > their mappings by themselves but it would still be possible for the
> >> > hypervisor to change something. We do have !d->is_dying check in many
> >> > places though ... In theory we could have taken the p2m_lock() for both
> >> > domains but I'm not sure all stuff I need here will cope with p2m_lock()
> >> > already being held, this lock is x86-specific and I'm not sure we want
> >> > it in the first place. I'd be very grateful for some additional ideas on
> >> > how to make it safer.
> >> 
> >> For whether p2m_lock() might be needed here I'd like to defer to
> >> Tim.
> >
> > I don't think that will work.  Given that you have to make this
> > preemptable you can't hope to hold the p2m lock for the entire
> > operation.
> >
> > If you want to make sure that the p2m mapping doesn't change
> > underfoot, you can use get_gfn()/put_gfn() around each individual
> > operation.
> 
> Thanks, I see... An additional concern from Jan was (I suppose) about
> the safeness (or correctness) of this operation as a whole: even when
> both source and destination domains are paused their mappings can be
> changed by the control domain (especially having possible preemption in
> mind).

True.  I don't think there's anything you can do about that.  As I
said, you can't hold the p2m lock while your operation is
preempted.  AFAICT this operation is pretty much best-effort already;
it doesn't really handle grant-table/shared/paged/readonly/access-controlled
mappings.

> >  If you use get_gfn_type_access() it will also report
> > superpage mappings, so you can drop the loop that attempts
> > to detect them in the PoD case.
> 
> Thanks, I'll have a look. This one is x86-specific but the whole PoD
> case is already x86-specific because of p2m_is_pod().

It might be worth adding this interface to arm, depending on
what you want to do about p2m-access types.  E.g. can a guest OS use
this kexec operation to turn off all its p2m-access controls (and so
escape from the control of an external security tool)?

> > Incidentally, counting from 0-->max_mapped_gfn is unfortunate (though I
> > know this is not the first code to do that).  It would be better
> > to introduce an iterator over the p2m itself, either some sort of
> > for_each_gfn(dom, callback_fn, callback_arg) or a get_next_gfn(dom,
> > base_gfn, &found_gfn) that skips unmapped areas.
> 
> I agree. Do you see this as a compulsory part of this series?

I don't think so; not in something like this form anyway.  If you
wanted to handle the source p2m changing underfoot by re-scanning for
new entries, then you'd definitely want it.

Tim.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 04/10] xen: Introduce XENMEM_soft_reset operation
  2015-05-25 16:13             ` Tim Deegan
@ 2015-05-26  8:05               ` Vitaly Kuznetsov
  2015-05-26  8:50                 ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Vitaly Kuznetsov @ 2015-05-26  8:05 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	David Vrabel, Jan Beulich, xen-devel, Daniel De Graaf,
	Keir Fraser

Tim Deegan <tim@xen.org> writes:

> At 12:06 +0200 on 25 May (1432555565), Vitaly Kuznetsov wrote:
>> Tim Deegan <tim@xen.org> writes:
>> 
>> > At 17:26 +0100 on 22 May (1432315574), Jan Beulich wrote:
>> >> >>> On 22.05.15 at 17:36, <vkuznets@redhat.com> wrote:
>> >> >>>>> On 13.05.15 at 11:49, <vkuznets@redhat.com> wrote:
>> >> >>> +    if ( !source_d->is_dying )
>> >> >>> +    {
>> >> >>> +        /*
>> >> >>> +         * Make sure no allocation/remapping for the source domain is ongoing
>> >> >>> +         * and set is_dying flag to prevent such actions in future.
>> >> >>> +         */
>> >> >>> +        spin_lock(&source_d->page_alloc_lock);
>> >> >>> +        source_d->is_dying = DOMDYING_locked;
>> >> >>
>> >> >> Furthermore I don't see how this prevents there being further
>> >> >> changes to the GFN <-> MFN mappings for the guest (yet you rely
>> >> >> on them to be stable below afaict).
>> >> >>
>> >> > 
>> >> > As you suggested below we can complement that by pausing both source and
>> >> > destination domains here. In that case these domains won't be changing
>> >> > their mappings by themselves but it would still be possible for the
>> >> > hypervisor to change something. We do have !d->is_dying check in many
>> >> > places though ... In theory we could have taken the p2m_lock() for both
>> >> > domains but I'm not sure all stuff I need here will cope with p2m_lock()
>> >> > already being held, this lock is x86-specific and I'm not sure we want
>> >> > it in the first place. I'd be very grateful for some additional ideas on
>> >> > how to make it safer.
>> >> 
>> >> For whether p2m_lock() might be needed here I'd like to defer to
>> >> Tim.
>> >
>> > I don't think that will work.  Given that you have to make this
>> > preemptable you can't hope to hold the p2m lock for the entire
>> > operation.
>> >
>> > If you want to make sure that the p2m mapping doesn't change
>> > underfoot, you can use get_gfn()/put_gfn() around each individual
>> > operation.
>> 
>> Thanks, I see... An additional concern from Jan was (I suppose) about
>> the safeness (or correctness) of this operation as a whole: even when
>> both source and destination domains are paused their mappings can be
>> changed by the control domain (especially having possible preemption in
>> mind).
>
> True.  I don't think there's anything you can do about that.  As I
> said, you can't hold the p2m lock while your operation is
> preempted.  AFAICT this operation is pretty much best-effort already;
> it doesn't really handle grant-table/shared/paged/readonly/access-controlled
> mappings.

This is intentional mostly intentional and desired: when a Linux guest
does e.g. kexec there is no easy way to transfer the knowledge from one
kernel to another. This operation is supposed to emulate a normal reboot
for a domain where all these special mappings are being lost.

>
>> >  If you use get_gfn_type_access() it will also report
>> > superpage mappings, so you can drop the loop that attempts
>> > to detect them in the PoD case.
>> 
>> Thanks, I'll have a look. This one is x86-specific but the whole PoD
>> case is already x86-specific because of p2m_is_pod().
>
> It might be worth adding this interface to arm, depending on
> what you want to do about p2m-access types.  E.g. can a guest OS use
> this kexec operation to turn off all its p2m-access controls (and so
> escape from the control of an external security tool)?

Good question. We could definitely do
p2m_get_mem_access/p2m_set_mem_access to preserve this information but
I'm not sure we really need this. What happens when a domain reboots? It
gets new p2m and all p2m-access controls are being lost so all external
security tools are supposed to re-establish all the required p2m-access
controls. The only difference with soft-reset is the fact that the
memory content is being preserved but the same effect could in theory be 
achieved by the guest domain itself by e.g. dumping all memory content
to a persistent storage, rebooting and restoring it.

>
>> > Incidentally, counting from 0-->max_mapped_gfn is unfortunate (though I
>> > know this is not the first code to do that).  It would be better
>> > to introduce an iterator over the p2m itself, either some sort of
>> > for_each_gfn(dom, callback_fn, callback_arg) or a get_next_gfn(dom,
>> > base_gfn, &found_gfn) that skips unmapped areas.
>> 
>> I agree. Do you see this as a compulsory part of this series?
>
> I don't think so; not in something like this form anyway.  If you
> wanted to handle the source p2m changing underfoot by re-scanning for
> new entries, then you'd definitely want it.

No, not really. Protecting against a misbehaving control domain
(guaranteeing the correctness of the operation) is going
to be incomplete anyway. I think we just need to make this operation
safe so even the control domain won't be able to compromise the
hypervisor by doing something nasty.

>
> Tim.

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 04/10] xen: Introduce XENMEM_soft_reset operation
  2015-05-26  8:05               ` Vitaly Kuznetsov
@ 2015-05-26  8:50                 ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2015-05-26  8:50 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Olaf Hering, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Julien Grall, Ian Jackson, Andrew Jones,
	Tim Deegan, David Vrabel, xen-devel, Daniel De Graaf,
	Keir Fraser

>>> On 26.05.15 at 10:05, <vkuznets@redhat.com> wrote:
> No, not really. Protecting against a misbehaving control domain
> (guaranteeing the correctness of the operation) is going
> to be incomplete anyway. I think we just need to make this operation
> safe so even the control domain won't be able to compromise the
> hypervisor by doing something nasty.

Of course, as new exceptions as per XSA-77 are not allowed.

Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2015-05-26  8:50 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-13  9:49 [PATCH v6 00/10] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
2015-05-13  9:49 ` [PATCH v6 01/10] xen: introduce SHUTDOWN_soft_reset shutdown reason Vitaly Kuznetsov
2015-05-13  9:49 ` [PATCH v6 02/10] libxl: support " Vitaly Kuznetsov
2015-05-20  9:44   ` Wei Liu
2015-05-13  9:49 ` [PATCH v6 03/10] xen: introduce DOMDYING_locked state Vitaly Kuznetsov
2015-05-13  9:49 ` [PATCH v6 04/10] xen: Introduce XENMEM_soft_reset operation Vitaly Kuznetsov
2015-05-22  9:38   ` Jan Beulich
2015-05-22 15:36     ` Vitaly Kuznetsov
2015-05-22 16:26       ` Jan Beulich
2015-05-25  9:24         ` Tim Deegan
2015-05-25 10:06           ` Vitaly Kuznetsov
2015-05-25 16:13             ` Tim Deegan
2015-05-26  8:05               ` Vitaly Kuznetsov
2015-05-26  8:50                 ` Jan Beulich
2015-05-13  9:49 ` [PATCH v6 05/10] xsm: add XENMEM_soft_reset support Vitaly Kuznetsov
2015-05-20 23:30   ` Daniel De Graaf
2015-05-21  9:49     ` Vitaly Kuznetsov
2015-05-21 14:25       ` Daniel De Graaf
2015-05-22  9:40   ` Jan Beulich
2015-05-22 14:58     ` Daniel De Graaf
2015-05-22 15:26       ` Jan Beulich
2015-05-22 14:59     ` Vitaly Kuznetsov
2015-05-13  9:49 ` [PATCH v6 06/10] libxc: support XENMEM_soft_reset operation Vitaly Kuznetsov
2015-05-13  9:49 ` [PATCH v6 07/10] libxc: introduce soft reset for HVM domains Vitaly Kuznetsov
2015-05-20 15:10   ` Julien Grall
2015-05-20 15:20     ` Vitaly Kuznetsov
2015-05-20 15:28       ` Julien Grall
2015-05-13  9:49 ` [PATCH v6 08/10] xl: introduce enum domain_restart_type Vitaly Kuznetsov
2015-05-13  9:49 ` [PATCH v6 09/10] libxc: add XC_DEVICE_MODEL_SAVE_FILE Vitaly Kuznetsov
2015-05-13  9:49 ` [PATCH v6 10/10] (lib)xl: soft reset support Vitaly Kuznetsov
2015-05-22 14:55   ` Vitaly Kuznetsov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.