* [PATCH RFC/WIPv2 0/6] toolstack-based approach to pvhvm guest kexec
@ 2014-09-24 14:20 Vitaly Kuznetsov
  2014-09-24 14:20 ` [PATCH RFC/WIPv2 1/6] Introduce XENMEM_transfer operation Vitaly Kuznetsov
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Vitaly Kuznetsov @ 2014-09-24 14:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Andrew Jones, David Vrabel, Jan Beulich

When a PVHVM Linux guest performs kexec, a lot of state needs to be taken
care of:
- shared info, vcpu_info
- grants
- event channels
- ...
Instead of handling each of these individually, we can rebuild the domain
performing kexec from scratch, doing a so-called soft reboot.

The idea was suggested by David Vrabel, Jan Beulich, and Konrad Rzeszutek Wilk.

Main RFC part:
I'm not sure my suggested XENMEM_transfer op is the right way to go:
- If we steal pages from a particular domain, that domain should be dead, as
  it cannot meaningfully continue running after such a call.
- We need to copy all L1-L4 pages, rebuild the shared info ... for PV. This
  will result in additional complexity in libxc (rebuilding the p2m).
- We also need to keep track of all copied L1-L4 pages for PV, as we'll be
  re-creating the p2m.

I can see three possible ways to go:
1) Forbid the call for PV and remove this part from libxc. The easiest way to
   go (and PV kexec/kdump will require additional work on kernel side anyway).
2) Go ahead and rebuild p2m in libxc (similar to what we have for save/restore
    path).
3) Instead of XENMEM_transfer, introduce a special one-shot domain kill op
   which follows the same domain_kill path but reassigns resources instead of
   relinquishing them. I suppose it will be possible to reassign L1-L4 pages
   and pages from xenheap here as well.

What would you say?

WIP part:
- PV support is broken.
- Not sure huge pages will work well.
- Not sure about ARM/PVH.
- Not tested with qemu-upstream.

P.S. The patch series can be tested with a PVHVM Linux guest with the following
modifications:

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index c0cb11f..33c5cdd 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -33,6 +33,10 @@
 #include <linux/memblock.h>
 #include <linux/edd.h>
 
+#ifdef CONFIG_KEXEC
+#include <linux/kexec.h>
+#endif
+
 #include <xen/xen.h>
 #include <xen/events.h>
 #include <xen/interface/xen.h>
@@ -1810,6 +1814,22 @@ static struct notifier_block xen_hvm_cpu_notifier = {
   .notifier_call   = xen_hvm_cpu_notify,
 };
 
+#ifdef CONFIG_KEXEC
+static void xen_pvhvm_kexec_shutdown(void)
+{
+	native_machine_shutdown();
+	if (kexec_in_progress) {
+		xen_reboot(SHUTDOWN_soft_reset);
+	}
+}
+
+static void xen_pvhvm_crash_shutdown(struct pt_regs *regs)
+{
+	native_machine_crash_shutdown(regs);
+	xen_reboot(SHUTDOWN_soft_reset);
+}
+#endif
+
 static void __init xen_hvm_guest_init(void)
 {
	init_hvm_pv_info();
@@ -1826,6 +1846,10 @@ static void __init xen_hvm_guest_init(void)
   x86_init.irqs.intr_init = xen_init_IRQ;
   xen_hvm_init_time_ops();
   xen_hvm_init_mmu_ops();
+#ifdef CONFIG_KEXEC
+	machine_ops.shutdown = xen_pvhvm_kexec_shutdown;
+	machine_ops.crash_shutdown = xen_pvhvm_crash_shutdown;
+#endif
 }
 
 static bool xen_nopv = false;
diff --git a/include/xen/interface/sched.h b/include/xen/interface/sched.h
index 9ce0839..b5942a8 100644
--- a/include/xen/interface/sched.h
+++ b/include/xen/interface/sched.h
@@ -107,5 +107,6 @@ struct sched_watchdog {
 #define SHUTDOWN_suspend    2  /* Clean up, save suspend info, kill.         */
 #define SHUTDOWN_crash      3  /* Tell controller we've crashed.             */
 #define SHUTDOWN_watchdog   4  /* Restart because watchdog time expired.     */
+#define SHUTDOWN_soft_reset 5  /* Soft-reset for kexec.                      */
 
 #endif /* __XEN_PUBLIC_SCHED_H__ */

Vitaly Kuznetsov (6):
  Introduce XENMEM_transfer operation
  libxc: support XENMEM_transfer operation
  libxc: introduce soft reset
  xen: Introduce SHUTDOWN_soft_reset shutdown reason
  libxl: support SHUTDOWN_soft_reset shutdown reason
  libxl: soft reset support

 tools/libxc/Makefile               |   1 +
 tools/libxc/xc_domain.c            |  19 +++
 tools/libxc/xc_domain_soft_reset.c | 300 +++++++++++++++++++++++++++++++++++++
 tools/libxc/xenctrl.h              |   6 +
 tools/libxc/xenguest.h             |  19 +++
 tools/libxl/libxl.h                |   6 +
 tools/libxl/libxl_create.c         | 100 +++++++++++--
 tools/libxl/libxl_internal.h       |   5 +
 tools/libxl/libxl_types.idl        |   1 +
 tools/libxl/xl_cmdimpl.c           |  31 +++-
 tools/python/xen/lowlevel/xl/xl.c  |   1 +
 xen/common/memory.c                | 178 ++++++++++++++++++++++
 xen/common/shutdown.c              |   7 +
 xen/include/public/memory.h        |  32 +++-
 xen/include/public/sched.h         |   3 +-
 15 files changed, 695 insertions(+), 14 deletions(-)
 create mode 100644 tools/libxc/xc_domain_soft_reset.c

-- 
1.9.3


* [PATCH RFC/WIPv2 1/6] Introduce XENMEM_transfer operation
  2014-09-24 14:20 [PATCH RFC/WIPv2 0/6] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
@ 2014-09-24 14:20 ` Vitaly Kuznetsov
  2014-09-24 15:07   ` Andrew Cooper
  2014-09-24 14:20 ` [PATCH RFC/WIPv2 2/6] libxc: support " Vitaly Kuznetsov
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Vitaly Kuznetsov @ 2014-09-24 14:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Andrew Jones, David Vrabel, Jan Beulich

The new operation reassigns pages from one domain to another, mapping them
at exactly the same GFNs in the destination domain. Pages mapped more
than once (e.g. granted pages) are copied instead of being reassigned.
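
To illustrate the rule above, here is a minimal, self-contained sketch of the
copy-versus-reassign decision. The sketch_ names, the mask values and the
struct layout are stand-ins invented for illustration, not Xen's real
definitions; only the shape of the test follows memory_transfer() in this
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins, NOT Xen's real PGC_* values. */
#define SKETCH_COUNT_MASK  0x00ffffffu   /* reference-count bits */
#define SKETCH_ALLOCATED   0x80000000u   /* "allocated to a domain" flag */

/* Hypothetical, simplified view of a page's bookkeeping. */
struct sketch_page {
    uint32_t count_info;     /* flag bits plus reference count */
    bool     is_page_table;  /* true for L1-L4 page-table pages */
};

/*
 * Return true when the page must be copied rather than reassigned.
 * The caller is assumed to hold one extra reference (as taken by
 * get_page_from_gfn() in the patch), so an unshared page has count 2.
 */
static bool sketch_must_copy(const struct sketch_page *pg)
{
    assert(pg != NULL);

    /* Page tables are always copied. */
    if (pg->is_page_table)
        return true;

    /* Any other reference count means someone else still maps the page. */
    return (pg->count_info & (SKETCH_COUNT_MASK | SKETCH_ALLOCATED)) !=
           (2u | SKETCH_ALLOCATED);
}
```

With these stand-in values, a page with count 2 and the allocated flag set is
reassigned, while a page with count 3 (for example a granted page) is copied.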

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 xen/common/memory.c         | 178 ++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/memory.h |  32 +++++++-
 2 files changed, 209 insertions(+), 1 deletion(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 2e3225d..653e117 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -578,6 +578,180 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
     return rc;
 }
 
+static long memory_transfer(XEN_GUEST_HANDLE_PARAM(xen_memory_transfer_t) arg)
+{
+    long rc = 0;
+    struct xen_memory_transfer trans;
+    struct domain *source_d, *dest_d;
+    unsigned long mfn, gmfn, last_gmfn;
+    p2m_type_t p2mt;
+    struct page_info *page, *new_page;
+    char *sp, *dp;
+    int copying;
+
+    if ( copy_from_guest(&trans, arg, 1) )
+        return -EFAULT;
+
+    source_d = rcu_lock_domain_by_any_id(trans.source_domid);
+    if ( source_d == NULL )
+    {
+        rc = -ESRCH;
+        goto fail_early;
+    }
+
+    if ( source_d->is_dying )
+    {
+        rc = -EINVAL;
+        rcu_unlock_domain(source_d);
+        goto fail_early;
+    }
+
+    dest_d = rcu_lock_domain_by_any_id(trans.dest_domid);
+    if ( dest_d == NULL )
+    {
+        rc = -ESRCH;
+        rcu_unlock_domain(source_d);
+        goto fail_early;
+    }
+
+    if ( dest_d->is_dying )
+    {
+        rc = -EINVAL;
+        goto fail;
+    }
+
+    last_gmfn = trans.gmfn_start + trans.gmfn_count;
+    for ( gmfn = trans.gmfn_start; gmfn < last_gmfn; gmfn++ )
+    {
+        page = get_page_from_gfn(source_d, gmfn, &p2mt, 0);
+        if ( !page )
+        {
+            continue;
+        }
+
+        mfn = page_to_mfn(page);
+        if ( !mfn_valid(mfn) )
+        {
+            put_page(page);
+            continue;
+        }
+
+        copying = 0;
+
+        if ( is_xen_heap_mfn(mfn) )
+        {
+            put_page(page);
+            continue;
+        }
+
+        /* Page table always worth copying */
+        if ( (page->u.inuse.type_info & PGT_l4_page_table) ||
+             (page->u.inuse.type_info & PGT_l3_page_table) ||
+             (page->u.inuse.type_info & PGT_l2_page_table) ||
+             (page->u.inuse.type_info & PGT_l1_page_table) )
+            copying = 1;
+
+        /*
+         * A normal page is supposed to have count_info = 2 ( 1 from the domain
+         * and 1 from get_page_from_gfn() above ). If the condition is not met
+         * copy the page. These are granted pages, vcpu info pages, ...
+         */
+        if ( (page->count_info & (PGC_count_mask|PGC_allocated)) !=
+              (2 | PGC_allocated) )
+            copying = 1;
+
+        if ( copying )
+        {
+            new_page = alloc_domheap_page(dest_d, 0);
+            if ( !new_page )
+            {
+                gdprintk(XENLOG_INFO, "Failed to alloc free page instead of "
+                         "%lx\n", mfn);
+                rc = -ENOMEM;
+                put_page(page);
+                goto fail;
+            }
+            if ( (page->u.inuse.type_info & PGT_l4_page_table) )
+                new_page->u.inuse.type_info = PGT_l4_page_table;
+
+            if ( (page->u.inuse.type_info & PGT_l3_page_table) )
+                new_page->u.inuse.type_info = PGT_l3_page_table;
+
+            if ( (page->u.inuse.type_info & PGT_l2_page_table) )
+                new_page->u.inuse.type_info = PGT_l2_page_table;
+
+            if ( (page->u.inuse.type_info & PGT_l1_page_table) )
+                new_page->u.inuse.type_info = PGT_l1_page_table;
+
+            if ( (page->u.inuse.type_info & PGT_pinned) )
+                set_bit(_PGT_pinned, &new_page->u.inuse.type_info);
+
+            sp = map_domain_page(mfn);
+            mfn = page_to_mfn(new_page);
+            dp = map_domain_page(mfn);
+            memcpy(dp, sp, PAGE_SIZE);
+            unmap_domain_page(dp);
+            unmap_domain_page(sp);
+            put_page(page);
+        }
+        else
+        {
+            new_page = page;
+            spin_lock(&source_d->page_alloc_lock);
+            page_set_owner(page, NULL);
+            page_list_del(page, &source_d->page_list);
+            /*
+             * Don't use domain_adjust_tot_pages() here as we're reassigning
+             * the page to avoid increasing outstanding_pages counter.
+             */
+            source_d->tot_pages -= 1;
+            if ( unlikely(!source_d->tot_pages) )
+                put_domain(source_d);
+            guest_physmap_remove_page(source_d, gmfn, mfn, 0);
+            spin_unlock(&source_d->page_alloc_lock);
+            put_page(page);
+            if ( assign_pages(dest_d, page, 0, 0) )
+            {
+                gdprintk(XENLOG_INFO, "Failed to assign page to destination domain"
+                         " mfn: %lx\n", mfn);
+                rc = -EFAULT;
+                goto fail;
+            }
+        }
+
+        if ( guest_physmap_add_page(dest_d, gmfn, mfn, 0) ) {
+            gdprintk(XENLOG_INFO, "Failed to add page to domain's physmap"
+                     " mfn: %lx\n", mfn);
+            rc = -EFAULT;
+            goto fail;
+        }
+
+        trans.nr_transferred++;
+
+        if ( hypercall_preempt_check() && (gmfn + 1 < last_gmfn) )
+        {
+            trans.gmfn_start = gmfn + 1;
+            rcu_unlock_domain(source_d);
+            rcu_unlock_domain(dest_d);
+            if ( __copy_field_to_guest(arg, &trans, gmfn_start) )
+                return -EFAULT;
+            if ( __copy_field_to_guest(arg, &trans, nr_transferred) )
+                return -EFAULT;
+            return hypercall_create_continuation(
+                __HYPERVISOR_memory_op, "lh", XENMEM_transfer, arg);
+        }
+    }
+
+ fail:
+    rcu_unlock_domain(dest_d);
+    rcu_unlock_domain(source_d);
+ fail_early:
+    if ( __copy_field_to_guest(arg, &trans, nr_transferred) )
+        rc = -EFAULT;
+
+    return rc;
+}
+
 static int xenmem_add_to_physmap(struct domain *d,
                                  struct xen_add_to_physmap *xatp,
                                  unsigned int start)
@@ -781,6 +955,10 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         rc = memory_exchange(guest_handle_cast(arg, xen_memory_exchange_t));
         break;
 
+    case XENMEM_transfer:
+        rc = memory_transfer(guest_handle_cast(arg, xen_memory_transfer_t));
+        break;
+
     case XENMEM_maximum_ram_page:
         rc = max_page;
         break;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index db961ec..1414012 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -570,10 +570,40 @@ DEFINE_XEN_GUEST_HANDLE(vnuma_topology_info_t);
  * vNUMA topology from hypervisor.
  */
 #define XENMEM_get_vnumainfo               26
+/*
+ * Transfer pages from one domain to another. Pages are unmapped from the
+ * source domain and mapped at exactly the same GFNs in the destination
+ * domain.
+ *
+ * If a particular page is mapped more than once, a new page is allocated
+ * for the destination domain and the content is copied instead of the page
+ * being reassigned. The original page remains mapped in the source domain.
+ *
+ * The caller has to be privileged.
+ */
+
+#define XENMEM_transfer                     27
+struct xen_memory_transfer {
+    /*
+     * [IN] Transfer details.
+     */
+    domid_t source_domid; /* steal pages from */
+    domid_t dest_domid; /* assign pages to */
+
+    xen_pfn_t gmfn_start; /* start from gmfn */
+    uint64_aligned_t gmfn_count; /* how many pages to steal */
+
+    /*
+     * [OUT] Number of transferred pages including copies.
+     */
+    xen_ulong_t nr_transferred;
+};
+typedef struct xen_memory_transfer xen_memory_transfer_t;
+DEFINE_XEN_GUEST_HANDLE(xen_memory_transfer_t);
 
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
-/* Next available subop number is 27 */
+/* Next available subop number is 28 */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
 
-- 
1.9.3
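
For readers following the preemption logic in memory_transfer(), the loop can
be modelled outside the hypervisor as a resumable step function. This is only
a sketch under stated assumptions: the sketch_ names and the "budget" parameter
(standing in for hypercall_preempt_check()) are hypothetical, and the struct
merely mirrors the gmfn_start, gmfn_count and nr_transferred fields of
struct xen_memory_transfer. The model decrements the count whenever it advances
the start, so the end of the range stays fixed across resumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the relevant xen_memory_transfer fields. */
struct sketch_transfer {
    uint64_t gmfn_start;      /* next frame to process */
    uint64_t gmfn_count;      /* frames still to process */
    uint64_t nr_transferred;  /* frames handled so far */
};

/*
 * Process at most `budget` frames. Return 1 to request a continuation
 * (work remains), or 0 when the whole range has been handled.
 */
static int sketch_transfer_step(struct sketch_transfer *t, uint64_t budget)
{
    assert(t != NULL && budget > 0);

    while (t->gmfn_count > 0) {
        /* ... the real code would transfer frame t->gmfn_start here ... */
        t->gmfn_start++;
        t->gmfn_count--;
        t->nr_transferred++;

        /* Only ask for a continuation if frames are left. */
        if (--budget == 0 && t->gmfn_count > 0)
            return 1;
    }
    return 0;
}

/* Re-invoke the step until finished; returns the number of invocations. */
static unsigned int sketch_transfer_all(struct sketch_transfer *t,
                                        uint64_t budget)
{
    unsigned int calls = 1;

    while (sketch_transfer_step(t, budget))
        calls++;
    return calls;
}
```

Because all progress lives in the argument structure, a re-invocation needs no
other state, which is the property a hypercall continuation relies on.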


* [PATCH RFC/WIPv2 2/6] libxc: support XENMEM_transfer operation
  2014-09-24 14:20 [PATCH RFC/WIPv2 0/6] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
  2014-09-24 14:20 ` [PATCH RFC/WIPv2 1/6] Introduce XENMEM_transfer operation Vitaly Kuznetsov
@ 2014-09-24 14:20 ` Vitaly Kuznetsov
  2014-09-24 14:20 ` [PATCH RFC/WIPv2 3/6] libxc: introduce soft reset Vitaly Kuznetsov
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Vitaly Kuznetsov @ 2014-09-24 14:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Andrew Jones, David Vrabel, Jan Beulich

Introduce the xc_domain_transfer_pages() function, a libxc wrapper for the
XENMEM_transfer operation.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/libxc/xc_domain.c | 19 +++++++++++++++++++
 tools/libxc/xenctrl.h   |  6 ++++++
 2 files changed, 25 insertions(+)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 1a6f90a..b844f8b 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -924,6 +924,25 @@ int xc_domain_claim_pages(xc_interface *xch,
     return err;
 }
 
+int xc_domain_transfer_pages(xc_interface *xch,
+                             uint32_t source_domid,
+                             uint32_t dest_domid,
+                             xen_pfn_t gmfn_start,
+                             uint64_t gmfn_count)
+{
+    int err;
+    struct xen_memory_transfer trans = {
+        .source_domid   = source_domid,
+        .dest_domid     = dest_domid,
+        .gmfn_start     = gmfn_start,
+        .gmfn_count     = gmfn_count,
+        .nr_transferred = 0
+    };
+
+    err = do_memory_op(xch, XENMEM_transfer, &trans, sizeof(trans));
+    return err;
+}
+
 int xc_domain_populate_physmap(xc_interface *xch,
                                uint32_t domid,
                                unsigned long nr_extents,
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 514b241..cff36c6 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1353,6 +1353,12 @@ int xc_domain_claim_pages(xc_interface *xch,
                                uint32_t domid,
                                unsigned long nr_pages);
 
+int xc_domain_transfer_pages(xc_interface *xch,
+                             uint32_t source_domid,
+                             uint32_t dest_domid,
+                             xen_pfn_t gmfn_start,
+                             uint64_t gmfn_count);
+
 int xc_domain_memory_exchange_pages(xc_interface *xch,
                                     int domid,
                                     unsigned long nr_in_extents,
-- 
1.9.3


* [PATCH RFC/WIPv2 3/6] libxc: introduce soft reset
  2014-09-24 14:20 [PATCH RFC/WIPv2 0/6] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
  2014-09-24 14:20 ` [PATCH RFC/WIPv2 1/6] Introduce XENMEM_transfer operation Vitaly Kuznetsov
  2014-09-24 14:20 ` [PATCH RFC/WIPv2 2/6] libxc: support " Vitaly Kuznetsov
@ 2014-09-24 14:20 ` Vitaly Kuznetsov
  2014-09-24 14:20 ` [PATCH RFC/WIPv2 4/6] xen: Introduce SHUTDOWN_soft_reset shutdown reason Vitaly Kuznetsov
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Vitaly Kuznetsov @ 2014-09-24 14:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Andrew Jones, David Vrabel, Jan Beulich

Add a new xc_domain_soft_reset() function which performs a so-called soft
reset for a domain. During soft reset all of the source domain's memory is
reassigned to the destination domain, CPU contexts are copied, and so on. The
behavior is similar to save/restore, except that memory is reassigned rather
than saved and reloaded.
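
One detail of the HVM path is that parameters are carried over only when they
are set in the source domain. A minimal model of that behavior, assuming a
plain in-memory parameter array in place of xc_hvm_param_get() and
xc_hvm_param_set() (all sketch_ names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_PARAMS 8  /* arbitrary size for the illustration */

/* Hypothetical stand-in for a domain's HVM parameter table. */
struct sketch_domain {
    uint64_t params[SKETCH_NR_PARAMS];
};

/*
 * Copy one parameter from src to dst if it is non-zero in src, and
 * return the source value. A zero value is treated as "unset", so the
 * destination keeps whatever it already had.
 */
static uint64_t sketch_copy_param(const struct sketch_domain *src,
                                  struct sketch_domain *dst, size_t idx)
{
    uint64_t val;

    assert(src != NULL && dst != NULL && idx < SKETCH_NR_PARAMS);

    val = src->params[idx];
    if (val != 0)
        dst->params[idx] = val;
    return val;
}
```

The returned value lets the caller act on the parameter afterwards, for
example to clear the page a PFN parameter points at in the new domain.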

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/libxc/Makefile               |   1 +
 tools/libxc/xc_domain_soft_reset.c | 300 +++++++++++++++++++++++++++++++++++++
 tools/libxc/xenguest.h             |  19 +++
 3 files changed, 320 insertions(+)
 create mode 100644 tools/libxc/xc_domain_soft_reset.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 3b04027..b5d4b60 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -49,6 +49,7 @@ GUEST_SRCS-y += xc_offline_page.c xc_compression.c
 else
 GUEST_SRCS-y += xc_nomigrate.c
 endif
+GUEST_SRCS-y += xc_domain_soft_reset.c
 
 vpath %.c ../../xen/common/libelf
 CFLAGS += -I../../xen/common/libelf
diff --git a/tools/libxc/xc_domain_soft_reset.c b/tools/libxc/xc_domain_soft_reset.c
new file mode 100644
index 0000000..c5d9873
--- /dev/null
+++ b/tools/libxc/xc_domain_soft_reset.c
@@ -0,0 +1,300 @@
+/******************************************************************************
+ * xc_domain_soft_reset.c
+ *
+ * Do soft reset.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include <inttypes.h>
+#include <time.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/time.h>
+
+#include "xc_private.h"
+#include "xc_core.h"
+#include "xc_bitops.h"
+#include "xc_dom.h"
+#include "xg_private.h"
+#include "xg_save_restore.h"
+
+#include <xen/hvm/params.h>
+
+static unsigned long saverestore_hvm_param(xc_interface *xch, uint32_t source_dom,
+                                           uint32_t dest_dom, int param)
+{
+    uint64_t val = 0;
+    xc_hvm_param_get(xch, source_dom, param, &val);
+
+    if ( val )
+        xc_hvm_param_set(xch, dest_dom, param, val);
+
+    return val;
+}
+
+int xc_domain_soft_reset(xc_interface *xch, uint32_t source_dom,
+                         uint32_t dest_dom, int hvm, domid_t console_domid,
+                         unsigned int console_evtchn,
+                         unsigned long *console_mfn,
+                         domid_t store_domid, unsigned int store_evtchn,
+                         unsigned long *store_mfn)
+{
+    DECLARE_DOMCTL;
+    xc_dominfo_t info;
+    int rc = 1, i;
+
+    /* A copy of the CPU context of the guest. */
+    vcpu_guest_context_any_t _ctxt;
+    vcpu_guest_context_any_t *ctxt = &_ctxt;
+
+    /* HVM: a buffer for holding HVM context */
+    uint32_t hvm_buf_size = 0;
+    uint8_t *hvm_buf = NULL;
+    unsigned long console_pfn, store_pfn, io_pfn, buffio_pfn;
+    unsigned long mfn, pfn, max_pfn, pagetype;
+    start_info_any_t *start_info;
+    struct xc_domain_meminfo minfo;
+
+    DPRINTF("%s: soft reset domid %u -> %u", __func__, source_dom, dest_dom);
+
+    if ( xc_domain_getinfo(xch, source_dom, 1, &info) != 1 )
+    {
+        PERROR("Could not get domain info");
+        return 1;
+    }
+
+    max_pfn = xc_domain_maximum_gpfn(xch, source_dom);
+
+    if (!hvm)
+    {
+        memset(&minfo, 0, sizeof(minfo));
+        if ( xc_map_domain_meminfo(xch, source_dom, &minfo) )
+        {
+            PERROR("Could not map domain %d memory information\n", source_dom);
+            goto out;
+        }
+        for (pfn = 0; pfn < minfo.p2m_size; pfn++)
+        {
+            rc = xc_domain_transfer_pages(xch, source_dom, dest_dom, minfo.p2m_table[pfn], 1);
+            if ( rc != 0 )
+            {
+                PERROR("failed to transfer pfn %lx, rc=%d\n", pfn, rc);
+                goto out;
+            }
+        }
+    }
+
+    if (hvm) {
+        rc = xc_domain_transfer_pages(xch, source_dom, dest_dom, 0, max_pfn + 1);
+        if ( rc != 0 )
+        {
+            PERROR("failed to transfer pages, rc=%d\n", rc);
+            goto out;
+        }
+
+        hvm_buf_size = xc_domain_hvm_getcontext(xch, source_dom, 0, 0);
+        if ( hvm_buf_size == -1 )
+        {
+            PERROR("Couldn't get HVM context size from Xen");
+            goto out;
+        }
+
+        hvm_buf = malloc(hvm_buf_size);
+        if ( !hvm_buf )
+        {
+            ERROR("Couldn't allocate memory");
+            goto out;
+        }
+
+        if ( xc_domain_hvm_getcontext(xch, source_dom, hvm_buf,
+                                      hvm_buf_size) == -1 )
+        {
+            PERROR("HVM:Could not get hvm buffer");
+            goto out;
+        }
+
+        if ( xc_domain_hvm_setcontext(xch, dest_dom, hvm_buf,
+                                      hvm_buf_size) == -1 )
+        {
+            PERROR("HVM:Could not set hvm buffer");
+            goto out;
+        }
+
+        store_pfn = saverestore_hvm_param(xch, source_dom, dest_dom,
+                                          HVM_PARAM_STORE_PFN);
+        if ( store_pfn )
+            xc_clear_domain_page(xch, dest_dom, store_pfn);
+        *store_mfn = store_pfn;
+
+        console_pfn = saverestore_hvm_param(xch, source_dom, dest_dom,
+                                            HVM_PARAM_CONSOLE_PFN);
+        if ( console_pfn )
+            xc_clear_domain_page(xch, dest_dom, console_pfn);
+        *console_mfn = console_pfn;
+
+        buffio_pfn = saverestore_hvm_param(xch, source_dom, dest_dom,
+                                           HVM_PARAM_BUFIOREQ_PFN);
+        if ( buffio_pfn )
+            xc_clear_domain_page(xch, dest_dom, buffio_pfn);
+
+        io_pfn = saverestore_hvm_param(xch, source_dom, dest_dom,
+                                       HVM_PARAM_IOREQ_PFN);
+        if ( io_pfn )
+            xc_clear_domain_page(xch, dest_dom, io_pfn);
+
+        saverestore_hvm_param(xch, source_dom, dest_dom, HVM_PARAM_IDENT_PT);
+        saverestore_hvm_param(xch, source_dom, dest_dom, HVM_PARAM_PAGING_RING_PFN);
+        saverestore_hvm_param(xch, source_dom, dest_dom, HVM_PARAM_ACCESS_RING_PFN);
+        saverestore_hvm_param(xch, source_dom, dest_dom, HVM_PARAM_VM86_TSS);
+        saverestore_hvm_param(xch, source_dom, dest_dom, HVM_PARAM_ACPI_IOPORTS_LOCATION);
+        saverestore_hvm_param(xch, source_dom, dest_dom, HVM_PARAM_VIRIDIAN);
+        saverestore_hvm_param(xch, source_dom, dest_dom, HVM_PARAM_PAE_ENABLED);
+        saverestore_hvm_param(xch, source_dom, dest_dom, HVM_PARAM_STORE_EVTCHN);
+
+        saverestore_hvm_param(xch, source_dom, dest_dom, HVM_PARAM_IOREQ_SERVER_PFN);
+        saverestore_hvm_param(xch, source_dom, dest_dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES);
+        saverestore_hvm_param(xch, source_dom, dest_dom, HVM_PARAM_VM_GENERATION_ID_ADDR);
+
+        if (xc_dom_gnttab_hvm_seed(xch, dest_dom, console_pfn, store_pfn,
+                                   console_domid, store_domid))
+        {
+            PERROR("error seeding hvm grant table");
+            goto out;
+        }
+
+    }
+    else
+    {
+        for ( i = 0; i <= info.max_vcpu_id; i++ )
+        {
+            if ( xc_vcpu_getcontext(xch, source_dom, i, ctxt) )
+            {
+                PERROR("No context for VCPU%d", i);
+                goto out;
+            }
+            domctl.cmd = XEN_DOMCTL_get_ext_vcpucontext;
+            domctl.domain = source_dom;
+            memset(&domctl.u, 0, sizeof(domctl.u));
+            domctl.u.ext_vcpucontext.vcpu = i;
+            if ( xc_domctl(xch, &domctl) < 0 )
+            {
+                PERROR("No extended context for VCPU%d", i);
+                goto out;
+            }
+
+            if ( xc_vcpu_setcontext(xch, dest_dom, i, ctxt) )
+            {
+                PERROR("Failed to set vcpu context for VCPU%d", i);
+                goto out;
+            }
+
+            domctl.cmd = XEN_DOMCTL_set_ext_vcpucontext;
+            domctl.domain = dest_dom;
+            if ( xc_domctl(xch, &domctl) < 0 )
+            {
+                PERROR("Failed to set extended context for VCPU%d", i);
+                goto out;
+            }
+
+            if ( i == 0 )
+            {
+                /* Mounting start_info for PV guest */
+                pfn = GET_FIELD(ctxt, user_regs.edx, minfo.guest_width);
+                pagetype = minfo.pfn_type[i] & XEN_DOMCTL_PFINFO_LTAB_MASK;
+                if ( (pfn >= minfo.p2m_size) ||
+                     (pagetype != XEN_DOMCTL_PFINFO_NOTAB) )
+                {
+                    ERROR("start_info frame number is bad");
+                    goto out;
+                }
+
+                mfn = minfo.p2m_table[pfn];
+                SET_FIELD(ctxt, user_regs.edx, mfn, minfo.guest_width);
+                start_info = xc_map_foreign_range(
+                    xch, dest_dom, PAGE_SIZE, PROT_READ | PROT_WRITE, mfn);
+                if ( start_info == NULL )
+                {
+                    PERROR("xc_map_foreign_range failed (for start_info)");
+                    goto out;
+                }
+
+                SET_FIELD(start_info, nr_pages, minfo.p2m_size,
+                          minfo.guest_width);
+                SET_FIELD(start_info, shared_info,
+                          info.shared_info_frame<<PAGE_SHIFT, minfo.guest_width);
+                SET_FIELD(start_info, flags, 0, minfo.guest_width);
+                if ( GET_FIELD(start_info, store_mfn, minfo.guest_width) >
+                     minfo.p2m_size )
+                {
+                    ERROR("Suspend record xenstore frame number is bad");
+                    munmap(start_info, PAGE_SIZE);
+                    goto out;
+                }
+                *store_mfn = minfo.p2m_table[GET_FIELD(start_info, store_mfn,
+                                                       minfo.guest_width)];
+                SET_FIELD(start_info, store_mfn, *store_mfn,
+                          minfo.guest_width);
+                SET_FIELD(start_info, store_evtchn, store_evtchn,
+                          minfo.guest_width);
+                if ( GET_FIELD(start_info, console.domU.mfn,
+                               minfo.guest_width) > minfo.p2m_size )
+                {
+                    ERROR("Suspend record console frame number is bad");
+                    munmap(start_info, PAGE_SIZE);
+                    goto out;
+                }
+                console_pfn = GET_FIELD(start_info, console.domU.mfn,
+                                        minfo.guest_width);
+                *console_mfn = minfo.p2m_table[console_pfn];
+                SET_FIELD(start_info, console.domU.mfn, *console_mfn,
+                          minfo.guest_width);
+                SET_FIELD(start_info, console.domU.evtchn, console_evtchn,
+                          minfo.guest_width);
+                munmap(start_info, PAGE_SIZE);
+            }
+        }
+
+        if (xc_dom_gnttab_seed(xch, dest_dom, *console_mfn, *store_mfn,
+                               console_domid, store_domid))
+        {
+            PERROR("error seeding grant table");
+            goto out;
+        }
+    }
+    rc = 0;
+out:
+    if (!hvm)
+        xc_unmap_domain_meminfo(xch, &minfo);
+
+    if (hvm_buf) free(hvm_buf);
+
+    if ( (rc != 0) && (dest_dom != 0) ) {
+        PERROR("Failed to perform soft reset, destroying domain %d", dest_dom);
+        xc_domain_destroy(xch, dest_dom);
+    }
+
+    return !!rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
index 40bbac8..c1dbabd 100644
--- a/tools/libxc/xenguest.h
+++ b/tools/libxc/xenguest.h
@@ -131,6 +131,25 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
  * of the new domain is automatically appended to the filename,
  * separated by a ".".
  */
+
+/**
+ * This function performs a soft reset for a domain. During soft reset all
+ * of the source domain's memory is reassigned to the destination domain,
+ * CPU contexts are copied, and so on.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm source_dom the id of the source domain
+ * @parm dest_dom the id of the destination domain
+ * @parm hvm non-zero if this is an HVM domain
+ * @return 0 on success, 1 on failure
+ */
+int xc_domain_soft_reset(xc_interface *xch, uint32_t source_dom,
+                         uint32_t dest_dom, int hvm, domid_t console_domid,
+                         unsigned int console_evtchn,
+                         unsigned long *console_mfn,
+                         domid_t store_domid, unsigned int store_evtchn,
+                         unsigned long *store_mfn);
+
 #define XC_DEVICE_MODEL_RESTORE_FILE "/var/lib/xen/qemu-resume"
 
 /**
-- 
1.9.3


* [PATCH RFC/WIPv2 4/6] xen: Introduce SHUTDOWN_soft_reset shutdown reason
  2014-09-24 14:20 [PATCH RFC/WIPv2 0/6] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (2 preceding siblings ...)
  2014-09-24 14:20 ` [PATCH RFC/WIPv2 3/6] libxc: introduce soft reset Vitaly Kuznetsov
@ 2014-09-24 14:20 ` Vitaly Kuznetsov
  2014-09-24 14:20 ` [PATCH RFC/WIPv2 5/6] libxl: support " Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Vitaly Kuznetsov @ 2014-09-24 14:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Andrew Jones, David Vrabel, Jan Beulich

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 xen/common/shutdown.c      | 7 +++++++
 xen/include/public/sched.h | 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/xen/common/shutdown.c b/xen/common/shutdown.c
index 94d4c53..5c3a158 100644
--- a/xen/common/shutdown.c
+++ b/xen/common/shutdown.c
@@ -71,6 +71,13 @@ void hwdom_shutdown(u8 reason)
         break; /* not reached */
     }
 
+    case SHUTDOWN_soft_reset:
+    {
+        printk("Domain 0 did soft reset but it is unsupported, rebooting.\n");
+        machine_restart(0);
+        break; /* not reached */
+    }
+
     default:
     {
         printk("Domain 0 shutdown (unknown reason %u): ", reason);
diff --git a/xen/include/public/sched.h b/xen/include/public/sched.h
index 4000ac9..800c808 100644
--- a/xen/include/public/sched.h
+++ b/xen/include/public/sched.h
@@ -159,7 +159,8 @@ DEFINE_XEN_GUEST_HANDLE(sched_watchdog_t);
 #define SHUTDOWN_suspend    2  /* Clean up, save suspend info, kill.         */
 #define SHUTDOWN_crash      3  /* Tell controller we've crashed.             */
 #define SHUTDOWN_watchdog   4  /* Restart because watchdog time expired.     */
-#define SHUTDOWN_MAX        4  /* Maximum valid shutdown reason.             */
+#define SHUTDOWN_soft_reset 5  /* Soft reset, rebuild keeping memory content */
+#define SHUTDOWN_MAX        5  /* Maximum valid shutdown reason.             */
 /* ` } */
 
 #endif /* __XEN_PUBLIC_SCHED_H__ */
-- 
1.9.3


* [PATCH RFC/WIPv2 5/6] libxl: support SHUTDOWN_soft_reset shutdown reason
  2014-09-24 14:20 [PATCH RFC/WIPv2 0/6] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (3 preceding siblings ...)
  2014-09-24 14:20 ` [PATCH RFC/WIPv2 4/6] xen: Introduce SHUTDOWN_soft_reset shutdown reason Vitaly Kuznetsov
@ 2014-09-24 14:20 ` Vitaly Kuznetsov
  2014-09-24 14:20 ` [PATCH RFC/WIPv2 6/6] libxl: soft reset support Vitaly Kuznetsov
  2014-09-24 15:23 ` [PATCH RFC/WIPv2 0/6] toolstack-based approach to pvhvm guest kexec Ian Campbell
  6 siblings, 0 replies; 12+ messages in thread
From: Vitaly Kuznetsov @ 2014-09-24 14:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Andrew Jones, David Vrabel, Jan Beulich

Use the letter 't' to indicate a domain in this state.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/libxl/libxl_types.idl       | 1 +
 tools/libxl/xl_cmdimpl.c          | 2 +-
 tools/python/xen/lowlevel/xl/xl.c | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index f1fcbc3..8077415 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -166,6 +166,7 @@ libxl_shutdown_reason = Enumeration("shutdown_reason", [
     (2, "suspend"),
     (3, "crash"),
     (4, "watchdog"),
+    (5, "soft_reset"),
     ], init_val = "LIBXL_SHUTDOWN_REASON_UNKNOWN")
 
 libxl_vga_interface_type = Enumeration("vga_interface_type", [
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 698b3bc..b40ad50 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -3279,7 +3279,7 @@ static void list_domains(int verbose, int context, int claim, int numa,
                          const libxl_dominfo *info, int nb_domain)
 {
     int i;
-    static const char shutdown_reason_letters[]= "-rscw";
+    static const char shutdown_reason_letters[]= "-rscwt";
     libxl_bitmap nodemap;
     libxl_physinfo physinfo;
 
diff --git a/tools/python/xen/lowlevel/xl/xl.c b/tools/python/xen/lowlevel/xl/xl.c
index 32f982a..7c61160 100644
--- a/tools/python/xen/lowlevel/xl/xl.c
+++ b/tools/python/xen/lowlevel/xl/xl.c
@@ -784,6 +784,7 @@ PyMODINIT_FUNC initxl(void)
     _INT_CONST_LIBXL(m, SHUTDOWN_REASON_SUSPEND);
     _INT_CONST_LIBXL(m, SHUTDOWN_REASON_CRASH);
     _INT_CONST_LIBXL(m, SHUTDOWN_REASON_WATCHDOG);
+    _INT_CONST_LIBXL(m, SHUTDOWN_REASON_SOFT_RESET);
 
     genwrap__init(m);
 }
-- 
1.9.3
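
To make the indexing concrete, here is a small sketch of how a shutdown
reason code would select its letter once 't' is appended. Only the "-rscwt"
table and the reason value 5 come from the patch; the helper name
shutdown_reason_letter() and the '?' fallback are assumptions made for
illustration and are not taken from xl_cmdimpl.c.

```c
#include <stddef.h>

/* Table from the patch: index 5 (soft_reset) now maps to 't'. */
static const char shutdown_reason_letters[] = "-rscwt";

/*
 * Hypothetical helper: return the letter for a shutdown reason code,
 * or '?' when the code falls outside the table.
 */
static char shutdown_reason_letter(int reason)
{
    /* sizeof counts the trailing NUL, so subtract one. */
    const size_t nletters = sizeof(shutdown_reason_letters) - 1;

    if (reason < 0 || (size_t)reason >= nletters)
        return '?';
    return shutdown_reason_letters[reason];
}
```

Because the table is indexed directly by the reason code, the string has to
grow in step with every new SHUTDOWN_* value.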


* [PATCH RFC/WIPv2 6/6] libxl: soft reset support
  2014-09-24 14:20 [PATCH RFC/WIPv2 0/6] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (4 preceding siblings ...)
  2014-09-24 14:20 ` [PATCH RFC/WIPv2 5/6] libxl: support " Vitaly Kuznetsov
@ 2014-09-24 14:20 ` Vitaly Kuznetsov
  2014-09-24 15:23 ` [PATCH RFC/WIPv2 0/6] toolstack-based approach to pvhvm guest kexec Ian Campbell
  6 siblings, 0 replies; 12+ messages in thread
From: Vitaly Kuznetsov @ 2014-09-24 14:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Andrew Jones, David Vrabel, Jan Beulich

Perform a soft reset when a domain shuts down with SHUTDOWN_soft_reset.
Transfer the memory content with xc_domain_soft_reset(), then reload the
device model and toolstack state.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 tools/libxl/libxl.h          |   6 +++
 tools/libxl/libxl_create.c   | 100 +++++++++++++++++++++++++++++++++++++++----
 tools/libxl/libxl_internal.h |   5 +++
 tools/libxl/xl_cmdimpl.c     |  29 ++++++++++++-
 4 files changed, 129 insertions(+), 11 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index bc68cac..24b43c5 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -877,6 +877,12 @@ int static inline libxl_domain_create_restore_0x040200(
 
 #endif
 
+int libxl_domain_soft_reset(libxl_ctx *ctx, libxl_domain_config *d_config,
+                            uint32_t *domid, uint32_t domid_old,
+                            const libxl_asyncop_how *ao_how,
+                            const libxl_asyncprogress_how *aop_console_how)
+                            LIBXL_EXTERNAL_CALLERS_ONLY;
+
   /* A progress report will be made via ao_console_how, of type
    * domain_create_console_available, when the domain's primary
    * console is available and can be connected to.
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 8b82584..b63fc1c 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -24,6 +24,8 @@
 #include <xenguest.h>
 #include <xen/hvm/hvm_info_table.h>
 
+#define INVALID_DOMID ~0
+
 int libxl__domain_create_info_setdefault(libxl__gc *gc,
                                          libxl_domain_create_info *c_info)
 {
@@ -885,6 +887,9 @@ static void initiate_domain_create(libxl__egc *egc,
     if (restore_fd >= 0) {
         LOG(DEBUG, "restoring, not running bootloader");
         domcreate_bootloader_done(egc, &dcs->bl, 0);
+    } else if (dcs->domid_soft_reset != INVALID_DOMID) {
+        LOG(DEBUG, "soft reset, not running bootloader");
+        domcreate_bootloader_done(egc, &dcs->bl, 0);
     } else  {
         LOG(DEBUG, "running bootloader");
         dcs->bl.callback = domcreate_bootloader_done;
@@ -933,6 +938,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     libxl_domain_config *const d_config = dcs->guest_config;
     libxl_domain_build_info *const info = &d_config->b_info;
     const int restore_fd = dcs->restore_fd;
+    const uint32_t domid_soft_reset = dcs->domid_soft_reset;
     libxl__domain_build_state *const state = &dcs->build_state;
     libxl__srm_restore_autogen_callbacks *const callbacks =
         &dcs->shs.callbacks.restore.a;
@@ -956,7 +962,7 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->dmss.dm.callback = domcreate_devmodel_started;
     dcs->dmss.callback = domcreate_devmodel_started;
 
-    if ( restore_fd < 0 ) {
+    if ( (restore_fd < 0) && (domid_soft_reset == INVALID_DOMID) ) {
         rc = libxl__domain_build(gc, d_config, domid, state);
         domcreate_rebuild_done(egc, dcs, rc);
         return;
@@ -986,14 +992,71 @@ static void domcreate_bootloader_done(libxl__egc *egc,
         rc = ERROR_INVAL;
         goto out;
     }
-    libxl__xc_domain_restore(egc, dcs,
-                             hvm, pae, superpages);
+    if ( restore_fd >= 0 ) {
+        libxl__xc_domain_restore(egc, dcs,
+                                 hvm, pae, superpages);
+    } else {
+        libxl__xc_domain_soft_reset(egc, dcs, hvm);
+    }
+
     return;
 
  out:
     libxl__xc_domain_restore_done(egc, dcs, rc, 0, 0);
 }
 
+void libxl__xc_domain_soft_reset(libxl__egc *egc,
+                                 libxl__domain_create_state *dcs, int hvm)
+{
+    STATE_AO_GC(dcs->ao);
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+    const uint32_t domid_soft_reset = dcs->domid_soft_reset;
+    const uint32_t domid = dcs->guest_domid;
+    libxl_domain_config *const d_config = dcs->guest_config;
+    libxl_domain_build_info *const info = &d_config->b_info;
+    uint8_t *buf;
+    uint32_t len;
+    uint32_t console_domid, store_domid, console_evtchn, store_evtchn;
+    unsigned long store_mfn, console_mfn;
+    int rc;
+    struct libxl__domain_suspend_state *dss;
+
+    GCNEW(dss);
+
+    dss->ao = ao;
+    dss->domid = domid_soft_reset;
+    dss->dm_savefile = GCSPRINTF("/var/lib/xen/qemu-save.%d", domid_soft_reset);
+
+    if (info->type == LIBXL_DOMAIN_TYPE_HVM) {
+        rc = libxl__domain_suspend_device_model(gc, dss);
+        if (rc) goto out;
+    }
+    console_domid = dcs->build_state.console_domid;
+    console_evtchn = dcs->build_state.console_port;
+    store_domid = dcs->build_state.store_domid;
+    store_evtchn = dcs->build_state.store_port;
+
+    rc = xc_domain_soft_reset(ctx->xch, domid_soft_reset, domid, hvm,
+                              console_domid, console_evtchn, &console_mfn,
+                              store_domid, store_evtchn, &store_mfn);
+    if (rc) goto out;
+
+    dcs->build_state.store_mfn = store_mfn;
+    dcs->build_state.console_mfn = console_mfn;
+
+    rc = libxl__toolstack_save(domid_soft_reset, &buf, &len, dss);
+    if (rc) goto out;
+
+    rc = libxl__toolstack_restore(domid, buf, len, &dcs->shs);
+    if (rc) goto out;
+out:
+    /*
+     * Now pretend we did normal restore and simply call
+     * libxl__xc_domain_restore_done().
+     */
+    libxl__xc_domain_restore_done(egc, dcs, rc, 0, 0);
+}
+
 void libxl__srm_callout_callback_restore_results(unsigned long store_mfn,
           unsigned long console_mfn, void *user)
 {
@@ -1019,6 +1082,7 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
 
     /* convenience aliases */
     const uint32_t domid = dcs->guest_domid;
+    const uint32_t domid_soft_reset = dcs->domid_soft_reset;
     libxl_domain_config *const d_config = dcs->guest_config;
     libxl_domain_build_info *const info = &d_config->b_info;
     libxl__domain_build_state *const state = &dcs->build_state;
@@ -1071,9 +1135,12 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
     if (ret)
         goto out;
 
-    if (info->type == LIBXL_DOMAIN_TYPE_HVM) {
+    if (info->type == LIBXL_DOMAIN_TYPE_HVM && fd != -1) {
         state->saved_state = GCSPRINTF(
                        XC_DEVICE_MODEL_RESTORE_FILE".%d", domid);
+    } else if (domid_soft_reset != INVALID_DOMID) {
+        state->saved_state = GCSPRINTF(
+                       "/var/lib/xen/qemu-save.%d", domid_soft_reset);
     }
 
 out:
@@ -1082,9 +1149,12 @@ out:
         libxl__file_reference_unmap(&state->pv_ramdisk);
     }
 
-    esave = errno;
-    libxl_fd_set_nonblock(ctx, fd, 0);
-    errno = esave;
+    if ( fd != -1 ) {
+        esave = errno;
+        libxl_fd_set_nonblock(ctx, fd, 0);
+        errno = esave;
+    }
+
     domcreate_rebuild_done(egc, dcs, ret);
 }
 
@@ -1463,6 +1533,7 @@ static void domain_create_cb(libxl__egc *egc,
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
                             uint32_t *domid,
                             int restore_fd, int checkpointed_stream,
+                            uint32_t domid_old,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
@@ -1475,6 +1546,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
     libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
     cdcs->dcs.restore_fd = restore_fd;
+    cdcs->dcs.domid_soft_reset = domid_old;
     cdcs->dcs.callback = domain_create_cb;
     cdcs->dcs.checkpointed_stream = checkpointed_stream;
     libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how);
@@ -1503,7 +1575,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
 {
-    return do_domain_create(ctx, d_config, domid, -1, 0,
+    return do_domain_create(ctx, d_config, domid, -1, 0, INVALID_DOMID,
                             ao_how, aop_console_how);
 }
 
@@ -1514,7 +1586,17 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 const libxl_asyncprogress_how *aop_console_how)
 {
     return do_domain_create(ctx, d_config, domid, restore_fd,
-                            params->checkpointed_stream, ao_how, aop_console_how);
+                            params->checkpointed_stream, INVALID_DOMID,
+                            ao_how, aop_console_how);
+}
+
+int libxl_domain_soft_reset(libxl_ctx *ctx, libxl_domain_config *d_config,
+                            uint32_t *domid, uint32_t domid_old,
+                            const libxl_asyncop_how *ao_how,
+                            const libxl_asyncprogress_how *aop_console_how)
+{
+    return do_domain_create(ctx, d_config, domid, -1, 0, domid_old,
+                            ao_how, aop_console_how);
 }
 
 /*
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index f61673c..03f490b 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2861,6 +2861,7 @@ struct libxl__domain_create_state {
     libxl_domain_config *guest_config;
     libxl_domain_config guest_config_saved; /* vanilla config */
     int restore_fd;
+    uint32_t domid_soft_reset;
     libxl__domain_create_cb *callback;
     libxl_asyncprogress_how aop_console_how;
     /* private to domain_create */
@@ -2915,6 +2916,10 @@ _hidden void libxl__xc_domain_restore(libxl__egc *egc,
  * If rc!=0, retval and errnoval are undefined. */
 _hidden void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
                                            int rc, int retval, int errnoval);
+/* calls libxl__xc_domain_restore_done when done */
+_hidden void libxl__xc_domain_soft_reset(libxl__egc *egc,
+                                         libxl__domain_create_state *dcs,
+                                         int hvm);
 
 /* Each time the dm needs to be saved, we must call suspend and then save */
 _hidden int libxl__domain_suspend_device_model(libxl__gc *gc,
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index b40ad50..9183be4 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1818,7 +1818,8 @@ static void reload_domain_config(uint32_t domid,
 }
 
 /* Returns 1 if domain should be restarted,
- * 2 if domain should be renamed then restarted, or 0
+ * 2 if domain should be renamed then restarted,
+ * 3 if domain performed soft reset, or 0
  * Can update r_domid if domain is destroyed etc */
 static int handle_domain_death(uint32_t *r_domid,
                                libxl_event *event,
@@ -1844,6 +1845,9 @@ static int handle_domain_death(uint32_t *r_domid,
     case LIBXL_SHUTDOWN_REASON_WATCHDOG:
         action = d_config->on_watchdog;
         break;
+    case LIBXL_SHUTDOWN_REASON_SOFT_RESET:
+        LOG("Domain performed soft reset.");
+        return 3;
     default:
         LOG("Unknown shutdown reason code %d. Destroying domain.",
             event->u.domain_shutdown.shutdown_reason);
@@ -2067,6 +2071,7 @@ static void evdisable_disk_ejects(libxl_evgen_disk_eject **diskws,
 static uint32_t create_domain(struct domain_create *dom_info)
 {
     uint32_t domid = INVALID_DOMID;
+    uint32_t domid_old = INVALID_DOMID;
 
     libxl_domain_config d_config;
 
@@ -2292,7 +2297,25 @@ start:
          * restore/migrate-receive it again.
          */
         restoring = 0;
-    }else{
+    } else if ( domid_old != INVALID_DOMID ) {
+        /* Do soft reset */
+        d_config.b_info.nodemap.size = 0;
+        ret = libxl_domain_soft_reset(ctx, &d_config,
+                                      &domid, domid_old,
+                                      0, 0);
+
+        if ( libxl_domain_destroy(ctx, domid_old, 0) )
+            fprintf(stderr, "failed to destroy old domain %u\n", domid_old);
+
+        if ( ret ) {
+            /*
+             * Soft reset failed for some reason. It's probably best to kill
+             * the new domain as well.
+             */
+            goto error_out;
+        }
+        domid_old = INVALID_DOMID;
+    } else {
         ret = libxl_domain_create_new(ctx, &d_config, &domid,
                                       0, autoconnect_console_how);
     }
@@ -2356,6 +2379,8 @@ start:
                 event->u.domain_shutdown.shutdown_reason,
                 event->u.domain_shutdown.shutdown_reason);
             switch (handle_domain_death(&domid, event, &d_config)) {
+            case 3:
+                domid_old = domid; /* fall through */
             case 2:
                 if (!preserve_domain(&domid, event, &d_config)) {
                     /* If we fail then exit leaving the old domain in place. */
-- 
1.9.3
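
With this patch domcreate_bootloader_done() has three possible outcomes.
The sketch below restates that selection as a standalone function. The enum
and select_create_path() are hypothetical names used only for illustration;
the conditions mirror the hunks above, and INVALID_DOMID is cast to uint32_t
here so the sketch compiles on its own.

```c
#include <stdint.h>

/* Same sentinel as the patch, with an explicit type for this sketch. */
#define INVALID_DOMID ((uint32_t)~0)

/* Hypothetical labels for the three paths taken after the bootloader step. */
enum create_path {
    CREATE_PATH_BUILD,      /* fresh build via libxl__domain_build() */
    CREATE_PATH_RESTORE,    /* libxl__xc_domain_restore() from restore_fd */
    CREATE_PATH_SOFT_RESET  /* libxl__xc_domain_soft_reset() from old domid */
};

static enum create_path select_create_path(int restore_fd,
                                           uint32_t domid_soft_reset)
{
    /* Neither a restore stream nor an old domain: build from scratch. */
    if (restore_fd < 0 && domid_soft_reset == INVALID_DOMID)
        return CREATE_PATH_BUILD;

    /* A valid restore fd takes precedence, as in the patch. */
    if (restore_fd >= 0)
        return CREATE_PATH_RESTORE;

    return CREATE_PATH_SOFT_RESET;
}
```

In the patch, the three public entry points (libxl_domain_create_new(),
libxl_domain_create_restore() and libxl_domain_soft_reset()) differ only in
which of these two arguments they pass to do_domain_create().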


* Re: [PATCH RFC/WIPv2 1/6] Introduce XENMEM_transfer operation
  2014-09-24 14:20 ` [PATCH RFC/WIPv2 1/6] Introduce XENMEM_transfer operation Vitaly Kuznetsov
@ 2014-09-24 15:07   ` Andrew Cooper
  2014-09-24 15:13     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Cooper @ 2014-09-24 15:07 UTC (permalink / raw)
  To: Vitaly Kuznetsov, xen-devel; +Cc: Andrew Jones, David Vrabel, Jan Beulich

On 24/09/14 15:20, Vitaly Kuznetsov wrote:
> New operation reassigns pages from one domain to the other mapping them
> at exactly the same GFNs in the destination domain. Pages mapped more
> than once (e.g. granted pages) are being copied.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  xen/common/memory.c         | 178 ++++++++++++++++++++++++++++++++++++++++++++
>  xen/include/public/memory.h |  32 +++++++-
>  2 files changed, 209 insertions(+), 1 deletion(-)
>
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 2e3225d..653e117 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -578,6 +578,180 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>      return rc;
>  }
>  
> +static long memory_transfer(XEN_GUEST_HANDLE_PARAM(xen_memory_transfer_t) arg)
> +{
> +    long rc = 0;
> +    struct xen_memory_transfer trans;
> +    struct domain *source_d, *dest_d;
> +    unsigned long mfn, gmfn, last_gmfn;
> +    p2m_type_t p2mt;
> +    struct page_info *page, *new_page;
> +    char *sp, *dp;
> +    int copying;
> +
> +    if ( copy_from_guest(&trans, arg, 1) )
> +        return -EFAULT;
> +
> +    source_d = rcu_lock_domain_by_any_id(trans.source_domid);
> +    if ( source_d == NULL )
> +    {
> +        rc = -ESRCH;
> +        goto fail_early;
> +    }
> +
> +    if ( source_d->is_dying )
> +    {
> +        rc = -EINVAL;
> +        rcu_unlock_domain(source_d);
> +        goto fail_early;
> +    }
> +
> +    dest_d = rcu_lock_domain_by_any_id(trans.dest_domid);
> +    if ( dest_d == NULL )
> +    {
> +        rc = -ESRCH;
> +        rcu_unlock_domain(source_d);
> +        goto fail_early;
> +    }
> +
> +    if ( dest_d->is_dying )
> +    {
> +        rc = -EINVAL;
> +        goto fail;
> +    }
> +
> +    last_gmfn = trans.gmfn_start + trans.gmfn_count;
> +    for ( gmfn = trans.gmfn_start; gmfn < last_gmfn; gmfn++ )
> +    {
> +        page = get_page_from_gfn(source_d, gmfn, &p2mt, 0);
> +        if ( !page )
> +        {
> +            continue;
> +        }
> +
> +        mfn = page_to_mfn(page);
> +        if ( !mfn_valid(mfn) )
> +        {
> +            put_page(page);
> +            continue;
> +        }
> +
> +        copying = 0;
> +
> +        if ( is_xen_heap_mfn(mfn) )
> +        {
> +            put_page(page);
> +            continue;
> +        }
> +
> +        /* Page table always worth copying */
> +        if ( (page->u.inuse.type_info & PGT_l4_page_table) ||
> +             (page->u.inuse.type_info & PGT_l3_page_table) ||
> +             (page->u.inuse.type_info & PGT_l2_page_table) ||
> +             (page->u.inuse.type_info & PGT_l1_page_table) )
> +            copying = 1;

How can copying pagetables like this ever work?  You will end up with an
L4 belonging to the new domain pointing to L3's owned by the old domain.

Even if you change the ownership of the pages pointed to by the L1's, as
soon as the old domain is torn down, the new domain's pagetables will be
freed heap pages.

~Andrew
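
The ownership problem described above can be shown with a toy model. This is
not Xen code: struct toy_page, shallow_copy_table() and
count_foreign_entries() are hypothetical names, and the four-entry table
merely stands in for a real page table. The sketch shows only that copying a
top-level table leaves its entries pointing at lower-level tables still owned
by the source domain.

```c
#include <stddef.h>

#define ENTRIES 4  /* toy table width; real tables are much larger */

struct toy_page {
    int owner;                        /* owning domain id */
    struct toy_page *entry[ENTRIES];  /* next-level tables, or NULL */
};

/* Copy only the top-level table into dst and give it to new_owner. */
static void shallow_copy_table(struct toy_page *dst,
                               const struct toy_page *src, int new_owner)
{
    size_t i;

    dst->owner = new_owner;
    for (i = 0; i < ENTRIES; i++)
        dst->entry[i] = src->entry[i];  /* still the source's tables */
}

/* Count entries that point at a table owned by a different domain. */
static size_t count_foreign_entries(const struct toy_page *table)
{
    size_t i, n = 0;

    for (i = 0; i < ENTRIES; i++)
        if (table->entry[i] && table->entry[i]->owner != table->owner)
            n++;
    return n;
}
```

In this model every populated entry of the copied table is foreign to its new
owner, so freeing the source domain's pages would leave the copy pointing at
freed memory.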


* Re: [PATCH RFC/WIPv2 1/6] Introduce XENMEM_transfer operation
  2014-09-24 15:07   ` Andrew Cooper
@ 2014-09-24 15:13     ` Vitaly Kuznetsov
  2014-09-24 15:33       ` Andrew Cooper
  0 siblings, 1 reply; 12+ messages in thread
From: Vitaly Kuznetsov @ 2014-09-24 15:13 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Andrew Jones, David Vrabel, Jan Beulich

Andrew Cooper <andrew.cooper3@citrix.com> writes:

> On 24/09/14 15:20, Vitaly Kuznetsov wrote:
>> New operation reassigns pages from one domain to the other mapping them
>> at exactly the same GFNs in the destination domain. Pages mapped more
>> than once (e.g. granted pages) are being copied.
>>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  xen/common/memory.c         | 178 ++++++++++++++++++++++++++++++++++++++++++++
>>  xen/include/public/memory.h |  32 +++++++-
>>  2 files changed, 209 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/common/memory.c b/xen/common/memory.c
>> index 2e3225d..653e117 100644
>> --- a/xen/common/memory.c
>> +++ b/xen/common/memory.c
>> @@ -578,6 +578,180 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>>      return rc;
>>  }
>>  
>> +static long memory_transfer(XEN_GUEST_HANDLE_PARAM(xen_memory_transfer_t) arg)
>> +{
>> +    long rc = 0;
>> +    struct xen_memory_transfer trans;
>> +    struct domain *source_d, *dest_d;
>> +    unsigned long mfn, gmfn, last_gmfn;
>> +    p2m_type_t p2mt;
>> +    struct page_info *page, *new_page;
>> +    char *sp, *dp;
>> +    int copying;
>> +
>> +    if ( copy_from_guest(&trans, arg, 1) )
>> +        return -EFAULT;
>> +
>> +    source_d = rcu_lock_domain_by_any_id(trans.source_domid);
>> +    if ( source_d == NULL )
>> +    {
>> +        rc = -ESRCH;
>> +        goto fail_early;
>> +    }
>> +
>> +    if ( source_d->is_dying )
>> +    {
>> +        rc = -EINVAL;
>> +        rcu_unlock_domain(source_d);
>> +        goto fail_early;
>> +    }
>> +
>> +    dest_d = rcu_lock_domain_by_any_id(trans.dest_domid);
>> +    if ( dest_d == NULL )
>> +    {
>> +        rc = -ESRCH;
>> +        rcu_unlock_domain(source_d);
>> +        goto fail_early;
>> +    }
>> +
>> +    if ( dest_d->is_dying )
>> +    {
>> +        rc = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    last_gmfn = trans.gmfn_start + trans.gmfn_count;
>> +    for ( gmfn = trans.gmfn_start; gmfn < last_gmfn; gmfn++ )
>> +    {
>> +        page = get_page_from_gfn(source_d, gmfn, &p2mt, 0);
>> +        if ( !page )
>> +        {
>> +            continue;
>> +        }
>> +
>> +        mfn = page_to_mfn(page);
>> +        if ( !mfn_valid(mfn) )
>> +        {
>> +            put_page(page);
>> +            continue;
>> +        }
>> +
>> +        copying = 0;
>> +
>> +        if ( is_xen_heap_mfn(mfn) )
>> +        {
>> +            put_page(page);
>> +            continue;
>> +        }
>> +
>> +        /* Page table always worth copying */
>> +        if ( (page->u.inuse.type_info & PGT_l4_page_table) ||
>> +             (page->u.inuse.type_info & PGT_l3_page_table) ||
>> +             (page->u.inuse.type_info & PGT_l2_page_table) ||
>> +             (page->u.inuse.type_info & PGT_l1_page_table) )
>> +            copying = 1;
>
> How can copying pagetables like this ever work?  You will end up with an
> L4 belonging to the new domain pointing to L3's owned by the old domain.
>
> Even if you change the ownership of the pages pointed to by the L1's, as
> soon as the old domain is torn down, the new domain's pagetables will be
> freed heap pages.

Yes, I'm aware it is broken and that's actually why I sent this RFC - in
my PATCH 0/6 letter the main question was: what's the best approach here
with regards to PV? If we want to avoid copying and updating these pages
we can do it while killing the original domain (so instead of this
_transfer op we'll have a special 'domain kill' op).

Thanks,

>
> ~Andrew

-- 
  Vitaly


* Re: [PATCH RFC/WIPv2 0/6] toolstack-based approach to pvhvm guest kexec
  2014-09-24 14:20 [PATCH RFC/WIPv2 0/6] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
                   ` (5 preceding siblings ...)
  2014-09-24 14:20 ` [PATCH RFC/WIPv2 6/6] libxl: soft reset support Vitaly Kuznetsov
@ 2014-09-24 15:23 ` Ian Campbell
  2014-09-24 15:37   ` Vitaly Kuznetsov
  6 siblings, 1 reply; 12+ messages in thread
From: Ian Campbell @ 2014-09-24 15:23 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: xen-devel, Andrew Jones, David Vrabel, Jan Beulich, Andrew Cooper

Hi Vitaly,

On Wed, 2014-09-24 at 16:20 +0200, Vitaly Kuznetsov wrote:

I assume this is targeting 4.6? With the 4.5 freeze happening as we
speak, many folks' attention is on that; at least that's true for me (with
my toolstack hat on). If you don't hear anything soon, perhaps ping in a
couple of weeks?

Ian.

> When a PVHVM linux guest performs kexec there are lots of things which
> require taking care of:
> - shared info, vcpu_info
> - grants
> - event channels
> - ...
> Instead of taking care of all these things we can rebuild the domain
> performing kexec from scratch doing so-called soft-reboot.
> 
> The idea was suggested by David Vrabel, Jan Beulich, and Konrad Rzeszutek Wilk.
> 
> Main RFC part:
> I'm not sure my suggested XENMEM_transfer op is the right way to go:
> - If we steal pages from a particular domain, that domain should be dead, as
>   it makes no sense for it to keep running after such a call.
> - we need to copy all L1-L4 pages, rebuild the shared info ... for PV. This
>   will result in additional complexity in libxc (rebuilding the p2m).
> - we also need to keep track of all copied L1-L4 pages for PV as we'll be
>   re-creating p2m.
> 
> I can see three possible ways to go:
> 1) Forbid the call for PV and remove this part from libxc. The easiest way to
>    go (and PV kexec/kdump will require additional work on kernel side anyway).
> 2) Go ahead and rebuild p2m in libxc (similar to what we have for save/restore
>     path).
> 3) Instead of XENMEM_transfer introduce special oneshot domain kill op which
>    will follow the same domain_kill path but instead of relinquishing resources
>    it will reassign them. I suppose it will be possible to reassign L1-L4 pages
>    and pages from xenheap here as well.
> 
> What would you say?
> 
> WIP part:
> - PV support is broken.
> - Not sure huge pages will work well.
> - Not sure about ARM/PVH.
> - Not tested with qemu-upstream.
> 
> P.S. The patch series can be tested with PVHVM Linux guest with the following
> modifications:
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index c0cb11f..33c5cdd 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -33,6 +33,10 @@
>  #include <linux/memblock.h>
>  #include <linux/edd.h>
>  
> +#ifdef CONFIG_KEXEC
> +#include <linux/kexec.h>
> +#endif
> +
>  #include <xen/xen.h>
>  #include <xen/events.h>
>  #include <xen/interface/xen.h>
> @@ -1810,6 +1814,22 @@ static struct notifier_block xen_hvm_cpu_notifier = {
>    .notifier_call   = xen_hvm_cpu_notify,
>  };
>  
> +#ifdef CONFIG_KEXEC
> +static void xen_pvhvm_kexec_shutdown(void)
> +{
> +	native_machine_shutdown();
> +	if (kexec_in_progress) {
> +		xen_reboot(SHUTDOWN_soft_reset);
> +	}
> +}
> +
> +static void xen_pvhvm_crash_shutdown(struct pt_regs *regs)
> +{
> +	native_machine_crash_shutdown(regs);
> +	xen_reboot(SHUTDOWN_soft_reset);
> +}
> +#endif
> +
>  static void __init xen_hvm_guest_init(void)
>  {
> 	init_hvm_pv_info();
> @@ -1826,6 +1846,10 @@ static void __init xen_hvm_guest_init(void)
>    x86_init.irqs.intr_init = xen_init_IRQ;
>    xen_hvm_init_time_ops();
>    xen_hvm_init_mmu_ops();
> +#ifdef CONFIG_KEXEC
> +	machine_ops.shutdown = xen_pvhvm_kexec_shutdown;
> +	machine_ops.crash_shutdown = xen_pvhvm_crash_shutdown;
> +#endif
>  }
>  
>  static bool xen_nopv = false;
> diff --git a/include/xen/interface/sched.h b/include/xen/interface/sched.h
> index 9ce0839..b5942a8 100644
> --- a/include/xen/interface/sched.h
> +++ b/include/xen/interface/sched.h
> @@ -107,5 +107,6 @@ struct sched_watchdog {
>  #define SHUTDOWN_suspend    2  /* Clean up, save suspend info, kill.         */
>  #define SHUTDOWN_crash      3  /* Tell controller we've crashed.             */
>  #define SHUTDOWN_watchdog   4  /* Restart because watchdog time expired.     */
> +#define SHUTDOWN_soft_reset 5  /* Soft-reset for kexec.                      */
>  
>  #endif /* __XEN_PUBLIC_SCHED_H__ */
> 
> Vitaly Kuznetsov (6):
>   Introduce XENMEM_transfer operation
>   libxc: support XENMEM_transfer operation
>   libxc: introduce soft reset
>   xen: Introduce SHUTDOWN_soft_reset shutdown reason
>   libxl: support SHUTDOWN_soft_reset shutdown reason
>   libxl: soft reset support
> 
>  tools/libxc/Makefile               |   1 +
>  tools/libxc/xc_domain.c            |  19 +++
>  tools/libxc/xc_domain_soft_reset.c | 300 +++++++++++++++++++++++++++++++++++++
>  tools/libxc/xenctrl.h              |   6 +
>  tools/libxc/xenguest.h             |  19 +++
>  tools/libxl/libxl.h                |   6 +
>  tools/libxl/libxl_create.c         | 100 +++++++++++--
>  tools/libxl/libxl_internal.h       |   5 +
>  tools/libxl/libxl_types.idl        |   1 +
>  tools/libxl/xl_cmdimpl.c           |  31 +++-
>  tools/python/xen/lowlevel/xl/xl.c  |   1 +
>  xen/common/memory.c                | 178 ++++++++++++++++++++++
>  xen/common/shutdown.c              |   7 +
>  xen/include/public/memory.h        |  32 +++-
>  xen/include/public/sched.h         |   3 +-
>  15 files changed, 695 insertions(+), 14 deletions(-)
>  create mode 100644 tools/libxc/xc_domain_soft_reset.c
> 


* Re: [PATCH RFC/WIPv2 1/6] Introduce XENMEM_transfer operation
  2014-09-24 15:13     ` Vitaly Kuznetsov
@ 2014-09-24 15:33       ` Andrew Cooper
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Cooper @ 2014-09-24 15:33 UTC (permalink / raw)
  To: Vitaly Kuznetsov; +Cc: xen-devel, Andrew Jones, David Vrabel, Jan Beulich

On 24/09/14 16:13, Vitaly Kuznetsov wrote:
> Andrew Cooper <andrew.cooper3@citrix.com> writes:
>
>> On 24/09/14 15:20, Vitaly Kuznetsov wrote:
>>> New operation reassigns pages from one domain to the other mapping them
>>> at exactly the same GFNs in the destination domain. Pages mapped more
>>> than once (e.g. granted pages) are being copied.
>>>
>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>> ---
>>>  xen/common/memory.c         | 178 ++++++++++++++++++++++++++++++++++++++++++++
>>>  xen/include/public/memory.h |  32 +++++++-
>>>  2 files changed, 209 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/common/memory.c b/xen/common/memory.c
>>> index 2e3225d..653e117 100644
>>> --- a/xen/common/memory.c
>>> +++ b/xen/common/memory.c
>>> @@ -578,6 +578,180 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>>>      return rc;
>>>  }
>>>  
>>> +static long memory_transfer(XEN_GUEST_HANDLE_PARAM(xen_memory_transfer_t) arg)
>>> +{
>>> +    long rc = 0;
>>> +    struct xen_memory_transfer trans;
>>> +    struct domain *source_d, *dest_d;
>>> +    unsigned long mfn, gmfn, last_gmfn;
>>> +    p2m_type_t p2mt;
>>> +    struct page_info *page, *new_page;
>>> +    char *sp, *dp;
>>> +    int copying;
>>> +
>>> +    if ( copy_from_guest(&trans, arg, 1) )
>>> +        return -EFAULT;
>>> +
>>> +    source_d = rcu_lock_domain_by_any_id(trans.source_domid);
>>> +    if ( source_d == NULL )
>>> +    {
>>> +        rc = -ESRCH;
>>> +        goto fail_early;
>>> +    }
>>> +
>>> +    if ( source_d->is_dying )
>>> +    {
>>> +        rc = -EINVAL;
>>> +        rcu_unlock_domain(source_d);
>>> +        goto fail_early;
>>> +    }
>>> +
>>> +    dest_d = rcu_lock_domain_by_any_id(trans.dest_domid);
>>> +    if ( dest_d == NULL )
>>> +    {
>>> +        rc = -ESRCH;
>>> +        rcu_unlock_domain(source_d);
>>> +        goto fail_early;
>>> +    }
>>> +
>>> +    if ( dest_d->is_dying )
>>> +    {
>>> +        rc = -EINVAL;
>>> +        goto fail;
>>> +    }
>>> +
>>> +    last_gmfn = trans.gmfn_start + trans.gmfn_count;
>>> +    for ( gmfn = trans.gmfn_start; gmfn < last_gmfn; gmfn++ )
>>> +    {
>>> +        page = get_page_from_gfn(source_d, gmfn, &p2mt, 0);
>>> +        if ( !page )
>>> +        {
>>> +            continue;
>>> +        }
>>> +
>>> +        mfn = page_to_mfn(page);
>>> +        if ( !mfn_valid(mfn) )
>>> +        {
>>> +            put_page(page);
>>> +            continue;
>>> +        }
>>> +
>>> +        copying = 0;
>>> +
>>> +        if ( is_xen_heap_mfn(mfn) )
>>> +        {
>>> +            put_page(page);
>>> +            continue;
>>> +        }
>>> +
>>> +        /* Page table always worth copying */
>>> +        if ( (page->u.inuse.type_info & PGT_l4_page_table) ||
>>> +             (page->u.inuse.type_info & PGT_l3_page_table) ||
>>> +             (page->u.inuse.type_info & PGT_l2_page_table) ||
>>> +             (page->u.inuse.type_info & PGT_l1_page_table) )
>>> +            copying = 1;
>> How can copying pagetables like this ever work?  You will end up with an
>> L4 belonging to the new domain pointing to L3's owned by the old domain.
>>
>> Even if you change the ownership of the pages pointed to by the L1's, as
>> soon as the old domain is torn down, the new domain's pagetables will be
>> freed heap pages.
> Yes, I'm aware it is broken and that's actually why I sent this RFC - in
> my PATCH 0/6 letter the main question was: what's the best approach here
> with regard to PV? If we want to avoid copying and updating these pages
> we can do it while killing the original domain (so instead of this
> _transfer op we'll have special 'domain kill' op).

Ah - I had not taken that meaning from your 0/6.

Xen has no knowledge whatsoever of a PV domain's p2m table (other than
holding a reference to it for toolstack/domain use).  This knowledge
lives exclusively in the toolstack and guest.

As a result, I would say that a hypercall like this cannot possibly be
made to work for PV guests without some PV architectural changes in Xen.
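
To make the pagetable objection concrete, here is a minimal toy model in C.
It is only a sketch: none of the names below (toy_page, toy_copy_table,
toy_destroy_domain, toy_demo) exist in Xen, and the "pages" are plain array
slots rather than real frames. It assumes a shallow copy of a top-level
table, which is what copying an L4 page byte-for-byte amounts to.

```c
#include <assert.h>

#define NPAGES   8
#define NENTRIES 2
#define NO_PAGE  (-1)

/* Hypothetical stand-in for a machine page: an owner and, for
 * page-table pages, the indices of the pages its entries point to. */
struct toy_page {
    int owner;            /* owning domain id, NO_PAGE once freed */
    int entry[NENTRIES];  /* referenced page indices, NO_PAGE if empty */
};

static struct toy_page pages[NPAGES];

static void toy_init(void)
{
    for (int i = 0; i < NPAGES; i++) {
        pages[i].owner = NO_PAGE;
        for (int e = 0; e < NENTRIES; e++)
            pages[i].entry[e] = NO_PAGE;
    }
}

/* Shallow copy of a table page into a page owned by new_dom: the
 * entries are copied verbatim and still reference the source's pages. */
static void toy_copy_table(int dst, int src, int new_dom)
{
    assert(dst >= 0 && dst < NPAGES && src >= 0 && src < NPAGES);
    pages[dst] = pages[src];
    pages[dst].owner = new_dom;
}

/* Tear down a domain: every page it owns is freed. */
static void toy_destroy_domain(int dom)
{
    for (int i = 0; i < NPAGES; i++)
        if (pages[i].owner == dom)
            pages[i].owner = NO_PAGE;
}

/* Count entries of table page tbl that point at a page not owned by dom
 * (either owned by another domain or already freed). */
static int toy_count_foreign_refs(int tbl, int dom)
{
    int n = 0;

    for (int e = 0; e < NENTRIES; e++) {
        int target = pages[tbl].entry[e];

        if (target != NO_PAGE && pages[target].owner != dom)
            n++;
    }
    return n;
}

/* Old domain owns an L4 (page 0) pointing at two L3s (pages 1 and 2).
 * The L4 is copied for the new domain, then the old domain is destroyed. */
int toy_demo(void)
{
    enum { OLD_DOM = 1, NEW_DOM = 2 };

    toy_init();
    pages[0].owner = OLD_DOM;
    pages[0].entry[0] = 1;
    pages[0].entry[1] = 2;
    pages[1].owner = OLD_DOM;
    pages[2].owner = OLD_DOM;

    toy_copy_table(3, 0, NEW_DOM);
    toy_destroy_domain(OLD_DOM);

    return toy_count_foreign_refs(3, NEW_DOM);
}
```

In this model toy_demo() returns 2: both entries of the new domain's copied
L4 end up referencing pages that were freed with the old domain. Avoiding
that would mean rewriting every entry at every level, and for a PV guest the
p2m needed to do so is known only to the toolstack and the guest.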

~Andrew


* Re: [PATCH RFC/WIPv2 0/6] toolstack-based approach to pvhvm guest kexec
  2014-09-24 15:23 ` [PATCH RFC/WIPv2 0/6] toolstack-based approach to pvhvm guest kexec Ian Campbell
@ 2014-09-24 15:37   ` Vitaly Kuznetsov
  0 siblings, 0 replies; 12+ messages in thread
From: Vitaly Kuznetsov @ 2014-09-24 15:37 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel, Andrew Jones, David Vrabel, Jan Beulich, Andrew Cooper

Ian Campbell <Ian.Campbell@citrix.com> writes:

> Hi Vitaly,
>
> On Wed, 2014-09-24 at 16:20 +0200, Vitaly Kuznetsov wrote:
>
> I assume this is targeting 4.6? 

Well, if an 'HVM only' solution is good enough for us, I think it is
*in theory* still possible to make 4.5 (the feature is fairly
self-contained). But I anticipate discussion and further work here, so
4.6 is more realistic.

> With the 4.5 freeze happening as we
> speak many folks attention is on that, at least that's true for me (with
> my toolstack hat on). If you don't hear soon perhaps ping in a couple of
> weeks?

Sure, thanks!

>
> Ian.
>
>> When a PVHVM linux guest performs kexec there are lots of things which
>> require taking care of:
>> - shared info, vcpu_info
>> - grants
>> - event channels
>> - ...
>> Instead of taking care of all these things we can rebuild the domain
>> performing kexec from scratch doing so-called soft-reboot.
>> 
>> The idea was suggested by David Vrabel, Jan Beulich, and Konrad Rzeszutek Wilk.
>> 
>> Main RFC part:
>> I'm not sure my suggested XENMEM_transfer op is the right way to go:
>> - If we steal pages from a particular domain, that domain should be dead, as
>>   it cannot sensibly continue after such a call.
>> - we need to copy all L1-L4 pages, rebuild the shared info ... for PV. This
>>   will result in additional complexity in libxc (rebuilding the p2m).
>> - we also need to keep track of all copied L1-L4 pages for PV as we'll be
>>   re-creating p2m.
>> 
>> I can see three possible ways to go:
>> 1) Forbid the call for PV and remove this part from libxc. The easiest way to
>>    go (and PV kexec/kdump will require additional work on kernel side anyway).
>> 2) Go ahead and rebuild p2m in libxc (similar to what we have for save/restore
>>     path).
>> 3) Instead of XENMEM_transfer introduce special oneshot domain kill op which
>>    will follow the same domain_kill path but instead of relinquishing resources
>>    it will reassign them. I suppose it will be possible to reassign L1-L4 pages
>>    and pages from xenheap here as well.
>> 
>> What would you say?
>> 
>> WIP part:
>> - PV support is broken.
>> - Not sure huge pages will work well.
>> - Not sure about ARM/PVH.
>> - Not tested with qemu-upstream.
>> 
>> P.S. The patch series can be tested with PVHVM Linux guest with the following
>> modifications:
>> 
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index c0cb11f..33c5cdd 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -33,6 +33,10 @@
>>  #include <linux/memblock.h>
>>  #include <linux/edd.h>
>>  
>> +#ifdef CONFIG_KEXEC
>> +#include <linux/kexec.h>
>> +#endif
>> +
>>  #include <xen/xen.h>
>>  #include <xen/events.h>
>>  #include <xen/interface/xen.h>
>> @@ -1810,6 +1814,22 @@ static struct notifier_block xen_hvm_cpu_notifier = {
>>    .notifier_call   = xen_hvm_cpu_notify,
>>  };
>>  
>> +#ifdef CONFIG_KEXEC
>> +static void xen_pvhvm_kexec_shutdown(void)
>> +{
>> +	native_machine_shutdown();
>> +	if (kexec_in_progress) {
>> +		xen_reboot(SHUTDOWN_soft_reset);
>> +	}
>> +}
>> +
>> +static void xen_pvhvm_crash_shutdown(struct pt_regs *regs)
>> +{
>> +	native_machine_crash_shutdown(regs);
>> +	xen_reboot(SHUTDOWN_soft_reset);
>> +}
>> +#endif
>> +
>>  static void __init xen_hvm_guest_init(void)
>>  {
>> 	init_hvm_pv_info();
>> @@ -1826,6 +1846,10 @@ static void __init xen_hvm_guest_init(void)
>>    x86_init.irqs.intr_init = xen_init_IRQ;
>>    xen_hvm_init_time_ops();
>>    xen_hvm_init_mmu_ops();
>> +#ifdef CONFIG_KEXEC
>> +	machine_ops.shutdown = xen_pvhvm_kexec_shutdown;
>> +	machine_ops.crash_shutdown = xen_pvhvm_crash_shutdown;
>> +#endif
>>  }
>>  
>>  static bool xen_nopv = false;
>> diff --git a/include/xen/interface/sched.h b/include/xen/interface/sched.h
>> index 9ce0839..b5942a8 100644
>> --- a/include/xen/interface/sched.h
>> +++ b/include/xen/interface/sched.h
>> @@ -107,5 +107,6 @@ struct sched_watchdog {
>>  #define SHUTDOWN_suspend    2  /* Clean up, save suspend info, kill.         */
>>  #define SHUTDOWN_crash      3  /* Tell controller we've crashed.             */
>>  #define SHUTDOWN_watchdog   4  /* Restart because watchdog time expired.     */
>> +#define SHUTDOWN_soft_reset 5  /* Soft-reset for kexec.                      */
>>  
>>  #endif /* __XEN_PUBLIC_SCHED_H__ */
>> 
>> Vitaly Kuznetsov (6):
>>   Introduce XENMEM_transfer operation
>>   libxc: support XENMEM_transfer operation
>>   libxc: introduce soft reset
>>   xen: Introduce SHUTDOWN_soft_reset shutdown reason
>>   libxl: support SHUTDOWN_soft_reset shutdown reason
>>   libxl: soft reset support
>> 
>>  tools/libxc/Makefile               |   1 +
>>  tools/libxc/xc_domain.c            |  19 +++
>>  tools/libxc/xc_domain_soft_reset.c | 300 +++++++++++++++++++++++++++++++++++++
>>  tools/libxc/xenctrl.h              |   6 +
>>  tools/libxc/xenguest.h             |  19 +++
>>  tools/libxl/libxl.h                |   6 +
>>  tools/libxl/libxl_create.c         | 100 +++++++++++--
>>  tools/libxl/libxl_internal.h       |   5 +
>>  tools/libxl/libxl_types.idl        |   1 +
>>  tools/libxl/xl_cmdimpl.c           |  31 +++-
>>  tools/python/xen/lowlevel/xl/xl.c  |   1 +
>>  xen/common/memory.c                | 178 ++++++++++++++++++++++
>>  xen/common/shutdown.c              |   7 +
>>  xen/include/public/memory.h        |  32 +++-
>>  xen/include/public/sched.h         |   3 +-
>>  15 files changed, 695 insertions(+), 14 deletions(-)
>>  create mode 100644 tools/libxc/xc_domain_soft_reset.c
>> 

-- 
  Vitaly


end of thread, other threads:[~2014-09-24 15:38 UTC | newest]

Thread overview: 12+ messages
2014-09-24 14:20 [PATCH RFC/WIPv2 0/6] toolstack-based approach to pvhvm guest kexec Vitaly Kuznetsov
2014-09-24 14:20 ` [PATCH RFC/WIPv2 1/6] Introduce XENMEM_transfer operation Vitaly Kuznetsov
2014-09-24 15:07   ` Andrew Cooper
2014-09-24 15:13     ` Vitaly Kuznetsov
2014-09-24 15:33       ` Andrew Cooper
2014-09-24 14:20 ` [PATCH RFC/WIPv2 2/6] libxc: support " Vitaly Kuznetsov
2014-09-24 14:20 ` [PATCH RFC/WIPv2 3/6] libxc: introduce soft reset Vitaly Kuznetsov
2014-09-24 14:20 ` [PATCH RFC/WIPv2 4/6] xen: Introduce SHUTDOWN_soft_reset shutdown reason Vitaly Kuznetsov
2014-09-24 14:20 ` [PATCH RFC/WIPv2 5/6] libxl: support " Vitaly Kuznetsov
2014-09-24 14:20 ` [PATCH RFC/WIPv2 6/6] libxl: soft reset support Vitaly Kuznetsov
2014-09-24 15:23 ` [PATCH RFC/WIPv2 0/6] toolstack-based approach to pvhvm guest kexec Ian Campbell
2014-09-24 15:37   ` Vitaly Kuznetsov
