xen-devel.lists.xenproject.org archive mirror
* [Xen-devel] [PATCH v8 0/5] VM forking
@ 2020-02-10 19:21 Tamas K Lengyel
  2020-02-10 19:21 ` [Xen-devel] [PATCH v8 1/5] x86/p2m: Allow p2m_get_page_from_gfn to return shared entries Tamas K Lengyel
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Tamas K Lengyel @ 2020-02-10 19:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tamas K Lengyel, Jan Beulich, Anthony PERARD, Julien Grall,
	Roger Pau Monné

The following series implements VM forking for Intel HVM guests to allow for
the fast creation of identical VMs without the associated high startup costs
of booting or restoring the VM from a savefile.

JIRA issue: https://xenproject.atlassian.net/browse/XEN-89

The fork operation is implemented as part of the "xl fork-vm" command:
    xl fork-vm -C <config_file_for_fork> -Q <qemu_save_file> <parent_domid>
    
By default a fully functional fork is created. The user is, however,
responsible for creating the appropriate config file for the fork and for
generating the QEMU save file before the fork-vm call is made. At a minimum the
config file needs to give the fork a new name, but other settings may also
require changes.

The interface also allows splitting the forking into two steps:
    xl fork-vm --launch-dm no \
               -p <parent_domid>
    xl fork-vm --launch-dm late \
               -C <config_file_for_fork> \
               -Q <qemu_save_file> \
               <fork_domid>

The split creation model is useful when the VM needs to be created as fast as
possible. The forked VM can be unpaused without the device model being
launched, to be monitored and accessed via VMI. Note however that without its
device model running (depending on what is executing in the VM) it is bound to
misbehave or even crash when it tries to access devices that would be
emulated by QEMU. We anticipate that for certain use-cases this would be an
acceptable situation, for example when fuzzing code segments that don't access
such devices.

Launching the device model requires the QEMU Xen savefile to be generated
manually from the parent VM. This can be accomplished simply by connecting to
its QMP socket and issuing the "xen-save-devices-state" command. For example
using the standard tool socat these commands can be used to generate the file:
    socat - UNIX-CONNECT:/var/run/xen/qmp-libxl-<parent_domid>
    { "execute": "qmp_capabilities" }
    { "execute": "xen-save-devices-state",
        "arguments": { "filename": "/path/to/save/qemu_state",
                        "live": false} }
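
For tooling that wants to script this step, the socket path and the save
command can be built programmatically. The following is only a minimal sketch:
the helper names qmp_socket_path() and qmp_save_command() are hypothetical and
not part of this series, and the filename is assumed not to contain characters
that would need JSON escaping.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the libxl QMP socket path of the parent
 * domain, following the path pattern used in the socat example.
 */
static int qmp_socket_path(char *buf, size_t len, unsigned int parent_domid)
{
    return snprintf(buf, len, "/var/run/xen/qmp-libxl-%u", parent_domid);
}

/*
 * Hypothetical helper: format the "xen-save-devices-state" command.
 * Assumption: filename needs no JSON escaping (no quotes or backslashes).
 */
static int qmp_save_command(char *buf, size_t len, const char *filename)
{
    return snprintf(buf, len,
                    "{ \"execute\": \"xen-save-devices-state\", "
                    "\"arguments\": { \"filename\": \"%s\", \"live\": false } }",
                    filename);
}
```

Both helpers return what snprintf() returns, so a caller can detect truncation
by comparing the result against the buffer size. The "qmp_capabilities"
handshake still has to be sent first, as in the socat session.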

At runtime the forked VM starts running with an empty p2m which gets lazily
populated when the VM generates EPT faults, similar to how altp2m views are
populated. If the memory access is a read-only access, the p2m entry is
populated with a memory shared entry with its parent. For write memory accesses
or in case memory sharing wasn't possible (for example in case a reference is
held by a third party), a new page is allocated and the page contents are
copied over from the parent VM. Forks can be further forked if needed, thus
allowing for further memory savings.
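
The populate-on-fault policy described above can be modelled in a few lines of
standalone C. This is a toy sketch only, not the hypervisor code: all toy_*
names and sizes are made up for illustration, and a "shared" entry here simply
means the read is served from the nearest ancestor that holds its own copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NR_PAGES  4   /* hypothetical: pages per toy domain */
#define TOY_PAGE_SIZE 16  /* hypothetical: bytes per toy page */

enum toy_entry { TOY_HOLE = 0, TOY_SHARED, TOY_PRIVATE };

struct toy_domain {
    struct toy_domain *parent;   /* NULL unless this domain is a fork */
    enum toy_entry entry[TOY_NR_PAGES];
    unsigned char data[TOY_NR_PAGES][TOY_PAGE_SIZE];
};

/* Walk up the fork chain to the nearest domain holding its own copy. */
static const unsigned char *toy_backing(const struct toy_domain *d, size_t gfn)
{
    for ( ; d; d = d->parent )
        if ( d->entry[gfn] == TOY_PRIVATE )
            return d->data[gfn];

    return NULL;
}

/* Handle a fault on gfn: 0 on success, -1 if no ancestor backs the page. */
static int toy_fault(struct toy_domain *d, size_t gfn, bool is_write)
{
    const unsigned char *src;

    assert(gfn < TOY_NR_PAGES);

    /* Already populated in a way that satisfies this access. */
    if ( d->entry[gfn] == TOY_PRIVATE ||
         (d->entry[gfn] == TOY_SHARED && !is_write) )
        return 0;

    src = toy_backing(d->parent, gfn);
    if ( !src )
        return -1;

    if ( !is_write )
    {
        /* Read access: only reference the ancestor's page. */
        d->entry[gfn] = TOY_SHARED;
        return 0;
    }

    /* Write access: allocate a private copy of the ancestor's page. */
    memcpy(d->data[gfn], src, TOY_PAGE_SIZE);
    d->entry[gfn] = TOY_PRIVATE;
    return 0;
}
```

Because toy_backing() walks the whole parent chain, a fork of a fork resolves
pages it has never touched through its grandparent, which is where the
additional memory savings for nested forks come from in this model.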

A VM fork reset hypercall is also added that allows the fork to be reset to the
state it was in just after the fork. It is also accessible via xl:
    xl fork-vm --fork-reset -p <fork_domid>

This is an optimization for cases where the forks are very short-lived and run
without a device model, so resetting saves some time compared to creating a
brand new fork, provided the fork has not acquired a lot of memory. If the fork
has a lot of memory deduplicated, it is likely going to be faster to create a
new fork from scratch and asynchronously destroy the old one.
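
Why the cost of a reset grows with the fork's private memory can be seen in a
small standalone model. This is a sketch under simplifying assumptions, not
the hypercall implementation: toy_entry and toy_fork_reset() are hypothetical
names, and the reload of the vCPU context from the parent is left out.

```c
#include <stddef.h>

enum toy_entry { TOY_HOLE = 0, TOY_SHARED, TOY_PRIVATE };

/*
 * Drop every privately allocated page of the fork so that it faults the
 * page in from the parent again. Shared entries and holes are left
 * untouched. Returns the number of private pages released.
 */
static size_t toy_fork_reset(enum toy_entry *p2m, size_t nr_entries)
{
    size_t i, freed = 0;

    for ( i = 0; i < nr_entries; i++ )
    {
        if ( p2m[i] != TOY_PRIVATE )
            continue;

        p2m[i] = TOY_HOLE;
        freed++;
    }

    return freed;
}
```

In this model the per-page work is done only for private pages, so a fork that
has written to few pages resets quickly, while one that has copied most of its
memory does about as much work as tearing the fork down.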

The series has been tested with both Linux and Windows VMs and functions as
expected. VM forking time has been measured to be 0.0007s, device model launch
to be around 1s depending largely on the number of devices being emulated. Fork
resets have been measured to be 0.0001s under the optimal circumstances.

New in v8: rebase on staging and a minor fix in the toolstack code

Patch 1 is a bugfix for pre-existing code in p2m

Patch 2 exposes a hap internal function that will be used during VM forking

Patches 3-4 implement the hypervisor-side bits of the VM fork & reset operations

Patch 5 adds the toolstack-side code implementing VM forking and reset

Tamas K Lengyel (5):
  x86/p2m: Allow p2m_get_page_from_gfn to return shared entries
  xen/x86: Make hap_get_allocation accessible
  xen/mem_sharing: VM forking
  x86/mem_sharing: reset a fork
  xen/tools: VM forking toolstack side

 docs/man/xl.1.pod.in              |  36 ++++
 tools/libxc/include/xenctrl.h     |  13 ++
 tools/libxc/xc_memshr.c           |  22 +++
 tools/libxl/libxl.h               |   7 +
 tools/libxl/libxl_create.c        | 256 ++++++++++++++++---------
 tools/libxl/libxl_dm.c            |   2 +-
 tools/libxl/libxl_dom.c           |  43 ++++-
 tools/libxl/libxl_internal.h      |   1 +
 tools/libxl/libxl_types.idl       |   1 +
 tools/xl/xl.h                     |   5 +
 tools/xl/xl_cmdtable.c            |  12 ++
 tools/xl/xl_saverestore.c         |  97 ++++++++++
 tools/xl/xl_vmcontrol.c           |   8 +
 xen/arch/x86/domain.c             |  11 ++
 xen/arch/x86/hvm/hvm.c            |   2 +-
 xen/arch/x86/mm/hap/hap.c         |   3 +-
 xen/arch/x86/mm/mem_sharing.c     | 297 ++++++++++++++++++++++++++++++
 xen/arch/x86/mm/p2m.c             |  25 ++-
 xen/include/asm-x86/hap.h         |   1 +
 xen/include/asm-x86/mem_sharing.h |  17 ++
 xen/include/public/memory.h       |   6 +
 xen/include/xen/sched.h           |   2 +
 22 files changed, 763 insertions(+), 104 deletions(-)

-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [Xen-devel] [PATCH v8 1/5] x86/p2m: Allow p2m_get_page_from_gfn to return shared entries
  2020-02-10 19:21 [Xen-devel] [PATCH v8 0/5] VM forking Tamas K Lengyel
@ 2020-02-10 19:21 ` Tamas K Lengyel
  2020-02-11  9:16   ` Jan Beulich
  2020-02-21 13:48   ` Andrew Cooper
  2020-02-10 19:21 ` [Xen-devel] [PATCH v8 2/5] xen/x86: Make hap_get_allocation accessible Tamas K Lengyel
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 20+ messages in thread
From: Tamas K Lengyel @ 2020-02-10 19:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	Jan Beulich, Roger Pau Monné

The owner domain of shared pages is dom_cow, so use that for get_page;
otherwise the function fails to return the correct page in some situations.
The check for whether dom_cow should be used was only performed in a subset of
use-cases. Fix the error and simplify the existing check, since we can't have
any shared entries with dom_cow being NULL.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/p2m.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index fd9f09536d..2c0bb7e869 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -574,11 +574,12 @@ struct page_info *p2m_get_page_from_gfn(
                 if ( fdom == NULL )
                     page = NULL;
             }
-            else if ( !get_page(page, p2m->domain) &&
-                      /* Page could be shared */
-                      (!dom_cow || !p2m_is_shared(*t) ||
-                       !get_page(page, dom_cow)) )
-                page = NULL;
+            else
+            {
+                struct domain *d = !p2m_is_shared(*t) ? p2m->domain : dom_cow;
+                if ( !get_page(page, d) )
+                    page = NULL;
+            }
         }
         p2m_read_unlock(p2m);
 
@@ -594,8 +595,9 @@ struct page_info *p2m_get_page_from_gfn(
     mfn = get_gfn_type_access(p2m, gfn_x(gfn), t, a, q, NULL);
     if ( p2m_is_ram(*t) && mfn_valid(mfn) )
     {
+        struct domain *d = !p2m_is_shared(*t) ? p2m->domain : dom_cow;
         page = mfn_to_page(mfn);
-        if ( !get_page(page, p2m->domain) )
+        if ( !get_page(page, d) )
             page = NULL;
     }
     put_gfn(p2m->domain, gfn_x(gfn));
-- 
2.20.1



* [Xen-devel] [PATCH v8 2/5] xen/x86: Make hap_get_allocation accessible
  2020-02-10 19:21 [Xen-devel] [PATCH v8 0/5] VM forking Tamas K Lengyel
  2020-02-10 19:21 ` [Xen-devel] [PATCH v8 1/5] x86/p2m: Allow p2m_get_page_from_gfn to return shared entries Tamas K Lengyel
@ 2020-02-10 19:21 ` Tamas K Lengyel
  2020-02-10 19:21 ` [Xen-devel] [PATCH v8 3/5] xen/mem_sharing: VM forking Tamas K Lengyel
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Tamas K Lengyel @ 2020-02-10 19:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	Jan Beulich, Roger Pau Monné

During VM forking we'll copy the parent domain's parameters to the client,
including the HAP shadow memory setting that is used for storing the domain's
EPT. We'll copy this in the hypervisor instead of doing it during toolstack
launch to allow the domain to start executing and unsharing memory before (or
even completely without) the toolstack.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/hap/hap.c | 3 +--
 xen/include/asm-x86/hap.h | 1 +
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 3d93f3451c..c7c7ff6e99 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -321,8 +321,7 @@ static void hap_free_p2m_page(struct domain *d, struct page_info *pg)
 }
 
 /* Return the size of the pool, rounded up to the nearest MB */
-static unsigned int
-hap_get_allocation(struct domain *d)
+unsigned int hap_get_allocation(struct domain *d)
 {
     unsigned int pg = d->arch.paging.hap.total_pages
         + d->arch.paging.hap.p2m_pages;
diff --git a/xen/include/asm-x86/hap.h b/xen/include/asm-x86/hap.h
index b94bfb4ed0..1bf07e49fe 100644
--- a/xen/include/asm-x86/hap.h
+++ b/xen/include/asm-x86/hap.h
@@ -45,6 +45,7 @@ int   hap_track_dirty_vram(struct domain *d,
 
 extern const struct paging_mode *hap_paging_get_mode(struct vcpu *);
 int hap_set_allocation(struct domain *d, unsigned int pages, bool *preempted);
+unsigned int hap_get_allocation(struct domain *d);
 
 #endif /* XEN_HAP_H */
 
-- 
2.20.1



* [Xen-devel] [PATCH v8 3/5] xen/mem_sharing: VM forking
  2020-02-10 19:21 [Xen-devel] [PATCH v8 0/5] VM forking Tamas K Lengyel
  2020-02-10 19:21 ` [Xen-devel] [PATCH v8 1/5] x86/p2m: Allow p2m_get_page_from_gfn to return shared entries Tamas K Lengyel
  2020-02-10 19:21 ` [Xen-devel] [PATCH v8 2/5] xen/x86: Make hap_get_allocation accessible Tamas K Lengyel
@ 2020-02-10 19:21 ` Tamas K Lengyel
  2020-02-21 13:43   ` Andrew Cooper
  2020-02-21 14:42   ` Andrew Cooper
  2020-02-10 19:21 ` [Xen-devel] [PATCH v8 4/5] x86/mem_sharing: reset a fork Tamas K Lengyel
  2020-02-10 19:21 ` [Xen-devel] [PATCH v8 5/5] xen/tools: VM forking toolstack side Tamas K Lengyel
  4 siblings, 2 replies; 20+ messages in thread
From: Tamas K Lengyel @ 2020-02-10 19:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tamas K Lengyel, Jan Beulich, Julien Grall, Roger Pau Monné

VM forking is the process of creating a domain with an empty memory space and a
parent domain specified from which to populate the memory when necessary. For
the new domain to be functional the VM state is copied over as part of the fork
operation (HVM params, hap allocation, etc).

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/domain.c             |  11 ++
 xen/arch/x86/hvm/hvm.c            |   2 +-
 xen/arch/x86/mm/mem_sharing.c     | 221 ++++++++++++++++++++++++++++++
 xen/arch/x86/mm/p2m.c             |  11 +-
 xen/include/asm-x86/mem_sharing.h |  17 +++
 xen/include/public/memory.h       |   5 +
 xen/include/xen/sched.h           |   2 +
 7 files changed, 267 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f53ae5ff86..a98e2e0479 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2189,6 +2189,17 @@ int domain_relinquish_resources(struct domain *d)
             ret = relinquish_shared_pages(d);
             if ( ret )
                 return ret;
+
+            /*
+             * If the domain is forked, decrement the parent's pause count
+             * and release the domain.
+             */
+            if ( d->parent )
+            {
+                domain_unpause(d->parent);
+                put_domain(d->parent);
+                d->parent = NULL;
+            }
         }
 #endif
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 00a9e70b7c..55520bbd23 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1915,7 +1915,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     }
 #endif
 
-    /* Spurious fault? PoD and log-dirty also take this path. */
+    /* Spurious fault? PoD, log-dirty and VM forking also take this path. */
     if ( p2m_is_ram(p2mt) )
     {
         rc = 1;
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 3835bc928f..ccf338918d 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -22,6 +22,7 @@
 
 #include <xen/types.h>
 #include <xen/domain_page.h>
+#include <xen/event.h>
 #include <xen/spinlock.h>
 #include <xen/rwlock.h>
 #include <xen/mm.h>
@@ -36,6 +37,9 @@
 #include <asm/altp2m.h>
 #include <asm/atomic.h>
 #include <asm/event.h>
+#include <asm/hap.h>
+#include <asm/hvm/hvm.h>
+#include <asm/hvm/save.h>
 #include <xsm/xsm.h>
 
 #include "mm-locks.h"
@@ -1444,6 +1448,193 @@ static inline int mem_sharing_control(struct domain *d, bool enable)
     return 0;
 }
 
+/*
+ * Forking a page only gets called when the VM faults due to no entry being
+ * in the EPT for the access. Depending on the type of access we either
+ * populate the physmap with a shared entry for read-only access or
+ * fork the page if it's a write access.
+ *
+ * The client p2m is already locked so we only need to lock
+ * the parent's here.
+ */
+int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool unsharing)
+{
+    int rc = -ENOENT;
+    shr_handle_t handle;
+    struct domain *parent;
+    struct p2m_domain *p2m;
+    unsigned long gfn_l = gfn_x(gfn);
+    mfn_t mfn, new_mfn;
+    p2m_type_t p2mt;
+    struct page_info *page;
+
+    if ( !mem_sharing_is_fork(d) )
+        return -ENOENT;
+
+    parent = d->parent;
+
+    if ( !unsharing )
+    {
+        /* For read-only accesses we just add a shared entry to the physmap */
+        while ( parent )
+        {
+            if ( !(rc = nominate_page(parent, gfn, 0, &handle)) )
+                break;
+
+            parent = parent->parent;
+        }
+
+        if ( !rc )
+        {
+            /* The client's p2m is already locked */
+            struct p2m_domain *pp2m = p2m_get_hostp2m(parent);
+
+            p2m_lock(pp2m);
+            rc = add_to_physmap(parent, gfn_l, handle, d, gfn_l, false);
+            p2m_unlock(pp2m);
+
+            if ( !rc )
+                return 0;
+        }
+    }
+
+    /*
+     * If it's a write access (ie. unsharing) or if adding a shared entry to
+     * the physmap failed we'll fork the page directly.
+     */
+    p2m = p2m_get_hostp2m(d);
+    parent = d->parent;
+
+    while ( parent )
+    {
+        mfn = get_gfn_query(parent, gfn_l, &p2mt);
+
+        if ( mfn_valid(mfn) && p2m_is_any_ram(p2mt) )
+            break;
+
+        put_gfn(parent, gfn_l);
+        parent = parent->parent;
+    }
+
+    if ( !parent )
+        return -ENOENT;
+
+    if ( !(page = alloc_domheap_page(d, 0)) )
+    {
+        put_gfn(parent, gfn_l);
+        return -ENOMEM;
+    }
+
+    new_mfn = page_to_mfn(page);
+    copy_domain_page(new_mfn, mfn);
+    set_gpfn_from_mfn(mfn_x(new_mfn), gfn_l);
+
+    put_gfn(parent, gfn_l);
+
+    return p2m->set_entry(p2m, gfn, new_mfn, PAGE_ORDER_4K, p2m_ram_rw,
+                          p2m->default_access, -1);
+}
+
+static int bring_up_vcpus(struct domain *cd, struct cpupool *cpupool)
+{
+    int ret;
+    unsigned int i;
+
+    if ( (ret = cpupool_move_domain(cd, cpupool)) )
+        return ret;
+
+    for ( i = 0; i < cd->max_vcpus; i++ )
+    {
+        if ( cd->vcpu[i] )
+            continue;
+
+        if ( !vcpu_create(cd, i) )
+            return -EINVAL;
+    }
+
+    domain_update_node_affinity(cd);
+    return 0;
+}
+
+static int fork_hap_allocation(struct domain *cd, struct domain *d)
+{
+    int rc;
+    bool preempted;
+    unsigned long mb = hap_get_allocation(d);
+
+    if ( mb == hap_get_allocation(cd) )
+        return 0;
+
+    paging_lock(cd);
+    rc = hap_set_allocation(cd, mb << (20 - PAGE_SHIFT), &preempted);
+    paging_unlock(cd);
+
+    if ( rc )
+        return rc;
+
+    if ( preempted )
+        return -ERESTART;
+
+    return 0;
+}
+
+static void fork_tsc(struct domain *cd, struct domain *d)
+{
+    uint32_t tsc_mode;
+    uint32_t gtsc_khz;
+    uint32_t incarnation;
+    uint64_t elapsed_nsec;
+
+    tsc_get_info(d, &tsc_mode, &elapsed_nsec, &gtsc_khz, &incarnation);
+    tsc_set_info(cd, tsc_mode, elapsed_nsec, gtsc_khz, incarnation);
+}
+
+static int mem_sharing_fork(struct domain *d, struct domain *cd)
+{
+    int rc = -EINVAL;
+
+    if ( !cd->controller_pause_count )
+        return rc;
+
+    /*
+     * We only want to get and pause the parent once, not each time this
+     * operation is restarted due to preemption.
+     */
+    if ( !cd->parent_paused )
+    {
+        ASSERT(get_domain(d));
+        domain_pause(d);
+
+        cd->parent_paused = true;
+        cd->max_pages = d->max_pages;
+        cd->max_vcpus = d->max_vcpus;
+    }
+
+    /* this is preemptible so it's the first to get done */
+    if ( (rc = fork_hap_allocation(cd, d)) )
+        goto done;
+
+    if ( (rc = bring_up_vcpus(cd, d->cpupool)) )
+        goto done;
+
+    if ( (rc = hvm_copy_context_and_params(cd, d)) )
+        goto done;
+
+    fork_tsc(cd, d);
+
+    cd->parent = d;
+
+ done:
+    if ( rc && rc != -ERESTART )
+    {
+        domain_unpause(d);
+        put_domain(d);
+        cd->parent_paused = false;
+    }
+
+    return rc;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
     int rc;
@@ -1698,6 +1889,36 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
         rc = debug_gref(d, mso.u.debug.u.gref);
         break;
 
+    case XENMEM_sharing_op_fork:
+    {
+        struct domain *pd;
+
+        rc = -EINVAL;
+        if ( mso.u.fork._pad[0] || mso.u.fork._pad[1] ||
+             mso.u.fork._pad[2] )
+            goto out;
+
+        rc = rcu_lock_live_remote_domain_by_id(mso.u.fork.parent_domain,
+                                               &pd);
+        if ( rc )
+            goto out;
+
+        if ( !mem_sharing_enabled(pd) )
+        {
+            if ( (rc = mem_sharing_control(pd, true)) )
+                goto out;
+        }
+
+        rc = mem_sharing_fork(pd, d);
+
+        if ( rc == -ERESTART )
+            rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
+                                               "lh", XENMEM_sharing_op,
+                                               arg);
+        rcu_unlock_domain(pd);
+        break;
+    }
+
     default:
         rc = -ENOSYS;
         break;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 2c0bb7e869..72b4485970 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -509,6 +509,14 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
 
     mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
 
+    /* Check if we need to fork the page */
+    if ( (q & P2M_ALLOC) && p2m_is_hole(*t) &&
+         !mem_sharing_fork_page(p2m->domain, gfn, !!(q & P2M_UNSHARE)) )
+    {
+        mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
+    }
+
+    /* Check if we need to unshare the page */
     if ( (q & P2M_UNSHARE) && p2m_is_shared(*t) )
     {
         ASSERT(p2m_is_hostp2m(p2m));
@@ -587,7 +595,8 @@ struct page_info *p2m_get_page_from_gfn(
             return page;
 
         /* Error path: not a suitable GFN at all */
-        if ( !p2m_is_ram(*t) && !p2m_is_paging(*t) && !p2m_is_pod(*t) )
+        if ( !p2m_is_ram(*t) && !p2m_is_paging(*t) && !p2m_is_pod(*t) &&
+             !mem_sharing_is_fork(p2m->domain) )
             return NULL;
     }
 
diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h
index 53760a2896..ac968fae3f 100644
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -39,6 +39,9 @@ struct mem_sharing_domain
 
 #define mem_sharing_enabled(d) ((d)->arch.hvm.mem_sharing.enabled)
 
+#define mem_sharing_is_fork(d) \
+    (mem_sharing_enabled(d) && !!((d)->parent))
+
 /* Auditing of memory sharing code? */
 #ifndef NDEBUG
 #define MEM_SHARING_AUDIT 1
@@ -88,6 +91,9 @@ static inline int mem_sharing_unshare_page(struct domain *d,
     return rc;
 }
 
+int mem_sharing_fork_page(struct domain *d, gfn_t gfn,
+                          bool unsharing);
+
 /*
  * If called by a foreign domain, possible errors are
  *   -EBUSY -> ring full
@@ -117,6 +123,7 @@ int relinquish_shared_pages(struct domain *d);
 #else
 
 #define mem_sharing_enabled(d) false
+#define mem_sharing_is_fork(p2m) false
 
 static inline unsigned int mem_sharing_get_nr_saved_mfns(void)
 {
@@ -141,6 +148,16 @@ static inline int mem_sharing_notify_enomem(struct domain *d, unsigned long gfn,
     return -EOPNOTSUPP;
 }
 
+static inline int mem_sharing_fork(struct domain *d, struct domain *cd, bool vcpu)
+{
+    return -EOPNOTSUPP;
+}
+
+static inline int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool lock)
+{
+    return -EOPNOTSUPP;
+}
+
 #endif
 
 #endif /* __MEM_SHARING_H__ */
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index cfdda6e2a8..90a3f4498e 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -482,6 +482,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
 #define XENMEM_sharing_op_add_physmap       6
 #define XENMEM_sharing_op_audit             7
 #define XENMEM_sharing_op_range_share       8
+#define XENMEM_sharing_op_fork              9
 
 #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
 #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
@@ -532,6 +533,10 @@ struct xen_mem_sharing_op {
                 uint32_t gref;     /* IN: gref to debug         */
             } u;
         } debug;
+        struct mem_sharing_op_fork {
+            domid_t parent_domain;
+            uint16_t _pad[3];                /* Must be set to 0 */
+        } fork;
     } u;
 };
 typedef struct xen_mem_sharing_op xen_mem_sharing_op_t;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 7c5c437247..8ed727e10c 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -507,6 +507,8 @@ struct domain
     /* Memory sharing support */
 #ifdef CONFIG_MEM_SHARING
     struct vm_event_domain *vm_event_share;
+    struct domain *parent; /* VM fork parent */
+    bool parent_paused;
 #endif
     /* Memory paging support */
 #ifdef CONFIG_HAS_MEM_PAGING
-- 
2.20.1



* [Xen-devel] [PATCH v8 4/5] x86/mem_sharing: reset a fork
  2020-02-10 19:21 [Xen-devel] [PATCH v8 0/5] VM forking Tamas K Lengyel
                   ` (2 preceding siblings ...)
  2020-02-10 19:21 ` [Xen-devel] [PATCH v8 3/5] xen/mem_sharing: VM forking Tamas K Lengyel
@ 2020-02-10 19:21 ` Tamas K Lengyel
  2020-02-10 19:21 ` [Xen-devel] [PATCH v8 5/5] xen/tools: VM forking toolstack side Tamas K Lengyel
  4 siblings, 0 replies; 20+ messages in thread
From: Tamas K Lengyel @ 2020-02-10 19:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Stefano Stabellini,
	Jan Beulich, Julien Grall, Roger Pau Monné

Implement a hypercall that allows a fork to shed all memory that got allocated
for it during its execution and re-load its vCPU context from the parent VM.
This allows the forked VM to reset into the same state the parent VM is in,
faster than creating a new fork would be. Measurements show about a 2x
speedup during normal fuzzing operations. Performance may vary depending on how
much memory got allocated for the forked VM. If it has been completely
deduplicated from the parent VM then creating a new fork would likely be more
performant.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 76 +++++++++++++++++++++++++++++++++++
 xen/include/public/memory.h   |  1 +
 2 files changed, 77 insertions(+)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index ccf338918d..9d61592efa 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1635,6 +1635,59 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
     return rc;
 }
 
+/*
+ * The fork reset operation is intended to be used on short-lived forks only.
+ * There is no hypercall continuation operation implemented for this reason.
+ * For forks that obtain a larger memory footprint it is likely going to be
+ * more performant to create a new fork instead of resetting an existing one.
+ *
+ * TODO: In case this hypercall would become useful on forks with larger memory
+ * footprints the hypercall continuation should be implemented.
+ */
+static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
+{
+    int rc;
+    struct p2m_domain* p2m = p2m_get_hostp2m(cd);
+    struct page_info *page, *tmp;
+
+    domain_pause(cd);
+
+    page_list_for_each_safe(page, tmp, &cd->page_list)
+    {
+        p2m_type_t p2mt;
+        p2m_access_t p2ma;
+        gfn_t gfn;
+        mfn_t mfn = page_to_mfn(page);
+
+        if ( !mfn_valid(mfn) )
+            continue;
+
+        gfn = mfn_to_gfn(cd, mfn);
+        mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
+                                    0, NULL, false);
+
+        if ( !p2m_is_ram(p2mt) || p2m_is_shared(p2mt) )
+            continue;
+
+        /* take an extra reference */
+        if ( !get_page(page, cd) )
+            continue;
+
+        rc = p2m->set_entry(p2m, gfn, INVALID_MFN, PAGE_ORDER_4K,
+                            p2m_invalid, p2m_access_rwx, -1);
+        ASSERT(!rc);
+
+        put_page_alloc_ref(page);
+        put_page(page);
+    }
+
+    if ( !(rc = hvm_copy_context_and_params(cd, d)) )
+        fork_tsc(cd, d);
+
+    domain_unpause(cd);
+    return rc;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
     int rc;
@@ -1919,6 +1972,29 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
         break;
     }
 
+    case XENMEM_sharing_op_fork_reset:
+    {
+        struct domain *pd;
+
+        rc = -EINVAL;
+        if ( mso.u.fork._pad[0] || mso.u.fork._pad[1] ||
+             mso.u.fork._pad[2] )
+            goto out;
+
+        rc = -ENOSYS;
+        if ( !d->parent )
+            goto out;
+
+        rc = rcu_lock_live_remote_domain_by_id(d->parent->domain_id, &pd);
+        if ( rc )
+            goto out;
+
+        rc = mem_sharing_fork_reset(pd, d);
+
+        rcu_unlock_domain(pd);
+        break;
+    }
+
     default:
         rc = -ENOSYS;
         break;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 90a3f4498e..e3d063e22e 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -483,6 +483,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
 #define XENMEM_sharing_op_audit             7
 #define XENMEM_sharing_op_range_share       8
 #define XENMEM_sharing_op_fork              9
+#define XENMEM_sharing_op_fork_reset        10
 
 #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
 #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
-- 
2.20.1



* [Xen-devel] [PATCH v8 5/5] xen/tools: VM forking toolstack side
  2020-02-10 19:21 [Xen-devel] [PATCH v8 0/5] VM forking Tamas K Lengyel
                   ` (3 preceding siblings ...)
  2020-02-10 19:21 ` [Xen-devel] [PATCH v8 4/5] x86/mem_sharing: reset a fork Tamas K Lengyel
@ 2020-02-10 19:21 ` Tamas K Lengyel
  4 siblings, 0 replies; 20+ messages in thread
From: Tamas K Lengyel @ 2020-02-10 19:21 UTC (permalink / raw)
  To: xen-devel; +Cc: Anthony PERARD, Ian Jackson, Tamas K Lengyel, Wei Liu

Add necessary bits to implement "xl fork-vm" commands. The command allows the
user to specify how to launch the device model allowing for a late-launch model
in which the user can execute the fork without the device model and decide to
only later launch it.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
v8: don't try to unpause twice when launching dm
---
 docs/man/xl.1.pod.in          |  36 +++++
 tools/libxc/include/xenctrl.h |  13 ++
 tools/libxc/xc_memshr.c       |  22 +++
 tools/libxl/libxl.h           |   7 +
 tools/libxl/libxl_create.c    | 256 ++++++++++++++++++++++------------
 tools/libxl/libxl_dm.c        |   2 +-
 tools/libxl/libxl_dom.c       |  43 +++++-
 tools/libxl/libxl_internal.h  |   1 +
 tools/libxl/libxl_types.idl   |   1 +
 tools/xl/xl.h                 |   5 +
 tools/xl/xl_cmdtable.c        |  12 ++
 tools/xl/xl_saverestore.c     |  97 +++++++++++++
 tools/xl/xl_vmcontrol.c       |   8 ++
 13 files changed, 409 insertions(+), 94 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index 33ad2ebd71..c4012939f5 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -694,6 +694,42 @@ Leave the domain paused after creating the snapshot.
 
 =back
 
+=item B<fork-vm> [I<OPTIONS>] I<domain-id>
+
+Create a fork of a running VM. The domain will be paused after the operation
+and needs to remain paused while forks of it exist.
+
+B<OPTIONS>
+
+=over 4
+
+=item B<-p>
+
+Leave the fork paused after creating it.
+
+=item B<--launch-dm>
+
+Specify whether the device model (QEMU) should be launched for the fork. Late
+launch allows starting the device model for an already running fork.
+
+=item B<-C>
+
+The config file to use when launching the device model. Currently required when
+launching the device model.
+
+=item B<-Q>
+
+The qemu save file to use when launching the device model.  Currently required
+when launching the device model.
+
+=item B<--fork-reset>
+
+Perform a reset operation of an already running fork. Note that resetting may
+be less performant than creating a new fork depending on how much memory the
+fork has deduplicated during its runtime.
+
+=back
+
 =item B<sharing> [I<domain-id>]
 
 Display the number of shared pages for a specified domain. If no domain is
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index cc4eb1e3d3..6f65888dd0 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2225,6 +2225,19 @@ int xc_memshr_range_share(xc_interface *xch,
                           uint64_t first_gfn,
                           uint64_t last_gfn);
 
+int xc_memshr_fork(xc_interface *xch,
+                   uint32_t source_domain,
+                   uint32_t client_domain);
+
+/*
+ * Note: this function is only intended to be used on short-lived forks that
+ * haven't yet acquired a lot of memory. In case the fork has a lot of memory
+ * it is likely more performant to create a new fork with xc_memshr_fork.
+ *
+ * With VMs that have a lot of memory this call may block for a long time.
+ */
+int xc_memshr_fork_reset(xc_interface *xch, uint32_t forked_domain);
+
 /* Debug calls: return the number of pages referencing the shared frame backing
  * the input argument. Should be one or greater.
  *
diff --git a/tools/libxc/xc_memshr.c b/tools/libxc/xc_memshr.c
index 97e2e6a8d9..d0e4ee225b 100644
--- a/tools/libxc/xc_memshr.c
+++ b/tools/libxc/xc_memshr.c
@@ -239,6 +239,28 @@ int xc_memshr_debug_gref(xc_interface *xch,
     return xc_memshr_memop(xch, domid, &mso);
 }
 
+int xc_memshr_fork(xc_interface *xch, uint32_t pdomid, uint32_t domid)
+{
+    xen_mem_sharing_op_t mso;
+
+    memset(&mso, 0, sizeof(mso));
+
+    mso.op = XENMEM_sharing_op_fork;
+    mso.u.fork.parent_domain = pdomid;
+
+    return xc_memshr_memop(xch, domid, &mso);
+}
+
+int xc_memshr_fork_reset(xc_interface *xch, uint32_t domid)
+{
+    xen_mem_sharing_op_t mso;
+
+    memset(&mso, 0, sizeof(mso));
+    mso.op = XENMEM_sharing_op_fork_reset;
+
+    return xc_memshr_memop(xch, domid, &mso);
+}
+
 int xc_memshr_audit(xc_interface *xch)
 {
     xen_mem_sharing_op_t mso;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 18c1a2d6bf..094ab0d205 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1538,6 +1538,13 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
                             LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_domain_fork_vm(libxl_ctx *ctx, uint32_t pdomid, uint32_t *domid)
+                             LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_domain_fork_launch_dm(libxl_ctx *ctx, libxl_domain_config *d_config,
+                                uint32_t domid,
+                                const libxl_asyncprogress_how *aop_console_how)
+                                LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_domain_fork_reset(libxl_ctx *ctx, uint32_t domid);
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 uint32_t *domid, int restore_fd,
                                 int send_back_fd,
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 3a7364e2ac..9dd9802fc7 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -536,12 +536,12 @@ out:
     return ret;
 }
 
-int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
-                       libxl__domain_build_state *state,
-                       uint32_t *domid, bool soft_reset)
+static int libxl__domain_make_xs_entries(libxl__gc *gc, libxl_domain_config *d_config,
+                                         libxl__domain_build_state *state,
+                                         uint32_t domid)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
-    int ret, rc, nb_vm;
+    int rc, nb_vm;
     const char *dom_type;
     char *uuid_string;
     char *dom_path, *vm_path, *libxl_path;
@@ -553,9 +553,6 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
 
     /* convenience aliases */
     libxl_domain_create_info *info = &d_config->c_info;
-    libxl_domain_build_info *b_info = &d_config->b_info;
-
-    assert(soft_reset || *domid == INVALID_DOMID);
 
     uuid_string = libxl__uuid2string(gc, info->uuid);
     if (!uuid_string) {
@@ -563,71 +560,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
         goto out;
     }
 
-    if (!soft_reset) {
-        struct xen_domctl_createdomain create = {
-            .ssidref = info->ssidref,
-            .max_vcpus = b_info->max_vcpus,
-            .max_evtchn_port = b_info->event_channels,
-            .max_grant_frames = b_info->max_grant_frames,
-            .max_maptrack_frames = b_info->max_maptrack_frames,
-        };
-
-        if (info->type != LIBXL_DOMAIN_TYPE_PV) {
-            create.flags |= XEN_DOMCTL_CDF_hvm;
-            create.flags |=
-                libxl_defbool_val(info->hap) ? XEN_DOMCTL_CDF_hap : 0;
-            create.flags |=
-                libxl_defbool_val(info->oos) ? 0 : XEN_DOMCTL_CDF_oos_off;
-        }
-
-        assert(info->passthrough != LIBXL_PASSTHROUGH_DEFAULT);
-        LOG(DETAIL, "passthrough: %s",
-            libxl_passthrough_to_string(info->passthrough));
-
-        if (info->passthrough != LIBXL_PASSTHROUGH_DISABLED)
-            create.flags |= XEN_DOMCTL_CDF_iommu;
-
-        if (info->passthrough == LIBXL_PASSTHROUGH_SYNC_PT)
-            create.iommu_opts |= XEN_DOMCTL_IOMMU_no_sharept;
-
-        /* Ultimately, handle is an array of 16 uint8_t, same as uuid */
-        libxl_uuid_copy(ctx, (libxl_uuid *)&create.handle, &info->uuid);
-
-        ret = libxl__arch_domain_prepare_config(gc, d_config, &create);
-        if (ret < 0) {
-            LOGED(ERROR, *domid, "fail to get domain config");
-            rc = ERROR_FAIL;
-            goto out;
-        }
-
-        ret = xc_domain_create(ctx->xch, domid, &create);
-        if (ret < 0) {
-            LOGED(ERROR, *domid, "domain creation fail");
-            rc = ERROR_FAIL;
-            goto out;
-        }
-
-        rc = libxl__arch_domain_save_config(gc, d_config, state, &create);
-        if (rc < 0)
-            goto out;
-    }
-
-    /*
-     * If soft_reset is set the the domid will have been valid on entry.
-     * If it was not set then xc_domain_create() should have assigned a
-     * valid value. Either way, if we reach this point, domid should be
-     * valid.
-     */
-    assert(libxl_domid_valid_guest(*domid));
-
-    ret = xc_cpupool_movedomain(ctx->xch, info->poolid, *domid);
-    if (ret < 0) {
-        LOGED(ERROR, *domid, "domain move fail");
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    dom_path = libxl__xs_get_dompath(gc, *domid);
+    dom_path = libxl__xs_get_dompath(gc, domid);
     if (!dom_path) {
         rc = ERROR_FAIL;
         goto out;
@@ -635,12 +568,12 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
 
     vm_path = GCSPRINTF("/vm/%s", uuid_string);
     if (!vm_path) {
-        LOGD(ERROR, *domid, "cannot allocate create paths");
+        LOGD(ERROR, domid, "cannot allocate create paths");
         rc = ERROR_FAIL;
         goto out;
     }
 
-    libxl_path = libxl__xs_libxl_path(gc, *domid);
+    libxl_path = libxl__xs_libxl_path(gc, domid);
     if (!libxl_path) {
         rc = ERROR_FAIL;
         goto out;
@@ -651,10 +584,10 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
 
     roperm[0].id = 0;
     roperm[0].perms = XS_PERM_NONE;
-    roperm[1].id = *domid;
+    roperm[1].id = domid;
     roperm[1].perms = XS_PERM_READ;
 
-    rwperm[0].id = *domid;
+    rwperm[0].id = domid;
     rwperm[0].perms = XS_PERM_NONE;
 
 retry_transaction:
@@ -672,7 +605,7 @@ retry_transaction:
                     noperm, ARRAY_SIZE(noperm));
 
     xs_write(ctx->xsh, t, GCSPRINTF("%s/vm", dom_path), vm_path, strlen(vm_path));
-    rc = libxl__domain_rename(gc, *domid, 0, info->name, t);
+    rc = libxl__domain_rename(gc, domid, 0, info->name, t);
     if (rc)
         goto out;
 
@@ -749,7 +682,7 @@ retry_transaction:
 
     vm_list = libxl_list_vm(ctx, &nb_vm);
     if (!vm_list) {
-        LOGD(ERROR, *domid, "cannot get number of running guests");
+        LOGD(ERROR, domid, "cannot get number of running guests");
         rc = ERROR_FAIL;
         goto out;
     }
@@ -773,7 +706,7 @@ retry_transaction:
             t = 0;
             goto retry_transaction;
         }
-        LOGED(ERROR, *domid, "domain creation ""xenstore transaction commit failed");
+        LOGED(ERROR, domid, "domain creation ""xenstore transaction commit failed");
         rc = ERROR_FAIL;
         goto out;
     }
@@ -785,6 +718,89 @@ retry_transaction:
     return rc;
 }
 
+int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
+                       libxl__domain_build_state *state,
+                       uint32_t *domid, bool soft_reset)
+{
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+    int ret, rc;
+
+    /* convenience aliases */
+    libxl_domain_create_info *info = &d_config->c_info;
+    libxl_domain_build_info *b_info = &d_config->b_info;
+
+    assert(soft_reset || *domid == INVALID_DOMID);
+
+    if (!soft_reset) {
+        struct xen_domctl_createdomain create = {
+            .ssidref = info->ssidref,
+            .max_vcpus = b_info->max_vcpus,
+            .max_evtchn_port = b_info->event_channels,
+            .max_grant_frames = b_info->max_grant_frames,
+            .max_maptrack_frames = b_info->max_maptrack_frames,
+        };
+
+        if (info->type != LIBXL_DOMAIN_TYPE_PV) {
+            create.flags |= XEN_DOMCTL_CDF_hvm;
+            create.flags |=
+                libxl_defbool_val(info->hap) ? XEN_DOMCTL_CDF_hap : 0;
+            create.flags |=
+                libxl_defbool_val(info->oos) ? 0 : XEN_DOMCTL_CDF_oos_off;
+        }
+
+        assert(info->passthrough != LIBXL_PASSTHROUGH_DEFAULT);
+        LOG(DETAIL, "passthrough: %s",
+            libxl_passthrough_to_string(info->passthrough));
+
+        if (info->passthrough != LIBXL_PASSTHROUGH_DISABLED)
+            create.flags |= XEN_DOMCTL_CDF_iommu;
+
+        if (info->passthrough == LIBXL_PASSTHROUGH_SYNC_PT)
+            create.iommu_opts |= XEN_DOMCTL_IOMMU_no_sharept;
+
+        /* Ultimately, handle is an array of 16 uint8_t, same as uuid */
+        libxl_uuid_copy(ctx, (libxl_uuid *)&create.handle, &info->uuid);
+
+        ret = libxl__arch_domain_prepare_config(gc, d_config, &create);
+        if (ret < 0) {
+            LOGED(ERROR, *domid, "fail to get domain config");
+            rc = ERROR_FAIL;
+            goto out;
+        }
+
+        ret = xc_domain_create(ctx->xch, domid, &create);
+        if (ret < 0) {
+            LOGED(ERROR, *domid, "domain creation fail");
+            rc = ERROR_FAIL;
+            goto out;
+        }
+
+        rc = libxl__arch_domain_save_config(gc, d_config, state, &create);
+        if (rc < 0)
+            goto out;
+    }
+
+    /*
+     * If soft_reset is set the domid will have been valid on entry.
+     * If it was not set then xc_domain_create() should have assigned a
+     * valid value. Either way, if we reach this point, domid should be
+     * valid.
+     */
+    assert(libxl_domid_valid_guest(*domid));
+
+    ret = xc_cpupool_movedomain(ctx->xch, info->poolid, *domid);
+    if (ret < 0) {
+        LOGED(ERROR, *domid, "domain move fail");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = libxl__domain_make_xs_entries(gc, d_config, state, *domid);
+
+out:
+    return rc;
+}
+
 static int store_libxl_entry(libxl__gc *gc, uint32_t domid,
                              libxl_domain_build_info *b_info)
 {
@@ -1106,16 +1122,32 @@ static void initiate_domain_create(libxl__egc *egc,
     ret = libxl__domain_config_setdefault(gc,d_config,domid);
     if (ret) goto error_out;
 
-    ret = libxl__domain_make(gc, d_config, &dcs->build_state, &domid,
-                             dcs->soft_reset);
-    if (ret) {
-        LOGD(ERROR, domid, "cannot make domain: %d", ret);
+    if ( !d_config->dm_restore_file )
+    {
+        ret = libxl__domain_make(gc, d_config, &dcs->build_state, &domid,
+                                 dcs->soft_reset);
         dcs->guest_domid = domid;
+
+        if (ret) {
+            LOGD(ERROR, domid, "cannot make domain: %d", ret);
+            ret = ERROR_FAIL;
+            goto error_out;
+        }
+    } else if ( dcs->guest_domid != INVALID_DOMID ) {
+        domid = dcs->guest_domid;
+
+        ret = libxl__domain_make_xs_entries(gc, d_config, &dcs->build_state, domid);
+        if (ret) {
+            LOGD(ERROR, domid, "cannot make domain: %d", ret);
+            ret = ERROR_FAIL;
+            goto error_out;
+        }
+    } else {
+        LOGD(ERROR, domid, "cannot make domain");
         ret = ERROR_FAIL;
         goto error_out;
     }
 
-    dcs->guest_domid = domid;
     dcs->sdss.dm.guest_domid = 0; /* means we haven't spawned */
 
     /* post-4.13 todo: move these next bits of defaulting to
@@ -1151,7 +1183,7 @@ static void initiate_domain_create(libxl__egc *egc,
     if (ret)
         goto error_out;
 
-    if (restore_fd >= 0 || dcs->soft_reset) {
+    if (restore_fd >= 0 || dcs->soft_reset || d_config->dm_restore_file) {
         LOGD(DEBUG, domid, "restoring, not running bootloader");
         domcreate_bootloader_done(egc, &dcs->bl, 0);
     } else  {
@@ -1227,7 +1259,16 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->sdss.dm.callback = domcreate_devmodel_started;
     dcs->sdss.callback = domcreate_devmodel_started;
 
-    if (restore_fd < 0 && !dcs->soft_reset) {
+    if (restore_fd < 0 && !dcs->soft_reset && !d_config->dm_restore_file) {
+        rc = libxl__domain_build(gc, d_config, domid, state);
+        domcreate_rebuild_done(egc, dcs, rc);
+        return;
+    }
+
+    if ( d_config->dm_restore_file ) {
+        dcs->srs.dcs = dcs;
+        dcs->srs.ao = ao;
+        state->forked_vm = true;
         rc = libxl__domain_build(gc, d_config, domid, state);
         domcreate_rebuild_done(egc, dcs, rc);
         return;
@@ -1425,6 +1466,7 @@ static void domcreate_rebuild_done(libxl__egc *egc,
     /* convenience aliases */
     const uint32_t domid = dcs->guest_domid;
     libxl_domain_config *const d_config = dcs->guest_config;
+    libxl__domain_build_state *const state = &dcs->build_state;
 
     if (ret) {
         LOGD(ERROR, domid, "cannot (re-)build domain: %d", ret);
@@ -1432,6 +1474,9 @@ static void domcreate_rebuild_done(libxl__egc *egc,
         goto error_out;
     }
 
+    if ( d_config->dm_restore_file )
+        state->saved_state = GCSPRINTF("%s", d_config->dm_restore_file);
+
     store_libxl_entry(gc, domid, &d_config->b_info);
 
     libxl__multidev_begin(ao, &dcs->multidev);
@@ -1833,6 +1878,8 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     GCNEW(cdcs);
     cdcs->dcs.ao = ao;
     cdcs->dcs.guest_config = d_config;
+    cdcs->dcs.guest_domid = *domid;
+
     libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
     libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
     cdcs->dcs.restore_fd = cdcs->dcs.libxc_fd = restore_fd;
@@ -2081,6 +2128,43 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             ao_how, aop_console_how);
 }
 
+int libxl_domain_fork_vm(libxl_ctx *ctx, uint32_t pdomid, uint32_t *domid)
+{
+    int rc;
+    struct xen_domctl_createdomain create = {0};
+    create.flags |= XEN_DOMCTL_CDF_hvm;
+    create.flags |= XEN_DOMCTL_CDF_hap;
+    create.flags |= XEN_DOMCTL_CDF_oos_off;
+    create.arch.emulation_flags = (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI);
+
+    create.ssidref = SECINITSID_DOMU;
+    create.max_vcpus = 1; /* placeholder, will be cloned from pdomid */
+    create.max_evtchn_port = 1023;
+    create.max_grant_frames = LIBXL_MAX_GRANT_FRAMES_DEFAULT;
+    create.max_maptrack_frames = LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT;
+
+    if ( (rc = xc_domain_create(ctx->xch, domid, &create)) )
+        return rc;
+
+    if ( (rc = xc_memshr_fork(ctx->xch, pdomid, *domid)) )
+        xc_domain_destroy(ctx->xch, *domid);
+
+    return rc;
+}
+
+int libxl_domain_fork_launch_dm(libxl_ctx *ctx, libxl_domain_config *d_config,
+                                uint32_t domid,
+                                const libxl_asyncprogress_how *aop_console_how)
+{
+    unset_disk_colo_restore(d_config);
+    return do_domain_create(ctx, d_config, &domid, -1, -1, 0, 0, aop_console_how);
+}
+
+int libxl_domain_fork_reset(libxl_ctx *ctx, uint32_t domid)
+{
+    return xc_memshr_fork_reset(ctx->xch, domid);
+}
+
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 uint32_t *domid, int restore_fd,
                                 int send_back_fd,
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 3b1da90167..87ae1478cf 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -2787,7 +2787,7 @@ static void device_model_spawn_outcome(libxl__egc *egc,
 
     libxl__domain_build_state *state = dmss->build_state;
 
-    if (state->saved_state) {
+    if (state->saved_state && !state->forked_vm) {
         ret2 = unlink(state->saved_state);
         if (ret2) {
             LOGED(ERROR, dmss->guest_domid, "%s: failed to remove device-model state %s",
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index d9ada8a422..e7c54ddf63 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -249,9 +249,12 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
     libxl_domain_build_info *const info = &d_config->b_info;
     libxl_ctx *ctx = libxl__gc_owner(gc);
     char *xs_domid, *con_domid;
-    int rc;
+    int rc = 0;
     uint64_t size;
 
+    if ( state->forked_vm )
+        goto skip_fork;
+
     if (xc_domain_max_vcpus(ctx->xch, domid, info->max_vcpus) != 0) {
         LOG(ERROR, "Couldn't set max vcpu count");
         return ERROR_FAIL;
@@ -362,7 +365,6 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
         }
     }
 
-
     rc = libxl__arch_extra_memory(gc, info, &size);
     if (rc < 0) {
         LOGE(ERROR, "Couldn't get arch extra constant memory size");
@@ -374,6 +376,11 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
         return ERROR_FAIL;
     }
 
+    rc = libxl__arch_domain_create(gc, d_config, domid);
+    if ( rc )
+        goto out;
+
+skip_fork:
     xs_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL);
     state->store_domid = xs_domid ? atoi(xs_domid) : 0;
     free(xs_domid);
@@ -385,8 +392,7 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
     state->store_port = xc_evtchn_alloc_unbound(ctx->xch, domid, state->store_domid);
     state->console_port = xc_evtchn_alloc_unbound(ctx->xch, domid, state->console_domid);
 
-    rc = libxl__arch_domain_create(gc, d_config, domid);
-
+out:
     return rc;
 }
 
@@ -444,6 +450,9 @@ int libxl__build_post(libxl__gc *gc, uint32_t domid,
     char **ents;
     int i, rc;
 
+    if ( state->forked_vm )
+        goto skip_fork;
+
     if (info->num_vnuma_nodes && !info->num_vcpu_soft_affinity) {
         rc = set_vnuma_affinity(gc, domid, info);
         if (rc)
@@ -468,6 +477,7 @@ int libxl__build_post(libxl__gc *gc, uint32_t domid,
         }
     }
 
+skip_fork:
     ents = libxl__calloc(gc, 12 + (info->max_vcpus * 2) + 2, sizeof(char *));
     ents[0] = "memory/static-max";
     ents[1] = GCSPRINTF("%"PRId64, info->max_memkb);
@@ -730,14 +740,16 @@ static int hvm_build_set_params(xc_interface *handle, uint32_t domid,
                                 libxl_domain_build_info *info,
                                 int store_evtchn, unsigned long *store_mfn,
                                 int console_evtchn, unsigned long *console_mfn,
-                                domid_t store_domid, domid_t console_domid)
+                                domid_t store_domid, domid_t console_domid,
+                                bool forked_vm)
 {
     struct hvm_info_table *va_hvm;
     uint8_t *va_map, sum;
     uint64_t str_mfn, cons_mfn;
     int i;
 
-    if (info->type == LIBXL_DOMAIN_TYPE_HVM) {
+    if ( info->type == LIBXL_DOMAIN_TYPE_HVM && !forked_vm )
+    {
         va_map = xc_map_foreign_range(handle, domid,
                                       XC_PAGE_SIZE, PROT_READ | PROT_WRITE,
                                       HVM_INFO_PFN);
@@ -1053,6 +1065,23 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     struct xc_dom_image *dom = NULL;
     bool device_model = info->type == LIBXL_DOMAIN_TYPE_HVM ? true : false;
 
+    if ( state->forked_vm )
+    {
+        rc = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
+                                  &state->store_mfn, state->console_port,
+                                  &state->console_mfn, state->store_domid,
+                                  state->console_domid, state->forked_vm);
+
+        if ( rc )
+            return rc;
+
+        return xc_dom_gnttab_seed(ctx->xch, domid, true,
+                                  state->console_mfn,
+                                  state->store_mfn,
+                                  state->console_domid,
+                                  state->store_domid);
+    }
+
     xc_dom_loginit(ctx->xch);
 
     /*
@@ -1177,7 +1206,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     rc = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
                                &state->store_mfn, state->console_port,
                                &state->console_mfn, state->store_domid,
-                               state->console_domid);
+                               state->console_domid, false);
     if (rc != 0) {
         LOG(ERROR, "hvm build set params failed");
         goto out;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index dd3c08bc14..f69a8387ed 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1374,6 +1374,7 @@ typedef struct {
 
     char *saved_state;
     int dm_monitor_fd;
+    bool forked_vm;
 
     libxl__file_reference pv_kernel;
     libxl__file_reference pv_ramdisk;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 7921950f6a..7c4c4057a9 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -956,6 +956,7 @@ libxl_domain_config = Struct("domain_config", [
     ("on_watchdog", libxl_action_on_shutdown),
     ("on_crash", libxl_action_on_shutdown),
     ("on_soft_reset", libxl_action_on_shutdown),
+    ("dm_restore_file", string, {'const': True}),
     ], dir=DIR_IN)
 
 libxl_diskinfo = Struct("diskinfo", [
diff --git a/tools/xl/xl.h b/tools/xl/xl.h
index 60bdad8ffb..9bdad6526e 100644
--- a/tools/xl/xl.h
+++ b/tools/xl/xl.h
@@ -31,6 +31,7 @@ struct cmd_spec {
 };
 
 struct domain_create {
+    uint32_t ddomid; /* fork launch dm for this domid */
     int debug;
     int daemonize;
     int monitor; /* handle guest reboots etc */
@@ -45,6 +46,7 @@ struct domain_create {
     const char *config_file;
     char *extra_config; /* extra config string */
     const char *restore_file;
+    const char *dm_restore_file;
     char *colo_proxy_script;
     bool userspace_colo_proxy;
     int migrate_fd; /* -1 means none */
@@ -127,6 +129,9 @@ int main_pciassignable_remove(int argc, char **argv);
 int main_pciassignable_list(int argc, char **argv);
 #ifndef LIBXL_HAVE_NO_SUSPEND_RESUME
 int main_restore(int argc, char **argv);
+int main_fork_vm(int argc, char **argv);
+int main_fork_launch_dm(int argc, char **argv);
+int main_fork_reset(int argc, char **argv);
 int main_migrate_receive(int argc, char **argv);
 int main_save(int argc, char **argv);
 int main_migrate(int argc, char **argv);
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 3b302b2f20..3a5d371057 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -185,6 +185,18 @@ struct cmd_spec cmd_table[] = {
       "Restore a domain from a saved state",
       "- for internal use only",
     },
+    { "fork-vm",
+      &main_fork_vm, 0, 1,
+      "Fork a domain from the running parent domid",
+      "[options] <Domid>",
+      "-h                           Print this help.\n"
+      "-C <config>                  Use config file for VM fork.\n"
+      "-Q <qemu-save-file>          Use qemu save file for VM fork.\n"
+      "--launch-dm <yes|no|late>    Launch device model (QEMU) for VM fork.\n"
+      "--fork-reset                 Reset VM fork.\n"
+      "-p                           Do not unpause fork VM after operation.\n"
+      "-d                           Enable debug messages.\n"
+    },
 #endif
     { "dump-core",
       &main_dump_core, 0, 1,
diff --git a/tools/xl/xl_saverestore.c b/tools/xl/xl_saverestore.c
index 9be033fe65..d99d3eceb2 100644
--- a/tools/xl/xl_saverestore.c
+++ b/tools/xl/xl_saverestore.c
@@ -229,6 +229,103 @@ int main_restore(int argc, char **argv)
     return EXIT_SUCCESS;
 }
 
+int main_fork_vm(int argc, char **argv)
+{
+    int rc, debug = 0;
+    uint32_t domid_in = INVALID_DOMID, domid_out = INVALID_DOMID;
+    int launch_dm = 1;
+    bool reset = 0;
+    bool pause = 0;
+    const char *config_file = NULL;
+    const char *dm_restore_file = NULL;
+
+    int opt;
+    static struct option opts[] = {
+        {"launch-dm", 1, 0, 'l'},
+        {"fork-reset", 0, 0, 'r'},
+        COMMON_LONG_OPTS
+    };
+
+    SWITCH_FOREACH_OPT(opt, "phdC:Q:l:rN:D:B:V:", opts, "fork-vm", 1) {
+    case 'd':
+        debug = 1;
+        break;
+    case 'p':
+        pause = 1;
+        break;
+    case 'C':
+        config_file = optarg;
+        break;
+    case 'Q':
+        dm_restore_file = optarg;
+        break;
+    case 'l':
+        if ( !strcmp(optarg, "no") )
+            launch_dm = 0;
+        if ( !strcmp(optarg, "yes") )
+            launch_dm = 1;
+        if ( !strcmp(optarg, "late") )
+            launch_dm = 2;
+        break;
+    case 'r':
+        reset = 1;
+        break;
+    case 'N': /* fall-through */
+    case 'D': /* fall-through */
+    case 'B': /* fall-through */
+    case 'V':
+        fprintf(stderr, "Unimplemented option(s)\n");
+        return EXIT_FAILURE;
+    }
+
+    if (argc-optind == 1) {
+        domid_in = atoi(argv[optind]);
+    } else {
+        help("fork-vm");
+        return EXIT_FAILURE;
+    }
+
+    if (launch_dm && (!config_file || !dm_restore_file)) {
+        fprintf(stderr, "Currently you must provide both -C and -Q options\n");
+        return EXIT_FAILURE;
+    }
+
+    if (reset) {
+        domid_out = domid_in;
+        if (libxl_domain_fork_reset(ctx, domid_in))
+            return EXIT_FAILURE;
+    }
+
+    if (launch_dm == 2 || reset) {
+        domid_out = domid_in;
+        rc = EXIT_SUCCESS;
+    } else
+        rc = libxl_domain_fork_vm(ctx, domid_in, &domid_out);
+
+    if (rc == EXIT_SUCCESS) {
+        if ( launch_dm ) {
+            struct domain_create dom_info;
+            memset(&dom_info, 0, sizeof(dom_info));
+            dom_info.ddomid = domid_out;
+            dom_info.dm_restore_file = dm_restore_file;
+            dom_info.debug = debug;
+            dom_info.paused = pause;
+            dom_info.config_file = config_file;
+            dom_info.migrate_fd = -1;
+            dom_info.send_back_fd = -1;
+            rc = create_domain(&dom_info) < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
+        } else if ( !pause )
+            rc = libxl_domain_unpause(ctx, domid_out, NULL);
+    }
+
+    if (rc == EXIT_SUCCESS)
+        fprintf(stderr, "fork-vm command successfully returned domid: %u\n", domid_out);
+    else if ( domid_out != INVALID_DOMID )
+        libxl_domain_destroy(ctx, domid_out, 0);
+
+    return rc;
+}
+
 int main_save(int argc, char **argv)
 {
     uint32_t domid;
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index e520b1da79..d9cb19c599 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -645,6 +645,7 @@ int create_domain(struct domain_create *dom_info)
 
     libxl_domain_config d_config;
 
+    uint32_t ddomid = dom_info->ddomid; /* launch dm for this domain iff set */
     int debug = dom_info->debug;
     int daemonize = dom_info->daemonize;
     int monitor = dom_info->monitor;
@@ -655,6 +656,7 @@ int create_domain(struct domain_create *dom_info)
     const char *restore_file = dom_info->restore_file;
     const char *config_source = NULL;
     const char *restore_source = NULL;
+    const char *dm_restore_file = dom_info->dm_restore_file;
     int migrate_fd = dom_info->migrate_fd;
     bool config_in_json;
 
@@ -923,6 +925,12 @@ start:
          * restore/migrate-receive it again.
          */
         restoring = 0;
+    } else if ( ddomid ) {
+        d_config.dm_restore_file = dm_restore_file;
+        ret = libxl_domain_fork_launch_dm(ctx, &d_config, ddomid,
+                                          autoconnect_console_how);
+        domid = ddomid;
+        ddomid = INVALID_DOMID;
     } else if (domid_soft_reset != INVALID_DOMID) {
         /* Do soft reset. */
         ret = libxl_domain_soft_reset(ctx, &d_config, domid_soft_reset,
-- 
2.20.1
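
As an aside on the --launch-dm handling in main_fork_vm(): the 'l' case above
leaves launch_dm at its default when the argument is none of "no", "yes" or
"late", so a mistyped value is accepted silently. A stricter variant could be
sketched as below. This is only an illustration and not part of the patch: the
enum and the parse_launch_dm() helper are hypothetical names, while the three
accepted strings and their 0/1/2 values are taken from the code above.

```c
#include <string.h>

/* Hypothetical names, for illustration only; the patch uses plain ints. */
enum launch_dm_mode {
    LAUNCH_DM_NO = 0,   /* do not start the device model */
    LAUNCH_DM_YES = 1,  /* start it as part of the fork (the default) */
    LAUNCH_DM_LATE = 2, /* start it for an already existing fork */
};

/*
 * Map a --launch-dm argument onto a mode.
 * Returns 0 on success, -1 if the argument is NULL or not recognised;
 * *mode is left untouched on failure.
 */
static int parse_launch_dm(const char *arg, enum launch_dm_mode *mode)
{
    if ( !arg || !mode )
        return -1;

    if ( !strcmp(arg, "no") )
        *mode = LAUNCH_DM_NO;
    else if ( !strcmp(arg, "yes") )
        *mode = LAUNCH_DM_YES;
    else if ( !strcmp(arg, "late") )
        *mode = LAUNCH_DM_LATE;
    else
        return -1;

    return 0;
}
```

With something along these lines the option switch could print the help text
and return EXIT_FAILURE on a -1 result instead of carrying on with the default.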


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: [Xen-devel] [PATCH v8 1/5] x86/p2m: Allow p2m_get_page_from_gfn to return shared entries
  2020-02-10 19:21 ` [Xen-devel] [PATCH v8 1/5] x86/p2m: Allow p2m_get_page_from_gfn to return shared entries Tamas K Lengyel
@ 2020-02-11  9:16   ` Jan Beulich
  2020-02-11 10:29     ` Tamas K Lengyel
  2020-02-21 13:48   ` Andrew Cooper
  1 sibling, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2020-02-11  9:16 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: George Dunlap, xen-devel, Roger Pau Monné, Wei Liu, Andrew Cooper

On 10.02.2020 20:21, Tamas K Lengyel wrote:
> The owner domain of shared pages is dom_cow, use that for get_page
> otherwise the function fails to return the correct page under some
> situations. The check if dom_cow should be used was only performed in
> a subset of use-cases. Fixing the error and simplifying the existing check
> since we can't have any shared entries with dom_cow being NULL.
> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>

I find it quite disappointing that the blank lines requested to be
added ...

> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -574,11 +574,12 @@ struct page_info *p2m_get_page_from_gfn(
>                  if ( fdom == NULL )
>                      page = NULL;
>              }
> -            else if ( !get_page(page, p2m->domain) &&
> -                      /* Page could be shared */
> -                      (!dom_cow || !p2m_is_shared(*t) ||
> -                       !get_page(page, dom_cow)) )
> -                page = NULL;
> +            else
> +            {
> +                struct domain *d = !p2m_is_shared(*t) ? p2m->domain : dom_cow;
> +                if ( !get_page(page, d) )

.. above here and ...

> @@ -594,8 +595,9 @@ struct page_info *p2m_get_page_from_gfn(
>      mfn = get_gfn_type_access(p2m, gfn_x(gfn), t, a, q, NULL);
>      if ( p2m_is_ram(*t) && mfn_valid(mfn) )
>      {
> +        struct domain *d = !p2m_is_shared(*t) ? p2m->domain : dom_cow;
>          page = mfn_to_page(mfn);

... above here still haven't appeared. No matter that it's easy to
do so while committing, when you send a new version you should
really address such remarks yourself, I think.
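
For readers following along, the layout being asked for (a blank line between
the declaration and the first statement) together with the owner selection the
patch introduces can be sketched in stand-alone form. Everything below is a
simplified model under stated assumptions: struct domain, struct page_info and
get_page() are hypothetical stand-ins rather than Xen's definitions, and
get_page_ref() is an illustrative name that does not exist in the tree.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for Xen's types; only what the sketch needs. */
struct domain { int id; };
struct page_info { const struct domain *owner; };

/* Simplified model of get_page(): succeeds only for the page's owner. */
static bool get_page(const struct page_info *page, const struct domain *d)
{
    return d != NULL && page->owner == d;
}

/*
 * Pick the domain to take the reference against, then try to take it.
 * Shared pages are owned by dom_cow, everything else by the p2m's domain.
 */
static const struct page_info *
get_page_ref(const struct page_info *page, bool shared,
             const struct domain *p2m_domain, const struct domain *dom_cow)
{
    const struct domain *d = !shared ? p2m_domain : dom_cow;

    if ( !get_page(page, d) )
        return NULL;

    return page;
}
```

In this model a shared page is only returned when the reference is taken
against dom_cow, which is the case the commit message says was previously
handled in just a subset of the call paths.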

Jan


* Re: [Xen-devel] [PATCH v8 1/5] x86/p2m: Allow p2m_get_page_from_gfn to return shared entries
  2020-02-11  9:16   ` Jan Beulich
@ 2020-02-11 10:29     ` Tamas K Lengyel
  2020-02-11 11:04       ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Tamas K Lengyel @ 2020-02-11 10:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel, Roger Pau Monné

On Tue, Feb 11, 2020 at 2:17 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 10.02.2020 20:21, Tamas K Lengyel wrote:
> > The owner domain of shared pages is dom_cow, use that for get_page
> > otherwise the function fails to return the correct page under some
> > situations. The check if dom_cow should be used was only performed in
> > a subset of use-cases. Fixing the error and simplifying the existing check
> > since we can't have any shared entries with dom_cow being NULL.
> >
> > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
>
> I find it quite disappointing that the blank lines requested to be
> added ...
>
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -574,11 +574,12 @@ struct page_info *p2m_get_page_from_gfn(
> >                  if ( fdom == NULL )
> >                      page = NULL;
> >              }
> > -            else if ( !get_page(page, p2m->domain) &&
> > -                      /* Page could be shared */
> > -                      (!dom_cow || !p2m_is_shared(*t) ||
> > -                       !get_page(page, dom_cow)) )
> > -                page = NULL;
> > +            else
> > +            {
> > +                struct domain *d = !p2m_is_shared(*t) ? p2m->domain : dom_cow;
> > +                if ( !get_page(page, d) )
>
> .. above here and ...
>
> > @@ -594,8 +595,9 @@ struct page_info *p2m_get_page_from_gfn(
> >      mfn = get_gfn_type_access(p2m, gfn_x(gfn), t, a, q, NULL);
> >      if ( p2m_is_ram(*t) && mfn_valid(mfn) )
> >      {
> > +        struct domain *d = !p2m_is_shared(*t) ? p2m->domain : dom_cow;
> >          page = mfn_to_page(mfn);
>
> ... above here still haven't appeared. No matter that it's easy to
> do so while committing, when you send a new version you should
> really address such remarks yourself, I think.

Noted. I haven't addressed it since it appeared to me that this patch
has been ready to go in for like 3 revisions already as-is given the
blank-lines were non-blockers. By the time I get around to rolling a new
one I simply forget nuisance style issues like this. I know we have
been having the discussion about having automated style-checks and
style-formatting added to Xen, this just further highlights to me the
need for it as we are wasting time and energy on stuff like this for
no real reason.

Tamas


* Re: [Xen-devel] [PATCH v8 1/5] x86/p2m: Allow p2m_get_page_from_gfn to return shared entries
  2020-02-11 10:29     ` Tamas K Lengyel
@ 2020-02-11 11:04       ` Jan Beulich
  2020-02-11 13:34         ` Tamas K Lengyel
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2020-02-11 11:04 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel, Roger Pau Monné

On 11.02.2020 11:29, Tamas K Lengyel wrote:
> On Tue, Feb 11, 2020 at 2:17 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 10.02.2020 20:21, Tamas K Lengyel wrote:
>>> The owner domain of shared pages is dom_cow, use that for get_page
>>> otherwise the function fails to return the correct page under some
>>> situations. The check if dom_cow should be used was only performed in
>>> a subset of use-cases. Fixing the error and simplifying the existing check
>>> since we can't have any shared entries with dom_cow being NULL.
>>>
>>> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
>>
>> I find it quite disappointing that the blank lines requested to be
>> added ...
>>
>>> --- a/xen/arch/x86/mm/p2m.c
>>> +++ b/xen/arch/x86/mm/p2m.c
>>> @@ -574,11 +574,12 @@ struct page_info *p2m_get_page_from_gfn(
>>>                  if ( fdom == NULL )
>>>                      page = NULL;
>>>              }
>>> -            else if ( !get_page(page, p2m->domain) &&
>>> -                      /* Page could be shared */
>>> -                      (!dom_cow || !p2m_is_shared(*t) ||
>>> -                       !get_page(page, dom_cow)) )
>>> -                page = NULL;
>>> +            else
>>> +            {
>>> +                struct domain *d = !p2m_is_shared(*t) ? p2m->domain : dom_cow;
>>> +                if ( !get_page(page, d) )
>>
>> .. above here and ...
>>
>>> @@ -594,8 +595,9 @@ struct page_info *p2m_get_page_from_gfn(
>>>      mfn = get_gfn_type_access(p2m, gfn_x(gfn), t, a, q, NULL);
>>>      if ( p2m_is_ram(*t) && mfn_valid(mfn) )
>>>      {
>>> +        struct domain *d = !p2m_is_shared(*t) ? p2m->domain : dom_cow;
>>>          page = mfn_to_page(mfn);
>>
>> ... above here still haven't appeared. No matter that it's easy to
>> do so while committing, when you send a new version you should
>> really address such remarks yourself, I think.
> 
> Noted. I haven't addressed it since it appeared to me that this patch
> has been ready to go in for like 3 revisions already as-is given the
> blank-lines were non-blockers.

The patch continues to lack a maintainer ack. Hence it hasn't been
ready to go in at any point in time.

Jan


* Re: [Xen-devel] [PATCH v8 1/5] x86/p2m: Allow p2m_get_page_from_gfn to return shared entries
  2020-02-11 11:04       ` Jan Beulich
@ 2020-02-11 13:34         ` Tamas K Lengyel
  0 siblings, 0 replies; 20+ messages in thread
From: Tamas K Lengyel @ 2020-02-11 13:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel, Roger Pau Monné

On Tue, Feb 11, 2020 at 4:04 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 11.02.2020 11:29, Tamas K Lengyel wrote:
> > On Tue, Feb 11, 2020 at 2:17 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 10.02.2020 20:21, Tamas K Lengyel wrote:
> >>> The owner domain of shared pages is dom_cow, use that for get_page
> >>> otherwise the function fails to return the correct page under some
> >>> situations. The check if dom_cow should be used was only performed in
> >>> a subset of use-cases. Fixing the error and simplifying the existing check
> >>> since we can't have any shared entries with dom_cow being NULL.
> >>>
> >>> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> >>
> >> I find it quite disappointing that the blank lines requested to be
> >> added ...
> >>
> >>> --- a/xen/arch/x86/mm/p2m.c
> >>> +++ b/xen/arch/x86/mm/p2m.c
> >>> @@ -574,11 +574,12 @@ struct page_info *p2m_get_page_from_gfn(
> >>>                  if ( fdom == NULL )
> >>>                      page = NULL;
> >>>              }
> >>> -            else if ( !get_page(page, p2m->domain) &&
> >>> -                      /* Page could be shared */
> >>> -                      (!dom_cow || !p2m_is_shared(*t) ||
> >>> -                       !get_page(page, dom_cow)) )
> >>> -                page = NULL;
> >>> +            else
> >>> +            {
> >>> +                struct domain *d = !p2m_is_shared(*t) ? p2m->domain : dom_cow;
> >>> +                if ( !get_page(page, d) )
> >>
> >> .. above here and ...
> >>
> >>> @@ -594,8 +595,9 @@ struct page_info *p2m_get_page_from_gfn(
> >>>      mfn = get_gfn_type_access(p2m, gfn_x(gfn), t, a, q, NULL);
> >>>      if ( p2m_is_ram(*t) && mfn_valid(mfn) )
> >>>      {
> >>> +        struct domain *d = !p2m_is_shared(*t) ? p2m->domain : dom_cow;
> >>>          page = mfn_to_page(mfn);
> >>
> >> ... above here still haven't appeared. No matter that it's easy to
> >> do so while committing, when you send a new version you should
> >> really address such remarks yourself, I think.
> >
> > Noted. I haven't addressed it since it appeared to me that this patch
> > has been ready to go in for like 3 revisions already as-is given the
> > blank-lines were non-blockers.
>
> The patch continues to lack a maintainer ack. Hence it hasn't been
> ready to go in at any point in time.

I meant there have been no comments, nor anything blocking noted, for
three resends now.

Tamas


* Re: [Xen-devel] [PATCH v8 3/5] xen/mem_sharing: VM forking
  2020-02-10 19:21 ` [Xen-devel] [PATCH v8 3/5] xen/mem_sharing: VM forking Tamas K Lengyel
@ 2020-02-21 13:43   ` Andrew Cooper
  2020-02-21 14:02     ` Andrew Cooper
  2020-02-21 14:42   ` Andrew Cooper
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Cooper @ 2020-02-21 13:43 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Ian Jackson, Tamas K Lengyel, Jan Beulich,
	Roger Pau Monné

On 10/02/2020 19:21, Tamas K Lengyel wrote:
> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> index 3835bc928f..ccf338918d 100644
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -36,6 +37,9 @@
>  #include <asm/altp2m.h>
>  #include <asm/atomic.h>
>  #include <asm/event.h>
> +#include <asm/hap.h>
> +#include <asm/hvm/hvm.h>
> +#include <asm/hvm/save.h>

This include is stale, I think.

> +static void fork_tsc(struct domain *cd, struct domain *d)
> +{
> +    uint32_t tsc_mode;
> +    uint32_t gtsc_khz;
> +    uint32_t incarnation;
> +    uint64_t elapsed_nsec;
> +
> +    tsc_get_info(d, &tsc_mode, &elapsed_nsec, &gtsc_khz, &incarnation);
> +    tsc_set_info(cd, tsc_mode, elapsed_nsec, gtsc_khz, incarnation);

Sadly, get and set are asymmetric.  For reasons best understood by the
original authors, incarnation gets automatically incremented on set,
rather than happening as part of migration where it logically lives.

As a result, you probably want to set incarnation - 1, and leave a
comment saying "Don't bump the incarnation" or similar.
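
A minimal sketch of what that adjustment could look like. The struct
domain below and both tsc_*_info() functions are mocks written for this
example, not the Xen implementations; the mock setter simply models the
"increment on set" behaviour described above so the effect is visible.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down struct domain holding only the TSC state. */
struct domain {
    uint32_t tsc_mode;
    uint32_t gtsc_khz;
    uint32_t incarnation;
    uint64_t elapsed_nsec;
};

/* Mock getter: reports the stored values unchanged. */
static void tsc_get_info(const struct domain *d, uint32_t *tsc_mode,
                         uint64_t *elapsed_nsec, uint32_t *gtsc_khz,
                         uint32_t *incarnation)
{
    *tsc_mode = d->tsc_mode;
    *elapsed_nsec = d->elapsed_nsec;
    *gtsc_khz = d->gtsc_khz;
    *incarnation = d->incarnation;
}

/* Mock setter: models the asymmetry by bumping the incarnation. */
static void tsc_set_info(struct domain *d, uint32_t tsc_mode,
                         uint64_t elapsed_nsec, uint32_t gtsc_khz,
                         uint32_t incarnation)
{
    d->tsc_mode = tsc_mode;
    d->elapsed_nsec = elapsed_nsec;
    d->gtsc_khz = gtsc_khz;
    d->incarnation = incarnation + 1;
}

static void fork_tsc(struct domain *cd, const struct domain *d)
{
    uint32_t tsc_mode;
    uint32_t gtsc_khz;
    uint32_t incarnation;
    uint64_t elapsed_nsec;

    tsc_get_info(d, &tsc_mode, &elapsed_nsec, &gtsc_khz, &incarnation);
    /* Don't bump the incarnation: the setter adds one on its own. */
    tsc_set_info(cd, tsc_mode, elapsed_nsec, gtsc_khz, incarnation - 1);
}
```

With these mocks the fork ends up with the same incarnation as its
parent; without the "- 1" it would be one higher.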

~Andrew


* Re: [Xen-devel] [PATCH v8 1/5] x86/p2m: Allow p2m_get_page_from_gfn to return shared entries
  2020-02-10 19:21 ` [Xen-devel] [PATCH v8 1/5] x86/p2m: Allow p2m_get_page_from_gfn to return shared entries Tamas K Lengyel
  2020-02-11  9:16   ` Jan Beulich
@ 2020-02-21 13:48   ` Andrew Cooper
  2020-02-21 14:21     ` Tamas K Lengyel
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Cooper @ 2020-02-21 13:48 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel
  Cc: George Dunlap, Wei Liu, Jan Beulich, Roger Pau Monné

On 10/02/2020 19:21, Tamas K Lengyel wrote:
> The owner domain of shared pages is dom_cow, use that for get_page
> otherwise the function fails to return the correct page under some
> situations. The check if dom_cow should be used was only performed in
> a subset of use-cases. Fixing the error and simplifying the existing check
> since we can't have any shared entries with dom_cow being NULL.
>
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>

Given the recent change in p2m maintainership, I've committed this and
fixed up the style issues.

~Andrew


* Re: [Xen-devel] [PATCH v8 3/5] xen/mem_sharing: VM forking
  2020-02-21 13:43   ` Andrew Cooper
@ 2020-02-21 14:02     ` Andrew Cooper
  2020-02-21 14:22       ` Tamas K Lengyel
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Cooper @ 2020-02-21 14:02 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Ian Jackson, Tamas K Lengyel, Jan Beulich,
	Roger Pau Monné

On 21/02/2020 13:43, Andrew Cooper wrote:
> On 10/02/2020 19:21, Tamas K Lengyel wrote:
>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
>> index 3835bc928f..ccf338918d 100644
>> --- a/xen/arch/x86/mm/mem_sharing.c
>> +++ b/xen/arch/x86/mm/mem_sharing.c
>> @@ -36,6 +37,9 @@
>>  #include <asm/altp2m.h>
>>  #include <asm/atomic.h>
>>  #include <asm/event.h>
>> +#include <asm/hap.h>
>> +#include <asm/hvm/hvm.h>
>> +#include <asm/hvm/save.h>
> This include is stale, I think.
>
>> +static void fork_tsc(struct domain *cd, struct domain *d)
>> +{
>> +    uint32_t tsc_mode;
>> +    uint32_t gtsc_khz;
>> +    uint32_t incarnation;
>> +    uint64_t elapsed_nsec;
>> +
>> +    tsc_get_info(d, &tsc_mode, &elapsed_nsec, &gtsc_khz, &incarnation);
>> +    tsc_set_info(cd, tsc_mode, elapsed_nsec, gtsc_khz, incarnation);
> Sadly, get and set are asymmetric.  For reasons best understood by the
> original authors, incarnation gets automatically incremented on set,
> rather than happening as part of migration where it logically lives.
>
> As a result, you probably want to set incarnation - 1, and leave a
> comment saying "Don't bump the incarnation" or similar.

P.S. Can both be fixed on commit if you agree.  Seems pointless sending
a v9 just for these two.

~Andrew


* Re: [Xen-devel] [PATCH v8 1/5] x86/p2m: Allow p2m_get_page_from_gfn to return shared entries
  2020-02-21 13:48   ` Andrew Cooper
@ 2020-02-21 14:21     ` Tamas K Lengyel
  0 siblings, 0 replies; 20+ messages in thread
From: Tamas K Lengyel @ 2020-02-21 14:21 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Jan Beulich, Xen-devel,
	Roger Pau Monné

On Fri, Feb 21, 2020 at 6:49 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 10/02/2020 19:21, Tamas K Lengyel wrote:
> > The owner domain of shared pages is dom_cow, use that for get_page
> > otherwise the function fails to return the correct page under some
> > situations. The check if dom_cow should be used was only performed in
> > a subset of use-cases. Fixing the error and simplifying the existing check
> > since we can't have any shared entries with dom_cow being NULL.
> >
> > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
>
> Given the recent change in p2m maintainership, I've committed this and
> fixed up the style issues.

Thanks, appreciated!

Tamas


* Re: [Xen-devel] [PATCH v8 3/5] xen/mem_sharing: VM forking
  2020-02-21 14:02     ` Andrew Cooper
@ 2020-02-21 14:22       ` Tamas K Lengyel
  0 siblings, 0 replies; 20+ messages in thread
From: Tamas K Lengyel @ 2020-02-21 14:22 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tamas K Lengyel, Ian Jackson, Jan Beulich,
	Xen-devel, Roger Pau Monné

On Fri, Feb 21, 2020 at 7:02 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 21/02/2020 13:43, Andrew Cooper wrote:
> > On 10/02/2020 19:21, Tamas K Lengyel wrote:
> >> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> >> index 3835bc928f..ccf338918d 100644
> >> --- a/xen/arch/x86/mm/mem_sharing.c
> >> +++ b/xen/arch/x86/mm/mem_sharing.c
> >> @@ -36,6 +37,9 @@
> >>  #include <asm/altp2m.h>
> >>  #include <asm/atomic.h>
> >>  #include <asm/event.h>
> >> +#include <asm/hap.h>
> >> +#include <asm/hvm/hvm.h>
> >> +#include <asm/hvm/save.h>
> > This include is stale, I think.
> >
> >> +static void fork_tsc(struct domain *cd, struct domain *d)
> >> +{
> >> +    uint32_t tsc_mode;
> >> +    uint32_t gtsc_khz;
> >> +    uint32_t incarnation;
> >> +    uint64_t elapsed_nsec;
> >> +
> >> +    tsc_get_info(d, &tsc_mode, &elapsed_nsec, &gtsc_khz, &incarnation);
> >> +    tsc_set_info(cd, tsc_mode, elapsed_nsec, gtsc_khz, incarnation);
> > Sadly, get and set are asymmetric.  For reasons best understood by the
> > original authors, incarnation gets automatically incremented on set,
> > rather than happening as part of migration where it logically lives.
> >
> > As a result, you probably want to set incarnation - 1, and leave a
> > comment saying "Don't bump the incarnation" or similar.
>
> P.S. Can both be fixed on commit if you agree.  Seems pointless sending
> a v9 just for these two.

Great, I have no issue with these changes.

Thanks!
Tamas


* Re: [Xen-devel] [PATCH v8 3/5] xen/mem_sharing: VM forking
  2020-02-10 19:21 ` [Xen-devel] [PATCH v8 3/5] xen/mem_sharing: VM forking Tamas K Lengyel
  2020-02-21 13:43   ` Andrew Cooper
@ 2020-02-21 14:42   ` Andrew Cooper
  2020-02-21 17:07     ` Tamas K Lengyel
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Cooper @ 2020-02-21 14:42 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Ian Jackson, Tamas K Lengyel, Jan Beulich,
	Roger Pau Monné

On 10/02/2020 19:21, Tamas K Lengyel wrote:
> +static int mem_sharing_fork(struct domain *d, struct domain *cd)
> +{
> +    int rc = -EINVAL;
> +
> +    if ( !cd->controller_pause_count )
> +        return rc;
> +
> +    /*
> +     * We only want to get and pause the parent once, not each time this
> +     * operation is restarted due to preemption.
> +     */
> +    if ( !cd->parent_paused )
> +    {
> +        ASSERT(get_domain(d));
> +        domain_pause(d);
> +
> +        cd->parent_paused = true;
> +        cd->max_pages = d->max_pages;
> +        cd->max_vcpus = d->max_vcpus;

Sorry, I spoke too soon.  You can't modify max_vcpus here, because it
violates the invariant that domain_vcpu() depends upon for safety.

If the toolstack gets things wrong, Xen will either leak struct vcpu's
on cd's teardown, or corrupt memory beyond the end of the cd->vcpu[] array.

Looking at the hypercall semantics, userspace creates a new domain
(which specifies max_cpus), then calls mem_sharing_fork(parent_dom,
new_dom);  Forking should be rejected if toolstack hasn't chosen the
same number of vcpus for the new domain.
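
A minimal sketch of such a rejection, assuming a cut-down struct domain
with only the fields needed here. The helper name fork_check_vcpus() is
hypothetical and is not part of the patch; it only shows the shape of
the check, with the paused-child test from the hunk above kept in front
of it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, cut-down struct domain; not the Xen definition. */
struct domain {
    unsigned int max_vcpus;
    unsigned int controller_pause_count;
    bool parent_paused;
};

/*
 * Validate the fork target instead of overwriting its max_vcpus.
 * The vcpu array of cd was sized when the domain was created, so the
 * value chosen by the toolstack has to match the parent's already.
 */
static int fork_check_vcpus(const struct domain *d, const struct domain *cd)
{
    /* As in the patch: the child must be paused by its controller. */
    if ( !cd->controller_pause_count )
        return -EINVAL;

    /* Reject the fork rather than changing cd->max_vcpus after creation. */
    if ( cd->max_vcpus != d->max_vcpus )
        return -EINVAL;

    return 0;
}
```

A caller would run this before pausing the parent and bail out on a
non-zero return, leaving cd->max_vcpus untouched.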

This raises the question of whether the same should be true for
max_pages as well.

~Andrew


* Re: [Xen-devel] [PATCH v8 3/5] xen/mem_sharing: VM forking
  2020-02-21 14:42   ` Andrew Cooper
@ 2020-02-21 17:07     ` Tamas K Lengyel
  2020-02-21 17:47       ` Andrew Cooper
  0 siblings, 1 reply; 20+ messages in thread
From: Tamas K Lengyel @ 2020-02-21 17:07 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Jan Beulich,
	Xen-devel, Roger Pau Monné

On Fri, Feb 21, 2020 at 7:42 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 10/02/2020 19:21, Tamas K Lengyel wrote:
> > +static int mem_sharing_fork(struct domain *d, struct domain *cd)
> > +{
> > +    int rc = -EINVAL;
> > +
> > +    if ( !cd->controller_pause_count )
> > +        return rc;
> > +
> > +    /*
> > +     * We only want to get and pause the parent once, not each time this
> > +     * operation is restarted due to preemption.
> > +     */
> > +    if ( !cd->parent_paused )
> > +    {
> > +        ASSERT(get_domain(d));
> > +        domain_pause(d);
> > +
> > +        cd->parent_paused = true;
> > +        cd->max_pages = d->max_pages;
> > +        cd->max_vcpus = d->max_vcpus;
>
> Sorry, I spoke too soon.  You can't modify max_vcpus here, because it
> violates the invariant that domain_vcpu() depends upon for safety.
>
> If the toolstack gets things wrong, Xen will either leak struct vcpu's
> on cd's teardown, or corrupt memory beyond the end of the cd->vcpu[] array.
>
> Looking at the hypercall semantics, userspace creates a new domain
> (which specifies max_cpus), then calls mem_sharing_fork(parent_dom,
> new_dom);  Forking should be rejected if toolstack hasn't chosen the
> same number of vcpus for the new domain.

That's unfortunate since this would require an extra hypercall just to
get information Xen already has. I think instead of what you recommend
what I'll do is extend XEN_DOMCTL_createdomain to include the parent
domain's ID already so Xen can gather this information automatically
without the toolstack having to do it this roundabout way.

>
> This raises the question of whether the same should be true for
> max_pages as well.

Could you expand on this?

Thanks,
Tamas


* Re: [Xen-devel] [PATCH v8 3/5] xen/mem_sharing: VM forking
  2020-02-21 17:07     ` Tamas K Lengyel
@ 2020-02-21 17:47       ` Andrew Cooper
  2020-02-21 17:56         ` Tamas K Lengyel
  2020-02-21 18:08         ` Tamas K Lengyel
  0 siblings, 2 replies; 20+ messages in thread
From: Andrew Cooper @ 2020-02-21 17:47 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Jan Beulich,
	Xen-devel, Roger Pau Monné

On 21/02/2020 17:07, Tamas K Lengyel wrote:
> On Fri, Feb 21, 2020 at 7:42 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 10/02/2020 19:21, Tamas K Lengyel wrote:
>>> +static int mem_sharing_fork(struct domain *d, struct domain *cd)
>>> +{
>>> +    int rc = -EINVAL;
>>> +
>>> +    if ( !cd->controller_pause_count )
>>> +        return rc;
>>> +
>>> +    /*
>>> +     * We only want to get and pause the parent once, not each time this
>>> +     * operation is restarted due to preemption.
>>> +     */
>>> +    if ( !cd->parent_paused )
>>> +    {
>>> +        ASSERT(get_domain(d));
>>> +        domain_pause(d);
>>> +
>>> +        cd->parent_paused = true;
>>> +        cd->max_pages = d->max_pages;
>>> +        cd->max_vcpus = d->max_vcpus;
>> Sorry, I spoke too soon.  You can't modify max_vcpus here, because it
>> violates the invariant that domain_vcpu() depends upon for safety.
>>
>> If the toolstack gets things wrong, Xen will either leak struct vcpu's
>> on cd's teardown, or corrupt memory beyond the end of the cd->vcpu[] array.
>>
>> Looking at the hypercall semantics, userspace creates a new domain
>> (which specifies max_cpus), then calls mem_sharing_fork(parent_dom,
>> new_dom);  Forking should be rejected if toolstack hasn't chosen the
>> same number of vcpus for the new domain.
> That's unfortunate since this would require an extra hypercall just to
> get information Xen already has. I think instead of what you recommend
> what I'll do is extend XEN_DOMCTL_createdomain to include the parent
> domain's ID already so Xen can gather this information automatically
> without the toolstack having to do it this roundabout way.

Conceptually, I think that is cleaner.  You really do want to duplicate
the parent domain, whatever its settings are.

However, I'd suggest not extending createdomain.  What should the
semantics be for such a call which passes conflicting settings?

How about a new top level domctl for clone_domain, taking a parent
domain identifier, and optionally providing a specific new domid?  This
way, it is impossible to provide conflicting settings, and the semantics
should be quite clear.  Internally, Xen can do whatever it needs to copy
appropriate settings, and get things sorted before the domain becomes
globally visible.

One question needing consideration is whether such a hypercall could in
theory be useful without the mem_sharing infrastructure, or not.  (i.e.
how tightly integrated we should aim for.)

>> This raises the question of whether the same should be true for
>> max_pages as well.
> Could you expand on this?

Well - these two are a very odd subset of things to blindly copy.  The
max_cpus side is wrong, which makes max_pages likely to be wrong as well.

~Andrew


* Re: [Xen-devel] [PATCH v8 3/5] xen/mem_sharing: VM forking
  2020-02-21 17:47       ` Andrew Cooper
@ 2020-02-21 17:56         ` Tamas K Lengyel
  2020-02-21 18:08         ` Tamas K Lengyel
  1 sibling, 0 replies; 20+ messages in thread
From: Tamas K Lengyel @ 2020-02-21 17:56 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Jan Beulich,
	Xen-devel, Roger Pau Monné

On Fri, Feb 21, 2020 at 10:47 AM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 21/02/2020 17:07, Tamas K Lengyel wrote:
> > On Fri, Feb 21, 2020 at 7:42 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >> On 10/02/2020 19:21, Tamas K Lengyel wrote:
> >>> +static int mem_sharing_fork(struct domain *d, struct domain *cd)
> >>> +{
> >>> +    int rc = -EINVAL;
> >>> +
> >>> +    if ( !cd->controller_pause_count )
> >>> +        return rc;
> >>> +
> >>> +    /*
> >>> +     * We only want to get and pause the parent once, not each time this
> >>> +     * operation is restarted due to preemption.
> >>> +     */
> >>> +    if ( !cd->parent_paused )
> >>> +    {
> >>> +        ASSERT(get_domain(d));
> >>> +        domain_pause(d);
> >>> +
> >>> +        cd->parent_paused = true;
> >>> +        cd->max_pages = d->max_pages;
> >>> +        cd->max_vcpus = d->max_vcpus;
> >> Sorry, I spoke too soon.  You can't modify max_vcpus here, because it
> >> violates the invariant that domain_vcpu() depends upon for safety.
> >>
> >> If the toolstack gets things wrong, Xen will either leak struct vcpu's
> >> on cd's teardown, or corrupt memory beyond the end of the cd->vcpu[] array.
> >>
> >> Looking at the hypercall semantics, userspace creates a new domain
> >> (which specifies max_cpus), then calls mem_sharing_fork(parent_dom,
> >> new_dom);  Forking should be rejected if toolstack hasn't chosen the
> >> same number of vcpus for the new domain.
> > That's unfortunate since this would require an extra hypercall just to
> > get information Xen already has. I think instead of what you recommend
> > what I'll do is extend XEN_DOMCTL_createdomain to include the parent
> > domain's ID already so Xen can gather this information automatically
> > without the toolstack having to do it this roundabout way.
>
> Conceptually, I think that is cleaner.  You really do want to duplicate
> the parent domain, whatever its settings are.
>
> However, I'd suggest not extending createdomain.  What should the
> semantics be for such a call which passes conflicting settings?

I'm not sure what conflicting settings you worry about? I simply add a
field to xen_domctl_createdomain called parent_domain, then in the
domctl handler, if it's set, we copy the max_vcpus value directly from
there:

        parent_dom = op->u.createdomain.parent_domid;
        if ( parent_dom )
        {
            struct domain *pd = rcu_lock_domain_by_id(parent_dom);

            ret = -EINVAL;
            if ( !pd )
                break;

            op->u.createdomain.max_vcpus = pd->max_vcpus;
            rcu_unlock_domain(pd);
        }

>
> How about a new top level domctl for clone_domain, taking a parent
> domain identifier, and optionally providing a specific new domid?  This
> way, it is impossible to provide conflicting settings, and the semantics
> should be quite clear.  Internally, Xen can do whatever it needs to copy
> appropriate settings, and get things sorted before the domain becomes
> globally visible.

I already have a hypercall added for fork_domain and I even tried
doing everything in a single hypercall. The problem I ran into with
that was it required a lot of refactoring within Xen since
domain_create is not preemptible right now, while some other parts of
forking need to be preemptible. So it was just easier to do it with 2
hypercalls. One just creates the domain shell via
XEN_DOMCTL_createdomain and the second actually sets it up as a fork.

>
> One question needing consideration is whether such a hypercall could in
> theory be useful without the mem_sharing infrastructure, or not.  (i.e.
> how tightly integrated we should aim for.)
>
> >> This raises the question of whether the same should be true for
> >> max_pages as well.
> > Could you expand on this?
>
> Well - these two are a very odd subset of things to blindly copy.  The
> max_cpus side is wrong, which makes max_pages likely to be wrong as well.

It's clearer why the max_cpus side is wrong, since there is an
allocation during domain_create based on that number. It seems I
haven't run into that issue because all the domains I tested with had
only a single vCPU. But max_pages should be safe to copy, since any
page that would get accessed up to max_pages would get forked from the
parent. I haven't run into any issues with that. So I don't really see
what's the problem there.

Tamas


* Re: [Xen-devel] [PATCH v8 3/5] xen/mem_sharing: VM forking
  2020-02-21 17:47       ` Andrew Cooper
  2020-02-21 17:56         ` Tamas K Lengyel
@ 2020-02-21 18:08         ` Tamas K Lengyel
  1 sibling, 0 replies; 20+ messages in thread
From: Tamas K Lengyel @ 2020-02-21 18:08 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Jan Beulich,
	Xen-devel, Roger Pau Monné

> One question needing consideration is whether such a hypercall could in
> theory be useful without the mem_sharing infrastructure, or not.  (i.e.
> how tightly integrated we should aim for.)

It would be possible to create domain forks without mem_sharing. The
mem_sharing part just adds an extra optimization on top so we don't
end up copying all accessed pages needlessly, we only do that when the
page is written to. In one of the earlier revisions of the series
(~v4) due to a bug we actually ran the system with all pages being
deduplicated and no mem_sharing and it was working just fine. I'm not
sure if that's what you were asking though :)

Tamas


end of thread, other threads:[~2020-02-21 18:09 UTC | newest]

Thread overview: 20+ messages
2020-02-10 19:21 [Xen-devel] [PATCH v8 0/5] VM forking Tamas K Lengyel
2020-02-10 19:21 ` [Xen-devel] [PATCH v8 1/5] x86/p2m: Allow p2m_get_page_from_gfn to return shared entries Tamas K Lengyel
2020-02-11  9:16   ` Jan Beulich
2020-02-11 10:29     ` Tamas K Lengyel
2020-02-11 11:04       ` Jan Beulich
2020-02-11 13:34         ` Tamas K Lengyel
2020-02-21 13:48   ` Andrew Cooper
2020-02-21 14:21     ` Tamas K Lengyel
2020-02-10 19:21 ` [Xen-devel] [PATCH v8 2/5] xen/x86: Make hap_get_allocation accessible Tamas K Lengyel
2020-02-10 19:21 ` [Xen-devel] [PATCH v8 3/5] xen/mem_sharing: VM forking Tamas K Lengyel
2020-02-21 13:43   ` Andrew Cooper
2020-02-21 14:02     ` Andrew Cooper
2020-02-21 14:22       ` Tamas K Lengyel
2020-02-21 14:42   ` Andrew Cooper
2020-02-21 17:07     ` Tamas K Lengyel
2020-02-21 17:47       ` Andrew Cooper
2020-02-21 17:56         ` Tamas K Lengyel
2020-02-21 18:08         ` Tamas K Lengyel
2020-02-10 19:21 ` [Xen-devel] [PATCH v8 4/5] x86/mem_sharing: reset a fork Tamas K Lengyel
2020-02-10 19:21 ` [Xen-devel] [PATCH v8 5/5] xen/tools: VM forking toolstack side Tamas K Lengyel
