* [Xen-devel] [PATCH v9 0/5] VM forking
@ 2020-02-21 18:49 Tamas K Lengyel
  2020-02-21 18:49 ` [Xen-devel] [PATCH v9 1/5] xen/x86: Make hap_get_allocation accessible Tamas K Lengyel
                   ` (4 more replies)
  0 siblings, 5 replies; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-21 18:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Tamas K Lengyel, Jan Beulich, Anthony PERARD, Julien Grall,
	Roger Pau Monné

The following series implements VM forking for Intel HVM guests to allow for
the fast creation of identical VMs without the associated high startup costs
of booting or restoring the VM from a savefile.

JIRA issue: https://xenproject.atlassian.net/browse/XEN-89

The fork operation is implemented as part of the "xl fork-vm" command:
    xl fork-vm -C <config_file_for_fork> -Q <qemu_save_file> <parent_domid>
    
By default a fully functional fork is created. However, the user is responsible
for creating the appropriate config file for the fork and for generating the
QEMU save file before the fork-vm call is made. At minimum the config file needs
to give the fork a new name, but other settings may also require changes.

The interface also allows splitting the forking into two steps:
    xl fork-vm --launch-dm no \
               -p <parent_domid>
    xl fork-vm --launch-dm late \
               -C <config_file_for_fork> \
               -Q <qemu_save_file> \
               <fork_domid>

The split creation model is useful when the VM needs to be created as fast as
possible. The forked VM can be unpaused without the device model being launched,
and can then be monitored and accessed via VMI. Note however that without its
device model running the fork is bound to misbehave or even crash (depending on
what is executing in the VM) when it tries to access devices that would be
emulated by QEMU. We anticipate that for certain use-cases this is an
acceptable trade-off, for example when fuzzing code segments that don't access
such devices.

Launching the device model requires the QEMU Xen savefile to be generated
manually from the parent VM. This can be accomplished simply by connecting to
its QMP socket and issuing the "xen-save-devices-state" command. For example,
using the standard tool socat, the following commands generate the file:
    socat - UNIX-CONNECT:/var/run/xen/qmp-libxl-<parent_domid>
    { "execute": "qmp_capabilities" }
    { "execute": "xen-save-devices-state", \
        "arguments": { "filename": "/path/to/save/qemu_state", \
                        "live": false} }

At runtime the forked VM starts with an empty p2m which gets lazily populated
when the VM generates EPT faults, similar to how altp2m views are populated. If
the memory access is a read, the p2m entry is populated with an entry shared
with the parent. For write accesses, or when memory sharing wasn't possible
(for example because a reference is held by a third party), a new page is
allocated and its contents are copied over from the parent VM. Forks can be
further forked if needed, allowing for additional memory savings.
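The lazy-populate semantics described above can be illustrated with a toy model
(purely illustrative, not the Xen implementation): a read maps the parent's
page as shared, while a write allocates a private copy of the parent's page
first.

```python
class ToyFork:
    """Toy model of a fork's p2m: starts empty, fills on first access."""
    def __init__(self, parent_memory):
        self.parent = parent_memory   # gfn -> page contents
        self.shared = set()           # gfns mapped shared with the parent
        self.private = {}             # gfns copied on write

    def read(self, gfn):
        if gfn in self.private:
            return self.private[gfn]
        self.shared.add(gfn)          # populate with a shared entry
        return self.parent[gfn]

    def write(self, gfn, value):
        if gfn not in self.private:   # unshare: copy the parent page
            self.private[gfn] = self.parent.get(gfn)
            self.shared.discard(gfn)
        self.private[gfn] = value

parent = {0: "boot", 1: "data"}
fork = ToyFork(parent)
assert fork.read(0) == "boot"         # read: shared with parent
fork.write(1, "changed")              # write: private copy diverges
assert fork.read(1) == "changed" and parent[1] == "data"
```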

A VM fork reset hypercall is also added that allows the fork to be reset to the
state it was in right after forking, also accessible via xl:
    xl fork-vm --fork-reset -p <fork_domid>

This is an optimization for cases where the forks are very short-lived and run
without a device model, so resetting saves some time compared to creating a
brand new fork, provided the fork has not acquired a lot of memory. If the fork
has deduplicated a lot of memory, it is likely going to be faster to create a
new fork from scratch and destroy the old one asynchronously.
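Extending the same toy model (again illustrative only), a reset simply sheds
the pages the fork allocated since its creation, so subsequent accesses fault
back to the parent as they did right after forking:

```python
class ResettableFork:
    """Toy model (not the Xen code): reset drops pages allocated since fork."""
    def __init__(self, parent_memory):
        self.parent = parent_memory
        self.private = {}             # pages copied/allocated for this fork

    def write(self, gfn, value):
        self.private[gfn] = value     # copy-on-write divergence from parent

    def read(self, gfn):
        return self.private.get(gfn, self.parent.get(gfn))

    def reset(self):
        # Shed everything allocated since the fork; subsequent reads
        # fall back to the parent, as right after fork creation.
        self.private.clear()

parent = {0: "orig"}
fork = ResettableFork(parent)
fork.write(0, "dirty")
assert fork.read(0) == "dirty"
fork.reset()
assert fork.read(0) == "orig"         # back to the parent's state
```

The cost of a reset in this model is proportional to the number of diverged
pages, which matches the note above that heavily diverged forks are better
replaced than reset.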

The series has been tested with both Linux and Windows VMs and functions as
expected. VM forking time has been measured to be 0.0007s, device model launch
to be around 1s depending largely on the number of devices being emulated. Fork
resets have been measured to be 0.0001s under the optimal circumstances.

New in v9:
    Patch 2
    Rebased on staging and minor fixes for things pointed out by Andrew

Patch 1 exposes a hap internal function that will be used during VM forking

Patch 2 extends the createdomain domctl with an extra field used during VM forking

Patches 3-4 implement the hypervisor-side bits of the VM fork & reset operations

Patch 5 adds the toolstack-side code implementing VM forking and reset

Tamas K Lengyel (5):
  xen/x86: Make hap_get_allocation accessible
  xen: add parent_domid field to createdomain domctl
  xen/mem_sharing: VM forking
  x86/mem_sharing: reset a fork
  xen/tools: VM forking toolstack side

 docs/man/xl.1.pod.in              |  36 ++++
 tools/libxc/include/xenctrl.h     |  13 ++
 tools/libxc/xc_memshr.c           |  22 +++
 tools/libxl/libxl.h               |   7 +
 tools/libxl/libxl_create.c        | 256 ++++++++++++++++---------
 tools/libxl/libxl_dm.c            |   2 +-
 tools/libxl/libxl_dom.c           |  43 ++++-
 tools/libxl/libxl_internal.h      |   1 +
 tools/libxl/libxl_types.idl       |   1 +
 tools/xl/xl.h                     |   5 +
 tools/xl/xl_cmdtable.c            |  12 ++
 tools/xl/xl_saverestore.c         |  97 ++++++++++
 tools/xl/xl_vmcontrol.c           |   8 +
 xen/arch/x86/domain.c             |  11 ++
 xen/arch/x86/hvm/hvm.c            |   2 +-
 xen/arch/x86/mm/hap/hap.c         |   3 +-
 xen/arch/x86/mm/mem_sharing.c     | 298 ++++++++++++++++++++++++++++++
 xen/arch/x86/mm/p2m.c             |  11 +-
 xen/common/domctl.c               |  14 ++
 xen/include/asm-x86/hap.h         |   1 +
 xen/include/asm-x86/mem_sharing.h |  17 ++
 xen/include/public/domctl.h       |   3 +-
 xen/include/public/memory.h       |   6 +
 xen/include/xen/sched.h           |   2 +
 24 files changed, 772 insertions(+), 99 deletions(-)

-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [Xen-devel] [PATCH v9 1/5] xen/x86: Make hap_get_allocation accessible
  2020-02-21 18:49 [Xen-devel] [PATCH v9 0/5] VM forking Tamas K Lengyel
@ 2020-02-21 18:49 ` Tamas K Lengyel
  2020-02-24 15:20   ` Roger Pau Monné
  2020-02-21 18:49 ` [Xen-devel] [PATCH v9 2/5] xen: add parent_domid field to createdomain domctl Tamas K Lengyel
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-21 18:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Wei Liu, Andrew Cooper, George Dunlap,
	Jan Beulich, Roger Pau Monné

During VM forking we'll copy the parent domain's parameters to the client,
including the HAP shadow memory setting that is used for storing the domain's
EPT. We'll copy this in the hypervisor instead of doing it during toolstack launch
to allow the domain to start executing and unsharing memory before (or
even completely without) the toolstack.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/hap/hap.c | 3 +--
 xen/include/asm-x86/hap.h | 1 +
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 3d93f3451c..c7c7ff6e99 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -321,8 +321,7 @@ static void hap_free_p2m_page(struct domain *d, struct page_info *pg)
 }
 
 /* Return the size of the pool, rounded up to the nearest MB */
-static unsigned int
-hap_get_allocation(struct domain *d)
+unsigned int hap_get_allocation(struct domain *d)
 {
     unsigned int pg = d->arch.paging.hap.total_pages
         + d->arch.paging.hap.p2m_pages;
diff --git a/xen/include/asm-x86/hap.h b/xen/include/asm-x86/hap.h
index b94bfb4ed0..1bf07e49fe 100644
--- a/xen/include/asm-x86/hap.h
+++ b/xen/include/asm-x86/hap.h
@@ -45,6 +45,7 @@ int   hap_track_dirty_vram(struct domain *d,
 
 extern const struct paging_mode *hap_paging_get_mode(struct vcpu *);
 int hap_set_allocation(struct domain *d, unsigned int pages, bool *preempted);
+unsigned int hap_get_allocation(struct domain *d);
 
 #endif /* XEN_HAP_H */
 
-- 
2.20.1




* [Xen-devel] [PATCH v9 2/5] xen: add parent_domid field to createdomain domctl
  2020-02-21 18:49 [Xen-devel] [PATCH v9 0/5] VM forking Tamas K Lengyel
  2020-02-21 18:49 ` [Xen-devel] [PATCH v9 1/5] xen/x86: Make hap_get_allocation accessible Tamas K Lengyel
@ 2020-02-21 18:49 ` Tamas K Lengyel
  2020-02-21 21:02   ` Julien Grall
  2020-02-24 15:44   ` Andrew Cooper
  2020-02-21 18:49 ` [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking Tamas K Lengyel
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-21 18:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Julien Grall

When creating a domain that will be used as a VM fork, some information is
required to set things up properly, like the max_vcpus count. Instead of the
toolstack having to gather this information for each fork in a separate
hypercall, we can just include the parent domain's id in the createdomain domctl
so that Xen can copy the setting without the extra toolstack queries.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/common/domctl.c         | 14 ++++++++++++++
 xen/include/public/domctl.h |  3 ++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index a69b3b59a8..22aceb3860 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -489,6 +489,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     case XEN_DOMCTL_createdomain:
     {
         domid_t        dom;
+        domid_t        parent_dom;
         static domid_t rover = 0;
 
         dom = op->domain;
@@ -515,6 +516,19 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
             rover = dom;
         }
 
+        parent_dom = op->u.createdomain.parent_domid;
+        if ( parent_dom )
+        {
+            struct domain *pd = rcu_lock_domain_by_id(parent_dom);
+
+            ret = -EINVAL;
+            if ( !pd )
+                break;
+
+            op->u.createdomain.max_vcpus = pd->max_vcpus;
+            rcu_unlock_domain(pd);
+        }
+
         d = domain_create(dom, &op->u.createdomain, false);
         if ( IS_ERR(d) )
         {
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index fec6f6fdd1..251cc40ef6 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -38,7 +38,7 @@
 #include "hvm/save.h"
 #include "memory.h"
 
-#define XEN_DOMCTL_INTERFACE_VERSION 0x00000012
+#define XEN_DOMCTL_INTERFACE_VERSION 0x00000013
 
 /*
  * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
@@ -92,6 +92,7 @@ struct xen_domctl_createdomain {
     uint32_t max_evtchn_port;
     int32_t max_grant_frames;
     int32_t max_maptrack_frames;
+    domid_t parent_domid;
 
     struct xen_arch_domainconfig arch;
 };
-- 
2.20.1




* [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking
  2020-02-21 18:49 [Xen-devel] [PATCH v9 0/5] VM forking Tamas K Lengyel
  2020-02-21 18:49 ` [Xen-devel] [PATCH v9 1/5] xen/x86: Make hap_get_allocation accessible Tamas K Lengyel
  2020-02-21 18:49 ` [Xen-devel] [PATCH v9 2/5] xen: add parent_domid field to createdomain domctl Tamas K Lengyel
@ 2020-02-21 18:49 ` Tamas K Lengyel
  2020-02-24 12:39   ` Roger Pau Monné
  2020-02-25 10:04   ` Roger Pau Monné
  2020-02-21 18:49 ` [Xen-devel] [PATCH v9 4/5] x86/mem_sharing: reset a fork Tamas K Lengyel
  2020-02-21 18:49 ` [Xen-devel] [PATCH v9 5/5] xen/tools: VM forking toolstack side Tamas K Lengyel
  4 siblings, 2 replies; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-21 18:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Tamas K Lengyel, Jan Beulich, Julien Grall, Roger Pau Monné

VM forking is the process of creating a domain with an empty memory space and a
parent domain specified from which to populate the memory when necessary. For
the new domain to be functional the VM state is copied over as part of the fork
operation (HVM params, hap allocation, etc).

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
v9: remove stale header
    fix tsc incarnation being bumped on set
---
 xen/arch/x86/domain.c             |  11 ++
 xen/arch/x86/hvm/hvm.c            |   2 +-
 xen/arch/x86/mm/mem_sharing.c     | 222 ++++++++++++++++++++++++++++++
 xen/arch/x86/mm/p2m.c             |  11 +-
 xen/include/asm-x86/mem_sharing.h |  17 +++
 xen/include/public/memory.h       |   5 +
 xen/include/xen/sched.h           |   2 +
 7 files changed, 268 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fe63c23676..1ab0ca0942 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2203,6 +2203,17 @@ int domain_relinquish_resources(struct domain *d)
             ret = relinquish_shared_pages(d);
             if ( ret )
                 return ret;
+
+            /*
+             * If the domain is forked, decrement the parent's pause count
+             * and release the domain.
+             */
+            if ( d->parent )
+            {
+                domain_unpause(d->parent);
+                put_domain(d->parent);
+                d->parent = NULL;
+            }
         }
 #endif
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index a339b36a0d..9bfa603f8c 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1915,7 +1915,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     }
 #endif
 
-    /* Spurious fault? PoD and log-dirty also take this path. */
+    /* Spurious fault? PoD, log-dirty and VM forking also take this path. */
     if ( p2m_is_ram(p2mt) )
     {
         rc = 1;
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 3835bc928f..ad5db9d8d5 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -22,6 +22,7 @@
 
 #include <xen/types.h>
 #include <xen/domain_page.h>
+#include <xen/event.h>
 #include <xen/spinlock.h>
 #include <xen/rwlock.h>
 #include <xen/mm.h>
@@ -36,6 +37,9 @@
 #include <asm/altp2m.h>
 #include <asm/atomic.h>
 #include <asm/event.h>
+#include <asm/hap.h>
+#include <asm/hvm/hvm.h>
+#include <asm/hvm/save.h>
 #include <xsm/xsm.h>
 
 #include "mm-locks.h"
@@ -1444,6 +1448,194 @@ static inline int mem_sharing_control(struct domain *d, bool enable)
     return 0;
 }
 
+/*
+ * Forking a page only gets called when the VM faults due to no entry being
+ * in the EPT for the access. Depending on the type of access we either
+ * populate the physmap with a shared entry for read-only access or
+ * fork the page if it's a write access.
+ *
+ * The client p2m is already locked so we only need to lock
+ * the parent's here.
+ */
+int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool unsharing)
+{
+    int rc = -ENOENT;
+    shr_handle_t handle;
+    struct domain *parent;
+    struct p2m_domain *p2m;
+    unsigned long gfn_l = gfn_x(gfn);
+    mfn_t mfn, new_mfn;
+    p2m_type_t p2mt;
+    struct page_info *page;
+
+    if ( !mem_sharing_is_fork(d) )
+        return -ENOENT;
+
+    parent = d->parent;
+
+    if ( !unsharing )
+    {
+        /* For read-only accesses we just add a shared entry to the physmap */
+        while ( parent )
+        {
+            if ( !(rc = nominate_page(parent, gfn, 0, &handle)) )
+                break;
+
+            parent = parent->parent;
+        }
+
+        if ( !rc )
+        {
+            /* The client's p2m is already locked */
+            struct p2m_domain *pp2m = p2m_get_hostp2m(parent);
+
+            p2m_lock(pp2m);
+            rc = add_to_physmap(parent, gfn_l, handle, d, gfn_l, false);
+            p2m_unlock(pp2m);
+
+            if ( !rc )
+                return 0;
+        }
+    }
+
+    /*
+     * If it's a write access (ie. unsharing) or if adding a shared entry to
+     * the physmap failed we'll fork the page directly.
+     */
+    p2m = p2m_get_hostp2m(d);
+    parent = d->parent;
+
+    while ( parent )
+    {
+        mfn = get_gfn_query(parent, gfn_l, &p2mt);
+
+        if ( mfn_valid(mfn) && p2m_is_any_ram(p2mt) )
+            break;
+
+        put_gfn(parent, gfn_l);
+        parent = parent->parent;
+    }
+
+    if ( !parent )
+        return -ENOENT;
+
+    if ( !(page = alloc_domheap_page(d, 0)) )
+    {
+        put_gfn(parent, gfn_l);
+        return -ENOMEM;
+    }
+
+    new_mfn = page_to_mfn(page);
+    copy_domain_page(new_mfn, mfn);
+    set_gpfn_from_mfn(mfn_x(new_mfn), gfn_l);
+
+    put_gfn(parent, gfn_l);
+
+    return p2m->set_entry(p2m, gfn, new_mfn, PAGE_ORDER_4K, p2m_ram_rw,
+                          p2m->default_access, -1);
+}
+
+static int bring_up_vcpus(struct domain *cd, struct cpupool *cpupool)
+{
+    int ret;
+    unsigned int i;
+
+    if ( (ret = cpupool_move_domain(cd, cpupool)) )
+        return ret;
+
+    for ( i = 0; i < cd->max_vcpus; i++ )
+    {
+        if ( cd->vcpu[i] )
+            continue;
+
+        if ( !vcpu_create(cd, i) )
+            return -EINVAL;
+    }
+
+    domain_update_node_affinity(cd);
+    return 0;
+}
+
+static int fork_hap_allocation(struct domain *cd, struct domain *d)
+{
+    int rc;
+    bool preempted;
+    unsigned long mb = hap_get_allocation(d);
+
+    if ( mb == hap_get_allocation(cd) )
+        return 0;
+
+    paging_lock(cd);
+    rc = hap_set_allocation(cd, mb << (20 - PAGE_SHIFT), &preempted);
+    paging_unlock(cd);
+
+    if ( rc )
+        return rc;
+
+    if ( preempted )
+        return -ERESTART;
+
+    return 0;
+}
+
+static void fork_tsc(struct domain *cd, struct domain *d)
+{
+    uint32_t tsc_mode;
+    uint32_t gtsc_khz;
+    uint32_t incarnation;
+    uint64_t elapsed_nsec;
+
+    tsc_get_info(d, &tsc_mode, &elapsed_nsec, &gtsc_khz, &incarnation);
+    /* Don't bump incarnation on set */
+    tsc_set_info(cd, tsc_mode, elapsed_nsec, gtsc_khz, incarnation - 1);
+}
+
+static int mem_sharing_fork(struct domain *d, struct domain *cd)
+{
+    int rc = -EINVAL;
+
+    if ( !cd->controller_pause_count )
+        return rc;
+
+    /*
+     * We only want to get and pause the parent once, not each time this
+     * operation is restarted due to preemption.
+     */
+    if ( !cd->parent_paused )
+    {
+        ASSERT(get_domain(d));
+        domain_pause(d);
+
+        cd->parent_paused = true;
+        cd->max_pages = d->max_pages;
+        cd->max_vcpus = d->max_vcpus;
+    }
+
+    /* this is preemptible so it's the first to get done */
+    if ( (rc = fork_hap_allocation(cd, d)) )
+        goto done;
+
+    if ( (rc = bring_up_vcpus(cd, d->cpupool)) )
+        goto done;
+
+    if ( (rc = hvm_copy_context_and_params(cd, d)) )
+        goto done;
+
+    fork_tsc(cd, d);
+
+    cd->parent = d;
+
+ done:
+    if ( rc && rc != -ERESTART )
+    {
+        domain_unpause(d);
+        put_domain(d);
+        cd->parent_paused = false;
+    }
+
+    return rc;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
     int rc;
@@ -1698,6 +1890,36 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
         rc = debug_gref(d, mso.u.debug.u.gref);
         break;
 
+    case XENMEM_sharing_op_fork:
+    {
+        struct domain *pd;
+
+        rc = -EINVAL;
+        if ( mso.u.fork._pad[0] || mso.u.fork._pad[1] ||
+             mso.u.fork._pad[2] )
+            goto out;
+
+        rc = rcu_lock_live_remote_domain_by_id(mso.u.fork.parent_domain,
+                                               &pd);
+        if ( rc )
+            goto out;
+
+        if ( !mem_sharing_enabled(pd) )
+        {
+            if ( (rc = mem_sharing_control(pd, true)) )
+                goto out;
+        }
+
+        rc = mem_sharing_fork(pd, d);
+
+        if ( rc == -ERESTART )
+            rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
+                                               "lh", XENMEM_sharing_op,
+                                               arg);
+        rcu_unlock_domain(pd);
+        break;
+    }
+
     default:
         rc = -ENOSYS;
         break;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index c5f428d67c..7c4d2fd7a0 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -509,6 +509,14 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
 
     mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
 
+    /* Check if we need to fork the page */
+    if ( (q & P2M_ALLOC) && p2m_is_hole(*t) &&
+         !mem_sharing_fork_page(p2m->domain, gfn, !!(q & P2M_UNSHARE)) )
+    {
+        mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
+    }
+
+    /* Check if we need to unshare the page */
     if ( (q & P2M_UNSHARE) && p2m_is_shared(*t) )
     {
         ASSERT(p2m_is_hostp2m(p2m));
@@ -588,7 +596,8 @@ struct page_info *p2m_get_page_from_gfn(
             return page;
 
         /* Error path: not a suitable GFN at all */
-        if ( !p2m_is_ram(*t) && !p2m_is_paging(*t) && !p2m_is_pod(*t) )
+        if ( !p2m_is_ram(*t) && !p2m_is_paging(*t) && !p2m_is_pod(*t) &&
+             !mem_sharing_is_fork(p2m->domain) )
             return NULL;
     }
 
diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h
index 53760a2896..ac968fae3f 100644
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -39,6 +39,9 @@ struct mem_sharing_domain
 
 #define mem_sharing_enabled(d) ((d)->arch.hvm.mem_sharing.enabled)
 
+#define mem_sharing_is_fork(d) \
+    (mem_sharing_enabled(d) && !!((d)->parent))
+
 /* Auditing of memory sharing code? */
 #ifndef NDEBUG
 #define MEM_SHARING_AUDIT 1
@@ -88,6 +91,9 @@ static inline int mem_sharing_unshare_page(struct domain *d,
     return rc;
 }
 
+int mem_sharing_fork_page(struct domain *d, gfn_t gfn,
+                          bool unsharing);
+
 /*
  * If called by a foreign domain, possible errors are
  *   -EBUSY -> ring full
@@ -117,6 +123,7 @@ int relinquish_shared_pages(struct domain *d);
 #else
 
 #define mem_sharing_enabled(d) false
+#define mem_sharing_is_fork(p2m) false
 
 static inline unsigned int mem_sharing_get_nr_saved_mfns(void)
 {
@@ -141,6 +148,16 @@ static inline int mem_sharing_notify_enomem(struct domain *d, unsigned long gfn,
     return -EOPNOTSUPP;
 }
 
+static inline int mem_sharing_fork(struct domain *d, struct domain *cd, bool vcpu)
+{
+    return -EOPNOTSUPP;
+}
+
+static inline int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool lock)
+{
+    return -EOPNOTSUPP;
+}
+
 #endif
 
 #endif /* __MEM_SHARING_H__ */
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 126d0ff06e..24106c6c2f 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -482,6 +482,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
 #define XENMEM_sharing_op_add_physmap       6
 #define XENMEM_sharing_op_audit             7
 #define XENMEM_sharing_op_range_share       8
+#define XENMEM_sharing_op_fork              9
 
 #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
 #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
@@ -532,6 +533,10 @@ struct xen_mem_sharing_op {
                 uint32_t gref;     /* IN: gref to debug         */
             } u;
         } debug;
+        struct mem_sharing_op_fork {
+            domid_t parent_domain;
+            uint16_t _pad[3];                /* Must be set to 0 */
+        } fork;
     } u;
 };
 typedef struct xen_mem_sharing_op xen_mem_sharing_op_t;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 3a4f43098c..d98e999fcc 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -503,6 +503,8 @@ struct domain
     /* Memory sharing support */
 #ifdef CONFIG_MEM_SHARING
     struct vm_event_domain *vm_event_share;
+    struct domain *parent; /* VM fork parent */
+    bool parent_paused;
 #endif
     /* Memory paging support */
 #ifdef CONFIG_HAS_MEM_PAGING
-- 
2.20.1




* [Xen-devel] [PATCH v9 4/5] x86/mem_sharing: reset a fork
  2020-02-21 18:49 [Xen-devel] [PATCH v9 0/5] VM forking Tamas K Lengyel
                   ` (2 preceding siblings ...)
  2020-02-21 18:49 ` [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking Tamas K Lengyel
@ 2020-02-21 18:49 ` Tamas K Lengyel
  2020-02-24 15:12   ` Roger Pau Monné
  2020-02-21 18:49 ` [Xen-devel] [PATCH v9 5/5] xen/tools: VM forking toolstack side Tamas K Lengyel
  4 siblings, 1 reply; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-21 18:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, Konrad Rzeszutek Wilk,
	Andrew Cooper, Ian Jackson, George Dunlap, Stefano Stabellini,
	Jan Beulich, Julien Grall, Roger Pau Monné

Implement a hypercall that allows a fork to shed all memory that got allocated
for it during its execution and reload its vCPU context from the parent VM.
This allows the forked VM to be reset into the same state the parent VM is in
faster than creating a new fork would be. Measurements show about a 2x speedup
during normal fuzzing operations. Performance may vary depending on how much
memory got allocated for the forked VM. If it has been completely deduplicated
from the parent VM then creating a new fork would likely be more performant.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 76 +++++++++++++++++++++++++++++++++++
 xen/include/public/memory.h   |  1 +
 2 files changed, 77 insertions(+)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index ad5db9d8d5..fb6892aaa6 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1636,6 +1636,59 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
     return rc;
 }
 
+/*
+ * The fork reset operation is intended to be used on short-lived forks only.
+ * There is no hypercall continuation operation implemented for this reason.
+ * For forks that obtain a larger memory footprint it is likely going to be
+ * more performant to create a new fork instead of resetting an existing one.
+ *
+ * TODO: In case this hypercall would become useful on forks with larger memory
+ * footprints the hypercall continuation should be implemented.
+ */
+static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
+{
+    int rc;
+    struct p2m_domain* p2m = p2m_get_hostp2m(cd);
+    struct page_info *page, *tmp;
+
+    domain_pause(cd);
+
+    page_list_for_each_safe(page, tmp, &cd->page_list)
+    {
+        p2m_type_t p2mt;
+        p2m_access_t p2ma;
+        gfn_t gfn;
+        mfn_t mfn = page_to_mfn(page);
+
+        if ( !mfn_valid(mfn) )
+            continue;
+
+        gfn = mfn_to_gfn(cd, mfn);
+        mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
+                                    0, NULL, false);
+
+        if ( !p2m_is_ram(p2mt) || p2m_is_shared(p2mt) )
+            continue;
+
+        /* take an extra reference */
+        if ( !get_page(page, cd) )
+            continue;
+
+        rc = p2m->set_entry(p2m, gfn, INVALID_MFN, PAGE_ORDER_4K,
+                            p2m_invalid, p2m_access_rwx, -1);
+        ASSERT(!rc);
+
+        put_page_alloc_ref(page);
+        put_page(page);
+    }
+
+    if ( !(rc = hvm_copy_context_and_params(cd, d)) )
+        fork_tsc(cd, d);
+
+    domain_unpause(cd);
+    return rc;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
     int rc;
@@ -1920,6 +1973,29 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
         break;
     }
 
+    case XENMEM_sharing_op_fork_reset:
+    {
+        struct domain *pd;
+
+        rc = -EINVAL;
+        if ( mso.u.fork._pad[0] || mso.u.fork._pad[1] ||
+             mso.u.fork._pad[2] )
+            goto out;
+
+        rc = -ENOSYS;
+        if ( !d->parent )
+            goto out;
+
+        rc = rcu_lock_live_remote_domain_by_id(d->parent->domain_id, &pd);
+        if ( rc )
+            goto out;
+
+        rc = mem_sharing_fork_reset(pd, d);
+
+        rcu_unlock_domain(pd);
+        break;
+    }
+
     default:
         rc = -ENOSYS;
         break;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 24106c6c2f..defab3a5bc 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -483,6 +483,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
 #define XENMEM_sharing_op_audit             7
 #define XENMEM_sharing_op_range_share       8
 #define XENMEM_sharing_op_fork              9
+#define XENMEM_sharing_op_fork_reset        10
 
 #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
 #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
-- 
2.20.1




* [Xen-devel] [PATCH v9 5/5] xen/tools: VM forking toolstack side
  2020-02-21 18:49 [Xen-devel] [PATCH v9 0/5] VM forking Tamas K Lengyel
                   ` (3 preceding siblings ...)
  2020-02-21 18:49 ` [Xen-devel] [PATCH v9 4/5] x86/mem_sharing: reset a fork Tamas K Lengyel
@ 2020-02-21 18:49 ` Tamas K Lengyel
  2020-02-24 16:12   ` Julien Grall
  4 siblings, 1 reply; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-21 18:49 UTC (permalink / raw)
  To: xen-devel; +Cc: Anthony PERARD, Ian Jackson, Tamas K Lengyel, Wei Liu

Add the necessary bits to implement the "xl fork-vm" commands. The command
allows the user to specify how to launch the device model, allowing for a
late-launch model in which the user can execute the fork without the device
model and decide only later to launch it.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 docs/man/xl.1.pod.in          |  36 +++++
 tools/libxc/include/xenctrl.h |  13 ++
 tools/libxc/xc_memshr.c       |  22 +++
 tools/libxl/libxl.h           |   7 +
 tools/libxl/libxl_create.c    | 256 ++++++++++++++++++++++------------
 tools/libxl/libxl_dm.c        |   2 +-
 tools/libxl/libxl_dom.c       |  43 +++++-
 tools/libxl/libxl_internal.h  |   1 +
 tools/libxl/libxl_types.idl   |   1 +
 tools/xl/xl.h                 |   5 +
 tools/xl/xl_cmdtable.c        |  12 ++
 tools/xl/xl_saverestore.c     |  97 +++++++++++++
 tools/xl/xl_vmcontrol.c       |   8 ++
 13 files changed, 409 insertions(+), 94 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index 33ad2ebd71..c4012939f5 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -694,6 +694,42 @@ Leave the domain paused after creating the snapshot.
 
 =back
 
+=item B<fork-vm> [I<OPTIONS>] I<domain-id>
+
+Create a fork of a running VM. The domain will be paused after the operation
+and needs to remain paused while forks of it exist.
+
+B<OPTIONS>
+
+=over 4
+
+=item B<-p>
+
+Leave the fork paused after creating it.
+
+=item B<--launch-dm>
+
+Specify whether the device model (QEMU) should be launched for the fork. Late
+launch allows starting the device model for an already running fork.
+
+=item B<-C>
+
+The config file to use when launching the device model. Currently required when
+launching the device model.
+
+=item B<-Q>
+
+The qemu save file to use when launching the device model.  Currently required
+when launching the device model.
+
+=item B<--fork-reset>
+
+Perform a reset operation on an already running fork. Note that resetting may
+be less performant than creating a new fork, depending on how much memory the
+fork has deduplicated during its runtime.
+
+=back
+
 =item B<sharing> [I<domain-id>]
 
 Display the number of shared pages for a specified domain. If no domain is
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 99552a5f73..90fce83196 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2225,6 +2225,19 @@ int xc_memshr_range_share(xc_interface *xch,
                           uint64_t first_gfn,
                           uint64_t last_gfn);
 
+int xc_memshr_fork(xc_interface *xch,
+                   uint32_t source_domain,
+                   uint32_t client_domain);
+
+/*
+ * Note: this function is only intended to be used on short-lived forks that
+ * haven't yet acquired a lot of memory. In case the fork has a lot of memory
+ * it is likely more performant to create a new fork with xc_memshr_fork.
+ *
+ * With VMs that have a lot of memory this call may block for a long time.
+ */
+int xc_memshr_fork_reset(xc_interface *xch, uint32_t forked_domain);
+
 /* Debug calls: return the number of pages referencing the shared frame backing
  * the input argument. Should be one or greater.
  *
diff --git a/tools/libxc/xc_memshr.c b/tools/libxc/xc_memshr.c
index 97e2e6a8d9..d0e4ee225b 100644
--- a/tools/libxc/xc_memshr.c
+++ b/tools/libxc/xc_memshr.c
@@ -239,6 +239,28 @@ int xc_memshr_debug_gref(xc_interface *xch,
     return xc_memshr_memop(xch, domid, &mso);
 }
 
+int xc_memshr_fork(xc_interface *xch, uint32_t pdomid, uint32_t domid)
+{
+    xen_mem_sharing_op_t mso;
+
+    memset(&mso, 0, sizeof(mso));
+
+    mso.op = XENMEM_sharing_op_fork;
+    mso.u.fork.parent_domain = pdomid;
+
+    return xc_memshr_memop(xch, domid, &mso);
+}
+
+int xc_memshr_fork_reset(xc_interface *xch, uint32_t domid)
+{
+    xen_mem_sharing_op_t mso;
+
+    memset(&mso, 0, sizeof(mso));
+    mso.op = XENMEM_sharing_op_fork_reset;
+
+    return xc_memshr_memop(xch, domid, &mso);
+}
+
 int xc_memshr_audit(xc_interface *xch)
 {
     xen_mem_sharing_op_t mso;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index fde8548847..dcdabe8f80 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1538,6 +1538,13 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
                             LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_domain_fork_vm(libxl_ctx *ctx, uint32_t pdomid, uint32_t *domid)
+                             LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_domain_fork_launch_dm(libxl_ctx *ctx, libxl_domain_config *d_config,
+                                uint32_t domid,
+                                const libxl_asyncprogress_how *aop_console_how)
+                                LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_domain_fork_reset(libxl_ctx *ctx, uint32_t domid);
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 uint32_t *domid, int restore_fd,
                                 int send_back_fd,
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 3a7364e2ac..9b4ed920b7 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -536,12 +536,12 @@ out:
     return ret;
 }
 
-int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
-                       libxl__domain_build_state *state,
-                       uint32_t *domid, bool soft_reset)
+static int libxl__domain_make_xs_entries(libxl__gc *gc, libxl_domain_config *d_config,
+                                         libxl__domain_build_state *state,
+                                         uint32_t domid)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
-    int ret, rc, nb_vm;
+    int rc, nb_vm;
     const char *dom_type;
     char *uuid_string;
     char *dom_path, *vm_path, *libxl_path;
@@ -553,9 +553,6 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
 
     /* convenience aliases */
     libxl_domain_create_info *info = &d_config->c_info;
-    libxl_domain_build_info *b_info = &d_config->b_info;
-
-    assert(soft_reset || *domid == INVALID_DOMID);
 
     uuid_string = libxl__uuid2string(gc, info->uuid);
     if (!uuid_string) {
@@ -563,71 +560,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
         goto out;
     }
 
-    if (!soft_reset) {
-        struct xen_domctl_createdomain create = {
-            .ssidref = info->ssidref,
-            .max_vcpus = b_info->max_vcpus,
-            .max_evtchn_port = b_info->event_channels,
-            .max_grant_frames = b_info->max_grant_frames,
-            .max_maptrack_frames = b_info->max_maptrack_frames,
-        };
-
-        if (info->type != LIBXL_DOMAIN_TYPE_PV) {
-            create.flags |= XEN_DOMCTL_CDF_hvm;
-            create.flags |=
-                libxl_defbool_val(info->hap) ? XEN_DOMCTL_CDF_hap : 0;
-            create.flags |=
-                libxl_defbool_val(info->oos) ? 0 : XEN_DOMCTL_CDF_oos_off;
-        }
-
-        assert(info->passthrough != LIBXL_PASSTHROUGH_DEFAULT);
-        LOG(DETAIL, "passthrough: %s",
-            libxl_passthrough_to_string(info->passthrough));
-
-        if (info->passthrough != LIBXL_PASSTHROUGH_DISABLED)
-            create.flags |= XEN_DOMCTL_CDF_iommu;
-
-        if (info->passthrough == LIBXL_PASSTHROUGH_SYNC_PT)
-            create.iommu_opts |= XEN_DOMCTL_IOMMU_no_sharept;
-
-        /* Ultimately, handle is an array of 16 uint8_t, same as uuid */
-        libxl_uuid_copy(ctx, (libxl_uuid *)&create.handle, &info->uuid);
-
-        ret = libxl__arch_domain_prepare_config(gc, d_config, &create);
-        if (ret < 0) {
-            LOGED(ERROR, *domid, "fail to get domain config");
-            rc = ERROR_FAIL;
-            goto out;
-        }
-
-        ret = xc_domain_create(ctx->xch, domid, &create);
-        if (ret < 0) {
-            LOGED(ERROR, *domid, "domain creation fail");
-            rc = ERROR_FAIL;
-            goto out;
-        }
-
-        rc = libxl__arch_domain_save_config(gc, d_config, state, &create);
-        if (rc < 0)
-            goto out;
-    }
-
-    /*
-     * If soft_reset is set the the domid will have been valid on entry.
-     * If it was not set then xc_domain_create() should have assigned a
-     * valid value. Either way, if we reach this point, domid should be
-     * valid.
-     */
-    assert(libxl_domid_valid_guest(*domid));
-
-    ret = xc_cpupool_movedomain(ctx->xch, info->poolid, *domid);
-    if (ret < 0) {
-        LOGED(ERROR, *domid, "domain move fail");
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    dom_path = libxl__xs_get_dompath(gc, *domid);
+    dom_path = libxl__xs_get_dompath(gc, domid);
     if (!dom_path) {
         rc = ERROR_FAIL;
         goto out;
@@ -635,12 +568,12 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
 
     vm_path = GCSPRINTF("/vm/%s", uuid_string);
     if (!vm_path) {
-        LOGD(ERROR, *domid, "cannot allocate create paths");
+        LOGD(ERROR, domid, "cannot allocate create paths");
         rc = ERROR_FAIL;
         goto out;
     }
 
-    libxl_path = libxl__xs_libxl_path(gc, *domid);
+    libxl_path = libxl__xs_libxl_path(gc, domid);
     if (!libxl_path) {
         rc = ERROR_FAIL;
         goto out;
@@ -651,10 +584,10 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
 
     roperm[0].id = 0;
     roperm[0].perms = XS_PERM_NONE;
-    roperm[1].id = *domid;
+    roperm[1].id = domid;
     roperm[1].perms = XS_PERM_READ;
 
-    rwperm[0].id = *domid;
+    rwperm[0].id = domid;
     rwperm[0].perms = XS_PERM_NONE;
 
 retry_transaction:
@@ -672,7 +605,7 @@ retry_transaction:
                     noperm, ARRAY_SIZE(noperm));
 
     xs_write(ctx->xsh, t, GCSPRINTF("%s/vm", dom_path), vm_path, strlen(vm_path));
-    rc = libxl__domain_rename(gc, *domid, 0, info->name, t);
+    rc = libxl__domain_rename(gc, domid, 0, info->name, t);
     if (rc)
         goto out;
 
@@ -749,7 +682,7 @@ retry_transaction:
 
     vm_list = libxl_list_vm(ctx, &nb_vm);
     if (!vm_list) {
-        LOGD(ERROR, *domid, "cannot get number of running guests");
+        LOGD(ERROR, domid, "cannot get number of running guests");
         rc = ERROR_FAIL;
         goto out;
     }
@@ -773,7 +706,7 @@ retry_transaction:
             t = 0;
             goto retry_transaction;
         }
-        LOGED(ERROR, *domid, "domain creation ""xenstore transaction commit failed");
+        LOGED(ERROR, domid, "domain creation ""xenstore transaction commit failed");
         rc = ERROR_FAIL;
         goto out;
     }
@@ -785,6 +718,89 @@ retry_transaction:
     return rc;
 }
 
+int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
+                       libxl__domain_build_state *state,
+                       uint32_t *domid, bool soft_reset)
+{
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+    int ret, rc;
+
+    /* convenience aliases */
+    libxl_domain_create_info *info = &d_config->c_info;
+    libxl_domain_build_info *b_info = &d_config->b_info;
+
+    assert(soft_reset || *domid == INVALID_DOMID);
+
+    if (!soft_reset) {
+        struct xen_domctl_createdomain create = {
+            .ssidref = info->ssidref,
+            .max_vcpus = b_info->max_vcpus,
+            .max_evtchn_port = b_info->event_channels,
+            .max_grant_frames = b_info->max_grant_frames,
+            .max_maptrack_frames = b_info->max_maptrack_frames,
+        };
+
+        if (info->type != LIBXL_DOMAIN_TYPE_PV) {
+            create.flags |= XEN_DOMCTL_CDF_hvm;
+            create.flags |=
+                libxl_defbool_val(info->hap) ? XEN_DOMCTL_CDF_hap : 0;
+            create.flags |=
+                libxl_defbool_val(info->oos) ? 0 : XEN_DOMCTL_CDF_oos_off;
+        }
+
+        assert(info->passthrough != LIBXL_PASSTHROUGH_DEFAULT);
+        LOG(DETAIL, "passthrough: %s",
+            libxl_passthrough_to_string(info->passthrough));
+
+        if (info->passthrough != LIBXL_PASSTHROUGH_DISABLED)
+            create.flags |= XEN_DOMCTL_CDF_iommu;
+
+        if (info->passthrough == LIBXL_PASSTHROUGH_SYNC_PT)
+            create.iommu_opts |= XEN_DOMCTL_IOMMU_no_sharept;
+
+        /* Ultimately, handle is an array of 16 uint8_t, same as uuid */
+        libxl_uuid_copy(ctx, (libxl_uuid *)&create.handle, &info->uuid);
+
+        ret = libxl__arch_domain_prepare_config(gc, d_config, &create);
+        if (ret < 0) {
+            LOGED(ERROR, *domid, "fail to get domain config");
+            rc = ERROR_FAIL;
+            goto out;
+        }
+
+        ret = xc_domain_create(ctx->xch, domid, &create);
+        if (ret < 0) {
+            LOGED(ERROR, *domid, "domain creation fail");
+            rc = ERROR_FAIL;
+            goto out;
+        }
+
+        rc = libxl__arch_domain_save_config(gc, d_config, state, &create);
+        if (rc < 0)
+            goto out;
+    }
+
+    /*
+     * If soft_reset is set then the domid will have been valid on entry.
+     * If it was not set then xc_domain_create() should have assigned a
+     * valid value. Either way, if we reach this point, domid should be
+     * valid.
+     */
+    assert(libxl_domid_valid_guest(*domid));
+
+    ret = xc_cpupool_movedomain(ctx->xch, info->poolid, *domid);
+    if (ret < 0) {
+        LOGED(ERROR, *domid, "domain move fail");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = libxl__domain_make_xs_entries(gc, d_config, state, *domid);
+
+out:
+    return rc;
+}
+
 static int store_libxl_entry(libxl__gc *gc, uint32_t domid,
                              libxl_domain_build_info *b_info)
 {
@@ -1106,16 +1122,32 @@ static void initiate_domain_create(libxl__egc *egc,
     ret = libxl__domain_config_setdefault(gc,d_config,domid);
     if (ret) goto error_out;
 
-    ret = libxl__domain_make(gc, d_config, &dcs->build_state, &domid,
-                             dcs->soft_reset);
-    if (ret) {
-        LOGD(ERROR, domid, "cannot make domain: %d", ret);
+    if ( !d_config->dm_restore_file )
+    {
+        ret = libxl__domain_make(gc, d_config, &dcs->build_state, &domid,
+                                 dcs->soft_reset);
         dcs->guest_domid = domid;
+
+        if (ret) {
+            LOGD(ERROR, domid, "cannot make domain: %d", ret);
+            ret = ERROR_FAIL;
+            goto error_out;
+        }
+    } else if ( dcs->guest_domid != INVALID_DOMID ) {
+        domid = dcs->guest_domid;
+
+        ret = libxl__domain_make_xs_entries(gc, d_config, &dcs->build_state, domid);
+        if (ret) {
+            LOGD(ERROR, domid, "cannot make domain: %d", ret);
+            ret = ERROR_FAIL;
+            goto error_out;
+        }
+    } else {
+        LOGD(ERROR, domid, "cannot make domain");
         ret = ERROR_FAIL;
         goto error_out;
     }
 
-    dcs->guest_domid = domid;
     dcs->sdss.dm.guest_domid = 0; /* means we haven't spawned */
 
     /* post-4.13 todo: move these next bits of defaulting to
@@ -1151,7 +1183,7 @@ static void initiate_domain_create(libxl__egc *egc,
     if (ret)
         goto error_out;
 
-    if (restore_fd >= 0 || dcs->soft_reset) {
+    if (restore_fd >= 0 || dcs->soft_reset || d_config->dm_restore_file) {
         LOGD(DEBUG, domid, "restoring, not running bootloader");
         domcreate_bootloader_done(egc, &dcs->bl, 0);
     } else  {
@@ -1227,7 +1259,16 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->sdss.dm.callback = domcreate_devmodel_started;
     dcs->sdss.callback = domcreate_devmodel_started;
 
-    if (restore_fd < 0 && !dcs->soft_reset) {
+    if (restore_fd < 0 && !dcs->soft_reset && !d_config->dm_restore_file) {
+        rc = libxl__domain_build(gc, d_config, domid, state);
+        domcreate_rebuild_done(egc, dcs, rc);
+        return;
+    }
+
+    if ( d_config->dm_restore_file ) {
+        dcs->srs.dcs = dcs;
+        dcs->srs.ao = ao;
+        state->forked_vm = true;
         rc = libxl__domain_build(gc, d_config, domid, state);
         domcreate_rebuild_done(egc, dcs, rc);
         return;
@@ -1425,6 +1466,7 @@ static void domcreate_rebuild_done(libxl__egc *egc,
     /* convenience aliases */
     const uint32_t domid = dcs->guest_domid;
     libxl_domain_config *const d_config = dcs->guest_config;
+    libxl__domain_build_state *const state = &dcs->build_state;
 
     if (ret) {
         LOGD(ERROR, domid, "cannot (re-)build domain: %d", ret);
@@ -1432,6 +1474,9 @@ static void domcreate_rebuild_done(libxl__egc *egc,
         goto error_out;
     }
 
+    if ( d_config->dm_restore_file )
+        state->saved_state = GCSPRINTF("%s", d_config->dm_restore_file);
+
     store_libxl_entry(gc, domid, &d_config->b_info);
 
     libxl__multidev_begin(ao, &dcs->multidev);
@@ -1833,6 +1878,8 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     GCNEW(cdcs);
     cdcs->dcs.ao = ao;
     cdcs->dcs.guest_config = d_config;
+    cdcs->dcs.guest_domid = *domid;
+
     libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
     libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
     cdcs->dcs.restore_fd = cdcs->dcs.libxc_fd = restore_fd;
@@ -2081,6 +2128,43 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             ao_how, aop_console_how);
 }
 
+int libxl_domain_fork_vm(libxl_ctx *ctx, uint32_t pdomid, uint32_t *domid)
+{
+    int rc;
+    struct xen_domctl_createdomain create = {0};
+    create.flags |= XEN_DOMCTL_CDF_hvm;
+    create.flags |= XEN_DOMCTL_CDF_hap;
+    create.flags |= XEN_DOMCTL_CDF_oos_off;
+    create.arch.emulation_flags = (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI);
+
+    create.ssidref = SECINITSID_DOMU;
+    create.parent_domid = pdomid;
+    create.max_evtchn_port = 1023;
+    create.max_grant_frames = LIBXL_MAX_GRANT_FRAMES_DEFAULT;
+    create.max_maptrack_frames = LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT;
+
+    if ( (rc = xc_domain_create(ctx->xch, domid, &create)) )
+        return rc;
+
+    if ( (rc = xc_memshr_fork(ctx->xch, pdomid, *domid)) )
+        xc_domain_destroy(ctx->xch, *domid);
+
+    return rc;
+}
+
+int libxl_domain_fork_launch_dm(libxl_ctx *ctx, libxl_domain_config *d_config,
+                                uint32_t domid,
+                                const libxl_asyncprogress_how *aop_console_how)
+{
+    unset_disk_colo_restore(d_config);
+    return do_domain_create(ctx, d_config, &domid, -1, -1, 0, 0, aop_console_how);
+}
+
+int libxl_domain_fork_reset(libxl_ctx *ctx, uint32_t domid)
+{
+    return xc_memshr_fork_reset(ctx->xch, domid);
+}
+
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 uint32_t *domid, int restore_fd,
                                 int send_back_fd,
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 3b1da90167..87ae1478cf 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -2787,7 +2787,7 @@ static void device_model_spawn_outcome(libxl__egc *egc,
 
     libxl__domain_build_state *state = dmss->build_state;
 
-    if (state->saved_state) {
+    if (state->saved_state && !state->forked_vm) {
         ret2 = unlink(state->saved_state);
         if (ret2) {
             LOGED(ERROR, dmss->guest_domid, "%s: failed to remove device-model state %s",
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 71cb578923..3bc7117b99 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -249,9 +249,12 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
     libxl_domain_build_info *const info = &d_config->b_info;
     libxl_ctx *ctx = libxl__gc_owner(gc);
     char *xs_domid, *con_domid;
-    int rc;
+    int rc = 0;
     uint64_t size;
 
+    if ( state->forked_vm )
+        goto skip_fork;
+
     if (xc_domain_max_vcpus(ctx->xch, domid, info->max_vcpus) != 0) {
         LOG(ERROR, "Couldn't set max vcpu count");
         return ERROR_FAIL;
@@ -362,7 +365,6 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
         }
     }
 
-
     rc = libxl__arch_extra_memory(gc, info, &size);
     if (rc < 0) {
         LOGE(ERROR, "Couldn't get arch extra constant memory size");
@@ -374,6 +376,11 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
         return ERROR_FAIL;
     }
 
+    rc = libxl__arch_domain_create(gc, d_config, domid);
+    if ( rc )
+        goto out;
+
+skip_fork:
     xs_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL);
     state->store_domid = xs_domid ? atoi(xs_domid) : 0;
     free(xs_domid);
@@ -385,8 +392,7 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
     state->store_port = xc_evtchn_alloc_unbound(ctx->xch, domid, state->store_domid);
     state->console_port = xc_evtchn_alloc_unbound(ctx->xch, domid, state->console_domid);
 
-    rc = libxl__arch_domain_create(gc, d_config, domid);
-
+out:
     return rc;
 }
 
@@ -444,6 +450,9 @@ int libxl__build_post(libxl__gc *gc, uint32_t domid,
     char **ents;
     int i, rc;
 
+    if ( state->forked_vm )
+        goto skip_fork;
+
     if (info->num_vnuma_nodes && !info->num_vcpu_soft_affinity) {
         rc = set_vnuma_affinity(gc, domid, info);
         if (rc)
@@ -466,6 +475,7 @@ int libxl__build_post(libxl__gc *gc, uint32_t domid,
         }
     }
 
+skip_fork:
     ents = libxl__calloc(gc, 12 + (info->max_vcpus * 2) + 2, sizeof(char *));
     ents[0] = "memory/static-max";
     ents[1] = GCSPRINTF("%"PRId64, info->max_memkb);
@@ -728,14 +738,16 @@ static int hvm_build_set_params(xc_interface *handle, uint32_t domid,
                                 libxl_domain_build_info *info,
                                 int store_evtchn, unsigned long *store_mfn,
                                 int console_evtchn, unsigned long *console_mfn,
-                                domid_t store_domid, domid_t console_domid)
+                                domid_t store_domid, domid_t console_domid,
+                                bool forked_vm)
 {
     struct hvm_info_table *va_hvm;
     uint8_t *va_map, sum;
     uint64_t str_mfn, cons_mfn;
     int i;
 
-    if (info->type == LIBXL_DOMAIN_TYPE_HVM) {
+    if ( info->type == LIBXL_DOMAIN_TYPE_HVM && !forked_vm )
+    {
         va_map = xc_map_foreign_range(handle, domid,
                                       XC_PAGE_SIZE, PROT_READ | PROT_WRITE,
                                       HVM_INFO_PFN);
@@ -1051,6 +1063,23 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     struct xc_dom_image *dom = NULL;
     bool device_model = info->type == LIBXL_DOMAIN_TYPE_HVM ? true : false;
 
+    if ( state->forked_vm )
+    {
+        rc = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
+                                  &state->store_mfn, state->console_port,
+                                  &state->console_mfn, state->store_domid,
+                                  state->console_domid, state->forked_vm);
+
+        if ( rc )
+            return rc;
+
+        return xc_dom_gnttab_seed(ctx->xch, domid, true,
+                                  state->console_mfn,
+                                  state->store_mfn,
+                                  state->console_domid,
+                                  state->store_domid);
+    }
+
     xc_dom_loginit(ctx->xch);
 
     /*
@@ -1175,7 +1204,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     rc = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
                                &state->store_mfn, state->console_port,
                                &state->console_mfn, state->store_domid,
-                               state->console_domid);
+                               state->console_domid, false);
     if (rc != 0) {
         LOG(ERROR, "hvm build set params failed");
         goto out;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 4936446069..d7a7ce163d 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1374,6 +1374,7 @@ typedef struct {
 
     char *saved_state;
     int dm_monitor_fd;
+    bool forked_vm;
 
     libxl__file_reference pv_kernel;
     libxl__file_reference pv_ramdisk;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 7921950f6a..7c4c4057a9 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -956,6 +956,7 @@ libxl_domain_config = Struct("domain_config", [
     ("on_watchdog", libxl_action_on_shutdown),
     ("on_crash", libxl_action_on_shutdown),
     ("on_soft_reset", libxl_action_on_shutdown),
+    ("dm_restore_file", string, {'const': True}),
     ], dir=DIR_IN)
 
 libxl_diskinfo = Struct("diskinfo", [
diff --git a/tools/xl/xl.h b/tools/xl/xl.h
index 60bdad8ffb..9bdad6526e 100644
--- a/tools/xl/xl.h
+++ b/tools/xl/xl.h
@@ -31,6 +31,7 @@ struct cmd_spec {
 };
 
 struct domain_create {
+    uint32_t ddomid; /* fork launch dm for this domid */
     int debug;
     int daemonize;
     int monitor; /* handle guest reboots etc */
@@ -45,6 +46,7 @@ struct domain_create {
     const char *config_file;
     char *extra_config; /* extra config string */
     const char *restore_file;
+    const char *dm_restore_file;
     char *colo_proxy_script;
     bool userspace_colo_proxy;
     int migrate_fd; /* -1 means none */
@@ -127,6 +129,9 @@ int main_pciassignable_remove(int argc, char **argv);
 int main_pciassignable_list(int argc, char **argv);
 #ifndef LIBXL_HAVE_NO_SUSPEND_RESUME
 int main_restore(int argc, char **argv);
+int main_fork_vm(int argc, char **argv);
+int main_fork_launch_dm(int argc, char **argv);
+int main_fork_reset(int argc, char **argv);
 int main_migrate_receive(int argc, char **argv);
 int main_save(int argc, char **argv);
 int main_migrate(int argc, char **argv);
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 3b302b2f20..3a5d371057 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -185,6 +185,18 @@ struct cmd_spec cmd_table[] = {
       "Restore a domain from a saved state",
       "- for internal use only",
     },
+    { "fork-vm",
+      &main_fork_vm, 0, 1,
+      "Fork a domain from the running parent domid",
+      "[options] <Domid>",
+      "-h                           Print this help.\n"
+      "-C <config>                  Use config file for VM fork.\n"
+      "-Q <qemu-save-file>          Use qemu save file for VM fork.\n"
+      "--launch-dm <yes|no|late>    Launch device model (QEMU) for VM fork.\n"
+      "--fork-reset                 Reset VM fork.\n"
+      "-p                           Do not unpause fork VM after operation.\n"
+      "-d                           Enable debug messages.\n"
+    },
 #endif
     { "dump-core",
       &main_dump_core, 0, 1,
diff --git a/tools/xl/xl_saverestore.c b/tools/xl/xl_saverestore.c
index 9be033fe65..d99d3eceb2 100644
--- a/tools/xl/xl_saverestore.c
+++ b/tools/xl/xl_saverestore.c
@@ -229,6 +229,103 @@ int main_restore(int argc, char **argv)
     return EXIT_SUCCESS;
 }
 
+int main_fork_vm(int argc, char **argv)
+{
+    int rc, debug = 0;
+    uint32_t domid_in = INVALID_DOMID, domid_out = INVALID_DOMID;
+    int launch_dm = 1;
+    bool reset = 0;
+    bool pause = 0;
+    const char *config_file = NULL;
+    const char *dm_restore_file = NULL;
+
+    int opt;
+    static struct option opts[] = {
+        {"launch-dm", 1, 0, 'l'},
+        {"fork-reset", 0, 0, 'r'},
+        COMMON_LONG_OPTS
+    };
+
+    SWITCH_FOREACH_OPT(opt, "phdC:Q:l:rN:D:B:V:", opts, "fork-vm", 1) {
+    case 'd':
+        debug = 1;
+        break;
+    case 'p':
+        pause = 1;
+        break;
+    case 'C':
+        config_file = optarg;
+        break;
+    case 'Q':
+        dm_restore_file = optarg;
+        break;
+    case 'l':
+        if ( !strcmp(optarg, "no") )
+            launch_dm = 0;
+        if ( !strcmp(optarg, "yes") )
+            launch_dm = 1;
+        if ( !strcmp(optarg, "late") )
+            launch_dm = 2;
+        break;
+    case 'r':
+        reset = 1;
+        break;
+    case 'N': /* fall-through */
+    case 'D': /* fall-through */
+    case 'B': /* fall-through */
+    case 'V':
+        fprintf(stderr, "Unimplemented option(s)\n");
+        return EXIT_FAILURE;
+    }
+
+    if (argc-optind == 1) {
+        domid_in = atoi(argv[optind]);
+    } else {
+        help("fork-vm");
+        return EXIT_FAILURE;
+    }
+
+    if (launch_dm && (!config_file || !dm_restore_file)) {
+        fprintf(stderr, "Currently you must provide both -C and -Q options\n");
+        return EXIT_FAILURE;
+    }
+
+    if (reset) {
+        domid_out = domid_in;
+        if (libxl_domain_fork_reset(ctx, domid_in) == EXIT_FAILURE)
+            return EXIT_FAILURE;
+    }
+
+    if (launch_dm == 2 || reset) {
+        domid_out = domid_in;
+        rc = EXIT_SUCCESS;
+    } else
+        rc = libxl_domain_fork_vm(ctx, domid_in, &domid_out);
+
+    if (rc == EXIT_SUCCESS) {
+        if ( launch_dm ) {
+            struct domain_create dom_info;
+            memset(&dom_info, 0, sizeof(dom_info));
+            dom_info.ddomid = domid_out;
+            dom_info.dm_restore_file = dm_restore_file;
+            dom_info.debug = debug;
+            dom_info.paused = pause;
+            dom_info.config_file = config_file;
+            dom_info.migrate_fd = -1;
+            dom_info.send_back_fd = -1;
+            rc = create_domain(&dom_info) < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
+        } else if ( !pause )
+            rc = libxl_domain_unpause(ctx, domid_out, NULL);
+    }
+
+    if (rc == EXIT_SUCCESS)
+        fprintf(stderr, "fork-vm command successfully returned domid: %u\n", domid_out);
+    else if ( domid_out != INVALID_DOMID )
+        libxl_domain_destroy(ctx, domid_out, 0);
+
+    return rc;
+}
+
 int main_save(int argc, char **argv)
 {
     uint32_t domid;
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index e520b1da79..d9cb19c599 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -645,6 +645,7 @@ int create_domain(struct domain_create *dom_info)
 
     libxl_domain_config d_config;
 
+    uint32_t ddomid = dom_info->ddomid; // launch dm for this domain iff set
     int debug = dom_info->debug;
     int daemonize = dom_info->daemonize;
     int monitor = dom_info->monitor;
@@ -655,6 +656,7 @@ int create_domain(struct domain_create *dom_info)
     const char *restore_file = dom_info->restore_file;
     const char *config_source = NULL;
     const char *restore_source = NULL;
+    const char *dm_restore_file = dom_info->dm_restore_file;
     int migrate_fd = dom_info->migrate_fd;
     bool config_in_json;
 
@@ -923,6 +925,12 @@ start:
          * restore/migrate-receive it again.
          */
         restoring = 0;
+    } else if ( ddomid ) {
+        d_config.dm_restore_file = dm_restore_file;
+        ret = libxl_domain_fork_launch_dm(ctx, &d_config, ddomid,
+                                          autoconnect_console_how);
+        domid = ddomid;
+        ddomid = INVALID_DOMID;
     } else if (domid_soft_reset != INVALID_DOMID) {
         /* Do soft reset. */
         ret = libxl_domain_soft_reset(ctx, &d_config, domid_soft_reset,
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: [Xen-devel] [PATCH v9 2/5] xen: add parent_domid field to createdomain domctl
  2020-02-21 18:49 ` [Xen-devel] [PATCH v9 2/5] xen: add parent_domid field to createdomain domctl Tamas K Lengyel
@ 2020-02-21 21:02   ` Julien Grall
  2020-02-21 21:35     ` Tamas K Lengyel
  2020-02-24 15:44   ` Andrew Cooper
  1 sibling, 1 reply; 45+ messages in thread
From: Julien Grall @ 2020-02-21 21:02 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich

Hi Tamas,

On 21/02/2020 18:49, Tamas K Lengyel wrote:
> When creating a domain that will be used as a VM fork some information is
> required to set things up properly, like the max_vcpus count. Instead of the
> toolstack having to gather this information for each fork in a separate
> hypercall we can just include the parent domain's id in the createdomain domctl
> so that Xen can copy the setting without the extra toolstack queries.

It is not entirely clear why you only want to copy max_vcpus. From my 
understanding, when you are going to fork a domain you will want the 
domain to be nearly identical. So how do you decide what to copy?

> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> ---
>   xen/common/domctl.c         | 14 ++++++++++++++
>   xen/include/public/domctl.h |  3 ++-
>   2 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index a69b3b59a8..22aceb3860 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -489,6 +489,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>       case XEN_DOMCTL_createdomain:
>       {
>           domid_t        dom;
> +        domid_t        parent_dom;
>           static domid_t rover = 0;
>   
>           dom = op->domain;
> @@ -515,6 +516,19 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>               rover = dom;
>           }
>   
> +        parent_dom = op->u.createdomain.parent_domid;
> +        if ( parent_dom )

I would rather avoid assuming that parent_dom will not be 0, for a few 
reasons:
    1) Most of Xen (if not all) now avoids assuming that 
dom0->domain_id == 0.
    2) I can see usecases where we may want to recreate the dom0 setup.

So we should use a different value to indicate whether we want to 
clone from a domain. Maybe by setting bit 16 of the parent_domid?

> +        {
> +            struct domain *pd = rcu_lock_domain_by_id(parent_dom);
> +
> +            ret = -EINVAL;
> +            if ( !pd )
> +                break;
> +
> +            op->u.createdomain.max_vcpus = pd->max_vcpus;
> +            rcu_unlock_domain(pd);
> +        }
> +
>           d = domain_create(dom, &op->u.createdomain, false);
>           if ( IS_ERR(d) )
>           {
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index fec6f6fdd1..251cc40ef6 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -38,7 +38,7 @@
>   #include "hvm/save.h"
>   #include "memory.h"
>   
> -#define XEN_DOMCTL_INTERFACE_VERSION 0x00000012
> +#define XEN_DOMCTL_INTERFACE_VERSION 0x00000013
>   
>   /*
>    * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
> @@ -92,6 +92,7 @@ struct xen_domctl_createdomain {
>       uint32_t max_evtchn_port;
>       int32_t max_grant_frames;
>       int32_t max_maptrack_frames;
> +    domid_t parent_domid;

Just by looking at the name, it is not clear what the field is for. It 
also suggests that one domain will be linked to the other, but that is 
not the case here. I would recommend adding a comment explaining how 
this is used by Xen.

Cheers,

-- 
Julien Grall


* Re: [Xen-devel] [PATCH v9 2/5] xen: add parent_domid field to createdomain domctl
  2020-02-21 21:02   ` Julien Grall
@ 2020-02-21 21:35     ` Tamas K Lengyel
  2020-02-21 22:34       ` Julien Grall
  0 siblings, 1 reply; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-21 21:35 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

On Fri, Feb 21, 2020 at 2:03 PM Julien Grall <julien@xen.org> wrote:
>
> Hi Tamas,
>
> On 21/02/2020 18:49, Tamas K Lengyel wrote:
> > When creating a domain that will be used as a VM fork some information is
> > required to set things up properly, like the max_vcpus count. Instead of the
> > toolstack having to gather this information for each fork in a separate
> > hypercall we can just include the parent domain's id in the createdomain domctl
> > so that Xen can copy the setting without the extra toolstack queries.
>
> It is not entirely clear why you only want to copy max_vcpus. From my
> understanding,  when you are going to fork a domain you will want the
> domain to be nearly identical. So how do you decide what to copy?

All parameters of the parent domain need to be copied, but during the
createdomain domctl only max_vcpus is required. The rest are copied
during XENMEM_sharing_op_fork.

>
> >
> > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> > ---
> >   xen/common/domctl.c         | 14 ++++++++++++++
> >   xen/include/public/domctl.h |  3 ++-
> >   2 files changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> > index a69b3b59a8..22aceb3860 100644
> > --- a/xen/common/domctl.c
> > +++ b/xen/common/domctl.c
> > @@ -489,6 +489,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> >       case XEN_DOMCTL_createdomain:
> >       {
> >           domid_t        dom;
> > +        domid_t        parent_dom;
> >           static domid_t rover = 0;
> >
> >           dom = op->domain;
> > @@ -515,6 +516,19 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> >               rover = dom;
> >           }
> >
> > +        parent_dom = op->u.createdomain.parent_domid;
> > +        if ( parent_dom )
>
> I would rather avoid to assume that parent_dom will not be 0 for a few
> reasons:
>     1) Most of Xen (if not all) now avoid to assume that dom0->domain_id
> == 0.
>     2) I can see usecases where it we may want to recreate dom0 setup.

That's an interesting thought, though I don't expect it will be a
usecase. Currently I don't think it would work, so I consider it
out-of-scope.

>
> So we should consider a different value to indicate whether we want to
> clone from a domain. Maybe by setting bit 16 of the parent_domid?

I can add a XEN_DOMCTL_CDF_fork flag, but I think that's overkill.

>
> > +        {
> > +            struct domain *pd = rcu_lock_domain_by_id(parent_dom);
> > +
> > +            ret = -EINVAL;
> > +            if ( !pd )
> > +                break;
> > +
> > +            op->u.createdomain.max_vcpus = pd->max_vcpus;
> > +            rcu_unlock_domain(pd);
> > +        }
> > +
> >           d = domain_create(dom, &op->u.createdomain, false);
> >           if ( IS_ERR(d) )
> >           {
> > diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> > index fec6f6fdd1..251cc40ef6 100644
> > --- a/xen/include/public/domctl.h
> > +++ b/xen/include/public/domctl.h
> > @@ -38,7 +38,7 @@
> >   #include "hvm/save.h"
> >   #include "memory.h"
> >
> > -#define XEN_DOMCTL_INTERFACE_VERSION 0x00000012
> > +#define XEN_DOMCTL_INTERFACE_VERSION 0x00000013
> >
> >   /*
> >    * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
> > @@ -92,6 +92,7 @@ struct xen_domctl_createdomain {
> >       uint32_t max_evtchn_port;
> >       int32_t max_grant_frames;
> >       int32_t max_maptrack_frames;
> > +    domid_t parent_domid;
>
> By just looking at the name, it is not clear what the field is for. It
> also suggest that one domain will be linked to the other. But this is
> not the case here. I would recommend to add a comment explaining how
> this is used by Xen.

No, during createdomain the domains won't get linked; they only get
linked once XENMEM_sharing_op_fork finishes. I explain this in the
patch message; I can copy that as a comment into the header if you
prefer.

Tamas


* Re: [Xen-devel] [PATCH v9 2/5] xen: add parent_domid field to createdomain domctl
  2020-02-21 21:35     ` Tamas K Lengyel
@ 2020-02-21 22:34       ` Julien Grall
  2020-02-21 22:53         ` Tamas K Lengyel
  0 siblings, 1 reply; 45+ messages in thread
From: Julien Grall @ 2020-02-21 22:34 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

Hi Tamas,

On 21/02/2020 21:35, Tamas K Lengyel wrote:
> On Fri, Feb 21, 2020 at 2:03 PM Julien Grall <julien@xen.org> wrote:
>>
>> Hi Tamas,
>>
>> On 21/02/2020 18:49, Tamas K Lengyel wrote:
>>> When creating a domain that will be used as a VM fork some information is
>>> required to set things up properly, like the max_vcpus count. Instead of the
>>> toolstack having to gather this information for each fork in a separate
>>> hypercall we can just include the parent domain's id in the createdomain domctl
>>> so that Xen can copy the setting without the extra toolstack queries.
>>
>> It is not entirely clear why you only want to copy max_vcpus. From my
>> understanding,  when you are going to fork a domain you will want the
>> domain to be nearly identical. So how do you decide what to copy?
> 
> All parameters of the parent domain need to be copied, but during
> createdomain domctl only max_vcpus is required. The rest are copied
> during XENMEM_sharing_op_fork.

I don't believe max_vcpus is the only field required here. Most of the 
information held in the structure is required at creation time so the 
domain is configured correctly. For instance, on Arm, the version of 
the interrupt controller can only be configured at creation time. For 
x86, I am pretty sure the emuflags have to be correct at creation time 
as well.

So it feels weird to me that you only need to copy max_vcpus here, 
because if you are able to fill in the other fields of the structure, 
then most likely you have max_vcpus in hand as well.

The current suggestion is too restrictive and only saves you one 
hypercall. IMHO, if we are going to update createdomain, then all the 
fields but parent_domid should be ignored when the latter is valid. The 
fields can then be filled in and copied back to the toolstack so it can 
consume the information.

> 
>>
>>>
>>> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
>>> ---
>>>    xen/common/domctl.c         | 14 ++++++++++++++
>>>    xen/include/public/domctl.h |  3 ++-
>>>    2 files changed, 16 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
>>> index a69b3b59a8..22aceb3860 100644
>>> --- a/xen/common/domctl.c
>>> +++ b/xen/common/domctl.c
>>> @@ -489,6 +489,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>>>        case XEN_DOMCTL_createdomain:
>>>        {
>>>            domid_t        dom;
>>> +        domid_t        parent_dom;
>>>            static domid_t rover = 0;
>>>
>>>            dom = op->domain;
>>> @@ -515,6 +516,19 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>>>                rover = dom;
>>>            }
>>>
>>> +        parent_dom = op->u.createdomain.parent_domid;
>>> +        if ( parent_dom )
>>
>> I would rather avoid to assume that parent_dom will not be 0 for a few
>> reasons:
>>      1) Most of Xen (if not all) now avoid to assume that dom0->domain_id
>> == 0.
>>      2) I can see usecases where it we may want to recreate dom0 setup.
> 
> That's an interesting thought, I don't expect that will be a usecase
> but it's interesting. 

Maybe not in the context of VM forking. But ...

> Currently I don't think it would work, so I
> consider that to be out-of-scope.

... this is a generic hypercall and therefore we should be open to other 
usecases. I am not asking you to check whether we can recreate a domain 
based on dom0. I am only asking you to be mindful with your interface 
and not put ourselves in a corner just because this is out-of-scope for 
you.

The more important bit for me is my point 1), i.e. not assuming that 0 
is an invalid value for a domid.

> 
>>
>> So we should consider a different value to indicate whether we want to
>> clone from a domain. Maybe by setting bit 16 of the parent_domid?
> 
> I can add a XEN_DOMCTL_CDF_fork flag, but I think that's an overkill.

See above. If you are going to modify a common interface then you need 
to bear in mind how this can be used.

> 
>>
>>> +        {
>>> +            struct domain *pd = rcu_lock_domain_by_id(parent_dom);
>>> +
>>> +            ret = -EINVAL;
>>> +            if ( !pd )
>>> +                break;
>>> +
>>> +            op->u.createdomain.max_vcpus = pd->max_vcpus;
>>> +            rcu_unlock_domain(pd);
>>> +        }
>>> +
>>>            d = domain_create(dom, &op->u.createdomain, false);
>>>            if ( IS_ERR(d) )
>>>            {
>>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>>> index fec6f6fdd1..251cc40ef6 100644
>>> --- a/xen/include/public/domctl.h
>>> +++ b/xen/include/public/domctl.h
>>> @@ -38,7 +38,7 @@
>>>    #include "hvm/save.h"
>>>    #include "memory.h"
>>>
>>> -#define XEN_DOMCTL_INTERFACE_VERSION 0x00000012
>>> +#define XEN_DOMCTL_INTERFACE_VERSION 0x00000013
>>>
>>>    /*
>>>     * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
>>> @@ -92,6 +92,7 @@ struct xen_domctl_createdomain {
>>>        uint32_t max_evtchn_port;
>>>        int32_t max_grant_frames;
>>>        int32_t max_maptrack_frames;
>>> +    domid_t parent_domid;
>>
>> By just looking at the name, it is not clear what the field is for. It
>> also suggest that one domain will be linked to the other. But this is
>> not the case here. I would recommend to add a comment explaining how
>> this is used by Xen.
> 
> No, during createdomain the domains won't get linked. Only once the
> XENMEM_sharing_op_fork finishes do the domains get linked. I explain
> this in the patch message, I can copy that as a comment into the
> header if you prefer.

Bear in mind that developers don't want to play the blame game every 
time they want to understand an interface. So you want either a 
meaningful name or a comment in your code.

Maybe "clone_domid" would be clearer here. Yet it is still not clear 
that only "max_vcpus" is going to be copied.

Cheers,

-- 
Julien Grall


* Re: [Xen-devel] [PATCH v9 2/5] xen: add parent_domid field to createdomain domctl
  2020-02-21 22:34       ` Julien Grall
@ 2020-02-21 22:53         ` Tamas K Lengyel
  2020-02-21 23:18           ` Julien Grall
  0 siblings, 1 reply; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-21 22:53 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

On Fri, Feb 21, 2020 at 3:34 PM Julien Grall <julien@xen.org> wrote:
>
> Hi Tamas,
>
> On 21/02/2020 21:35, Tamas K Lengyel wrote:
> > On Fri, Feb 21, 2020 at 2:03 PM Julien Grall <julien@xen.org> wrote:
> >>
> >> Hi Tamas,
> >>
> >> On 21/02/2020 18:49, Tamas K Lengyel wrote:
> >>> When creating a domain that will be used as a VM fork some information is
> >>> required to set things up properly, like the max_vcpus count. Instead of the
> >>> toolstack having to gather this information for each fork in a separate
> >>> hypercall we can just include the parent domain's id in the createdomain domctl
> >>> so that Xen can copy the setting without the extra toolstack queries.
> >>
> >> It is not entirely clear why you only want to copy max_vcpus. From my
> >> understanding,  when you are going to fork a domain you will want the
> >> domain to be nearly identical. So how do you decide what to copy?
> >
> > All parameters of the parent domain need to be copied, but during
> > createdomain domctl only max_vcpus is required. The rest are copied
> > during XENMEM_sharing_op_fork.
>
> I don't believe max_vcpus is the only field required here. Most of the
> information hold in the structure are required at creation time so the
> domain is configured correctly. For instance, on Arm, the version of
> interrupt controller can only be configured at creation time. For x86, I
> am pretty sure the emuflags have to be correct at creation time as well.
>
> So it feels weird to me that you only need to copy max_vcpus here.
> Because if you are able to fill up the other fields of the structure,
> then most likely you have the max_vcpus in hand as well.

Look at patch 5 and see how libxl statically defines most of these
values, and why we don't need to copy them.

>
> The current suggestion is too restrictive and only save you one
> hypercall. IMHO, if we are going to update createdomain, then all the
> fields but parent_domid should be ignored when the latter is valid. The
> fields can then be filled up and copied back to the toolstack so it can
> consumed the information.
>
> >
> >>
> >>>
> >>> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> >>> ---
> >>>    xen/common/domctl.c         | 14 ++++++++++++++
> >>>    xen/include/public/domctl.h |  3 ++-
> >>>    2 files changed, 16 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> >>> index a69b3b59a8..22aceb3860 100644
> >>> --- a/xen/common/domctl.c
> >>> +++ b/xen/common/domctl.c
> >>> @@ -489,6 +489,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> >>>        case XEN_DOMCTL_createdomain:
> >>>        {
> >>>            domid_t        dom;
> >>> +        domid_t        parent_dom;
> >>>            static domid_t rover = 0;
> >>>
> >>>            dom = op->domain;
> >>> @@ -515,6 +516,19 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> >>>                rover = dom;
> >>>            }
> >>>
> >>> +        parent_dom = op->u.createdomain.parent_domid;
> >>> +        if ( parent_dom )
> >>
> >> I would rather avoid to assume that parent_dom will not be 0 for a few
> >> reasons:
> >>      1) Most of Xen (if not all) now avoid to assume that dom0->domain_id
> >> == 0.
> >>      2) I can see usecases where it we may want to recreate dom0 setup.
> >
> > That's an interesting thought, I don't expect that will be a usecase
> > but it's interesting.
>
> Maybe not in the context VM fork. But ...
>
> > Currently I don't think it would work, so I
> > consider that to be out-of-scope.
>
> ... this is a generic hypercall and therefore we should be open to other
> usecase. I am not asking to check whether we can recreate a domain based
> on dom0. I am only asking to be mindful with your interface and not put
> ourself in a corner just because this is out-of-scope for you.
>
> The more important bit for me my point 1). I.e not assuming that 0 is an
> invalid value for domid.
>
> >
> >>
> >> So we should consider a different value to indicate whether we want to
> >> clone from a domain. Maybe by setting bit 16 of the parent_domid?
> >
> > I can add a XEN_DOMCTL_CDF_fork flag, but I think that's an overkill.
>
> See above. If you are going to modify a common interface then you need
> to bear in mind how this can be used.
>
> >
> >>
> >>> +        {
> >>> +            struct domain *pd = rcu_lock_domain_by_id(parent_dom);
> >>> +
> >>> +            ret = -EINVAL;
> >>> +            if ( !pd )
> >>> +                break;
> >>> +
> >>> +            op->u.createdomain.max_vcpus = pd->max_vcpus;
> >>> +            rcu_unlock_domain(pd);
> >>> +        }
> >>> +
> >>>            d = domain_create(dom, &op->u.createdomain, false);
> >>>            if ( IS_ERR(d) )
> >>>            {
> >>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> >>> index fec6f6fdd1..251cc40ef6 100644
> >>> --- a/xen/include/public/domctl.h
> >>> +++ b/xen/include/public/domctl.h
> >>> @@ -38,7 +38,7 @@
> >>>    #include "hvm/save.h"
> >>>    #include "memory.h"
> >>>
> >>> -#define XEN_DOMCTL_INTERFACE_VERSION 0x00000012
> >>> +#define XEN_DOMCTL_INTERFACE_VERSION 0x00000013
> >>>
> >>>    /*
> >>>     * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
> >>> @@ -92,6 +92,7 @@ struct xen_domctl_createdomain {
> >>>        uint32_t max_evtchn_port;
> >>>        int32_t max_grant_frames;
> >>>        int32_t max_maptrack_frames;
> >>> +    domid_t parent_domid;
> >>
> >> By just looking at the name, it is not clear what the field is for. It
> >> also suggest that one domain will be linked to the other. But this is
> >> not the case here. I would recommend to add a comment explaining how
> >> this is used by Xen.
> >
> > No, during createdomain the domains won't get linked. Only once the
> > XENMEM_sharing_op_fork finishes do the domains get linked. I explain
> > this in the patch message, I can copy that as a comment into the
> > header if you prefer.
>
> Bear in mind that developpers don't want to play the blame game
> everytime they want to understand an interfaces. So you either want to
> use a meaningful name or a comment in your code.
>
> Maybe "clone_domid" would be clearer here. Yet it is not clear that only
> "max_vcpus" is going to be copied.

Julien,
you have valid points, but at this time I won't be able to refactor
this series any more. This patch series was first posted in September;
it's now almost March. So at this point I'm just going to say drop
this patch, and we'll live with the limitation that VM forking only
works with single-vCPU domains until at some later time someone needs
it to work with guests that have more than one vCPU. If that's also
unacceptable for whatever reason, then we'll live with the feature not
being merged upstream.

Cheers,
Tamas


* Re: [Xen-devel] [PATCH v9 2/5] xen: add parent_domid field to createdomain domctl
  2020-02-21 22:53         ` Tamas K Lengyel
@ 2020-02-21 23:18           ` Julien Grall
  2020-02-21 23:31             ` Tamas K Lengyel
  0 siblings, 1 reply; 45+ messages in thread
From: Julien Grall @ 2020-02-21 23:18 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

On Fri, 21 Feb 2020 at 22:53, Tamas K Lengyel <tamas.k.lengyel@gmail.com> wrote:
>
> On Fri, Feb 21, 2020 at 3:34 PM Julien Grall <julien@xen.org> wrote:
> >
> > Hi Tamas,
> >
> > On 21/02/2020 21:35, Tamas K Lengyel wrote:
> > > On Fri, Feb 21, 2020 at 2:03 PM Julien Grall <julien@xen.org> wrote:
> > >>
> > >> Hi Tamas,
> > >>
> > >> On 21/02/2020 18:49, Tamas K Lengyel wrote:
> > >>> When creating a domain that will be used as a VM fork some information is
> > >>> required to set things up properly, like the max_vcpus count. Instead of the
> > >>> toolstack having to gather this information for each fork in a separate
> > >>> hypercall we can just include the parent domain's id in the createdomain domctl
> > >>> so that Xen can copy the setting without the extra toolstack queries.
> > >>
> > >> It is not entirely clear why you only want to copy max_vcpus. From my
> > >> understanding,  when you are going to fork a domain you will want the
> > >> domain to be nearly identical. So how do you decide what to copy?
> > >
> > > All parameters of the parent domain need to be copied, but during
> > > createdomain domctl only max_vcpus is required. The rest are copied
> > > during XENMEM_sharing_op_fork.
> >
> > I don't believe max_vcpus is the only field required here. Most of the
> > information hold in the structure are required at creation time so the
> > domain is configured correctly. For instance, on Arm, the version of
> > interrupt controller can only be configured at creation time. For x86, I
> > am pretty sure the emuflags have to be correct at creation time as well.
> >
> > So it feels weird to me that you only need to copy max_vcpus here.
> > Because if you are able to fill up the other fields of the structure,
> > then most likely you have the max_vcpus in hand as well.
>
> Look at patch 5 and see how libxl statically define most of these
> values and why we don't need to copy them.

I looked at patch 5; it is one example implementation. You limit
yourself quite a bit, and that's your choice. But I am afraid this
does not mean the interface with the hypervisor should do the same.

[...]

> Julien,
> you have valid points but at this time I won't be able to refactor
> this series any more. This patch series was first posted in September,
> it's now almost March. So at this point I'm just going to say drop
> this patch and we'll live with the limitation that VM forking only
> works with single vCPU domains until at some later time someone needs
> it to work with guests that have more then 1 vCPUs.

To be honest, I don't have a vested interest in VM forking, so the
limitation of 1 vCPU is fine with me.

Anyone who wants to support more than 1 vCPU with forking will have to
come up with a proper interface. If you don't want to invest time in
it, that's fine; the rest of the code is still useful to have.

Cheers,

-- 
Julien Grall


* Re: [Xen-devel] [PATCH v9 2/5] xen: add parent_domid field to createdomain domctl
  2020-02-21 23:18           ` Julien Grall
@ 2020-02-21 23:31             ` Tamas K Lengyel
  0 siblings, 0 replies; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-21 23:31 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

On Fri, Feb 21, 2020 at 4:18 PM Julien Grall <julien.grall.oss@gmail.com> wrote:
>
> On Fri, 21 Feb 2020 at 22:53, Tamas K Lengyel <tamas.k.lengyel@gmail.com> wrote:
> >
> > On Fri, Feb 21, 2020 at 3:34 PM Julien Grall <julien@xen.org> wrote:
> > >
> > > Hi Tamas,
> > >
> > > On 21/02/2020 21:35, Tamas K Lengyel wrote:
> > > > On Fri, Feb 21, 2020 at 2:03 PM Julien Grall <julien@xen.org> wrote:
> > > >>
> > > >> Hi Tamas,
> > > >>
> > > >> On 21/02/2020 18:49, Tamas K Lengyel wrote:
> > > >>> When creating a domain that will be used as a VM fork some information is
> > > >>> required to set things up properly, like the max_vcpus count. Instead of the
> > > >>> toolstack having to gather this information for each fork in a separate
> > > >>> hypercall we can just include the parent domain's id in the createdomain domctl
> > > >>> so that Xen can copy the setting without the extra toolstack queries.
> > > >>
> > > >> It is not entirely clear why you only want to copy max_vcpus. From my
> > > >> understanding,  when you are going to fork a domain you will want the
> > > >> domain to be nearly identical. So how do you decide what to copy?
> > > >
> > > > All parameters of the parent domain need to be copied, but during
> > > > createdomain domctl only max_vcpus is required. The rest are copied
> > > > during XENMEM_sharing_op_fork.
> > >
> > > I don't believe max_vcpus is the only field required here. Most of the
> > > information hold in the structure are required at creation time so the
> > > domain is configured correctly. For instance, on Arm, the version of
> > > interrupt controller can only be configured at creation time. For x86, I
> > > am pretty sure the emuflags have to be correct at creation time as well.
> > >
> > > So it feels weird to me that you only need to copy max_vcpus here.
> > > Because if you are able to fill up the other fields of the structure,
> > > then most likely you have the max_vcpus in hand as well.
> >
> > Look at patch 5 and see how libxl statically define most of these
> > values and why we don't need to copy them.
>
> I looked at patch 5, this is an example of the implementation.
> You limit yourself quite a bit and that's your choice. But I am afraid
> this does not mean the interface with the hypervisor should do the
> same.
>
> [...]
>
> > Julien,
> > you have valid points but at this time I won't be able to refactor
> > this series any more. This patch series was first posted in September,
> > it's now almost March. So at this point I'm just going to say drop
> > this patch and we'll live with the limitation that VM forking only
> > works with single vCPU domains until at some later time someone needs
> > it to work with guests that have more then 1 vCPUs.
>
> To be honest, I don't have a vested interest for the VM forking. So
> the limitation
> of 1 vCPU is fine with me.
>
> Anyone who will want to support more than 1 vCPU with forking will have to
> come up with a proper interface. If you don't want to invest time on it that's
> fine, the rest of the code is still useful to have.

The toolstack can always just decide to do the extra hypercall query
for max_vcpus and make things work that way. In our usecase we have
single-vCPU domains, so that extra query is something I want to
avoid.

Tamas


* Re: [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking
  2020-02-21 18:49 ` [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking Tamas K Lengyel
@ 2020-02-24 12:39   ` Roger Pau Monné
  2020-02-24 15:45     ` Tamas K Lengyel
  2020-02-25 13:28     ` Jan Beulich
  2020-02-25 10:04   ` Roger Pau Monné
  1 sibling, 2 replies; 45+ messages in thread
From: Roger Pau Monné @ 2020-02-24 12:39 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	Andrew Cooper, Ian Jackson, George Dunlap, Tamas K Lengyel,
	Jan Beulich, xen-devel

On Fri, Feb 21, 2020 at 10:49:21AM -0800, Tamas K Lengyel wrote:
> VM forking is the process of creating a domain with an empty memory space and a
> parent domain specified from which to populate the memory when necessary. For
> the new domain to be functional the VM state is copied over as part of the fork
> operation (HVM params, hap allocation, etc).
> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> ---
> v9: remove stale header
>     fix tsc incarnation being bumped on set
> ---
>  xen/arch/x86/domain.c             |  11 ++
>  xen/arch/x86/hvm/hvm.c            |   2 +-
>  xen/arch/x86/mm/mem_sharing.c     | 222 ++++++++++++++++++++++++++++++
>  xen/arch/x86/mm/p2m.c             |  11 +-
>  xen/include/asm-x86/mem_sharing.h |  17 +++
>  xen/include/public/memory.h       |   5 +
>  xen/include/xen/sched.h           |   2 +
>  7 files changed, 268 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index fe63c23676..1ab0ca0942 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -2203,6 +2203,17 @@ int domain_relinquish_resources(struct domain *d)
>              ret = relinquish_shared_pages(d);
>              if ( ret )
>                  return ret;
> +
> +            /*
> +             * If the domain is forked, decrement the parent's pause count
> +             * and release the domain.
> +             */
> +            if ( d->parent )
> +            {
> +                domain_unpause(d->parent);
> +                put_domain(d->parent);
> +                d->parent = NULL;
> +            }
>          }
>  #endif
>  
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index a339b36a0d..9bfa603f8c 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -1915,7 +1915,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>      }
>  #endif
>  
> -    /* Spurious fault? PoD and log-dirty also take this path. */
> +    /* Spurious fault? PoD, log-dirty and VM forking also take this path. */
>      if ( p2m_is_ram(p2mt) )
>      {
>          rc = 1;
> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> index 3835bc928f..ad5db9d8d5 100644
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -22,6 +22,7 @@
>  
>  #include <xen/types.h>
>  #include <xen/domain_page.h>
> +#include <xen/event.h>
>  #include <xen/spinlock.h>
>  #include <xen/rwlock.h>
>  #include <xen/mm.h>
> @@ -36,6 +37,9 @@
>  #include <asm/altp2m.h>
>  #include <asm/atomic.h>
>  #include <asm/event.h>
> +#include <asm/hap.h>
> +#include <asm/hvm/hvm.h>
> +#include <asm/hvm/save.h>
>  #include <xsm/xsm.h>
>  
>  #include "mm-locks.h"
> @@ -1444,6 +1448,194 @@ static inline int mem_sharing_control(struct domain *d, bool enable)
>      return 0;
>  }
>  
> +/*
> + * Forking a page only gets called when the VM faults due to no entry being
> + * in the EPT for the access. Depending on the type of access we either
> + * populate the physmap with a shared entry for read-only access or
> + * fork the page if its a write access.
> + *
> + * The client p2m is already locked so we only need to lock
> + * the parent's here.
> + */
> +int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool unsharing)
> +{
> +    int rc = -ENOENT;
> +    shr_handle_t handle;
> +    struct domain *parent;

Can you constify parent? I assume there are no changes made to the
parent domain, just the forked one.

> +    struct p2m_domain *p2m;
> +    unsigned long gfn_l = gfn_x(gfn);
> +    mfn_t mfn, new_mfn;
> +    p2m_type_t p2mt;
> +    struct page_info *page;
> +
> +    if ( !mem_sharing_is_fork(d) )
> +        return -ENOENT;
> +
> +    parent = d->parent;

You can initialize at declaration time.

> +
> +    if ( !unsharing )
> +    {
> +        /* For read-only accesses we just add a shared entry to the physmap */
> +        while ( parent )
> +        {
> +            if ( !(rc = nominate_page(parent, gfn, 0, &handle)) )
> +                break;
> +
> +            parent = parent->parent;
> +        }
> +
> +        if ( !rc )
> +        {
> +            /* The client's p2m is already locked */
> +            struct p2m_domain *pp2m = p2m_get_hostp2m(parent);
> +
> +            p2m_lock(pp2m);
> +            rc = add_to_physmap(parent, gfn_l, handle, d, gfn_l, false);
> +            p2m_unlock(pp2m);
> +
> +            if ( !rc )
> +                return 0;
> +        }

Don't you need to return here, or else you will fall back into the
unsharing path?

> +    }
> +
> +    /*
> +     * If it's a write access (ie. unsharing) or if adding a shared entry to
> +     * the physmap failed we'll fork the page directly.
> +     */
> +    p2m = p2m_get_hostp2m(d);
> +    parent = d->parent;
> +
> +    while ( parent )
> +    {
> +        mfn = get_gfn_query(parent, gfn_l, &p2mt);
> +
> +        if ( mfn_valid(mfn) && p2m_is_any_ram(p2mt) )

This would also cover grants, but I'm not sure how those are handled
by forking, as access to those is granted on a per-domain basis. Ie:
the parent will have access to the grant, but not the child.

> +            break;
> +
> +        put_gfn(parent, gfn_l);
> +        parent = parent->parent;
> +    }
> +
> +    if ( !parent )
> +        return -ENOENT;
> +
> +    if ( !(page = alloc_domheap_page(d, 0)) )
> +    {
> +        put_gfn(parent, gfn_l);
> +        return -ENOMEM;
> +    }
> +
> +    new_mfn = page_to_mfn(page);
> +    copy_domain_page(new_mfn, mfn);
> +    set_gpfn_from_mfn(mfn_x(new_mfn), gfn_l);
> +
> +    put_gfn(parent, gfn_l);
> +
> +    return p2m->set_entry(p2m, gfn, new_mfn, PAGE_ORDER_4K, p2m_ram_rw,

So the child p2m is going to be populated using 4K pages exclusively?
Maybe it would make sense to try to use 2M if the parent domain page
is also a 2M page or larger?

> +                          p2m->default_access, -1);
> +}
> +
> +static int bring_up_vcpus(struct domain *cd, struct cpupool *cpupool)
> +{
> +    int ret;
> +    unsigned int i;
> +
> +    if ( (ret = cpupool_move_domain(cd, cpupool)) )
> +        return ret;
> +
> +    for ( i = 0; i < cd->max_vcpus; i++ )
> +    {
> +        if ( cd->vcpu[i] )
> +            continue;
> +
> +        if ( !vcpu_create(cd, i) )
> +            return -EINVAL;
> +    }
> +
> +    domain_update_node_affinity(cd);
> +    return 0;
> +}
> +
> +static int fork_hap_allocation(struct domain *cd, struct domain *d)

Can you constify the parent domain?

Also cd and d are not very helpful names; I would just use child and
parent.

> +{
> +    int rc;
> +    bool preempted;
> +    unsigned long mb = hap_get_allocation(d);
> +
> +    if ( mb == hap_get_allocation(cd) )
> +        return 0;
> +
> +    paging_lock(cd);
> +    rc = hap_set_allocation(cd, mb << (20 - PAGE_SHIFT), &preempted);
> +    paging_unlock(cd);
> +
> +    if ( rc )
> +        return rc;
> +
> +    if ( preempted )
> +        return -ERESTART;
> +
> +    return 0;

You can join all the checks into a single return:

return preempted ? -ERESTART : rc;

> +}
> +
> +static void fork_tsc(struct domain *cd, struct domain *d)
> +{
> +    uint32_t tsc_mode;
> +    uint32_t gtsc_khz;
> +    uint32_t incarnation;
> +    uint64_t elapsed_nsec;
> +
> +    tsc_get_info(d, &tsc_mode, &elapsed_nsec, &gtsc_khz, &incarnation);
> +    /* Don't bump incarnation on set */
> +    tsc_set_info(cd, tsc_mode, elapsed_nsec, gtsc_khz, incarnation - 1);

Why is the incarnation not bumped?

> +}
> +
> +static int mem_sharing_fork(struct domain *d, struct domain *cd)
> +{
> +    int rc = -EINVAL;
> +
> +    if ( !cd->controller_pause_count )
> +        return rc;

-EBUSY might be better here.

> +
> +    /*
> +     * We only want to get and pause the parent once, not each time this
> +     * operation is restarted due to preemption.
> +     */
> +    if ( !cd->parent_paused )
> +    {
> +        ASSERT(get_domain(d));

We are trying to avoid such constructs, instead I suggest:

if ( !get_domain(parent) )
{
    ASSERT_UNREACHABLE();
    return -EBUSY;
}

> +        domain_pause(d);
> +
> +        cd->parent_paused = true;
> +        cd->max_pages = d->max_pages;
> +        cd->max_vcpus = d->max_vcpus;
> +    }
> +
> +    /* this is preemptible so it's the first to get done */
> +    if ( (rc = fork_hap_allocation(cd, d)) )
> +        goto done;
> +
> +    if ( (rc = bring_up_vcpus(cd, d->cpupool)) )
> +        goto done;
> +
> +    if ( (rc = hvm_copy_context_and_params(cd, d)) )
> +        goto done;
> +
> +    fork_tsc(cd, d);
> +
> +    cd->parent = d;
> +
> + done:
> +    if ( rc && rc != -ERESTART )
> +    {
> +        domain_unpause(d);
> +        put_domain(d);
> +        cd->parent_paused = false;
> +    }
> +
> +    return rc;
> +}
> +
>  int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>  {
>      int rc;
> @@ -1698,6 +1890,36 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>          rc = debug_gref(d, mso.u.debug.u.gref);
>          break;
>  
> +    case XENMEM_sharing_op_fork:
> +    {
> +        struct domain *pd;
> +
> +        rc = -EINVAL;
> +        if ( mso.u.fork._pad[0] || mso.u.fork._pad[1] ||
> +             mso.u.fork._pad[2] )
> +            goto out;
> +
> +        rc = rcu_lock_live_remote_domain_by_id(mso.u.fork.parent_domain,
> +                                               &pd);
> +        if ( rc )
> +            goto out;
> +
> +        if ( !mem_sharing_enabled(pd) )
> +        {
> +            if ( (rc = mem_sharing_control(pd, true)) )
> +                goto out;

Please join both conditions:

if ( !mem_sharing_enabled(pd) &&
     (rc = mem_sharing_control(pd, true)) )
    goto out;

> +        }
> +
> +        rc = mem_sharing_fork(pd, d);
> +
> +        if ( rc == -ERESTART )
> +            rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
> +                                               "lh", XENMEM_sharing_op,
> +                                               arg);
> +        rcu_unlock_domain(pd);
> +        break;
> +    }
> +
>      default:
>          rc = -ENOSYS;
>          break;
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index c5f428d67c..7c4d2fd7a0 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -509,6 +509,14 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
>  
>      mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
>  
> +    /* Check if we need to fork the page */
> +    if ( (q & P2M_ALLOC) && p2m_is_hole(*t) &&
> +         !mem_sharing_fork_page(p2m->domain, gfn, !!(q & P2M_UNSHARE)) )
> +    {
> +        mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
> +    }

No need for the braces.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Xen-devel] [PATCH v9 4/5] x86/mem_sharing: reset a fork
  2020-02-21 18:49 ` [Xen-devel] [PATCH v9 4/5] x86/mem_sharing: reset a fork Tamas K Lengyel
@ 2020-02-24 15:12   ` Roger Pau Monné
  2020-02-24 15:35     ` Tamas K Lengyel
  0 siblings, 1 reply; 45+ messages in thread
From: Roger Pau Monné @ 2020-02-24 15:12 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tamas K Lengyel, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	Andrew Cooper, Ian Jackson, George Dunlap, Stefano Stabellini,
	Jan Beulich, xen-devel

On Fri, Feb 21, 2020 at 10:49:22AM -0800, Tamas K Lengyel wrote:
> Implement hypercall that allows a fork to shed all memory that got allocated
> for it during its execution and re-load its vCPU context from the parent VM.
> This allows the forked VM to reset into the same state the parent VM is in a
> faster way then creating a new fork would be. Measurements show about a 2x
> speedup during normal fuzzing operations. Performance may vary depending how
                                                                          ^ on
> much memory got allocated for the forked VM. If it has been completely
> deduplicated from the parent VM then creating a new fork would likely be more
> performant.
> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> ---
>  xen/arch/x86/mm/mem_sharing.c | 76 +++++++++++++++++++++++++++++++++++
>  xen/include/public/memory.h   |  1 +
>  2 files changed, 77 insertions(+)
> 
> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> index ad5db9d8d5..fb6892aaa6 100644
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -1636,6 +1636,59 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
>      return rc;
>  }
>  
> +/*
> + * The fork reset operation is intended to be used on short-lived forks only.
> + * There is no hypercall continuation operation implemented for this reason.
> + * For forks that obtain a larger memory footprint it is likely going to be
> + * more performant to create a new fork instead of resetting an existing one.
> + *
> + * TODO: In case this hypercall would become useful on forks with larger memory
> + * footprints the hypercall continuation should be implemented.

I'm afraid this is not safe, as users don't have an easy way to know
whether a fork will have a large memory footprint or not.

IMO you either need some kind of check that prevents this function
from being executed when the domain has too much memory assigned, or
you need to implement continuations.

Or else this is very likely to trip over the watchdog.

> + */
> +static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)

No need for the mem_sharing prefix, as it's a static function.

> +{
> +    int rc;
> +    struct p2m_domain* p2m = p2m_get_hostp2m(cd);
> +    struct page_info *page, *tmp;
> +
> +    domain_pause(cd);
> +
> +    page_list_for_each_safe(page, tmp, &cd->page_list)

You can use something like:

while ( (page = page_list_remove_head(&cd->page_list)) != NULL )
{
    ...
}

And avoid the *tmp local variable.

> +    {
> +        p2m_type_t p2mt;
> +        p2m_access_t p2ma;
> +        gfn_t gfn;
> +        mfn_t mfn = page_to_mfn(page);
> +
> +        if ( !mfn_valid(mfn) )
> +            continue;
> +
> +        gfn = mfn_to_gfn(cd, mfn);
> +        mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
> +                                    0, NULL, false);
> +
> +        if ( !p2m_is_ram(p2mt) || p2m_is_shared(p2mt) )
> +            continue;
> +
> +        /* take an extra reference */
> +        if ( !get_page(page, cd) )
> +            continue;
> +
> +        rc = p2m->set_entry(p2m, gfn, INVALID_MFN, PAGE_ORDER_4K,
> +                            p2m_invalid, p2m_access_rwx, -1);
> +        ASSERT(!rc);

Can you handle this gracefully?

> +
> +        put_page_alloc_ref(page);
> +        put_page(page);
> +    }
> +
> +    if ( !(rc = hvm_copy_context_and_params(cd, d)) )
> +        fork_tsc(cd, d);
> +
> +    domain_unpause(cd);
> +    return rc;
> +}
> +
>  int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>  {
>      int rc;
> @@ -1920,6 +1973,29 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>          break;
>      }
>  
> +    case XENMEM_sharing_op_fork_reset:
> +    {
> +        struct domain *pd;
> +
> +        rc = -EINVAL;
> +        if ( mso.u.fork._pad[0] || mso.u.fork._pad[1] ||
> +             mso.u.fork._pad[2] )
> +            goto out;
> +
> +        rc = -ENOSYS;
> +        if ( !d->parent )

Why not use mem_sharing_is_fork?

Also I think EINVAL would be more suitable here, as the passed domid
doesn't belong to a fork?

Thanks, Roger.


* Re: [Xen-devel] [PATCH v9 1/5] xen/x86: Make hap_get_allocation accessible
  2020-02-21 18:49 ` [Xen-devel] [PATCH v9 1/5] xen/x86: Make hap_get_allocation accessible Tamas K Lengyel
@ 2020-02-24 15:20   ` Roger Pau Monné
  2020-02-25 13:16     ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: Roger Pau Monné @ 2020-02-24 15:20 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: xen-devel, Wei Liu, George Dunlap, Jan Beulich, Andrew Cooper

On Fri, Feb 21, 2020 at 10:49:19AM -0800, Tamas K Lengyel wrote:
> During VM forking we'll copy the parent domain's parameters to the client,
> including the HAP shadow memory setting that is used for storing the domain's
> EPT. We'll copy this in the hypervisor instead doing it during toolstack launch
> to allow the domain to start executing and unsharing memory before (or
> even completely without) the toolstack.
> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

I would also be fine with this merged into the respective patch where
hap_get_allocation gets called.

Thanks, Roger.


* Re: [Xen-devel] [PATCH v9 4/5] x86/mem_sharing: reset a fork
  2020-02-24 15:12   ` Roger Pau Monné
@ 2020-02-24 15:35     ` Tamas K Lengyel
  2020-02-24 15:42       ` Roger Pau Monné
  2020-02-25 13:39       ` Jan Beulich
  0 siblings, 2 replies; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-24 15:35 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

On Mon, Feb 24, 2020 at 8:13 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Fri, Feb 21, 2020 at 10:49:22AM -0800, Tamas K Lengyel wrote:
> > Implement hypercall that allows a fork to shed all memory that got allocated
> > for it during its execution and re-load its vCPU context from the parent VM.
> > This allows the forked VM to reset into the same state the parent VM is in a
> > faster way then creating a new fork would be. Measurements show about a 2x
> > speedup during normal fuzzing operations. Performance may vary depending how
>                                                                           ^ on
> > much memory got allocated for the forked VM. If it has been completely
> > deduplicated from the parent VM then creating a new fork would likely be more
> > performant.
> >
> > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> > ---
> >  xen/arch/x86/mm/mem_sharing.c | 76 +++++++++++++++++++++++++++++++++++
> >  xen/include/public/memory.h   |  1 +
> >  2 files changed, 77 insertions(+)
> >
> > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > index ad5db9d8d5..fb6892aaa6 100644
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -1636,6 +1636,59 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
> >      return rc;
> >  }
> >
> > +/*
> > + * The fork reset operation is intended to be used on short-lived forks only.
> > + * There is no hypercall continuation operation implemented for this reason.
> > + * For forks that obtain a larger memory footprint it is likely going to be
> > + * more performant to create a new fork instead of resetting an existing one.
> > + *
> > + * TODO: In case this hypercall would become useful on forks with larger memory
> > + * footprints the hypercall continuation should be implemented.
>
> I'm afraid this is not safe, as users don't have an easy way to know
> whether a fork will have a large memory footprint or not.

They do, getdomaininfo tells a user exactly how much memory has been
allocated for a domain.

>
> IMO you either need some kind of check that prevents this function
> from being executed when the domain as too much memory assigned, or
> you need to implement continuations.

I really don't think we need continuation here with the usecase we
have for this function but I'm also tired of arguing about it, so I'll
just add it even if it's going to be dead code.

>
> Or else this is very likely to trip over the watchdog.

The watchdog?

>
> > + */
> > +static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
>
> No need for the mem_sharing prefix, as it's a static function.

Sure.

>
> > +{
> > +    int rc;
> > +    struct p2m_domain* p2m = p2m_get_hostp2m(cd);
> > +    struct page_info *page, *tmp;
> > +
> > +    domain_pause(cd);
> > +
> > +    page_list_for_each_safe(page, tmp, &cd->page_list)
>
> You can use something like:
>
> while ( (page = page_list_remove_head(&cd->page_list)) != NULL )
> {
>     ...
> }
>
> And avoid the *tmp local variable.

Sure.

>
> > +    {
> > +        p2m_type_t p2mt;
> > +        p2m_access_t p2ma;
> > +        gfn_t gfn;
> > +        mfn_t mfn = page_to_mfn(page);
> > +
> > +        if ( !mfn_valid(mfn) )
> > +            continue;
> > +
> > +        gfn = mfn_to_gfn(cd, mfn);
> > +        mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
> > +                                    0, NULL, false);
> > +
> > +        if ( !p2m_is_ram(p2mt) || p2m_is_shared(p2mt) )
> > +            continue;
> > +
> > +        /* take an extra reference */
> > +        if ( !get_page(page, cd) )
> > +            continue;
> > +
> > +        rc = p2m->set_entry(p2m, gfn, INVALID_MFN, PAGE_ORDER_4K,
> > +                            p2m_invalid, p2m_access_rwx, -1);
> > +        ASSERT(!rc);
>
> Can you handle this gracefully?

Nope. This should never happen, so if it does, something is very wrong
in some other part of Xen.

>
> > +
> > +        put_page_alloc_ref(page);
> > +        put_page(page);
> > +    }
> > +
> > +    if ( !(rc = hvm_copy_context_and_params(cd, d)) )
> > +        fork_tsc(cd, d);
> > +
> > +    domain_unpause(cd);
> > +    return rc;
> > +}
> > +
> >  int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
> >  {
> >      int rc;
> > @@ -1920,6 +1973,29 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
> >          break;
> >      }
> >
> > +    case XENMEM_sharing_op_fork_reset:
> > +    {
> > +        struct domain *pd;
> > +
> > +        rc = -EINVAL;
> > +        if ( mso.u.fork._pad[0] || mso.u.fork._pad[1] ||
> > +             mso.u.fork._pad[2] )
> > +            goto out;
> > +
> > +        rc = -ENOSYS;
> > +        if ( !d->parent )
>
> Why not use mem_sharing_is_fork?

I could.

>
> Also I think EINVAL would be more suitable here, as the passed domid
> doesn't belong to a fork?

Sure.

Thanks,
Tamas


* Re: [Xen-devel] [PATCH v9 4/5] x86/mem_sharing: reset a fork
  2020-02-24 15:35     ` Tamas K Lengyel
@ 2020-02-24 15:42       ` Roger Pau Monné
  2020-02-24 15:49         ` Tamas K Lengyel
  2020-02-25 13:39       ` Jan Beulich
  1 sibling, 1 reply; 45+ messages in thread
From: Roger Pau Monné @ 2020-02-24 15:42 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

On Mon, Feb 24, 2020 at 08:35:09AM -0700, Tamas K Lengyel wrote:
> On Mon, Feb 24, 2020 at 8:13 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Fri, Feb 21, 2020 at 10:49:22AM -0800, Tamas K Lengyel wrote:
> > > Implement hypercall that allows a fork to shed all memory that got allocated
> > > for it during its execution and re-load its vCPU context from the parent VM.
> > > This allows the forked VM to reset into the same state the parent VM is in a
> > > faster way then creating a new fork would be. Measurements show about a 2x
> > > speedup during normal fuzzing operations. Performance may vary depending how
> >                                                                           ^ on
> > > much memory got allocated for the forked VM. If it has been completely
> > > deduplicated from the parent VM then creating a new fork would likely be more
> > > performant.
> > >
> > > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> > > ---
> > >  xen/arch/x86/mm/mem_sharing.c | 76 +++++++++++++++++++++++++++++++++++
> > >  xen/include/public/memory.h   |  1 +
> > >  2 files changed, 77 insertions(+)
> > >
> > > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > > index ad5db9d8d5..fb6892aaa6 100644
> > > --- a/xen/arch/x86/mm/mem_sharing.c
> > > +++ b/xen/arch/x86/mm/mem_sharing.c
> > > @@ -1636,6 +1636,59 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
> > >      return rc;
> > >  }
> > >
> > > +/*
> > > + * The fork reset operation is intended to be used on short-lived forks only.
> > > + * There is no hypercall continuation operation implemented for this reason.
> > > + * For forks that obtain a larger memory footprint it is likely going to be
> > > + * more performant to create a new fork instead of resetting an existing one.
> > > + *
> > > + * TODO: In case this hypercall would become useful on forks with larger memory
> > > + * footprints the hypercall continuation should be implemented.
> >
> > I'm afraid this is not safe, as users don't have an easy way to know
> > whether a fork will have a large memory footprint or not.
> 
> They do, getdomaininfo tells a user exactly how much memory has been
> allocated for a domain.
> 
> >
> > IMO you either need some kind of check that prevents this function
> > from being executed when the domain as too much memory assigned, or
> > you need to implement continuations.
> 
> I really don't think we need continuation here with the usecase we
> have for this function but I'm also tired of arguing about it, so I'll
> just add it even if its going to be dead code.
> 
> >
> > Or else this is very likely to trip over the watchdog.
> 
> The watchdog?

Yes, Xen has a watchdog and this loop is likely to trigger it if it
takes > 5s to complete. The watchdog triggering is a fatal event and
leads to a host crash.

Note that watchdog is not enabled by default, you need to enable it on
the Xen command line.

> > > +    {
> > > +        p2m_type_t p2mt;
> > > +        p2m_access_t p2ma;
> > > +        gfn_t gfn;
> > > +        mfn_t mfn = page_to_mfn(page);
> > > +
> > > +        if ( !mfn_valid(mfn) )
> > > +            continue;
> > > +
> > > +        gfn = mfn_to_gfn(cd, mfn);
> > > +        mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
> > > +                                    0, NULL, false);
> > > +
> > > +        if ( !p2m_is_ram(p2mt) || p2m_is_shared(p2mt) )
> > > +            continue;
> > > +
> > > +        /* take an extra reference */
> > > +        if ( !get_page(page, cd) )
> > > +            continue;
> > > +
> > > +        rc = p2m->set_entry(p2m, gfn, INVALID_MFN, PAGE_ORDER_4K,
> > > +                            p2m_invalid, p2m_access_rwx, -1);
> > > +        ASSERT(!rc);
> >
> > Can you handle this gracefully?
> 
> Nope. This should never happen, so if it does, something is very wrong
> in some other part of Xen.

OK, please switch it to BUG_ON then instead of ASSERT. It's better to
crash here than to misbehave later.

Thanks, Roger.


* Re: [Xen-devel] [PATCH v9 2/5] xen: add parent_domid field to createdomain domctl
  2020-02-21 18:49 ` [Xen-devel] [PATCH v9 2/5] xen: add parent_domid field to createdomain domctl Tamas K Lengyel
  2020-02-21 21:02   ` Julien Grall
@ 2020-02-24 15:44   ` Andrew Cooper
  2020-02-24 15:55     ` Tamas K Lengyel
  1 sibling, 1 reply; 45+ messages in thread
From: Andrew Cooper @ 2020-02-24 15:44 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	Ian Jackson, George Dunlap, Jan Beulich

On 21/02/2020 18:49, Tamas K Lengyel wrote:
> When creating a domain that will be used as a VM fork some information is
> required to set things up properly, like the max_vcpus count. Instead of the
> toolstack having to gather this information for each fork in a separate
> hypercall we can just include the parent domain's id in the createdomain domctl
> so that Xen can copy the setting without the extra toolstack queries.

Right, but when I said this wasn't safe, I did mean it...

What happens when parent and the current domain have different gnttab or
evtchn limits, or different emulation settings?

If you want to fork a domain safely, you either need to have no
parameters passed by the toolstack (and let Xen copy appropriate
values), or cross check every provided parameter and bail early on a
mismatch.

~Andrew


* Re: [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking
  2020-02-24 12:39   ` Roger Pau Monné
@ 2020-02-24 15:45     ` Tamas K Lengyel
  2020-02-24 15:59       ` Roger Pau Monné
                         ` (3 more replies)
  2020-02-25 13:28     ` Jan Beulich
  1 sibling, 4 replies; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-24 15:45 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

On Mon, Feb 24, 2020 at 5:39 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Fri, Feb 21, 2020 at 10:49:21AM -0800, Tamas K Lengyel wrote:
> > VM forking is the process of creating a domain with an empty memory space and a
> > parent domain specified from which to populate the memory when necessary. For
> > the new domain to be functional the VM state is copied over as part of the fork
> > operation (HVM params, hap allocation, etc).
> >
> > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> > ---
> > v9: remove stale header
> >     fix tsc incarnation being bumped on set
> > ---
> >  xen/arch/x86/domain.c             |  11 ++
> >  xen/arch/x86/hvm/hvm.c            |   2 +-
> >  xen/arch/x86/mm/mem_sharing.c     | 222 ++++++++++++++++++++++++++++++
> >  xen/arch/x86/mm/p2m.c             |  11 +-
> >  xen/include/asm-x86/mem_sharing.h |  17 +++
> >  xen/include/public/memory.h       |   5 +
> >  xen/include/xen/sched.h           |   2 +
> >  7 files changed, 268 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> > index fe63c23676..1ab0ca0942 100644
> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -2203,6 +2203,17 @@ int domain_relinquish_resources(struct domain *d)
> >              ret = relinquish_shared_pages(d);
> >              if ( ret )
> >                  return ret;
> > +
> > +            /*
> > +             * If the domain is forked, decrement the parent's pause count
> > +             * and release the domain.
> > +             */
> > +            if ( d->parent )
> > +            {
> > +                domain_unpause(d->parent);
> > +                put_domain(d->parent);
> > +                d->parent = NULL;
> > +            }
> >          }
> >  #endif
> >
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index a339b36a0d..9bfa603f8c 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -1915,7 +1915,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
> >      }
> >  #endif
> >
> > -    /* Spurious fault? PoD and log-dirty also take this path. */
> > +    /* Spurious fault? PoD, log-dirty and VM forking also take this path. */
> >      if ( p2m_is_ram(p2mt) )
> >      {
> >          rc = 1;
> > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > index 3835bc928f..ad5db9d8d5 100644
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -22,6 +22,7 @@
> >
> >  #include <xen/types.h>
> >  #include <xen/domain_page.h>
> > +#include <xen/event.h>
> >  #include <xen/spinlock.h>
> >  #include <xen/rwlock.h>
> >  #include <xen/mm.h>
> > @@ -36,6 +37,9 @@
> >  #include <asm/altp2m.h>
> >  #include <asm/atomic.h>
> >  #include <asm/event.h>
> > +#include <asm/hap.h>
> > +#include <asm/hvm/hvm.h>
> > +#include <asm/hvm/save.h>
> >  #include <xsm/xsm.h>
> >
> >  #include "mm-locks.h"
> > @@ -1444,6 +1448,194 @@ static inline int mem_sharing_control(struct domain *d, bool enable)
> >      return 0;
> >  }
> >
> > +/*
> > + * Forking a page only gets called when the VM faults due to no entry being
> > + * in the EPT for the access. Depending on the type of access we either
> > + * populate the physmap with a shared entry for read-only access or
> > + * fork the page if its a write access.
> > + *
> > + * The client p2m is already locked so we only need to lock
> > + * the parent's here.
> > + */
> > +int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool unsharing)
> > +{
> > +    int rc = -ENOENT;
> > +    shr_handle_t handle;
> > +    struct domain *parent;
>
> Can you constify parent, I assume there are no changes made to the
> parent domain, just the forked one.

Sure.

>
> > +    struct p2m_domain *p2m;
> > +    unsigned long gfn_l = gfn_x(gfn);
> > +    mfn_t mfn, new_mfn;
> > +    p2m_type_t p2mt;
> > +    struct page_info *page;
> > +
> > +    if ( !mem_sharing_is_fork(d) )
> > +        return -ENOENT;
> > +
> > +    parent = d->parent;
>
> You can initialize at declaration time.

Sure.

>
> > +
> > +    if ( !unsharing )
> > +    {
> > +        /* For read-only accesses we just add a shared entry to the physmap */
> > +        while ( parent )
> > +        {
> > +            if ( !(rc = nominate_page(parent, gfn, 0, &handle)) )
> > +                break;
> > +
> > +            parent = parent->parent;
> > +        }
> > +
> > +        if ( !rc )
> > +        {
> > +            /* The client's p2m is already locked */
> > +            struct p2m_domain *pp2m = p2m_get_hostp2m(parent);
> > +
> > +            p2m_lock(pp2m);
> > +            rc = add_to_physmap(parent, gfn_l, handle, d, gfn_l, false);
> > +            p2m_unlock(pp2m);
> > +
> > +            if ( !rc )
> > +                return 0;
> > +        }
>
> Don't you need to return here, or else you will fallback into the
> unsharing path?

No, we want to fall back to the unsharing path if populating with a shared
entry failed. Notice above we return in case there was no error with
add_to_physmap.

>
> > +    }
> > +
> > +    /*
> > +     * If it's a write access (ie. unsharing) or if adding a shared entry to
> > +     * the physmap failed we'll fork the page directly.
> > +     */
> > +    p2m = p2m_get_hostp2m(d);
> > +    parent = d->parent;
> > +
> > +    while ( parent )
> > +    {
> > +        mfn = get_gfn_query(parent, gfn_l, &p2mt);
> > +
> > +        if ( mfn_valid(mfn) && p2m_is_any_ram(p2mt) )
>
> This would also cover grants, but I'm not sure how those are handled
> by forking, as access to those is granted on a per-domain basis. Ie:
> the parent will have access to the grant, but not the child.

Good question. Grants are not sharable because their reference count
will prevent sharing, so here the page content would just get copied
into the fork as a regular page. I can check whether anything breaks in
our use-case if we just skip grants completely; I don't think a regular
domain has any grants by default.

>
> > +            break;
> > +
> > +        put_gfn(parent, gfn_l);
> > +        parent = parent->parent;
> > +    }
> > +
> > +    if ( !parent )
> > +        return -ENOENT;
> > +
> > +    if ( !(page = alloc_domheap_page(d, 0)) )
> > +    {
> > +        put_gfn(parent, gfn_l);
> > +        return -ENOMEM;
> > +    }
> > +
> > +    new_mfn = page_to_mfn(page);
> > +    copy_domain_page(new_mfn, mfn);
> > +    set_gpfn_from_mfn(mfn_x(new_mfn), gfn_l);
> > +
> > +    put_gfn(parent, gfn_l);
> > +
> > +    return p2m->set_entry(p2m, gfn, new_mfn, PAGE_ORDER_4K, p2m_ram_rw,
>
> So the child p2m is going to be populated using 4K pages exclusively?
> Maybe it would make sense to try to use 2M if the parent domain page
> is also a 2M page or larger?

No, memory sharing only works at a 4k granularity so that's what we
are going with. There is no reason to copy 2M pages when most of their
contents can just be shared once broken up.
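
For illustration only, the lookup-and-copy flow can be modeled outside of
Xen as a small self-contained toy (all types and names below are made up
for this sketch, this is not the patch code):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE 4096
#define GFNS 4

/* Toy model of the walk in mem_sharing_fork_page(): a child with a
 * hole at 'gfn' searches up the parent chain for a backing page. */
struct dom {
    unsigned char *pages[GFNS];   /* NULL models a hole in the physmap */
    struct dom *parent;
};

/* On a write access to a hole, copy the page from the nearest
 * ancestor that has one (the "fork" of the page). */
static unsigned char *fork_page(struct dom *child, int gfn)
{
    const struct dom *p = child->parent;

    while ( p && !p->pages[gfn] )
        p = p->parent;

    if ( !p )
        return NULL;              /* corresponds to -ENOENT */

    child->pages[gfn] = malloc(PAGE);
    if ( !child->pages[gfn] )
        return NULL;              /* corresponds to -ENOMEM */

    memcpy(child->pages[gfn], p->pages[gfn], PAGE);
    return child->pages[gfn];
}
```

The child ends up with a private 4k copy while every ancestor's page
stays untouched, which is the copy-on-write behavior the patch
implements for write accesses.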

>
> > +                          p2m->default_access, -1);
> > +}
> > +
> > +static int bring_up_vcpus(struct domain *cd, struct cpupool *cpupool)
> > +{
> > +    int ret;
> > +    unsigned int i;
> > +
> > +    if ( (ret = cpupool_move_domain(cd, cpupool)) )
> > +        return ret;
> > +
> > +    for ( i = 0; i < cd->max_vcpus; i++ )
> > +    {
> > +        if ( cd->vcpu[i] )
> > +            continue;
> > +
> > +        if ( !vcpu_create(cd, i) )
> > +            return -EINVAL;
> > +    }
> > +
> > +    domain_update_node_affinity(cd);
> > +    return 0;
> > +}
> > +
> > +static int fork_hap_allocation(struct domain *cd, struct domain *d)
>
> Can you constify the parent domain?

Sure

>
> Also cd and d and not very helpful names, I would just use child and
> parent.

Sure.

> > +{
> > +    int rc;
> > +    bool preempted;
> > +    unsigned long mb = hap_get_allocation(d);
> > +
> > +    if ( mb == hap_get_allocation(cd) )
> > +        return 0;
> > +
> > +    paging_lock(cd);
> > +    rc = hap_set_allocation(cd, mb << (20 - PAGE_SHIFT), &preempted);
> > +    paging_unlock(cd);
> > +
> > +    if ( rc )
> > +        return rc;
> > +
> > +    if ( preempted )
> > +        return -ERESTART;
> > +
> > +    return 0;
>
> You can join all the checks into a single return:
>
> return preempted ? -ERESTART : rc;

Sure.

> > +}
> > +
> > +static void fork_tsc(struct domain *cd, struct domain *d)
> > +{
> > +    uint32_t tsc_mode;
> > +    uint32_t gtsc_khz;
> > +    uint32_t incarnation;
> > +    uint64_t elapsed_nsec;
> > +
> > +    tsc_get_info(d, &tsc_mode, &elapsed_nsec, &gtsc_khz, &incarnation);
> > +    /* Don't bump incarnation on set */
> > +    tsc_set_info(cd, tsc_mode, elapsed_nsec, gtsc_khz, incarnation - 1);
>
> Why is the incarnation not bumped?

See Andrew's comment on the previous version of the patch. Apparently
tsc_set_info bumps it itself.
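
The pattern, sketched generically (stand-in names and a single field, not
the real tsc_get_info/tsc_set_info signatures): when the setter bumps the
incarnation on its own, the caller must pass the value minus one to end
up with an identical incarnation in the fork.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up stand-ins for the Xen TSC accessors, reduced to the one
 * field that matters for this point. */
struct dom_tsc { uint32_t incarnation; };

static void get_info(const struct dom_tsc *d, uint32_t *incarnation)
{
    *incarnation = d->incarnation;
}

static void set_info(struct dom_tsc *d, uint32_t incarnation)
{
    d->incarnation = incarnation + 1;   /* the setter bumps it itself */
}
```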

>
> > +}
> > +
> > +static int mem_sharing_fork(struct domain *d, struct domain *cd)
> > +{
> > +    int rc = -EINVAL;
> > +
> > +    if ( !cd->controller_pause_count )
> > +        return rc;
>
> -EBUSY might be better here.

Sure.

>
> > +
> > +    /*
> > +     * We only want to get and pause the parent once, not each time this
> > +     * operation is restarted due to preemption.
> > +     */
> > +    if ( !cd->parent_paused )
> > +    {
> > +        ASSERT(get_domain(d));
>
> We are trying to avoid such constructs, instead I suggest:
>
> if ( !get_domain(parent) )
> {
>     ASSERT_UNREACHABLE();
>     return -EBUSY;
> }

Sure.

> > +        domain_pause(d);
> > +
> > +        cd->parent_paused = true;
> > +        cd->max_pages = d->max_pages;
> > +        cd->max_vcpus = d->max_vcpus;
> > +    }
> > +
> > +    /* this is preemptible so it's the first to get done */
> > +    if ( (rc = fork_hap_allocation(cd, d)) )
> > +        goto done;
> > +
> > +    if ( (rc = bring_up_vcpus(cd, d->cpupool)) )
> > +        goto done;
> > +
> > +    if ( (rc = hvm_copy_context_and_params(cd, d)) )
> > +        goto done;
> > +
> > +    fork_tsc(cd, d);
> > +
> > +    cd->parent = d;
> > +
> > + done:
> > +    if ( rc && rc != -ERESTART )
> > +    {
> > +        domain_unpause(d);
> > +        put_domain(d);
> > +        cd->parent_paused = false;
> > +    }
> > +
> > +    return rc;
> > +}
> > +
> >  int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
> >  {
> >      int rc;
> > @@ -1698,6 +1890,36 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
> >          rc = debug_gref(d, mso.u.debug.u.gref);
> >          break;
> >
> > +    case XENMEM_sharing_op_fork:
> > +    {
> > +        struct domain *pd;
> > +
> > +        rc = -EINVAL;
> > +        if ( mso.u.fork._pad[0] || mso.u.fork._pad[1] ||
> > +             mso.u.fork._pad[2] )
> > +            goto out;
> > +
> > +        rc = rcu_lock_live_remote_domain_by_id(mso.u.fork.parent_domain,
> > +                                               &pd);
> > +        if ( rc )
> > +            goto out;
> > +
> > +        if ( !mem_sharing_enabled(pd) )
> > +        {
> > +            if ( (rc = mem_sharing_control(pd, true)) )
> > +                goto out;
>
> Please join both conditions:
>
> if ( !mem_sharing_enabled(pd) &&
>      (rc = mem_sharing_control(pd, true)) )
>     goto out;

Sure.

>
> > +        }
> > +
> > +        rc = mem_sharing_fork(pd, d);
> > +
> > +        if ( rc == -ERESTART )
> > +            rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
> > +                                               "lh", XENMEM_sharing_op,
> > +                                               arg);
> > +        rcu_unlock_domain(pd);
> > +        break;
> > +    }
> > +
> >      default:
> >          rc = -ENOSYS;
> >          break;
> > diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> > index c5f428d67c..7c4d2fd7a0 100644
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -509,6 +509,14 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
> >
> >      mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
> >
> > +    /* Check if we need to fork the page */
> > +    if ( (q & P2M_ALLOC) && p2m_is_hole(*t) &&
> > +         !mem_sharing_fork_page(p2m->domain, gfn, !!(q & P2M_UNSHARE)) )
> > +    {
> > +        mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
> > +    }
>
> No need for the braces.

I would keep them, it helps with readability in this case.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Xen-devel] [PATCH v9 4/5] x86/mem_sharing: reset a fork
  2020-02-24 15:42       ` Roger Pau Monné
@ 2020-02-24 15:49         ` Tamas K Lengyel
  2020-02-24 16:02           ` Roger Pau Monné
  0 siblings, 1 reply; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-24 15:49 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

On Mon, Feb 24, 2020 at 8:42 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Mon, Feb 24, 2020 at 08:35:09AM -0700, Tamas K Lengyel wrote:
> > On Mon, Feb 24, 2020 at 8:13 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >
> > > On Fri, Feb 21, 2020 at 10:49:22AM -0800, Tamas K Lengyel wrote:
> > > > Implement hypercall that allows a fork to shed all memory that got allocated
> > > > for it during its execution and re-load its vCPU context from the parent VM.
> > > > This allows the forked VM to reset into the same state the parent VM is in a
> > > > faster way then creating a new fork would be. Measurements show about a 2x
> > > > speedup during normal fuzzing operations. Performance may vary depending how
> > >                                                                           ^ on
> > > > much memory got allocated for the forked VM. If it has been completely
> > > > deduplicated from the parent VM then creating a new fork would likely be more
> > > > performant.
> > > >
> > > > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> > > > ---
> > > >  xen/arch/x86/mm/mem_sharing.c | 76 +++++++++++++++++++++++++++++++++++
> > > >  xen/include/public/memory.h   |  1 +
> > > >  2 files changed, 77 insertions(+)
> > > >
> > > > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > > > index ad5db9d8d5..fb6892aaa6 100644
> > > > --- a/xen/arch/x86/mm/mem_sharing.c
> > > > +++ b/xen/arch/x86/mm/mem_sharing.c
> > > > @@ -1636,6 +1636,59 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
> > > >      return rc;
> > > >  }
> > > >
> > > > +/*
> > > > + * The fork reset operation is intended to be used on short-lived forks only.
> > > > + * There is no hypercall continuation operation implemented for this reason.
> > > > + * For forks that obtain a larger memory footprint it is likely going to be
> > > > + * more performant to create a new fork instead of resetting an existing one.
> > > > + *
> > > > + * TODO: In case this hypercall would become useful on forks with larger memory
> > > > + * footprints the hypercall continuation should be implemented.
> > >
> > > I'm afraid this is not safe, as users don't have an easy way to know
> > > whether a fork will have a large memory footprint or not.
> >
> > They do, getdomaininfo tells a user exactly how much memory has been
> > allocated for a domain.
> >
> > >
> > > IMO you either need some kind of check that prevents this function
> > > from being executed when the domain as too much memory assigned, or
> > > you need to implement continuations.
> >
> > I really don't think we need continuation here with the usecase we
> > have for this function but I'm also tired of arguing about it, so I'll
> > just add it even if its going to be dead code.
> >
> > >
> > > Or else this is very likely to trip over the watchdog.
> >
> > The watchdog?
>
> Yes, Xen has a watchdog and this loop is likely to trigger it if it
> takes > 5s to complete. The watchdog triggering is a fatal event and
> leads to host crash.

OK, just to give you numbers: in the use-case this function is targeted
at we call it about 100 times per second. Even in the worst-case scenario
we've seen so far, we haven't had a domain with enough memory deduplicated
where this function took longer than 1s to finish. But again, at this
point we have spent more time arguing about the continuation than it takes
to add it, so let's just move on.

>
> Note that watchdog is not enabled by default, you need to enable it on
> the Xen command line.

In that case I don't think the current setup would really bother anyone.

>
> > > > +    {
> > > > +        p2m_type_t p2mt;
> > > > +        p2m_access_t p2ma;
> > > > +        gfn_t gfn;
> > > > +        mfn_t mfn = page_to_mfn(page);
> > > > +
> > > > +        if ( !mfn_valid(mfn) )
> > > > +            continue;
> > > > +
> > > > +        gfn = mfn_to_gfn(cd, mfn);
> > > > +        mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
> > > > +                                    0, NULL, false);
> > > > +
> > > > +        if ( !p2m_is_ram(p2mt) || p2m_is_shared(p2mt) )
> > > > +            continue;
> > > > +
> > > > +        /* take an extra reference */
> > > > +        if ( !get_page(page, cd) )
> > > > +            continue;
> > > > +
> > > > +        rc = p2m->set_entry(p2m, gfn, INVALID_MFN, PAGE_ORDER_4K,
> > > > +                            p2m_invalid, p2m_access_rwx, -1);
> > > > +        ASSERT(!rc);
> > >
> > > Can you handle this gracefully?
> >
> > Nope. This should never happen, so if it does, something is very wrong
> > in some other part of Xen.
>
> OK, please switch it to BUG_ON then instead of ASSERT. It's better to
> crash here than to misbehave later.

Sure

Thanks,
Tamas


* Re: [Xen-devel] [PATCH v9 2/5] xen: add parent_domid field to createdomain domctl
  2020-02-24 15:44   ` Andrew Cooper
@ 2020-02-24 15:55     ` Tamas K Lengyel
  0 siblings, 0 replies; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-24 15:55 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	Tamas K Lengyel, Ian Jackson, George Dunlap, Jan Beulich,
	Xen-devel

On Mon, Feb 24, 2020 at 8:45 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 21/02/2020 18:49, Tamas K Lengyel wrote:
> > When creating a domain that will be used as a VM fork some information is
> > required to set things up properly, like the max_vcpus count. Instead of the
> > toolstack having to gather this information for each fork in a separate
> > hypercall we can just include the parent domain's id in the createdomain domctl
> > so that Xen can copy the setting without the extra toolstack queries.
>
> Right, but when I said this wasn't safe, I did mean it...
>
> What happens when parent and the current domain have different gnttab or
> evtchn limits, or different emulation settings?
>
> If you want to fork a domain safely, you either need to have no
> parameters passed by the toolstack (and let Xen copy appropriate
> values), or cross check every provided parameter and bail early on a
> mismatch.

If you are using the toolstack code we add in patch 5, that doesn't
happen. So, for the situation you describe to happen: 1) you have to
custom-compile Xen with the EXPERT setting to enable this experimental
feature, 2) write your own toolstack code, and 3) screw up doing so. This
is such an unlikely scenario that I'm not really bothered by it.

Tamas


* Re: [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking
  2020-02-24 15:45     ` Tamas K Lengyel
@ 2020-02-24 15:59       ` Roger Pau Monné
  2020-02-24 22:14       ` Tamas K Lengyel
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 45+ messages in thread
From: Roger Pau Monné @ 2020-02-24 15:59 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

On Mon, Feb 24, 2020 at 08:45:05AM -0700, Tamas K Lengyel wrote:
> On Mon, Feb 24, 2020 at 5:39 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Fri, Feb 21, 2020 at 10:49:21AM -0800, Tamas K Lengyel wrote:
> > > +    }
> > > +
> > > +    /*
> > > +     * If it's a write access (ie. unsharing) or if adding a shared entry to
> > > +     * the physmap failed we'll fork the page directly.
> > > +     */
> > > +    p2m = p2m_get_hostp2m(d);
> > > +    parent = d->parent;
> > > +
> > > +    while ( parent )
> > > +    {
> > > +        mfn = get_gfn_query(parent, gfn_l, &p2mt);
> > > +
> > > +        if ( mfn_valid(mfn) && p2m_is_any_ram(p2mt) )
> >
> > This would also cover grants, but I'm not sure how those are handled
> > by forking, as access to those is granted on a per-domain basis. Ie:
> > the parent will have access to the grant, but not the child.
> 
> Good question. Grants are not sharable because their reference count
> will prevent sharing, so here the page content would just get copied
> as a regular page into the fork. I can check that if in the usecase we
> have anything breaks if we just skip grants completely, I don't think
> a regular domain has any grants by default.

Hm, I don't have a good suggestion for this. Since the domain is not
aware of the fork, there's no way for it to notice that grant maps have
become stale.

Can you add a note in this regard? And maybe crashing the fork when a
grant is found would be fine, until we figure out how to handle them
properly.

> >
> > > +            break;
> > > +
> > > +        put_gfn(parent, gfn_l);
> > > +        parent = parent->parent;
> > > +    }
> > > +
> > > +    if ( !parent )
> > > +        return -ENOENT;
> > > +
> > > +    if ( !(page = alloc_domheap_page(d, 0)) )
> > > +    {
> > > +        put_gfn(parent, gfn_l);
> > > +        return -ENOMEM;
> > > +    }
> > > +
> > > +    new_mfn = page_to_mfn(page);
> > > +    copy_domain_page(new_mfn, mfn);
> > > +    set_gpfn_from_mfn(mfn_x(new_mfn), gfn_l);
> > > +
> > > +    put_gfn(parent, gfn_l);
> > > +
> > > +    return p2m->set_entry(p2m, gfn, new_mfn, PAGE_ORDER_4K, p2m_ram_rw,
> >
> > So the child p2m is going to be populated using 4K pages exclusively?
> > Maybe it would make sense to try to use 2M if the parent domain page
> > is also a 2M page or larger?
> 
> No, memory sharing only works on a 4k granularity so that's what we
> are going with. No reason to copy 2M pages when most of it can just be
> shared when broken up.

Oh, OK. For your use-case it likely doesn't matter that much, but
long-running forks would get better performance when using large pages.

> > > @@ -509,6 +509,14 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
> > >
> > >      mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
> > >
> > > +    /* Check if we need to fork the page */
> > > +    if ( (q & P2M_ALLOC) && p2m_is_hole(*t) &&
> > > +         !mem_sharing_fork_page(p2m->domain, gfn, !!(q & P2M_UNSHARE)) )
> > > +    {
> > > +        mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
> > > +    }
> >
> > No need for the braces.
> 
> I would keep them, it helps with readability in this case.

CODING_STYLE mentions that braces should be omitted for blocks with a
single statement, but I'm not sure how strictly we enforce this.

Thanks, Roger.


* Re: [Xen-devel] [PATCH v9 4/5] x86/mem_sharing: reset a fork
  2020-02-24 15:49         ` Tamas K Lengyel
@ 2020-02-24 16:02           ` Roger Pau Monné
  0 siblings, 0 replies; 45+ messages in thread
From: Roger Pau Monné @ 2020-02-24 16:02 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

On Mon, Feb 24, 2020 at 08:49:51AM -0700, Tamas K Lengyel wrote:
> On Mon, Feb 24, 2020 at 8:42 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Mon, Feb 24, 2020 at 08:35:09AM -0700, Tamas K Lengyel wrote:
> > > On Mon, Feb 24, 2020 at 8:13 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > >
> > > > On Fri, Feb 21, 2020 at 10:49:22AM -0800, Tamas K Lengyel wrote:
> > > > > Implement hypercall that allows a fork to shed all memory that got allocated
> > > > > for it during its execution and re-load its vCPU context from the parent VM.
> > > > > This allows the forked VM to reset into the same state the parent VM is in a
> > > > > faster way then creating a new fork would be. Measurements show about a 2x
> > > > > speedup during normal fuzzing operations. Performance may vary depending how
> > > >                                                                           ^ on
> > > > > much memory got allocated for the forked VM. If it has been completely
> > > > > deduplicated from the parent VM then creating a new fork would likely be more
> > > > > performant.
> > > > >
> > > > > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> > > > > ---
> > > > >  xen/arch/x86/mm/mem_sharing.c | 76 +++++++++++++++++++++++++++++++++++
> > > > >  xen/include/public/memory.h   |  1 +
> > > > >  2 files changed, 77 insertions(+)
> > > > >
> > > > > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > > > > index ad5db9d8d5..fb6892aaa6 100644
> > > > > --- a/xen/arch/x86/mm/mem_sharing.c
> > > > > +++ b/xen/arch/x86/mm/mem_sharing.c
> > > > > @@ -1636,6 +1636,59 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
> > > > >      return rc;
> > > > >  }
> > > > >
> > > > > +/*
> > > > > + * The fork reset operation is intended to be used on short-lived forks only.
> > > > > + * There is no hypercall continuation operation implemented for this reason.
> > > > > + * For forks that obtain a larger memory footprint it is likely going to be
> > > > > + * more performant to create a new fork instead of resetting an existing one.
> > > > > + *
> > > > > + * TODO: In case this hypercall would become useful on forks with larger memory
> > > > > + * footprints the hypercall continuation should be implemented.
> > > >
> > > > I'm afraid this is not safe, as users don't have an easy way to know
> > > > whether a fork will have a large memory footprint or not.
> > >
> > > They do, getdomaininfo tells a user exactly how much memory has been
> > > allocated for a domain.
> > >
> > > >
> > > > IMO you either need some kind of check that prevents this function
> > > > from being executed when the domain as too much memory assigned, or
> > > > you need to implement continuations.
> > >
> > > I really don't think we need continuation here with the usecase we
> > > have for this function but I'm also tired of arguing about it, so I'll
> > > just add it even if its going to be dead code.
> > >
> > > >
> > > > Or else this is very likely to trip over the watchdog.
> > >
> > > The watchdog?
> >
> > Yes, Xen has a watchdog and this loop is likely to trigger it if it
> > takes > 5s to complete. The watchdog triggering is a fatal event and
> > leads to host crash.
> 
> OK, just to give you numbers, in the usecase this function is targeted
> at we call it about ~100/s. Even in our worst case scenario we've seen
> so far we haven't had a domain with enough memory deduplication where
> this function took longer then 1s to finish. But again, at this point
> we spent more time arguing about continuation then it takes to add it
> so lets just move on.

Right, adding continuation support is fairly trivial, and it would
keep us on the safe side.

Thanks, Roger.


* Re: [Xen-devel] [PATCH v9 5/5] xen/tools: VM forking toolstack side
  2020-02-21 18:49 ` [Xen-devel] [PATCH v9 5/5] xen/tools: VM forking toolstack side Tamas K Lengyel
@ 2020-02-24 16:12   ` Julien Grall
  2020-02-24 16:19     ` Tamas K Lengyel
  0 siblings, 1 reply; 45+ messages in thread
From: Julien Grall @ 2020-02-24 16:12 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel; +Cc: Anthony PERARD, Ian Jackson, Wei Liu

Hi Tamas,

On 21/02/2020 18:49, Tamas K Lengyel wrote:
> +int libxl_domain_fork_vm(libxl_ctx *ctx, uint32_t pdomid, uint32_t *domid)
> +{
> +    int rc;
> +    struct xen_domctl_createdomain create = {0};
> +    create.flags |= XEN_DOMCTL_CDF_hvm;
> +    create.flags |= XEN_DOMCTL_CDF_hap;
> +    create.flags |= XEN_DOMCTL_CDF_oos_off;
> +    create.arch.emulation_flags = (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI);

This is x86 specific but seems to be written in common code. Did you 
build it on Arm?

> +
> +    create.ssidref = SECINITSID_DOMU;
> +    create.parent_domid = pdomid;
> +    create.max_evtchn_port = 1023;
> +    create.max_grant_frames = LIBXL_MAX_GRANT_FRAMES_DEFAULT;
> +    create.max_maptrack_frames = LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT;

The domain you fork may have different values here. From my 
understanding, the fork requires to have the same parameters as the 
parent. So how do you ensure they are the same?

Cheers,

-- 
Julien Grall


* Re: [Xen-devel] [PATCH v9 5/5] xen/tools: VM forking toolstack side
  2020-02-24 16:12   ` Julien Grall
@ 2020-02-24 16:19     ` Tamas K Lengyel
  2020-02-24 16:30       ` Julien Grall
  0 siblings, 1 reply; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-24 16:19 UTC (permalink / raw)
  To: Julien Grall
  Cc: Anthony PERARD, Xen-devel, Tamas K Lengyel, Ian Jackson, Wei Liu

On Mon, Feb 24, 2020 at 9:13 AM Julien Grall <julien@xen.org> wrote:
>
> Hi Tamas,
>
> On 21/02/2020 18:49, Tamas K Lengyel wrote:
> > +int libxl_domain_fork_vm(libxl_ctx *ctx, uint32_t pdomid, uint32_t *domid)
> > +{
> > +    int rc;
> > +    struct xen_domctl_createdomain create = {0};
> > +    create.flags |= XEN_DOMCTL_CDF_hvm;
> > +    create.flags |= XEN_DOMCTL_CDF_hap;
> > +    create.flags |= XEN_DOMCTL_CDF_oos_off;
> > +    create.arch.emulation_flags = (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI);
>
> This is x86 specific but seems to be written in common code. Did you
> build it on Arm?

I have not; I don't have a setup for Arm at the moment. I guess I'll
just move this function to libxl_x86.c to resolve the issue.

>
> > +
> > +    create.ssidref = SECINITSID_DOMU;
> > +    create.parent_domid = pdomid;
> > +    create.max_evtchn_port = 1023;
> > +    create.max_grant_frames = LIBXL_MAX_GRANT_FRAMES_DEFAULT;
> > +    create.max_maptrack_frames = LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT;
>
> The domain you fork may have different values here. From my
> understanding, the fork requires to have the same parameters as the
> parent. So how do you ensure they are the same?

The parent domain is created by xl. If you create a domain with xl it
will have these parameters set by default.

Tamas


* Re: [Xen-devel] [PATCH v9 5/5] xen/tools: VM forking toolstack side
  2020-02-24 16:19     ` Tamas K Lengyel
@ 2020-02-24 16:30       ` Julien Grall
  2020-02-24 16:45         ` Tamas K Lengyel
  0 siblings, 1 reply; 45+ messages in thread
From: Julien Grall @ 2020-02-24 16:30 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Anthony PERARD, Xen-devel, Tamas K Lengyel, Ian Jackson, Wei Liu

Hi,

On 24/02/2020 16:19, Tamas K Lengyel wrote:
> On Mon, Feb 24, 2020 at 9:13 AM Julien Grall <julien@xen.org> wrote:
>>
>> Hi Tamas,
>>
>> On 21/02/2020 18:49, Tamas K Lengyel wrote:
>>> +int libxl_domain_fork_vm(libxl_ctx *ctx, uint32_t pdomid, uint32_t *domid)
>>> +{
>>> +    int rc;
>>> +    struct xen_domctl_createdomain create = {0};
>>> +    create.flags |= XEN_DOMCTL_CDF_hvm;
>>> +    create.flags |= XEN_DOMCTL_CDF_hap;
>>> +    create.flags |= XEN_DOMCTL_CDF_oos_off;
>>> +    create.arch.emulation_flags = (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI);
>>
>> This is x86 specific but seems to be written in common code. Did you
>> build it on Arm?
> 
> I have not, don't have a setup for ARM at the moment. I guess I'll
> just move this function to libxl_x86.c to resolve the issue.

It is fairly easy to set up a build environment for Arm. You can use QEMU 
user emulation and an Arm rootfs.

> 
>>
>>> +
>>> +    create.ssidref = SECINITSID_DOMU;
>>> +    create.parent_domid = pdomid;
>>> +    create.max_evtchn_port = 1023;
>>> +    create.max_grant_frames = LIBXL_MAX_GRANT_FRAMES_DEFAULT;
>>> +    create.max_maptrack_frames = LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT;
>>
>> The domain you fork may have different values here. From my
>> understanding, the fork requires to have the same parameters as the
>> parent. So how do you ensure they are the same?
> 
> The parent domain is created by xl. If you create a domain with xl it
> will have these parameters set by default.

I hope you are aware that you can override most of those parameters in 
the guest configuration file. It would be good to at least write down 
the limitations so people don't spend ages figuring out why it does 
not work.

Cheers,

-- 
Julien Grall


* Re: [Xen-devel] [PATCH v9 5/5] xen/tools: VM forking toolstack side
  2020-02-24 16:30       ` Julien Grall
@ 2020-02-24 16:45         ` Tamas K Lengyel
  0 siblings, 0 replies; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-24 16:45 UTC (permalink / raw)
  To: Julien Grall
  Cc: Anthony PERARD, Xen-devel, Tamas K Lengyel, Ian Jackson, Wei Liu

On Mon, Feb 24, 2020 at 9:30 AM Julien Grall <julien@xen.org> wrote:
>
> Hi,
>
> On 24/02/2020 16:19, Tamas K Lengyel wrote:
> > On Mon, Feb 24, 2020 at 9:13 AM Julien Grall <julien@xen.org> wrote:
> >>
> >> Hi Tamas,
> >>
> >> On 21/02/2020 18:49, Tamas K Lengyel wrote:
> >>> +int libxl_domain_fork_vm(libxl_ctx *ctx, uint32_t pdomid, uint32_t *domid)
> >>> +{
> >>> +    int rc;
> >>> +    struct xen_domctl_createdomain create = {0};
> >>> +    create.flags |= XEN_DOMCTL_CDF_hvm;
> >>> +    create.flags |= XEN_DOMCTL_CDF_hap;
> >>> +    create.flags |= XEN_DOMCTL_CDF_oos_off;
> >>> +    create.arch.emulation_flags = (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI);
> >>
> >> This is x86 specific but seems to be written in common code. Did you
> >> build it on Arm?
> >
> > I have not, don't have a setup for ARM at the moment. I guess I'll
> > just move this function to libxl_x86.c to resolve the issue.
>
> It is fairly easy to setup a build environment for Arm. You can use QEMU
> user emulation and a Arm rootfs.
>
> >
> >>
> >>> +
> >>> +    create.ssidref = SECINITSID_DOMU;
> >>> +    create.parent_domid = pdomid;
> >>> +    create.max_evtchn_port = 1023;
> >>> +    create.max_grant_frames = LIBXL_MAX_GRANT_FRAMES_DEFAULT;
> >>> +    create.max_maptrack_frames = LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT;
> >>
> >> The domain you fork may have different values here. From my
> >> understanding, the fork requires to have the same parameters as the
> >> parent. So how do you ensure they are the same?
> >
> > The parent domain is created by xl. If you create a domain with xl it
> > will have these parameters set by default.
>
> I hope you are aware that you can override most of those parameters in
> the guest configuration file. It would be good to at least write down
> the limitations so people doesn't spend ages figuring out why it does
> not work.

Fair enough.

Tamas


* Re: [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking
  2020-02-24 15:45     ` Tamas K Lengyel
  2020-02-24 15:59       ` Roger Pau Monné
@ 2020-02-24 22:14       ` Tamas K Lengyel
  2020-02-25  9:40         ` Roger Pau Monné
  2020-02-24 22:26       ` Tamas K Lengyel
  2020-02-25 13:30       ` Jan Beulich
  3 siblings, 1 reply; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-24 22:14 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

> > > +int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool unsharing)
> > > +{
> > > +    int rc = -ENOENT;
> > > +    shr_handle_t handle;
> > > +    struct domain *parent;
> >
> > Can you constify parent, I assume there are no changes made to the
> > parent domain, just the forked one.
>
> Sure.

Actually, no, can't do. The parent could get modified here, since we
could end up turning the parent's memory into shared entries. So using
const here is not justified, since the parent can and does get
modified.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking
  2020-02-24 15:45     ` Tamas K Lengyel
  2020-02-24 15:59       ` Roger Pau Monné
  2020-02-24 22:14       ` Tamas K Lengyel
@ 2020-02-24 22:26       ` Tamas K Lengyel
  2020-02-25  9:40         ` Roger Pau Monné
  2020-02-25 13:30       ` Jan Beulich
  3 siblings, 1 reply; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-24 22:26 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

> > Also cd and d are not very helpful names; I would just use child and
> > parent.
>
> Sure.

Looking at this in the context of the whole pre-existing code-base,
the names cd & d are used throughout mem_sharing. So for consistency I
will keep it for now.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking
  2020-02-24 22:14       ` Tamas K Lengyel
@ 2020-02-25  9:40         ` Roger Pau Monné
  0 siblings, 0 replies; 45+ messages in thread
From: Roger Pau Monné @ 2020-02-25  9:40 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

On Mon, Feb 24, 2020 at 03:14:56PM -0700, Tamas K Lengyel wrote:
> > > > +int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool unsharing)
> > > > +{
> > > > +    int rc = -ENOENT;
> > > > +    shr_handle_t handle;
> > > > +    struct domain *parent;
> > >
> > > Can you constify parent, I assume there are no changes made to the
> > > parent domain, just the forked one.
> >
> > Sure.
> 
> Actually, no can't do. The parent could get modified here, since we
> could end up making the parent's memory into shared entries. So using
> const here is not justified since the parent can and does get
> modified.

Oh, that's fine.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking
  2020-02-24 22:26       ` Tamas K Lengyel
@ 2020-02-25  9:40         ` Roger Pau Monné
  0 siblings, 0 replies; 45+ messages in thread
From: Roger Pau Monné @ 2020-02-25  9:40 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

On Mon, Feb 24, 2020 at 03:26:05PM -0700, Tamas K Lengyel wrote:
> > > Also cd and d are not very helpful names; I would just use child and
> > > parent.
> >
> > Sure.
> 
> Looking at this in the context of the whole pre-existing code-base,
> the names cd & d are used throughout mem_sharing. So for consistency I
> will keep it for now.

Ack, if that's the prevailing style of the file and maintainers are
fine with it.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking
  2020-02-21 18:49 ` [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking Tamas K Lengyel
  2020-02-24 12:39   ` Roger Pau Monné
@ 2020-02-25 10:04   ` Roger Pau Monné
  2020-02-25 11:43     ` Tamas K Lengyel
  1 sibling, 1 reply; 45+ messages in thread
From: Roger Pau Monné @ 2020-02-25 10:04 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	Andrew Cooper, Ian Jackson, George Dunlap, Tamas K Lengyel,
	Jan Beulich, xen-devel

On Fri, Feb 21, 2020 at 10:49:21AM -0800, Tamas K Lengyel wrote:
> VM forking is the process of creating a domain with an empty memory space and a
> parent domain specified from which to populate the memory when necessary. For
> the new domain to be functional the VM state is copied over as part of the fork
> operation (HVM params, hap allocation, etc).

I've just realized that I'm not sure how special pages are handled:
what happens to the xenstore, console, shared info pages, or the
vcpu info pages if the parent has those registered?

Also, what happens to pages that are being used for DMA with emulated
devices? Will QEMU foreign mappings also trigger the populate on
demand routine, so that external emulators can access those?

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking
  2020-02-25 10:04   ` Roger Pau Monné
@ 2020-02-25 11:43     ` Tamas K Lengyel
  2020-02-25 12:06       ` Roger Pau Monné
  0 siblings, 1 reply; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-25 11:43 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

On Tue, Feb 25, 2020 at 3:05 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Fri, Feb 21, 2020 at 10:49:21AM -0800, Tamas K Lengyel wrote:
> > VM forking is the process of creating a domain with an empty memory space and a
> > parent domain specified from which to populate the memory when necessary. For
> > the new domain to be functional the VM state is copied over as part of the fork
> > operation (HVM params, hap allocation, etc).
>
> I've just realized that I'm not sure how special pages are handled:
> what happens to the xenstore, console, shared info pages, or the
> vcpu info pages if the parent has those registered?

The fork starts out with an empty p2m so these pages are not present.
In case the guest itself tries to access these pages, or a foreign
mapping is set up for them, then that will trigger fork_page from the
parent. I see that the VM restore path clears the pages for
HVM_PARAM_CONSOLE_PFN, HVM_PARAM_STORE_PFN, HVM_PARAM_IOREQ_PFN and
HVM_PARAM_BUFIOREQ_PFN but doesn't really explain why that would be
required. I can certainly add the same special handling for these HVM
params when they get copied during the fork hypercall.

As for the vcpu info page, I'm not sure where that gets allocated and
mapped normally. I don't see any special handling for that in the
save/restore paths. We use the standard createdomain hypercall to
setup the VM that will be used for the fork, then use vcpu_create to
bring up the vCPUs and just load them with the hvm context, pretty
much the same way the restore path would.

> Also, what happens to pages that are being used for DMA with emulated
> devices? Will QEMU foreign mappings also trigger the populate on
> demand routine, so that external emulators can access those?

Foreign mappings do trigger the fork_page routine, yes. Same for
setting mem_access permissions.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking
  2020-02-25 11:43     ` Tamas K Lengyel
@ 2020-02-25 12:06       ` Roger Pau Monné
  2020-02-25 12:23         ` Tamas K Lengyel
  0 siblings, 1 reply; 45+ messages in thread
From: Roger Pau Monné @ 2020-02-25 12:06 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

On Tue, Feb 25, 2020 at 04:43:30AM -0700, Tamas K Lengyel wrote:
> On Tue, Feb 25, 2020 at 3:05 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Fri, Feb 21, 2020 at 10:49:21AM -0800, Tamas K Lengyel wrote:
> > > VM forking is the process of creating a domain with an empty memory space and a
> > > parent domain specified from which to populate the memory when necessary. For
> > > the new domain to be functional the VM state is copied over as part of the fork
> > > operation (HVM params, hap allocation, etc).
> >
> > I've just realized that I'm not sure how special pages are handled:
> > what happens to the xenstore, console, shared info pages, or the
> > vcpu info pages if the parent has those registered?
> 
> The fork starts out with an empty p2m so these pages are not present.
> In case the guest itself tries to access these pages, or a foreign
> mapping is set up for them, then that will trigger fork_page from the
> parent. I see that the VM restore path clears the pages for
> HVM_PARAM_CONSOLE_PFN, HVM_PARAM_STORE_PFN, HVM_PARAM_IOREQ_PFN and
> HVM_PARAM_BUFIOREQ_PFN but doesn't really explain why that would be
> required. I can certainly add the same special handling for these HVM
> params when they get copied during the fork hypercall.

Those params are likely set by the toolstack when creating the domain.
I think you will have to at least copy the values from the parent and
maybe pre-populate them when forking; that depends on whether internal
Xen accesses will also trigger the populate logic.

> As for the vcpu info page, I'm not sure where that gets allocated and
> mapped normally. I don't see any special handling for that in the
> save/restore paths.

The shared info page, the vcpu info and the stolen time pages are torn
down during suspend/restore, and the guest re-registers them when
resuming. Since the guest is not aware of the fork happening, you will
have to populate those on fork and make sure the domain fields
pointing to them are updated, so that Xen can continue updating this
shared areas.

> We use the standard createdomain hypercall to
> setup the VM that will be used for the fork, then use vcpu_create to
> bring up the vCPUs and just load them with the hvm context, pretty
> much the same way the restore path would.

Depending on how much of the creation process you skip you might end
up missing bits, there are a bunch of hypercalls involved in domain
creation.

Note also that during a restore the guest is aware of such a process,
and will know that it needs to re-register some stuff. That's not
the case when forking: since the guest is not aware, you need to make
sure everything is in place. There are also the grant table pages,
which I think should also be handled somehow (or we need to at least
note this isn't handled and will need fixing).

> > Also, what happens to pages that are being used for DMA with emulated
> > devices? Will QEMU foreign mappings also trigger the populate on
> > demand routine, so that external emulators can access those?
> 
> Foreign mappings do trigger the fork_page routine, yes. Same for
> setting mem_access permissions.

OK, that's fine then.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking
  2020-02-25 12:06       ` Roger Pau Monné
@ 2020-02-25 12:23         ` Tamas K Lengyel
  2020-02-25 14:23           ` Tamas K Lengyel
  0 siblings, 1 reply; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-25 12:23 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

On Tue, Feb 25, 2020 at 5:06 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Tue, Feb 25, 2020 at 04:43:30AM -0700, Tamas K Lengyel wrote:
> > On Tue, Feb 25, 2020 at 3:05 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >
> > > On Fri, Feb 21, 2020 at 10:49:21AM -0800, Tamas K Lengyel wrote:
> > > > VM forking is the process of creating a domain with an empty memory space and a
> > > > parent domain specified from which to populate the memory when necessary. For
> > > > the new domain to be functional the VM state is copied over as part of the fork
> > > > operation (HVM params, hap allocation, etc).
> > >
> > > I've just realized that I'm not sure how special pages are handled:
> > > what happens to the xenstore, console, shared info pages, or the
> > > vcpu info pages if the parent has those registered?
> >
> > The fork starts out with an empty p2m so these pages are not present.
> > In case the guest itself tries to access these pages, or a foreign
> > mapping is set up for them, then that will trigger fork_page from the
> > parent. I see that the VM restore path clears the pages for
> > HVM_PARAM_CONSOLE_PFN, HVM_PARAM_STORE_PFN, HVM_PARAM_IOREQ_PFN and
> > HVM_PARAM_BUFIOREQ_PFN but doesn't really explain why that would be
> > required. I can certainly add the same special handling for these HVM
> > params when they get copied during the fork hypercall.
>
> Those params are likely set by the toolstack when creating the domain.
> I think you will have to at least copy the values from the parent and
> maybe pre-populate them when forking, that depends on whether internal
> Xen accesses will also trigger the populate logic.

All params and the hvm context gets copied from the parent already.
Whether or not these pages need to be manually populated, I'm not
sure. I haven't noticed any issues when forking a standard VM and not
pre-populating these pages. But I guess it doesn't hurt to be as close
to the save/restore routine as possible.

>
> > As for the vcpu info page, I'm not sure where that gets allocated and
> > mapped normally. I don't see any special handling for that in the
> > save/restore paths.
>
> The shared info page, the vcpu info and the stolen time pages are torn
> down during suspend/restore, and the guest re-registers them when
> resuming. Since the guest is not aware of the fork happening, you will
> have to populate those on fork and make sure the domain fields
> pointing to them are updated, so that Xen can continue updating this
> shared areas.

Could you point me to where this code lives or where these pages are
being mapped into the guest? I've been grepping for a while now and
it's still not clear to me.

>
> > We use the standard createdomain hypercall to
> > setup the VM that will be used for the fork, then use vcpu_create to
> > bring up the vCPUs and just load them with the hvm context, pretty
> > much the same way the restore path would.
>
> Depending on how much of the creation process you skip you might end
> up missing bits, there are a bunch of hypercalls involved in domain
> creation.

I modeled the forking process heavily on the save/restore routines.
So everything that gets called during a VM restore gets called during
a VM fork, except it happens within Xen instead of the toolstack
issuing a bunch of hypercalls, since that's a lot faster. The only
difference between a save/restore and a fork should be that the p2m of
the fork isn't populated since that happens during runtime.

>
> Note also that during a restore the guest is aware of such a process,
> and will know that it needs to re-register some stuff. That's not
> the case when forking: since the guest is not aware, you need to make
> sure everything is in place. There are also the grant table pages,
> which I think should also be handled somehow (or we need to at least
> note this isn't handled and will need fixing).

True, the fork is not aware that something happened (and we want to
keep it that way). But right now everything seems to "just work" as
long as a standard VM is used. There are surely a million corner
cases left that I haven't covered.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Xen-devel] [PATCH v9 1/5] xen/x86: Make hap_get_allocation accessible
  2020-02-24 15:20   ` Roger Pau Monné
@ 2020-02-25 13:16     ` Jan Beulich
  2020-02-25 13:21       ` Tamas K Lengyel
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2020-02-25 13:16 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: xen-devel, Andrew Cooper, George Dunlap, Wei Liu, Roger Pau Monné

On 24.02.2020 16:20, Roger Pau Monné wrote:
> On Fri, Feb 21, 2020 at 10:49:19AM -0800, Tamas K Lengyel wrote:
>> During VM forking we'll copy the parent domain's parameters to the client,
>> including the HAP shadow memory setting that is used for storing the domain's
>> EPT. We'll copy this in the hypervisor instead of doing it during toolstack launch
>> to allow the domain to start executing and unsharing memory before (or
>> even completely without) the toolstack.
>>
>> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

And with maintainership now having changed

Acked-by: Jan Beulich <jbeulich@suse.com>

> I would also be fine with this merged into the respective patch where
> hap_get_allocation gets called.

Indeed that's the reason why this hasn't gone in yet, I think.
There's no use from putting it in without also putting in the
patch where this is actually needed. In fact it introduces a
"stray lack of static" instance until then, which typically we
try to clean up.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Xen-devel] [PATCH v9 1/5] xen/x86: Make hap_get_allocation accessible
  2020-02-25 13:16     ` Jan Beulich
@ 2020-02-25 13:21       ` Tamas K Lengyel
  0 siblings, 0 replies; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-25 13:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tamas K Lengyel, Wei Liu, Andrew Cooper, George Dunlap,
	Xen-devel, Roger Pau Monné

On Tue, Feb 25, 2020 at 6:17 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 24.02.2020 16:20, Roger Pau Monné wrote:
> > On Fri, Feb 21, 2020 at 10:49:19AM -0800, Tamas K Lengyel wrote:
> >> During VM forking we'll copy the parent domain's parameters to the client,
> >> including the HAP shadow memory setting that is used for storing the domain's
> >> EPT. We'll copy this in the hypervisor instead of doing it during toolstack launch
> >> to allow the domain to start executing and unsharing memory before (or
> >> even completely without) the toolstack.
> >>
> >> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> >
> > Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
>
> And with maintainership now having changed
>
> Acked-by: Jan Beulich <jbeulich@suse.com>
>
> > I would also be fine with this merged into the respective patch where
> > hap_get_allocation gets called.
>
> Indeed that's the reason why this hasn't gone in yet, I think.
There's no use in putting it in without also putting in the
> patch where this is actually needed. In fact it introduces a
> "stray lack of static" instance until then, which typically we
> try to clean up.

I can certainly fold it in to the larger patch if that's preferred.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking
  2020-02-24 12:39   ` Roger Pau Monné
  2020-02-24 15:45     ` Tamas K Lengyel
@ 2020-02-25 13:28     ` Jan Beulich
  2020-02-25 13:39       ` Tamas K Lengyel
  1 sibling, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2020-02-25 13:28 UTC (permalink / raw)
  To: Roger Pau Monné, Tamas K Lengyel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	Andrew Cooper, Ian Jackson, George Dunlap, Tamas K Lengyel,
	xen-devel

On 24.02.2020 13:39, Roger Pau Monné wrote:
> On Fri, Feb 21, 2020 at 10:49:21AM -0800, Tamas K Lengyel wrote:
>> +static int mem_sharing_fork(struct domain *d, struct domain *cd)
>> +{
>> +    int rc = -EINVAL;
>> +
>> +    if ( !cd->controller_pause_count )
>> +        return rc;
> 
> -EBUSY might be better here.
> 
>> +
>> +    /*
>> +     * We only want to get and pause the parent once, not each time this
>> +     * operation is restarted due to preemption.
>> +     */
>> +    if ( !cd->parent_paused )
>> +    {
>> +        ASSERT(get_domain(d));
> 
> We are trying to avoid such constructs, instead I suggest:
> 
> if ( !get_domain(parent) )
> {
>     ASSERT_UNREACHABLE();
>     return -EBUSY;
> }

But isn't the ASSERT() here wrong anyway? I.e. what is it that
guarantees that d hasn't gone away? If it's the caller of this
function, then wouldn't it be get_knownalive_domain() that
wants using here?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking
  2020-02-24 15:45     ` Tamas K Lengyel
                         ` (2 preceding siblings ...)
  2020-02-24 22:26       ` Tamas K Lengyel
@ 2020-02-25 13:30       ` Jan Beulich
  3 siblings, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2020-02-25 13:30 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Xen-devel, Roger Pau Monné

On 24.02.2020 16:45, Tamas K Lengyel wrote:
> On Mon, Feb 24, 2020 at 5:39 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>> On Fri, Feb 21, 2020 at 10:49:21AM -0800, Tamas K Lengyel wrote:
>>> --- a/xen/arch/x86/mm/p2m.c
>>> +++ b/xen/arch/x86/mm/p2m.c
>>> @@ -509,6 +509,14 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
>>>
>>>      mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
>>>
>>> +    /* Check if we need to fork the page */
>>> +    if ( (q & P2M_ALLOC) && p2m_is_hole(*t) &&
>>> +         !mem_sharing_fork_page(p2m->domain, gfn, !!(q & P2M_UNSHARE)) )
>>> +    {
>>> +        mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
>>> +    }
>>
>> No need for the braces.
> 
> I would keep them, it helps with readability in this case.

I agree with Roger here, fwiw.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking
  2020-02-25 13:28     ` Jan Beulich
@ 2020-02-25 13:39       ` Tamas K Lengyel
  0 siblings, 0 replies; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-25 13:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Xen-devel, Roger Pau Monné

On Tue, Feb 25, 2020 at 6:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 24.02.2020 13:39, Roger Pau Monné wrote:
> > On Fri, Feb 21, 2020 at 10:49:21AM -0800, Tamas K Lengyel wrote:
> >> +static int mem_sharing_fork(struct domain *d, struct domain *cd)
> >> +{
> >> +    int rc = -EINVAL;
> >> +
> >> +    if ( !cd->controller_pause_count )
> >> +        return rc;
> >
> > -EBUSY might be better here.
> >
> >> +
> >> +    /*
> >> +     * We only want to get and pause the parent once, not each time this
> >> +     * operation is restarted due to preemption.
> >> +     */
> >> +    if ( !cd->parent_paused )
> >> +    {
> >> +        ASSERT(get_domain(d));
> >
> > We are trying to avoid such constructs, instead I suggest:
> >
> > if ( !get_domain(parent) )
> > {
> >     ASSERT_UNREACHABLE();
> >     return -EBUSY;
> > }
>
> But isn't the ASSERT() here wrong anyway? I.e. what is it that
> guarantees that d hasn't gone away? If it's the caller of this
> function, then wouldn't it be get_knownalive_domain() that
> wants using here?

During the fork hypercall the parent's pause count is incremented via
domain_pause. It can't go away until the reference count goes down to
zero, which only happens when its forks are all gone.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Xen-devel] [PATCH v9 4/5] x86/mem_sharing: reset a fork
  2020-02-24 15:35     ` Tamas K Lengyel
  2020-02-24 15:42       ` Roger Pau Monné
@ 2020-02-25 13:39       ` Jan Beulich
  2020-02-25 13:45         ` Tamas K Lengyel
  1 sibling, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2020-02-25 13:39 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Xen-devel, Roger Pau Monné

On 24.02.2020 16:35, Tamas K Lengyel wrote:
> On Mon, Feb 24, 2020 at 8:13 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>> On Fri, Feb 21, 2020 at 10:49:22AM -0800, Tamas K Lengyel wrote:
>>> --- a/xen/arch/x86/mm/mem_sharing.c
>>> +++ b/xen/arch/x86/mm/mem_sharing.c
>>> @@ -1636,6 +1636,59 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
>>>      return rc;
>>>  }
>>>
>>> +/*
>>> + * The fork reset operation is intended to be used on short-lived forks only.
>>> + * There is no hypercall continuation operation implemented for this reason.
>>> + * For forks that obtain a larger memory footprint it is likely going to be
>>> + * more performant to create a new fork instead of resetting an existing one.
>>> + *
>>> + * TODO: In case this hypercall would become useful on forks with larger memory
>>> + * footprints the hypercall continuation should be implemented.
>>
>> I'm afraid this is not safe, as users don't have an easy way to know
>> whether a fork will have a large memory footprint or not.
> 
> They do, getdomaininfo tells a user exactly how much memory has been
> allocated for a domain.

This tells the tool stack how much memory a guest has in absolute
numbers, but it doesn't tell it whether Xen would consider this
"large".

>>> +    {
>>> +        p2m_type_t p2mt;
>>> +        p2m_access_t p2ma;
>>> +        gfn_t gfn;
>>> +        mfn_t mfn = page_to_mfn(page);
>>> +
>>> +        if ( !mfn_valid(mfn) )
>>> +            continue;
>>> +
>>> +        gfn = mfn_to_gfn(cd, mfn);
>>> +        mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
>>> +                                    0, NULL, false);
>>> +
>>> +        if ( !p2m_is_ram(p2mt) || p2m_is_shared(p2mt) )
>>> +            continue;
>>> +
>>> +        /* take an extra reference */
>>> +        if ( !get_page(page, cd) )
>>> +            continue;
>>> +
>>> +        rc = p2m->set_entry(p2m, gfn, INVALID_MFN, PAGE_ORDER_4K,
>>> +                            p2m_invalid, p2m_access_rwx, -1);
>>> +        ASSERT(!rc);
>>
>> Can you handle this gracefully?
> 
> Nope. This should never happen, so if it does, something is very wrong
> in some other part of Xen.


In such a case, please put in a comment explaining why failure is
impossible. In the general case e.g. a 2Mb page may need splitting,
which may yield -ENOMEM. Such a comment will then also be useful in
case a new failure mode gets added to ->set_entry(), where it then
will need judging whether the assumption here still holds. (This is
also why in general it'd be better to handle the error. It'll still
be better to crash the guest than the host in case you can't. See
the bottom of ./CODING_STYLE.)

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Xen-devel] [PATCH v9 4/5] x86/mem_sharing: reset a fork
  2020-02-25 13:39       ` Jan Beulich
@ 2020-02-25 13:45         ` Tamas K Lengyel
  2020-02-25 14:13           ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-25 13:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Xen-devel, Roger Pau Monné

On Tue, Feb 25, 2020 at 6:39 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 24.02.2020 16:35, Tamas K Lengyel wrote:
> > On Mon, Feb 24, 2020 at 8:13 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >> On Fri, Feb 21, 2020 at 10:49:22AM -0800, Tamas K Lengyel wrote:
> >>> --- a/xen/arch/x86/mm/mem_sharing.c
> >>> +++ b/xen/arch/x86/mm/mem_sharing.c
> >>> @@ -1636,6 +1636,59 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
> >>>      return rc;
> >>>  }
> >>>
> >>> +/*
> >>> + * The fork reset operation is intended to be used on short-lived forks only.
> >>> + * There is no hypercall continuation operation implemented for this reason.
> >>> + * For forks that obtain a larger memory footprint it is likely going to be
> >>> + * more performant to create a new fork instead of resetting an existing one.
> >>> + *
> >>> + * TODO: In case this hypercall would become useful on forks with larger memory
> >>> + * footprints the hypercall continuation should be implemented.
> >>
> >> I'm afraid this is not safe, as users don't have an easy way to know
> >> whether a fork will have a large memory footprint or not.
> >
> > They do, getdomaininfo tells a user exactly how much memory has been
> > allocated for a domain.
>
> This tells the tool stack how much memory a guest has in absolute
> numbers, but it doesn't tell it whether Xen would consider this
> "large".
>
> >>> +    {
> >>> +        p2m_type_t p2mt;
> >>> +        p2m_access_t p2ma;
> >>> +        gfn_t gfn;
> >>> +        mfn_t mfn = page_to_mfn(page);
> >>> +
> >>> +        if ( !mfn_valid(mfn) )
> >>> +            continue;
> >>> +
> >>> +        gfn = mfn_to_gfn(cd, mfn);
> >>> +        mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
> >>> +                                    0, NULL, false);
> >>> +
> >>> +        if ( !p2m_is_ram(p2mt) || p2m_is_shared(p2mt) )
> >>> +            continue;
> >>> +
> >>> +        /* take an extra reference */
> >>> +        if ( !get_page(page, cd) )
> >>> +            continue;
> >>> +
> >>> +        rc = p2m->set_entry(p2m, gfn, INVALID_MFN, PAGE_ORDER_4K,
> >>> +                            p2m_invalid, p2m_access_rwx, -1);
> >>> +        ASSERT(!rc);
> >>
> >> Can you handle this gracefully?
> >
> > Nope. This should never happen, so if it does, something is very wrong
> > in some other part of Xen.
>
>
> In such a case, please put in a comment explaining why failure is
> impossible. In the general case e.g. a 2Mb page may need splitting,
> which may yield -ENOMEM. Such a comment will then also be useful in
> case a new failure mode gets added to ->set_entry(), where it then
> will need judging whether the assumption here still holds. (This is
> also why in general it'd be better to handle the error. It'll still
> be better to crash the guest than the host in case you can't. See
> the bottom of ./CODING_STYLE.)

The mem_sharing codebase already uses ASSERT(!rc) on p2m->set_entry
when removing pages like we do here (see relinquish_shared_pages).
This only gets called on shared pages that we know for sure are
present. Since these are shared pages we know that their size is 4k,
thus there is no splitting. I can certainly add a comment to this
effect to spell out why the ASSERT is appropriate.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [Xen-devel] [PATCH v9 4/5] x86/mem_sharing: reset a fork
  2020-02-25 13:45         ` Tamas K Lengyel
@ 2020-02-25 14:13           ` Jan Beulich
  2020-02-25 14:26             ` Tamas K Lengyel
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2020-02-25 14:13 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Xen-devel, Roger Pau Monné

On 25.02.2020 14:45, Tamas K Lengyel wrote:
> On Tue, Feb 25, 2020 at 6:39 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 24.02.2020 16:35, Tamas K Lengyel wrote:
>>> On Mon, Feb 24, 2020 at 8:13 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>> On Fri, Feb 21, 2020 at 10:49:22AM -0800, Tamas K Lengyel wrote:
>>>>> --- a/xen/arch/x86/mm/mem_sharing.c
>>>>> +++ b/xen/arch/x86/mm/mem_sharing.c
>>>>> @@ -1636,6 +1636,59 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
>>>>>      return rc;
>>>>>  }
>>>>>
>>>>> +/*
>>>>> + * The fork reset operation is intended to be used on short-lived forks only.
>>>>> + * There is no hypercall continuation operation implemented for this reason.
>>>>> + * For forks that obtain a larger memory footprint it is likely going to be
>>>>> + * more performant to create a new fork instead of resetting an existing one.
>>>>> + *
>>>>> + * TODO: In case this hypercall would become useful on forks with larger memory
>>>>> + * footprints the hypercall continuation should be implemented.
>>>>
>>>> I'm afraid this is not safe, as users don't have an easy way to know
>>>> whether a fork will have a large memory footprint or not.
>>>
>>> They do, getdomaininfo tells a user exactly how much memory has been
>>> allocated for a domain.
>>
>> This tells the tool stack how much memory a guest has in absolute
>> numbers, but it doesn't tell it whether Xen would consider this
>> "large".
>>
>>>>> +    {
>>>>> +        p2m_type_t p2mt;
>>>>> +        p2m_access_t p2ma;
>>>>> +        gfn_t gfn;
>>>>> +        mfn_t mfn = page_to_mfn(page);
>>>>> +
>>>>> +        if ( !mfn_valid(mfn) )
>>>>> +            continue;
>>>>> +
>>>>> +        gfn = mfn_to_gfn(cd, mfn);
>>>>> +        mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
>>>>> +                                    0, NULL, false);
>>>>> +
>>>>> +        if ( !p2m_is_ram(p2mt) || p2m_is_shared(p2mt) )
>>>>> +            continue;
>>>>> +
>>>>> +        /* take an extra reference */
>>>>> +        if ( !get_page(page, cd) )
>>>>> +            continue;
>>>>> +
>>>>> +        rc = p2m->set_entry(p2m, gfn, INVALID_MFN, PAGE_ORDER_4K,
>>>>> +                            p2m_invalid, p2m_access_rwx, -1);
>>>>> +        ASSERT(!rc);
>>>>
>>>> Can you handle this gracefully?
>>>
>>> Nope. This should never happen, so if it does, something is very wrong
>>> in some other part of Xen.
>>
>>
>> In such a case, please put in a comment explaining why failure is
>> impossible. In the general case e.g. a 2Mb page may need splitting,
>> which may yield -ENOMEM. Such a comment will then also be useful in
>> case a new failure mode gets added to ->set_entry(), where it then
>> will need judging whether the assumption here still holds. (This is
>> also why in general it'd be better to handle the error. It'll still
>> be better to crash the guest than the host in case you can't. See
>> the bottom of ./CODING_STYLE.)
> 
> The mem_sharing codebase already uses ASSERT(!rc) on p2m->set_entry
> when removing pages like we do here (see relinquish_shared_pages).
> This only gets called on shared pages that we know for sure are
> present. Since these are shared pages we know that their size is 4k,
> thus there is no splitting. I can certainly add a comment to this
> effect to spell out why the ASSERT is appropriate.

Well, if this is a pre-existing pattern in the file, then - you
being the maintainer - so be it. I consider this bad practice though,
and I would suggest that every such site needs a comment (possibly
all but one simply referring to the one where things actually get
explained).

Jan


* Re: [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking
  2020-02-25 12:23         ` Tamas K Lengyel
@ 2020-02-25 14:23           ` Tamas K Lengyel
  0 siblings, 0 replies; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-25 14:23 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel

> > Note also that during a restore the guest is aware of such process,
> > and will know that it needs to re-register some stuff, but that's not
> > the case when forking; since the guest is not aware, you need to make
> > sure everything is in place. There are also the grant table pages,
> > which I think should also be handled somehow (or we need to at least
> > note this isn't handled and will need fixing).
>
> True, the fork is not aware that something happened (and we want to
> keep it that way). But right now everything seems to "just work" as
> far as a standard VM is used. There must be a million corner cases left
> that I haven't covered for sure.
>

I've implemented forking for the vcpu_info pages, and while I was
testing I noticed that a Linux VM behaves differently during forking
than a Windows VM. Forking a Windows VM at runtime works fine: the
VNC screen is responsive, I can log in, and in every way the fork
behaves like a regular domain. Forking a Linux VM at runtime,
however, results in a frozen VNC screen. The VM otherwise runs fine;
I use it in my fuzz tests, and tracing it via VMI shows that both the
kernel and userspace programs continue to execute normally in the
fork. So there must be some "glue" the Linux guest needs to wire
things up with the toolstack that I'm missing. When I first save the
Linux VM, restore it with "xl restore -p", and then create forks, the
VNC console becomes responsive, just as with the Windows VM.

At the moment it's not entirely clear to me what the Linux guest
needs that I'm missing, so I'm inclined to declare this a known
limitation and move on, since it's not critical for our use-case.

Tamas


* Re: [Xen-devel] [PATCH v9 4/5] x86/mem_sharing: reset a fork
  2020-02-25 14:13           ` Jan Beulich
@ 2020-02-25 14:26             ` Tamas K Lengyel
  0 siblings, 0 replies; 45+ messages in thread
From: Tamas K Lengyel @ 2020-02-25 14:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Julien Grall, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson, George Dunlap,
	Xen-devel, Roger Pau Monné

> >> In such a case, please put in a comment explaining why failure is
> >> impossible. In the general case e.g. a 2Mb page may need splitting,
> >> which may yield -ENOMEM. Such a comment will then also be useful in
> >> case a new failure mode gets added to ->set_entry(), where it then
> >> will need judging whether the assumption here still holds. (This is
> >> also why in general it'd be better to handle the error. It'll still
> >> be better to crash the guest than the host in case you can't. See
> >> the bottom of ./CODING_STYLE.)
> >
> > The mem_sharing codebase already uses ASSERT(!rc) on p2m->set_entry
> > when removing pages like we do here (see relinquish_shared_pages).
> > This only gets called on shared pages that we know for sure are
> > present. Since these are shared pages we know that their size is 4k,
> > thus there is no splitting. I can certainly add a comment to this
> > effect to spell out why the ASSERT is appropriate.
>
> Well, if this is a pre-existing pattern in the file, then - you
> being the maintainer - so be it. I consider this bad practice though,
> and I would suggest that every such site needs a comment (possibly
> all but one simply referring to the one where things actually get
> explained).
>

Noted. I think it's fine for an experimental codebase.

Tamas


end of thread [~2020-02-25 14:27 UTC]

Thread overview: 45+ messages
2020-02-21 18:49 [Xen-devel] [PATCH v9 0/5] VM forking Tamas K Lengyel
2020-02-21 18:49 ` [Xen-devel] [PATCH v9 1/5] xen/x86: Make hap_get_allocation accessible Tamas K Lengyel
2020-02-24 15:20   ` Roger Pau Monné
2020-02-25 13:16     ` Jan Beulich
2020-02-25 13:21       ` Tamas K Lengyel
2020-02-21 18:49 ` [Xen-devel] [PATCH v9 2/5] xen: add parent_domid field to createdomain domctl Tamas K Lengyel
2020-02-21 21:02   ` Julien Grall
2020-02-21 21:35     ` Tamas K Lengyel
2020-02-21 22:34       ` Julien Grall
2020-02-21 22:53         ` Tamas K Lengyel
2020-02-21 23:18           ` Julien Grall
2020-02-21 23:31             ` Tamas K Lengyel
2020-02-24 15:44   ` Andrew Cooper
2020-02-24 15:55     ` Tamas K Lengyel
2020-02-21 18:49 ` [Xen-devel] [PATCH v9 3/5] xen/mem_sharing: VM forking Tamas K Lengyel
2020-02-24 12:39   ` Roger Pau Monné
2020-02-24 15:45     ` Tamas K Lengyel
2020-02-24 15:59       ` Roger Pau Monné
2020-02-24 22:14       ` Tamas K Lengyel
2020-02-25  9:40         ` Roger Pau Monné
2020-02-24 22:26       ` Tamas K Lengyel
2020-02-25  9:40         ` Roger Pau Monné
2020-02-25 13:30       ` Jan Beulich
2020-02-25 13:28     ` Jan Beulich
2020-02-25 13:39       ` Tamas K Lengyel
2020-02-25 10:04   ` Roger Pau Monné
2020-02-25 11:43     ` Tamas K Lengyel
2020-02-25 12:06       ` Roger Pau Monné
2020-02-25 12:23         ` Tamas K Lengyel
2020-02-25 14:23           ` Tamas K Lengyel
2020-02-21 18:49 ` [Xen-devel] [PATCH v9 4/5] x86/mem_sharing: reset a fork Tamas K Lengyel
2020-02-24 15:12   ` Roger Pau Monné
2020-02-24 15:35     ` Tamas K Lengyel
2020-02-24 15:42       ` Roger Pau Monné
2020-02-24 15:49         ` Tamas K Lengyel
2020-02-24 16:02           ` Roger Pau Monné
2020-02-25 13:39       ` Jan Beulich
2020-02-25 13:45         ` Tamas K Lengyel
2020-02-25 14:13           ` Jan Beulich
2020-02-25 14:26             ` Tamas K Lengyel
2020-02-21 18:49 ` [Xen-devel] [PATCH v9 5/5] xen/tools: VM forking toolstack side Tamas K Lengyel
2020-02-24 16:12   ` Julien Grall
2020-02-24 16:19     ` Tamas K Lengyel
2020-02-24 16:30       ` Julien Grall
2020-02-24 16:45         ` Tamas K Lengyel
