xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [Xen-devel] [PATCH v4 00/18] VM forking
@ 2020-01-08 17:13 Tamas K Lengyel
  2020-01-08 17:13 ` [Xen-devel] [PATCH v4 01/18] x86/hvm: introduce hvm_copy_context_and_params Tamas K Lengyel
                   ` (18 more replies)
  0 siblings, 19 replies; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Petre Pircalabu, Tamas K Lengyel, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Anthony PERARD, Stefano Stabellini, Jan Beulich,
	Alexandru Isaila, Julien Grall, Roger Pau Monné

The following series implements VM forking for Intel HVM guests to allow for
the fast creation of identical VMs without the assosciated high startup costs
of booting or restoring the VM from a savefile.

JIRA issue: https://xenproject.atlassian.net/browse/XEN-89

The fork operation is implemented as part of the "xl fork-vm" command:
    xl fork-vm -C <config_file_for_fork> -Q <qemu_save_file> <parent_domid>
    
By default a fully functional fork is created. The user is in charge however to
create the appropriate config file for the fork and to generate the QEMU save
file before the fork-vm call is made. The config file needs to give the
fork a new name at minimum but other settings may also require changes.

The interface also allows to split the forking into two steps:
    xl fork-vm --launch-dm no \
               -p <parent_domid>
    xl fork-vm --launch-dm late \
               -C <config_file_for_fork> \
               -Q <qemu_save_file> \
               <fork_domid>

The split creation model is useful when the VM needs to be created as fast as
possible. The forked VM can be unpaused without the device model being launched
to be monitored and accessed via VMI. Note however that without its device
model running (depending on what is executing in the VM) it is bound to
misbehave or even crash when its trying to access devices that would be
emulated by QEMU. We anticipate that for certain use-cases this would be an
acceptable situation, in case for example when fuzzing is performed of code
segments that don't access such devices.

Launching the device model requires the QEMU Xen savefile to be generated
manually from the parent VM. This can be accomplished simply by connecting to
its QMP socket and issuing the "xen-save-devices-state" command. For example
using the standard tool socat these commands can be used to generate the file:
    socat - UNIX-CONNECT:/var/run/xen/qmp-libxl-<parent_domid>
    { "execute": "qmp_capabilities" }
    { "execute": "xen-save-devices-state", \
        "arguments": { "filename": "/path/to/save/qemu_state", \
                        "live": false} }

At runtime the forked VM starts running with an empty p2m which gets lazily
populated when the VM generates EPT faults, similar to how altp2m views are
populated. If the memory access is a read-only access, the p2m entry is
populated with a memory shared entry with its parent. For write memory accesses
or in case memory sharing wasn't possible (for example in case a reference is
held by a third party), a new page is allocated and the page contents are
copied over from the parent VM. Forks can be further forked if needed, thus
allowing for further memory savings.

A VM fork reset hypercall is also added that allows the fork to be reset to the
state it was just after a fork, also accessible via xl:
    xl fork-vm --fork-reset -p <fork_domid>

This is an optimization for cases where the forks are very short-lived and run
without a device model, so resetting saves some time compared to creating a
brand new fork provided the fork has not aquired a lot of memory. If the fork
has a lot of memory deduplicated it is likely going to be faster to create a
new fork from scratch and asynchronously destroying the old one.

The series has been tested with both Linux and Windows VMs and functions as
expected. VM forking time has been measured to be 0.0007s, device model launch
to be around 1s depending largely on the number of devices being emulated. Fork
resets have been measured to be 0.0001s under the optimal circumstances.

Patches 1-2 implement changes to existing internal Xen APIs to make VM forking
possible.

Patches 3-14 are code-cleanups and adjustments of to Xen memory sharing
subsystem with no functional changes.

Patch 15 adds the hypervisor-side code implementing VM forking.

Patch 16 is integration of mem_access with forked VMs.

Patch 17 implements the VM fork reset operation hypervisor side bits.

Patch 18 adds the toolstack-side code implementing VM forking and reset.

Tamas K Lengyel (18):
  x86/hvm: introduce hvm_copy_context_and_params
  xen/x86: Make hap_get_allocation accessible
  x86/mem_sharing: make get_two_gfns take locks conditionally
  x86/mem_sharing: drop flags from mem_sharing_unshare_page
  x86/mem_sharing: don't try to unshare twice during page fault
  x86/mem_sharing: define mem_sharing_domain to hold some scattered
    variables
  x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in
    relinquish_shared_pages
  x86/mem_sharing: Make add_to_physmap static and shorten name
  x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool
  x86/mem_sharing: Replace MEM_SHARING_DEBUG with gdprintk
  x86/mem_sharing: ASSERT that p2m_set_entry succeeds
  x86/mem_sharing: Enable mem_sharing on first memop
  x86/mem_sharing: Skip xen heap pages in memshr nominate
  x86/mem_sharing: check page type count earlier
  xen/mem_sharing: VM forking
  xen/mem_access: Use __get_gfn_type_access in set_mem_access
  x86/mem_sharing: reset a fork
  xen/tools: VM forking toolstack side

 docs/man/xl.1.pod.in              |  36 +++
 tools/libxc/include/xenctrl.h     |  13 +
 tools/libxc/xc_memshr.c           |  22 ++
 tools/libxl/libxl.h               |   7 +
 tools/libxl/libxl_create.c        | 237 +++++++++-----
 tools/libxl/libxl_dm.c            |   2 +-
 tools/libxl/libxl_dom.c           |  83 +++--
 tools/libxl/libxl_internal.h      |   1 +
 tools/libxl/libxl_types.idl       |   1 +
 tools/xl/xl.h                     |   5 +
 tools/xl/xl_cmdtable.c            |  12 +
 tools/xl/xl_saverestore.c         |  96 ++++++
 tools/xl/xl_vmcontrol.c           |   8 +
 xen/arch/x86/hvm/hvm.c            | 271 ++++++++++------
 xen/arch/x86/mm/hap/hap.c         |   3 +-
 xen/arch/x86/mm/mem_access.c      |   5 +-
 xen/arch/x86/mm/mem_sharing.c     | 501 +++++++++++++++++++++++-------
 xen/arch/x86/mm/p2m.c             |  16 +-
 xen/common/memory.c               |   2 +-
 xen/drivers/passthrough/pci.c     |   3 +-
 xen/include/asm-x86/hap.h         |   1 +
 xen/include/asm-x86/hvm/domain.h  |   6 +-
 xen/include/asm-x86/hvm/hvm.h     |   2 +
 xen/include/asm-x86/mem_sharing.h |  43 ++-
 xen/include/asm-x86/p2m.h         |  14 +-
 xen/include/public/memory.h       |   6 +
 xen/include/xen/sched.h           |   1 +
 27 files changed, 1058 insertions(+), 339 deletions(-)

-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Xen-devel] [PATCH v4 01/18] x86/hvm: introduce hvm_copy_context_and_params
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
@ 2020-01-08 17:13 ` Tamas K Lengyel
  2020-01-16 12:27   ` Jan Beulich
  2020-01-08 17:13 ` [Xen-devel] [PATCH v4 02/18] xen/x86: Make hap_get_allocation accessible Tamas K Lengyel
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Tamas K Lengyel, Wei Liu, Jan Beulich,
	Roger Pau Monné

Currently the hvm parameters are only accessible via the HVMOP hypercalls. In
this patch we introduce a new function that can copy both the hvm context and
parameters directly into a target domain.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/hvm/hvm.c        | 241 +++++++++++++++++++++-------------
 xen/include/asm-x86/hvm/hvm.h |   2 +
 2 files changed, 152 insertions(+), 91 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 4723f5d09c..24f08d7043 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4067,16 +4067,17 @@ static int hvmop_set_evtchn_upcall_vector(
 }
 
 static int hvm_allow_set_param(struct domain *d,
-                               const struct xen_hvm_param *a)
+                               uint32_t index,
+                               uint64_t new_value)
 {
-    uint64_t value = d->arch.hvm.params[a->index];
+    uint64_t value = d->arch.hvm.params[index];
     int rc;
 
     rc = xsm_hvm_param(XSM_TARGET, d, HVMOP_set_param);
     if ( rc )
         return rc;
 
-    switch ( a->index )
+    switch ( index )
     {
     /* The following parameters can be set by the guest. */
     case HVM_PARAM_CALLBACK_IRQ:
@@ -4109,7 +4110,7 @@ static int hvm_allow_set_param(struct domain *d,
     if ( rc )
         return rc;
 
-    switch ( a->index )
+    switch ( index )
     {
     /* The following parameters should only be changed once. */
     case HVM_PARAM_VIRIDIAN:
@@ -4119,7 +4120,7 @@ static int hvm_allow_set_param(struct domain *d,
     case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
     case HVM_PARAM_ALTP2M:
     case HVM_PARAM_MCA_CAP:
-        if ( value != 0 && a->value != value )
+        if ( value != 0 && new_value != value )
             rc = -EEXIST;
         break;
     default:
@@ -4129,49 +4130,32 @@ static int hvm_allow_set_param(struct domain *d,
     return rc;
 }
 
-static int hvmop_set_param(
-    XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg)
+static int hvm_set_param(struct domain *d, uint32_t index, uint64_t value)
 {
     struct domain *curr_d = current->domain;
-    struct xen_hvm_param a;
-    struct domain *d;
-    struct vcpu *v;
     int rc;
+    struct vcpu *v;
 
-    if ( copy_from_guest(&a, arg, 1) )
-        return -EFAULT;
-
-    if ( a.index >= HVM_NR_PARAMS )
+    if ( index >= HVM_NR_PARAMS )
         return -EINVAL;
 
-    /* Make sure the above bound check is not bypassed during speculation. */
-    block_speculation();
-
-    d = rcu_lock_domain_by_any_id(a.domid);
-    if ( d == NULL )
-        return -ESRCH;
-
-    rc = -EINVAL;
-    if ( !is_hvm_domain(d) )
-        goto out;
-
-    rc = hvm_allow_set_param(d, &a);
+    rc = hvm_allow_set_param(d, index, value);
     if ( rc )
         goto out;
 
-    switch ( a.index )
+    switch ( index )
     {
     case HVM_PARAM_CALLBACK_IRQ:
-        hvm_set_callback_via(d, a.value);
+        hvm_set_callback_via(d, value);
         hvm_latch_shinfo_size(d);
         break;
     case HVM_PARAM_TIMER_MODE:
-        if ( a.value > HVMPTM_one_missed_tick_pending )
+        if ( value > HVMPTM_one_missed_tick_pending )
             rc = -EINVAL;
         break;
     case HVM_PARAM_VIRIDIAN:
-        if ( (a.value & ~HVMPV_feature_mask) ||
-             !(a.value & HVMPV_base_freq) )
+        if ( (value & ~HVMPV_feature_mask) ||
+             !(value & HVMPV_base_freq) )
             rc = -EINVAL;
         break;
     case HVM_PARAM_IDENT_PT:
@@ -4181,7 +4165,7 @@ static int hvmop_set_param(
          */
         if ( !paging_mode_hap(d) || !cpu_has_vmx )
         {
-            d->arch.hvm.params[a.index] = a.value;
+            d->arch.hvm.params[index] = value;
             break;
         }
 
@@ -4196,7 +4180,7 @@ static int hvmop_set_param(
 
         rc = 0;
         domain_pause(d);
-        d->arch.hvm.params[a.index] = a.value;
+        d->arch.hvm.params[index] = value;
         for_each_vcpu ( d, v )
             paging_update_cr3(v, false);
         domain_unpause(d);
@@ -4205,23 +4189,23 @@ static int hvmop_set_param(
         break;
     case HVM_PARAM_DM_DOMAIN:
         /* The only value this should ever be set to is DOMID_SELF */
-        if ( a.value != DOMID_SELF )
+        if ( value != DOMID_SELF )
             rc = -EINVAL;
 
-        a.value = curr_d->domain_id;
+        value = curr_d->domain_id;
         break;
     case HVM_PARAM_ACPI_S_STATE:
         rc = 0;
-        if ( a.value == 3 )
+        if ( value == 3 )
             hvm_s3_suspend(d);
-        else if ( a.value == 0 )
+        else if ( value == 0 )
             hvm_s3_resume(d);
         else
             rc = -EINVAL;
 
         break;
     case HVM_PARAM_ACPI_IOPORTS_LOCATION:
-        rc = pmtimer_change_ioport(d, a.value);
+        rc = pmtimer_change_ioport(d, value);
         break;
     case HVM_PARAM_MEMORY_EVENT_CR0:
     case HVM_PARAM_MEMORY_EVENT_CR3:
@@ -4236,24 +4220,24 @@ static int hvmop_set_param(
         rc = xsm_hvm_param_nested(XSM_PRIV, d);
         if ( rc )
             break;
-        if ( a.value > 1 )
+        if ( value > 1 )
             rc = -EINVAL;
         /*
          * Remove the check below once we have
          * shadow-on-shadow.
          */
-        if ( !paging_mode_hap(d) && a.value )
+        if ( !paging_mode_hap(d) && value )
             rc = -EINVAL;
-        if ( a.value &&
+        if ( value &&
              d->arch.hvm.params[HVM_PARAM_ALTP2M] )
             rc = -EINVAL;
         /* Set up NHVM state for any vcpus that are already up. */
-        if ( a.value &&
+        if ( value &&
              !d->arch.hvm.params[HVM_PARAM_NESTEDHVM] )
             for_each_vcpu(d, v)
                 if ( rc == 0 )
                     rc = nestedhvm_vcpu_initialise(v);
-        if ( !a.value || rc )
+        if ( !value || rc )
             for_each_vcpu(d, v)
                 nestedhvm_vcpu_destroy(v);
         break;
@@ -4261,30 +4245,30 @@ static int hvmop_set_param(
         rc = xsm_hvm_param_altp2mhvm(XSM_PRIV, d);
         if ( rc )
             break;
-        if ( a.value > XEN_ALTP2M_limited )
+        if ( value > XEN_ALTP2M_limited )
             rc = -EINVAL;
-        if ( a.value &&
+        if ( value &&
              d->arch.hvm.params[HVM_PARAM_NESTEDHVM] )
             rc = -EINVAL;
         break;
     case HVM_PARAM_TRIPLE_FAULT_REASON:
-        if ( a.value > SHUTDOWN_MAX )
+        if ( value > SHUTDOWN_MAX )
             rc = -EINVAL;
         break;
     case HVM_PARAM_IOREQ_SERVER_PFN:
-        d->arch.hvm.ioreq_gfn.base = a.value;
+        d->arch.hvm.ioreq_gfn.base = value;
         break;
     case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
     {
         unsigned int i;
 
-        if ( a.value == 0 ||
-             a.value > sizeof(d->arch.hvm.ioreq_gfn.mask) * 8 )
+        if ( value == 0 ||
+             value > sizeof(d->arch.hvm.ioreq_gfn.mask) * 8 )
         {
             rc = -EINVAL;
             break;
         }
-        for ( i = 0; i < a.value; i++ )
+        for ( i = 0; i < value; i++ )
             set_bit(i, &d->arch.hvm.ioreq_gfn.mask);
 
         break;
@@ -4296,35 +4280,35 @@ static int hvmop_set_param(
                      sizeof(d->arch.hvm.ioreq_gfn.legacy_mask) * 8);
         BUILD_BUG_ON(HVM_PARAM_BUFIOREQ_PFN >
                      sizeof(d->arch.hvm.ioreq_gfn.legacy_mask) * 8);
-        if ( a.value )
-            set_bit(a.index, &d->arch.hvm.ioreq_gfn.legacy_mask);
+        if ( value )
+            set_bit(index, &d->arch.hvm.ioreq_gfn.legacy_mask);
         break;
 
     case HVM_PARAM_X87_FIP_WIDTH:
-        if ( a.value != 0 && a.value != 4 && a.value != 8 )
+        if ( value != 0 && value != 4 && value != 8 )
         {
             rc = -EINVAL;
             break;
         }
-        d->arch.x87_fip_width = a.value;
+        d->arch.x87_fip_width = value;
         break;
 
     case HVM_PARAM_VM86_TSS:
         /* Hardware would silently truncate high bits. */
-        if ( a.value != (uint32_t)a.value )
+        if ( value != (uint32_t)value )
         {
             if ( d == curr_d )
                 domain_crash(d);
             rc = -EINVAL;
         }
         /* Old hvmloader binaries hardcode the size to 128 bytes. */
-        if ( a.value )
-            a.value |= (128ULL << 32) | VM86_TSS_UPDATED;
-        a.index = HVM_PARAM_VM86_TSS_SIZED;
+        if ( value )
+            value |= (128ULL << 32) | VM86_TSS_UPDATED;
+        index = HVM_PARAM_VM86_TSS_SIZED;
         break;
 
     case HVM_PARAM_VM86_TSS_SIZED:
-        if ( (a.value >> 32) < sizeof(struct tss32) )
+        if ( (value >> 32) < sizeof(struct tss32) )
         {
             if ( d == curr_d )
                 domain_crash(d);
@@ -4335,26 +4319,56 @@ static int hvmop_set_param(
          * 256 bits interrupt redirection bitmap + 64k bits I/O bitmap
          * plus one padding byte).
          */
-        if ( (a.value >> 32) > sizeof(struct tss32) +
+        if ( (value >> 32) > sizeof(struct tss32) +
                                (0x100 / 8) + (0x10000 / 8) + 1 )
-            a.value = (uint32_t)a.value |
+            value = (uint32_t)value |
                       ((sizeof(struct tss32) + (0x100 / 8) +
                                                (0x10000 / 8) + 1) << 32);
-        a.value |= VM86_TSS_UPDATED;
+        value |= VM86_TSS_UPDATED;
         break;
 
     case HVM_PARAM_MCA_CAP:
-        rc = vmce_enable_mca_cap(d, a.value);
+        rc = vmce_enable_mca_cap(d, value);
         break;
     }
 
     if ( rc != 0 )
         goto out;
 
-    d->arch.hvm.params[a.index] = a.value;
+    d->arch.hvm.params[index] = value;
 
     HVM_DBG_LOG(DBG_LEVEL_HCALL, "set param %u = %"PRIx64,
-                a.index, a.value);
+                index, value);
+
+ out:
+    return rc;
+}
+
+int hvmop_set_param(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg)
+{
+    struct xen_hvm_param a;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&a, arg, 1) )
+        return -EFAULT;
+
+    if ( a.index >= HVM_NR_PARAMS )
+        return -EINVAL;
+
+    /* Make sure the above bound check is not bypassed during speculation. */
+    block_speculation();
+
+    d = rcu_lock_domain_by_any_id(a.domid);
+    if ( d == NULL )
+        return -ESRCH;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_set_param(d, a.index, a.value);
 
  out:
     rcu_unlock_domain(d);
@@ -4362,7 +4376,7 @@ static int hvmop_set_param(
 }
 
 static int hvm_allow_get_param(struct domain *d,
-                               const struct xen_hvm_param *a)
+                               uint32_t index)
 {
     int rc;
 
@@ -4370,7 +4384,7 @@ static int hvm_allow_get_param(struct domain *d,
     if ( rc )
         return rc;
 
-    switch ( a->index )
+    switch ( index )
     {
     /* The following parameters can be read by the guest. */
     case HVM_PARAM_CALLBACK_IRQ:
@@ -4400,6 +4414,43 @@ static int hvm_allow_get_param(struct domain *d,
     return rc;
 }
 
+static int hvm_get_param(struct domain *d, uint32_t index, uint64_t *value)
+{
+    int rc;
+
+    if ( index >= HVM_NR_PARAMS || !value )
+        return -EINVAL;
+
+    rc = hvm_allow_get_param(d, index);
+    if ( rc )
+        return rc;
+
+    switch ( index )
+    {
+    case HVM_PARAM_ACPI_S_STATE:
+        *value = d->arch.hvm.is_s3_suspended ? 3 : 0;
+        break;
+
+    case HVM_PARAM_VM86_TSS:
+        *value = (uint32_t)d->arch.hvm.params[HVM_PARAM_VM86_TSS_SIZED];
+        break;
+
+    case HVM_PARAM_VM86_TSS_SIZED:
+        *value = d->arch.hvm.params[HVM_PARAM_VM86_TSS_SIZED] &
+                   ~VM86_TSS_UPDATED;
+        break;
+
+    case HVM_PARAM_X87_FIP_WIDTH:
+        *value = d->arch.x87_fip_width;
+        break;
+    default:
+        *value = d->arch.hvm.params[index];
+        break;
+    }
+
+    return 0;
+};
+
 static int hvmop_get_param(
     XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg)
 {
@@ -4424,33 +4475,10 @@ static int hvmop_get_param(
     if ( !is_hvm_domain(d) )
         goto out;
 
-    rc = hvm_allow_get_param(d, &a);
+    rc = hvm_get_param(d, a.index, &a.value);
     if ( rc )
         goto out;
 
-    switch ( a.index )
-    {
-    case HVM_PARAM_ACPI_S_STATE:
-        a.value = d->arch.hvm.is_s3_suspended ? 3 : 0;
-        break;
-
-    case HVM_PARAM_VM86_TSS:
-        a.value = (uint32_t)d->arch.hvm.params[HVM_PARAM_VM86_TSS_SIZED];
-        break;
-
-    case HVM_PARAM_VM86_TSS_SIZED:
-        a.value = d->arch.hvm.params[HVM_PARAM_VM86_TSS_SIZED] &
-                  ~VM86_TSS_UPDATED;
-        break;
-
-    case HVM_PARAM_X87_FIP_WIDTH:
-        a.value = d->arch.x87_fip_width;
-        break;
-    default:
-        a.value = d->arch.hvm.params[a.index];
-        break;
-    }
-
     rc = __copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
 
     HVM_DBG_LOG(DBG_LEVEL_HCALL, "get param %u = %"PRIx64,
@@ -5266,6 +5294,37 @@ void hvm_set_segment_register(struct vcpu *v, enum x86_segment seg,
     alternative_vcall(hvm_funcs.set_segment_register, v, seg, reg);
 }
 
+int hvm_copy_context_and_params(struct domain *src, struct domain *dst)
+{
+    int rc, i;
+    struct hvm_domain_context c = { };
+
+    c.size = hvm_save_size(src);
+    if ( (c.data = xmalloc_bytes(c.size)) == NULL )
+        return -ENOMEM;
+
+    for ( i = 0; i < HVM_NR_PARAMS; i++ )
+    {
+        uint64_t value = 0;
+
+        if ( hvm_get_param(src, i, &value) || !value )
+            continue;
+
+        if ( (rc = hvm_set_param(dst, i, value)) )
+            goto out;
+    }
+
+    if ( (rc = hvm_save(src, &c)) )
+        goto out;
+
+    c.cur = 0;
+    rc = hvm_load(dst, &c);
+
+out:
+    xfree(c.data);
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 09793c12e9..6106b82c95 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -336,6 +336,8 @@ unsigned long hvm_cr4_guest_valid_bits(const struct domain *d, bool restore);
 bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
                         void *ctxt);
 
+int hvm_copy_context_and_params(struct domain *src, struct domain *dst);
+
 #ifdef CONFIG_HVM
 
 #define hvm_get_guest_tsc(v) hvm_get_guest_tsc_fixed(v, 0)
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Xen-devel] [PATCH v4 02/18] xen/x86: Make hap_get_allocation accessible
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
  2020-01-08 17:13 ` [Xen-devel] [PATCH v4 01/18] x86/hvm: introduce hvm_copy_context_and_params Tamas K Lengyel
@ 2020-01-08 17:13 ` Tamas K Lengyel
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 03/18] x86/mem_sharing: make get_two_gfns take locks conditionally Tamas K Lengyel
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	Jan Beulich, Roger Pau Monné

During VM forking we'll copy the parent domain's parameters to the client,
including the HAP shadow memory setting that is used for storing the domain's
EPT. We'll copy this in the hypervisor instead doing it during toolstack launch
to allow the domain to start executing and unsharing memory before (or
even completely without) the toolstack.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/hap/hap.c | 3 +--
 xen/include/asm-x86/hap.h | 1 +
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 3d93f3451c..c7c7ff6e99 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -321,8 +321,7 @@ static void hap_free_p2m_page(struct domain *d, struct page_info *pg)
 }
 
 /* Return the size of the pool, rounded up to the nearest MB */
-static unsigned int
-hap_get_allocation(struct domain *d)
+unsigned int hap_get_allocation(struct domain *d)
 {
     unsigned int pg = d->arch.paging.hap.total_pages
         + d->arch.paging.hap.p2m_pages;
diff --git a/xen/include/asm-x86/hap.h b/xen/include/asm-x86/hap.h
index b94bfb4ed0..1bf07e49fe 100644
--- a/xen/include/asm-x86/hap.h
+++ b/xen/include/asm-x86/hap.h
@@ -45,6 +45,7 @@ int   hap_track_dirty_vram(struct domain *d,
 
 extern const struct paging_mode *hap_paging_get_mode(struct vcpu *);
 int hap_set_allocation(struct domain *d, unsigned int pages, bool *preempted);
+unsigned int hap_get_allocation(struct domain *d);
 
 #endif /* XEN_HAP_H */
 
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Xen-devel] [PATCH v4 03/18] x86/mem_sharing: make get_two_gfns take locks conditionally
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
  2020-01-08 17:13 ` [Xen-devel] [PATCH v4 01/18] x86/hvm: introduce hvm_copy_context_and_params Tamas K Lengyel
  2020-01-08 17:13 ` [Xen-devel] [PATCH v4 02/18] xen/x86: Make hap_get_allocation accessible Tamas K Lengyel
@ 2020-01-08 17:14 ` Tamas K Lengyel
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 04/18] x86/mem_sharing: drop flags from mem_sharing_unshare_page Tamas K Lengyel
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

During VM forking the client lock will already be taken.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Acked-by: Andrew Coopers <andrew.cooper3@citrix.com>
---
 xen/arch/x86/mm/mem_sharing.c | 11 ++++++-----
 xen/include/asm-x86/p2m.h     | 10 +++++-----
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index ddf1f0f9f9..f6187403a0 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -955,7 +955,7 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
     unsigned long put_count = 0;
 
     get_two_gfns(sd, sgfn, &smfn_type, NULL, &smfn,
-                 cd, cgfn, &cmfn_type, NULL, &cmfn, 0, &tg);
+                 cd, cgfn, &cmfn_type, NULL, &cmfn, 0, &tg, true);
 
     /*
      * This tricky business is to avoid two callers deadlocking if
@@ -1073,7 +1073,7 @@ err_out:
 }
 
 int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
-                               struct domain *cd, unsigned long cgfn)
+                               struct domain *cd, unsigned long cgfn, bool lock)
 {
     struct page_info *spage;
     int ret = -EINVAL;
@@ -1085,7 +1085,7 @@ int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle
     struct two_gfns tg;
 
     get_two_gfns(sd, _gfn(sgfn), &smfn_type, NULL, &smfn,
-                 cd, _gfn(cgfn), &cmfn_type, &a, &cmfn, 0, &tg);
+                 cd, _gfn(cgfn), &cmfn_type, &a, &cmfn, 0, &tg, lock);
 
     /* Get the source shared page, check and lock */
     ret = XENMEM_SHARING_OP_S_HANDLE_INVALID;
@@ -1162,7 +1162,8 @@ int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle
 err_unlock:
     mem_sharing_page_unlock(spage);
 err_out:
-    put_two_gfns(&tg);
+    if ( lock )
+        put_two_gfns(&tg);
     return ret;
 }
 
@@ -1583,7 +1584,7 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
         sh      = mso.u.share.source_handle;
         cgfn    = mso.u.share.client_gfn;
 
-        rc = mem_sharing_add_to_physmap(d, sgfn, sh, cd, cgfn);
+        rc = mem_sharing_add_to_physmap(d, sgfn, sh, cd, cgfn, true);
 
         rcu_unlock_domain(cd);
     }
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 94285db1b4..7399c4a897 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -539,7 +539,7 @@ struct two_gfns {
 static inline void get_two_gfns(struct domain *rd, gfn_t rgfn,
         p2m_type_t *rt, p2m_access_t *ra, mfn_t *rmfn, struct domain *ld,
         gfn_t lgfn, p2m_type_t *lt, p2m_access_t *la, mfn_t *lmfn,
-        p2m_query_t q, struct two_gfns *rval)
+        p2m_query_t q, struct two_gfns *rval, bool lock)
 {
     mfn_t           *first_mfn, *second_mfn, scratch_mfn;
     p2m_access_t    *first_a, *second_a, scratch_a;
@@ -569,10 +569,10 @@ do {                                                    \
 #undef assign_pointers
 
     /* Now do the gets */
-    *first_mfn  = get_gfn_type_access(p2m_get_hostp2m(rval->first_domain),
-                                      gfn_x(rval->first_gfn), first_t, first_a, q, NULL);
-    *second_mfn = get_gfn_type_access(p2m_get_hostp2m(rval->second_domain),
-                                      gfn_x(rval->second_gfn), second_t, second_a, q, NULL);
+    *first_mfn  = __get_gfn_type_access(p2m_get_hostp2m(rval->first_domain),
+                                        gfn_x(rval->first_gfn), first_t, first_a, q, NULL, lock);
+    *second_mfn = __get_gfn_type_access(p2m_get_hostp2m(rval->second_domain),
+                                        gfn_x(rval->second_gfn), second_t, second_a, q, NULL, lock);
 }
 
 static inline void put_two_gfns(struct two_gfns *arg)
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Xen-devel] [PATCH v4 04/18] x86/mem_sharing: drop flags from mem_sharing_unshare_page
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
                   ` (2 preceding siblings ...)
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 03/18] x86/mem_sharing: make get_two_gfns take locks conditionally Tamas K Lengyel
@ 2020-01-08 17:14 ` Tamas K Lengyel
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 05/18] x86/mem_sharing: don't try to unshare twice during page fault Tamas K Lengyel
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jan Beulich, Julien Grall, Roger Pau Monné

All callers pass 0 in.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Wei Liu <wl@xen.org>
---
 xen/arch/x86/hvm/hvm.c            | 2 +-
 xen/arch/x86/mm/p2m.c             | 5 ++---
 xen/common/memory.c               | 2 +-
 xen/include/asm-x86/mem_sharing.h | 8 +++-----
 4 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 24f08d7043..38e9006c92 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1898,7 +1898,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     if ( npfec.write_access && (p2mt == p2m_ram_shared) )
     {
         ASSERT(p2m_is_hostp2m(p2m));
-        sharing_enomem = mem_sharing_unshare_page(currd, gfn, 0);
+        sharing_enomem = mem_sharing_unshare_page(currd, gfn);
         rc = 1;
         goto out_put_gfn;
     }
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 3119269073..baea632acc 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -515,7 +515,7 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
          * Try to unshare. If we fail, communicate ENOMEM without
          * sleeping.
          */
-        if ( mem_sharing_unshare_page(p2m->domain, gfn_l, 0) < 0 )
+        if ( mem_sharing_unshare_page(p2m->domain, gfn_l) < 0 )
             mem_sharing_notify_enomem(p2m->domain, gfn_l, false);
         mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
     }
@@ -896,8 +896,7 @@ guest_physmap_add_entry(struct domain *d, gfn_t gfn, mfn_t mfn,
         {
             /* Do an unshare to cleanly take care of all corner cases. */
             int rc;
-            rc = mem_sharing_unshare_page(p2m->domain,
-                                          gfn_x(gfn_add(gfn, i)), 0);
+            rc = mem_sharing_unshare_page(p2m->domain, gfn_x(gfn_add(gfn, i)));
             if ( rc )
             {
                 p2m_unlock(p2m);
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 309e872edf..c7d2bac452 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -352,7 +352,7 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
          * might be the only one using this shared page, and we need to
          * trigger proper cleanup. Once done, this is like any other page.
          */
-        rc = mem_sharing_unshare_page(d, gmfn, 0);
+        rc = mem_sharing_unshare_page(d, gmfn);
         if ( rc )
         {
             mem_sharing_notify_enomem(d, gmfn, false);
diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h
index af2a1038b5..cf7848709f 100644
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -69,10 +69,9 @@ int __mem_sharing_unshare_page(struct domain *d,
                                uint16_t flags);
 
 static inline int mem_sharing_unshare_page(struct domain *d,
-                                           unsigned long gfn,
-                                           uint16_t flags)
+                                           unsigned long gfn)
 {
-    int rc = __mem_sharing_unshare_page(d, gfn, flags);
+    int rc = __mem_sharing_unshare_page(d, gfn, 0);
     BUG_ON(rc && (rc != -ENOMEM));
     return rc;
 }
@@ -115,8 +114,7 @@ static inline unsigned int mem_sharing_get_nr_shared_mfns(void)
     return 0;
 }
 
-static inline int mem_sharing_unshare_page(struct domain *d, unsigned long gfn,
-                                           uint16_t flags)
+static inline int mem_sharing_unshare_page(struct domain *d, unsigned long gfn)
 {
     ASSERT_UNREACHABLE();
     return -EOPNOTSUPP;
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Xen-devel] [PATCH v4 05/18] x86/mem_sharing: don't try to unshare twice during page fault
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
                   ` (3 preceding siblings ...)
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 04/18] x86/mem_sharing: drop flags from mem_sharing_unshare_page Tamas K Lengyel
@ 2020-01-08 17:14 ` Tamas K Lengyel
  2020-01-16 14:53   ` Jan Beulich
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 06/18] x86/mem_sharing: define mem_sharing_domain to hold some scattered variables Tamas K Lengyel
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Tamas K Lengyel, Wei Liu, Jan Beulich,
	Roger Pau Monné

The page was already tried to be unshared in get_gfn_type_access. If that
didn't work, then trying again is pointless. Don't try to send vm_event again
either, simply check if there is a ring or not.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/hvm/hvm.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 38e9006c92..5d24ceb469 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -38,6 +38,7 @@
 #include <xen/warning.h>
 #include <xen/vpci.h>
 #include <xen/nospec.h>
+#include <xen/vm_event.h>
 #include <asm/shadow.h>
 #include <asm/hap.h>
 #include <asm/current.h>
@@ -1702,11 +1703,14 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     struct domain *currd = curr->domain;
     struct p2m_domain *p2m, *hostp2m;
     int rc, fall_through = 0, paged = 0;
-    int sharing_enomem = 0;
     vm_event_request_t *req_ptr = NULL;
     bool sync = false;
     unsigned int page_order;
 
+#ifdef CONFIG_MEM_SHARING
+    bool sharing_enomem = false;
+#endif
+
     /* On Nested Virtualization, walk the guest page table.
      * If this succeeds, all is fine.
      * If this fails, inject a nested page fault into the guest.
@@ -1894,14 +1898,16 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     if ( p2m_is_paged(p2mt) || (p2mt == p2m_ram_paging_out) )
         paged = 1;
 
-    /* Mem sharing: unshare the page and try again */
-    if ( npfec.write_access && (p2mt == p2m_ram_shared) )
+#ifdef CONFIG_MEM_SHARING
+    /* Mem sharing: if still shared on write access then its enomem */
+    if ( npfec.write_access && p2m_is_shared(p2mt) )
     {
         ASSERT(p2m_is_hostp2m(p2m));
-        sharing_enomem = mem_sharing_unshare_page(currd, gfn);
+        sharing_enomem = true;
         rc = 1;
         goto out_put_gfn;
     }
+#endif
 
     /* Spurious fault? PoD and log-dirty also take this path. */
     if ( p2m_is_ram(p2mt) )
@@ -1955,19 +1961,21 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
      */
     if ( paged )
         p2m_mem_paging_populate(currd, gfn);
+
+#ifdef CONFIG_MEM_SHARING
     if ( sharing_enomem )
     {
-        int rv;
-
-        if ( (rv = mem_sharing_notify_enomem(currd, gfn, true)) < 0 )
+        if ( !vm_event_check_ring(currd->vm_event_share) )
         {
-            gdprintk(XENLOG_ERR, "Domain %hu attempt to unshare "
-                     "gfn %lx, ENOMEM and no helper (rc %d)\n",
-                     currd->domain_id, gfn, rv);
+            gprintk(XENLOG_ERR, "Domain %pd attempt to unshare "
+                    "gfn %lx, ENOMEM and no helper\n",
+                    currd, gfn);
             /* Crash the domain */
             rc = 0;
         }
     }
+#endif
+
     if ( req_ptr )
     {
         if ( monitor_traps(curr, sync, req_ptr) < 0 )
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Xen-devel] [PATCH v4 06/18] x86/mem_sharing: define mem_sharing_domain to hold some scattered variables
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
                   ` (4 preceding siblings ...)
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 05/18] x86/mem_sharing: don't try to unshare twice during page fault Tamas K Lengyel
@ 2020-01-08 17:14 ` Tamas K Lengyel
  2020-01-16 15:23   ` Jan Beulich
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 07/18] x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in relinquish_shared_pages Tamas K Lengyel
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

Create struct mem_sharing_domain under hvm_domain and move mem sharing
variables into it from p2m_domain and hvm_domain.

Expose the mem_sharing_enabled macro to be used consistently across Xen.

Remove some duplicate calls to mem_sharing_enabled in mem_sharing.c

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c     | 10 ++++------
 xen/drivers/passthrough/pci.c     |  3 +--
 xen/include/asm-x86/hvm/domain.h  |  6 +++++-
 xen/include/asm-x86/mem_sharing.h | 16 ++++++++++++++++
 xen/include/asm-x86/p2m.h         |  4 ----
 5 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index f6187403a0..3aa61c30e6 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -197,9 +197,6 @@ static shr_handle_t get_next_handle(void)
     return x + 1;
 }
 
-#define mem_sharing_enabled(d) \
-    (is_hvm_domain(d) && (d)->arch.hvm.mem_sharing_enabled)
-
 static atomic_t nr_saved_mfns   = ATOMIC_INIT(0);
 static atomic_t nr_shared_mfns  = ATOMIC_INIT(0);
 
@@ -1309,6 +1306,7 @@ int __mem_sharing_unshare_page(struct domain *d,
 int relinquish_shared_pages(struct domain *d)
 {
     int rc = 0;
+    struct mem_sharing_domain *msd = &d->arch.hvm.mem_sharing;
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
     unsigned long gfn, count = 0;
 
@@ -1316,7 +1314,7 @@ int relinquish_shared_pages(struct domain *d)
         return 0;
 
     p2m_lock(p2m);
-    for ( gfn = p2m->next_shared_gfn_to_relinquish;
+    for ( gfn = msd->next_shared_gfn_to_relinquish;
           gfn <= p2m->max_mapped_pfn; gfn++ )
     {
         p2m_access_t a;
@@ -1351,7 +1349,7 @@ int relinquish_shared_pages(struct domain *d)
         {
             if ( hypercall_preempt_check() )
             {
-                p2m->next_shared_gfn_to_relinquish = gfn + 1;
+                msd->next_shared_gfn_to_relinquish = gfn + 1;
                 rc = -ERESTART;
                 break;
             }
@@ -1437,7 +1435,7 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 
     /* Only HAP is supported */
     rc = -ENODEV;
-    if ( !hap_enabled(d) || !d->arch.hvm.mem_sharing_enabled )
+    if ( !mem_sharing_enabled(d) )
         goto out;
 
     switch ( mso.op )
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index c07a63981a..65d1d457ff 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1498,8 +1498,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
     /* Prevent device assign if mem paging or mem sharing have been 
      * enabled for this domain */
     if ( d != dom_io &&
-         unlikely((is_hvm_domain(d) &&
-                   d->arch.hvm.mem_sharing_enabled) ||
+         unlikely(mem_sharing_enabled(d) ||
                   vm_event_check_ring(d->vm_event_paging) ||
                   p2m_get_hostp2m(d)->global_logdirty) )
         return -EXDEV;
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index bcc5621797..8f70ba2b1a 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -29,6 +29,7 @@
 #include <asm/hvm/viridian.h>
 #include <asm/hvm/vmx/vmcs.h>
 #include <asm/hvm/svm/vmcb.h>
+#include <asm/mem_sharing.h>
 #include <public/grant_table.h>
 #include <public/hvm/params.h>
 #include <public/hvm/save.h>
@@ -156,7 +157,6 @@ struct hvm_domain {
 
     struct viridian_domain *viridian;
 
-    bool_t                 mem_sharing_enabled;
     bool_t                 qemu_mapcache_invalidate;
     bool_t                 is_s3_suspended;
 
@@ -192,6 +192,10 @@ struct hvm_domain {
         struct vmx_domain vmx;
         struct svm_domain svm;
     };
+
+#ifdef CONFIG_MEM_SHARING
+    struct mem_sharing_domain mem_sharing;
+#endif
 };
 
 #endif /* __ASM_X86_HVM_DOMAIN_H__ */
diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h
index cf7848709f..13114b6346 100644
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -26,6 +26,20 @@
 
 #ifdef CONFIG_MEM_SHARING
 
+struct mem_sharing_domain
+{
+    bool enabled;
+
+    /*
+     * When releasing shared gfn's in a preemptible manner, recall where
+     * to resume the search.
+     */
+    unsigned long next_shared_gfn_to_relinquish;
+};
+
+#define mem_sharing_enabled(d) \
+    (hap_enabled(d) && (d)->arch.hvm.mem_sharing.enabled)
+
 /* Auditing of memory sharing code? */
 #ifndef NDEBUG
 #define MEM_SHARING_AUDIT 1
@@ -104,6 +118,8 @@ int relinquish_shared_pages(struct domain *d);
 
 #else
 
+#define mem_sharing_enabled(d) false
+
 static inline unsigned int mem_sharing_get_nr_saved_mfns(void)
 {
     return 0;
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 7399c4a897..8defa90306 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -305,10 +305,6 @@ struct p2m_domain {
     unsigned long min_remapped_gfn;
     unsigned long max_remapped_gfn;
 
-    /* When releasing shared gfn's in a preemptible manner, recall where
-     * to resume the search */
-    unsigned long next_shared_gfn_to_relinquish;
-
 #ifdef CONFIG_HVM
     /* Populate-on-demand variables
      * All variables are protected with the pod lock. We cannot rely on
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Xen-devel] [PATCH v4 07/18] x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in relinquish_shared_pages
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
                   ` (5 preceding siblings ...)
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 06/18] x86/mem_sharing: define mem_sharing_domain to hold some scattered variables Tamas K Lengyel
@ 2020-01-08 17:14 ` Tamas K Lengyel
  2020-01-16 15:40   ` Jan Beulich
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 08/18] x86/mem_sharing: Make add_to_physmap static and shorten name Tamas K Lengyel
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

While using _mfn(0) is of no consequence during teardown, INVALID_MFN is the
correct value that should be used.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 3aa61c30e6..95e75ff298 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1326,7 +1326,7 @@ int relinquish_shared_pages(struct domain *d)
             break;
 
         mfn = p2m->get_entry(p2m, _gfn(gfn), &t, &a, 0, NULL, NULL);
-        if ( mfn_valid(mfn) && t == p2m_ram_shared )
+        if ( mfn_valid(mfn) && p2m_is_shared(t) )
         {
             /* Does not fail with ENOMEM given the DESTROY flag */
             BUG_ON(__mem_sharing_unshare_page(
@@ -1336,7 +1336,7 @@ int relinquish_shared_pages(struct domain *d)
              * unshare.  Must succeed: we just read the old entry and
              * we hold the p2m lock.
              */
-            set_rc = p2m->set_entry(p2m, _gfn(gfn), _mfn(0), PAGE_ORDER_4K,
+            set_rc = p2m->set_entry(p2m, _gfn(gfn), INVALID_MFN, PAGE_ORDER_4K,
                                     p2m_invalid, p2m_access_rwx, -1);
             ASSERT(!set_rc);
             count += 0x10;
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Xen-devel] [PATCH v4 08/18] x86/mem_sharing: Make add_to_physmap static and shorten name
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
                   ` (6 preceding siblings ...)
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 07/18] x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in relinquish_shared_pages Tamas K Lengyel
@ 2020-01-08 17:14 ` Tamas K Lengyel
  2020-01-16 15:40   ` Jan Beulich
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 09/18] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool Tamas K Lengyel
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

It's not being called from outside mem_sharing.c

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 95e75ff298..84b9f130b9 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1069,8 +1069,9 @@ err_out:
     return ret;
 }
 
-int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
-                               struct domain *cd, unsigned long cgfn, bool lock)
+static
+int add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
+                   struct domain *cd, unsigned long cgfn, bool lock)
 {
     struct page_info *spage;
     int ret = -EINVAL;
@@ -1582,7 +1583,7 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
         sh      = mso.u.share.source_handle;
         cgfn    = mso.u.share.client_gfn;
 
-        rc = mem_sharing_add_to_physmap(d, sgfn, sh, cd, cgfn, true);
+        rc = add_to_physmap(d, sgfn, sh, cd, cgfn, true);
 
         rcu_unlock_domain(cd);
     }
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Xen-devel] [PATCH v4 09/18] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
                   ` (7 preceding siblings ...)
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 08/18] x86/mem_sharing: Make add_to_physmap static and shorten name Tamas K Lengyel
@ 2020-01-08 17:14 ` Tamas K Lengyel
  2020-01-16 15:42   ` Jan Beulich
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 10/18] x86/mem_sharing: Replace MEM_SHARING_DEBUG with gdprintk Tamas K Lengyel
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

MEM_SHARING_DESTROY_GFN is used on the 'flags' bitfield during unsharing.
However, the bitfield is not used for anything else, so just convert it to a
bool instead.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c     | 9 ++++-----
 xen/include/asm-x86/mem_sharing.h | 5 ++---
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 84b9f130b9..0435a7f803 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1182,7 +1182,7 @@ err_out:
  */
 int __mem_sharing_unshare_page(struct domain *d,
                                unsigned long gfn,
-                               uint16_t flags)
+                               bool destroy)
 {
     p2m_type_t p2mt;
     mfn_t mfn;
@@ -1238,7 +1238,7 @@ int __mem_sharing_unshare_page(struct domain *d,
      * If the GFN is getting destroyed drop the references to MFN
      * (possibly freeing the page), and exit early.
      */
-    if ( flags & MEM_SHARING_DESTROY_GFN )
+    if ( destroy )
     {
         if ( !last_gfn )
             mem_sharing_gfn_destroy(page, d, gfn_info);
@@ -1329,9 +1329,8 @@ int relinquish_shared_pages(struct domain *d)
         mfn = p2m->get_entry(p2m, _gfn(gfn), &t, &a, 0, NULL, NULL);
         if ( mfn_valid(mfn) && p2m_is_shared(t) )
         {
-            /* Does not fail with ENOMEM given the DESTROY flag */
-            BUG_ON(__mem_sharing_unshare_page(
-                       d, gfn, MEM_SHARING_DESTROY_GFN));
+            /* Does not fail with ENOMEM given "destroy" is set to true */
+            BUG_ON(__mem_sharing_unshare_page(d, gfn, true));
             /*
              * Clear out the p2m entry so no one else may try to
              * unshare.  Must succeed: we just read the old entry and
diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h
index 13114b6346..c915fd973f 100644
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -76,16 +76,15 @@ struct page_sharing_info
 unsigned int mem_sharing_get_nr_saved_mfns(void);
 unsigned int mem_sharing_get_nr_shared_mfns(void);
 
-#define MEM_SHARING_DESTROY_GFN       (1<<1)
 /* Only fails with -ENOMEM. Enforce it with a BUG_ON wrapper. */
 int __mem_sharing_unshare_page(struct domain *d,
                                unsigned long gfn,
-                               uint16_t flags);
+                               bool destroy);
 
 static inline int mem_sharing_unshare_page(struct domain *d,
                                            unsigned long gfn)
 {
-    int rc = __mem_sharing_unshare_page(d, gfn, 0);
+    int rc = __mem_sharing_unshare_page(d, gfn, false);
     BUG_ON(rc && (rc != -ENOMEM));
     return rc;
 }
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Xen-devel] [PATCH v4 10/18] x86/mem_sharing: Replace MEM_SHARING_DEBUG with gdprintk
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
                   ` (8 preceding siblings ...)
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 09/18] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool Tamas K Lengyel
@ 2020-01-08 17:14 ` Tamas K Lengyel
  2020-01-16 16:01   ` Jan Beulich
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 11/18] x86/mem_sharing: ASSERT that p2m_set_entry succeeds Tamas K Lengyel
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

Using XENLOG_ERR level since this is only used in debug paths (ie. it's
expected the user already has loglvl=all set).

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 86 +++++++++++++++++------------------
 1 file changed, 43 insertions(+), 43 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 0435a7f803..93e7605900 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -49,9 +49,6 @@ typedef struct pg_lock_data {
 
 static DEFINE_PER_CPU(pg_lock_data_t, __pld);
 
-#define MEM_SHARING_DEBUG(_f, _a...)                                  \
-    debugtrace_printk("mem_sharing_debug: %s(): " _f, __func__, ##_a)
-
 /* Reverse map defines */
 #define RMAP_HASHTAB_ORDER  0
 #define RMAP_HASHTAB_SIZE   \
@@ -494,19 +491,19 @@ static int audit(void)
         /* If we can't lock it, it's definitely not a shared page */
         if ( !mem_sharing_page_lock(pg) )
         {
-            MEM_SHARING_DEBUG(
-                "mfn %lx in audit list, but cannot be locked (%lx)!\n",
-                mfn_x(mfn), pg->u.inuse.type_info);
-            errors++;
-            continue;
+            gdprintk(XENLOG_ERR,
+                     "mfn %lx in audit list, but cannot be locked (%lx)!\n",
+                     mfn_x(mfn), pg->u.inuse.type_info);
+           errors++;
+           continue;
         }
 
         /* Check if the MFN has correct type, owner and handle. */
         if ( (pg->u.inuse.type_info & PGT_type_mask) != PGT_shared_page )
         {
-            MEM_SHARING_DEBUG(
-                "mfn %lx in audit list, but not PGT_shared_page (%lx)!\n",
-                mfn_x(mfn), pg->u.inuse.type_info & PGT_type_mask);
+            gdprintk(XENLOG_ERR,
+                     "mfn %lx in audit list, but not PGT_shared_page (%lx)!\n",
+                     mfn_x(mfn), pg->u.inuse.type_info & PGT_type_mask);
             errors++;
             continue;
         }
@@ -514,24 +511,24 @@ static int audit(void)
         /* Check the page owner. */
         if ( page_get_owner(pg) != dom_cow )
         {
-            MEM_SHARING_DEBUG("mfn %lx shared, but wrong owner %pd!\n",
-                              mfn_x(mfn), page_get_owner(pg));
-            errors++;
+               gdprintk(XENLOG_ERR, "mfn %lx shared, but wrong owner (%hu)!\n",
+                        mfn_x(mfn), page_get_owner(pg)->domain_id);
+               errors++;
         }
 
         /* Check the m2p entry */
         if ( !SHARED_M2P(get_gpfn_from_mfn(mfn_x(mfn))) )
         {
-            MEM_SHARING_DEBUG("mfn %lx shared, but wrong m2p entry (%lx)!\n",
-                              mfn_x(mfn), get_gpfn_from_mfn(mfn_x(mfn)));
-            errors++;
+               gdprintk(XENLOG_ERR, "mfn %lx shared, but wrong m2p entry (%lx)!\n",
+                        mfn_x(mfn), get_gpfn_from_mfn(mfn_x(mfn)));
+               errors++;
         }
 
         /* Check we have a list */
         if ( (!pg->sharing) || !rmap_has_entries(pg) )
         {
-            MEM_SHARING_DEBUG("mfn %lx shared, but empty gfn list!\n",
-                              mfn_x(mfn));
+            gdprintk(XENLOG_ERR, "mfn %lx shared, but empty gfn list!\n",
+                     mfn_x(mfn));
             errors++;
             continue;
         }
@@ -550,24 +547,26 @@ static int audit(void)
             d = get_domain_by_id(g->domain);
             if ( d == NULL )
             {
-                MEM_SHARING_DEBUG("Unknown dom: %hu, for PFN=%lx, MFN=%lx\n",
-                                  g->domain, g->gfn, mfn_x(mfn));
+                gdprintk(XENLOG_ERR,
+                         "Unknown dom: %hu, for PFN=%lx, MFN=%lx\n",
+                         g->domain, g->gfn, mfn_x(mfn));
                 errors++;
                 continue;
             }
             o_mfn = get_gfn_query_unlocked(d, g->gfn, &t);
             if ( !mfn_eq(o_mfn, mfn) )
             {
-                MEM_SHARING_DEBUG("Incorrect P2M for d=%hu, PFN=%lx."
-                                  "Expecting MFN=%lx, got %lx\n",
-                                  g->domain, g->gfn, mfn_x(mfn), mfn_x(o_mfn));
+                gdprintk(XENLOG_ERR, "Incorrect P2M for d=%hu, PFN=%lx."
+                         "Expecting MFN=%lx, got %lx\n",
+                         g->domain, g->gfn, mfn_x(mfn), mfn_x(o_mfn));
                 errors++;
             }
             if ( t != p2m_ram_shared )
             {
-                MEM_SHARING_DEBUG("Incorrect P2M type for d=%hu, PFN=%lx MFN=%lx."
-                                  "Expecting t=%d, got %d\n",
-                                  g->domain, g->gfn, mfn_x(mfn), p2m_ram_shared, t);
+                gdprintk(XENLOG_ERR,
+                         "Incorrect P2M type for d=%hu, PFN=%lx MFN=%lx."
+                         "Expecting t=%d, got %d\n",
+                         g->domain, g->gfn, mfn_x(mfn), p2m_ram_shared, t);
                 errors++;
             }
             put_domain(d);
@@ -576,10 +575,10 @@ static int audit(void)
         /* The type count has an extra ref because we have locked the page */
         if ( (nr_gfns + 1) != (pg->u.inuse.type_info & PGT_count_mask) )
         {
-            MEM_SHARING_DEBUG("Mismatched counts for MFN=%lx."
-                              "nr_gfns in list %lu, in type_info %lx\n",
-                              mfn_x(mfn), nr_gfns,
-                              (pg->u.inuse.type_info & PGT_count_mask));
+            gdprintk(XENLOG_ERR, "Mismatched counts for MFN=%lx."
+                     "nr_gfns in list %lu, in type_info %lx\n",
+                     mfn_x(mfn), nr_gfns,
+                     (pg->u.inuse.type_info & PGT_count_mask));
             errors++;
         }
 
@@ -590,8 +589,8 @@ static int audit(void)
 
     if ( count_found != count_expected )
     {
-        MEM_SHARING_DEBUG("Expected %ld shared mfns, found %ld.",
-                          count_expected, count_found);
+        gdprintk(XENLOG_ERR, "Expected %ld shared mfns, found %ld.",
+                 count_expected, count_found);
         errors++;
     }
 
@@ -769,10 +768,10 @@ static int debug_mfn(mfn_t mfn)
         return -EINVAL;
     }
 
-    MEM_SHARING_DEBUG(
-        "Debug page: MFN=%lx is ci=%lx, ti=%lx, owner=%pd\n",
-        mfn_x(page_to_mfn(page)), page->count_info,
-        page->u.inuse.type_info, page_get_owner(page));
+    gdprintk(XENLOG_ERR,
+             "Debug page: MFN=%lx is ci=%lx, ti=%lx, owner_id=%d\n",
+             mfn_x(page_to_mfn(page)), page->count_info,
+             page->u.inuse.type_info, page_get_owner(page)->domain_id);
 
     /* -1 because the page is locked and that's an additional type ref */
     num_refs = ((int) (page->u.inuse.type_info & PGT_count_mask)) - 1;
@@ -788,8 +787,9 @@ static int debug_gfn(struct domain *d, gfn_t gfn)
 
     mfn = get_gfn_query(d, gfn_x(gfn), &p2mt);
 
-    MEM_SHARING_DEBUG("Debug for dom%d, gfn=%" PRI_gfn "\n",
-                      d->domain_id, gfn_x(gfn));
+    gdprintk(XENLOG_ERR, "Debug for dom%d, gfn=%" PRI_gfn "\n",
+             d->domain_id, gfn_x(gfn));
+
     num_refs = debug_mfn(mfn);
     put_gfn(d, gfn_x(gfn));
 
@@ -805,13 +805,13 @@ static int debug_gref(struct domain *d, grant_ref_t ref)
     rc = mem_sharing_gref_to_gfn(d->grant_table, ref, &gfn, &status);
     if ( rc )
     {
-        MEM_SHARING_DEBUG("Asked to debug [dom=%d,gref=%u]: error %d.\n",
-                          d->domain_id, ref, rc);
+        gdprintk(XENLOG_ERR, "Asked to debug [dom=%d,gref=%u]: error %d.\n",
+                 d->domain_id, ref, rc);
         return rc;
     }
 
-    MEM_SHARING_DEBUG("==> Grant [dom=%d,ref=%d], status=%x. ",
-                      d->domain_id, ref, status);
+    gdprintk(XENLOG_ERR, "==> Grant [dom=%d,ref=%d], status=%x. ",
+             d->domain_id, ref, status);
 
     return debug_gfn(d, gfn);
 }
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Xen-devel] [PATCH v4 11/18] x86/mem_sharing: ASSERT that p2m_set_entry succeeds
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
                   ` (9 preceding siblings ...)
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 10/18] x86/mem_sharing: Replace MEM_SHARING_DEBUG with gdprintk Tamas K Lengyel
@ 2020-01-08 17:14 ` Tamas K Lengyel
  2020-01-16 16:07   ` Jan Beulich
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 12/18] x86/mem_sharing: Enable mem_sharing on first memop Tamas K Lengyel
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 42 +++++++++++++++++------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 93e7605900..3f36cd6bbc 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1117,11 +1117,19 @@ int add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
         goto err_unlock;
     }
 
+    /*
+     * Must succeed, we just read the entry and hold the p2m lock
+     * via get_two_gfns.
+     */
     ret = p2m_set_entry(p2m, _gfn(cgfn), smfn, PAGE_ORDER_4K,
                         p2m_ram_shared, a);
+    ASSERT(!ret);
 
-    /* Tempted to turn this into an assert */
-    if ( ret )
+    /*
+     * There is a chance we're plugging a hole where a paged out
+     * page was.
+     */
+    if ( p2m_is_paging(cmfn_type) && (cmfn_type != p2m_ram_paging_out) )
     {
         mem_sharing_gfn_destroy(spage, cd, gfn_info);
         put_page_and_type(spage);
@@ -1129,29 +1137,21 @@ int add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
     else
     {
         /*
-         * There is a chance we're plugging a hole where a paged out
-         * page was.
+         * Further, there is a chance this was a valid page.
+         * Don't leak it.
          */
-        if ( p2m_is_paging(cmfn_type) && (cmfn_type != p2m_ram_paging_out) )
+        if ( mfn_valid(cmfn) )
         {
-            atomic_dec(&cd->paged_pages);
-            /*
-             * Further, there is a chance this was a valid page.
-             * Don't leak it.
-             */
-            if ( mfn_valid(cmfn) )
+            struct page_info *cpage = mfn_to_page(cmfn);
+
+            if ( !get_page(cpage, cd) )
             {
-                struct page_info *cpage = mfn_to_page(cmfn);
-
-                if ( !get_page(cpage, cd) )
-                {
-                    domain_crash(cd);
-                    ret = -EOVERFLOW;
-                    goto err_unlock;
-                }
-                put_page_alloc_ref(cpage);
-                put_page(cpage);
+                domain_crash(cd);
+                ret = -EOVERFLOW;
+                goto err_unlock;
             }
+            put_page_alloc_ref(cpage);
+            put_page(cpage);
         }
     }
 
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Xen-devel] [PATCH v4 12/18] x86/mem_sharing: Enable mem_sharing on first memop
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
                   ` (10 preceding siblings ...)
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 11/18] x86/mem_sharing: ASSERT that p2m_set_entry succeeds Tamas K Lengyel
@ 2020-01-08 17:14 ` Tamas K Lengyel
  2020-01-16 16:18   ` Jan Beulich
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 13/18] x86/mem_sharing: Skip xen heap pages in memshr nominate Tamas K Lengyel
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

It is wasteful to require separate hypercalls to enable sharing on both the
parent and the client domain during VM forking. To speed things up we enable
sharing on the first memop in case it wasn't already enabled.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 36 +++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 3f36cd6bbc..b8a9228ecf 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1412,6 +1412,24 @@ static int range_share(struct domain *d, struct domain *cd,
     return rc;
 }
 
+static inline int mem_sharing_control(struct domain *d, bool enable)
+{
+    if ( enable )
+    {
+        if ( unlikely(!is_hvm_domain(d)) )
+            return -ENOSYS;
+
+        if ( unlikely(!hap_enabled(d)) )
+            return -ENODEV;
+
+        if ( unlikely(is_iommu_enabled(d)) )
+            return -EXDEV;
+    }
+
+    d->arch.hvm.mem_sharing.enabled = enable;
+    return 0;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
     int rc;
@@ -1433,10 +1451,8 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
     if ( rc )
         goto out;
 
-    /* Only HAP is supported */
-    rc = -ENODEV;
-    if ( !mem_sharing_enabled(d) )
-        goto out;
+    if ( !mem_sharing_enabled(d) && (rc = mem_sharing_control(d, true)) )
+        return rc;
 
     switch ( mso.op )
     {
@@ -1703,18 +1719,10 @@ int mem_sharing_domctl(struct domain *d, struct xen_domctl_mem_sharing_op *mec)
 {
     int rc;
 
-    /* Only HAP is supported */
-    if ( !hap_enabled(d) )
-        return -ENODEV;
-
-    switch ( mec->op )
+    switch( mec->op )
     {
     case XEN_DOMCTL_MEM_SHARING_CONTROL:
-        rc = 0;
-        if ( unlikely(is_iommu_enabled(d) && mec->u.enable) )
-            rc = -EXDEV;
-        else
-            d->arch.hvm.mem_sharing_enabled = mec->u.enable;
+        rc = mem_sharing_control(d, mec->u.enable);
         break;
 
     default:
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Xen-devel] [PATCH v4 13/18] x86/mem_sharing: Skip xen heap pages in memshr nominate
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
                   ` (11 preceding siblings ...)
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 12/18] x86/mem_sharing: Enable mem_sharing on first memop Tamas K Lengyel
@ 2020-01-08 17:14 ` Tamas K Lengyel
  2020-01-20 16:23   ` Jan Beulich
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 14/18] x86/mem_sharing: check page type count earlier Tamas K Lengyel
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

Trying to share these would fail anyway, better to skip them early.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index b8a9228ecf..baa3e35ded 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -852,6 +852,11 @@ static int nominate_page(struct domain *d, gfn_t gfn,
     if ( !p2m_is_sharable(p2mt) )
         goto out;
 
+    /* Skip xen heap pages */
+    page = mfn_to_page(mfn);
+    if ( !page || is_xen_heap_page(page) )
+        goto out;
+
     /* Check if there are mem_access/remapped altp2m entries for this page */
     if ( altp2m_active(d) )
     {
@@ -882,7 +887,6 @@ static int nominate_page(struct domain *d, gfn_t gfn,
     }
 
     /* Try to convert the mfn to the sharable type */
-    page = mfn_to_page(mfn);
     ret = page_make_sharable(d, page, expected_refcnt);
     if ( ret )
         goto out;
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Xen-devel] [PATCH v4 14/18] x86/mem_sharing: check page type count earlier
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
                   ` (12 preceding siblings ...)
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 13/18] x86/mem_sharing: Skip xen heap pages in memshr nominate Tamas K Lengyel
@ 2020-01-08 17:14 ` Tamas K Lengyel
  2020-01-20 16:34   ` Jan Beulich
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 15/18] xen/mem_sharing: VM forking Tamas K Lengyel
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index baa3e35ded..ecbe40545d 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -652,19 +652,18 @@ static int page_make_sharable(struct domain *d,
         return -EBUSY;
     }
 
-    /* Change page type and count atomically */
-    if ( !get_page_and_type(page, d, PGT_shared_page) )
+    /* Check if page is already typed and bail early if it is */
+    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
     {
         spin_unlock(&d->page_alloc_lock);
-        return -EINVAL;
+        return -EEXIST;
     }
 
-    /* Check it wasn't already sharable and undo if it was */
-    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
+    /* Change page type and count atomically */
+    if ( !get_page_and_type(page, d, PGT_shared_page) )
     {
         spin_unlock(&d->page_alloc_lock);
-        put_page_and_type(page);
-        return -EEXIST;
+        return -EINVAL;
     }
 
     /*
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Xen-devel] [PATCH v4 15/18] xen/mem_sharing: VM forking
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
                   ` (13 preceding siblings ...)
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 14/18] x86/mem_sharing: check page type count earlier Tamas K Lengyel
@ 2020-01-08 17:14 ` Tamas K Lengyel
  2020-01-09 10:28   ` Julien Grall
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 16/18] xen/mem_access: Use __get_gfn_type_access in set_mem_access Tamas K Lengyel
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tamas K Lengyel, Jan Beulich, Julien Grall, Roger Pau Monné

VM forking is the process of creating a domain with an empty memory space and a
parent domain specified from which to populate the memory when necessary. For
the new domain to be functional the VM state is copied over as part of the fork
operation (HVM params, hap allocation, etc).

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/hvm/hvm.c            |   2 +-
 xen/arch/x86/mm/mem_sharing.c     | 204 ++++++++++++++++++++++++++++++
 xen/arch/x86/mm/p2m.c             |  11 +-
 xen/include/asm-x86/mem_sharing.h |  20 ++-
 xen/include/public/memory.h       |   5 +
 xen/include/xen/sched.h           |   1 +
 6 files changed, 239 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5d24ceb469..3241e2a5ac 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1909,7 +1909,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     }
 #endif
 
-    /* Spurious fault? PoD and log-dirty also take this path. */
+    /* Spurious fault? PoD, log-dirty and VM forking also take this path. */
     if ( p2m_is_ram(p2mt) )
     {
         rc = 1;
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index ecbe40545d..d544801681 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -22,11 +22,13 @@
 
 #include <xen/types.h>
 #include <xen/domain_page.h>
+#include <xen/event.h>
 #include <xen/spinlock.h>
 #include <xen/rwlock.h>
 #include <xen/mm.h>
 #include <xen/grant_table.h>
 #include <xen/sched.h>
+#include <xen/sched-if.h>
 #include <xen/rcupdate.h>
 #include <xen/guest_access.h>
 #include <xen/vm_event.h>
@@ -36,6 +38,9 @@
 #include <asm/altp2m.h>
 #include <asm/atomic.h>
 #include <asm/event.h>
+#include <asm/hap.h>
+#include <asm/hvm/hvm.h>
+#include <asm/hvm/save.h>
 #include <xsm/xsm.h>
 
 #include "mm-locks.h"
@@ -1433,6 +1438,175 @@ static inline int mem_sharing_control(struct domain *d, bool enable)
     return 0;
 }
 
+/*
+ * Forking a page only gets called when the VM faults due to no entry being
+ * in the EPT for the access. Depending on the type of access we either
+ * populate the physmap with a shared entry for read-only access or
+ * fork the page if its a write access.
+ *
+ * The client p2m is already locked so we only need to lock
+ * the parent's here.
+ */
+int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool unsharing)
+{
+    int rc = -ENOENT;
+    shr_handle_t handle;
+    struct domain *parent;
+    struct p2m_domain *p2m;
+    unsigned long gfn_l = gfn_x(gfn);
+    mfn_t mfn, new_mfn;
+    p2m_type_t p2mt;
+    struct page_info *page;
+
+    if ( !mem_sharing_is_fork(d) )
+        return -ENOENT;
+
+    parent = d->parent;
+
+    if ( !unsharing )
+    {
+        /* For read-only accesses we just add a shared entry to the physmap */
+        while ( parent )
+        {
+            if ( !(rc = nominate_page(parent, gfn, 0, &handle)) )
+                break;
+
+            parent = parent->parent;
+        }
+
+        if ( !rc )
+        {
+            /* The client's p2m is already locked */
+            struct p2m_domain *pp2m = p2m_get_hostp2m(parent);
+
+            p2m_lock(pp2m);
+            rc = add_to_physmap(parent, gfn_l, handle, d, gfn_l, false);
+            p2m_unlock(pp2m);
+
+            if ( !rc )
+                return 0;
+        }
+    }
+
+    /*
+     * If it's a write access (ie. unsharing) or if adding a shared entry to
+     * the physmap failed we'll fork the page directly.
+     */
+    p2m = p2m_get_hostp2m(d);
+    parent = d->parent;
+
+    while ( parent )
+    {
+        mfn = get_gfn_query(parent, gfn_l, &p2mt);
+
+        if ( mfn_valid(mfn) && p2m_is_any_ram(p2mt) )
+            break;
+
+        put_gfn(parent, gfn_l);
+        parent = parent->parent;
+    }
+
+    if ( !parent )
+        return -ENOENT;
+
+    if ( !(page = alloc_domheap_page(d, 0)) )
+    {
+        put_gfn(parent, gfn_l);
+        return -ENOMEM;
+    }
+
+    new_mfn = page_to_mfn(page);
+    copy_domain_page(new_mfn, mfn);
+    set_gpfn_from_mfn(mfn_x(new_mfn), gfn_l);
+
+    put_gfn(parent, gfn_l);
+
+    return p2m->set_entry(p2m, gfn, new_mfn, PAGE_ORDER_4K, p2m_ram_rw,
+                          p2m->default_access, -1);
+}
+
+static int bring_up_vcpus(struct domain *cd, struct cpupool *cpupool)
+{
+    int ret;
+    unsigned int i;
+
+    if ( (ret = cpupool_move_domain(cd, cpupool)) )
+        return ret;
+
+    for ( i = 0; i < cd->max_vcpus; i++ )
+    {
+        if ( cd->vcpu[i] )
+            continue;
+
+        if ( !vcpu_create(cd, i) )
+            return -EINVAL;
+    }
+
+    domain_update_node_affinity(cd);
+    return 0;
+}
+
+static int fork_hap_allocation(struct domain *d, struct domain *cd)
+{
+    int rc;
+    bool preempted;
+    unsigned long mb = hap_get_allocation(d);
+
+    if ( mb == hap_get_allocation(cd) )
+        return 0;
+
+    paging_lock(cd);
+    rc = hap_set_allocation(cd, mb << (20 - PAGE_SHIFT), &preempted);
+    paging_unlock(cd);
+
+    if ( rc )
+        return rc;
+
+    if ( preempted )
+        return -ERESTART;
+
+    return 0;
+}
+
+static void fork_tsc(struct domain *d, struct domain *cd)
+{
+    uint32_t tsc_mode;
+    uint32_t gtsc_khz;
+    uint32_t incarnation;
+    uint64_t elapsed_nsec;
+
+    tsc_get_info(d, &tsc_mode, &elapsed_nsec, &gtsc_khz, &incarnation);
+    tsc_set_info(cd, tsc_mode, elapsed_nsec, gtsc_khz, incarnation);
+}
+
+static int mem_sharing_fork(struct domain *d, struct domain *cd)
+{
+    int rc;
+
+    if ( !d->controller_pause_count &&
+         (rc = domain_pause_by_systemcontroller(d)) )
+        return rc;
+
+    cd->max_pages = d->max_pages;
+    cd->max_vcpus = d->max_vcpus;
+
+    /* this is preemptible so it's the first to get done */
+    if ( (rc = fork_hap_allocation(d, cd)) )
+        return rc;
+
+    if ( (rc = bring_up_vcpus(cd, d->cpupool)) )
+        return rc;
+
+    if ( (rc = hvm_copy_context_and_params(d, cd)) )
+        return rc;
+
+    fork_tsc(d, cd);
+
+    cd->parent = d;
+
+    return 0;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
     int rc;
@@ -1705,6 +1879,36 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
         rc = debug_gref(d, mso.u.debug.u.gref);
         break;
 
+    case XENMEM_sharing_op_fork:
+    {
+        struct domain *pd;
+
+        rc = -EINVAL;
+        if ( mso.u.fork._pad[0] || mso.u.fork._pad[1] ||
+             mso.u.fork._pad[2] )
+            goto out;
+
+        rc = rcu_lock_live_remote_domain_by_id(mso.u.fork.parent_domain,
+                                               &pd);
+        if ( rc )
+            goto out;
+
+        if ( !mem_sharing_enabled(pd) )
+        {
+            if ( (rc = mem_sharing_control(pd, true)) )
+                goto out;
+        }
+
+        rc = mem_sharing_fork(pd, d);
+
+        if ( rc == -ERESTART )
+            rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
+                                               "lh", XENMEM_sharing_op,
+                                               arg);
+        rcu_unlock_domain(pd);
+        break;
+    }
+
     default:
         rc = -ENOSYS;
         break;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index baea632acc..81f7679ec1 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -508,6 +508,14 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
 
     mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
 
+    /* Check if we need to fork the page */
+    if ( (q & P2M_ALLOC) && p2m_is_hole(*t) &&
+         !mem_sharing_fork_page(p2m->domain, gfn, !!(q & P2M_UNSHARE)) )
+    {
+        mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
+    }
+
+    /* Check if we need to unshare the page */
     if ( (q & P2M_UNSHARE) && p2m_is_shared(*t) )
     {
         ASSERT(p2m_is_hostp2m(p2m));
@@ -585,7 +593,8 @@ struct page_info *p2m_get_page_from_gfn(
             return page;
 
         /* Error path: not a suitable GFN at all */
-        if ( !p2m_is_ram(*t) && !p2m_is_paging(*t) && !p2m_is_pod(*t) )
+        if ( !p2m_is_ram(*t) && !p2m_is_paging(*t) && !p2m_is_pod(*t) &&
+             !mem_sharing_is_fork(p2m->domain) )
             return NULL;
     }
 
diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h
index c915fd973f..f1f785296f 100644
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -26,8 +26,7 @@
 
 #ifdef CONFIG_MEM_SHARING
 
-struct mem_sharing_domain
-{
+struct mem_sharing_domain {
     bool enabled;
 
     /*
@@ -40,6 +39,9 @@ struct mem_sharing_domain
 #define mem_sharing_enabled(d) \
     (hap_enabled(d) && (d)->arch.hvm.mem_sharing.enabled)
 
+#define mem_sharing_is_fork(d) \
+    (mem_sharing_enabled(d) && !!((d)->parent))
+
 /* Auditing of memory sharing code? */
 #ifndef NDEBUG
 #define MEM_SHARING_AUDIT 1
@@ -89,6 +91,9 @@ static inline int mem_sharing_unshare_page(struct domain *d,
     return rc;
 }
 
+int mem_sharing_fork_page(struct domain *d, gfn_t gfn,
+                          bool unsharing);
+
 /*
  * If called by a foreign domain, possible errors are
  *   -EBUSY -> ring full
@@ -118,6 +123,7 @@ int relinquish_shared_pages(struct domain *d);
 #else
 
 #define mem_sharing_enabled(d) false
+#define mem_sharing_is_fork(p2m) false
 
 static inline unsigned int mem_sharing_get_nr_saved_mfns(void)
 {
@@ -142,6 +148,16 @@ static inline int mem_sharing_notify_enomem(struct domain *d, unsigned long gfn,
     return -EOPNOTSUPP;
 }
 
+static inline int mem_sharing_fork(struct domain *d, struct domain *cd, bool vcpu)
+{
+    return -EOPNOTSUPP;
+}
+
+static inline int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool lock)
+{
+    return -EOPNOTSUPP;
+}
+
 #endif
 
 #endif /* __MEM_SHARING_H__ */
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index cfdda6e2a8..90a3f4498e 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -482,6 +482,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
 #define XENMEM_sharing_op_add_physmap       6
 #define XENMEM_sharing_op_audit             7
 #define XENMEM_sharing_op_range_share       8
+#define XENMEM_sharing_op_fork              9
 
 #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
 #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
@@ -532,6 +533,10 @@ struct xen_mem_sharing_op {
                 uint32_t gref;     /* IN: gref to debug         */
             } u;
         } debug;
+        struct mem_sharing_op_fork {
+            domid_t parent_domain;
+            uint16_t _pad[3];                /* Must be set to 0 */
+        } fork;
     } u;
 };
 typedef struct xen_mem_sharing_op xen_mem_sharing_op_t;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index cc942a3621..135cb2cd22 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -504,6 +504,7 @@ struct domain
     /* Memory sharing support */
 #ifdef CONFIG_MEM_SHARING
     struct vm_event_domain *vm_event_share;
+    struct domain *parent; /* VM fork parent */
 #endif
     /* Memory paging support */
 #ifdef CONFIG_HAS_MEM_PAGING
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Xen-devel] [PATCH v4 16/18] xen/mem_access: Use __get_gfn_type_access in set_mem_access
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
                   ` (14 preceding siblings ...)
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 15/18] xen/mem_sharing: VM forking Tamas K Lengyel
@ 2020-01-08 17:14 ` Tamas K Lengyel
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 17/18] x86/mem_sharing: reset a fork Tamas K Lengyel
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Petre Pircalabu, Tamas K Lengyel, Tamas K Lengyel, Wei Liu,
	George Dunlap, Andrew Cooper, Jan Beulich, Alexandru Isaila,
	Roger Pau Monné

Use __get_gfn_type_access instead of p2m->get_entry to trigger page-forking
when the mem_access permission is being set on a page that has not yet been
copied over from the parent.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_access.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/mem_access.c b/xen/arch/x86/mm/mem_access.c
index 320b9fe621..9caf08a5b2 100644
--- a/xen/arch/x86/mm/mem_access.c
+++ b/xen/arch/x86/mm/mem_access.c
@@ -303,11 +303,10 @@ static int set_mem_access(struct domain *d, struct p2m_domain *p2m,
     ASSERT(!ap2m);
 #endif
     {
-        mfn_t mfn;
         p2m_access_t _a;
         p2m_type_t t;
-
-        mfn = p2m->get_entry(p2m, gfn, &t, &_a, 0, NULL, NULL);
+        mfn_t mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &t, &_a,
+                                          P2M_ALLOC, NULL, false);
         rc = p2m->set_entry(p2m, gfn, mfn, PAGE_ORDER_4K, t, a, -1);
     }
 
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Xen-devel] [PATCH v4 17/18] x86/mem_sharing: reset a fork
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
                   ` (15 preceding siblings ...)
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 16/18] xen/mem_access: Use __get_gfn_type_access in set_mem_access Tamas K Lengyel
@ 2020-01-08 17:14 ` Tamas K Lengyel
  2020-01-09 10:30   ` Julien Grall
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 18/18] xen/tools: VM forking toolstack side Tamas K Lengyel
  2020-01-16 15:47 ` [Xen-devel] [PATCH v4 00/18] VM forking Jan Beulich
  18 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Stefano Stabellini,
	Jan Beulich, Julien Grall, Roger Pau Monné

Implement hypercall that allows a fork to shed all memory that got allocated
for it during its execution and re-load its vCPU context from the parent VM.
This allows the forked VM to reset into the same state the parent VM is in a
faster way then creating a new fork would be. Measurements show about a 2x
speedup during normal fuzzing operations. Performance may vary depending how
much memory got allocated for the forked VM. If it has been completely
deduplicated from the parent VM then creating a new fork would likely be more
performant.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 79 +++++++++++++++++++++++++++++++++++
 xen/include/public/memory.h   |  1 +
 2 files changed, 80 insertions(+)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index d544801681..aaa678da14 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1607,6 +1607,62 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
     return 0;
 }
 
+/*
+ * The fork reset operation is intended to be used on short-lived forks only.
+ * There is no hypercall continuation operation implemented for this reason.
+ * For forks that obtain a larger memory footprint it is likely going to be
+ * more performant to create a new fork instead of resetting an existing one.
+ *
+ * TODO: In case this hypercall would become useful on forks with larger memory
+ * footprints the hypercall continuation should be implemented.
+ */
+static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
+{
+    int rc;
+    struct p2m_domain* p2m = p2m_get_hostp2m(cd);
+    struct page_info *page, *tmp;
+
+    if ( !d->controller_pause_count &&
+         (rc = domain_pause_by_systemcontroller(d)) )
+        return rc;
+
+    page_list_for_each_safe(page, tmp, &cd->page_list)
+    {
+        p2m_type_t p2mt;
+        p2m_access_t p2ma;
+        gfn_t gfn;
+        mfn_t mfn = page_to_mfn(page);
+
+        if ( !mfn_valid(mfn) )
+            continue;
+
+        gfn = mfn_to_gfn(cd, mfn);
+        mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
+                                    0, NULL, false);
+
+        if ( !p2m_is_ram(p2mt) || p2m_is_shared(p2mt) )
+            continue;
+
+        /* take an extra reference */
+        if ( !get_page(page, cd) )
+            continue;
+
+        rc = p2m->set_entry(p2m, gfn, INVALID_MFN, PAGE_ORDER_4K,
+                            p2m_invalid, p2m_access_rwx, -1);
+        ASSERT(!rc);
+
+        put_page_alloc_ref(page);
+        put_page(page);
+    }
+
+    if ( (rc = hvm_copy_context_and_params(d, cd)) )
+        return rc;
+
+    fork_tsc(d, cd);
+
+    return 0;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
     int rc;
@@ -1909,6 +1965,29 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
         break;
     }
 
+    case XENMEM_sharing_op_fork_reset:
+    {
+        struct domain *pd;
+
+        rc = -EINVAL;
+        if ( mso.u.fork._pad[0] || mso.u.fork._pad[1] ||
+             mso.u.fork._pad[2] )
+            goto out;
+
+        rc = -ENOSYS;
+        if ( !d->parent )
+            goto out;
+
+        rc = rcu_lock_live_remote_domain_by_id(d->parent->domain_id, &pd);
+        if ( rc )
+            goto out;
+
+        rc = mem_sharing_fork_reset(pd, d);
+
+        rcu_unlock_domain(pd);
+        break;
+    }
+
     default:
         rc = -ENOSYS;
         break;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 90a3f4498e..e3d063e22e 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -483,6 +483,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
 #define XENMEM_sharing_op_audit             7
 #define XENMEM_sharing_op_range_share       8
 #define XENMEM_sharing_op_fork              9
+#define XENMEM_sharing_op_fork_reset        10
 
 #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
 #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Xen-devel] [PATCH v4 18/18] xen/tools: VM forking toolstack side
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
                   ` (16 preceding siblings ...)
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 17/18] x86/mem_sharing: reset a fork Tamas K Lengyel
@ 2020-01-08 17:14 ` Tamas K Lengyel
  2020-01-16 15:47 ` [Xen-devel] [PATCH v4 00/18] VM forking Jan Beulich
  18 siblings, 0 replies; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:14 UTC (permalink / raw)
  To: xen-devel; +Cc: Anthony PERARD, Ian Jackson, Tamas K Lengyel, Wei Liu

Add necessary bits to implement "xl fork-vm" commands. The command allows the
user to specify how to launch the device model allowing for a late-launch model
in which the user can execute the fork without the device model and decide to
only later launch it.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
v4: combine xl commands as suboptions to xl fork-vm
---
 docs/man/xl.1.pod.in          |  36 ++++++
 tools/libxc/include/xenctrl.h |  13 ++
 tools/libxc/xc_memshr.c       |  22 ++++
 tools/libxl/libxl.h           |   7 +
 tools/libxl/libxl_create.c    | 237 +++++++++++++++++++++++-----------
 tools/libxl/libxl_dm.c        |   2 +-
 tools/libxl/libxl_dom.c       |  83 ++++++++----
 tools/libxl/libxl_internal.h  |   1 +
 tools/libxl/libxl_types.idl   |   1 +
 tools/xl/xl.h                 |   5 +
 tools/xl/xl_cmdtable.c        |  12 ++
 tools/xl/xl_saverestore.c     |  96 ++++++++++++++
 tools/xl/xl_vmcontrol.c       |   8 ++
 13 files changed, 419 insertions(+), 104 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index d4b5e8e362..22cc4149b0 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -694,6 +694,42 @@ Leave the domain paused after creating the snapshot.
 
 =back
 
+=item B<fork-vm> [I<OPTIONS>] I<domain-id>
+
+Create a fork of a running VM. The domain will be paused after the operation
+and needs to remain paused while forks of it exist.
+
+B<OPTIONS>
+
+=over 4
+
+=item B<-p>
+
+Leave the fork paused after creating it.
+
+=item B<--launch-dm>
+
+Specify whether the device model (QEMU) should be launched for the fork. Late
+launch allows to start the device model for an already running fork.
+
+=item B<-C>
+
+The config file to use when launching the device model. Currently required when
+launching the device model.
+
+=item B<-Q>
+
+The qemu save file to use when launching the device model.  Currently required
+when launching the device model.
+
+=item B<--fork-reset>
+
+Perform a reset operation of an already running fork. Note that resetting may
+be less performant then creating a new fork depending on how much memory the
+fork has deduplicated during its runtime.
+
+=back
+
 =item B<sharing> [I<domain-id>]
 
 Display the number of shared pages for a specified domain. If no domain is
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 75f191ae3a..ffb0bb9a42 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2221,6 +2221,19 @@ int xc_memshr_range_share(xc_interface *xch,
                           uint64_t first_gfn,
                           uint64_t last_gfn);
 
+int xc_memshr_fork(xc_interface *xch,
+                   uint32_t source_domain,
+                   uint32_t client_domain);
+
+/*
+ * Note: this function is only intended to be used on short-lived forks that
+ * haven't yet aquired a lot of memory. In case the fork has a lot of memory
+ * it is likely more performant to create a new fork with xc_memshr_fork.
+ *
+ * With VMs that have a lot of memory this call may block for a long time.
+ */
+int xc_memshr_fork_reset(xc_interface *xch, uint32_t forked_domain);
+
 /* Debug calls: return the number of pages referencing the shared frame backing
  * the input argument. Should be one or greater.
  *
diff --git a/tools/libxc/xc_memshr.c b/tools/libxc/xc_memshr.c
index 97e2e6a8d9..d0e4ee225b 100644
--- a/tools/libxc/xc_memshr.c
+++ b/tools/libxc/xc_memshr.c
@@ -239,6 +239,28 @@ int xc_memshr_debug_gref(xc_interface *xch,
     return xc_memshr_memop(xch, domid, &mso);
 }
 
+int xc_memshr_fork(xc_interface *xch, uint32_t pdomid, uint32_t domid)
+{
+    xen_mem_sharing_op_t mso;
+
+    memset(&mso, 0, sizeof(mso));
+
+    mso.op = XENMEM_sharing_op_fork;
+    mso.u.fork.parent_domain = pdomid;
+
+    return xc_memshr_memop(xch, domid, &mso);
+}
+
+int xc_memshr_fork_reset(xc_interface *xch, uint32_t domid)
+{
+    xen_mem_sharing_op_t mso;
+
+    memset(&mso, 0, sizeof(mso));
+    mso.op = XENMEM_sharing_op_fork_reset;
+
+    return xc_memshr_memop(xch, domid, &mso);
+}
+
 int xc_memshr_audit(xc_interface *xch)
 {
     xen_mem_sharing_op_t mso;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 54abb9db1f..75cb070587 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1536,6 +1536,13 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
                             LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_domain_fork_vm(libxl_ctx *ctx, uint32_t pdomid, uint32_t *domid)
+                             LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_domain_fork_launch_dm(libxl_ctx *ctx, libxl_domain_config *d_config,
+                                uint32_t domid,
+                                const libxl_asyncprogress_how *aop_console_how)
+                                LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_domain_fork_reset(libxl_ctx *ctx, uint32_t domid);
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 uint32_t *domid, int restore_fd,
                                 int send_back_fd,
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 32d45dcef0..e0d219596c 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -536,12 +536,12 @@ out:
     return ret;
 }
 
-int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
-                       libxl__domain_build_state *state,
-                       uint32_t *domid)
+static int libxl__domain_make_xs_entries(libxl__gc *gc, libxl_domain_config *d_config,
+                                         libxl__domain_build_state *state,
+                                         uint32_t domid)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
-    int ret, rc, nb_vm;
+    int rc, nb_vm;
     const char *dom_type;
     char *uuid_string;
     char *dom_path, *vm_path, *libxl_path;
@@ -553,7 +553,6 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
 
     /* convenience aliases */
     libxl_domain_create_info *info = &d_config->c_info;
-    libxl_domain_build_info *b_info = &d_config->b_info;
 
     uuid_string = libxl__uuid2string(gc, info->uuid);
     if (!uuid_string) {
@@ -561,64 +560,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
         goto out;
     }
 
-    /* Valid domid here means we're soft resetting. */
-    if (!libxl_domid_valid_guest(*domid)) {
-        struct xen_domctl_createdomain create = {
-            .ssidref = info->ssidref,
-            .max_vcpus = b_info->max_vcpus,
-            .max_evtchn_port = b_info->event_channels,
-            .max_grant_frames = b_info->max_grant_frames,
-            .max_maptrack_frames = b_info->max_maptrack_frames,
-        };
-
-        if (info->type != LIBXL_DOMAIN_TYPE_PV) {
-            create.flags |= XEN_DOMCTL_CDF_hvm;
-            create.flags |=
-                libxl_defbool_val(info->hap) ? XEN_DOMCTL_CDF_hap : 0;
-            create.flags |=
-                libxl_defbool_val(info->oos) ? 0 : XEN_DOMCTL_CDF_oos_off;
-        }
-
-        assert(info->passthrough != LIBXL_PASSTHROUGH_DEFAULT);
-        LOG(DETAIL, "passthrough: %s",
-            libxl_passthrough_to_string(info->passthrough));
-
-        if (info->passthrough != LIBXL_PASSTHROUGH_DISABLED)
-            create.flags |= XEN_DOMCTL_CDF_iommu;
-
-        if (info->passthrough == LIBXL_PASSTHROUGH_SYNC_PT)
-            create.iommu_opts |= XEN_DOMCTL_IOMMU_no_sharept;
-
-        /* Ultimately, handle is an array of 16 uint8_t, same as uuid */
-        libxl_uuid_copy(ctx, (libxl_uuid *)&create.handle, &info->uuid);
-
-        ret = libxl__arch_domain_prepare_config(gc, d_config, &create);
-        if (ret < 0) {
-            LOGED(ERROR, *domid, "fail to get domain config");
-            rc = ERROR_FAIL;
-            goto out;
-        }
-
-        ret = xc_domain_create(ctx->xch, domid, &create);
-        if (ret < 0) {
-            LOGED(ERROR, *domid, "domain creation fail");
-            rc = ERROR_FAIL;
-            goto out;
-        }
-
-        rc = libxl__arch_domain_save_config(gc, d_config, state, &create);
-        if (rc < 0)
-            goto out;
-    }
-
-    ret = xc_cpupool_movedomain(ctx->xch, info->poolid, *domid);
-    if (ret < 0) {
-        LOGED(ERROR, *domid, "domain move fail");
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    dom_path = libxl__xs_get_dompath(gc, *domid);
+    dom_path = libxl__xs_get_dompath(gc, domid);
     if (!dom_path) {
         rc = ERROR_FAIL;
         goto out;
@@ -626,12 +568,12 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
 
     vm_path = GCSPRINTF("/vm/%s", uuid_string);
     if (!vm_path) {
-        LOGD(ERROR, *domid, "cannot allocate create paths");
+        LOGD(ERROR, domid, "cannot allocate create paths");
         rc = ERROR_FAIL;
         goto out;
     }
 
-    libxl_path = libxl__xs_libxl_path(gc, *domid);
+    libxl_path = libxl__xs_libxl_path(gc, domid);
     if (!libxl_path) {
         rc = ERROR_FAIL;
         goto out;
@@ -642,10 +584,10 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
 
     roperm[0].id = 0;
     roperm[0].perms = XS_PERM_NONE;
-    roperm[1].id = *domid;
+    roperm[1].id = domid;
     roperm[1].perms = XS_PERM_READ;
 
-    rwperm[0].id = *domid;
+    rwperm[0].id = domid;
     rwperm[0].perms = XS_PERM_NONE;
 
 retry_transaction:
@@ -663,7 +605,7 @@ retry_transaction:
                     noperm, ARRAY_SIZE(noperm));
 
     xs_write(ctx->xsh, t, GCSPRINTF("%s/vm", dom_path), vm_path, strlen(vm_path));
-    rc = libxl__domain_rename(gc, *domid, 0, info->name, t);
+    rc = libxl__domain_rename(gc, domid, 0, info->name, t);
     if (rc)
         goto out;
 
@@ -740,7 +682,7 @@ retry_transaction:
 
     vm_list = libxl_list_vm(ctx, &nb_vm);
     if (!vm_list) {
-        LOGD(ERROR, *domid, "cannot get number of running guests");
+        LOGD(ERROR, domid, "cannot get number of running guests");
         rc = ERROR_FAIL;
         goto out;
     }
@@ -764,7 +706,7 @@ retry_transaction:
             t = 0;
             goto retry_transaction;
         }
-        LOGED(ERROR, *domid, "domain creation ""xenstore transaction commit failed");
+        LOGED(ERROR, domid, "domain creation ""xenstore transaction commit failed");
         rc = ERROR_FAIL;
         goto out;
     }
@@ -776,6 +718,80 @@ retry_transaction:
     return rc;
 }
 
+int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
+                       libxl__domain_build_state *state,
+                       uint32_t *domid)
+{
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+    int ret, rc;
+
+    /* convenience aliases */
+    libxl_domain_create_info *info = &d_config->c_info;
+    libxl_domain_build_info *b_info = &d_config->b_info;
+
+    /* Valid domid here means we're soft resetting. */
+    if (!libxl_domid_valid_guest(*domid)) {
+        struct xen_domctl_createdomain create = {
+            .ssidref = info->ssidref,
+            .max_vcpus = b_info->max_vcpus,
+            .max_evtchn_port = b_info->event_channels,
+            .max_grant_frames = b_info->max_grant_frames,
+            .max_maptrack_frames = b_info->max_maptrack_frames,
+        };
+
+        if (info->type != LIBXL_DOMAIN_TYPE_PV) {
+            create.flags |= XEN_DOMCTL_CDF_hvm;
+            create.flags |=
+                libxl_defbool_val(info->hap) ? XEN_DOMCTL_CDF_hap : 0;
+            create.flags |=
+                libxl_defbool_val(info->oos) ? 0 : XEN_DOMCTL_CDF_oos_off;
+        }
+
+        assert(info->passthrough != LIBXL_PASSTHROUGH_DEFAULT);
+        LOG(DETAIL, "passthrough: %s",
+            libxl_passthrough_to_string(info->passthrough));
+
+        if (info->passthrough != LIBXL_PASSTHROUGH_DISABLED)
+            create.flags |= XEN_DOMCTL_CDF_iommu;
+
+        if (info->passthrough == LIBXL_PASSTHROUGH_SYNC_PT)
+            create.iommu_opts |= XEN_DOMCTL_IOMMU_no_sharept;
+
+        /* Ultimately, handle is an array of 16 uint8_t, same as uuid */
+        libxl_uuid_copy(ctx, (libxl_uuid *)&create.handle, &info->uuid);
+
+        ret = libxl__arch_domain_prepare_config(gc, d_config, &create);
+        if (ret < 0) {
+            LOGED(ERROR, *domid, "fail to get domain config");
+            rc = ERROR_FAIL;
+            goto out;
+        }
+
+        ret = xc_domain_create(ctx->xch, domid, &create);
+        if (ret < 0) {
+            LOGED(ERROR, *domid, "domain creation fail");
+            rc = ERROR_FAIL;
+            goto out;
+        }
+
+        rc = libxl__arch_domain_save_config(gc, d_config, state, &create);
+        if (rc < 0)
+            goto out;
+    }
+
+    ret = xc_cpupool_movedomain(ctx->xch, info->poolid, *domid);
+    if (ret < 0) {
+        LOGED(ERROR, *domid, "domain move fail");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = libxl__domain_make_xs_entries(gc, d_config, state, *domid);
+
+out:
+    return rc;
+}
+
 static int store_libxl_entry(libxl__gc *gc, uint32_t domid,
                              libxl_domain_build_info *b_info)
 {
@@ -1097,15 +1113,31 @@ static void initiate_domain_create(libxl__egc *egc,
     ret = libxl__domain_config_setdefault(gc,d_config,domid);
     if (ret) goto error_out;
 
-    ret = libxl__domain_make(gc, d_config, &dcs->build_state, &domid);
-    if (ret) {
-        LOGD(ERROR, domid, "cannot make domain: %d", ret);
+    if ( !d_config->dm_restore_file )
+    {
+        ret = libxl__domain_make(gc, d_config, &dcs->build_state, &domid);
         dcs->guest_domid = domid;
+
+        if (ret) {
+            LOGD(ERROR, domid, "cannot make domain: %d", ret);
+            ret = ERROR_FAIL;
+            goto error_out;
+        }
+    } else if ( dcs->guest_domid != INVALID_DOMID ) {
+        domid = dcs->guest_domid;
+
+        ret = libxl__domain_make_xs_entries(gc, d_config, &dcs->build_state, domid);
+        if (ret) {
+            LOGD(ERROR, domid, "cannot make domain: %d", ret);
+            ret = ERROR_FAIL;
+            goto error_out;
+        }
+    } else {
+        LOGD(ERROR, domid, "cannot make domain");
         ret = ERROR_FAIL;
         goto error_out;
     }
 
-    dcs->guest_domid = domid;
     dcs->sdss.dm.guest_domid = 0; /* means we haven't spawned */
 
     /* post-4.13 todo: move these next bits of defaulting to
@@ -1141,7 +1173,7 @@ static void initiate_domain_create(libxl__egc *egc,
     if (ret)
         goto error_out;
 
-    if (restore_fd >= 0 || dcs->domid_soft_reset != INVALID_DOMID) {
+    if (restore_fd >= 0 || dcs->domid_soft_reset != INVALID_DOMID || d_config->dm_restore_file) {
         LOGD(DEBUG, domid, "restoring, not running bootloader");
         domcreate_bootloader_done(egc, &dcs->bl, 0);
     } else  {
@@ -1217,7 +1249,16 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->sdss.dm.callback = domcreate_devmodel_started;
     dcs->sdss.callback = domcreate_devmodel_started;
 
-    if (restore_fd < 0 && dcs->domid_soft_reset == INVALID_DOMID) {
+    if (restore_fd < 0 && dcs->domid_soft_reset == INVALID_DOMID && !d_config->dm_restore_file) {
+        rc = libxl__domain_build(gc, d_config, domid, state);
+        domcreate_rebuild_done(egc, dcs, rc);
+        return;
+    }
+
+    if ( d_config->dm_restore_file ) {
+        dcs->srs.dcs = dcs;
+        dcs->srs.ao = ao;
+        state->forked_vm = true;
         rc = libxl__domain_build(gc, d_config, domid, state);
         domcreate_rebuild_done(egc, dcs, rc);
         return;
@@ -1415,6 +1456,7 @@ static void domcreate_rebuild_done(libxl__egc *egc,
     /* convenience aliases */
     const uint32_t domid = dcs->guest_domid;
     libxl_domain_config *const d_config = dcs->guest_config;
+    libxl__domain_build_state *const state = &dcs->build_state;
 
     if (ret) {
         LOGD(ERROR, domid, "cannot (re-)build domain: %d", ret);
@@ -1422,6 +1464,9 @@ static void domcreate_rebuild_done(libxl__egc *egc,
         goto error_out;
     }
 
+    if ( d_config->dm_restore_file )
+        state->saved_state = GCSPRINTF("%s", d_config->dm_restore_file);
+
     store_libxl_entry(gc, domid, &d_config->b_info);
 
     libxl__multidev_begin(ao, &dcs->multidev);
@@ -1823,10 +1868,13 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     GCNEW(cdcs);
     cdcs->dcs.ao = ao;
     cdcs->dcs.guest_config = d_config;
+    cdcs->dcs.guest_domid = *domid;
+
     libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
     libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
     cdcs->dcs.restore_fd = cdcs->dcs.libxc_fd = restore_fd;
     cdcs->dcs.send_back_fd = send_back_fd;
+
     if (restore_fd > -1) {
         cdcs->dcs.restore_params = *params;
         rc = libxl__fd_flags_modify_save(gc, cdcs->dcs.restore_fd,
@@ -2069,6 +2117,43 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             ao_how, aop_console_how);
 }
 
+int libxl_domain_fork_vm(libxl_ctx *ctx, uint32_t pdomid, uint32_t *domid)
+{
+    int rc;
+    struct xen_domctl_createdomain create = {0};
+    create.flags |= XEN_DOMCTL_CDF_hvm;
+    create.flags |= XEN_DOMCTL_CDF_hap;
+    create.flags |= XEN_DOMCTL_CDF_oos_off;
+    create.arch.emulation_flags = (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI);
+
+    create.ssidref = SECINITSID_DOMU;
+    create.max_vcpus = 1; // placeholder, will be cloned from pdomid
+    create.max_evtchn_port = 1023;
+    create.max_grant_frames = LIBXL_MAX_GRANT_FRAMES_DEFAULT;
+    create.max_maptrack_frames = LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT;
+
+    if ( (rc = xc_domain_create(ctx->xch, domid, &create)) )
+        return rc;
+
+    if ( (rc = xc_memshr_fork(ctx->xch, pdomid, *domid)) )
+        xc_domain_destroy(ctx->xch, *domid);
+
+    return rc;
+}
+
+int libxl_domain_fork_launch_dm(libxl_ctx *ctx, libxl_domain_config *d_config,
+                                uint32_t domid,
+                                const libxl_asyncprogress_how *aop_console_how)
+{
+    unset_disk_colo_restore(d_config);
+    return do_domain_create(ctx, d_config, &domid, -1, -1, 0, 0, aop_console_how);
+}
+
+int libxl_domain_fork_reset(libxl_ctx *ctx, uint32_t domid)
+{
+    return xc_memshr_fork_reset(ctx->xch, domid);
+}
+
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 uint32_t *domid, int restore_fd,
                                 int send_back_fd,
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index e92e412c1b..9d967e1d32 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -2787,7 +2787,7 @@ static void device_model_spawn_outcome(libxl__egc *egc,
 
     libxl__domain_build_state *state = dmss->build_state;
 
-    if (state->saved_state) {
+    if (state->saved_state && !state->forked_vm) {
         ret2 = unlink(state->saved_state);
         if (ret2) {
             LOGED(ERROR, dmss->guest_domid, "%s: failed to remove device-model state %s",
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index cdb294ab8d..95e6ecc9d3 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -392,9 +392,12 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
     libxl_domain_build_info *const info = &d_config->b_info;
     libxl_ctx *ctx = libxl__gc_owner(gc);
     char *xs_domid, *con_domid;
-    int rc;
+    int rc = 0;
     uint64_t size;
 
+    if ( state->forked_vm )
+        goto skip_fork;
+
     if (xc_domain_max_vcpus(ctx->xch, domid, info->max_vcpus) != 0) {
         LOG(ERROR, "Couldn't set max vcpu count");
         return ERROR_FAIL;
@@ -499,29 +502,6 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
         }
     }
 
-
-    rc = libxl__arch_extra_memory(gc, info, &size);
-    if (rc < 0) {
-        LOGE(ERROR, "Couldn't get arch extra constant memory size");
-        return ERROR_FAIL;
-    }
-
-    if (xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + size) < 0) {
-        LOGE(ERROR, "Couldn't set max memory");
-        return ERROR_FAIL;
-    }
-
-    xs_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL);
-    state->store_domid = xs_domid ? atoi(xs_domid) : 0;
-    free(xs_domid);
-
-    con_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenconsoled/domid", NULL);
-    state->console_domid = con_domid ? atoi(con_domid) : 0;
-    free(con_domid);
-
-    state->store_port = xc_evtchn_alloc_unbound(ctx->xch, domid, state->store_domid);
-    state->console_port = xc_evtchn_alloc_unbound(ctx->xch, domid, state->console_domid);
-
     if (info->type != LIBXL_DOMAIN_TYPE_PV)
         hvm_set_conf_params(ctx->xch, domid, info);
 
@@ -556,8 +536,34 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
                          info->altp2m);
     }
 
+    rc = libxl__arch_extra_memory(gc, info, &size);
+    if (rc < 0) {
+        LOGE(ERROR, "Couldn't get arch extra constant memory size");
+        return ERROR_FAIL;
+    }
+
+    if (xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + size) < 0) {
+        LOGE(ERROR, "Couldn't set max memory");
+        return ERROR_FAIL;
+    }
+
     rc = libxl__arch_domain_create(gc, d_config, domid);
+    if ( rc )
+        goto out;
 
+skip_fork:
+    xs_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL);
+    state->store_domid = xs_domid ? atoi(xs_domid) : 0;
+    free(xs_domid);
+
+    con_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenconsoled/domid", NULL);
+    state->console_domid = con_domid ? atoi(con_domid) : 0;
+    free(con_domid);
+
+    state->store_port = xc_evtchn_alloc_unbound(ctx->xch, domid, state->store_domid);
+    state->console_port = xc_evtchn_alloc_unbound(ctx->xch, domid, state->console_domid);
+
+out:
     return rc;
 }
 
@@ -615,6 +621,9 @@ int libxl__build_post(libxl__gc *gc, uint32_t domid,
     char **ents;
     int i, rc;
 
+    if ( state->forked_vm )
+        goto skip_fork;
+
     if (info->num_vnuma_nodes && !info->num_vcpu_soft_affinity) {
         rc = set_vnuma_affinity(gc, domid, info);
         if (rc)
@@ -639,6 +648,7 @@ int libxl__build_post(libxl__gc *gc, uint32_t domid,
         }
     }
 
+skip_fork:
     ents = libxl__calloc(gc, 12 + (info->max_vcpus * 2) + 2, sizeof(char *));
     ents[0] = "memory/static-max";
     ents[1] = GCSPRINTF("%"PRId64, info->max_memkb);
@@ -901,14 +911,16 @@ static int hvm_build_set_params(xc_interface *handle, uint32_t domid,
                                 libxl_domain_build_info *info,
                                 int store_evtchn, unsigned long *store_mfn,
                                 int console_evtchn, unsigned long *console_mfn,
-                                domid_t store_domid, domid_t console_domid)
+                                domid_t store_domid, domid_t console_domid,
+                                bool forked_vm)
 {
     struct hvm_info_table *va_hvm;
     uint8_t *va_map, sum;
     uint64_t str_mfn, cons_mfn;
     int i;
 
-    if (info->type == LIBXL_DOMAIN_TYPE_HVM) {
+    if ( info->type == LIBXL_DOMAIN_TYPE_HVM && !forked_vm )
+    {
         va_map = xc_map_foreign_range(handle, domid,
                                       XC_PAGE_SIZE, PROT_READ | PROT_WRITE,
                                       HVM_INFO_PFN);
@@ -1224,6 +1236,23 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     struct xc_dom_image *dom = NULL;
     bool device_model = info->type == LIBXL_DOMAIN_TYPE_HVM ? true : false;
 
+    if ( state->forked_vm )
+    {
+        rc = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
+                                  &state->store_mfn, state->console_port,
+                                  &state->console_mfn, state->store_domid,
+                                  state->console_domid, state->forked_vm);
+
+        if ( rc )
+            return rc;
+
+        return xc_dom_gnttab_seed(ctx->xch, domid, true,
+                                  state->console_mfn,
+                                  state->store_mfn,
+                                  state->console_domid,
+                                  state->store_domid);
+    }
+
     xc_dom_loginit(ctx->xch);
 
     /*
@@ -1348,7 +1377,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     rc = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
                                &state->store_mfn, state->console_port,
                                &state->console_mfn, state->store_domid,
-                               state->console_domid);
+                               state->console_domid, false);
     if (rc != 0) {
         LOG(ERROR, "hvm build set params failed");
         goto out;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index ba8c9b41ab..796d162cf2 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1360,6 +1360,7 @@ typedef struct {
 
     char *saved_state;
     int dm_monitor_fd;
+    bool forked_vm;
 
     libxl__file_reference pv_kernel;
     libxl__file_reference pv_ramdisk;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 7921950f6a..7c4c4057a9 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -956,6 +956,7 @@ libxl_domain_config = Struct("domain_config", [
     ("on_watchdog", libxl_action_on_shutdown),
     ("on_crash", libxl_action_on_shutdown),
     ("on_soft_reset", libxl_action_on_shutdown),
+    ("dm_restore_file", string, {'const': True}),
     ], dir=DIR_IN)
 
 libxl_diskinfo = Struct("diskinfo", [
diff --git a/tools/xl/xl.h b/tools/xl/xl.h
index 60bdad8ffb..9bdad6526e 100644
--- a/tools/xl/xl.h
+++ b/tools/xl/xl.h
@@ -31,6 +31,7 @@ struct cmd_spec {
 };
 
 struct domain_create {
+    uint32_t ddomid; /* fork launch dm for this domid */
     int debug;
     int daemonize;
     int monitor; /* handle guest reboots etc */
@@ -45,6 +46,7 @@ struct domain_create {
     const char *config_file;
     char *extra_config; /* extra config string */
     const char *restore_file;
+    const char *dm_restore_file;
     char *colo_proxy_script;
     bool userspace_colo_proxy;
     int migrate_fd; /* -1 means none */
@@ -127,6 +129,9 @@ int main_pciassignable_remove(int argc, char **argv);
 int main_pciassignable_list(int argc, char **argv);
 #ifndef LIBXL_HAVE_NO_SUSPEND_RESUME
 int main_restore(int argc, char **argv);
+int main_fork_vm(int argc, char **argv);
+int main_fork_launch_dm(int argc, char **argv);
+int main_fork_reset(int argc, char **argv);
 int main_migrate_receive(int argc, char **argv);
 int main_save(int argc, char **argv);
 int main_migrate(int argc, char **argv);
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 3b302b2f20..3a5d371057 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -185,6 +185,18 @@ struct cmd_spec cmd_table[] = {
       "Restore a domain from a saved state",
       "- for internal use only",
     },
+    { "fork-vm",
+      &main_fork_vm, 0, 1,
+      "Fork a domain from the running parent domid",
+      "[options] <Domid>",
+      "-h                           Print this help.\n"
+      "-C <config>                  Use config file for VM fork.\n"
+      "-Q <qemu-save-file>          Use qemu save file for VM fork.\n"
+      "--launch-dm <yes|no|late>    Launch device model (QEMU) for VM fork.\n"
+      "--fork-reset                 Reset VM fork.\n"
+      "-p                           Do not unpause fork VM after operation.\n"
+      "-d                           Enable debug messages.\n"
+    },
 #endif
     { "dump-core",
       &main_dump_core, 0, 1,
diff --git a/tools/xl/xl_saverestore.c b/tools/xl/xl_saverestore.c
index 9be033fe65..72c6209558 100644
--- a/tools/xl/xl_saverestore.c
+++ b/tools/xl/xl_saverestore.c
@@ -229,6 +229,102 @@ int main_restore(int argc, char **argv)
     return EXIT_SUCCESS;
 }
 
+int main_fork_vm(int argc, char **argv)
+{
+    int rc, debug = 0;
+    uint32_t domid_in = INVALID_DOMID, domid_out = INVALID_DOMID;
+    int launch_dm = 1;
+    bool reset = 0;
+    bool pause = 0;
+    const char *config_file = NULL;
+    const char *dm_restore_file = NULL;
+
+    int opt;
+    static struct option opts[] = {
+        {"launch-dm", 1, 0, 'l'},
+        {"fork-reset", 0, 0, 'r'},
+        COMMON_LONG_OPTS
+    };
+
+    SWITCH_FOREACH_OPT(opt, "phdC:Q:l:rN:D:B:V:", opts, "fork-vm", 1) {
+    case 'd':
+        debug = 1;
+        break;
+    case 'p':
+        pause = 1;
+        break;
+    case 'C':
+        config_file = optarg;
+        break;
+    case 'Q':
+        dm_restore_file = optarg;
+        break;
+    case 'l':
+        printf("optarg: %s\n", optarg);
+        if ( !strcmp(optarg, "no") )
+            launch_dm = 0;
+        if ( !strcmp(optarg, "yes") )
+            launch_dm = 1;
+        if ( !strcmp(optarg, "late") )
+            launch_dm = 2;
+        break;
+    case 'r':
+        reset = 1;
+        break;
+    case 'N': /* fall-through */
+    case 'D': /* fall-through */
+    case 'B': /* fall-through */
+    case 'V':
+        fprintf(stderr, "Unimplemented option(s)\n");
+        return EXIT_FAILURE;
+    }
+
+    if (argc-optind == 1) {
+        domid_in = atoi(argv[optind]);
+    } else {
+        help("fork-vm");
+        return EXIT_FAILURE;
+    }
+
+    if (launch_dm && (!config_file || !dm_restore_file)) {
+        fprintf(stderr, "Currently you must provide both -C and -Q options\n");
+        return EXIT_FAILURE;
+    }
+
+    if (reset) {
+        domid_out = domid_in;
+        if (libxl_domain_fork_reset(ctx, domid_in) == EXIT_FAILURE)
+            return EXIT_FAILURE;
+    }
+
+    if (launch_dm == 2 || reset) {
+        domid_out = domid_in;
+        rc = EXIT_SUCCESS;
+    } else
+        rc = libxl_domain_fork_vm(ctx, domid_in, &domid_out);
+
+    if (rc == EXIT_SUCCESS && launch_dm) {
+        struct domain_create dom_info;
+        memset(&dom_info, 0, sizeof(dom_info));
+        dom_info.ddomid = domid_out;
+        dom_info.dm_restore_file = dm_restore_file;
+        dom_info.debug = debug;
+        dom_info.paused = pause;
+        dom_info.config_file = config_file;
+        dom_info.migrate_fd = -1;
+        dom_info.send_back_fd = -1;
+        rc = create_domain(&dom_info) < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
+    }
+
+    if (rc == EXIT_SUCCESS && !pause)
+        rc = libxl_domain_unpause(ctx, domid_out, NULL);
+
+    if (rc == EXIT_SUCCESS)
+        fprintf(stderr, "fork-vm command successfully returned domid: %u\n", domid_out);
+
+    return rc;
+}
+
 int main_save(int argc, char **argv)
 {
     uint32_t domid;
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index e520b1da79..d9cb19c599 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -645,6 +645,7 @@ int create_domain(struct domain_create *dom_info)
 
     libxl_domain_config d_config;
 
+    uint32_t ddomid = dom_info->ddomid; // launch dm for this domain iff set
     int debug = dom_info->debug;
     int daemonize = dom_info->daemonize;
     int monitor = dom_info->monitor;
@@ -655,6 +656,7 @@ int create_domain(struct domain_create *dom_info)
     const char *restore_file = dom_info->restore_file;
     const char *config_source = NULL;
     const char *restore_source = NULL;
+    const char *dm_restore_file = dom_info->dm_restore_file;
     int migrate_fd = dom_info->migrate_fd;
     bool config_in_json;
 
@@ -923,6 +925,12 @@ start:
          * restore/migrate-receive it again.
          */
         restoring = 0;
+    } else if ( ddomid ) {
+        d_config.dm_restore_file = dm_restore_file;
+        ret = libxl_domain_fork_launch_dm(ctx, &d_config, ddomid,
+                                          autoconnect_console_how);
+        domid = ddomid;
+        ddomid = INVALID_DOMID;
     } else if (domid_soft_reset != INVALID_DOMID) {
         /* Do soft reset. */
         ret = libxl_domain_soft_reset(ctx, &d_config, domid_soft_reset,
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 15/18] xen/mem_sharing: VM forking
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 15/18] xen/mem_sharing: VM forking Tamas K Lengyel
@ 2020-01-09 10:28   ` Julien Grall
  2020-01-09 13:41     ` Tamas K Lengyel
  0 siblings, 1 reply; 57+ messages in thread
From: Julien Grall @ 2020-01-09 10:28 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tamas K Lengyel,
	Jan Beulich, Roger Pau Monné

Hi Tamas,

On 08/01/2020 17:14, Tamas K Lengyel wrote:
> +static int mem_sharing_fork(struct domain *d, struct domain *cd)
> +{
> +    int rc;
> +
> +    if ( !d->controller_pause_count &&
> +         (rc = domain_pause_by_systemcontroller(d)) )

AFAIU, the parent domain will be paused if it wasn't paused before and 
this will not be unpaused by the same hypercall. Right?

If so, this brings two questions:
     - What would happen if the toolstack decide to unpause the domain?
     - How does the caller knows whether this was paused by the 
hypercall or not?

I would also suggest to document the behavior of the hypercall as this 
is not entirely obvious that the domain will be paused here.

> +        return rc;
> +
> +    cd->max_pages = d->max_pages;
> +    cd->max_vcpus = d->max_vcpus;
> +
> +    /* this is preemptible so it's the first to get done */
> +    if ( (rc = fork_hap_allocation(d, cd)) )
> +        return rc;
> +
> +    if ( (rc = bring_up_vcpus(cd, d->cpupool)) )
> +        return rc;
> +
> +    if ( (rc = hvm_copy_context_and_params(d, cd)) )
> +        return rc;
> +
> +    fork_tsc(d, cd);
> +
> +    cd->parent = d;

How do you ensure that the parent domain will not get destroyed before 
the forked domain? Do you have a refcount somewhere?

> +
> +    return 0;
> +}
> +

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 17/18] x86/mem_sharing: reset a fork
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 17/18] x86/mem_sharing: reset a fork Tamas K Lengyel
@ 2020-01-09 10:30   ` Julien Grall
  0 siblings, 0 replies; 57+ messages in thread
From: Julien Grall @ 2020-01-09 10:30 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel
  Cc: Tamas K Lengyel, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Stefano Stabellini, Jan Beulich,
	Roger Pau Monné



On 08/01/2020 17:14, Tamas K Lengyel wrote:
> Implement hypercall that allows a fork to shed all memory that got allocated
> for it during its execution and re-load its vCPU context from the parent VM.
> This allows the forked VM to reset into the same state the parent VM is in a
> faster way then creating a new fork would be. Measurements show about a 2x
> speedup during normal fuzzing operations. Performance may vary depending how
> much memory got allocated for the forked VM. If it has been completely
> deduplicated from the parent VM then creating a new fork would likely be more
> performant.
> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> ---
>   xen/arch/x86/mm/mem_sharing.c | 79 +++++++++++++++++++++++++++++++++++
>   xen/include/public/memory.h   |  1 +
>   2 files changed, 80 insertions(+)
> 
> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> index d544801681..aaa678da14 100644
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -1607,6 +1607,62 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
>       return 0;
>   }
>   
> +/*
> + * The fork reset operation is intended to be used on short-lived forks only.
> + * There is no hypercall continuation operation implemented for this reason.
> + * For forks that obtain a larger memory footprint it is likely going to be
> + * more performant to create a new fork instead of resetting an existing one.
> + *
> + * TODO: In case this hypercall would become useful on forks with larger memory
> + * footprints the hypercall continuation should be implemented.
> + */
> +static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
> +{
> +    int rc;
> +    struct p2m_domain* p2m = p2m_get_hostp2m(cd);
> +    struct page_info *page, *tmp;
> +
> +    if ( !d->controller_pause_count &&
> +         (rc = domain_pause_by_systemcontroller(d)) )
> +        return rc;

Similar question as patch #15 here.

> +
> +    page_list_for_each_safe(page, tmp, &cd->page_list)
> +    {
> +        p2m_type_t p2mt;
> +        p2m_access_t p2ma;
> +        gfn_t gfn;
> +        mfn_t mfn = page_to_mfn(page);
> +
> +        if ( !mfn_valid(mfn) )
> +            continue;
> +
> +        gfn = mfn_to_gfn(cd, mfn);
> +        mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
> +                                    0, NULL, false);
> +
> +        if ( !p2m_is_ram(p2mt) || p2m_is_shared(p2mt) )
> +            continue;
> +
> +        /* take an extra reference */
> +        if ( !get_page(page, cd) )
> +            continue;
> +
> +        rc = p2m->set_entry(p2m, gfn, INVALID_MFN, PAGE_ORDER_4K,
> +                            p2m_invalid, p2m_access_rwx, -1);
> +        ASSERT(!rc);
> +
> +        put_page_alloc_ref(page);
> +        put_page(page);
> +    }
> +
> +    if ( (rc = hvm_copy_context_and_params(d, cd)) )
> +        return rc;
> +
> +    fork_tsc(d, cd);
> +
> +    return 0;
> +}
> +
>   int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>   {
>       int rc;
> @@ -1909,6 +1965,29 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>           break;
>       }
>   
> +    case XENMEM_sharing_op_fork_reset:
> +    {
> +        struct domain *pd;
> +
> +        rc = -EINVAL;
> +        if ( mso.u.fork._pad[0] || mso.u.fork._pad[1] ||
> +             mso.u.fork._pad[2] )
> +            goto out;
> +
> +        rc = -ENOSYS;
> +        if ( !d->parent )
> +            goto out;
> +
> +        rc = rcu_lock_live_remote_domain_by_id(d->parent->domain_id, &pd);
> +        if ( rc )
> +            goto out;
> +
> +        rc = mem_sharing_fork_reset(pd, d);
> +
> +        rcu_unlock_domain(pd);
> +        break;
> +    }
> +
>       default:
>           rc = -ENOSYS;
>           break;
> diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
> index 90a3f4498e..e3d063e22e 100644
> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -483,6 +483,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
>   #define XENMEM_sharing_op_audit             7
>   #define XENMEM_sharing_op_range_share       8
>   #define XENMEM_sharing_op_fork              9
> +#define XENMEM_sharing_op_fork_reset        10
>   
>   #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
>   #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
> 

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 15/18] xen/mem_sharing: VM forking
  2020-01-09 10:28   ` Julien Grall
@ 2020-01-09 13:41     ` Tamas K Lengyel
  2020-01-09 15:10       ` Roger Pau Monné
  0 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-09 13:41 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jan Beulich, Xen-devel, Roger Pau Monné

On Thu, Jan 9, 2020 at 3:29 AM Julien Grall <julien@xen.org> wrote:
>
> Hi Tamas,
>
> On 08/01/2020 17:14, Tamas K Lengyel wrote:
> > +static int mem_sharing_fork(struct domain *d, struct domain *cd)
> > +{
> > +    int rc;
> > +
> > +    if ( !d->controller_pause_count &&
> > +         (rc = domain_pause_by_systemcontroller(d)) )
>
> AFAIU, the parent domain will be paused if it wasn't paused before and
> this will not be unpaused by the same hypercall. Right?

Yes, it needs to remain paused as long as there are forks active from
it. Afterwards it can be unpaused.

>
> If so, this brings two questions:
>      - What would happen if the toolstack decide to unpause the domain?

The forks eventually would end up misbehaving since the memory they
haven't touched yet would start changing underneath their feat. If
they never use those pages, then they are safe. If they are, and they
expect them to be in the same state when they were created, it will
lead to issues. Depends really on what is running in the fork.

>      - How does the caller knows whether this was paused by the
> hypercall or not?

Well, if they paused the VM before then the hypercall doesn't pause it
again. If it wasn't paused, it will be now.

>
> I would also suggest to document the behavior of the hypercall as this
> is not entirely obvious that the domain will be paused here.

Sure.

>
> > +        return rc;
> > +
> > +    cd->max_pages = d->max_pages;
> > +    cd->max_vcpus = d->max_vcpus;
> > +
> > +    /* this is preemptible so it's the first to get done */
> > +    if ( (rc = fork_hap_allocation(d, cd)) )
> > +        return rc;
> > +
> > +    if ( (rc = bring_up_vcpus(cd, d->cpupool)) )
> > +        return rc;
> > +
> > +    if ( (rc = hvm_copy_context_and_params(d, cd)) )
> > +        return rc;
> > +
> > +    fork_tsc(d, cd);
> > +
> > +    cd->parent = d;
>
> How do you ensure that the parent domain will not get destroyed before
> the forked domain? Do you have a refcount somewhere?

We don't. At this point we expect the user to keep the parent VM alive
but paused. Is there such a thing as a domain refcount we could use
for this?

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 15/18] xen/mem_sharing: VM forking
  2020-01-09 13:41     ` Tamas K Lengyel
@ 2020-01-09 15:10       ` Roger Pau Monné
  2020-01-09 15:34         ` Jan Beulich
  2020-01-09 15:54         ` Tamas K Lengyel
  0 siblings, 2 replies; 57+ messages in thread
From: Roger Pau Monné @ 2020-01-09 15:10 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jan Beulich, Xen-devel, Julien Grall

On Thu, Jan 09, 2020 at 06:41:12AM -0700, Tamas K Lengyel wrote:
> On Thu, Jan 9, 2020 at 3:29 AM Julien Grall <julien@xen.org> wrote:
> >
> > Hi Tamas,
> >
> > On 08/01/2020 17:14, Tamas K Lengyel wrote:
> > > +static int mem_sharing_fork(struct domain *d, struct domain *cd)
> > > +{
> > > +    int rc;
> > > +
> > > +    if ( !d->controller_pause_count &&
> > > +         (rc = domain_pause_by_systemcontroller(d)) )
> >
> > AFAIU, the parent domain will be paused if it wasn't paused before and
> > this will not be unpaused by the same hypercall. Right?
> 
> Yes, it needs to remain paused as long as there are forks active from
> it. Afterwards it can be unpaused.

If you want the parent domain to remain paused for as long as the
forks are active, shouldn't each fork increment the pause count on
creation and decrement it when the fork is destroyed?

How can you assure no other operation or entity has incremented
controller_pause_count temporary and is likely to decrement it at some
point while forks are still active?

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 15/18] xen/mem_sharing: VM forking
  2020-01-09 15:10       ` Roger Pau Monné
@ 2020-01-09 15:34         ` Jan Beulich
  2020-01-09 15:57           ` Tamas K Lengyel
  2020-01-09 15:54         ` Tamas K Lengyel
  1 sibling, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2020-01-09 15:34 UTC (permalink / raw)
  To: Roger Pau Monné, Tamas K Lengyel
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Xen-devel, Julien Grall

On 09.01.2020 16:10, Roger Pau Monné wrote:
> On Thu, Jan 09, 2020 at 06:41:12AM -0700, Tamas K Lengyel wrote:
>> On Thu, Jan 9, 2020 at 3:29 AM Julien Grall <julien@xen.org> wrote:
>>>
>>> Hi Tamas,
>>>
>>> On 08/01/2020 17:14, Tamas K Lengyel wrote:
>>>> +static int mem_sharing_fork(struct domain *d, struct domain *cd)
>>>> +{
>>>> +    int rc;
>>>> +
>>>> +    if ( !d->controller_pause_count &&
>>>> +         (rc = domain_pause_by_systemcontroller(d)) )
>>>
>>> AFAIU, the parent domain will be paused if it wasn't paused before and
>>> this will not be unpaused by the same hypercall. Right?
>>
>> Yes, it needs to remain paused as long as there are forks active from
>> it. Afterwards it can be unpaused.
> 
> If you want the parent domain to remain paused for as long as the
> forks are active, shouldn't each fork increment the pause count on
> creation and decrement it when the fork is destroyed?
> 
> How can you assure no other operation or entity has incremented
> controller_pause_count temporary and is likely to decrement it at some
> point while forks are still active?

The _by_systemcontroller variants look wrong to be used here anyway.
Why is this not simply domain_{,un}pause()?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 15/18] xen/mem_sharing: VM forking
  2020-01-09 15:10       ` Roger Pau Monné
  2020-01-09 15:34         ` Jan Beulich
@ 2020-01-09 15:54         ` Tamas K Lengyel
  1 sibling, 0 replies; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-09 15:54 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jan Beulich, Xen-devel, Julien Grall

On Thu, Jan 9, 2020 at 8:10 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Thu, Jan 09, 2020 at 06:41:12AM -0700, Tamas K Lengyel wrote:
> > On Thu, Jan 9, 2020 at 3:29 AM Julien Grall <julien@xen.org> wrote:
> > >
> > > Hi Tamas,
> > >
> > > On 08/01/2020 17:14, Tamas K Lengyel wrote:
> > > > +static int mem_sharing_fork(struct domain *d, struct domain *cd)
> > > > +{
> > > > +    int rc;
> > > > +
> > > > +    if ( !d->controller_pause_count &&
> > > > +         (rc = domain_pause_by_systemcontroller(d)) )
> > >
> > > AFAIU, the parent domain will be paused if it wasn't paused before and
> > > this will not be unpaused by the same hypercall. Right?
> >
> > Yes, it needs to remain paused as long as there are forks active from
> > it. Afterwards it can be unpaused.
>
> If you want the parent domain to remain paused for as long as the
> forks are active, shouldn't each fork increment the pause count on
> creation and decrement it when the fork is destroyed?

That would work.

>
> How can you assure no other operation or entity has incremented
> controller_pause_count temporary and is likely to decrement it at some
> point while forks are still active?

Right now we don't do sanity checks. It is just expected that the user
is not doing anything insane like that - I would argue that for an
experimental system (one that is even hidden behind CONFIG_EXPERT) an
assumption like that is safe to make. But doing the pause/unpause
combo you describe above is pretty simple and should get the job done.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 15/18] xen/mem_sharing: VM forking
  2020-01-09 15:34         ` Jan Beulich
@ 2020-01-09 15:57           ` Tamas K Lengyel
  2020-01-09 16:03             ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-09 15:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Xen-devel, Julien Grall, Roger Pau Monné

On Thu, Jan 9, 2020 at 8:34 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 09.01.2020 16:10, Roger Pau Monné wrote:
> > On Thu, Jan 09, 2020 at 06:41:12AM -0700, Tamas K Lengyel wrote:
> >> On Thu, Jan 9, 2020 at 3:29 AM Julien Grall <julien@xen.org> wrote:
> >>>
> >>> Hi Tamas,
> >>>
> >>> On 08/01/2020 17:14, Tamas K Lengyel wrote:
> >>>> +static int mem_sharing_fork(struct domain *d, struct domain *cd)
> >>>> +{
> >>>> +    int rc;
> >>>> +
> >>>> +    if ( !d->controller_pause_count &&
> >>>> +         (rc = domain_pause_by_systemcontroller(d)) )
> >>>
> >>> AFAIU, the parent domain will be paused if it wasn't paused before and
> >>> this will not be unpaused by the same hypercall. Right?
> >>
> >> Yes, it needs to remain paused as long as there are forks active from
> >> it. Afterwards it can be unpaused.
> >
> > If you want the parent domain to remain paused for as long as the
> > forks are active, shouldn't each fork increment the pause count on
> > creation and decrement it when the fork is destroyed?
> >
> > How can you assure no other operation or entity has incremented
> > controller_pause_count temporary and is likely to decrement it at some
> > point while forks are still active?
>
> The _by_systemcontroller variants look wrong to be used here anyway.
> Why is this not simply domain_{,un}pause()?
>

My reasoning was that by default the user should pause the parent VM
before forking. This sanity checks just mimicks that step in case the
user didn't do that already. But I guess either would work, I don't
really see much difference between the two.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 15/18] xen/mem_sharing: VM forking
  2020-01-09 15:57           ` Tamas K Lengyel
@ 2020-01-09 16:03             ` Jan Beulich
  2020-01-09 16:06               ` Tamas K Lengyel
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2020-01-09 16:03 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Xen-devel, Julien Grall, Roger Pau Monné

On 09.01.2020 16:57, Tamas K Lengyel wrote:
> On Thu, Jan 9, 2020 at 8:34 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 09.01.2020 16:10, Roger Pau Monné wrote:
>>> On Thu, Jan 09, 2020 at 06:41:12AM -0700, Tamas K Lengyel wrote:
>>>> On Thu, Jan 9, 2020 at 3:29 AM Julien Grall <julien@xen.org> wrote:
>>>>>
>>>>> Hi Tamas,
>>>>>
>>>>> On 08/01/2020 17:14, Tamas K Lengyel wrote:
>>>>>> +static int mem_sharing_fork(struct domain *d, struct domain *cd)
>>>>>> +{
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if ( !d->controller_pause_count &&
>>>>>> +         (rc = domain_pause_by_systemcontroller(d)) )
>>>>>
>>>>> AFAIU, the parent domain will be paused if it wasn't paused before and
>>>>> this will not be unpaused by the same hypercall. Right?
>>>>
>>>> Yes, it needs to remain paused as long as there are forks active from
>>>> it. Afterwards it can be unpaused.
>>>
>>> If you want the parent domain to remain paused for as long as the
>>> forks are active, shouldn't each fork increment the pause count on
>>> creation and decrement it when the fork is destroyed?
>>>
>>> How can you assure no other operation or entity has incremented
>>> controller_pause_count temporary and is likely to decrement it at some
>>> point while forks are still active?
>>
>> The _by_systemcontroller variants look wrong to be used here anyway.
>> Why is this not simply domain_{,un}pause()?
>>
> 
> My reasoning was that by default the user should pause the parent VM
> before forking. This sanity checks just mimicks that step in case the
> user didn't do that already. But I guess either would work, I don't
> really see much difference between the two.

The main difference is that the one you currently use updates
d->controller_pause_count, which can be updated by other domctls, but
which shouldn't be updated behind the back of a component in Xen which
needs the entity paused.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 15/18] xen/mem_sharing: VM forking
  2020-01-09 16:03             ` Jan Beulich
@ 2020-01-09 16:06               ` Tamas K Lengyel
  0 siblings, 0 replies; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-09 16:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Xen-devel, Julien Grall, Roger Pau Monné

On Thu, Jan 9, 2020 at 9:03 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 09.01.2020 16:57, Tamas K Lengyel wrote:
> > On Thu, Jan 9, 2020 at 8:34 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 09.01.2020 16:10, Roger Pau Monné wrote:
> >>> On Thu, Jan 09, 2020 at 06:41:12AM -0700, Tamas K Lengyel wrote:
> >>>> On Thu, Jan 9, 2020 at 3:29 AM Julien Grall <julien@xen.org> wrote:
> >>>>>
> >>>>> Hi Tamas,
> >>>>>
> >>>>> On 08/01/2020 17:14, Tamas K Lengyel wrote:
> >>>>>> +static int mem_sharing_fork(struct domain *d, struct domain *cd)
> >>>>>> +{
> >>>>>> +    int rc;
> >>>>>> +
> >>>>>> +    if ( !d->controller_pause_count &&
> >>>>>> +         (rc = domain_pause_by_systemcontroller(d)) )
> >>>>>
> >>>>> AFAIU, the parent domain will be paused if it wasn't paused before and
> >>>>> this will not be unpaused by the same hypercall. Right?
> >>>>
> >>>> Yes, it needs to remain paused as long as there are forks active from
> >>>> it. Afterwards it can be unpaused.
> >>>
> >>> If you want the parent domain to remain paused for as long as the
> >>> forks are active, shouldn't each fork increment the pause count on
> >>> creation and decrement it when the fork is destroyed?
> >>>
> >>> How can you assure no other operation or entity has incremented
> >>> controller_pause_count temporary and is likely to decrement it at some
> >>> point while forks are still active?
> >>
> >> The _by_systemcontroller variants look wrong to be used here anyway.
> >> Why is this not simply domain_{,un}pause()?
> >>
> >
> > My reasoning was that by default the user should pause the parent VM
> > before forking. This sanity checks just mimicks that step in case the
> > user didn't do that already. But I guess either would work, I don't
> > really see much difference between the two.
>
> The main difference is that the one you currently use updates
> d->controller_pause_count, which can be updated by other domctls, but
> which shouldn't be updated behind the back of a component in Xen which
> needs the entity paused.
>

Alright, I'll switch it.

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 01/18] x86/hvm: introduce hvm_copy_context_and_params
  2020-01-08 17:13 ` [Xen-devel] [PATCH v4 01/18] x86/hvm: introduce hvm_copy_context_and_params Tamas K Lengyel
@ 2020-01-16 12:27   ` Jan Beulich
  2020-01-16 14:06     ` Tamas K Lengyel
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2020-01-16 12:27 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel, Roger Pau Monné, Wei Liu, Andrew Cooper

On 08.01.2020 18:13, Tamas K Lengyel wrote:
> @@ -4129,49 +4130,32 @@ static int hvm_allow_set_param(struct domain *d,
>      return rc;
>  }
>  
> -static int hvmop_set_param(
> -    XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg)
> +static int hvm_set_param(struct domain *d, uint32_t index, uint64_t value)
>  {
>      struct domain *curr_d = current->domain;
> -    struct xen_hvm_param a;
> -    struct domain *d;
> -    struct vcpu *v;
>      int rc;
> +    struct vcpu *v;

Nit: Personally I'd prefer if "rc" remained last.

> +int hvmop_set_param(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg)
> +{
> +    struct xen_hvm_param a;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&a, arg, 1) )
> +        return -EFAULT;
> +
> +    if ( a.index >= HVM_NR_PARAMS )
> +        return -EINVAL;
> +
> +    /* Make sure the above bound check is not bypassed during speculation. */
> +    block_speculation();
> +
> +    d = rcu_lock_domain_by_any_id(a.domid);
> +    if ( d == NULL )
> +        return -ESRCH;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    rc = hvm_set_param(d, a.index, a.value);

With

    rc = -EINVAL;
    if ( is_hvm_domain(d) )
        rc = hvm_set_param(d, a.index, a.value);

the function wouldn't need an "out" label (and hence any goto)
anymore. I know others are less picky about goto-s than me, but
I think in cases where it's easy to avoid them they would better
be avoided.

> @@ -4400,6 +4414,43 @@ static int hvm_allow_get_param(struct domain *d,
>      return rc;
>  }
>  
> +static int hvm_get_param(struct domain *d, uint32_t index, uint64_t *value)
> +{
> +    int rc;
> +
> +    if ( index >= HVM_NR_PARAMS || !value )
> +        return -EINVAL;

I don't think the range check is needed here: It's redundant with
that in hvmop_get_param() and pointless for the new function you
add. (Same for "set" then, but I noticed it here first.) I also
don't think value needs checking against NULL in a case like this
one (we don't typically do so elsewhere in similar situations).

> @@ -5266,6 +5294,37 @@ void hvm_set_segment_register(struct vcpu *v, enum x86_segment seg,
>      alternative_vcall(hvm_funcs.set_segment_register, v, seg, reg);
>  }
>  
> +int hvm_copy_context_and_params(struct domain *src, struct domain *dst)

Following memcpy() and alike, perhaps better to have dst first and
src second?

> +{
> +    int rc, i;

unsigned int for i please.

> +    struct hvm_domain_context c = { };
> +
> +    c.size = hvm_save_size(src);

Put in the variable's initializer?

> +    if ( (c.data = xmalloc_bytes(c.size)) == NULL )

How likely is it for this to be more than a page's worth of space?
IOW wouldn't it be better to use vmalloc() here right away, even if
right now this may still fit in a page (which I'm not sure it does)?

> +        return -ENOMEM;
> +
> +    for ( i = 0; i < HVM_NR_PARAMS; i++ )
> +    {
> +        uint64_t value = 0;
> +
> +        if ( hvm_get_param(src, i, &value) || !value )
> +            continue;
> +
> +        if ( (rc = hvm_set_param(dst, i, value)) )
> +            goto out;
> +    }
> +
> +    if ( (rc = hvm_save(src, &c)) )
> +        goto out;

Better do this ahead of the loop? There's no point in fiddling with
dst if this fails, I would think.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 01/18] x86/hvm: introduce hvm_copy_context_and_params
  2020-01-16 12:27   ` Jan Beulich
@ 2020-01-16 14:06     ` Tamas K Lengyel
  0 siblings, 0 replies; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-16 14:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Xen-devel, Andrew Cooper, Tamas K Lengyel, Wei Liu, Roger Pau Monné

On Thu, Jan 16, 2020 at 5:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 08.01.2020 18:13, Tamas K Lengyel wrote:
> > @@ -4129,49 +4130,32 @@ static int hvm_allow_set_param(struct domain *d,
> >      return rc;
> >  }
> >
> > -static int hvmop_set_param(
> > -    XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg)
> > +static int hvm_set_param(struct domain *d, uint32_t index, uint64_t value)
> >  {
> >      struct domain *curr_d = current->domain;
> > -    struct xen_hvm_param a;
> > -    struct domain *d;
> > -    struct vcpu *v;
> >      int rc;
> > +    struct vcpu *v;
>
> Nit: Personally I'd prefer if "rc" remained last.
>
> > +int hvmop_set_param(
> > +    XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg)
> > +{
> > +    struct xen_hvm_param a;
> > +    struct domain *d;
> > +    int rc;
> > +
> > +    if ( copy_from_guest(&a, arg, 1) )
> > +        return -EFAULT;
> > +
> > +    if ( a.index >= HVM_NR_PARAMS )
> > +        return -EINVAL;
> > +
> > +    /* Make sure the above bound check is not bypassed during speculation. */
> > +    block_speculation();
> > +
> > +    d = rcu_lock_domain_by_any_id(a.domid);
> > +    if ( d == NULL )
> > +        return -ESRCH;
> > +
> > +    rc = -EINVAL;
> > +    if ( !is_hvm_domain(d) )
> > +        goto out;
> > +
> > +    rc = hvm_set_param(d, a.index, a.value);
>
> With
>
>     rc = -EINVAL;
>     if ( is_hvm_domain(d) )
>         rc = hvm_set_param(d, a.index, a.value);
>
> the function wouldn't need an "out" label (and hence any goto)
> anymore. I know others are less picky about goto-s than me, but
> I think in cases where it's easy to avoid them they would better
> be avoided.
>
> > @@ -4400,6 +4414,43 @@ static int hvm_allow_get_param(struct domain *d,
> >      return rc;
> >  }
> >
> > +static int hvm_get_param(struct domain *d, uint32_t index, uint64_t *value)
> > +{
> > +    int rc;
> > +
> > +    if ( index >= HVM_NR_PARAMS || !value )
> > +        return -EINVAL;
>
> I don't think the range check is needed here: It's redundant with
> that in hvmop_get_param() and pointless for the new function you
> add. (Same for "set" then, but I noticed it here first.) I also
> don't think value needs checking against NULL in a case like this
> one (we don't typically do so elsewhere in similar situations).
>
> > @@ -5266,6 +5294,37 @@ void hvm_set_segment_register(struct vcpu *v, enum x86_segment seg,
> >      alternative_vcall(hvm_funcs.set_segment_register, v, seg, reg);
> >  }
> >
> > +int hvm_copy_context_and_params(struct domain *src, struct domain *dst)
>
> Following memcpy() and alike, perhaps better to have dst first and
> src second?
>
> > +{
> > +    int rc, i;
>
> unsigned int for i please.
>
> > +    struct hvm_domain_context c = { };
> > +
> > +    c.size = hvm_save_size(src);
>
> Put in the variable's initializer?
>
> > +    if ( (c.data = xmalloc_bytes(c.size)) == NULL )
>
> How likely is it for this to be more than a page's worth of space?
> IOW wouldn't it be better to use vmalloc() here right away, even if
> right now this may still fit in a page (which I'm not sure it does)?

I'm not sure what the size is normally, never checked.

>
> > +        return -ENOMEM;
> > +
> > +    for ( i = 0; i < HVM_NR_PARAMS; i++ )
> > +    {
> > +        uint64_t value = 0;
> > +
> > +        if ( hvm_get_param(src, i, &value) || !value )
> > +            continue;
> > +
> > +        if ( (rc = hvm_set_param(dst, i, value)) )
> > +            goto out;
> > +    }
> > +
> > +    if ( (rc = hvm_save(src, &c)) )
> > +        goto out;
>
> Better do this ahead of the loop? There's no point in fiddling with
> dst if this fails, I would think.

Thanks for the review, I don't have any objections to the things you
pointed out.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 05/18] x86/mem_sharing: don't try to unshare twice during page fault
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 05/18] x86/mem_sharing: don't try to unshare twice during page fault Tamas K Lengyel
@ 2020-01-16 14:53   ` Jan Beulich
  2020-01-16 15:59     ` Tamas K Lengyel
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2020-01-16 14:53 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel, Roger Pau Monné, Wei Liu, Andrew Cooper

On 08.01.2020 18:14, Tamas K Lengyel wrote:
> @@ -1702,11 +1703,14 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>      struct domain *currd = curr->domain;
>      struct p2m_domain *p2m, *hostp2m;
>      int rc, fall_through = 0, paged = 0;
> -    int sharing_enomem = 0;
>      vm_event_request_t *req_ptr = NULL;
>      bool sync = false;
>      unsigned int page_order;
>  
> +#ifdef CONFIG_MEM_SHARING
> +    bool sharing_enomem = false;
> +#endif

To reduce #ifdef-ary, could you leave this alone (or convert to
bool in place, without #ifdef) and ...

> @@ -1955,19 +1961,21 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>       */
>      if ( paged )
>          p2m_mem_paging_populate(currd, gfn);
> +
> +#ifdef CONFIG_MEM_SHARING
>      if ( sharing_enomem )
>      {
> -        int rv;
> -
> -        if ( (rv = mem_sharing_notify_enomem(currd, gfn, true)) < 0 )
> +        if ( !vm_event_check_ring(currd->vm_event_share) )
>          {
> -            gdprintk(XENLOG_ERR, "Domain %hu attempt to unshare "
> -                     "gfn %lx, ENOMEM and no helper (rc %d)\n",
> -                     currd->domain_id, gfn, rv);
> +            gprintk(XENLOG_ERR, "Domain %pd attempt to unshare "
> +                    "gfn %lx, ENOMEM and no helper\n",
> +                    currd, gfn);
>              /* Crash the domain */
>              rc = 0;
>          }
>      }
> +#endif

... move the #ifdef inside the braces here? With this
Acked-by: Jan Beulich <jbeulich@suse.com>


Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 06/18] x86/mem_sharing: define mem_sharing_domain to hold some scattered variables
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 06/18] x86/mem_sharing: define mem_sharing_domain to hold some scattered variables Tamas K Lengyel
@ 2020-01-16 15:23   ` Jan Beulich
  2020-01-16 16:05     ` Tamas K Lengyel
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2020-01-16 15:23 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	xen-devel, Roger Pau Monné

On 08.01.2020 18:14, Tamas K Lengyel wrote:
> Create struct mem_sharing_domain under hvm_domain and move mem sharing
> variables into it from p2m_domain and hvm_domain.
> 
> Expose the mem_sharing_enabled macro to be used consistently across Xen.
> 
> Remove some duplicate calls to mem_sharing_enabled in mem_sharing.c
> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>

Acked-by: Jan Beulich <jbeulich@suse.com>
with one question:

> @@ -192,6 +192,10 @@ struct hvm_domain {
>          struct vmx_domain vmx;
>          struct svm_domain svm;
>      };
> +
> +#ifdef CONFIG_MEM_SHARING
> +    struct mem_sharing_domain mem_sharing;
> +#endif

Are you intending to add fields to this new struct? If so,
should the field here become a pointer, and the structure
allocated only when actually needed?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 07/18] x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in relinquish_shared_pages
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 07/18] x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in relinquish_shared_pages Tamas K Lengyel
@ 2020-01-16 15:40   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2020-01-16 15:40 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	xen-devel, Roger Pau Monné

On 08.01.2020 18:14, Tamas K Lengyel wrote:
> While using _mfn(0) is of no consequence during teardown, INVALID_MFN is the
> correct value that should be used.
> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 08/18] x86/mem_sharing: Make add_to_physmap static and shorten name
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 08/18] x86/mem_sharing: Make add_to_physmap static and shorten name Tamas K Lengyel
@ 2020-01-16 15:40   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2020-01-16 15:40 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel
  Cc: George Dunlap, Andrew Cooper, Tamas K Lengyel, Wei Liu,
	Roger Pau Monné

On 08.01.2020 18:14, Tamas K Lengyel wrote:
> It's not being called from outside mem_sharing.c
> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 09/18] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 09/18] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool Tamas K Lengyel
@ 2020-01-16 15:42   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2020-01-16 15:42 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	xen-devel, Roger Pau Monné

On 08.01.2020 18:14, Tamas K Lengyel wrote:
> MEM_SHARING_DESTROY_GFN is used on the 'flags' bitfield during unsharing.
> However, the bitfield is not used for anything else, so just convert it to a
> bool instead.
> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 00/18] VM forking
  2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
                   ` (17 preceding siblings ...)
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 18/18] xen/tools: VM forking toolstack side Tamas K Lengyel
@ 2020-01-16 15:47 ` Jan Beulich
  2020-01-16 16:24   ` Tamas K Lengyel
  18 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2020-01-16 15:47 UTC (permalink / raw)
  To: George Dunlap
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, Andrew Cooper, Ian Jackson,
	Anthony PERARD, Tamas K Lengyel, Julien Grall, Alexandru Isaila,
	xen-devel, Roger Pau Monné

On 08.01.2020 18:13, Tamas K Lengyel wrote:
> Tamas K Lengyel (18):
>   x86/hvm: introduce hvm_copy_context_and_params
>   xen/x86: Make hap_get_allocation accessible
>   x86/mem_sharing: make get_two_gfns take locks conditionally
>   x86/mem_sharing: drop flags from mem_sharing_unshare_page
>   x86/mem_sharing: don't try to unshare twice during page fault
>   x86/mem_sharing: define mem_sharing_domain to hold some scattered
>     variables
>   x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in
>     relinquish_shared_pages
>   x86/mem_sharing: Make add_to_physmap static and shorten name
>   x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool

I've looked at these patches, and I think 2-4 and 7-9 could go
in right away (6 has a small question pending, but may otherwise
also be ready), if you were to give (or delegate) your ack that
they would need afaict. Thoughts?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 05/18] x86/mem_sharing: don't try to unshare twice during page fault
  2020-01-16 14:53   ` Jan Beulich
@ 2020-01-16 15:59     ` Tamas K Lengyel
  2020-01-16 16:03       ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-16 15:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Xen-devel, Andrew Cooper, Tamas K Lengyel, Wei Liu, Roger Pau Monné

On Thu, Jan 16, 2020 at 7:55 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 08.01.2020 18:14, Tamas K Lengyel wrote:
> > @@ -1702,11 +1703,14 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
> >      struct domain *currd = curr->domain;
> >      struct p2m_domain *p2m, *hostp2m;
> >      int rc, fall_through = 0, paged = 0;
> > -    int sharing_enomem = 0;
> >      vm_event_request_t *req_ptr = NULL;
> >      bool sync = false;
> >      unsigned int page_order;
> >
> > +#ifdef CONFIG_MEM_SHARING
> > +    bool sharing_enomem = false;
> > +#endif
>
> To reduce #ifdef-ary, could you leave this alone (or convert to
> bool in place, without #ifdef) and ...
>
> > @@ -1955,19 +1961,21 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
> >       */
> >      if ( paged )
> >          p2m_mem_paging_populate(currd, gfn);
> > +
> > +#ifdef CONFIG_MEM_SHARING
> >      if ( sharing_enomem )
> >      {
> > -        int rv;
> > -
> > -        if ( (rv = mem_sharing_notify_enomem(currd, gfn, true)) < 0 )
> > +        if ( !vm_event_check_ring(currd->vm_event_share) )
> >          {
> > -            gdprintk(XENLOG_ERR, "Domain %hu attempt to unshare "
> > -                     "gfn %lx, ENOMEM and no helper (rc %d)\n",
> > -                     currd->domain_id, gfn, rv);
> > +            gprintk(XENLOG_ERR, "Domain %pd attempt to unshare "
> > +                    "gfn %lx, ENOMEM and no helper\n",
> > +                    currd, gfn);
> >              /* Crash the domain */
> >              rc = 0;
> >          }
> >      }
> > +#endif
>
> ... move the #ifdef inside the braces here? With this
> Acked-by: Jan Beulich <jbeulich@suse.com>

SGTM, I assume you are counting on the compiler to just get rid of the
variable when it sees its never used?

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 10/18] x86/mem_sharing: Replace MEM_SHARING_DEBUG with gdprintk
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 10/18] x86/mem_sharing: Replace MEM_SHARING_DEBUG with gdprintk Tamas K Lengyel
@ 2020-01-16 16:01   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2020-01-16 16:01 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	xen-devel, Roger Pau Monné

On 08.01.2020 18:14, Tamas K Lengyel wrote:
> @@ -494,19 +491,19 @@ static int audit(void)
>          /* If we can't lock it, it's definitely not a shared page */
>          if ( !mem_sharing_page_lock(pg) )
>          {
> -            MEM_SHARING_DEBUG(
> -                "mfn %lx in audit list, but cannot be locked (%lx)!\n",
> -                mfn_x(mfn), pg->u.inuse.type_info);
> -            errors++;
> -            continue;
> +            gdprintk(XENLOG_ERR,
> +                     "mfn %lx in audit list, but cannot be locked (%lx)!\n",
> +                     mfn_x(mfn), pg->u.inuse.type_info);
> +           errors++;
> +           continue;

There looks to be one space too little on these last two lines and ...

> @@ -514,24 +511,24 @@ static int audit(void)
>          /* Check the page owner. */
>          if ( page_get_owner(pg) != dom_cow )
>          {
> -            MEM_SHARING_DEBUG("mfn %lx shared, but wrong owner %pd!\n",
> -                              mfn_x(mfn), page_get_owner(pg));
> -            errors++;
> +               gdprintk(XENLOG_ERR, "mfn %lx shared, but wrong owner (%hu)!\n",
> +                        mfn_x(mfn), page_get_owner(pg)->domain_id);
> +               errors++;

... a few too many here and ...

>          }
>  
>          /* Check the m2p entry */
>          if ( !SHARED_M2P(get_gpfn_from_mfn(mfn_x(mfn))) )
>          {
> -            MEM_SHARING_DEBUG("mfn %lx shared, but wrong m2p entry (%lx)!\n",
> -                              mfn_x(mfn), get_gpfn_from_mfn(mfn_x(mfn)));
> -            errors++;
> +               gdprintk(XENLOG_ERR, "mfn %lx shared, but wrong m2p entry (%lx)!\n",
> +                        mfn_x(mfn), get_gpfn_from_mfn(mfn_x(mfn)));
> +               errors++;

... here.

Also please switch to the %pd format for domain IDs you log
anywhere here.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 05/18] x86/mem_sharing: don't try to unshare twice during page fault
  2020-01-16 15:59     ` Tamas K Lengyel
@ 2020-01-16 16:03       ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2020-01-16 16:03 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Andrew Cooper, Tamas K Lengyel, Xen-devel, Wei Liu, Roger Pau Monné

On 16.01.2020 16:59, Tamas K Lengyel wrote:
> On Thu, Jan 16, 2020 at 7:55 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 08.01.2020 18:14, Tamas K Lengyel wrote:
>>> @@ -1702,11 +1703,14 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>>>      struct domain *currd = curr->domain;
>>>      struct p2m_domain *p2m, *hostp2m;
>>>      int rc, fall_through = 0, paged = 0;
>>> -    int sharing_enomem = 0;
>>>      vm_event_request_t *req_ptr = NULL;
>>>      bool sync = false;
>>>      unsigned int page_order;
>>>
>>> +#ifdef CONFIG_MEM_SHARING
>>> +    bool sharing_enomem = false;
>>> +#endif
>>
>> To reduce #ifdef-ary, could you leave this alone (or convert to
>> bool in place, without #ifdef) and ...
>>
>>> @@ -1955,19 +1961,21 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>>>       */
>>>      if ( paged )
>>>          p2m_mem_paging_populate(currd, gfn);
>>> +
>>> +#ifdef CONFIG_MEM_SHARING
>>>      if ( sharing_enomem )
>>>      {
>>> -        int rv;
>>> -
>>> -        if ( (rv = mem_sharing_notify_enomem(currd, gfn, true)) < 0 )
>>> +        if ( !vm_event_check_ring(currd->vm_event_share) )
>>>          {
>>> -            gdprintk(XENLOG_ERR, "Domain %hu attempt to unshare "
>>> -                     "gfn %lx, ENOMEM and no helper (rc %d)\n",
>>> -                     currd->domain_id, gfn, rv);
>>> +            gprintk(XENLOG_ERR, "Domain %pd attempt to unshare "
>>> +                    "gfn %lx, ENOMEM and no helper\n",
>>> +                    currd, gfn);
>>>              /* Crash the domain */
>>>              rc = 0;
>>>          }
>>>      }
>>> +#endif
>>
>> ... move the #ifdef inside the braces here? With this
>> Acked-by: Jan Beulich <jbeulich@suse.com>
> 
> SGTM, I assume you are counting on the compiler to just get rid of the
> variable when it sees its never used?

Yes (and for un-optimized code it doesn't matter).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 06/18] x86/mem_sharing: define mem_sharing_domain to hold some scattered variables
  2020-01-16 15:23   ` Jan Beulich
@ 2020-01-16 16:05     ` Tamas K Lengyel
  2020-01-16 16:08       ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-16 16:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel, Roger Pau Monné

On Thu, Jan 16, 2020 at 8:23 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 08.01.2020 18:14, Tamas K Lengyel wrote:
> > Create struct mem_sharing_domain under hvm_domain and move mem sharing
> > variables into it from p2m_domain and hvm_domain.
> >
> > Expose the mem_sharing_enabled macro to be used consistently across Xen.
> >
> > Remove some duplicate calls to mem_sharing_enabled in mem_sharing.c
> >
> > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
>
> Acked-by: Jan Beulich <jbeulich@suse.com>
> with one question:
>
> > @@ -192,6 +192,10 @@ struct hvm_domain {
> >          struct vmx_domain vmx;
> >          struct svm_domain svm;
> >      };
> > +
> > +#ifdef CONFIG_MEM_SHARING
> > +    struct mem_sharing_domain mem_sharing;
> > +#endif
>
> Are you intending to add fields to this new struct? If so,
> should the field here become a pointer, and the structure
> allocated only when actually needed?
>

At the moment there are no additional variables planned to be added.
If/when we do we can consider turning this into a pointer, at which
point we can also get rid of the "enabled" field and turn the
mem_sharing_enabled macro into a NULL-pointer check instead. For now I
wouldn't bother because its not like we save much by doing so.

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 11/18] x86/mem_sharing: ASSERT that p2m_set_entry succeeds
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 11/18] x86/mem_sharing: ASSERT that p2m_set_entry succeeds Tamas K Lengyel
@ 2020-01-16 16:07   ` Jan Beulich
  2020-01-16 16:12     ` Tamas K Lengyel
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2020-01-16 16:07 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	xen-devel, Roger Pau Monné

On 08.01.2020 18:14, Tamas K Lengyel wrote:
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> ---
>  xen/arch/x86/mm/mem_sharing.c | 42 +++++++++++++++++------------------
>  1 file changed, 21 insertions(+), 21 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> index 93e7605900..3f36cd6bbc 100644
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -1117,11 +1117,19 @@ int add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
>          goto err_unlock;
>      }
>  
> +    /*
> +     * Must succeed, we just read the entry and hold the p2m lock
> +     * via get_two_gfns.
> +     */
>      ret = p2m_set_entry(p2m, _gfn(cgfn), smfn, PAGE_ORDER_4K,
>                          p2m_ram_shared, a);
> +    ASSERT(!ret);

And there's no risk of -ENOMEM because of needing to split a
larger order page? At the very least the reasoning in the
comment would need further extending.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 06/18] x86/mem_sharing: define mem_sharing_domain to hold some scattered variables
  2020-01-16 16:05     ` Tamas K Lengyel
@ 2020-01-16 16:08       ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2020-01-16 16:08 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel, Roger Pau Monné

On 16.01.2020 17:05, Tamas K Lengyel wrote:
> On Thu, Jan 16, 2020 at 8:23 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 08.01.2020 18:14, Tamas K Lengyel wrote:
>>> Create struct mem_sharing_domain under hvm_domain and move mem sharing
>>> variables into it from p2m_domain and hvm_domain.
>>>
>>> Expose the mem_sharing_enabled macro to be used consistently across Xen.
>>>
>>> Remove some duplicate calls to mem_sharing_enabled in mem_sharing.c
>>>
>>> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
>>
>> Acked-by: Jan Beulich <jbeulich@suse.com>
>> with one question:
>>
>>> @@ -192,6 +192,10 @@ struct hvm_domain {
>>>          struct vmx_domain vmx;
>>>          struct svm_domain svm;
>>>      };
>>> +
>>> +#ifdef CONFIG_MEM_SHARING
>>> +    struct mem_sharing_domain mem_sharing;
>>> +#endif
>>
>> Are you intending to add fields to this new struct? If so,
>> should the field here become a pointer, and the structure
>> allocated only when actually needed?
>>
> 
> At the moment there are no additional variables planned to be added.
> If/when we do we can consider turning this into a pointer, at which
> point we can also get rid of the "enabled" field and turn the
> mem_sharing_enabled macro into a NULL-pointer check instead. For now I
> wouldn't bother because its not like we save much by doing so.

Thanks for clarifying.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 11/18] x86/mem_sharing: ASSERT that p2m_set_entry succeeds
  2020-01-16 16:07   ` Jan Beulich
@ 2020-01-16 16:12     ` Tamas K Lengyel
  2020-01-17  9:23       ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-16 16:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel, Roger Pau Monné

On Thu, Jan 16, 2020 at 9:07 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 08.01.2020 18:14, Tamas K Lengyel wrote:
> > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> > ---
> >  xen/arch/x86/mm/mem_sharing.c | 42 +++++++++++++++++------------------
> >  1 file changed, 21 insertions(+), 21 deletions(-)
> >
> > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > index 93e7605900..3f36cd6bbc 100644
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -1117,11 +1117,19 @@ int add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
> >          goto err_unlock;
> >      }
> >
> > +    /*
> > +     * Must succeed, we just read the entry and hold the p2m lock
> > +     * via get_two_gfns.
> > +     */
> >      ret = p2m_set_entry(p2m, _gfn(cgfn), smfn, PAGE_ORDER_4K,
> >                          p2m_ram_shared, a);
> > +    ASSERT(!ret);
>
> And there's no risk of -ENOMEM because of needing to split a
> larger order page? At the very least the reasoning in the
> comment would need further extending.
>

No because we are plugging a hole in the domain. There is no larger
page mapped in here that would need to be broken up.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 12/18] x86/mem_sharing: Enable mem_sharing on first memop
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 12/18] x86/mem_sharing: Enable mem_sharing on first memop Tamas K Lengyel
@ 2020-01-16 16:18   ` Jan Beulich
  2020-01-16 16:34     ` Tamas K Lengyel
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2020-01-16 16:18 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	xen-devel, Roger Pau Monné

On 08.01.2020 18:14, Tamas K Lengyel wrote:
> It is wasteful to require separate hypercalls to enable sharing on both the
> parent and the client domain during VM forking. To speed things up we enable
> sharing on the first memop in case it wasn't already enabled.
> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> ---
>  xen/arch/x86/mm/mem_sharing.c | 36 +++++++++++++++++++++--------------
>  1 file changed, 22 insertions(+), 14 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> index 3f36cd6bbc..b8a9228ecf 100644
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -1412,6 +1412,24 @@ static int range_share(struct domain *d, struct domain *cd,
>      return rc;
>  }
>  
> +static inline int mem_sharing_control(struct domain *d, bool enable)
> +{
> +    if ( enable )
> +    {
> +        if ( unlikely(!is_hvm_domain(d)) )
> +            return -ENOSYS;

-EOPNOTSUPP or some such please. ENOSYS has a very specific meaning,
which (according to my understanding) doesn't apply here.

> +        if ( unlikely(!hap_enabled(d)) )
> +            return -ENODEV;

Doesn't this allow dropping the HAP check from
mem_sharing_enabled(d)?

> +        if ( unlikely(is_iommu_enabled(d)) )
> +            return -EXDEV;
> +    }
> +
> +    d->arch.hvm.mem_sharing.enabled = enable;
> +    return 0;
> +}
> +
>  int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>  {
>      int rc;
> @@ -1433,10 +1451,8 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>      if ( rc )
>          goto out;
>  
> -    /* Only HAP is supported */
> -    rc = -ENODEV;
> -    if ( !mem_sharing_enabled(d) )
> -        goto out;
> +    if ( !mem_sharing_enabled(d) && (rc = mem_sharing_control(d, true)) )
> +        return rc;

Perhaps already in patch 6, doesn't this eliminate the need for the
individual mem_sharing_enabled() checks in the case blocks?

> @@ -1703,18 +1719,10 @@ int mem_sharing_domctl(struct domain *d, struct xen_domctl_mem_sharing_op *mec)
>  {
>      int rc;
>  
> -    /* Only HAP is supported */
> -    if ( !hap_enabled(d) )
> -        return -ENODEV;
> -
> -    switch ( mec->op )
> +    switch( mec->op )

Please don't corrupt proper Xen style.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 00/18] VM forking
  2020-01-16 15:47 ` [Xen-devel] [PATCH v4 00/18] VM forking Jan Beulich
@ 2020-01-16 16:24   ` Tamas K Lengyel
  2020-01-17  9:12     ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-16 16:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Anthony PERARD, Julien Grall, Alexandru Isaila, Xen-devel,
	Roger Pau Monné

On Thu, Jan 16, 2020 at 8:47 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 08.01.2020 18:13, Tamas K Lengyel wrote:
> > Tamas K Lengyel (18):
> >   x86/hvm: introduce hvm_copy_context_and_params
> >   xen/x86: Make hap_get_allocation accessible
> >   x86/mem_sharing: make get_two_gfns take locks conditionally
> >   x86/mem_sharing: drop flags from mem_sharing_unshare_page
> >   x86/mem_sharing: don't try to unshare twice during page fault
> >   x86/mem_sharing: define mem_sharing_domain to hold some scattered
> >     variables
> >   x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in
> >     relinquish_shared_pages
> >   x86/mem_sharing: Make add_to_physmap static and shorten name
> >   x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool
>
> I've looked at these patches, and I think 2-4 and 7-9 could go
> in right away (6 has a small question pending, but may otherwise
> also be ready), if you were to give (or delegate) your ack that
> they would need afaict. Thoughts?
>

Not sure I understand your question. My understanding is that since
I'm the maintainer of the code being changed by these patches I just
need a "reviewed-by" from someone in the community and no outstanding
issue on them. Provided this is v4 now of the series and no issues
were raised so far for these particular patches they can be merged
with your Reviewed-by.

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 12/18] x86/mem_sharing: Enable mem_sharing on first memop
  2020-01-16 16:18   ` Jan Beulich
@ 2020-01-16 16:34     ` Tamas K Lengyel
  0 siblings, 0 replies; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-16 16:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel, Roger Pau Monné

On Thu, Jan 16, 2020 at 9:18 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 08.01.2020 18:14, Tamas K Lengyel wrote:
> > It is wasteful to require separate hypercalls to enable sharing on both the
> > parent and the client domain during VM forking. To speed things up we enable
> > sharing on the first memop in case it wasn't already enabled.
> >
> > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> > ---
> >  xen/arch/x86/mm/mem_sharing.c | 36 +++++++++++++++++++++--------------
> >  1 file changed, 22 insertions(+), 14 deletions(-)
> >
> > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > index 3f36cd6bbc..b8a9228ecf 100644
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -1412,6 +1412,24 @@ static int range_share(struct domain *d, struct domain *cd,
> >      return rc;
> >  }
> >
> > +static inline int mem_sharing_control(struct domain *d, bool enable)
> > +{
> > +    if ( enable )
> > +    {
> > +        if ( unlikely(!is_hvm_domain(d)) )
> > +            return -ENOSYS;
>
> -EOPNOTSUPP or some such please. ENOSYS has a very specific meaning,
> which (according to my understanding) doesn't apply here.
>
> > +        if ( unlikely(!hap_enabled(d)) )
> > +            return -ENODEV;
>
> Doesn't this allow dropping the HAP check from
> mem_sharing_enabled(d)?

Yes, looks like it could be dropped from there.

>
> > +        if ( unlikely(is_iommu_enabled(d)) )
> > +            return -EXDEV;
> > +    }
> > +
> > +    d->arch.hvm.mem_sharing.enabled = enable;
> > +    return 0;
> > +}
> > +
> >  int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
> >  {
> >      int rc;
> > @@ -1433,10 +1451,8 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
> >      if ( rc )
> >          goto out;
> >
> > -    /* Only HAP is supported */
> > -    rc = -ENODEV;
> > -    if ( !mem_sharing_enabled(d) )
> > -        goto out;
> > +    if ( !mem_sharing_enabled(d) && (rc = mem_sharing_control(d, true)) )
> > +        return rc;
>
> Perhaps already in patch 6, doesn't this eliminate the need for the
> individual mem_sharing_enabled() checks in the case blocks?

Yes it does. I think I was planning on removing those checks but it
slipped my mind.

>
> > @@ -1703,18 +1719,10 @@ int mem_sharing_domctl(struct domain *d, struct xen_domctl_mem_sharing_op *mec)
> >  {
> >      int rc;
> >
> > -    /* Only HAP is supported */
> > -    if ( !hap_enabled(d) )
> > -        return -ENODEV;
> > -
> > -    switch ( mec->op )
> > +    switch( mec->op )
>
> Please don't corrupt proper Xen style.

Ack.

Thanks!
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 00/18] VM forking
  2020-01-16 16:24   ` Tamas K Lengyel
@ 2020-01-17  9:12     ` Jan Beulich
  2020-01-17 11:15       ` Anthony PERARD
  2020-01-17 14:25       ` Tamas K Lengyel
  0 siblings, 2 replies; 57+ messages in thread
From: Jan Beulich @ 2020-01-17  9:12 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Anthony PERARD, Julien Grall, Alexandru Isaila, Xen-devel,
	Roger Pau Monné

On 16.01.2020 17:24, Tamas K Lengyel wrote:
> On Thu, Jan 16, 2020 at 8:47 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 08.01.2020 18:13, Tamas K Lengyel wrote:
>>> Tamas K Lengyel (18):
>>>   x86/hvm: introduce hvm_copy_context_and_params
>>>   xen/x86: Make hap_get_allocation accessible
>>>   x86/mem_sharing: make get_two_gfns take locks conditionally
>>>   x86/mem_sharing: drop flags from mem_sharing_unshare_page
>>>   x86/mem_sharing: don't try to unshare twice during page fault
>>>   x86/mem_sharing: define mem_sharing_domain to hold some scattered
>>>     variables
>>>   x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in
>>>     relinquish_shared_pages
>>>   x86/mem_sharing: Make add_to_physmap static and shorten name
>>>   x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool
>>
>> I've looked at these patches, and I think 2-4 and 7-9 could go
>> in right away (6 has a small question pending, but may otherwise
>> also be ready), if you were to give (or delegate) your ack that
>> they would need afaict. Thoughts?
>>
> 
> Not sure I understand your question. My understanding is that since
> I'm the maintainer of the code being changed by these patches I just
> need a "reviewed-by" from someone in the community and no outstanding
> issue on them.

Please note that my previous mail was _to_ George, with you only
_cc_-ed. Hence the question was to George, not you. (It is a
common issue which I keep mentioning on meetings that the
distinction of To and Cc is often not being honored, albeit
typically more by senders than recipients.)

> Provided this is v4 now of the series and no issues
> were raised so far for these particular patches they can be merged
> with your Reviewed-by.

I don't think so, under the current (sufficiently) common
understanding of the rules. See George's proposal to change to a
model like what you imply:
https://lists.xenproject.org/archives/html/xen-devel/2020-01/msg00885.html

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 11/18] x86/mem_sharing: ASSERT that p2m_set_entry succeeds
  2020-01-16 16:12     ` Tamas K Lengyel
@ 2020-01-17  9:23       ` Jan Beulich
  2020-01-17 16:59         ` Tamas K Lengyel
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2020-01-17  9:23 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel, Roger Pau Monné

On 16.01.2020 17:12, Tamas K Lengyel wrote:
> On Thu, Jan 16, 2020 at 9:07 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 08.01.2020 18:14, Tamas K Lengyel wrote:
>>> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
>>> ---
>>>  xen/arch/x86/mm/mem_sharing.c | 42 +++++++++++++++++------------------
>>>  1 file changed, 21 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
>>> index 93e7605900..3f36cd6bbc 100644
>>> --- a/xen/arch/x86/mm/mem_sharing.c
>>> +++ b/xen/arch/x86/mm/mem_sharing.c
>>> @@ -1117,11 +1117,19 @@ int add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
>>>          goto err_unlock;
>>>      }
>>>
>>> +    /*
>>> +     * Must succeed, we just read the entry and hold the p2m lock
>>> +     * via get_two_gfns.
>>> +     */
>>>      ret = p2m_set_entry(p2m, _gfn(cgfn), smfn, PAGE_ORDER_4K,
>>>                          p2m_ram_shared, a);
>>> +    ASSERT(!ret);
>>
>> And there's no risk of -ENOMEM because of needing to split a
>> larger order page? At the very least the reasoning in the
>> comment would need further extending.
> 
> No because we are plugging a hole in the domain. There is no larger
> page mapped in here that would need to be broken up.

p2m_is_hole() also covers p2m_mmio_dm and p2m_invalid. The former
(should really be the latter) is what you'll get back for e.g. a
GFN beyond max_mapped_pfn. Aiui such a "set" may then require
table population, which may still yield -ENOMEM (at least EPT
looks to return -ENOENT in this case instead; I guess I'll make
a patch).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 00/18] VM forking
  2020-01-17  9:12     ` Jan Beulich
@ 2020-01-17 11:15       ` Anthony PERARD
  2020-01-17 14:22         ` Tamas K Lengyel
  2020-01-17 14:25       ` Tamas K Lengyel
  1 sibling, 1 reply; 57+ messages in thread
From: Anthony PERARD @ 2020-01-17 11:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Petre Pircalabu, Tamas K Lengyel, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Stefano Stabellini, Julien Grall, Alexandru Isaila, Xen-devel,
	Roger Pau Monné

On Fri, Jan 17, 2020 at 10:12:14AM +0100, Jan Beulich wrote:
> Please note that my previous mail was _to_ George, with you only
> _cc_-ed. Hence the question was to George, not you. (It is a
> common issue which I keep mentioning on meetings that the
> distinction of To and Cc is often not being honored, albeit
> typically more by senders than recipients.)

Tip: Jan, you could also have started the sentence by "George, " in
addition to properly setting the "To:", it would help a lot I think.

Teaching people about setting properly "To:", and reading it before
reading the email is a lost fight I think. Even so it can be useful to
filter email which needs a response.

Cheers,

-- 
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 00/18] VM forking
  2020-01-17 11:15       ` Anthony PERARD
@ 2020-01-17 14:22         ` Tamas K Lengyel
  0 siblings, 0 replies; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-17 14:22 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Jan Beulich, Alexandru Isaila, Xen-devel,
	Roger Pau Monné

On Fri, Jan 17, 2020 at 4:15 AM Anthony PERARD
<anthony.perard@citrix.com> wrote:
>
> On Fri, Jan 17, 2020 at 10:12:14AM +0100, Jan Beulich wrote:
> > Please note that my previous mail was _to_ George, with you only
> > _cc_-ed. Hence the question was to George, not you. (It is a
> > common issue which I keep mentioning on meetings that the
> > distinction of To and Cc is often not being honored, albeit
> > typically more by senders than recipients.)
>
> Tip: Jan, you could also have started the sentence by "George, " in
> addition to properly setting the "To:", it would help a lot I think.
>
> Teaching people about setting properly "To:", and reading it before
> reading the email is a lost fight I think. Even so it can be useful to
> filter email which needs a response.

Yea, +1 for that, it would make addressed questions more apparent.
Gmail (which is what I use) doesn't break out the email header by
default with separate lines for to: and cc:, all recipients are in a
single line with no distinction between them.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 00/18] VM forking
  2020-01-17  9:12     ` Jan Beulich
  2020-01-17 11:15       ` Anthony PERARD
@ 2020-01-17 14:25       ` Tamas K Lengyel
  1 sibling, 0 replies; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-17 14:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Anthony PERARD, Julien Grall, Alexandru Isaila, Xen-devel,
	Roger Pau Monné

> > Provided this is v4 now of the series and no issues
> > were raised so far for these particular patches they can be merged
> > with your Reviewed-by.
>
> I don't think so, under the current (sufficiently) common
> understanding of the rules. See George's proposal to change to a
> model like what you imply:
> https://lists.xenproject.org/archives/html/xen-devel/2020-01/msg00885.html
>

Ah OK, I though that was already agreed upon. I would certainly prefer
that model to speed things up and reduce the hassle to work with code
noone else maintains then me.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 11/18] x86/mem_sharing: ASSERT that p2m_set_entry succeeds
  2020-01-17  9:23       ` Jan Beulich
@ 2020-01-17 16:59         ` Tamas K Lengyel
  0 siblings, 0 replies; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-17 16:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel, Roger Pau Monné

On Fri, Jan 17, 2020 at 2:23 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 16.01.2020 17:12, Tamas K Lengyel wrote:
> > On Thu, Jan 16, 2020 at 9:07 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 08.01.2020 18:14, Tamas K Lengyel wrote:
> >>> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> >>> ---
> >>>  xen/arch/x86/mm/mem_sharing.c | 42 +++++++++++++++++------------------
> >>>  1 file changed, 21 insertions(+), 21 deletions(-)
> >>>
> >>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> >>> index 93e7605900..3f36cd6bbc 100644
> >>> --- a/xen/arch/x86/mm/mem_sharing.c
> >>> +++ b/xen/arch/x86/mm/mem_sharing.c
> >>> @@ -1117,11 +1117,19 @@ int add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
> >>>          goto err_unlock;
> >>>      }
> >>>
> >>> +    /*
> >>> +     * Must succeed, we just read the entry and hold the p2m lock
> >>> +     * via get_two_gfns.
> >>> +     */
> >>>      ret = p2m_set_entry(p2m, _gfn(cgfn), smfn, PAGE_ORDER_4K,
> >>>                          p2m_ram_shared, a);
> >>> +    ASSERT(!ret);
> >>
> >> And there's no risk of -ENOMEM because of needing to split a
> >> larger order page? At the very least the reasoning in the
> >> comment would need further extending.
> >
> > No because we are plugging a hole in the domain. There is no larger
> > page mapped in here that would need to be broken up.
>
> p2m_is_hole() also covers p2m_mmio_dm and p2m_invalid. The former
> (should really be the latter) is what you'll get back for e.g. a
> GFN beyond max_mapped_pfn. Aiui such a "set" may then require
> table population, which may still yield -ENOMEM (at least EPT
> looks to return -ENOENT in this case instead; I guess I'll make
> a patch).

Yes, actually that is what's expected in the fork case to happen since
the fork has no entries in its EPT when it starts at all. So there
will be allocations happening there for the pagetable entries. But for
forks that's not of concern since we'll setup the same HAP allocation
the parent VM has during the fork hypercall. So it is guaranteed that
the fork will have the same amount of memory for its pagetables its
parent has.

Now as for using add_to_physmap on a non-forked VM when plugging a
hole like that, yes, I guess there is the possibility that the VM is
going to run out of space for its pagetable. So I guess we should skip
this patch.

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 13/18] x86/mem_sharing: Skip xen heap pages in memshr nominate
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 13/18] x86/mem_sharing: Skip xen heap pages in memshr nominate Tamas K Lengyel
@ 2020-01-20 16:23   ` Jan Beulich
  2020-01-20 16:32     ` Tamas K Lengyel
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2020-01-20 16:23 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	xen-devel, Roger Pau Monné

On 08.01.2020 18:14, Tamas K Lengyel wrote:
> Trying to share these would fail anyway, better to skip them early.
> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
albeit I wonder if this couldn't be further generalized by ...

> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -852,6 +852,11 @@ static int nominate_page(struct domain *d, gfn_t gfn,
>      if ( !p2m_is_sharable(p2mt) )
>          goto out;
>  
> +    /* Skip xen heap pages */
> +    page = mfn_to_page(mfn);
> +    if ( !page || is_xen_heap_page(page) )
> +        goto out;

... checking for a zero type ref count (the only means to permit
a type change) here, and maybe also ->count_info to fit what
page_make_sharable() expects.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 13/18] x86/mem_sharing: Skip xen heap pages in memshr nominate
  2020-01-20 16:23   ` Jan Beulich
@ 2020-01-20 16:32     ` Tamas K Lengyel
  2020-01-20 16:38       ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-20 16:32 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel, Roger Pau Monné

On Mon, Jan 20, 2020 at 9:23 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 08.01.2020 18:14, Tamas K Lengyel wrote:
> > Trying to share these would fail anyway, better to skip them early.
> >
> > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> albeit I wonder if this couldn't be further generalized by ...
>
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -852,6 +852,11 @@ static int nominate_page(struct domain *d, gfn_t gfn,
> >      if ( !p2m_is_sharable(p2mt) )
> >          goto out;
> >
> > +    /* Skip xen heap pages */
> > +    page = mfn_to_page(mfn);
> > +    if ( !page || is_xen_heap_page(page) )
> > +        goto out;
>
> ... checking for a zero type ref count (the only means to permit
> a type change) here, and maybe also ->count_info to fit what
> page_make_sharable() expects.

Not sure I follow you, type count is checked by page_make_sharable but
it has to be exactly 1:

    /* Check if page is already typed and bail early if it is */
    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
    {
        spin_unlock(&d->page_alloc_lock);
        return -EEXIST;
    }

I specifically want to avoid calling page_make_sharable on xen heap
pages because they end up printing an error to the console which is
very annoying.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 14/18] x86/mem_sharing: check page type count earlier
  2020-01-08 17:14 ` [Xen-devel] [PATCH v4 14/18] x86/mem_sharing: check page type count earlier Tamas K Lengyel
@ 2020-01-20 16:34   ` Jan Beulich
  2020-01-20 16:46     ` Tamas K Lengyel
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2020-01-20 16:34 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	xen-devel, Roger Pau Monné

On 08.01.2020 18:14, Tamas K Lengyel wrote:
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -652,19 +652,18 @@ static int page_make_sharable(struct domain *d,
>          return -EBUSY;
>      }
>  
> -    /* Change page type and count atomically */
> -    if ( !get_page_and_type(page, d, PGT_shared_page) )
> +    /* Check if page is already typed and bail early if it is */
> +    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
>      {
>          spin_unlock(&d->page_alloc_lock);
> -        return -EINVAL;
> +        return -EEXIST;
>      }
>  
> -    /* Check it wasn't already sharable and undo if it was */
> -    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
> +    /* Change page type and count atomically */
> +    if ( !get_page_and_type(page, d, PGT_shared_page) )
>      {
>          spin_unlock(&d->page_alloc_lock);
> -        put_page_and_type(page);
> -        return -EEXIST;
> +        return -EINVAL;
>      }

It would seem to me that either the original or the new code cannot
have worked / work: The original variant checked the count _after_
having incremented it, i.e. it expected a 0->1 transition. The new
code checks that the count is 1 _before_ doing the get.

However, even if this was changed to

    if ( page->u.inuse.type_info & PGT_count_mask )

I would recommend against the change: Aiui you build upon the fact
that a transition to PGT_shared_page can happen only here, and this
code holds d->page_alloc_lock. But imo this is making the code more
fragile. In fact I can't easily see why the other two cases where
PGT_shared_page gets passed to get_page_and_type() can't also
effect a 0->1 transition. I can only guess from their BUG_ON()-s
that they assume a reference was already acquired somewhere else.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 13/18] x86/mem_sharing: Skip xen heap pages in memshr nominate
  2020-01-20 16:32     ` Tamas K Lengyel
@ 2020-01-20 16:38       ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2020-01-20 16:38 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel, Roger Pau Monné

On 20.01.2020 17:32, Tamas K Lengyel wrote:
> On Mon, Jan 20, 2020 at 9:23 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 08.01.2020 18:14, Tamas K Lengyel wrote:
>>> Trying to share these would fail anyway, better to skip them early.
>>>
>>> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
>>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> albeit I wonder if this couldn't be further generalized by ...
>>
>>> --- a/xen/arch/x86/mm/mem_sharing.c
>>> +++ b/xen/arch/x86/mm/mem_sharing.c
>>> @@ -852,6 +852,11 @@ static int nominate_page(struct domain *d, gfn_t gfn,
>>>      if ( !p2m_is_sharable(p2mt) )
>>>          goto out;
>>>
>>> +    /* Skip xen heap pages */
>>> +    page = mfn_to_page(mfn);
>>> +    if ( !page || is_xen_heap_page(page) )
>>> +        goto out;
>>
>> ... checking for a zero type ref count (the only means to permit
>> a type change) here, and maybe also ->count_info to fit what
>> page_make_sharable() expects.
> 
> Not sure I follow you, type count is checked by page_make_sharable but
> it has to be exactly 1:
> 
>     /* Check if page is already typed and bail early if it is */
>     if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
>     {
>         spin_unlock(&d->page_alloc_lock);
>         return -EEXIST;
>     }

Which is after a successful get_page_and_type(). Prior to that,
therefore, the count ought to be zero. But maybe I'm very confused
- see also my comments on patch 14, where I spotted this very same
anomaly.

> I specifically want to avoid calling page_make_sharable on xen heap
> pages because they end up printing an error to the console which is
> very annoying.

That's fine. I'm not asking to drop what you're doing. Instead I'm
asking whether you couldn't bail early in even more cases.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Xen-devel] [PATCH v4 14/18] x86/mem_sharing: check page type count earlier
  2020-01-20 16:34   ` Jan Beulich
@ 2020-01-20 16:46     ` Tamas K Lengyel
  0 siblings, 0 replies; 57+ messages in thread
From: Tamas K Lengyel @ 2020-01-20 16:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel, Roger Pau Monné

On Mon, Jan 20, 2020 at 9:34 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 08.01.2020 18:14, Tamas K Lengyel wrote:
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -652,19 +652,18 @@ static int page_make_sharable(struct domain *d,
> >          return -EBUSY;
> >      }
> >
> > -    /* Change page type and count atomically */
> > -    if ( !get_page_and_type(page, d, PGT_shared_page) )
> > +    /* Check if page is already typed and bail early if it is */
> > +    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
> >      {
> >          spin_unlock(&d->page_alloc_lock);
> > -        return -EINVAL;
> > +        return -EEXIST;
> >      }
> >
> > -    /* Check it wasn't already sharable and undo if it was */
> > -    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
> > +    /* Change page type and count atomically */
> > +    if ( !get_page_and_type(page, d, PGT_shared_page) )
> >      {
> >          spin_unlock(&d->page_alloc_lock);
> > -        put_page_and_type(page);
> > -        return -EEXIST;
> > +        return -EINVAL;
> >      }
>
> It would seem to me that either the original or the new code cannot
> have worked / work: The original variant checked the count _after_
> having incremented it, i.e. it expected a 0->1 transition. The new
> code checks that the count is 1 _before_ doing the get.
>
> However, even if this was changed to
>
>     if ( page->u.inuse.type_info & PGT_count_mask )
>
> I would recommend against the change: Aiui you build upon the fact
> that a transition to PGT_shared_page can happen only here, and this
> code holds d->page_alloc_lock. But imo this is making the code more
> fragile. In fact I can't easily see why the other two cases where
> PGT_shared_page gets passed to get_page_and_type() can't also
> effect a 0->1 transition. I can only guess from their BUG_ON()-s
> that they assume a reference was already acquired somewhere else.

Hm, right, it certainly looks like this patch isn't needed. It has
been a while now and I don't recall why exactly I was moving the type
count check, it might have just been while I was experimenting and it
never got reverted.

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2020-01-20 16:58 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-08 17:13 [Xen-devel] [PATCH v4 00/18] VM forking Tamas K Lengyel
2020-01-08 17:13 ` [Xen-devel] [PATCH v4 01/18] x86/hvm: introduce hvm_copy_context_and_params Tamas K Lengyel
2020-01-16 12:27   ` Jan Beulich
2020-01-16 14:06     ` Tamas K Lengyel
2020-01-08 17:13 ` [Xen-devel] [PATCH v4 02/18] xen/x86: Make hap_get_allocation accessible Tamas K Lengyel
2020-01-08 17:14 ` [Xen-devel] [PATCH v4 03/18] x86/mem_sharing: make get_two_gfns take locks conditionally Tamas K Lengyel
2020-01-08 17:14 ` [Xen-devel] [PATCH v4 04/18] x86/mem_sharing: drop flags from mem_sharing_unshare_page Tamas K Lengyel
2020-01-08 17:14 ` [Xen-devel] [PATCH v4 05/18] x86/mem_sharing: don't try to unshare twice during page fault Tamas K Lengyel
2020-01-16 14:53   ` Jan Beulich
2020-01-16 15:59     ` Tamas K Lengyel
2020-01-16 16:03       ` Jan Beulich
2020-01-08 17:14 ` [Xen-devel] [PATCH v4 06/18] x86/mem_sharing: define mem_sharing_domain to hold some scattered variables Tamas K Lengyel
2020-01-16 15:23   ` Jan Beulich
2020-01-16 16:05     ` Tamas K Lengyel
2020-01-16 16:08       ` Jan Beulich
2020-01-08 17:14 ` [Xen-devel] [PATCH v4 07/18] x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in relinquish_shared_pages Tamas K Lengyel
2020-01-16 15:40   ` Jan Beulich
2020-01-08 17:14 ` [Xen-devel] [PATCH v4 08/18] x86/mem_sharing: Make add_to_physmap static and shorten name Tamas K Lengyel
2020-01-16 15:40   ` Jan Beulich
2020-01-08 17:14 ` [Xen-devel] [PATCH v4 09/18] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool Tamas K Lengyel
2020-01-16 15:42   ` Jan Beulich
2020-01-08 17:14 ` [Xen-devel] [PATCH v4 10/18] x86/mem_sharing: Replace MEM_SHARING_DEBUG with gdprintk Tamas K Lengyel
2020-01-16 16:01   ` Jan Beulich
2020-01-08 17:14 ` [Xen-devel] [PATCH v4 11/18] x86/mem_sharing: ASSERT that p2m_set_entry succeeds Tamas K Lengyel
2020-01-16 16:07   ` Jan Beulich
2020-01-16 16:12     ` Tamas K Lengyel
2020-01-17  9:23       ` Jan Beulich
2020-01-17 16:59         ` Tamas K Lengyel
2020-01-08 17:14 ` [Xen-devel] [PATCH v4 12/18] x86/mem_sharing: Enable mem_sharing on first memop Tamas K Lengyel
2020-01-16 16:18   ` Jan Beulich
2020-01-16 16:34     ` Tamas K Lengyel
2020-01-08 17:14 ` [Xen-devel] [PATCH v4 13/18] x86/mem_sharing: Skip xen heap pages in memshr nominate Tamas K Lengyel
2020-01-20 16:23   ` Jan Beulich
2020-01-20 16:32     ` Tamas K Lengyel
2020-01-20 16:38       ` Jan Beulich
2020-01-08 17:14 ` [Xen-devel] [PATCH v4 14/18] x86/mem_sharing: check page type count earlier Tamas K Lengyel
2020-01-20 16:34   ` Jan Beulich
2020-01-20 16:46     ` Tamas K Lengyel
2020-01-08 17:14 ` [Xen-devel] [PATCH v4 15/18] xen/mem_sharing: VM forking Tamas K Lengyel
2020-01-09 10:28   ` Julien Grall
2020-01-09 13:41     ` Tamas K Lengyel
2020-01-09 15:10       ` Roger Pau Monné
2020-01-09 15:34         ` Jan Beulich
2020-01-09 15:57           ` Tamas K Lengyel
2020-01-09 16:03             ` Jan Beulich
2020-01-09 16:06               ` Tamas K Lengyel
2020-01-09 15:54         ` Tamas K Lengyel
2020-01-08 17:14 ` [Xen-devel] [PATCH v4 16/18] xen/mem_access: Use __get_gfn_type_access in set_mem_access Tamas K Lengyel
2020-01-08 17:14 ` [Xen-devel] [PATCH v4 17/18] x86/mem_sharing: reset a fork Tamas K Lengyel
2020-01-09 10:30   ` Julien Grall
2020-01-08 17:14 ` [Xen-devel] [PATCH v4 18/18] xen/tools: VM forking toolstack side Tamas K Lengyel
2020-01-16 15:47 ` [Xen-devel] [PATCH v4 00/18] VM forking Jan Beulich
2020-01-16 16:24   ` Tamas K Lengyel
2020-01-17  9:12     ` Jan Beulich
2020-01-17 11:15       ` Anthony PERARD
2020-01-17 14:22         ` Tamas K Lengyel
2020-01-17 14:25       ` Tamas K Lengyel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).