All of lore.kernel.org
 help / color / mirror / Atom feed
* [Xen-devel] [PATCH v2 00/20] VM forking
@ 2019-12-18 19:40 Tamas K Lengyel
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible Tamas K Lengyel
                   ` (20 more replies)
  0 siblings, 21 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Petre Pircalabu, Tamas K Lengyel, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Anthony PERARD, Stefano Stabellini, Jan Beulich,
	Alexandru Isaila, Julien Grall, Roger Pau Monné

The following series implements VM forking for Intel HVM guests to allow for
the fast creation of identical VMs without the assosciated high startup costs
of booting or restoring the VM from a savefile.

JIRA issue: https://xenproject.atlassian.net/browse/XEN-89

The main design goal with this series has been to reduce the time of creating
the VM fork as much as possible. To achieve this the VM forking process is
split into two steps:
    1) forking the VM on the hypervisor side;
    2) starting QEMU to handle the backed for emulated devices.

Step 1) involves creating a VM using the new "xl fork-vm" command. The
parent VM is expected to remain paused after forks are created from it (which
is different then what process forking normally entails). During this forking
operation the HVM context and VM settings are copied over to the new forked VM.
This operation is fast and it allows the forked VM to be unpaused and to be
monitored and accessed via VMI. Note however that without its device model
running (depending on what is executing in the VM) it is bound to
misbehave/crash when its trying to access devices that would be emulated by
QEMU. We anticipate that for certain use-cases this would be an acceptable
situation, in case for example when fuzzing is performed of code segments that
don't access such devices.

Step 2) involves launching QEMU to support the forked VM, which requires the
QEMU Xen savefile to be generated manually from the parent VM. This can be
accomplished simply by connecting to its QMP socket and issuing the
"xen-save-devices-state" command as documented by QEMU:
https://github.com/qemu/qemu/blob/master/docs/xen-save-devices-state.txt
Once the QEMU Xen savefile is generated the new "xl fork-launch-dm" command is
used to launch QEMU and load the specified savefile for it.

At runtime the forked VM starts running with an empty p2m which gets lazily
populated when the VM generates EPT faults, similar to how altp2m views are
populated. If the memory access is a read-only access, the p2m entry is
populated with a memory shared entry with its parent. For write memory accesses
or in case memory sharing wasn't possible (for example in case a reference is
held by a third party), a new page is allocated and the page contents are
copied over from the parent VM. Forks can be further forked if needed, thus
allowing for further memory savings.

A VM fork reset hypercall is also added that allows the fork to be reset to the
state it was just after a fork. This is an optimization for cases where the
forks are very short-lived and run without a device model, so resetting saves
some time compared to creating a brand new fork.

The series has been tested with both Linux and Windows VMs and functions as
expected. VM forking time has been measured to be 0.018s, device model launch
to be around 1s depending largely on the number of devices being emulated.

Patches 1-2 implement changes to existing internal Xen APIs to make VM forking
possible.

Patches 3-4 are simple code-formatting fixes for the toolstack and Xen for the
memory sharing paths with no functional changes.

Patches 5-16 are code-cleanups and adjustments of to Xen memory sharing
subsystem with no functional changes.

Patch 17 adds the hypervisor-side code implementing VM forking.

Patch 18 is integration of mem_access with forked VMs.

Patch 19 implements the VM fork reset operation hypervisor side bits.

Patch 20 adds the toolstack-side code implementing VM forking and reset.

Tamas K Lengyel (20):
  x86: make hvm_{get/set}_param accessible
  xen/x86: Make hap_get_allocation accessible
  tools/libxc: clean up memory sharing files
  x86/mem_sharing: cleanup code and comments in various locations
  x86/mem_sharing: make get_two_gfns take locks conditionally
  x86/mem_sharing: drop flags from mem_sharing_unshare_page
  x86/mem_sharing: don't try to unshare twice during page fault
  x86/mem_sharing: define mem_sharing_domain to hold some scattered
    variables
  x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in
    relinquish_shared_pages
  x86/mem_sharing: Make add_to_physmap static and shorten name
  x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool
  x86/mem_sharing: Replace MEM_SHARING_DEBUG with gdprintk
  x86/mem_sharing: ASSERT that p2m_set_entry succeeds
  x86/mem_sharing: Enable mem_sharing on first memop
  x86/mem_sharing: Skip xen heap pages in memshr nominate
  x86/mem_sharing: check page type count earlier
  xen/mem_sharing: VM forking
  xen/mem_access: Use __get_gfn_type_access in set_mem_access
  x86/mem_sharing: reset a fork
  xen/tools: VM forking toolstack side

 tools/libxc/include/xenctrl.h     |  30 +-
 tools/libxc/xc_memshr.c           |  34 +-
 tools/libxl/libxl.h               |   7 +
 tools/libxl/libxl_create.c        | 237 +++++---
 tools/libxl/libxl_dm.c            |   2 +-
 tools/libxl/libxl_dom.c           |  83 ++-
 tools/libxl/libxl_internal.h      |   1 +
 tools/libxl/libxl_types.idl       |   1 +
 tools/xl/xl.h                     |   5 +
 tools/xl/xl_cmdtable.c            |  22 +
 tools/xl/xl_saverestore.c         |  96 ++++
 tools/xl/xl_vmcontrol.c           |   8 +
 xen/arch/x86/hvm/hvm.c            | 206 ++++---
 xen/arch/x86/mm/hap/hap.c         |   3 +-
 xen/arch/x86/mm/mem_access.c      |   5 +-
 xen/arch/x86/mm/mem_sharing.c     | 875 +++++++++++++++++++++---------
 xen/arch/x86/mm/p2m.c             |  34 +-
 xen/common/memory.c               |   2 +-
 xen/drivers/passthrough/pci.c     |   3 +-
 xen/include/asm-x86/hap.h         |   1 +
 xen/include/asm-x86/hvm/domain.h  |   6 +-
 xen/include/asm-x86/hvm/hvm.h     |   4 +
 xen/include/asm-x86/mem_sharing.h |  84 ++-
 xen/include/asm-x86/p2m.h         |  14 +-
 xen/include/public/memory.h       |   6 +
 xen/include/xen/sched.h           |   1 +
 26 files changed, 1258 insertions(+), 512 deletions(-)

-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-19 19:07   ` Andrew Cooper
  2019-12-20 16:46   ` Jan Beulich
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 02/20] xen/x86: Make hap_get_allocation accessible Tamas K Lengyel
                   ` (19 subsequent siblings)
  20 siblings, 2 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Tamas K Lengyel, Wei Liu, Jan Beulich,
	Roger Pau Monné

Currently the hvm parameters are only accessible via the HVMOP hypercalls. By
exposing hvm_{get/set}_param it will be possible for VM forking to copy the
parameters directly into the clone domain.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/hvm/hvm.c        | 169 ++++++++++++++++++++--------------
 xen/include/asm-x86/hvm/hvm.h |   4 +
 2 files changed, 106 insertions(+), 67 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 614ed60fe4..5a3a962fbb 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4072,16 +4072,17 @@ static int hvmop_set_evtchn_upcall_vector(
 }
 
 static int hvm_allow_set_param(struct domain *d,
-                               const struct xen_hvm_param *a)
+                               uint32_t index,
+                               uint64_t new_value)
 {
-    uint64_t value = d->arch.hvm.params[a->index];
+    uint64_t value = d->arch.hvm.params[index];
     int rc;
 
     rc = xsm_hvm_param(XSM_TARGET, d, HVMOP_set_param);
     if ( rc )
         return rc;
 
-    switch ( a->index )
+    switch ( index )
     {
     /* The following parameters can be set by the guest. */
     case HVM_PARAM_CALLBACK_IRQ:
@@ -4114,7 +4115,7 @@ static int hvm_allow_set_param(struct domain *d,
     if ( rc )
         return rc;
 
-    switch ( a->index )
+    switch ( index )
     {
     /* The following parameters should only be changed once. */
     case HVM_PARAM_VIRIDIAN:
@@ -4124,7 +4125,7 @@ static int hvm_allow_set_param(struct domain *d,
     case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
     case HVM_PARAM_ALTP2M:
     case HVM_PARAM_MCA_CAP:
-        if ( value != 0 && a->value != value )
+        if ( value != 0 && new_value != value )
             rc = -EEXIST;
         break;
     default:
@@ -4134,13 +4135,11 @@ static int hvm_allow_set_param(struct domain *d,
     return rc;
 }
 
-static int hvmop_set_param(
+int hvmop_set_param(
     XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg)
 {
-    struct domain *curr_d = current->domain;
     struct xen_hvm_param a;
     struct domain *d;
-    struct vcpu *v;
     int rc;
 
     if ( copy_from_guest(&a, arg, 1) )
@@ -4160,23 +4159,42 @@ static int hvmop_set_param(
     if ( !is_hvm_domain(d) )
         goto out;
 
-    rc = hvm_allow_set_param(d, &a);
+    rc = hvm_set_param(d, a.index, a.value);
+
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+int hvm_set_param(
+    struct domain *d,
+    uint32_t index,
+    uint64_t value)
+{
+    struct domain *curr_d = current->domain;
+    int rc;
+    struct vcpu *v;
+
+    if ( index >= HVM_NR_PARAMS )
+        return -EINVAL;
+
+    rc = hvm_allow_set_param(d, index, value);
     if ( rc )
         goto out;
 
-    switch ( a.index )
+    switch ( index )
     {
     case HVM_PARAM_CALLBACK_IRQ:
-        hvm_set_callback_via(d, a.value);
+        hvm_set_callback_via(d, value);
         hvm_latch_shinfo_size(d);
         break;
     case HVM_PARAM_TIMER_MODE:
-        if ( a.value > HVMPTM_one_missed_tick_pending )
+        if ( value > HVMPTM_one_missed_tick_pending )
             rc = -EINVAL;
         break;
     case HVM_PARAM_VIRIDIAN:
-        if ( (a.value & ~HVMPV_feature_mask) ||
-             !(a.value & HVMPV_base_freq) )
+        if ( (value & ~HVMPV_feature_mask) ||
+             !(value & HVMPV_base_freq) )
             rc = -EINVAL;
         break;
     case HVM_PARAM_IDENT_PT:
@@ -4186,7 +4204,7 @@ static int hvmop_set_param(
          */
         if ( !paging_mode_hap(d) || !cpu_has_vmx )
         {
-            d->arch.hvm.params[a.index] = a.value;
+            d->arch.hvm.params[index] = value;
             break;
         }
 
@@ -4201,7 +4219,7 @@ static int hvmop_set_param(
 
         rc = 0;
         domain_pause(d);
-        d->arch.hvm.params[a.index] = a.value;
+        d->arch.hvm.params[index] = value;
         for_each_vcpu ( d, v )
             paging_update_cr3(v, false);
         domain_unpause(d);
@@ -4210,23 +4228,23 @@ static int hvmop_set_param(
         break;
     case HVM_PARAM_DM_DOMAIN:
         /* The only value this should ever be set to is DOMID_SELF */
-        if ( a.value != DOMID_SELF )
+        if ( value != DOMID_SELF )
             rc = -EINVAL;
 
-        a.value = curr_d->domain_id;
+        value = curr_d->domain_id;
         break;
     case HVM_PARAM_ACPI_S_STATE:
         rc = 0;
-        if ( a.value == 3 )
+        if ( value == 3 )
             hvm_s3_suspend(d);
-        else if ( a.value == 0 )
+        else if ( value == 0 )
             hvm_s3_resume(d);
         else
             rc = -EINVAL;
 
         break;
     case HVM_PARAM_ACPI_IOPORTS_LOCATION:
-        rc = pmtimer_change_ioport(d, a.value);
+        rc = pmtimer_change_ioport(d, value);
         break;
     case HVM_PARAM_MEMORY_EVENT_CR0:
     case HVM_PARAM_MEMORY_EVENT_CR3:
@@ -4241,24 +4259,24 @@ static int hvmop_set_param(
         rc = xsm_hvm_param_nested(XSM_PRIV, d);
         if ( rc )
             break;
-        if ( a.value > 1 )
+        if ( value > 1 )
             rc = -EINVAL;
         /*
          * Remove the check below once we have
          * shadow-on-shadow.
          */
-        if ( !paging_mode_hap(d) && a.value )
+        if ( !paging_mode_hap(d) && value )
             rc = -EINVAL;
-        if ( a.value &&
+        if ( value &&
              d->arch.hvm.params[HVM_PARAM_ALTP2M] )
             rc = -EINVAL;
         /* Set up NHVM state for any vcpus that are already up. */
-        if ( a.value &&
+        if ( value &&
              !d->arch.hvm.params[HVM_PARAM_NESTEDHVM] )
             for_each_vcpu(d, v)
                 if ( rc == 0 )
                     rc = nestedhvm_vcpu_initialise(v);
-        if ( !a.value || rc )
+        if ( !value || rc )
             for_each_vcpu(d, v)
                 nestedhvm_vcpu_destroy(v);
         break;
@@ -4266,30 +4284,30 @@ static int hvmop_set_param(
         rc = xsm_hvm_param_altp2mhvm(XSM_PRIV, d);
         if ( rc )
             break;
-        if ( a.value > XEN_ALTP2M_limited )
+        if ( value > XEN_ALTP2M_limited )
             rc = -EINVAL;
-        if ( a.value &&
+        if ( value &&
              d->arch.hvm.params[HVM_PARAM_NESTEDHVM] )
             rc = -EINVAL;
         break;
     case HVM_PARAM_TRIPLE_FAULT_REASON:
-        if ( a.value > SHUTDOWN_MAX )
+        if ( value > SHUTDOWN_MAX )
             rc = -EINVAL;
         break;
     case HVM_PARAM_IOREQ_SERVER_PFN:
-        d->arch.hvm.ioreq_gfn.base = a.value;
+        d->arch.hvm.ioreq_gfn.base = value;
         break;
     case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
     {
         unsigned int i;
 
-        if ( a.value == 0 ||
-             a.value > sizeof(d->arch.hvm.ioreq_gfn.mask) * 8 )
+        if ( value == 0 ||
+             value > sizeof(d->arch.hvm.ioreq_gfn.mask) * 8 )
         {
             rc = -EINVAL;
             break;
         }
-        for ( i = 0; i < a.value; i++ )
+        for ( i = 0; i < value; i++ )
             set_bit(i, &d->arch.hvm.ioreq_gfn.mask);
 
         break;
@@ -4301,35 +4319,35 @@ static int hvmop_set_param(
                      sizeof(d->arch.hvm.ioreq_gfn.legacy_mask) * 8);
         BUILD_BUG_ON(HVM_PARAM_BUFIOREQ_PFN >
                      sizeof(d->arch.hvm.ioreq_gfn.legacy_mask) * 8);
-        if ( a.value )
-            set_bit(a.index, &d->arch.hvm.ioreq_gfn.legacy_mask);
+        if ( value )
+            set_bit(index, &d->arch.hvm.ioreq_gfn.legacy_mask);
         break;
 
     case HVM_PARAM_X87_FIP_WIDTH:
-        if ( a.value != 0 && a.value != 4 && a.value != 8 )
+        if ( value != 0 && value != 4 && value != 8 )
         {
             rc = -EINVAL;
             break;
         }
-        d->arch.x87_fip_width = a.value;
+        d->arch.x87_fip_width = value;
         break;
 
     case HVM_PARAM_VM86_TSS:
         /* Hardware would silently truncate high bits. */
-        if ( a.value != (uint32_t)a.value )
+        if ( value != (uint32_t)value )
         {
             if ( d == curr_d )
                 domain_crash(d);
             rc = -EINVAL;
         }
         /* Old hvmloader binaries hardcode the size to 128 bytes. */
-        if ( a.value )
-            a.value |= (128ULL << 32) | VM86_TSS_UPDATED;
-        a.index = HVM_PARAM_VM86_TSS_SIZED;
+        if ( value )
+            value |= (128ULL << 32) | VM86_TSS_UPDATED;
+        index = HVM_PARAM_VM86_TSS_SIZED;
         break;
 
     case HVM_PARAM_VM86_TSS_SIZED:
-        if ( (a.value >> 32) < sizeof(struct tss32) )
+        if ( (value >> 32) < sizeof(struct tss32) )
         {
             if ( d == curr_d )
                 domain_crash(d);
@@ -4340,34 +4358,33 @@ static int hvmop_set_param(
          * 256 bits interrupt redirection bitmap + 64k bits I/O bitmap
          * plus one padding byte).
          */
-        if ( (a.value >> 32) > sizeof(struct tss32) +
+        if ( (value >> 32) > sizeof(struct tss32) +
                                (0x100 / 8) + (0x10000 / 8) + 1 )
-            a.value = (uint32_t)a.value |
+            value = (uint32_t)value |
                       ((sizeof(struct tss32) + (0x100 / 8) +
                                                (0x10000 / 8) + 1) << 32);
-        a.value |= VM86_TSS_UPDATED;
+        value |= VM86_TSS_UPDATED;
         break;
 
     case HVM_PARAM_MCA_CAP:
-        rc = vmce_enable_mca_cap(d, a.value);
+        rc = vmce_enable_mca_cap(d, value);
         break;
     }
 
     if ( rc != 0 )
         goto out;
 
-    d->arch.hvm.params[a.index] = a.value;
+    d->arch.hvm.params[index] = value;
 
     HVM_DBG_LOG(DBG_LEVEL_HCALL, "set param %u = %"PRIx64,
-                a.index, a.value);
+                index, value);
 
  out:
-    rcu_unlock_domain(d);
     return rc;
 }
 
 static int hvm_allow_get_param(struct domain *d,
-                               const struct xen_hvm_param *a)
+                               uint32_t index)
 {
     int rc;
 
@@ -4375,7 +4392,7 @@ static int hvm_allow_get_param(struct domain *d,
     if ( rc )
         return rc;
 
-    switch ( a->index )
+    switch ( index )
     {
     /* The following parameters can be read by the guest. */
     case HVM_PARAM_CALLBACK_IRQ:
@@ -4429,42 +4446,60 @@ static int hvmop_get_param(
     if ( !is_hvm_domain(d) )
         goto out;
 
-    rc = hvm_allow_get_param(d, &a);
+    rc = hvm_get_param(d, a.index, &a.value);
     if ( rc )
         goto out;
 
-    switch ( a.index )
+    rc = __copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
+
+    HVM_DBG_LOG(DBG_LEVEL_HCALL, "get param %u = %"PRIx64,
+                a.index, a.value);
+
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+int hvm_get_param(
+    struct domain *d,
+    uint32_t index,
+    uint64_t *value)
+{
+    int rc;
+
+    if ( index >= HVM_NR_PARAMS || !value )
+        return -EINVAL;
+
+    rc = hvm_allow_get_param(d, index);
+    if ( rc )
+        return rc;
+
+    switch ( index )
     {
     case HVM_PARAM_ACPI_S_STATE:
-        a.value = d->arch.hvm.is_s3_suspended ? 3 : 0;
+        *value = d->arch.hvm.is_s3_suspended ? 3 : 0;
         break;
 
     case HVM_PARAM_VM86_TSS:
-        a.value = (uint32_t)d->arch.hvm.params[HVM_PARAM_VM86_TSS_SIZED];
+        *value = (uint32_t)d->arch.hvm.params[HVM_PARAM_VM86_TSS_SIZED];
         break;
 
     case HVM_PARAM_VM86_TSS_SIZED:
-        a.value = d->arch.hvm.params[HVM_PARAM_VM86_TSS_SIZED] &
-                  ~VM86_TSS_UPDATED;
+        *value = d->arch.hvm.params[HVM_PARAM_VM86_TSS_SIZED] &
+                   ~VM86_TSS_UPDATED;
         break;
 
     case HVM_PARAM_X87_FIP_WIDTH:
-        a.value = d->arch.x87_fip_width;
+        *value = d->arch.x87_fip_width;
         break;
     default:
-        a.value = d->arch.hvm.params[a.index];
+        *value = d->arch.hvm.params[index];
         break;
     }
 
-    rc = __copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
-
-    HVM_DBG_LOG(DBG_LEVEL_HCALL, "get param %u = %"PRIx64,
-                a.index, a.value);
+    return 0;
+};
 
- out:
-    rcu_unlock_domain(d);
-    return rc;
-}
 
 /*
  * altp2m operations are envisioned as being used in several different
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 1d7b66f927..a6f4ae76a1 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -335,6 +335,10 @@ unsigned long hvm_cr4_guest_valid_bits(const struct domain *d, bool restore);
 bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
                         void *ctxt);
 
+/* Caller must hold domain locks */
+int hvm_get_param(struct domain *d, uint32_t index, uint64_t *value);
+int hvm_set_param(struct domain *d, uint32_t index, uint64_t value);
+
 #ifdef CONFIG_HVM
 
 #define hvm_get_guest_tsc(v) hvm_get_guest_tsc_fixed(v, 0)
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 02/20] xen/x86: Make hap_get_allocation accessible
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-19 19:08   ` Andrew Cooper
  2019-12-20 16:48   ` Jan Beulich
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 03/20] tools/libxc: clean up memory sharing files Tamas K Lengyel
                   ` (18 subsequent siblings)
  20 siblings, 2 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	Jan Beulich, Roger Pau Monné

During VM forking we'll copy the parent domain's parameters to the client,
including the HAP shadow memory setting that is used for storing the domain's
EPT. We'll copy this in the hypervisor instead doing it during toolstack launch
to allow the domain to start executing and unsharing memory before (or
even completely without) the toolstack.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/hap/hap.c | 3 +--
 xen/include/asm-x86/hap.h | 1 +
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 3d93f3451c..c7c7ff6e99 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -321,8 +321,7 @@ static void hap_free_p2m_page(struct domain *d, struct page_info *pg)
 }
 
 /* Return the size of the pool, rounded up to the nearest MB */
-static unsigned int
-hap_get_allocation(struct domain *d)
+unsigned int hap_get_allocation(struct domain *d)
 {
     unsigned int pg = d->arch.paging.hap.total_pages
         + d->arch.paging.hap.p2m_pages;
diff --git a/xen/include/asm-x86/hap.h b/xen/include/asm-x86/hap.h
index b94bfb4ed0..1bf07e49fe 100644
--- a/xen/include/asm-x86/hap.h
+++ b/xen/include/asm-x86/hap.h
@@ -45,6 +45,7 @@ int   hap_track_dirty_vram(struct domain *d,
 
 extern const struct paging_mode *hap_paging_get_mode(struct vcpu *);
 int hap_set_allocation(struct domain *d, unsigned int pages, bool *preempted);
+unsigned int hap_get_allocation(struct domain *d);
 
 #endif /* XEN_HAP_H */
 
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 03/20] tools/libxc: clean up memory sharing files
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible Tamas K Lengyel
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 02/20] xen/x86: Make hap_get_allocation accessible Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 04/20] x86/mem_sharing: cleanup code and comments in various locations Tamas K Lengyel
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Tamas K Lengyel, Wei Liu

No functional changes.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Acked-by: Wei Liu <wl@xen.org>
---
 tools/libxc/include/xenctrl.h | 24 ++++++++++++------------
 tools/libxc/xc_memshr.c       | 12 ++++++------
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index f4431687b3..b5ffa53d55 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2060,7 +2060,7 @@ int xc_monitor_emulate_each_rep(xc_interface *xch, uint32_t domain_id,
  *
  * Sharing is supported only on the x86 architecture in 64 bit mode, with
  * Hardware-Assisted Paging (i.e. Intel EPT, AMD NPT). Moreover, AMD NPT
- * support is considered experimental. 
+ * support is considered experimental.
 
  * Calls below return ENOSYS if not in the x86_64 architecture.
  * Calls below return ENODEV if the domain does not support HAP.
@@ -2107,13 +2107,13 @@ int xc_memshr_control(xc_interface *xch,
  *  EINVAL or EACCESS if the request is denied by the security policy
  */
 
-int xc_memshr_ring_enable(xc_interface *xch, 
+int xc_memshr_ring_enable(xc_interface *xch,
                           uint32_t domid,
                           uint32_t *port);
 /* Disable the ring for ENOMEM communication.
  * May fail with EINVAL if the ring was not enabled in the first place.
  */
-int xc_memshr_ring_disable(xc_interface *xch, 
+int xc_memshr_ring_disable(xc_interface *xch,
                            uint32_t domid);
 
 /*
@@ -2126,7 +2126,7 @@ int xc_memshr_ring_disable(xc_interface *xch,
 int xc_memshr_domain_resume(xc_interface *xch,
                             uint32_t domid);
 
-/* Select a page for sharing. 
+/* Select a page for sharing.
  *
  * A 64 bit opaque handle will be stored in handle.  The hypervisor ensures
  * that if the page is modified, the handle will be invalidated, and future
@@ -2155,7 +2155,7 @@ int xc_memshr_nominate_gref(xc_interface *xch,
 
 /* The three calls below may fail with
  * 10 (or -XENMEM_SHARING_OP_S_HANDLE_INVALID) if the handle passed as source
- * is invalid.  
+ * is invalid.
  * 9 (or -XENMEM_SHARING_OP_C_HANDLE_INVALID) if the handle passed as client is
  * invalid.
  */
@@ -2168,7 +2168,7 @@ int xc_memshr_nominate_gref(xc_interface *xch,
  *
  * After successful sharing, the client handle becomes invalid. Both <domain,
  * gfn> tuples point to the same mfn with the same handle, the one specified as
- * source. Either 3-tuple can be specified later for further re-sharing. 
+ * source. Either 3-tuple can be specified later for further re-sharing.
  */
 int xc_memshr_share_gfns(xc_interface *xch,
                     uint32_t source_domain,
@@ -2193,7 +2193,7 @@ int xc_memshr_share_grefs(xc_interface *xch,
 /* Allows to add to the guest physmap of the client domain a shared frame
  * directly.
  *
- * May additionally fail with 
+ * May additionally fail with
  *  9 (-XENMEM_SHARING_OP_C_HANDLE_INVALID) if the physmap entry for the gfn is
  *  not suitable.
  *  ENOMEM if internal data structures cannot be allocated.
@@ -2222,7 +2222,7 @@ int xc_memshr_range_share(xc_interface *xch,
                           uint64_t last_gfn);
 
 /* Debug calls: return the number of pages referencing the shared frame backing
- * the input argument. Should be one or greater. 
+ * the input argument. Should be one or greater.
  *
  * May fail with EINVAL if there is no backing shared frame for the input
  * argument.
@@ -2235,9 +2235,9 @@ int xc_memshr_debug_gref(xc_interface *xch,
                          uint32_t domid,
                          grant_ref_t gref);
 
-/* Audits the share subsystem. 
- * 
- * Returns ENOSYS if not supported (may not be compiled into the hypervisor). 
+/* Audits the share subsystem.
+ *
+ * Returns ENOSYS if not supported (may not be compiled into the hypervisor).
  *
  * Returns the number of errors found during auditing otherwise. May be (should
  * be!) zero.
@@ -2273,7 +2273,7 @@ long xc_sharing_freed_pages(xc_interface *xch);
  * should return 1. (And dominfo(d) for each of the two domains should return 1
  * as well).
  *
- * Note that some of these sharing_used_frames may be referenced by 
+ * Note that some of these sharing_used_frames may be referenced by
  * a single domain page, and thus not realize any savings. The same
  * applies to some of the pages counted in dominfo(d)->shr_pages.
  */
diff --git a/tools/libxc/xc_memshr.c b/tools/libxc/xc_memshr.c
index d5e135e0d9..5ef56a6933 100644
--- a/tools/libxc/xc_memshr.c
+++ b/tools/libxc/xc_memshr.c
@@ -41,7 +41,7 @@ int xc_memshr_control(xc_interface *xch,
     return do_domctl(xch, &domctl);
 }
 
-int xc_memshr_ring_enable(xc_interface *xch, 
+int xc_memshr_ring_enable(xc_interface *xch,
                           uint32_t domid,
                           uint32_t *port)
 {
@@ -57,7 +57,7 @@ int xc_memshr_ring_enable(xc_interface *xch,
                                port);
 }
 
-int xc_memshr_ring_disable(xc_interface *xch, 
+int xc_memshr_ring_disable(xc_interface *xch,
                            uint32_t domid)
 {
     return xc_vm_event_control(xch, domid,
@@ -85,11 +85,11 @@ int xc_memshr_nominate_gfn(xc_interface *xch,
     memset(&mso, 0, sizeof(mso));
 
     mso.op = XENMEM_sharing_op_nominate_gfn;
-    mso.u.nominate.u.gfn = gfn; 
+    mso.u.nominate.u.gfn = gfn;
 
     rc = xc_memshr_memop(xch, domid, &mso);
 
-    if (!rc) *handle = mso.u.nominate.handle; 
+    if (!rc) *handle = mso.u.nominate.handle;
 
     return rc;
 }
@@ -105,11 +105,11 @@ int xc_memshr_nominate_gref(xc_interface *xch,
     memset(&mso, 0, sizeof(mso));
 
     mso.op = XENMEM_sharing_op_nominate_gref;
-    mso.u.nominate.u.grant_ref = gref; 
+    mso.u.nominate.u.grant_ref = gref;
 
     rc = xc_memshr_memop(xch, domid, &mso);
 
-    if (!rc) *handle = mso.u.nominate.handle; 
+    if (!rc) *handle = mso.u.nominate.handle;
 
     return rc;
 }
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 04/20] x86/mem_sharing: cleanup code and comments in various locations
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
                   ` (2 preceding siblings ...)
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 03/20] tools/libxc: clean up memory sharing files Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-19 11:18   ` Andrew Cooper
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 05/20] x86/mem_sharing: make get_two_gfns take locks conditionally Tamas K Lengyel
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

No functional changes.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/hvm/hvm.c            |  11 +-
 xen/arch/x86/mm/mem_sharing.c     | 342 +++++++++++++++++-------------
 xen/arch/x86/mm/p2m.c             |  17 +-
 xen/include/asm-x86/mem_sharing.h |  51 +++--
 4 files changed, 236 insertions(+), 185 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5a3a962fbb..1e888b403b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1902,12 +1902,11 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     if ( npfec.write_access && (p2mt == p2m_ram_shared) )
     {
         ASSERT(p2m_is_hostp2m(p2m));
-        sharing_enomem = 
-            (mem_sharing_unshare_page(currd, gfn, 0) < 0);
+        sharing_enomem = mem_sharing_unshare_page(currd, gfn, 0);
         rc = 1;
         goto out_put_gfn;
     }
- 
+
     /* Spurious fault? PoD and log-dirty also take this path. */
     if ( p2m_is_ram(p2mt) )
     {
@@ -1953,9 +1952,11 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
         __put_gfn(p2m, gfn);
     __put_gfn(hostp2m, gfn);
  out:
-    /* All of these are delayed until we exit, since we might 
+    /*
+     * All of these are delayed until we exit, since we might
      * sleep on event ring wait queues, and we must not hold
-     * locks in such circumstance */
+     * locks in such circumstance.
+     */
     if ( paged )
         p2m_mem_paging_populate(currd, gfn);
     if ( sharing_enomem )
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index efb8821768..319aaf3074 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -59,8 +59,10 @@ static DEFINE_PER_CPU(pg_lock_data_t, __pld);
 #define RMAP_USES_HASHTAB(page) \
         ((page)->sharing->hash_table.flag == NULL)
 #define RMAP_HEAVY_SHARED_PAGE   RMAP_HASHTAB_SIZE
-/* A bit of hysteresis. We don't want to be mutating between list and hash
- * table constantly. */
+/*
+ * A bit of hysteresis. We don't want to be mutating between list and hash
+ * table constantly.
+ */
 #define RMAP_LIGHT_SHARED_PAGE   (RMAP_HEAVY_SHARED_PAGE >> 2)
 
 #if MEM_SHARING_AUDIT
@@ -88,7 +90,7 @@ static inline void page_sharing_dispose(struct page_info *page)
 {
     /* Unlikely given our thresholds, but we should be careful. */
     if ( unlikely(RMAP_USES_HASHTAB(page)) )
-        free_xenheap_pages(page->sharing->hash_table.bucket, 
+        free_xenheap_pages(page->sharing->hash_table.bucket,
                             RMAP_HASHTAB_ORDER);
 
     spin_lock(&shr_audit_lock);
@@ -105,7 +107,7 @@ static inline void page_sharing_dispose(struct page_info *page)
 {
     /* Unlikely given our thresholds, but we should be careful. */
     if ( unlikely(RMAP_USES_HASHTAB(page)) )
-        free_xenheap_pages(page->sharing->hash_table.bucket, 
+        free_xenheap_pages(page->sharing->hash_table.bucket,
                             RMAP_HASHTAB_ORDER);
     xfree(page->sharing);
 }
@@ -122,8 +124,8 @@ static inline void page_sharing_dispose(struct page_info *page)
  * Nesting may happen when sharing (and locking) two pages.
  * Deadlock is avoided by locking pages in increasing order.
  * All memory sharing code paths take the p2m lock of the affected gfn before
- * taking the lock for the underlying page. We enforce ordering between page_lock
- * and p2m_lock using an mm-locks.h construct.
+ * taking the lock for the underlying page. We enforce ordering between
+ * page_lock and p2m_lock using an mm-locks.h construct.
  *
  * TODO: Investigate if PGT_validated is necessary.
  */
@@ -168,7 +170,7 @@ static inline bool mem_sharing_page_lock(struct page_info *pg)
     if ( rc )
     {
         preempt_disable();
-        page_sharing_mm_post_lock(&pld->mm_unlock_level, 
+        page_sharing_mm_post_lock(&pld->mm_unlock_level,
                                   &pld->recurse_count);
     }
     return rc;
@@ -178,7 +180,7 @@ static inline void mem_sharing_page_unlock(struct page_info *pg)
 {
     pg_lock_data_t *pld = &(this_cpu(__pld));
 
-    page_sharing_mm_unlock(pld->mm_unlock_level, 
+    page_sharing_mm_unlock(pld->mm_unlock_level,
                            &pld->recurse_count);
     preempt_enable();
     _page_unlock(pg);
@@ -186,7 +188,7 @@ static inline void mem_sharing_page_unlock(struct page_info *pg)
 
 static inline shr_handle_t get_next_handle(void)
 {
-    /* Get the next handle get_page style */ 
+    /* Get the next handle get_page style */
     uint64_t x, y = next_handle;
     do {
         x = y;
@@ -198,24 +200,26 @@ static inline shr_handle_t get_next_handle(void)
 #define mem_sharing_enabled(d) \
     (is_hvm_domain(d) && (d)->arch.hvm.mem_sharing_enabled)
 
-static atomic_t nr_saved_mfns   = ATOMIC_INIT(0); 
+static atomic_t nr_saved_mfns   = ATOMIC_INIT(0);
 static atomic_t nr_shared_mfns  = ATOMIC_INIT(0);
 
-/** Reverse map **/
-/* Every shared frame keeps a reverse map (rmap) of <domain, gfn> tuples that
+/*
+ * Reverse map
+ *
+ * Every shared frame keeps a reverse map (rmap) of <domain, gfn> tuples that
  * this shared frame backs. For pages with a low degree of sharing, a O(n)
  * search linked list is good enough. For pages with higher degree of sharing,
- * we use a hash table instead. */
+ * we use a hash table instead.
+ */
 
 typedef struct gfn_info
 {
     unsigned long gfn;
-    domid_t domain; 
+    domid_t domain;
     struct list_head list;
 } gfn_info_t;
 
-static inline void
-rmap_init(struct page_info *page)
+static inline void rmap_init(struct page_info *page)
 {
     /* We always start off as a doubly linked list. */
     INIT_LIST_HEAD(&page->sharing->gfns);
@@ -225,10 +229,11 @@ rmap_init(struct page_info *page)
 #define HASH(domain, gfn)       \
     (((gfn) + (domain)) % RMAP_HASHTAB_SIZE)
 
-/* Conversions. Tuned by the thresholds. Should only happen twice 
- * (once each) during the lifetime of a shared page */
-static inline int
-rmap_list_to_hash_table(struct page_info *page)
+/*
+ * Conversions. Tuned by the thresholds. Should only happen twice
+ * (once each) during the lifetime of a shared page.
+ */
+static inline int rmap_list_to_hash_table(struct page_info *page)
 {
     unsigned int i;
     struct list_head *pos, *tmp, *b =
@@ -254,8 +259,7 @@ rmap_list_to_hash_table(struct page_info *page)
     return 0;
 }
 
-static inline void
-rmap_hash_table_to_list(struct page_info *page)
+static inline void rmap_hash_table_to_list(struct page_info *page)
 {
     unsigned int i;
     struct list_head *bucket = page->sharing->hash_table.bucket;
@@ -276,8 +280,7 @@ rmap_hash_table_to_list(struct page_info *page)
 }
 
 /* Generic accessors to the rmap */
-static inline unsigned long
-rmap_count(struct page_info *pg)
+static inline unsigned long rmap_count(struct page_info *pg)
 {
     unsigned long count;
     unsigned long t = read_atomic(&pg->u.inuse.type_info);
@@ -287,11 +290,13 @@ rmap_count(struct page_info *pg)
     return count;
 }
 
-/* The page type count is always decreased after removing from the rmap.
- * Use a convert flag to avoid mutating the rmap if in the middle of an 
- * iterator, or if the page will be soon destroyed anyways. */
-static inline void
-rmap_del(gfn_info_t *gfn_info, struct page_info *page, int convert)
+/*
+ * The page type count is always decreased after removing from the rmap.
+ * Use a convert flag to avoid mutating the rmap if in the middle of an
+ * iterator, or if the page will be soon destroyed anyways.
+ */
+static inline
+void rmap_del(gfn_info_t *gfn_info, struct page_info *page, int convert)
 {
     if ( RMAP_USES_HASHTAB(page) && convert &&
          (rmap_count(page) <= RMAP_LIGHT_SHARED_PAGE) )
@@ -302,8 +307,7 @@ rmap_del(gfn_info_t *gfn_info, struct page_info *page, int convert)
 }
 
 /* The page type count is always increased before adding to the rmap. */
-static inline void
-rmap_add(gfn_info_t *gfn_info, struct page_info *page)
+static inline void rmap_add(gfn_info_t *gfn_info, struct page_info *page)
 {
     struct list_head *head;
 
@@ -314,7 +318,7 @@ rmap_add(gfn_info_t *gfn_info, struct page_info *page)
         (void)rmap_list_to_hash_table(page);
 
     head = (RMAP_USES_HASHTAB(page)) ?
-        page->sharing->hash_table.bucket + 
+        page->sharing->hash_table.bucket +
                             HASH(gfn_info->domain, gfn_info->gfn) :
         &page->sharing->gfns;
 
@@ -322,9 +326,9 @@ rmap_add(gfn_info_t *gfn_info, struct page_info *page)
     list_add(&gfn_info->list, head);
 }
 
-static inline gfn_info_t *
-rmap_retrieve(uint16_t domain_id, unsigned long gfn, 
-                            struct page_info *page)
+static inline
+gfn_info_t *rmap_retrieve(uint16_t domain_id, unsigned long gfn,
+                          struct page_info *page)
 {
     gfn_info_t *gfn_info;
     struct list_head *le, *head;
@@ -364,18 +368,18 @@ struct rmap_iterator {
     unsigned int bucket;
 };
 
-static inline void
-rmap_seed_iterator(struct page_info *page, struct rmap_iterator *ri)
+static inline
+void rmap_seed_iterator(struct page_info *page, struct rmap_iterator *ri)
 {
     ri->curr = (RMAP_USES_HASHTAB(page)) ?
                 page->sharing->hash_table.bucket :
                 &page->sharing->gfns;
-    ri->next = ri->curr->next; 
+    ri->next = ri->curr->next;
     ri->bucket = 0;
 }
 
-static inline gfn_info_t *
-rmap_iterate(struct page_info *page, struct rmap_iterator *ri)
+static inline
+gfn_info_t *rmap_iterate(struct page_info *page, struct rmap_iterator *ri)
 {
     struct list_head *head = (RMAP_USES_HASHTAB(page)) ?
                 page->sharing->hash_table.bucket + ri->bucket :
@@ -405,14 +409,14 @@ retry:
     return list_entry(ri->curr, gfn_info_t, list);
 }
 
-static inline gfn_info_t *mem_sharing_gfn_alloc(struct page_info *page,
-                                                struct domain *d,
-                                                unsigned long gfn)
+static inline
+gfn_info_t *mem_sharing_gfn_alloc(struct page_info *page, struct domain *d,
+                                  unsigned long gfn)
 {
     gfn_info_t *gfn_info = xmalloc(gfn_info_t);
 
     if ( gfn_info == NULL )
-        return NULL; 
+        return NULL;
 
     gfn_info->gfn = gfn;
     gfn_info->domain = d->domain_id;
@@ -425,9 +429,9 @@ static inline gfn_info_t *mem_sharing_gfn_alloc(struct page_info *page,
     return gfn_info;
 }
 
-static inline void mem_sharing_gfn_destroy(struct page_info *page,
-                                           struct domain *d,
-                                           gfn_info_t *gfn_info)
+static inline
+void mem_sharing_gfn_destroy(struct page_info *page, struct domain *d,
+                             gfn_info_t *gfn_info)
 {
     /* Decrement the number of pages. */
     atomic_dec(&d->shr_pages);
@@ -437,25 +441,29 @@ static inline void mem_sharing_gfn_destroy(struct page_info *page,
     xfree(gfn_info);
 }
 
-static struct page_info* mem_sharing_lookup(unsigned long mfn)
+static inline struct page_info* mem_sharing_lookup(unsigned long mfn)
 {
-    if ( mfn_valid(_mfn(mfn)) )
-    {
-        struct page_info* page = mfn_to_page(_mfn(mfn));
-        if ( page_get_owner(page) == dom_cow )
-        {
-            /* Count has to be at least two, because we're called
-             * with the mfn locked (1) and this is supposed to be 
-             * a shared page (1). */
-            unsigned long t = read_atomic(&page->u.inuse.type_info);
-            ASSERT((t & PGT_type_mask) == PGT_shared_page);
-            ASSERT((t & PGT_count_mask) >= 2);
-            ASSERT(SHARED_M2P(get_gpfn_from_mfn(mfn)));
-            return page;
-        }
-    }
+    struct page_info* page;
+    unsigned long t;
 
-    return NULL;
+    if ( !mfn_valid(_mfn(mfn)) )
+        return NULL;
+
+    page = mfn_to_page(_mfn(mfn));
+    if ( page_get_owner(page) != dom_cow )
+        return NULL;
+
+    /*
+     * Count has to be at least two, because we're called
+     * with the mfn locked (1) and this is supposed to be
+     * a shared page (1).
+     */
+    t = read_atomic(&page->u.inuse.type_info);
+    ASSERT((t & PGT_type_mask) == PGT_shared_page);
+    ASSERT((t & PGT_count_mask) >= 2);
+    ASSERT(SHARED_M2P(get_gpfn_from_mfn(mfn)));
+
+    return page;
 }
 
 static int audit(void)
@@ -492,7 +500,7 @@ static int audit(void)
            continue;
         }
 
-        /* Check if the MFN has correct type, owner and handle. */ 
+        /* Check if the MFN has correct type, owner and handle. */
         if ( (pg->u.inuse.type_info & PGT_type_mask) != PGT_shared_page )
         {
            MEM_SHARING_DEBUG("mfn %lx in audit list, but not PGT_shared_page (%lx)!\n",
@@ -545,7 +553,7 @@ static int audit(void)
                 errors++;
                 continue;
             }
-            o_mfn = get_gfn_query_unlocked(d, g->gfn, &t); 
+            o_mfn = get_gfn_query_unlocked(d, g->gfn, &t);
             if ( !mfn_eq(o_mfn, mfn) )
             {
                 MEM_SHARING_DEBUG("Incorrect P2M for d=%hu, PFN=%lx."
@@ -568,7 +576,7 @@ static int audit(void)
         {
             MEM_SHARING_DEBUG("Mismatched counts for MFN=%lx."
                               "nr_gfns in list %lu, in type_info %lx\n",
-                              mfn_x(mfn), nr_gfns, 
+                              mfn_x(mfn), nr_gfns,
                               (pg->u.inuse.type_info & PGT_count_mask));
             errors++;
         }
@@ -603,7 +611,7 @@ int mem_sharing_notify_enomem(struct domain *d, unsigned long gfn,
         .u.mem_sharing.p2mt = p2m_ram_shared
     };
 
-    if ( (rc = __vm_event_claim_slot(d, 
+    if ( (rc = __vm_event_claim_slot(d,
                         d->vm_event_share, allow_sleep)) < 0 )
         return rc;
 
@@ -629,9 +637,9 @@ unsigned int mem_sharing_get_nr_shared_mfns(void)
 }
 
 /* Functions that change a page's type and ownership */
-static int page_make_sharable(struct domain *d, 
-                       struct page_info *page, 
-                       int expected_refcnt)
+static int page_make_sharable(struct domain *d,
+                              struct page_info *page,
+                              int expected_refcnt)
 {
     bool_t drop_dom_ref;
 
@@ -658,8 +666,10 @@ static int page_make_sharable(struct domain *d,
         return -EEXIST;
     }
 
-    /* Check if the ref count is 2. The first from PGC_allocated, and
-     * the second from get_page_and_type at the top of this function */
+    /*
+     * Check if the ref count is 2. The first from PGC_allocated, and
+     * the second from get_page_and_type at the top of this function.
+     */
     if ( page->count_info != (PGC_allocated | (2 + expected_refcnt)) )
     {
         spin_unlock(&d->page_alloc_lock);
@@ -675,6 +685,7 @@ static int page_make_sharable(struct domain *d,
 
     if ( drop_dom_ref )
         put_domain(d);
+
     return 0;
 }
 
@@ -684,7 +695,7 @@ static int page_make_private(struct domain *d, struct page_info *page)
 
     if ( !get_page(page, dom_cow) )
         return -EINVAL;
-    
+
     spin_lock(&d->page_alloc_lock);
 
     if ( d->is_dying )
@@ -727,10 +738,13 @@ static inline struct page_info *__grab_shared_page(mfn_t mfn)
 
     if ( !mfn_valid(mfn) )
         return NULL;
+
     pg = mfn_to_page(mfn);
 
-    /* If the page is not validated we can't lock it, and if it's  
-     * not validated it's obviously not shared. */
+    /*
+     * If the page is not validated we can't lock it, and if it's
+     * not validated it's obviously not shared.
+     */
     if ( !mem_sharing_page_lock(pg) )
         return NULL;
 
@@ -754,10 +768,10 @@ static int debug_mfn(mfn_t mfn)
         return -EINVAL;
     }
 
-    MEM_SHARING_DEBUG( 
+    MEM_SHARING_DEBUG(
             "Debug page: MFN=%lx is ci=%lx, ti=%lx, owner_id=%d\n",
-            mfn_x(page_to_mfn(page)), 
-            page->count_info, 
+            mfn_x(page_to_mfn(page)),
+            page->count_info,
             page->u.inuse.type_info,
             page_get_owner(page)->domain_id);
 
@@ -775,7 +789,7 @@ static int debug_gfn(struct domain *d, gfn_t gfn)
 
     mfn = get_gfn_query(d, gfn_x(gfn), &p2mt);
 
-    MEM_SHARING_DEBUG("Debug for dom%d, gfn=%" PRI_gfn "\n", 
+    MEM_SHARING_DEBUG("Debug for dom%d, gfn=%" PRI_gfn "\n",
                       d->domain_id, gfn_x(gfn));
     num_refs = debug_mfn(mfn);
     put_gfn(d, gfn_x(gfn));
@@ -796,9 +810,9 @@ static int debug_gref(struct domain *d, grant_ref_t ref)
                           d->domain_id, ref, rc);
         return rc;
     }
-    
+
     MEM_SHARING_DEBUG(
-            "==> Grant [dom=%d,ref=%d], status=%x. ", 
+            "==> Grant [dom=%d,ref=%d], status=%x. ",
             d->domain_id, ref, status);
 
     return debug_gfn(d, gfn);
@@ -824,15 +838,12 @@ static int nominate_page(struct domain *d, gfn_t gfn,
         goto out;
 
     /* Return the handle if the page is already shared */
-    if ( p2m_is_shared(p2mt) ) {
+    if ( p2m_is_shared(p2mt) )
+    {
         struct page_info *pg = __grab_shared_page(mfn);
         if ( !pg )
-        {
-            gprintk(XENLOG_ERR,
-                    "Shared p2m entry gfn %" PRI_gfn ", but could not grab mfn %" PRI_mfn " dom%d\n",
-                    gfn_x(gfn), mfn_x(mfn), d->domain_id);
             BUG();
-        }
+
         *phandle = pg->sharing->handle;
         ret = 0;
         mem_sharing_page_unlock(pg);
@@ -843,7 +854,6 @@ static int nominate_page(struct domain *d, gfn_t gfn,
     if ( !p2m_is_sharable(p2mt) )
         goto out;
 
-#ifdef CONFIG_HVM
     /* Check if there are mem_access/remapped altp2m entries for this page */
     if ( altp2m_active(d) )
     {
@@ -872,42 +882,42 @@ static int nominate_page(struct domain *d, gfn_t gfn,
 
         altp2m_list_unlock(d);
     }
-#endif
 
     /* Try to convert the mfn to the sharable type */
     page = mfn_to_page(mfn);
-    ret = page_make_sharable(d, page, expected_refcnt); 
-    if ( ret ) 
+    ret = page_make_sharable(d, page, expected_refcnt);
+    if ( ret )
         goto out;
 
-    /* Now that the page is validated, we can lock it. There is no 
-     * race because we're holding the p2m entry, so no one else 
-     * could be nominating this gfn */
+    /*
+     * Now that the page is validated, we can lock it. There is no
+     * race because we're holding the p2m entry, so no one else
+     * could be nominating this gfn.
+     */
     ret = -ENOENT;
     if ( !mem_sharing_page_lock(page) )
         goto out;
 
     /* Initialize the shared state */
     ret = -ENOMEM;
-    if ( (page->sharing = 
-            xmalloc(struct page_sharing_info)) == NULL )
+    if ( !(page->sharing = xmalloc(struct page_sharing_info)) )
     {
         /* Making a page private atomically unlocks it */
-        BUG_ON(page_make_private(d, page) != 0);
+        BUG_ON(page_make_private(d, page));
         goto out;
     }
     page->sharing->pg = page;
     rmap_init(page);
 
     /* Create the handle */
-    page->sharing->handle = get_next_handle();  
+    page->sharing->handle = get_next_handle();
 
     /* Create the local gfn info */
-    if ( mem_sharing_gfn_alloc(page, d, gfn_x(gfn)) == NULL )
+    if ( !mem_sharing_gfn_alloc(page, d, gfn_x(gfn)) )
     {
         xfree(page->sharing);
         page->sharing = NULL;
-        BUG_ON(page_make_private(d, page) != 0);
+        BUG_ON(page_make_private(d, page));
         goto out;
     }
 
@@ -946,15 +956,19 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
     get_two_gfns(sd, sgfn, &smfn_type, NULL, &smfn,
                  cd, cgfn, &cmfn_type, NULL, &cmfn, 0, &tg);
 
-    /* This tricky business is to avoid two callers deadlocking if 
-     * grabbing pages in opposite client/source order */
+    /*
+     * This tricky business is to avoid two callers deadlocking if
+     * grabbing pages in opposite client/source order.
+     */
     if ( mfn_eq(smfn, cmfn) )
     {
-        /* The pages are already the same.  We could return some
+        /*
+         * The pages are already the same.  We could return some
          * kind of error here, but no matter how you look at it,
          * the pages are already 'shared'.  It possibly represents
          * a big problem somewhere else, but as far as sharing is
-         * concerned: great success! */
+         * concerned: great success!
+         */
         ret = 0;
         goto err_out;
     }
@@ -1010,11 +1024,15 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
     rmap_seed_iterator(cpage, &ri);
     while ( (gfn = rmap_iterate(cpage, &ri)) != NULL)
     {
-        /* Get the source page and type, this should never fail: 
-         * we are under shr lock, and got a successful lookup */
+        /*
+         * Get the source page and type, this should never fail:
+         * we are under shr lock, and got a successful lookup.
+         */
         BUG_ON(!get_page_and_type(spage, dom_cow, PGT_shared_page));
-        /* Move the gfn_info from client list to source list.
-         * Don't change the type of rmap for the client page. */
+        /*
+         * Move the gfn_info from client list to source list.
+         * Don't change the type of rmap for the client page.
+         */
         rmap_del(gfn, cpage, 0);
         rmap_add(gfn, spage);
         put_count++;
@@ -1043,14 +1061,14 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
     atomic_dec(&nr_shared_mfns);
     atomic_inc(&nr_saved_mfns);
     ret = 0;
-    
+
 err_out:
     put_two_gfns(&tg);
     return ret;
 }
 
 int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
-                            struct domain *cd, unsigned long cgfn) 
+                               struct domain *cd, unsigned long cgfn)
 {
     struct page_info *spage;
     int ret = -EINVAL;
@@ -1069,15 +1087,18 @@ int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle
     spage = __grab_shared_page(smfn);
     if ( spage == NULL )
         goto err_out;
+
     ASSERT(smfn_type == p2m_ram_shared);
 
     /* Check that the handles match */
     if ( spage->sharing->handle != sh )
         goto err_unlock;
 
-    /* Make sure the target page is a hole in the physmap. These are typically
+    /*
+     * Make sure the target page is a hole in the physmap. These are typically
      * p2m_mmio_dm, but also accept p2m_invalid and paged out pages. See the
-     * definition of p2m_is_hole in p2m.h. */
+     * definition of p2m_is_hole in p2m.h.
+     */
     if ( !p2m_is_hole(cmfn_type) )
     {
         ret = XENMEM_SHARING_OP_C_HANDLE_INVALID;
@@ -1086,7 +1107,7 @@ int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle
 
     /* This is simpler than regular sharing */
     BUG_ON(!get_page_and_type(spage, dom_cow, PGT_shared_page));
-    if ( (gfn_info = mem_sharing_gfn_alloc(spage, cd, cgfn)) == NULL )
+    if ( !(gfn_info = mem_sharing_gfn_alloc(spage, cd, cgfn)) )
     {
         put_page_and_type(spage);
         ret = -ENOMEM;
@@ -1102,11 +1123,17 @@ int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle
         mem_sharing_gfn_destroy(spage, cd, gfn_info);
         put_page_and_type(spage);
     } else {
-        /* There is a chance we're plugging a hole where a paged out page was */
+        /*
+         * There is a chance we're plugging a hole where a paged out
+         * page was.
+         */
         if ( p2m_is_paging(cmfn_type) && (cmfn_type != p2m_ram_paging_out) )
         {
             atomic_dec(&cd->paged_pages);
-            /* Further, there is a chance this was a valid page. Don't leak it. */
+            /*
+             * Further, there is a chance this was a valid page.
+             * Don't leak it.
+             */
             if ( mfn_valid(cmfn) )
             {
                 struct page_info *cpage = mfn_to_page(cmfn);
@@ -1133,13 +1160,14 @@ err_out:
 }
 
 
-/* A note on the rationale for unshare error handling:
+/*
+ * A note on the rationale for unshare error handling:
  *  1. Unshare can only fail with ENOMEM. Any other error conditions BUG_ON()'s
  *  2. We notify a potential dom0 helper through a vm_event ring. But we
- *     allow the notification to not go to sleep. If the event ring is full 
+ *     allow the notification to not go to sleep. If the event ring is full
  *     of ENOMEM warnings, then it's on the ball.
  *  3. We cannot go to sleep until the unshare is resolved, because we might
- *     be buried deep into locks (e.g. something -> copy_to_user -> __hvm_copy) 
+ *     be buried deep into locks (e.g. something -> copy_to_user -> __hvm_copy)
  *  4. So, we make sure we:
  *     4.1. return an error
  *     4.2. do not corrupt shared memory
@@ -1147,19 +1175,20 @@ err_out:
  *     4.4. let the guest deal with it if the error propagation will reach it
  */
 int __mem_sharing_unshare_page(struct domain *d,
-                             unsigned long gfn, 
-                             uint16_t flags)
+                               unsigned long gfn,
+                               uint16_t flags)
 {
     p2m_type_t p2mt;
     mfn_t mfn;
     struct page_info *page, *old_page;
     int last_gfn;
     gfn_info_t *gfn_info = NULL;
-   
+
     mfn = get_gfn(d, gfn, &p2mt);
-    
+
     /* Has someone already unshared it? */
-    if ( !p2m_is_shared(p2mt) ) {
+    if ( !p2m_is_shared(p2mt) )
+    {
         put_gfn(d, gfn);
         return 0;
     }
@@ -1167,26 +1196,30 @@ int __mem_sharing_unshare_page(struct domain *d,
     page = __grab_shared_page(mfn);
     if ( page == NULL )
     {
-        gdprintk(XENLOG_ERR, "Domain p2m is shared, but page is not: "
-                                "%lx\n", gfn);
+        gdprintk(XENLOG_ERR, "Domain p2m is shared, but page is not: %lx\n",
+                 gfn);
         BUG();
     }
 
     gfn_info = rmap_retrieve(d->domain_id, gfn, page);
     if ( unlikely(gfn_info == NULL) )
     {
-        gdprintk(XENLOG_ERR, "Could not find gfn_info for shared gfn: "
-                                "%lx\n", gfn);
+        gdprintk(XENLOG_ERR, "Could not find gfn_info for shared gfn: %lx\n",
+                 gfn);
         BUG();
     }
 
-    /* Do the accounting first. If anything fails below, we have bigger
-     * bigger fish to fry. First, remove the gfn from the list. */ 
+    /*
+     * Do the accounting first. If anything fails below, we have bigger
+     * bigger fish to fry. First, remove the gfn from the list.
+     */
     last_gfn = rmap_has_one_entry(page);
     if ( last_gfn )
     {
-        /* Clean up shared state. Get rid of the <domid, gfn> tuple
-         * before destroying the rmap. */
+        /*
+         * Clean up shared state. Get rid of the <domid, gfn> tuple
+         * before destroying the rmap.
+         */
         mem_sharing_gfn_destroy(page, d, gfn_info);
         page_sharing_dispose(page);
         page->sharing = NULL;
@@ -1195,8 +1228,10 @@ int __mem_sharing_unshare_page(struct domain *d,
     else
         atomic_dec(&nr_saved_mfns);
 
-    /* If the GFN is getting destroyed drop the references to MFN 
-     * (possibly freeing the page), and exit early */
+    /*
+     * If the GFN is getting destroyed drop the references to MFN
+     * (possibly freeing the page), and exit early.
+     */
     if ( flags & MEM_SHARING_DESTROY_GFN )
     {
         if ( !last_gfn )
@@ -1212,7 +1247,7 @@ int __mem_sharing_unshare_page(struct domain *d,
 
         return 0;
     }
- 
+
     if ( last_gfn )
     {
         /* Making a page private atomically unlocks it */
@@ -1222,14 +1257,16 @@ int __mem_sharing_unshare_page(struct domain *d,
 
     old_page = page;
     page = alloc_domheap_page(d, 0);
-    if ( !page ) 
+    if ( !page )
     {
         /* Undo dec of nr_saved_mfns, as the retry will decrease again. */
         atomic_inc(&nr_saved_mfns);
         mem_sharing_page_unlock(old_page);
         put_gfn(d, gfn);
-        /* Caller is responsible for placing an event
-         * in the ring */
+        /*
+         * Caller is responsible for placing an event
+         * in the ring.
+         */
         return -ENOMEM;
     }
 
@@ -1240,11 +1277,11 @@ int __mem_sharing_unshare_page(struct domain *d,
     mem_sharing_page_unlock(old_page);
     put_page_and_type(old_page);
 
-private_page_found:    
+ private_page_found:
     if ( p2m_change_type_one(d, gfn, p2m_ram_shared, p2m_ram_rw) )
     {
-        gdprintk(XENLOG_ERR, "Could not change p2m type d %hu gfn %lx.\n", 
-                                d->domain_id, gfn);
+        gdprintk(XENLOG_ERR, "Could not change p2m type d %hu gfn %lx.\n",
+                 d->domain_id, gfn);
         BUG();
     }
 
@@ -1277,20 +1314,23 @@ int relinquish_shared_pages(struct domain *d)
         mfn_t mfn;
         int set_rc;
 
-        if ( atomic_read(&d->shr_pages) == 0 )
+        if ( !atomic_read(&d->shr_pages) )
             break;
+
         mfn = p2m->get_entry(p2m, _gfn(gfn), &t, &a, 0, NULL, NULL);
-        if ( mfn_valid(mfn) && (t == p2m_ram_shared) )
+        if ( mfn_valid(mfn) && t == p2m_ram_shared )
         {
             /* Does not fail with ENOMEM given the DESTROY flag */
-            BUG_ON(__mem_sharing_unshare_page(d, gfn, 
-                    MEM_SHARING_DESTROY_GFN));
-            /* Clear out the p2m entry so no one else may try to
+            BUG_ON(__mem_sharing_unshare_page(d, gfn,
+                   MEM_SHARING_DESTROY_GFN));
+            /*
+             * Clear out the p2m entry so no one else may try to
              * unshare.  Must succeed: we just read the old entry and
-             * we hold the p2m lock. */
+             * we hold the p2m lock.
+             */
             set_rc = p2m->set_entry(p2m, _gfn(gfn), _mfn(0), PAGE_ORDER_4K,
                                     p2m_invalid, p2m_access_rwx, -1);
-            ASSERT(set_rc == 0);
+            ASSERT(!set_rc);
             count += 0x10;
         }
         else
@@ -1454,7 +1494,7 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 
             if ( XENMEM_SHARING_OP_FIELD_IS_GREF(mso.u.share.source_gfn) )
             {
-                grant_ref_t gref = (grant_ref_t) 
+                grant_ref_t gref = (grant_ref_t)
                                     (XENMEM_SHARING_OP_FIELD_GET_GREF(
                                         mso.u.share.source_gfn));
                 rc = mem_sharing_gref_to_gfn(d->grant_table, gref, &sgfn,
@@ -1470,7 +1510,7 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 
             if ( XENMEM_SHARING_OP_FIELD_IS_GREF(mso.u.share.client_gfn) )
             {
-                grant_ref_t gref = (grant_ref_t) 
+                grant_ref_t gref = (grant_ref_t)
                                     (XENMEM_SHARING_OP_FIELD_GET_GREF(
                                         mso.u.share.client_gfn));
                 rc = mem_sharing_gref_to_gfn(cd->grant_table, gref, &cgfn,
@@ -1534,7 +1574,7 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
             sh      = mso.u.share.source_handle;
             cgfn    = mso.u.share.client_gfn;
 
-            rc = mem_sharing_add_to_physmap(d, sgfn, sh, cd, cgfn); 
+            rc = mem_sharing_add_to_physmap(d, sgfn, sh, cd, cgfn);
 
             rcu_unlock_domain(cd);
         }
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index ba126f790a..3119269073 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -511,8 +511,10 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
     if ( (q & P2M_UNSHARE) && p2m_is_shared(*t) )
     {
         ASSERT(p2m_is_hostp2m(p2m));
-        /* Try to unshare. If we fail, communicate ENOMEM without
-         * sleeping. */
+        /*
+         * Try to unshare. If we fail, communicate ENOMEM without
+         * sleeping.
+         */
         if ( mem_sharing_unshare_page(p2m->domain, gfn_l, 0) < 0 )
             mem_sharing_notify_enomem(p2m->domain, gfn_l, false);
         mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
@@ -892,15 +894,15 @@ guest_physmap_add_entry(struct domain *d, gfn_t gfn, mfn_t mfn,
                               &a, 0, NULL, NULL);
         if ( p2m_is_shared(ot) )
         {
-            /* Do an unshare to cleanly take care of all corner 
-             * cases. */
+            /* Do an unshare to cleanly take care of all corner cases. */
             int rc;
             rc = mem_sharing_unshare_page(p2m->domain,
                                           gfn_x(gfn_add(gfn, i)), 0);
             if ( rc )
             {
                 p2m_unlock(p2m);
-                /* NOTE: Should a guest domain bring this upon itself,
+                /*
+                 * NOTE: Should a guest domain bring this upon itself,
                  * there is not a whole lot we can do. We are buried
                  * deep in locks from most code paths by now. So, fail
                  * the call and don't try to sleep on a wait queue
@@ -909,8 +911,9 @@ guest_physmap_add_entry(struct domain *d, gfn_t gfn, mfn_t mfn,
                  * However, all current (changeset 3432abcf9380) code
                  * paths avoid this unsavoury situation. For now.
                  *
-                 * Foreign domains are okay to place an event as they 
-                 * won't go to sleep. */
+                 * Foreign domains are okay to place an event as they
+                 * won't go to sleep.
+                 */
                 (void)mem_sharing_notify_enomem(p2m->domain,
                                                 gfn_x(gfn_add(gfn, i)), false);
                 return rc;
diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h
index db22468744..7d40e38563 100644
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -33,12 +33,14 @@
 #define MEM_SHARING_AUDIT 0
 #endif
 
-typedef uint64_t shr_handle_t; 
+typedef uint64_t shr_handle_t;
 
 typedef struct rmap_hashtab {
     struct list_head *bucket;
-    /* Overlaps with prev pointer of list_head in union below.
-     * Unlike the prev pointer, this can be NULL. */
+    /*
+     * Overlaps with prev pointer of list_head in union below.
+     * Unlike the prev pointer, this can be NULL.
+     */
     void *flag;
 } rmap_hashtab_t;
 
@@ -57,34 +59,34 @@ struct page_sharing_info
     };
 };
 
-#define sharing_supported(_d) \
-    (is_hvm_domain(_d) && paging_mode_hap(_d)) 
-
 unsigned int mem_sharing_get_nr_saved_mfns(void);
 unsigned int mem_sharing_get_nr_shared_mfns(void);
 
 #define MEM_SHARING_DESTROY_GFN       (1<<1)
 /* Only fails with -ENOMEM. Enforce it with a BUG_ON wrapper. */
 int __mem_sharing_unshare_page(struct domain *d,
-                             unsigned long gfn, 
-                             uint16_t flags);
-static inline int mem_sharing_unshare_page(struct domain *d,
-                                           unsigned long gfn,
-                                           uint16_t flags)
+                               unsigned long gfn,
+                               uint16_t flags);
+
+static inline
+int mem_sharing_unshare_page(struct domain *d,
+                             unsigned long gfn,
+                             uint16_t flags)
 {
     int rc = __mem_sharing_unshare_page(d, gfn, flags);
-    BUG_ON( rc && (rc != -ENOMEM) );
+    BUG_ON(rc && (rc != -ENOMEM));
     return rc;
 }
 
-/* If called by a foreign domain, possible errors are
+/*
+ * If called by a foreign domain, possible errors are
  *   -EBUSY -> ring full
  *   -ENOSYS -> no ring to begin with
  * and the foreign mapper is responsible for retrying.
  *
- * If called by the guest vcpu itself and allow_sleep is set, may 
- * sleep on a wait queue, so the caller is responsible for not 
- * holding locks on entry. It may only fail with ENOSYS 
+ * If called by the guest vcpu itself and allow_sleep is set, may
+ * sleep on a wait queue, so the caller is responsible for not
+ * holding locks on entry. It may only fail with ENOSYS
  *
  * If called by the guest vcpu itself and allow_sleep is not set,
  * then it's the same as a foreign domain.
@@ -92,10 +94,11 @@ static inline int mem_sharing_unshare_page(struct domain *d,
 int mem_sharing_notify_enomem(struct domain *d, unsigned long gfn,
                               bool allow_sleep);
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg);
-int mem_sharing_domctl(struct domain *d, 
+int mem_sharing_domctl(struct domain *d,
                        struct xen_domctl_mem_sharing_op *mec);
 
-/* Scans the p2m and relinquishes any shared pages, destroying 
+/*
+ * Scans the p2m and relinquishes any shared pages, destroying
  * those for which this domain holds the final reference.
  * Preemptible.
  */
@@ -107,18 +110,22 @@ static inline unsigned int mem_sharing_get_nr_saved_mfns(void)
 {
     return 0;
 }
+
 static inline unsigned int mem_sharing_get_nr_shared_mfns(void)
 {
     return 0;
 }
-static inline int mem_sharing_unshare_page(struct domain *d,
-                                           unsigned long gfn,
-                                           uint16_t flags)
+
+static inline
+int mem_sharing_unshare_page(struct domain *d, unsigned long gfn,
+                             uint16_t flags)
 {
     ASSERT_UNREACHABLE();
     return -EOPNOTSUPP;
 }
-static inline int mem_sharing_notify_enomem(struct domain *d, unsigned long gfn,
+
+static inline
+int mem_sharing_notify_enomem(struct domain *d, unsigned long gfn,
                               bool allow_sleep)
 {
     ASSERT_UNREACHABLE();
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 05/20] x86/mem_sharing: make get_two_gfns take locks conditionally
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
                   ` (3 preceding siblings ...)
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 04/20] x86/mem_sharing: cleanup code and comments in various locations Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-19 19:12   ` Andrew Cooper
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 06/20] x86/mem_sharing: drop flags from mem_sharing_unshare_page Tamas K Lengyel
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

During VM forking the client lock will already be taken.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 11 ++++++-----
 xen/include/asm-x86/p2m.h     | 10 +++++-----
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 319aaf3074..c0e305ad71 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -954,7 +954,7 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
     unsigned long put_count = 0;
 
     get_two_gfns(sd, sgfn, &smfn_type, NULL, &smfn,
-                 cd, cgfn, &cmfn_type, NULL, &cmfn, 0, &tg);
+                 cd, cgfn, &cmfn_type, NULL, &cmfn, 0, &tg, true);
 
     /*
      * This tricky business is to avoid two callers deadlocking if
@@ -1068,7 +1068,7 @@ err_out:
 }
 
 int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
-                               struct domain *cd, unsigned long cgfn)
+                               struct domain *cd, unsigned long cgfn, bool lock)
 {
     struct page_info *spage;
     int ret = -EINVAL;
@@ -1080,7 +1080,7 @@ int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle
     struct two_gfns tg;
 
     get_two_gfns(sd, _gfn(sgfn), &smfn_type, NULL, &smfn,
-                 cd, _gfn(cgfn), &cmfn_type, &a, &cmfn, 0, &tg);
+                 cd, _gfn(cgfn), &cmfn_type, &a, &cmfn, 0, &tg, lock);
 
     /* Get the source shared page, check and lock */
     ret = XENMEM_SHARING_OP_S_HANDLE_INVALID;
@@ -1155,7 +1155,8 @@ int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle
 err_unlock:
     mem_sharing_page_unlock(spage);
 err_out:
-    put_two_gfns(&tg);
+    if ( lock )
+        put_two_gfns(&tg);
     return ret;
 }
 
@@ -1574,7 +1575,7 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
             sh      = mso.u.share.source_handle;
             cgfn    = mso.u.share.client_gfn;
 
-            rc = mem_sharing_add_to_physmap(d, sgfn, sh, cd, cgfn);
+            rc = mem_sharing_add_to_physmap(d, sgfn, sh, cd, cgfn, true);
 
             rcu_unlock_domain(cd);
         }
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 94285db1b4..7399c4a897 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -539,7 +539,7 @@ struct two_gfns {
 static inline void get_two_gfns(struct domain *rd, gfn_t rgfn,
         p2m_type_t *rt, p2m_access_t *ra, mfn_t *rmfn, struct domain *ld,
         gfn_t lgfn, p2m_type_t *lt, p2m_access_t *la, mfn_t *lmfn,
-        p2m_query_t q, struct two_gfns *rval)
+        p2m_query_t q, struct two_gfns *rval, bool lock)
 {
     mfn_t           *first_mfn, *second_mfn, scratch_mfn;
     p2m_access_t    *first_a, *second_a, scratch_a;
@@ -569,10 +569,10 @@ do {                                                    \
 #undef assign_pointers
 
     /* Now do the gets */
-    *first_mfn  = get_gfn_type_access(p2m_get_hostp2m(rval->first_domain),
-                                      gfn_x(rval->first_gfn), first_t, first_a, q, NULL);
-    *second_mfn = get_gfn_type_access(p2m_get_hostp2m(rval->second_domain),
-                                      gfn_x(rval->second_gfn), second_t, second_a, q, NULL);
+    *first_mfn  = __get_gfn_type_access(p2m_get_hostp2m(rval->first_domain),
+                                        gfn_x(rval->first_gfn), first_t, first_a, q, NULL, lock);
+    *second_mfn = __get_gfn_type_access(p2m_get_hostp2m(rval->second_domain),
+                                        gfn_x(rval->second_gfn), second_t, second_a, q, NULL, lock);
 }
 
 static inline void put_two_gfns(struct two_gfns *arg)
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 06/20] x86/mem_sharing: drop flags from mem_sharing_unshare_page
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
                   ` (4 preceding siblings ...)
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 05/20] x86/mem_sharing: make get_two_gfns take locks conditionally Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 07/20] x86/mem_sharing: don't try to unshare twice during page fault Tamas K Lengyel
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Andrew Cooper,
	Jan Beulich, Roger Pau Monné

All callers pass 0 in.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Wei Liu <wl@xen.org>
---
 xen/arch/x86/hvm/hvm.c            | 2 +-
 xen/arch/x86/mm/p2m.c             | 5 ++---
 xen/include/asm-x86/mem_sharing.h | 8 +++-----
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 1e888b403b..e055114922 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1902,7 +1902,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     if ( npfec.write_access && (p2mt == p2m_ram_shared) )
     {
         ASSERT(p2m_is_hostp2m(p2m));
-        sharing_enomem = mem_sharing_unshare_page(currd, gfn, 0);
+        sharing_enomem = mem_sharing_unshare_page(currd, gfn);
         rc = 1;
         goto out_put_gfn;
     }
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 3119269073..baea632acc 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -515,7 +515,7 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
          * Try to unshare. If we fail, communicate ENOMEM without
          * sleeping.
          */
-        if ( mem_sharing_unshare_page(p2m->domain, gfn_l, 0) < 0 )
+        if ( mem_sharing_unshare_page(p2m->domain, gfn_l) < 0 )
             mem_sharing_notify_enomem(p2m->domain, gfn_l, false);
         mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
     }
@@ -896,8 +896,7 @@ guest_physmap_add_entry(struct domain *d, gfn_t gfn, mfn_t mfn,
         {
             /* Do an unshare to cleanly take care of all corner cases. */
             int rc;
-            rc = mem_sharing_unshare_page(p2m->domain,
-                                          gfn_x(gfn_add(gfn, i)), 0);
+            rc = mem_sharing_unshare_page(p2m->domain, gfn_x(gfn_add(gfn, i)));
             if ( rc )
             {
                 p2m_unlock(p2m);
diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h
index 7d40e38563..0a9192d0e2 100644
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -70,10 +70,9 @@ int __mem_sharing_unshare_page(struct domain *d,
 
 static inline
 int mem_sharing_unshare_page(struct domain *d,
-                             unsigned long gfn,
-                             uint16_t flags)
+                             unsigned long gfn)
 {
-    int rc = __mem_sharing_unshare_page(d, gfn, flags);
+    int rc = __mem_sharing_unshare_page(d, gfn, 0);
     BUG_ON(rc && (rc != -ENOMEM));
     return rc;
 }
@@ -117,8 +116,7 @@ static inline unsigned int mem_sharing_get_nr_shared_mfns(void)
 }
 
 static inline
-int mem_sharing_unshare_page(struct domain *d, unsigned long gfn,
-                             uint16_t flags)
+int mem_sharing_unshare_page(struct domain *d, unsigned long gfn)
 {
     ASSERT_UNREACHABLE();
     return -EOPNOTSUPP;
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 07/20] x86/mem_sharing: don't try to unshare twice during page fault
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
                   ` (5 preceding siblings ...)
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 06/20] x86/mem_sharing: drop flags from mem_sharing_unshare_page Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-19 19:19   ` Andrew Cooper
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 08/20] x86/mem_sharing: define mem_sharing_domain to hold some scattered variables Tamas K Lengyel
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Tamas K Lengyel, Wei Liu, Jan Beulich,
	Roger Pau Monné

The page was already tried to be unshared in get_gfn_type_access. If that
didn't work, then trying again is pointless. Don't try to send vm_event again
either, simply check if there is a ring or not.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/hvm/hvm.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index e055114922..8f90841813 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -38,6 +38,7 @@
 #include <xen/warning.h>
 #include <xen/vpci.h>
 #include <xen/nospec.h>
+#include <xen/vm_event.h>
 #include <asm/shadow.h>
 #include <asm/hap.h>
 #include <asm/current.h>
@@ -1706,11 +1707,14 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     struct domain *currd = curr->domain;
     struct p2m_domain *p2m, *hostp2m;
     int rc, fall_through = 0, paged = 0;
-    int sharing_enomem = 0;
     vm_event_request_t *req_ptr = NULL;
     bool sync = false;
     unsigned int page_order;
 
+#ifdef CONFIG_MEM_SHARING
+    bool sharing_enomem = false;
+#endif
+
     /* On Nested Virtualization, walk the guest page table.
      * If this succeeds, all is fine.
      * If this fails, inject a nested page fault into the guest.
@@ -1898,14 +1902,16 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     if ( p2m_is_paged(p2mt) || (p2mt == p2m_ram_paging_out) )
         paged = 1;
 
-    /* Mem sharing: unshare the page and try again */
-    if ( npfec.write_access && (p2mt == p2m_ram_shared) )
+#ifdef CONFIG_MEM_SHARING
+    /* Mem sharing: if still shared on write access then its enomem */
+    if ( npfec.write_access && p2m_is_shared(p2mt) )
     {
         ASSERT(p2m_is_hostp2m(p2m));
-        sharing_enomem = mem_sharing_unshare_page(currd, gfn);
+        sharing_enomem = true;
         rc = 1;
         goto out_put_gfn;
     }
+#endif
 
     /* Spurious fault? PoD and log-dirty also take this path. */
     if ( p2m_is_ram(p2mt) )
@@ -1959,19 +1965,21 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
      */
     if ( paged )
         p2m_mem_paging_populate(currd, gfn);
+
+#ifdef CONFIG_MEM_SHARING
     if ( sharing_enomem )
     {
-        int rv;
-
-        if ( (rv = mem_sharing_notify_enomem(currd, gfn, true)) < 0 )
+        if ( !vm_event_check_ring(currd->vm_event_share) )
         {
             gdprintk(XENLOG_ERR, "Domain %hu attempt to unshare "
-                     "gfn %lx, ENOMEM and no helper (rc %d)\n",
-                     currd->domain_id, gfn, rv);
+                     "gfn %lx, ENOMEM and no helper\n",
+                     currd->domain_id, gfn);
             /* Crash the domain */
             rc = 0;
         }
     }
+#endif
+
     if ( req_ptr )
     {
         if ( monitor_traps(curr, sync, req_ptr) < 0 )
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 08/20] x86/mem_sharing: define mem_sharing_domain to hold some scattered variables
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
                   ` (6 preceding siblings ...)
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 07/20] x86/mem_sharing: don't try to unshare twice during page fault Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 09/20] x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in relinquish_shared_pages Tamas K Lengyel
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

Create struct mem_sharing_domain under hvm_domain and move mem sharing
variables into it from p2m_domain and hvm_domain.

Expose the mem_sharing_enabled macro to be used consistently across Xen.

Remove some duplicate calls to mem_sharing_enabled in mem_sharing.c

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c     | 30 +++++-------------------------
 xen/drivers/passthrough/pci.c     |  3 +--
 xen/include/asm-x86/hvm/domain.h  |  6 +++++-
 xen/include/asm-x86/mem_sharing.h | 16 ++++++++++++++++
 xen/include/asm-x86/p2m.h         |  4 ----
 5 files changed, 27 insertions(+), 32 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index c0e305ad71..5d81730315 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -197,9 +197,6 @@ static inline shr_handle_t get_next_handle(void)
     return x + 1;
 }
 
-#define mem_sharing_enabled(d) \
-    (is_hvm_domain(d) && (d)->arch.hvm.mem_sharing_enabled)
-
 static atomic_t nr_saved_mfns   = ATOMIC_INIT(0);
 static atomic_t nr_shared_mfns  = ATOMIC_INIT(0);
 
@@ -1300,6 +1297,7 @@ int __mem_sharing_unshare_page(struct domain *d,
 int relinquish_shared_pages(struct domain *d)
 {
     int rc = 0;
+    struct mem_sharing_domain *msd = &d->arch.hvm.mem_sharing;
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
     unsigned long gfn, count = 0;
 
@@ -1307,7 +1305,7 @@ int relinquish_shared_pages(struct domain *d)
         return 0;
 
     p2m_lock(p2m);
-    for ( gfn = p2m->next_shared_gfn_to_relinquish;
+    for ( gfn = msd->next_shared_gfn_to_relinquish;
           gfn <= p2m->max_mapped_pfn; gfn++ )
     {
         p2m_access_t a;
@@ -1342,7 +1340,7 @@ int relinquish_shared_pages(struct domain *d)
         {
             if ( hypercall_preempt_check() )
             {
-                p2m->next_shared_gfn_to_relinquish = gfn + 1;
+                msd->next_shared_gfn_to_relinquish = gfn + 1;
                 rc = -ERESTART;
                 break;
             }
@@ -1428,7 +1426,7 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 
     /* Only HAP is supported */
     rc = -ENODEV;
-    if ( !hap_enabled(d) || !d->arch.hvm.mem_sharing_enabled )
+    if ( !mem_sharing_enabled(d) )
         goto out;
 
     switch ( mso.op )
@@ -1437,10 +1435,6 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
         {
             shr_handle_t handle;
 
-            rc = -EINVAL;
-            if ( !mem_sharing_enabled(d) )
-                goto out;
-
             rc = nominate_page(d, _gfn(mso.u.nominate.u.gfn), 0, &handle);
             mso.u.nominate.handle = handle;
         }
@@ -1452,9 +1446,6 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
             gfn_t gfn;
             shr_handle_t handle;
 
-            rc = -EINVAL;
-            if ( !mem_sharing_enabled(d) )
-                goto out;
             rc = mem_sharing_gref_to_gfn(d->grant_table, gref, &gfn, NULL);
             if ( rc < 0 )
                 goto out;
@@ -1470,10 +1461,6 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
             struct domain *cd;
             shr_handle_t sh, ch;
 
-            rc = -EINVAL;
-            if ( !mem_sharing_enabled(d) )
-                goto out;
-
             rc = rcu_lock_live_remote_domain_by_id(mso.u.share.client_domain,
                                                    &cd);
             if ( rc )
@@ -1540,10 +1527,6 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
             struct domain *cd;
             shr_handle_t sh;
 
-            rc = -EINVAL;
-            if ( !mem_sharing_enabled(d) )
-                goto out;
-
             rc = rcu_lock_live_remote_domain_by_id(mso.u.share.client_domain,
                                                    &cd);
             if ( rc )
@@ -1602,9 +1585,6 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
                   mso.u.range.opaque > mso.u.range.last_gfn) )
                 goto out;
 
-            if ( !mem_sharing_enabled(d) )
-                goto out;
-
             rc = rcu_lock_live_remote_domain_by_id(mso.u.range.client_domain,
                                                    &cd);
             if ( rc )
@@ -1708,7 +1688,7 @@ int mem_sharing_domctl(struct domain *d, struct xen_domctl_mem_sharing_op *mec)
             if ( unlikely(is_iommu_enabled(d) && mec->u.enable) )
                 rc = -EXDEV;
             else
-                d->arch.hvm.mem_sharing_enabled = mec->u.enable;
+                d->arch.hvm.mem_sharing.enabled = mec->u.enable;
         }
         break;
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index c07a63981a..65d1d457ff 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1498,8 +1498,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
     /* Prevent device assign if mem paging or mem sharing have been 
      * enabled for this domain */
     if ( d != dom_io &&
-         unlikely((is_hvm_domain(d) &&
-                   d->arch.hvm.mem_sharing_enabled) ||
+         unlikely(mem_sharing_enabled(d) ||
                   vm_event_check_ring(d->vm_event_paging) ||
                   p2m_get_hostp2m(d)->global_logdirty) )
         return -EXDEV;
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index bcc5621797..8f70ba2b1a 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -29,6 +29,7 @@
 #include <asm/hvm/viridian.h>
 #include <asm/hvm/vmx/vmcs.h>
 #include <asm/hvm/svm/vmcb.h>
+#include <asm/mem_sharing.h>
 #include <public/grant_table.h>
 #include <public/hvm/params.h>
 #include <public/hvm/save.h>
@@ -156,7 +157,6 @@ struct hvm_domain {
 
     struct viridian_domain *viridian;
 
-    bool_t                 mem_sharing_enabled;
     bool_t                 qemu_mapcache_invalidate;
     bool_t                 is_s3_suspended;
 
@@ -192,6 +192,10 @@ struct hvm_domain {
         struct vmx_domain vmx;
         struct svm_domain svm;
     };
+
+#ifdef CONFIG_MEM_SHARING
+    struct mem_sharing_domain mem_sharing;
+#endif
 };
 
 #endif /* __ASM_X86_HVM_DOMAIN_H__ */
diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h
index 0a9192d0e2..89cdaccea0 100644
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -26,6 +26,20 @@
 
 #ifdef CONFIG_MEM_SHARING
 
+struct mem_sharing_domain
+{
+    bool enabled;
+
+    /*
+     * When releasing shared gfn's in a preemptible manner, recall where
+     * to resume the search.
+     */
+    unsigned long next_shared_gfn_to_relinquish;
+};
+
+#define mem_sharing_enabled(d) \
+    (hap_enabled(d) && (d)->arch.hvm.mem_sharing.enabled)
+
 /* Auditing of memory sharing code? */
 #ifndef NDEBUG
 #define MEM_SHARING_AUDIT 1
@@ -105,6 +119,8 @@ int relinquish_shared_pages(struct domain *d);
 
 #else
 
+#define mem_sharing_enabled(d) false
+
 static inline unsigned int mem_sharing_get_nr_saved_mfns(void)
 {
     return 0;
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 7399c4a897..8defa90306 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -305,10 +305,6 @@ struct p2m_domain {
     unsigned long min_remapped_gfn;
     unsigned long max_remapped_gfn;
 
-    /* When releasing shared gfn's in a preemptible manner, recall where
-     * to resume the search */
-    unsigned long next_shared_gfn_to_relinquish;
-
 #ifdef CONFIG_HVM
     /* Populate-on-demand variables
      * All variables are protected with the pod lock. We cannot rely on
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 09/20] x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in relinquish_shared_pages
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
                   ` (7 preceding siblings ...)
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 08/20] x86/mem_sharing: define mem_sharing_domain to hold some scattered variables Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 10/20] x86/mem_sharing: Make add_to_physmap static and shorten name Tamas K Lengyel
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

While using _mfn(0) is of no consequence during teardown, INVALID_MFN is the
correct value that should be used.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 5d81730315..1b7b520ccf 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1317,7 +1317,7 @@ int relinquish_shared_pages(struct domain *d)
             break;
 
         mfn = p2m->get_entry(p2m, _gfn(gfn), &t, &a, 0, NULL, NULL);
-        if ( mfn_valid(mfn) && t == p2m_ram_shared )
+        if ( mfn_valid(mfn) && p2m_is_shared(t) )
         {
             /* Does not fail with ENOMEM given the DESTROY flag */
             BUG_ON(__mem_sharing_unshare_page(d, gfn,
@@ -1327,7 +1327,7 @@ int relinquish_shared_pages(struct domain *d)
              * unshare.  Must succeed: we just read the old entry and
              * we hold the p2m lock.
              */
-            set_rc = p2m->set_entry(p2m, _gfn(gfn), _mfn(0), PAGE_ORDER_4K,
+            set_rc = p2m->set_entry(p2m, _gfn(gfn), INVALID_MFN, PAGE_ORDER_4K,
                                     p2m_invalid, p2m_access_rwx, -1);
             ASSERT(!set_rc);
             count += 0x10;
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 10/20] x86/mem_sharing: Make add_to_physmap static and shorten name
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
                   ` (8 preceding siblings ...)
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 09/20] x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in relinquish_shared_pages Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 11/20] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool Tamas K Lengyel
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

It's not being called from outside mem_sharing.c

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 1b7b520ccf..fc1d8be1eb 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1064,8 +1064,9 @@ err_out:
     return ret;
 }
 
-int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
-                               struct domain *cd, unsigned long cgfn, bool lock)
+static
+int add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
+                   struct domain *cd, unsigned long cgfn, bool lock)
 {
     struct page_info *spage;
     int ret = -EINVAL;
@@ -1558,7 +1559,7 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
             sh      = mso.u.share.source_handle;
             cgfn    = mso.u.share.client_gfn;
 
-            rc = mem_sharing_add_to_physmap(d, sgfn, sh, cd, cgfn, true);
+            rc = add_to_physmap(d, sgfn, sh, cd, cgfn, true);
 
             rcu_unlock_domain(cd);
         }
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 11/20] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
                   ` (9 preceding siblings ...)
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 10/20] x86/mem_sharing: Make add_to_physmap static and shorten name Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-18 21:29   ` Julien Grall
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 12/20] x86/mem_sharing: Replace MEM_SHARING_DEBUG with gdprintk Tamas K Lengyel
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Stefano Stabellini,
	Jan Beulich, Julien Grall, Roger Pau Monné

MEM_SHARING_DESTROY_GFN is used on the 'flags' bitfield during unsharing.
However, the bitfield is not used for anything else, so just convert it to a
bool instead.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c     | 7 +++----
 xen/arch/x86/mm/p2m.c             | 1 +
 xen/common/memory.c               | 2 +-
 xen/include/asm-x86/mem_sharing.h | 5 ++---
 4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index fc1d8be1eb..6e81e1a895 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1175,7 +1175,7 @@ err_out:
  */
 int __mem_sharing_unshare_page(struct domain *d,
                                unsigned long gfn,
-                               uint16_t flags)
+                               bool destroy)
 {
     p2m_type_t p2mt;
     mfn_t mfn;
@@ -1231,7 +1231,7 @@ int __mem_sharing_unshare_page(struct domain *d,
      * If the GFN is getting destroyed drop the references to MFN
      * (possibly freeing the page), and exit early.
      */
-    if ( flags & MEM_SHARING_DESTROY_GFN )
+    if ( destroy )
     {
         if ( !last_gfn )
             mem_sharing_gfn_destroy(page, d, gfn_info);
@@ -1321,8 +1321,7 @@ int relinquish_shared_pages(struct domain *d)
         if ( mfn_valid(mfn) && p2m_is_shared(t) )
         {
             /* Does not fail with ENOMEM given the DESTROY flag */
-            BUG_ON(__mem_sharing_unshare_page(d, gfn,
-                   MEM_SHARING_DESTROY_GFN));
+            BUG_ON(__mem_sharing_unshare_page(d, gfn, true));
             /*
              * Clear out the p2m entry so no one else may try to
              * unshare.  Must succeed: we just read the old entry and
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index baea632acc..53ea44fe3c 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -517,6 +517,7 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
          */
         if ( mem_sharing_unshare_page(p2m->domain, gfn_l) < 0 )
             mem_sharing_notify_enomem(p2m->domain, gfn_l, false);
+
         mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
     }
 
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 309e872edf..c7d2bac452 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -352,7 +352,7 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
          * might be the only one using this shared page, and we need to
          * trigger proper cleanup. Once done, this is like any other page.
          */
-        rc = mem_sharing_unshare_page(d, gmfn, 0);
+        rc = mem_sharing_unshare_page(d, gmfn);
         if ( rc )
         {
             mem_sharing_notify_enomem(d, gmfn, false);
diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h
index 89cdaccea0..4b982a4803 100644
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -76,17 +76,16 @@ struct page_sharing_info
 unsigned int mem_sharing_get_nr_saved_mfns(void);
 unsigned int mem_sharing_get_nr_shared_mfns(void);
 
-#define MEM_SHARING_DESTROY_GFN       (1<<1)
 /* Only fails with -ENOMEM. Enforce it with a BUG_ON wrapper. */
 int __mem_sharing_unshare_page(struct domain *d,
                                unsigned long gfn,
-                               uint16_t flags);
+                               bool destroy);
 
 static inline
 int mem_sharing_unshare_page(struct domain *d,
                              unsigned long gfn)
 {
-    int rc = __mem_sharing_unshare_page(d, gfn, 0);
+    int rc = __mem_sharing_unshare_page(d, gfn, false);
     BUG_ON(rc && (rc != -ENOMEM));
     return rc;
 }
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 12/20] x86/mem_sharing: Replace MEM_SHARING_DEBUG with gdprintk
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
                   ` (10 preceding siblings ...)
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 11/20] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 13/20] x86/mem_sharing: ASSERT that p2m_set_entry succeeds Tamas K Lengyel
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

Using XENLOG_ERR level since this is only used in debug paths (ie. it's
expected the user already has loglvl=all set).

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 81 ++++++++++++++++++-----------------
 1 file changed, 41 insertions(+), 40 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 6e81e1a895..90b6371e2f 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -49,9 +49,6 @@ typedef struct pg_lock_data {
 
 static DEFINE_PER_CPU(pg_lock_data_t, __pld);
 
-#define MEM_SHARING_DEBUG(_f, _a...)                                  \
-    debugtrace_printk("mem_sharing_debug: %s(): " _f, __func__, ##_a)
-
 /* Reverse map defines */
 #define RMAP_HASHTAB_ORDER  0
 #define RMAP_HASHTAB_SIZE   \
@@ -491,8 +488,9 @@ static int audit(void)
         /* If we can't lock it, it's definitely not a shared page */
         if ( !mem_sharing_page_lock(pg) )
         {
-           MEM_SHARING_DEBUG("mfn %lx in audit list, but cannot be locked (%lx)!\n",
-                              mfn_x(mfn), pg->u.inuse.type_info);
+           gdprintk(XENLOG_ERR,
+                    "mfn %lx in audit list, but cannot be locked (%lx)!\n",
+                    mfn_x(mfn), pg->u.inuse.type_info);
            errors++;
            continue;
         }
@@ -500,8 +498,9 @@ static int audit(void)
         /* Check if the MFN has correct type, owner and handle. */
         if ( (pg->u.inuse.type_info & PGT_type_mask) != PGT_shared_page )
         {
-           MEM_SHARING_DEBUG("mfn %lx in audit list, but not PGT_shared_page (%lx)!\n",
-                              mfn_x(mfn), pg->u.inuse.type_info & PGT_type_mask);
+           gdprintk(XENLOG_ERR,
+                    "mfn %lx in audit list, but not PGT_shared_page (%lx)!\n",
+                    mfn_x(mfn), pg->u.inuse.type_info & PGT_type_mask);
            errors++;
            continue;
         }
@@ -509,24 +508,24 @@ static int audit(void)
         /* Check the page owner. */
         if ( page_get_owner(pg) != dom_cow )
         {
-           MEM_SHARING_DEBUG("mfn %lx shared, but wrong owner (%hu)!\n",
-                             mfn_x(mfn), page_get_owner(pg)->domain_id);
+           gdprintk(XENLOG_ERR, "mfn %lx shared, but wrong owner (%hu)!\n",
+                    mfn_x(mfn), page_get_owner(pg)->domain_id);
            errors++;
         }
 
         /* Check the m2p entry */
         if ( !SHARED_M2P(get_gpfn_from_mfn(mfn_x(mfn))) )
         {
-           MEM_SHARING_DEBUG("mfn %lx shared, but wrong m2p entry (%lx)!\n",
-                             mfn_x(mfn), get_gpfn_from_mfn(mfn_x(mfn)));
+           gdprintk(XENLOG_ERR, "mfn %lx shared, but wrong m2p entry (%lx)!\n",
+                    mfn_x(mfn), get_gpfn_from_mfn(mfn_x(mfn)));
            errors++;
         }
 
         /* Check we have a list */
         if ( (!pg->sharing) || !rmap_has_entries(pg) )
         {
-           MEM_SHARING_DEBUG("mfn %lx shared, but empty gfn list!\n",
-                             mfn_x(mfn));
+           gdprintk(XENLOG_ERR, "mfn %lx shared, but empty gfn list!\n",
+                    mfn_x(mfn));
            errors++;
            continue;
         }
@@ -545,24 +544,26 @@ static int audit(void)
             d = get_domain_by_id(g->domain);
             if ( d == NULL )
             {
-                MEM_SHARING_DEBUG("Unknown dom: %hu, for PFN=%lx, MFN=%lx\n",
-                                  g->domain, g->gfn, mfn_x(mfn));
+                gdprintk(XENLOG_ERR,
+                         "Unknown dom: %hu, for PFN=%lx, MFN=%lx\n",
+                         g->domain, g->gfn, mfn_x(mfn));
                 errors++;
                 continue;
             }
             o_mfn = get_gfn_query_unlocked(d, g->gfn, &t);
             if ( !mfn_eq(o_mfn, mfn) )
             {
-                MEM_SHARING_DEBUG("Incorrect P2M for d=%hu, PFN=%lx."
-                                  "Expecting MFN=%lx, got %lx\n",
-                                  g->domain, g->gfn, mfn_x(mfn), mfn_x(o_mfn));
+                gdprintk(XENLOG_ERR, "Incorrect P2M for d=%hu, PFN=%lx."
+                         "Expecting MFN=%lx, got %lx\n",
+                         g->domain, g->gfn, mfn_x(mfn), mfn_x(o_mfn));
                 errors++;
             }
             if ( t != p2m_ram_shared )
             {
-                MEM_SHARING_DEBUG("Incorrect P2M type for d=%hu, PFN=%lx MFN=%lx."
-                                  "Expecting t=%d, got %d\n",
-                                  g->domain, g->gfn, mfn_x(mfn), p2m_ram_shared, t);
+                gdprintk(XENLOG_ERR,
+                         "Incorrect P2M type for d=%hu, PFN=%lx MFN=%lx."
+                         "Expecting t=%d, got %d\n",
+                         g->domain, g->gfn, mfn_x(mfn), p2m_ram_shared, t);
                 errors++;
             }
             put_domain(d);
@@ -571,10 +572,10 @@ static int audit(void)
         /* The type count has an extra ref because we have locked the page */
         if ( (nr_gfns + 1) != (pg->u.inuse.type_info & PGT_count_mask) )
         {
-            MEM_SHARING_DEBUG("Mismatched counts for MFN=%lx."
-                              "nr_gfns in list %lu, in type_info %lx\n",
-                              mfn_x(mfn), nr_gfns,
-                              (pg->u.inuse.type_info & PGT_count_mask));
+            gdprintk(XENLOG_ERR, "Mismatched counts for MFN=%lx."
+                     "nr_gfns in list %lu, in type_info %lx\n",
+                     mfn_x(mfn), nr_gfns,
+                     (pg->u.inuse.type_info & PGT_count_mask));
             errors++;
         }
 
@@ -585,8 +586,8 @@ static int audit(void)
 
     if ( count_found != count_expected )
     {
-        MEM_SHARING_DEBUG("Expected %ld shared mfns, found %ld.",
-                          count_expected, count_found);
+        gdprintk(XENLOG_ERR, "Expected %ld shared mfns, found %ld.",
+                 count_expected, count_found);
         errors++;
     }
 
@@ -765,12 +766,12 @@ static int debug_mfn(mfn_t mfn)
         return -EINVAL;
     }
 
-    MEM_SHARING_DEBUG(
-            "Debug page: MFN=%lx is ci=%lx, ti=%lx, owner_id=%d\n",
-            mfn_x(page_to_mfn(page)),
-            page->count_info,
-            page->u.inuse.type_info,
-            page_get_owner(page)->domain_id);
+    gdprintk(XENLOG_ERR,
+             "Debug page: MFN=%lx is ci=%lx, ti=%lx, owner_id=%d\n",
+             mfn_x(page_to_mfn(page)),
+             page->count_info,
+             page->u.inuse.type_info,
+             page_get_owner(page)->domain_id);
 
     /* -1 because the page is locked and that's an additional type ref */
     num_refs = ((int) (page->u.inuse.type_info & PGT_count_mask)) - 1;
@@ -786,8 +787,9 @@ static int debug_gfn(struct domain *d, gfn_t gfn)
 
     mfn = get_gfn_query(d, gfn_x(gfn), &p2mt);
 
-    MEM_SHARING_DEBUG("Debug for dom%d, gfn=%" PRI_gfn "\n",
-                      d->domain_id, gfn_x(gfn));
+    gdprintk(XENLOG_ERR, "Debug for dom%d, gfn=%" PRI_gfn "\n",
+             d->domain_id, gfn_x(gfn));
+
     num_refs = debug_mfn(mfn);
     put_gfn(d, gfn_x(gfn));
 
@@ -803,14 +805,13 @@ static int debug_gref(struct domain *d, grant_ref_t ref)
     rc = mem_sharing_gref_to_gfn(d->grant_table, ref, &gfn, &status);
     if ( rc )
     {
-        MEM_SHARING_DEBUG("Asked to debug [dom=%d,gref=%u]: error %d.\n",
-                          d->domain_id, ref, rc);
+        gdprintk(XENLOG_ERR, "Asked to debug [dom=%d,gref=%u]: error %d.\n",
+                 d->domain_id, ref, rc);
         return rc;
     }
 
-    MEM_SHARING_DEBUG(
-            "==> Grant [dom=%d,ref=%d], status=%x. ",
-            d->domain_id, ref, status);
+    gdprintk(XENLOG_ERR, " ==> Grant [dom=%d,ref=%d], status=%x. ",
+             d->domain_id, ref, status);
 
     return debug_gfn(d, gfn);
 }
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 13/20] x86/mem_sharing: ASSERT that p2m_set_entry succeeds
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
                   ` (11 preceding siblings ...)
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 12/20] x86/mem_sharing: Replace MEM_SHARING_DEBUG with gdprintk Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 14/20] x86/mem_sharing: Enable mem_sharing on first memop Tamas K Lengyel
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 46 +++++++++++++++++------------------
 1 file changed, 22 insertions(+), 24 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 90b6371e2f..e5c1424f9b 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1113,39 +1113,37 @@ int add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle_t sh,
         goto err_unlock;
     }
 
+    /*
+     * Must succeed, we just read the entry and hold the p2m lock
+     * via get_two_gfns.
+     */
     ret = p2m_set_entry(p2m, _gfn(cgfn), smfn, PAGE_ORDER_4K,
                         p2m_ram_shared, a);
+    ASSERT(!ret);
 
-    /* Tempted to turn this into an assert */
-    if ( ret )
+    /*
+     * There is a chance we're plugging a hole where a paged out
+     * page was.
+     */
+    if ( p2m_is_paging(cmfn_type) && (cmfn_type != p2m_ram_paging_out) )
     {
-        mem_sharing_gfn_destroy(spage, cd, gfn_info);
-        put_page_and_type(spage);
-    } else {
+        atomic_dec(&cd->paged_pages);
         /*
-         * There is a chance we're plugging a hole where a paged out
-         * page was.
+         * Further, there is a chance this was a valid page.
+         * Don't leak it.
          */
-        if ( p2m_is_paging(cmfn_type) && (cmfn_type != p2m_ram_paging_out) )
+        if ( mfn_valid(cmfn) )
         {
-            atomic_dec(&cd->paged_pages);
-            /*
-             * Further, there is a chance this was a valid page.
-             * Don't leak it.
-             */
-            if ( mfn_valid(cmfn) )
-            {
-                struct page_info *cpage = mfn_to_page(cmfn);
+            struct page_info *cpage = mfn_to_page(cmfn);
 
-                if ( !get_page(cpage, cd) )
-                {
-                    domain_crash(cd);
-                    ret = -EOVERFLOW;
-                    goto err_unlock;
-                }
-                put_page_alloc_ref(cpage);
-                put_page(cpage);
+            if ( !get_page(cpage, cd) )
+            {
+                domain_crash(cd);
+                ret = -EOVERFLOW;
+                goto err_unlock;
             }
+            put_page_alloc_ref(cpage);
+            put_page(cpage);
         }
     }
 
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 14/20] x86/mem_sharing: Enable mem_sharing on first memop
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
                   ` (12 preceding siblings ...)
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 13/20] x86/mem_sharing: ASSERT that p2m_set_entry succeeds Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 15/20] x86/mem_sharing: Skip xen heap pages in memshr nominate Tamas K Lengyel
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

It is wasteful to require separate hypercalls to enable sharing on both the
parent and the client domain during VM forking. To speed things up we enable
sharing on the first memop in case it wasn't already enabled.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 39 +++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index e5c1424f9b..48809a5349 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1402,6 +1402,24 @@ static int range_share(struct domain *d, struct domain *cd,
     return rc;
 }
 
+static inline int mem_sharing_control(struct domain *d, bool enable)
+{
+    if ( enable )
+    {
+        if ( unlikely(!is_hvm_domain(d)) )
+            return -ENOSYS;
+
+        if ( unlikely(!hap_enabled(d)) )
+            return -ENODEV;
+
+        if ( unlikely(is_iommu_enabled(d)) )
+            return -EXDEV;
+    }
+
+    d->arch.hvm.mem_sharing.enabled = enable;
+    return 0;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
     int rc;
@@ -1423,10 +1441,8 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
     if ( rc )
         goto out;
 
-    /* Only HAP is supported */
-    rc = -ENODEV;
-    if ( !mem_sharing_enabled(d) )
-        goto out;
+    if ( !mem_sharing_enabled(d) && (rc = mem_sharing_control(d, true)) )
+        return rc;
 
     switch ( mso.op )
     {
@@ -1675,24 +1691,15 @@ int mem_sharing_domctl(struct domain *d, struct xen_domctl_mem_sharing_op *mec)
 {
     int rc;
 
-    /* Only HAP is supported */
-    if ( !hap_enabled(d) )
-         return -ENODEV;
-
     switch(mec->op)
     {
         case XEN_DOMCTL_MEM_SHARING_CONTROL:
-        {
-            rc = 0;
-            if ( unlikely(is_iommu_enabled(d) && mec->u.enable) )
-                rc = -EXDEV;
-            else
-                d->arch.hvm.mem_sharing.enabled = mec->u.enable;
-        }
-        break;
+            rc = mem_sharing_control(d, mec->u.enable);
+            break;
 
         default:
             rc = -ENOSYS;
+            break;
     }
 
     return rc;
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 15/20] x86/mem_sharing: Skip xen heap pages in memshr nominate
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
                   ` (13 preceding siblings ...)
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 14/20] x86/mem_sharing: Enable mem_sharing on first memop Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 16/20] x86/mem_sharing: check page type count earlier Tamas K Lengyel
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

Trying to share these would fail anyway, better to skip them early.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 48809a5349..b3607b1bce 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -852,6 +852,11 @@ static int nominate_page(struct domain *d, gfn_t gfn,
     if ( !p2m_is_sharable(p2mt) )
         goto out;
 
+    /* Skip xen heap pages */
+    page = mfn_to_page(mfn);
+    if ( !page || is_xen_heap_page(page) )
+        goto out;
+
     /* Check if there are mem_access/remapped altp2m entries for this page */
     if ( altp2m_active(d) )
     {
@@ -882,7 +887,6 @@ static int nominate_page(struct domain *d, gfn_t gfn,
     }
 
     /* Try to convert the mfn to the sharable type */
-    page = mfn_to_page(mfn);
     ret = page_make_sharable(d, page, expected_refcnt);
     if ( ret )
         goto out;
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 16/20] x86/mem_sharing: check page type count earlier
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
                   ` (14 preceding siblings ...)
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 15/20] x86/mem_sharing: Skip xen heap pages in memshr nominate Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 17/20] xen/mem_sharing: VM forking Tamas K Lengyel
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index b3607b1bce..c44e7f2299 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -649,19 +649,18 @@ static int page_make_sharable(struct domain *d,
         return -EBUSY;
     }
 
-    /* Change page type and count atomically */
-    if ( !get_page_and_type(page, d, PGT_shared_page) )
+    /* Check if page is already typed and bail early if it is */
+    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
     {
         spin_unlock(&d->page_alloc_lock);
-        return -EINVAL;
+        return -EEXIST;
     }
 
-    /* Check it wasn't already sharable and undo if it was */
-    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
+    /* Change page type and count atomically */
+    if ( !get_page_and_type(page, d, PGT_shared_page) )
     {
         spin_unlock(&d->page_alloc_lock);
-        put_page_and_type(page);
-        return -EEXIST;
+        return -EINVAL;
     }
 
     /*
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 17/20] xen/mem_sharing: VM forking
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
                   ` (15 preceding siblings ...)
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 16/20] x86/mem_sharing: check page type count earlier Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 18/20] xen/mem_access: Use __get_gfn_type_access in set_mem_access Tamas K Lengyel
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tamas K Lengyel, Jan Beulich, Julien Grall, Roger Pau Monné

VM forking is the process of creating a domain with an empty memory space and a
parent domain specified from which to populate the memory when necessary. For
the new domain to be functional the VM state is copied over as part of the fork
operation (HVM params, hap allocation, etc).

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/hvm/hvm.c            |   2 +-
 xen/arch/x86/mm/mem_sharing.c     | 228 ++++++++++++++++++++++++++++++
 xen/arch/x86/mm/p2m.c             |  11 +-
 xen/include/asm-x86/mem_sharing.h |  20 ++-
 xen/include/public/memory.h       |   5 +
 xen/include/xen/sched.h           |   1 +
 6 files changed, 263 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 8f90841813..cafd07c67d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1913,7 +1913,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     }
 #endif
 
-    /* Spurious fault? PoD and log-dirty also take this path. */
+    /* Spurious fault? PoD, log-dirty and VM forking also take this path. */
     if ( p2m_is_ram(p2mt) )
     {
         rc = 1;
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index c44e7f2299..e93ad2ec5a 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -22,11 +22,13 @@
 
 #include <xen/types.h>
 #include <xen/domain_page.h>
+#include <xen/event.h>
 #include <xen/spinlock.h>
 #include <xen/rwlock.h>
 #include <xen/mm.h>
 #include <xen/grant_table.h>
 #include <xen/sched.h>
+#include <xen/sched-if.h>
 #include <xen/rcupdate.h>
 #include <xen/guest_access.h>
 #include <xen/vm_event.h>
@@ -36,6 +38,9 @@
 #include <asm/altp2m.h>
 #include <asm/atomic.h>
 #include <asm/event.h>
+#include <asm/hap.h>
+#include <asm/hvm/hvm.h>
+#include <asm/hvm/save.h>
 #include <xsm/xsm.h>
 
 #include "mm-locks.h"
@@ -1423,6 +1428,200 @@ static inline int mem_sharing_control(struct domain *d, bool enable)
     return 0;
 }
 
+/*
+ * Forking a page only gets called when the VM faults due to no entry being
+ * in the EPT for the access. Depending on the type of access we either
+ * populate the physmap with a shared entry for read-only access or
+ * fork the page if its a write access.
+ *
+ * The client p2m is already locked so we only need to lock
+ * the parent's here.
+ */
+int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool unsharing)
+{
+    int rc = -ENOENT;
+    shr_handle_t handle;
+    struct domain *parent;
+    struct p2m_domain *p2m;
+    unsigned long gfn_l = gfn_x(gfn);
+    mfn_t mfn, new_mfn;
+    p2m_type_t p2mt;
+    struct page_info *page;
+
+    if ( !mem_sharing_is_fork(d) )
+        return -ENOENT;
+
+    parent = d->parent;
+
+    if ( !unsharing )
+    {
+        /* For read-only accesses we just add a shared entry to the physmap */
+        while ( parent )
+        {
+            if ( !(rc = nominate_page(parent, gfn, 0, &handle)) )
+                break;
+
+            parent = parent->parent;
+        }
+
+        if ( !rc )
+        {
+            /* The client's p2m is already locked */
+            struct p2m_domain *pp2m = p2m_get_hostp2m(parent);
+
+            p2m_lock(pp2m);
+            rc = add_to_physmap(parent, gfn_l, handle, d, gfn_l, false);
+            p2m_unlock(pp2m);
+
+            if ( !rc )
+                return 0;
+        }
+    }
+
+    /*
+     * If it's a write access (ie. unsharing) or if adding a shared entry to
+     * the physmap failed we'll fork the page directly.
+     */
+    p2m = p2m_get_hostp2m(d);
+    parent = d->parent;
+
+    while ( parent )
+    {
+        mfn = get_gfn_query(parent, gfn_l, &p2mt);
+
+        if ( mfn_valid(mfn) && p2m_is_any_ram(p2mt) )
+            break;
+
+        put_gfn(parent, gfn_l);
+        parent = parent->parent;
+    }
+
+    if ( !parent )
+        return -ENOENT;
+
+    if ( !(page = alloc_domheap_page(d, 0)) )
+    {
+        put_gfn(parent, gfn_l);
+        return -ENOMEM;
+    }
+
+    new_mfn = page_to_mfn(page);
+    copy_domain_page(new_mfn, mfn);
+    set_gpfn_from_mfn(mfn_x(new_mfn), gfn_l);
+
+    put_gfn(parent, gfn_l);
+
+    return p2m->set_entry(p2m, gfn, new_mfn, PAGE_ORDER_4K, p2m_ram_rw,
+                          p2m->default_access, -1);
+}
+
+static int bring_up_vcpus(struct domain *cd, struct cpupool *cpupool)
+{
+    int ret;
+    unsigned int i;
+
+    if ( (ret = cpupool_move_domain(cd, cpupool)) )
+        return ret;
+
+    for ( i = 0; i < cd->max_vcpus; i++ )
+    {
+        if ( cd->vcpu[i] )
+            continue;
+
+        if ( !vcpu_create(cd, i) )
+            return -EINVAL;
+    }
+
+    domain_update_node_affinity(cd);
+    return 0;
+}
+
+static int fork_hap_allocation(struct domain *d, struct domain *cd)
+{
+    int rc;
+    bool preempted;
+    unsigned long mb = hap_get_allocation(d);
+
+    if ( mb == hap_get_allocation(cd) )
+        return 0;
+
+    paging_lock(cd);
+    rc = hap_set_allocation(cd, mb << (20 - PAGE_SHIFT), &preempted);
+    paging_unlock(cd);
+
+    if ( rc )
+        return rc;
+
+    if ( preempted )
+        return -ERESTART;
+
+    return 0;
+}
+
+static int fork_hvm(struct domain *d, struct domain *cd)
+{
+    int rc, i;
+    struct hvm_domain_context c = { 0 };
+    uint32_t tsc_mode;
+    uint32_t gtsc_khz;
+    uint32_t incarnation;
+    uint64_t elapsed_nsec;
+
+    c.size = hvm_save_size(d);
+    if ( (c.data = xmalloc_bytes(c.size)) == NULL )
+        return -ENOMEM;
+
+    for ( i = 0; i < HVM_NR_PARAMS; i++ )
+    {
+        uint64_t value = 0;
+
+        if ( hvm_get_param(d, i, &value) || !value )
+            continue;
+
+        if ( (rc = hvm_set_param(cd, i, value)) )
+            goto out;
+    }
+
+    tsc_get_info(d, &tsc_mode, &elapsed_nsec, &gtsc_khz, &incarnation);
+    tsc_set_info(cd, tsc_mode, elapsed_nsec, gtsc_khz, incarnation);
+
+    if ( (rc = hvm_save(d, &c)) )
+        goto out;
+
+    c.cur = 0;
+    rc = hvm_load(cd, &c);
+
+out:
+    xfree(c.data);
+    return rc;
+}
+
+static int mem_sharing_fork(struct domain *d, struct domain *cd)
+{
+    int rc;
+
+    if ( !d->controller_pause_count &&
+         (rc = domain_pause_by_systemcontroller(d)) )
+        return rc;
+
+    cd->max_pages = d->max_pages;
+    cd->max_vcpus = d->max_vcpus;
+
+    /* this is preemptible so it's the first to get done */
+    if ( (rc = fork_hap_allocation(d, cd)) )
+        return rc;
+
+    if ( (rc = bring_up_vcpus(cd, d->cpupool)) )
+        return rc;
+
+    if ( (rc = fork_hvm(d, cd)) )
+        return rc;
+
+    cd->parent = d;
+
+    return 0;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
     int rc;
@@ -1677,6 +1876,35 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
             rc = debug_gref(d, mso.u.debug.u.gref);
             break;
 
+        case XENMEM_sharing_op_fork:
+        {
+            struct domain *pd;
+
+            rc = -EINVAL;
+            if ( mso.u.fork._pad[0] || mso.u.fork._pad[1] ||
+                 mso.u.fork._pad[2] )
+                 goto out;
+
+            rc = rcu_lock_live_remote_domain_by_id(mso.u.fork.parent_domain,
+                                                   &pd);
+            if ( rc )
+                goto out;
+
+            if ( !mem_sharing_enabled(pd) )
+            {
+                if ( (rc = mem_sharing_control(pd, true)) )
+                    goto out;
+            }
+
+            rc = mem_sharing_fork(pd, d);
+
+            if ( rc == -ERESTART )
+                rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
+                                                   "lh", XENMEM_sharing_op,
+                                                   arg);
+            rcu_unlock_domain(pd);
+            break;
+        }
         default:
             rc = -ENOSYS;
             break;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 53ea44fe3c..55c260731e 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -508,6 +508,14 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
 
     mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
 
+    /* Check if we need to fork the page */
+    if ( (q & P2M_ALLOC) && p2m_is_hole(*t) &&
+         !mem_sharing_fork_page(p2m->domain, gfn, !!(q & P2M_UNSHARE)) )
+    {
+        mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
+    }
+
+    /* Check if we need to unshare the page */
     if ( (q & P2M_UNSHARE) && p2m_is_shared(*t) )
     {
         ASSERT(p2m_is_hostp2m(p2m));
@@ -586,7 +594,8 @@ struct page_info *p2m_get_page_from_gfn(
             return page;
 
         /* Error path: not a suitable GFN at all */
-        if ( !p2m_is_ram(*t) && !p2m_is_paging(*t) && !p2m_is_pod(*t) )
+        if ( !p2m_is_ram(*t) && !p2m_is_paging(*t) && !p2m_is_pod(*t) &&
+             !mem_sharing_is_fork(p2m->domain) )
             return NULL;
     }
 
diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h
index 4b982a4803..f80d3acdeb 100644
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -26,8 +26,7 @@
 
 #ifdef CONFIG_MEM_SHARING
 
-struct mem_sharing_domain
-{
+struct mem_sharing_domain {
     bool enabled;
 
     /*
@@ -40,6 +39,9 @@ struct mem_sharing_domain
 #define mem_sharing_enabled(d) \
     (hap_enabled(d) && (d)->arch.hvm.mem_sharing.enabled)
 
+#define mem_sharing_is_fork(d) \
+    (mem_sharing_enabled(d) && !!((d)->parent))
+
 /* Auditing of memory sharing code? */
 #ifndef NDEBUG
 #define MEM_SHARING_AUDIT 1
@@ -90,6 +92,9 @@ int mem_sharing_unshare_page(struct domain *d,
     return rc;
 }
 
+int mem_sharing_fork_page(struct domain *d, gfn_t gfn,
+                          bool unsharing);
+
 /*
  * If called by a foreign domain, possible errors are
  *   -EBUSY -> ring full
@@ -119,6 +124,7 @@ int relinquish_shared_pages(struct domain *d);
 #else
 
 #define mem_sharing_enabled(d) false
+#define mem_sharing_is_fork(p2m) false
 
 static inline unsigned int mem_sharing_get_nr_saved_mfns(void)
 {
@@ -145,6 +151,16 @@ int mem_sharing_notify_enomem(struct domain *d, unsigned long gfn,
     return -EOPNOTSUPP;
 }
 
+static inline int mem_sharing_fork(struct domain *d, struct domain *cd, bool vcpu)
+{
+    return -EOPNOTSUPP;
+}
+
+static inline int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool lock)
+{
+    return -EOPNOTSUPP;
+}
+
 #endif
 
 #endif /* __MEM_SHARING_H__ */
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index cfdda6e2a8..90a3f4498e 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -482,6 +482,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
 #define XENMEM_sharing_op_add_physmap       6
 #define XENMEM_sharing_op_audit             7
 #define XENMEM_sharing_op_range_share       8
+#define XENMEM_sharing_op_fork              9
 
 #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
 #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
@@ -532,6 +533,10 @@ struct xen_mem_sharing_op {
                 uint32_t gref;     /* IN: gref to debug         */
             } u;
         } debug;
+        struct mem_sharing_op_fork {
+            domid_t parent_domain;
+            uint16_t _pad[3];                /* Must be set to 0 */
+        } fork;
     } u;
 };
 typedef struct xen_mem_sharing_op xen_mem_sharing_op_t;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 9f7bc69293..fcad948962 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -501,6 +501,7 @@ struct domain
     /* Memory sharing support */
 #ifdef CONFIG_MEM_SHARING
     struct vm_event_domain *vm_event_share;
+    struct domain *parent; /* VM fork parent */
 #endif
     /* Memory paging support */
 #ifdef CONFIG_HAS_MEM_PAGING
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 18/20] xen/mem_access: Use __get_gfn_type_access in set_mem_access
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
                   ` (16 preceding siblings ...)
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 17/20] xen/mem_sharing: VM forking Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-19  7:59   ` Alexandru Stefan ISAILA
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork Tamas K Lengyel
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Petre Pircalabu, Tamas K Lengyel, Tamas K Lengyel, Wei Liu,
	George Dunlap, Andrew Cooper, Jan Beulich, Alexandru Isaila,
	Roger Pau Monné

Use __get_gfn_type_access instead of p2m->get_entry to trigger page-forking
when the mem_access permission is being set on a page that has not yet been
copied over from the parent.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_access.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/mem_access.c b/xen/arch/x86/mm/mem_access.c
index 320b9fe621..9caf08a5b2 100644
--- a/xen/arch/x86/mm/mem_access.c
+++ b/xen/arch/x86/mm/mem_access.c
@@ -303,11 +303,10 @@ static int set_mem_access(struct domain *d, struct p2m_domain *p2m,
     ASSERT(!ap2m);
 #endif
     {
-        mfn_t mfn;
         p2m_access_t _a;
         p2m_type_t t;
-
-        mfn = p2m->get_entry(p2m, gfn, &t, &_a, 0, NULL, NULL);
+        mfn_t mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &t, &_a,
+                                          P2M_ALLOC, NULL, false);
         rc = p2m->set_entry(p2m, gfn, mfn, PAGE_ORDER_4K, t, a, -1);
     }
 
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
                   ` (17 preceding siblings ...)
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 18/20] xen/mem_access: Use __get_gfn_type_access in set_mem_access Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-18 22:00   ` Julien Grall
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 20/20] xen/tools: VM forking toolstack side Tamas K Lengyel
  2019-12-19  9:48 ` [Xen-devel] [PATCH v2 00/20] VM forking Roger Pau Monné
  20 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Stefano Stabellini,
	Jan Beulich, Julien Grall, Roger Pau Monné

Implement hypercall that allows a fork to shed all memory that got allocated
for it during its execution and re-load its vCPU context from the parent VM.
This allows the forked VM to reset into the same state the parent VM is in a
faster way then creating a new fork would be. Measurements show about a 2x
speedup during normal fuzzing operations. Performance may vary depending how
much memory got allocated for the forked VM. If it has been completely
deduplicated from the parent VM then creating a new fork would likely be more
performant.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/mm/mem_sharing.c | 105 ++++++++++++++++++++++++++++++++++
 xen/include/public/memory.h   |   1 +
 2 files changed, 106 insertions(+)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index e93ad2ec5a..4735a334b9 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
     return 0;
 }
 
+struct gfn_free;
+struct gfn_free {
+    struct gfn_free *next;
+    struct page_info *page;
+    gfn_t gfn;
+};
+
+static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
+{
+    int rc;
+
+    struct p2m_domain* p2m = p2m_get_hostp2m(cd);
+    struct gfn_free *list = NULL;
+    struct page_info *page;
+
+    page_list_for_each(page, &cd->page_list)
+    {
+        mfn_t mfn = page_to_mfn(page);
+        if ( mfn_valid(mfn) )
+        {
+            p2m_type_t p2mt;
+            p2m_access_t p2ma;
+            gfn_t gfn = mfn_to_gfn(cd, mfn);
+            mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
+                                        0, NULL, false);
+            if ( p2m_is_ram(p2mt) )
+            {
+                struct gfn_free *gfn_free;
+                if ( !get_page(page, cd) )
+                    goto err_reset;
+
+                /*
+                 * We can't free the page while iterating over the page_list
+                 * so we build a separate list to loop over.
+                 *
+                 * We want to iterate over the page_list instead of checking
+                 * gfn from 0 to max_gfn because this is ~10x faster.
+                 */
+                gfn_free = xmalloc(struct gfn_free);
+                if ( !gfn_free )
+                    goto err_reset;
+
+                gfn_free->gfn = gfn;
+                gfn_free->page = page;
+                gfn_free->next = list;
+                list = gfn_free;
+            }
+        }
+    }
+
+    while ( list )
+    {
+        struct gfn_free *next = list->next;
+
+        rc = p2m->set_entry(p2m, list->gfn, INVALID_MFN, PAGE_ORDER_4K,
+                            p2m_invalid, p2m_access_rwx, -1);
+        put_page_alloc_ref(list->page);
+        put_page(list->page);
+
+        xfree(list);
+        list = next;
+
+        ASSERT(!rc);
+    }
+
+    if ( (rc = fork_hvm(d, cd)) )
+        return rc;
+
+ err_reset:
+    while ( list )
+    {
+        struct gfn_free *next = list->next;
+
+        put_page(list->page);
+        xfree(list);
+        list = next;
+    }
+
+    return 0;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
     int rc;
@@ -1905,6 +1986,30 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
             rcu_unlock_domain(pd);
             break;
         }
+
+        case XENMEM_sharing_op_fork_reset:
+        {
+            struct domain *pd;
+
+            rc = -EINVAL;
+            if ( mso.u.fork._pad[0] || mso.u.fork._pad[1] ||
+                 mso.u.fork._pad[2] )
+                 goto out;
+
+            rc = -ENOSYS;
+            if ( !d->parent )
+                goto out;
+
+            rc = rcu_lock_live_remote_domain_by_id(d->parent->domain_id, &pd);
+            if ( rc )
+                goto out;
+
+            rc = mem_sharing_fork_reset(pd, d);
+
+            rcu_unlock_domain(pd);
+            break;
+        }
+
         default:
             rc = -ENOSYS;
             break;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 90a3f4498e..e3d063e22e 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -483,6 +483,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
 #define XENMEM_sharing_op_audit             7
 #define XENMEM_sharing_op_range_share       8
 #define XENMEM_sharing_op_fork              9
+#define XENMEM_sharing_op_fork_reset        10
 
 #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
 #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [Xen-devel] [PATCH v2 20/20] xen/tools: VM forking toolstack side
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
                   ` (18 preceding siblings ...)
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork Tamas K Lengyel
@ 2019-12-18 19:40 ` Tamas K Lengyel
  2019-12-19  9:48 ` [Xen-devel] [PATCH v2 00/20] VM forking Roger Pau Monné
  20 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 19:40 UTC (permalink / raw)
  To: xen-devel; +Cc: Anthony PERARD, Ian Jackson, Tamas K Lengyel, Wei Liu

Add necessary bits to implement "xl fork-vm", "xl fork-launch-dm" and
"xl fork-reset" commands. The process is split in two to allow tools needing
access to the new VM as fast as possible after it was forked. It is expected
that under certain use-cases the second command that launches QEMU will be
skipped entirely.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 tools/libxc/include/xenctrl.h |   6 +
 tools/libxc/xc_memshr.c       |  22 ++++
 tools/libxl/libxl.h           |   7 +
 tools/libxl/libxl_create.c    | 237 +++++++++++++++++++++++-----------
 tools/libxl/libxl_dm.c        |   2 +-
 tools/libxl/libxl_dom.c       |  83 ++++++++----
 tools/libxl/libxl_internal.h  |   1 +
 tools/libxl/libxl_types.idl   |   1 +
 tools/xl/xl.h                 |   5 +
 tools/xl/xl_cmdtable.c        |  22 ++++
 tools/xl/xl_saverestore.c     |  96 ++++++++++++++
 tools/xl/xl_vmcontrol.c       |   8 ++
 12 files changed, 386 insertions(+), 104 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index b5ffa53d55..39afdb9b33 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2221,6 +2221,12 @@ int xc_memshr_range_share(xc_interface *xch,
                           uint64_t first_gfn,
                           uint64_t last_gfn);
 
+int xc_memshr_fork(xc_interface *xch,
+                   uint32_t source_domain,
+                   uint32_t client_domain);
+
+int xc_memshr_fork_reset(xc_interface *xch, uint32_t forked_domain);
+
 /* Debug calls: return the number of pages referencing the shared frame backing
  * the input argument. Should be one or greater.
  *
diff --git a/tools/libxc/xc_memshr.c b/tools/libxc/xc_memshr.c
index 5ef56a6933..ef5a5ee6a4 100644
--- a/tools/libxc/xc_memshr.c
+++ b/tools/libxc/xc_memshr.c
@@ -237,6 +237,28 @@ int xc_memshr_debug_gref(xc_interface *xch,
     return xc_memshr_memop(xch, domid, &mso);
 }
 
+int xc_memshr_fork(xc_interface *xch, uint32_t pdomid, uint32_t domid)
+{
+    xen_mem_sharing_op_t mso;
+
+    memset(&mso, 0, sizeof(mso));
+
+    mso.op = XENMEM_sharing_op_fork;
+    mso.u.fork.parent_domain = pdomid;
+
+    return xc_memshr_memop(xch, domid, &mso);
+}
+
+int xc_memshr_fork_reset(xc_interface *xch, uint32_t domid)
+{
+    xen_mem_sharing_op_t mso;
+
+    memset(&mso, 0, sizeof(mso));
+    mso.op = XENMEM_sharing_op_fork_reset;
+
+    return xc_memshr_memop(xch, domid, &mso);
+}
+
 int xc_memshr_audit(xc_interface *xch)
 {
     xen_mem_sharing_op_t mso;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 54abb9db1f..75cb070587 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1536,6 +1536,13 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
                             LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_domain_fork_vm(libxl_ctx *ctx, uint32_t pdomid, uint32_t *domid)
+                             LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_domain_fork_launch_dm(libxl_ctx *ctx, libxl_domain_config *d_config,
+                                uint32_t domid,
+                                const libxl_asyncprogress_how *aop_console_how)
+                                LIBXL_EXTERNAL_CALLERS_ONLY;
+int libxl_domain_fork_reset(libxl_ctx *ctx, uint32_t domid);
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 uint32_t *domid, int restore_fd,
                                 int send_back_fd,
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 32d45dcef0..e0d219596c 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -536,12 +536,12 @@ out:
     return ret;
 }
 
-int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
-                       libxl__domain_build_state *state,
-                       uint32_t *domid)
+static int libxl__domain_make_xs_entries(libxl__gc *gc, libxl_domain_config *d_config,
+                                         libxl__domain_build_state *state,
+                                         uint32_t domid)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
-    int ret, rc, nb_vm;
+    int rc, nb_vm;
     const char *dom_type;
     char *uuid_string;
     char *dom_path, *vm_path, *libxl_path;
@@ -553,7 +553,6 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
 
     /* convenience aliases */
     libxl_domain_create_info *info = &d_config->c_info;
-    libxl_domain_build_info *b_info = &d_config->b_info;
 
     uuid_string = libxl__uuid2string(gc, info->uuid);
     if (!uuid_string) {
@@ -561,64 +560,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
         goto out;
     }
 
-    /* Valid domid here means we're soft resetting. */
-    if (!libxl_domid_valid_guest(*domid)) {
-        struct xen_domctl_createdomain create = {
-            .ssidref = info->ssidref,
-            .max_vcpus = b_info->max_vcpus,
-            .max_evtchn_port = b_info->event_channels,
-            .max_grant_frames = b_info->max_grant_frames,
-            .max_maptrack_frames = b_info->max_maptrack_frames,
-        };
-
-        if (info->type != LIBXL_DOMAIN_TYPE_PV) {
-            create.flags |= XEN_DOMCTL_CDF_hvm;
-            create.flags |=
-                libxl_defbool_val(info->hap) ? XEN_DOMCTL_CDF_hap : 0;
-            create.flags |=
-                libxl_defbool_val(info->oos) ? 0 : XEN_DOMCTL_CDF_oos_off;
-        }
-
-        assert(info->passthrough != LIBXL_PASSTHROUGH_DEFAULT);
-        LOG(DETAIL, "passthrough: %s",
-            libxl_passthrough_to_string(info->passthrough));
-
-        if (info->passthrough != LIBXL_PASSTHROUGH_DISABLED)
-            create.flags |= XEN_DOMCTL_CDF_iommu;
-
-        if (info->passthrough == LIBXL_PASSTHROUGH_SYNC_PT)
-            create.iommu_opts |= XEN_DOMCTL_IOMMU_no_sharept;
-
-        /* Ultimately, handle is an array of 16 uint8_t, same as uuid */
-        libxl_uuid_copy(ctx, (libxl_uuid *)&create.handle, &info->uuid);
-
-        ret = libxl__arch_domain_prepare_config(gc, d_config, &create);
-        if (ret < 0) {
-            LOGED(ERROR, *domid, "fail to get domain config");
-            rc = ERROR_FAIL;
-            goto out;
-        }
-
-        ret = xc_domain_create(ctx->xch, domid, &create);
-        if (ret < 0) {
-            LOGED(ERROR, *domid, "domain creation fail");
-            rc = ERROR_FAIL;
-            goto out;
-        }
-
-        rc = libxl__arch_domain_save_config(gc, d_config, state, &create);
-        if (rc < 0)
-            goto out;
-    }
-
-    ret = xc_cpupool_movedomain(ctx->xch, info->poolid, *domid);
-    if (ret < 0) {
-        LOGED(ERROR, *domid, "domain move fail");
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    dom_path = libxl__xs_get_dompath(gc, *domid);
+    dom_path = libxl__xs_get_dompath(gc, domid);
     if (!dom_path) {
         rc = ERROR_FAIL;
         goto out;
@@ -626,12 +568,12 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
 
     vm_path = GCSPRINTF("/vm/%s", uuid_string);
     if (!vm_path) {
-        LOGD(ERROR, *domid, "cannot allocate create paths");
+        LOGD(ERROR, domid, "cannot allocate create paths");
         rc = ERROR_FAIL;
         goto out;
     }
 
-    libxl_path = libxl__xs_libxl_path(gc, *domid);
+    libxl_path = libxl__xs_libxl_path(gc, domid);
     if (!libxl_path) {
         rc = ERROR_FAIL;
         goto out;
@@ -642,10 +584,10 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
 
     roperm[0].id = 0;
     roperm[0].perms = XS_PERM_NONE;
-    roperm[1].id = *domid;
+    roperm[1].id = domid;
     roperm[1].perms = XS_PERM_READ;
 
-    rwperm[0].id = *domid;
+    rwperm[0].id = domid;
     rwperm[0].perms = XS_PERM_NONE;
 
 retry_transaction:
@@ -663,7 +605,7 @@ retry_transaction:
                     noperm, ARRAY_SIZE(noperm));
 
     xs_write(ctx->xsh, t, GCSPRINTF("%s/vm", dom_path), vm_path, strlen(vm_path));
-    rc = libxl__domain_rename(gc, *domid, 0, info->name, t);
+    rc = libxl__domain_rename(gc, domid, 0, info->name, t);
     if (rc)
         goto out;
 
@@ -740,7 +682,7 @@ retry_transaction:
 
     vm_list = libxl_list_vm(ctx, &nb_vm);
     if (!vm_list) {
-        LOGD(ERROR, *domid, "cannot get number of running guests");
+        LOGD(ERROR, domid, "cannot get number of running guests");
         rc = ERROR_FAIL;
         goto out;
     }
@@ -764,7 +706,7 @@ retry_transaction:
             t = 0;
             goto retry_transaction;
         }
-        LOGED(ERROR, *domid, "domain creation ""xenstore transaction commit failed");
+        LOGED(ERROR, domid, "domain creation ""xenstore transaction commit failed");
         rc = ERROR_FAIL;
         goto out;
     }
@@ -776,6 +718,80 @@ retry_transaction:
     return rc;
 }
 
+int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
+                       libxl__domain_build_state *state,
+                       uint32_t *domid)
+{
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+    int ret, rc;
+
+    /* convenience aliases */
+    libxl_domain_create_info *info = &d_config->c_info;
+    libxl_domain_build_info *b_info = &d_config->b_info;
+
+    /* Valid domid here means we're soft resetting. */
+    if (!libxl_domid_valid_guest(*domid)) {
+        struct xen_domctl_createdomain create = {
+            .ssidref = info->ssidref,
+            .max_vcpus = b_info->max_vcpus,
+            .max_evtchn_port = b_info->event_channels,
+            .max_grant_frames = b_info->max_grant_frames,
+            .max_maptrack_frames = b_info->max_maptrack_frames,
+        };
+
+        if (info->type != LIBXL_DOMAIN_TYPE_PV) {
+            create.flags |= XEN_DOMCTL_CDF_hvm;
+            create.flags |=
+                libxl_defbool_val(info->hap) ? XEN_DOMCTL_CDF_hap : 0;
+            create.flags |=
+                libxl_defbool_val(info->oos) ? 0 : XEN_DOMCTL_CDF_oos_off;
+        }
+
+        assert(info->passthrough != LIBXL_PASSTHROUGH_DEFAULT);
+        LOG(DETAIL, "passthrough: %s",
+            libxl_passthrough_to_string(info->passthrough));
+
+        if (info->passthrough != LIBXL_PASSTHROUGH_DISABLED)
+            create.flags |= XEN_DOMCTL_CDF_iommu;
+
+        if (info->passthrough == LIBXL_PASSTHROUGH_SYNC_PT)
+            create.iommu_opts |= XEN_DOMCTL_IOMMU_no_sharept;
+
+        /* Ultimately, handle is an array of 16 uint8_t, same as uuid */
+        libxl_uuid_copy(ctx, (libxl_uuid *)&create.handle, &info->uuid);
+
+        ret = libxl__arch_domain_prepare_config(gc, d_config, &create);
+        if (ret < 0) {
+            LOGED(ERROR, *domid, "fail to get domain config");
+            rc = ERROR_FAIL;
+            goto out;
+        }
+
+        ret = xc_domain_create(ctx->xch, domid, &create);
+        if (ret < 0) {
+            LOGED(ERROR, *domid, "domain creation fail");
+            rc = ERROR_FAIL;
+            goto out;
+        }
+
+        rc = libxl__arch_domain_save_config(gc, d_config, state, &create);
+        if (rc < 0)
+            goto out;
+    }
+
+    ret = xc_cpupool_movedomain(ctx->xch, info->poolid, *domid);
+    if (ret < 0) {
+        LOGED(ERROR, *domid, "domain move fail");
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = libxl__domain_make_xs_entries(gc, d_config, state, *domid);
+
+out:
+    return rc;
+}
+
 static int store_libxl_entry(libxl__gc *gc, uint32_t domid,
                              libxl_domain_build_info *b_info)
 {
@@ -1097,15 +1113,31 @@ static void initiate_domain_create(libxl__egc *egc,
     ret = libxl__domain_config_setdefault(gc,d_config,domid);
     if (ret) goto error_out;
 
-    ret = libxl__domain_make(gc, d_config, &dcs->build_state, &domid);
-    if (ret) {
-        LOGD(ERROR, domid, "cannot make domain: %d", ret);
+    if ( !d_config->dm_restore_file )
+    {
+        ret = libxl__domain_make(gc, d_config, &dcs->build_state, &domid);
         dcs->guest_domid = domid;
+
+        if (ret) {
+            LOGD(ERROR, domid, "cannot make domain: %d", ret);
+            ret = ERROR_FAIL;
+            goto error_out;
+        }
+    } else if ( dcs->guest_domid != INVALID_DOMID ) {
+        domid = dcs->guest_domid;
+
+        ret = libxl__domain_make_xs_entries(gc, d_config, &dcs->build_state, domid);
+        if (ret) {
+            LOGD(ERROR, domid, "cannot make domain: %d", ret);
+            ret = ERROR_FAIL;
+            goto error_out;
+        }
+    } else {
+        LOGD(ERROR, domid, "cannot make domain");
         ret = ERROR_FAIL;
         goto error_out;
     }
 
-    dcs->guest_domid = domid;
     dcs->sdss.dm.guest_domid = 0; /* means we haven't spawned */
 
     /* post-4.13 todo: move these next bits of defaulting to
@@ -1141,7 +1173,7 @@ static void initiate_domain_create(libxl__egc *egc,
     if (ret)
         goto error_out;
 
-    if (restore_fd >= 0 || dcs->domid_soft_reset != INVALID_DOMID) {
+    if (restore_fd >= 0 || dcs->domid_soft_reset != INVALID_DOMID || d_config->dm_restore_file) {
         LOGD(DEBUG, domid, "restoring, not running bootloader");
         domcreate_bootloader_done(egc, &dcs->bl, 0);
     } else  {
@@ -1217,7 +1249,16 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->sdss.dm.callback = domcreate_devmodel_started;
     dcs->sdss.callback = domcreate_devmodel_started;
 
-    if (restore_fd < 0 && dcs->domid_soft_reset == INVALID_DOMID) {
+    if (restore_fd < 0 && dcs->domid_soft_reset == INVALID_DOMID && !d_config->dm_restore_file) {
+        rc = libxl__domain_build(gc, d_config, domid, state);
+        domcreate_rebuild_done(egc, dcs, rc);
+        return;
+    }
+
+    if ( d_config->dm_restore_file ) {
+        dcs->srs.dcs = dcs;
+        dcs->srs.ao = ao;
+        state->forked_vm = true;
         rc = libxl__domain_build(gc, d_config, domid, state);
         domcreate_rebuild_done(egc, dcs, rc);
         return;
@@ -1415,6 +1456,7 @@ static void domcreate_rebuild_done(libxl__egc *egc,
     /* convenience aliases */
     const uint32_t domid = dcs->guest_domid;
     libxl_domain_config *const d_config = dcs->guest_config;
+    libxl__domain_build_state *const state = &dcs->build_state;
 
     if (ret) {
         LOGD(ERROR, domid, "cannot (re-)build domain: %d", ret);
@@ -1422,6 +1464,9 @@ static void domcreate_rebuild_done(libxl__egc *egc,
         goto error_out;
     }
 
+    if ( d_config->dm_restore_file )
+        state->saved_state = GCSPRINTF("%s", d_config->dm_restore_file);
+
     store_libxl_entry(gc, domid, &d_config->b_info);
 
     libxl__multidev_begin(ao, &dcs->multidev);
@@ -1823,10 +1868,13 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     GCNEW(cdcs);
     cdcs->dcs.ao = ao;
     cdcs->dcs.guest_config = d_config;
+    cdcs->dcs.guest_domid = *domid;
+
     libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
     libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
     cdcs->dcs.restore_fd = cdcs->dcs.libxc_fd = restore_fd;
     cdcs->dcs.send_back_fd = send_back_fd;
+
     if (restore_fd > -1) {
         cdcs->dcs.restore_params = *params;
         rc = libxl__fd_flags_modify_save(gc, cdcs->dcs.restore_fd,
@@ -2069,6 +2117,43 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             ao_how, aop_console_how);
 }
 
+int libxl_domain_fork_vm(libxl_ctx *ctx, uint32_t pdomid, uint32_t *domid)
+{
+    int rc;
+    struct xen_domctl_createdomain create = {0};
+    create.flags |= XEN_DOMCTL_CDF_hvm;
+    create.flags |= XEN_DOMCTL_CDF_hap;
+    create.flags |= XEN_DOMCTL_CDF_oos_off;
+    create.arch.emulation_flags = (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI);
+
+    create.ssidref = SECINITSID_DOMU;
+    create.max_vcpus = 1; // placeholder, will be cloned from pdomid
+    create.max_evtchn_port = 1023;
+    create.max_grant_frames = LIBXL_MAX_GRANT_FRAMES_DEFAULT;
+    create.max_maptrack_frames = LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT;
+
+    if ( (rc = xc_domain_create(ctx->xch, domid, &create)) )
+        return rc;
+
+    if ( (rc = xc_memshr_fork(ctx->xch, pdomid, *domid)) )
+        xc_domain_destroy(ctx->xch, *domid);
+
+    return rc;
+}
+
+int libxl_domain_fork_launch_dm(libxl_ctx *ctx, libxl_domain_config *d_config,
+                                uint32_t domid,
+                                const libxl_asyncprogress_how *aop_console_how)
+{
+    unset_disk_colo_restore(d_config);
+    return do_domain_create(ctx, d_config, &domid, -1, -1, 0, 0, aop_console_how);
+}
+
+int libxl_domain_fork_reset(libxl_ctx *ctx, uint32_t domid)
+{
+    return xc_memshr_fork_reset(ctx->xch, domid);
+}
+
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 uint32_t *domid, int restore_fd,
                                 int send_back_fd,
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index dac1b8ddb8..a119e789a7 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -2784,7 +2784,7 @@ static void device_model_spawn_outcome(libxl__egc *egc,
 
     libxl__domain_build_state *state = dmss->build_state;
 
-    if (state->saved_state) {
+    if (state->saved_state && !state->forked_vm) {
         ret2 = unlink(state->saved_state);
         if (ret2) {
             LOGED(ERROR, dmss->guest_domid, "%s: failed to remove device-model state %s",
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index cdb294ab8d..95e6ecc9d3 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -392,9 +392,12 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
     libxl_domain_build_info *const info = &d_config->b_info;
     libxl_ctx *ctx = libxl__gc_owner(gc);
     char *xs_domid, *con_domid;
-    int rc;
+    int rc = 0;
     uint64_t size;
 
+    if ( state->forked_vm )
+        goto skip_fork;
+
     if (xc_domain_max_vcpus(ctx->xch, domid, info->max_vcpus) != 0) {
         LOG(ERROR, "Couldn't set max vcpu count");
         return ERROR_FAIL;
@@ -499,29 +502,6 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
         }
     }
 
-
-    rc = libxl__arch_extra_memory(gc, info, &size);
-    if (rc < 0) {
-        LOGE(ERROR, "Couldn't get arch extra constant memory size");
-        return ERROR_FAIL;
-    }
-
-    if (xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + size) < 0) {
-        LOGE(ERROR, "Couldn't set max memory");
-        return ERROR_FAIL;
-    }
-
-    xs_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL);
-    state->store_domid = xs_domid ? atoi(xs_domid) : 0;
-    free(xs_domid);
-
-    con_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenconsoled/domid", NULL);
-    state->console_domid = con_domid ? atoi(con_domid) : 0;
-    free(con_domid);
-
-    state->store_port = xc_evtchn_alloc_unbound(ctx->xch, domid, state->store_domid);
-    state->console_port = xc_evtchn_alloc_unbound(ctx->xch, domid, state->console_domid);
-
     if (info->type != LIBXL_DOMAIN_TYPE_PV)
         hvm_set_conf_params(ctx->xch, domid, info);
 
@@ -556,8 +536,34 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
                          info->altp2m);
     }
 
+    rc = libxl__arch_extra_memory(gc, info, &size);
+    if (rc < 0) {
+        LOGE(ERROR, "Couldn't get arch extra constant memory size");
+        return ERROR_FAIL;
+    }
+
+    if (xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + size) < 0) {
+        LOGE(ERROR, "Couldn't set max memory");
+        return ERROR_FAIL;
+    }
+
     rc = libxl__arch_domain_create(gc, d_config, domid);
+    if ( rc )
+        goto out;
 
+skip_fork:
+    xs_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL);
+    state->store_domid = xs_domid ? atoi(xs_domid) : 0;
+    free(xs_domid);
+
+    con_domid = xs_read(ctx->xsh, XBT_NULL, "/tool/xenconsoled/domid", NULL);
+    state->console_domid = con_domid ? atoi(con_domid) : 0;
+    free(con_domid);
+
+    state->store_port = xc_evtchn_alloc_unbound(ctx->xch, domid, state->store_domid);
+    state->console_port = xc_evtchn_alloc_unbound(ctx->xch, domid, state->console_domid);
+
+out:
     return rc;
 }
 
@@ -615,6 +621,9 @@ int libxl__build_post(libxl__gc *gc, uint32_t domid,
     char **ents;
     int i, rc;
 
+    if ( state->forked_vm )
+        goto skip_fork;
+
     if (info->num_vnuma_nodes && !info->num_vcpu_soft_affinity) {
         rc = set_vnuma_affinity(gc, domid, info);
         if (rc)
@@ -639,6 +648,7 @@ int libxl__build_post(libxl__gc *gc, uint32_t domid,
         }
     }
 
+skip_fork:
     ents = libxl__calloc(gc, 12 + (info->max_vcpus * 2) + 2, sizeof(char *));
     ents[0] = "memory/static-max";
     ents[1] = GCSPRINTF("%"PRId64, info->max_memkb);
@@ -901,14 +911,16 @@ static int hvm_build_set_params(xc_interface *handle, uint32_t domid,
                                 libxl_domain_build_info *info,
                                 int store_evtchn, unsigned long *store_mfn,
                                 int console_evtchn, unsigned long *console_mfn,
-                                domid_t store_domid, domid_t console_domid)
+                                domid_t store_domid, domid_t console_domid,
+                                bool forked_vm)
 {
     struct hvm_info_table *va_hvm;
     uint8_t *va_map, sum;
     uint64_t str_mfn, cons_mfn;
     int i;
 
-    if (info->type == LIBXL_DOMAIN_TYPE_HVM) {
+    if ( info->type == LIBXL_DOMAIN_TYPE_HVM && !forked_vm )
+    {
         va_map = xc_map_foreign_range(handle, domid,
                                       XC_PAGE_SIZE, PROT_READ | PROT_WRITE,
                                       HVM_INFO_PFN);
@@ -1224,6 +1236,23 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     struct xc_dom_image *dom = NULL;
     bool device_model = info->type == LIBXL_DOMAIN_TYPE_HVM ? true : false;
 
+    if ( state->forked_vm )
+    {
+        rc = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
+                                  &state->store_mfn, state->console_port,
+                                  &state->console_mfn, state->store_domid,
+                                  state->console_domid, state->forked_vm);
+
+        if ( rc )
+            return rc;
+
+        return xc_dom_gnttab_seed(ctx->xch, domid, true,
+                                  state->console_mfn,
+                                  state->store_mfn,
+                                  state->console_domid,
+                                  state->store_domid);
+    }
+
     xc_dom_loginit(ctx->xch);
 
     /*
@@ -1348,7 +1377,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     rc = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
                                &state->store_mfn, state->console_port,
                                &state->console_mfn, state->store_domid,
-                               state->console_domid);
+                               state->console_domid, false);
     if (rc != 0) {
         LOG(ERROR, "hvm build set params failed");
         goto out;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index b5adbfe4b7..ea6fe133a5 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1360,6 +1360,7 @@ typedef struct {
 
     char *saved_state;
     int dm_monitor_fd;
+    bool forked_vm;
 
     libxl__file_reference pv_kernel;
     libxl__file_reference pv_ramdisk;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 7921950f6a..7c4c4057a9 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -956,6 +956,7 @@ libxl_domain_config = Struct("domain_config", [
     ("on_watchdog", libxl_action_on_shutdown),
     ("on_crash", libxl_action_on_shutdown),
     ("on_soft_reset", libxl_action_on_shutdown),
+    ("dm_restore_file", string, {'const': True}),
     ], dir=DIR_IN)
 
 libxl_diskinfo = Struct("diskinfo", [
diff --git a/tools/xl/xl.h b/tools/xl/xl.h
index 60bdad8ffb..9bdad6526e 100644
--- a/tools/xl/xl.h
+++ b/tools/xl/xl.h
@@ -31,6 +31,7 @@ struct cmd_spec {
 };
 
 struct domain_create {
+    uint32_t ddomid; /* fork launch dm for this domid */
     int debug;
     int daemonize;
     int monitor; /* handle guest reboots etc */
@@ -45,6 +46,7 @@ struct domain_create {
     const char *config_file;
     char *extra_config; /* extra config string */
     const char *restore_file;
+    const char *dm_restore_file;
     char *colo_proxy_script;
     bool userspace_colo_proxy;
     int migrate_fd; /* -1 means none */
@@ -127,6 +129,9 @@ int main_pciassignable_remove(int argc, char **argv);
 int main_pciassignable_list(int argc, char **argv);
 #ifndef LIBXL_HAVE_NO_SUSPEND_RESUME
 int main_restore(int argc, char **argv);
+int main_fork_vm(int argc, char **argv);
+int main_fork_launch_dm(int argc, char **argv);
+int main_fork_reset(int argc, char **argv);
 int main_migrate_receive(int argc, char **argv);
 int main_save(int argc, char **argv);
 int main_migrate(int argc, char **argv);
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 5baa6023aa..94217e4ed4 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -180,6 +180,28 @@ struct cmd_spec cmd_table[] = {
       "-V, --vncviewer          Connect to the VNC display after the domain is created.\n"
       "-A, --vncviewer-autopass Pass VNC password to viewer via stdin."
     },
+    { "fork-vm",
+      &main_fork_vm, 0, 1,
+      "Fork a domain from the running parent domid",
+      "[options] <ParentDomid>",
+      "-h                       Print this help.\n"
+      "-d                       Enable debug messages.\n"
+    },
+    { "fork-launch-dm",
+      &main_fork_launch_dm, 0, 1,
+      "Launch the device model for a forked VM",
+      "[options] <ConfigFile> <DmRestoreFile> <Domid>",
+      "-h                       Print this help.\n"
+      "-p                       Do not unpause domain after restoring it.\n"
+      "-d                       Enable debug messages.\n"
+    },
+    { "fork-reset",
+      &main_fork_reset, 0, 1,
+      "Launch the device model for a forked VM",
+      "[options] <Domid>",
+      "-h                       Print this help.\n"
+      "-d                       Enable debug messages.\n"
+    },
     { "migrate-receive",
       &main_migrate_receive, 0, 1,
       "Restore a domain from a saved state",
diff --git a/tools/xl/xl_saverestore.c b/tools/xl/xl_saverestore.c
index 9be033fe65..c1dd74f33e 100644
--- a/tools/xl/xl_saverestore.c
+++ b/tools/xl/xl_saverestore.c
@@ -229,6 +229,102 @@ int main_restore(int argc, char **argv)
     return EXIT_SUCCESS;
 }
 
+int main_fork_vm(int argc, char **argv)
+{
+    int debug = 0;
+    uint32_t pdomid = 0, domid = INVALID_DOMID;
+    int opt;
+
+    SWITCH_FOREACH_OPT(opt, "d", NULL, "fork-vm", 1) {
+    case 'd':
+        debug = 1;
+        break;
+    }
+
+    if (argc-optind == 1) {
+        pdomid = atoi(argv[optind]);
+    } else {
+        help("fork-vm");
+        return EXIT_FAILURE;
+    }
+
+    if (libxl_domain_fork_vm(ctx, pdomid, &domid) || domid == INVALID_DOMID)
+        return EXIT_FAILURE;
+
+    fprintf(stderr, "VM fork created with domid: %u\n", domid);
+    return EXIT_SUCCESS;
+}
+
+int main_fork_launch_dm(int argc, char **argv)
+{
+    const char *config_file = NULL;
+    const char *dm_restore_file = NULL;
+    struct domain_create dom_info;
+    int paused = 0, debug = 0;
+    uint32_t ddomid = 0;
+    int opt, rc;
+
+    SWITCH_FOREACH_OPT(opt, "pd", NULL, "fork-launch-dm", 1) {
+    case 'p':
+        paused = 1;
+        break;
+    case 'd':
+        debug = 1;
+        break;
+    }
+
+    if (argc-optind == 3) {
+        config_file = argv[optind];
+        dm_restore_file = argv[optind + 1];
+        ddomid = atoi(argv[optind + 2]);
+    } else {
+        help("fork-launch-dm");
+        return EXIT_FAILURE;
+    }
+
+    memset(&dom_info, 0, sizeof(dom_info));
+    dom_info.ddomid = ddomid;
+    dom_info.dm_restore_file = dm_restore_file;
+    dom_info.debug = debug;
+    dom_info.paused = paused;
+    dom_info.config_file = config_file;
+    dom_info.migrate_fd = -1;
+    dom_info.send_back_fd = -1;
+
+    rc = create_domain(&dom_info);
+    if (rc < 0)
+        return EXIT_FAILURE;
+
+    return EXIT_SUCCESS;
+}
+
+int main_fork_reset(int argc, char **argv)
+{
+    int debug = 0;
+    uint32_t domid = 0;
+    int opt, rc;
+
+    SWITCH_FOREACH_OPT(opt, "d", NULL, "fork-reset", 1)
+    {
+    case 'd':
+        debug = 1;
+        break;
+    }
+
+    if (argc-optind == 1) {
+        domid = atoi(argv[optind]);
+    } else {
+        help("fork-reset");
+        return EXIT_FAILURE;
+    }
+
+    rc = libxl_domain_fork_reset(ctx, domid);
+    if (rc < 0)
+        return EXIT_FAILURE;
+
+    return EXIT_SUCCESS;
+}
+
 int main_save(int argc, char **argv)
 {
     uint32_t domid;
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index e520b1da79..d9cb19c599 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -645,6 +645,7 @@ int create_domain(struct domain_create *dom_info)
 
     libxl_domain_config d_config;
 
+    uint32_t ddomid = dom_info->ddomid; // launch dm for this domain iff set
     int debug = dom_info->debug;
     int daemonize = dom_info->daemonize;
     int monitor = dom_info->monitor;
@@ -655,6 +656,7 @@ int create_domain(struct domain_create *dom_info)
     const char *restore_file = dom_info->restore_file;
     const char *config_source = NULL;
     const char *restore_source = NULL;
+    const char *dm_restore_file = dom_info->dm_restore_file;
     int migrate_fd = dom_info->migrate_fd;
     bool config_in_json;
 
@@ -923,6 +925,12 @@ start:
          * restore/migrate-receive it again.
          */
         restoring = 0;
+    } else if ( ddomid ) {
+        d_config.dm_restore_file = dm_restore_file;
+        ret = libxl_domain_fork_launch_dm(ctx, &d_config, ddomid,
+                                          autoconnect_console_how);
+        domid = ddomid;
+        ddomid = INVALID_DOMID;
     } else if (domid_soft_reset != INVALID_DOMID) {
         /* Do soft reset. */
         ret = libxl_domain_soft_reset(ctx, &d_config, domid_soft_reset,
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 11/20] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 11/20] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool Tamas K Lengyel
@ 2019-12-18 21:29   ` Julien Grall
  2019-12-18 22:19     ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Julien Grall @ 2019-12-18 21:29 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel
  Cc: Tamas K Lengyel, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Stefano Stabellini, Jan Beulich,
	Roger Pau Monné

Hi Tamas,

On 18/12/2019 19:40, Tamas K Lengyel wrote:
> MEM_SHARING_DESTROY_GFN is used on the 'flags' bitfield during unsharing.
> However, the bitfield is not used for anything else, so just convert it to a
> bool instead.
> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> ---
>   xen/arch/x86/mm/mem_sharing.c     | 7 +++----
>   xen/arch/x86/mm/p2m.c             | 1 +
>   xen/common/memory.c               | 2 +-
>   xen/include/asm-x86/mem_sharing.h | 5 ++---
>   4 files changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> index fc1d8be1eb..6e81e1a895 100644
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -1175,7 +1175,7 @@ err_out:
>    */
>   int __mem_sharing_unshare_page(struct domain *d,
>                                  unsigned long gfn,
> -                               uint16_t flags)
> +                               bool destroy)
>   {
>       p2m_type_t p2mt;
>       mfn_t mfn;
> @@ -1231,7 +1231,7 @@ int __mem_sharing_unshare_page(struct domain *d,
>        * If the GFN is getting destroyed drop the references to MFN
>        * (possibly freeing the page), and exit early.
>        */
> -    if ( flags & MEM_SHARING_DESTROY_GFN )
> +    if ( destroy )
>       {
>           if ( !last_gfn )
>               mem_sharing_gfn_destroy(page, d, gfn_info);
> @@ -1321,8 +1321,7 @@ int relinquish_shared_pages(struct domain *d)
>           if ( mfn_valid(mfn) && p2m_is_shared(t) )
>           {
>               /* Does not fail with ENOMEM given the DESTROY flag */
> -            BUG_ON(__mem_sharing_unshare_page(d, gfn,
> -                   MEM_SHARING_DESTROY_GFN));
> +            BUG_ON(__mem_sharing_unshare_page(d, gfn, true));
>               /*
>                * Clear out the p2m entry so no one else may try to
>                * unshare.  Must succeed: we just read the old entry and
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index baea632acc..53ea44fe3c 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -517,6 +517,7 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
>            */
>           if ( mem_sharing_unshare_page(p2m->domain, gfn_l) < 0 )
>               mem_sharing_notify_enomem(p2m->domain, gfn_l, false);
> +

This line looks spurious.

>           mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
>       }
>   
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 309e872edf..c7d2bac452 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -352,7 +352,7 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
>            * might be the only one using this shared page, and we need to
>            * trigger proper cleanup. Once done, this is like any other page.
>            */
> -        rc = mem_sharing_unshare_page(d, gmfn, 0);
> +        rc = mem_sharing_unshare_page(d, gmfn);

AFAICT, this patch does not reduce the number of parameters for 
mem_sharing_unshare_page(). Did you intend to make this change in 
another patch?

>           if ( rc )
>           {
>               mem_sharing_notify_enomem(d, gmfn, false);
> diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h
> index 89cdaccea0..4b982a4803 100644
> --- a/xen/include/asm-x86/mem_sharing.h
> +++ b/xen/include/asm-x86/mem_sharing.h
> @@ -76,17 +76,16 @@ struct page_sharing_info
>   unsigned int mem_sharing_get_nr_saved_mfns(void);
>   unsigned int mem_sharing_get_nr_shared_mfns(void);
>   
> -#define MEM_SHARING_DESTROY_GFN       (1<<1)
>   /* Only fails with -ENOMEM. Enforce it with a BUG_ON wrapper. */
>   int __mem_sharing_unshare_page(struct domain *d,
>                                  unsigned long gfn,
> -                               uint16_t flags);
> +                               bool destroy);
>   
>   static inline
>   int mem_sharing_unshare_page(struct domain *d,
>                                unsigned long gfn)
>   {
> -    int rc = __mem_sharing_unshare_page(d, gfn, 0);
> +    int rc = __mem_sharing_unshare_page(d, gfn, false);
>       BUG_ON(rc && (rc != -ENOMEM));
>       return rc;
>   }
> 

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork Tamas K Lengyel
@ 2019-12-18 22:00   ` Julien Grall
  2019-12-18 22:33     ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Julien Grall @ 2019-12-18 22:00 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel
  Cc: Tamas K Lengyel, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Stefano Stabellini, Jan Beulich,
	Roger Pau Monné

Hi Tamas,

On 18/12/2019 19:40, Tamas K Lengyel wrote:
> Implement hypercall that allows a fork to shed all memory that got allocated
> for it during its execution and re-load its vCPU context from the parent VM.
> This allows the forked VM to reset into the same state the parent VM is in a
> faster way then creating a new fork would be. Measurements show about a 2x
> speedup during normal fuzzing operations. Performance may vary depending how
> much memory got allocated for the forked VM. If it has been completely
> deduplicated from the parent VM then creating a new fork would likely be more
> performant.
> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> ---
>   xen/arch/x86/mm/mem_sharing.c | 105 ++++++++++++++++++++++++++++++++++
>   xen/include/public/memory.h   |   1 +
>   2 files changed, 106 insertions(+)
> 
> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> index e93ad2ec5a..4735a334b9 100644
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
>       return 0;
>   }
>   
> +struct gfn_free;
> +struct gfn_free {
> +    struct gfn_free *next;
> +    struct page_info *page;
> +    gfn_t gfn;
> +};
> +
> +static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
> +{
> +    int rc;
> +
> +    struct p2m_domain* p2m = p2m_get_hostp2m(cd);
> +    struct gfn_free *list = NULL;
> +    struct page_info *page;
> +
> +    page_list_for_each(page, &cd->page_list)

AFAICT, your domain is not paused, so it would be possible to have page 
added/remove in that list behind your back.

You also have multiple loop on the page_list in this function. Given the 
number of page_list can be quite big, this is a call for hogging the 
pCPU and an RCU lock on the domain vCPU running this call.

> +    {
> +        mfn_t mfn = page_to_mfn(page);
> +        if ( mfn_valid(mfn) )
> +        {
> +            p2m_type_t p2mt;
> +            p2m_access_t p2ma;
> +            gfn_t gfn = mfn_to_gfn(cd, mfn);
> +            mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
> +                                        0, NULL, false);
> +            if ( p2m_is_ram(p2mt) )
> +            {
> +                struct gfn_free *gfn_free;
> +                if ( !get_page(page, cd) )
> +                    goto err_reset;
> +
> +                /*
> +                 * We can't free the page while iterating over the page_list
> +                 * so we build a separate list to loop over.
> +                 *
> +                 * We want to iterate over the page_list instead of checking
> +                 * gfn from 0 to max_gfn because this is ~10x faster.
> +                 */
> +                gfn_free = xmalloc(struct gfn_free);

If I did the math right, for a 4G guest this will require at ~24MB of 
memory. Actually, is it really necessary to do the allocation for a 
short period of time?

What are you trying to achieve by iterating twice on the GFN? Wouldn't 
it be easier to pause the domain?

> +                if ( !gfn_free )
> +                    goto err_reset;
> +
> +                gfn_free->gfn = gfn;
> +                gfn_free->page = page;
> +                gfn_free->next = list;
> +                list = gfn_free;
> +            }
> +        }
> +    }
> +
> +    while ( list )
> +    {
> +        struct gfn_free *next = list->next;
> +
> +        rc = p2m->set_entry(p2m, list->gfn, INVALID_MFN, PAGE_ORDER_4K,
> +                            p2m_invalid, p2m_access_rwx, -1);
> +        put_page_alloc_ref(list->page);
> +        put_page(list->page);
> +
> +        xfree(list);
> +        list = next;
> +
> +        ASSERT(!rc);
> +    }
> +
> +    if ( (rc = fork_hvm(d, cd)) )
> +        return rc;
> +
> + err_reset:
> +    while ( list )
> +    {
> +        struct gfn_free *next = list->next;
> +
> +        put_page(list->page);
> +        xfree(list);
> +        list = next;
> +    }
> +
> +    return 0;
> +}
> +

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 11/20] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool
  2019-12-18 21:29   ` Julien Grall
@ 2019-12-18 22:19     ` Tamas K Lengyel
  0 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 22:19 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jan Beulich, Xen-devel, Roger Pau Monné

On Wed, Dec 18, 2019 at 2:29 PM Julien Grall <julien@xen.org> wrote:
>
> Hi Tamas,
>
> On 18/12/2019 19:40, Tamas K Lengyel wrote:
> > MEM_SHARING_DESTROY_GFN is used on the 'flags' bitfield during unsharing.
> > However, the bitfield is not used for anything else, so just convert it to a
> > bool instead.
> >
> > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> > ---
> >   xen/arch/x86/mm/mem_sharing.c     | 7 +++----
> >   xen/arch/x86/mm/p2m.c             | 1 +
> >   xen/common/memory.c               | 2 +-
> >   xen/include/asm-x86/mem_sharing.h | 5 ++---
> >   4 files changed, 7 insertions(+), 8 deletions(-)
> >
> > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > index fc1d8be1eb..6e81e1a895 100644
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -1175,7 +1175,7 @@ err_out:
> >    */
> >   int __mem_sharing_unshare_page(struct domain *d,
> >                                  unsigned long gfn,
> > -                               uint16_t flags)
> > +                               bool destroy)
> >   {
> >       p2m_type_t p2mt;
> >       mfn_t mfn;
> > @@ -1231,7 +1231,7 @@ int __mem_sharing_unshare_page(struct domain *d,
> >        * If the GFN is getting destroyed drop the references to MFN
> >        * (possibly freeing the page), and exit early.
> >        */
> > -    if ( flags & MEM_SHARING_DESTROY_GFN )
> > +    if ( destroy )
> >       {
> >           if ( !last_gfn )
> >               mem_sharing_gfn_destroy(page, d, gfn_info);
> > @@ -1321,8 +1321,7 @@ int relinquish_shared_pages(struct domain *d)
> >           if ( mfn_valid(mfn) && p2m_is_shared(t) )
> >           {
> >               /* Does not fail with ENOMEM given the DESTROY flag */
> > -            BUG_ON(__mem_sharing_unshare_page(d, gfn,
> > -                   MEM_SHARING_DESTROY_GFN));
> > +            BUG_ON(__mem_sharing_unshare_page(d, gfn, true));
> >               /*
> >                * Clear out the p2m entry so no one else may try to
> >                * unshare.  Must succeed: we just read the old entry and
> > diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> > index baea632acc..53ea44fe3c 100644
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -517,6 +517,7 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
> >            */
> >           if ( mem_sharing_unshare_page(p2m->domain, gfn_l) < 0 )
> >               mem_sharing_notify_enomem(p2m->domain, gfn_l, false);
> > +
>
> This line looks spurious.

Yeap.

>
> >           mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
> >       }
> >
> > diff --git a/xen/common/memory.c b/xen/common/memory.c
> > index 309e872edf..c7d2bac452 100644
> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -352,7 +352,7 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
> >            * might be the only one using this shared page, and we need to
> >            * trigger proper cleanup. Once done, this is like any other page.
> >            */
> > -        rc = mem_sharing_unshare_page(d, gmfn, 0);
> > +        rc = mem_sharing_unshare_page(d, gmfn);
>
> AFAICT, this patch does not reduce the number of parameters for
> mem_sharing_unshare_page(). Did you intend to make this change in
> another patch?

Ah yea, it should have been dropped in patch 6 of the series.

>
> >           if ( rc )
> >           {
> >               mem_sharing_notify_enomem(d, gmfn, false);
> > diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h
> > index 89cdaccea0..4b982a4803 100644
> > --- a/xen/include/asm-x86/mem_sharing.h
> > +++ b/xen/include/asm-x86/mem_sharing.h
> > @@ -76,17 +76,16 @@ struct page_sharing_info
> >   unsigned int mem_sharing_get_nr_saved_mfns(void);
> >   unsigned int mem_sharing_get_nr_shared_mfns(void);
> >
> > -#define MEM_SHARING_DESTROY_GFN       (1<<1)
> >   /* Only fails with -ENOMEM. Enforce it with a BUG_ON wrapper. */
> >   int __mem_sharing_unshare_page(struct domain *d,
> >                                  unsigned long gfn,
> > -                               uint16_t flags);
> > +                               bool destroy);
> >
> >   static inline
> >   int mem_sharing_unshare_page(struct domain *d,
> >                                unsigned long gfn)
> >   {
> > -    int rc = __mem_sharing_unshare_page(d, gfn, 0);
> > +    int rc = __mem_sharing_unshare_page(d, gfn, false);
> >       BUG_ON(rc && (rc != -ENOMEM));
> >       return rc;
> >   }
> >
>
> Cheers,

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
  2019-12-18 22:00   ` Julien Grall
@ 2019-12-18 22:33     ` Tamas K Lengyel
  2019-12-18 23:01       ` Julien Grall
  0 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-18 22:33 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jan Beulich, Xen-devel, Roger Pau Monné

On Wed, Dec 18, 2019 at 3:00 PM Julien Grall <julien@xen.org> wrote:
>
> Hi Tamas,
>
> On 18/12/2019 19:40, Tamas K Lengyel wrote:
> > Implement hypercall that allows a fork to shed all memory that got allocated
> > for it during its execution and re-load its vCPU context from the parent VM.
> > This allows the forked VM to reset into the same state the parent VM is in a
> > faster way then creating a new fork would be. Measurements show about a 2x
> > speedup during normal fuzzing operations. Performance may vary depending how
> > much memory got allocated for the forked VM. If it has been completely
> > deduplicated from the parent VM then creating a new fork would likely be more
> > performant.
> >
> > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> > ---
> >   xen/arch/x86/mm/mem_sharing.c | 105 ++++++++++++++++++++++++++++++++++
> >   xen/include/public/memory.h   |   1 +
> >   2 files changed, 106 insertions(+)
> >
> > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > index e93ad2ec5a..4735a334b9 100644
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
> >       return 0;
> >   }
> >
> > +struct gfn_free;
> > +struct gfn_free {
> > +    struct gfn_free *next;
> > +    struct page_info *page;
> > +    gfn_t gfn;
> > +};
> > +
> > +static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
> > +{
> > +    int rc;
> > +
> > +    struct p2m_domain* p2m = p2m_get_hostp2m(cd);
> > +    struct gfn_free *list = NULL;
> > +    struct page_info *page;
> > +
> > +    page_list_for_each(page, &cd->page_list)
>
> AFAICT, your domain is not paused, so it would be possible to have page
> added/remove in that list behind your back.

Well, it's not that it's not paused, it's just that I haven't added a
sanity check to make sure it is. The toolstack can (and should) pause
it, so that sanity check would be warranted.

>
> You also have multiple loop on the page_list in this function. Given the
> number of page_list can be quite big, this is a call for hogging the
> pCPU and an RCU lock on the domain vCPU running this call.

There is just one loop over page_list itself, the second loop is on
the internal list that is being built here which will be a subset. The
list itself in fact should be small (in our tests usually <100).
Granted the list can grow larger, but in those cases its likely better
to just discard the fork and create a new one. So in my opinion adding
a hypercall continuation to this not needed.

>
> > +    {
> > +        mfn_t mfn = page_to_mfn(page);
> > +        if ( mfn_valid(mfn) )
> > +        {
> > +            p2m_type_t p2mt;
> > +            p2m_access_t p2ma;
> > +            gfn_t gfn = mfn_to_gfn(cd, mfn);
> > +            mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
> > +                                        0, NULL, false);
> > +            if ( p2m_is_ram(p2mt) )
> > +            {
> > +                struct gfn_free *gfn_free;
> > +                if ( !get_page(page, cd) )
> > +                    goto err_reset;
> > +
> > +                /*
> > +                 * We can't free the page while iterating over the page_list
> > +                 * so we build a separate list to loop over.
> > +                 *
> > +                 * We want to iterate over the page_list instead of checking
> > +                 * gfn from 0 to max_gfn because this is ~10x faster.
> > +                 */
> > +                gfn_free = xmalloc(struct gfn_free);
>
> If I did the math right, for a 4G guest this will require at ~24MB of
> memory. Actually, is it really necessary to do the allocation for a
> short period of time?

If you have a fully deduplicated fork then you should not be using
this function to begin with. You get better performance my throwing
that one away and creating a new one. As for using xmalloc here, I'm
not sure what other way I have to build a list of pages that need to
be freed. I can't free the page itself while I'm iterating on
page_list (that I'm aware of). The only other option available is
calling __get_gfn_type_access with gfn=0..max_gfn which will be
extremely slow because you have to loop over a lot of holes.

>
> What are you trying to achieve by iterating twice on the GFN? Wouldn't
> it be easier to pause the domain?

I'm not sure what you mean, where do you see me iterating twice on the
gfn? And what does pausing have to do with it?

Than

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
  2019-12-18 22:33     ` Tamas K Lengyel
@ 2019-12-18 23:01       ` Julien Grall
  2019-12-19  0:15         ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Julien Grall @ 2019-12-18 23:01 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jan Beulich, Xen-devel, Roger Pau Monné

Hi,

On 18/12/2019 22:33, Tamas K Lengyel wrote:
> On Wed, Dec 18, 2019 at 3:00 PM Julien Grall <julien@xen.org> wrote:
>>
>> Hi Tamas,
>>
>> On 18/12/2019 19:40, Tamas K Lengyel wrote:
>>> Implement hypercall that allows a fork to shed all memory that got allocated
>>> for it during its execution and re-load its vCPU context from the parent VM.
>>> This allows the forked VM to reset into the same state the parent VM is in a
>>> faster way then creating a new fork would be. Measurements show about a 2x
>>> speedup during normal fuzzing operations. Performance may vary depending how
>>> much memory got allocated for the forked VM. If it has been completely
>>> deduplicated from the parent VM then creating a new fork would likely be more
>>> performant.
>>>
>>> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
>>> ---
>>>    xen/arch/x86/mm/mem_sharing.c | 105 ++++++++++++++++++++++++++++++++++
>>>    xen/include/public/memory.h   |   1 +
>>>    2 files changed, 106 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
>>> index e93ad2ec5a..4735a334b9 100644
>>> --- a/xen/arch/x86/mm/mem_sharing.c
>>> +++ b/xen/arch/x86/mm/mem_sharing.c
>>> @@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
>>>        return 0;
>>>    }
>>>
>>> +struct gfn_free;
>>> +struct gfn_free {
>>> +    struct gfn_free *next;
>>> +    struct page_info *page;
>>> +    gfn_t gfn;
>>> +};
>>> +
>>> +static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
>>> +{
>>> +    int rc;
>>> +
>>> +    struct p2m_domain* p2m = p2m_get_hostp2m(cd);
>>> +    struct gfn_free *list = NULL;
>>> +    struct page_info *page;
>>> +
>>> +    page_list_for_each(page, &cd->page_list)
>>
>> AFAICT, your domain is not paused, so it would be possible to have page
>> added/remove in that list behind your back.
> 
> Well, it's not that it's not paused, it's just that I haven't added a
> sanity check to make sure it is. The toolstack can (and should) pause
> it, so that sanity check would be warranted.
I have only read the hypervisor part, so I didn't know what the 
toolstack has done.

> 
>>
>> You also have multiple loop on the page_list in this function. Given the
>> number of page_list can be quite big, this is a call for hogging the
>> pCPU and an RCU lock on the domain vCPU running this call.
> 
> There is just one loop over page_list itself, the second loop is on
> the internal list that is being built here which will be a subset. The
> list itself in fact should be small (in our tests usually <100).

For a first, nothing in this function tells me that there will be only 
100 pages. But then, I don't think this is right to implement your 
hypercall based only the  "normal" scenario. You should also think about 
the "worst" case scenario.

In this case the worst case scenario is have hundreds of page in page_list.

> Granted the list can grow larger, but in those cases its likely better
> to just discard the fork and create a new one. So in my opinion adding
> a hypercall continuation to this not needed

How would the caller know it? What would happen if the caller ends up to 
call this with a growing list.

> 
>>
>>> +    {
>>> +        mfn_t mfn = page_to_mfn(page);
>>> +        if ( mfn_valid(mfn) )
>>> +        {
>>> +            p2m_type_t p2mt;
>>> +            p2m_access_t p2ma;
>>> +            gfn_t gfn = mfn_to_gfn(cd, mfn);
>>> +            mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
>>> +                                        0, NULL, false);
>>> +            if ( p2m_is_ram(p2mt) )
>>> +            {
>>> +                struct gfn_free *gfn_free;
>>> +                if ( !get_page(page, cd) )
>>> +                    goto err_reset;
>>> +
>>> +                /*
>>> +                 * We can't free the page while iterating over the page_list
>>> +                 * so we build a separate list to loop over.
>>> +                 *
>>> +                 * We want to iterate over the page_list instead of checking
>>> +                 * gfn from 0 to max_gfn because this is ~10x faster.
>>> +                 */
>>> +                gfn_free = xmalloc(struct gfn_free);
>>
>> If I did the math right, for a 4G guest this will require at ~24MB of
>> memory. Actually, is it really necessary to do the allocation for a
>> short period of time?
> 
> If you have a fully deduplicated fork then you should not be using
> this function to begin with. You get better performance my throwing
> that one away and creating a new one.

How a user knows when/how this can be called? But then, as said above, 
this may be called by mistake... So I still think you need to be prepare 
for the worst case.

> As for using xmalloc here, I'm
> not sure what other way I have to build a list of pages that need to
> be freed. I can't free the page itself while I'm iterating on
> page_list (that I'm aware of). The only other option available is
> calling __get_gfn_type_access with gfn=0..max_gfn which will be
> extremely slow because you have to loop over a lot of holes.
You can use page_list_for_each_safe(). This is already used by function 
such as relinquish_memory().

> 
>>
>> What are you trying to achieve by iterating twice on the GFN? Wouldn't
>> it be easier to pause the domain?
> 
> I'm not sure what you mean, where do you see me iterating twice on the
> gfn? And what does pausing have to do with it?

It was unclear why you decided to use a double loop here. You explained 
it above, so this can be discarded.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
  2019-12-18 23:01       ` Julien Grall
@ 2019-12-19  0:15         ` Tamas K Lengyel
  2019-12-19  7:45           ` Julien Grall
  2019-12-19 11:06           ` Jan Beulich
  0 siblings, 2 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-19  0:15 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jan Beulich, Xen-devel, Roger Pau Monné

On Wed, Dec 18, 2019 at 4:02 PM Julien Grall <julien@xen.org> wrote:
>
> Hi,
>
> On 18/12/2019 22:33, Tamas K Lengyel wrote:
> > On Wed, Dec 18, 2019 at 3:00 PM Julien Grall <julien@xen.org> wrote:
> >>
> >> Hi Tamas,
> >>
> >> On 18/12/2019 19:40, Tamas K Lengyel wrote:
> >>> Implement hypercall that allows a fork to shed all memory that got allocated
> >>> for it during its execution and re-load its vCPU context from the parent VM.
> >>> This allows the forked VM to reset into the same state the parent VM is in a
> >>> faster way then creating a new fork would be. Measurements show about a 2x
> >>> speedup during normal fuzzing operations. Performance may vary depending how
> >>> much memory got allocated for the forked VM. If it has been completely
> >>> deduplicated from the parent VM then creating a new fork would likely be more
> >>> performant.
> >>>
> >>> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
> >>> ---
> >>>    xen/arch/x86/mm/mem_sharing.c | 105 ++++++++++++++++++++++++++++++++++
> >>>    xen/include/public/memory.h   |   1 +
> >>>    2 files changed, 106 insertions(+)
> >>>
> >>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> >>> index e93ad2ec5a..4735a334b9 100644
> >>> --- a/xen/arch/x86/mm/mem_sharing.c
> >>> +++ b/xen/arch/x86/mm/mem_sharing.c
> >>> @@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
> >>>        return 0;
> >>>    }
> >>>
> >>> +struct gfn_free;
> >>> +struct gfn_free {
> >>> +    struct gfn_free *next;
> >>> +    struct page_info *page;
> >>> +    gfn_t gfn;
> >>> +};
> >>> +
> >>> +static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
> >>> +{
> >>> +    int rc;
> >>> +
> >>> +    struct p2m_domain* p2m = p2m_get_hostp2m(cd);
> >>> +    struct gfn_free *list = NULL;
> >>> +    struct page_info *page;
> >>> +
> >>> +    page_list_for_each(page, &cd->page_list)
> >>
> >> AFAICT, your domain is not paused, so it would be possible to have page
> >> added/remove in that list behind your back.
> >
> > Well, it's not that it's not paused, it's just that I haven't added a
> > sanity check to make sure it is. The toolstack can (and should) pause
> > it, so that sanity check would be warranted.
> I have only read the hypervisor part, so I didn't know what the
> toolstack has done.

I've added the same enforced VM paused operation that is present for
the fork hypercall handler.

>
> >
> >>
> >> You also have multiple loop on the page_list in this function. Given the
> >> number of page_list can be quite big, this is a call for hogging the
> >> pCPU and an RCU lock on the domain vCPU running this call.
> >
> > There is just one loop over page_list itself, the second loop is on
> > the internal list that is being built here which will be a subset. The
> > list itself in fact should be small (in our tests usually <100).
>
> For a first, nothing in this function tells me that there will be only
> 100 pages. But then, I don't think this is right to implement your
> hypercall based only the  "normal" scenario. You should also think about
> the "worst" case scenario.
>
> In this case the worst case scenario is have hundreds of page in page_list.

Well, this is only an experimental system that's completely disabled
by default. Making the assumption that people who make use of it will
know what they are doing I think is fair.

>
> > Granted the list can grow larger, but in those cases its likely better
> > to just discard the fork and create a new one. So in my opinion adding
> > a hypercall continuation to this not needed
>
> How would the caller know it? What would happen if the caller ends up to
> call this with a growing list.

The caller knows by virtue of knowing how long the VM was executed
for. In the usecase this is targeted at the VM was executing only for
a couple seconds at most. Usually much less then that (we get about
~80 resets/s with AFL). During that time its extremely unlikely you
get more then a ~100 pages deduplicated (that is, written to). But
even if there are more pages, it just means the hypercall might take a
bit longer to run for that iteration. I don't see any issue with not
breaking up this hypercall with continuation even under the worst case
situation though. But if others feel that strongly as well about
having to have continuation for this I don't really mind adding it.

>
> >
> >>
> >>> +    {
> >>> +        mfn_t mfn = page_to_mfn(page);
> >>> +        if ( mfn_valid(mfn) )
> >>> +        {
> >>> +            p2m_type_t p2mt;
> >>> +            p2m_access_t p2ma;
> >>> +            gfn_t gfn = mfn_to_gfn(cd, mfn);
> >>> +            mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
> >>> +                                        0, NULL, false);
> >>> +            if ( p2m_is_ram(p2mt) )
> >>> +            {
> >>> +                struct gfn_free *gfn_free;
> >>> +                if ( !get_page(page, cd) )
> >>> +                    goto err_reset;
> >>> +
> >>> +                /*
> >>> +                 * We can't free the page while iterating over the page_list
> >>> +                 * so we build a separate list to loop over.
> >>> +                 *
> >>> +                 * We want to iterate over the page_list instead of checking
> >>> +                 * gfn from 0 to max_gfn because this is ~10x faster.
> >>> +                 */
> >>> +                gfn_free = xmalloc(struct gfn_free);
> >>
> >> If I did the math right, for a 4G guest this will require at ~24MB of
> >> memory. Actually, is it really necessary to do the allocation for a
> >> short period of time?
> >
> > If you have a fully deduplicated fork then you should not be using
> > this function to begin with. You get better performance my throwing
> > that one away and creating a new one.
>
> How a user knows when/how this can be called? But then, as said above,
> this may be called by mistake... So I still think you need to be prepare
> for the worst case.

See my answer above.

>
> > As for using xmalloc here, I'm
> > not sure what other way I have to build a list of pages that need to
> > be freed. I can't free the page itself while I'm iterating on
> > page_list (that I'm aware of). The only other option available is
> > calling __get_gfn_type_access with gfn=0..max_gfn which will be
> > extremely slow because you have to loop over a lot of holes.
> You can use page_list_for_each_safe(). This is already used by function
> such as relinquish_memory().

Neat, will check it out! Would certainly simplify things not having to
build a private list.

>
> >
> >>
> >> What are you trying to achieve by iterating twice on the GFN? Wouldn't
> >> it be easier to pause the domain?
> >
> > I'm not sure what you mean, where do you see me iterating twice on the
> > gfn? And what does pausing have to do with it?
>
> It was unclear why you decided to use a double loop here. You explained
> it above, so this can be discarded.

OK, cool.

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
  2019-12-19  0:15         ` Tamas K Lengyel
@ 2019-12-19  7:45           ` Julien Grall
  2019-12-19 16:11             ` Tamas K Lengyel
  2019-12-19 11:06           ` Jan Beulich
  1 sibling, 1 reply; 96+ messages in thread
From: Julien Grall @ 2019-12-19  7:45 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jan Beulich, Xen-devel, Roger Pau Monné

Hi Tamas,

On 19/12/2019 00:15, Tamas K Lengyel wrote:
> On Wed, Dec 18, 2019 at 4:02 PM Julien Grall <julien@xen.org> wrote:
>>
>> Hi,
>>
>> On 18/12/2019 22:33, Tamas K Lengyel wrote:
>>> On Wed, Dec 18, 2019 at 3:00 PM Julien Grall <julien@xen.org> wrote:
>>>>
>>>> Hi Tamas,
>>>>
>>>> On 18/12/2019 19:40, Tamas K Lengyel wrote:
>>>>> Implement hypercall that allows a fork to shed all memory that got allocated
>>>>> for it during its execution and re-load its vCPU context from the parent VM.
>>>>> This allows the forked VM to reset into the same state the parent VM is in a
>>>>> faster way then creating a new fork would be. Measurements show about a 2x
>>>>> speedup during normal fuzzing operations. Performance may vary depending how
>>>>> much memory got allocated for the forked VM. If it has been completely
>>>>> deduplicated from the parent VM then creating a new fork would likely be more
>>>>> performant.
>>>>>
>>>>> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
>>>>> ---
>>>>>     xen/arch/x86/mm/mem_sharing.c | 105 ++++++++++++++++++++++++++++++++++
>>>>>     xen/include/public/memory.h   |   1 +
>>>>>     2 files changed, 106 insertions(+)
>>>>>
>>>>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
>>>>> index e93ad2ec5a..4735a334b9 100644
>>>>> --- a/xen/arch/x86/mm/mem_sharing.c
>>>>> +++ b/xen/arch/x86/mm/mem_sharing.c
>>>>> @@ -1622,6 +1622,87 @@ static int mem_sharing_fork(struct domain *d, struct domain *cd)
>>>>>         return 0;
>>>>>     }
>>>>>
>>>>> +struct gfn_free;
>>>>> +struct gfn_free {
>>>>> +    struct gfn_free *next;
>>>>> +    struct page_info *page;
>>>>> +    gfn_t gfn;
>>>>> +};
>>>>> +
>>>>> +static int mem_sharing_fork_reset(struct domain *d, struct domain *cd)
>>>>> +{
>>>>> +    int rc;
>>>>> +
>>>>> +    struct p2m_domain* p2m = p2m_get_hostp2m(cd);
>>>>> +    struct gfn_free *list = NULL;
>>>>> +    struct page_info *page;
>>>>> +
>>>>> +    page_list_for_each(page, &cd->page_list)
>>>>
>>>> AFAICT, your domain is not paused, so it would be possible to have page
>>>> added/remove in that list behind your back.
>>>
>>> Well, it's not that it's not paused, it's just that I haven't added a
>>> sanity check to make sure it is. The toolstack can (and should) pause
>>> it, so that sanity check would be warranted.
>> I have only read the hypervisor part, so I didn't know what the
>> toolstack has done.
> 
> I've added the same enforced VM paused operation that is present for
> the fork hypercall handler.
> 
>>
>>>
>>>>
>>>> You also have multiple loop on the page_list in this function. Given the
>>>> number of page_list can be quite big, this is a call for hogging the
>>>> pCPU and an RCU lock on the domain vCPU running this call.
>>>
>>> There is just one loop over page_list itself, the second loop is on
>>> the internal list that is being built here which will be a subset. The
>>> list itself in fact should be small (in our tests usually <100).
>>
>> For a first, nothing in this function tells me that there will be only
>> 100 pages. But then, I don't think this is right to implement your
>> hypercall based only the  "normal" scenario. You should also think about
>> the "worst" case scenario.
>>
>> In this case the worst case scenario is have hundreds of page in page_list.
> 
> Well, this is only an experimental system that's completely disabled
> by default. Making the assumption that people who make use of it will
> know what they are doing I think is fair.

I assume that if you submit to upstream this new hypercall then there is 
longer plan to have more people to use it and potentially making 
"stable". If not, then it raises the question why this is pushed upstream...

In any case, all the known assumptions should be documented so they can 
be fixed rather than forgotten until it is rediscovered via an XSA.

> 
>>
>>> Granted the list can grow larger, but in those cases its likely better
>>> to just discard the fork and create a new one. So in my opinion adding
>>> a hypercall continuation to this not needed
>>
>> How would the caller know it? What would happen if the caller ends up to
>> call this with a growing list.
> 
> The caller knows by virtue of knowing how long the VM was executed
> for. In the usecase this is targeted at the VM was executing only for
> a couple seconds at most. Usually much less then that (we get about
> ~80 resets/s with AFL). During that time its extremely unlikely you
> get more then a ~100 pages deduplicated (that is, written to). But
> even if there are more pages, it just means the hypercall might take a
> bit longer to run for that iteration.

I assume if you upstream the code then you want more people to use it 
(otherwise what's the point?). In this case, you will likely have people 
that heard about the feature, wants to test but don't know the internal.

Such users need to know how this can be call safely without reading the 
implementation. In other words, some documentation for your hypercall is 
needed.

> I don't see any issue with not
> breaking up this hypercall with continuation even under the worst case
> situation though.

Xen only supports voluntary preemption, this means that an hypercall can 
only be preempted if there is code for it. Otherwise the preemption will 
mostly only happen when returning to the guest.

In other words, the vCPU executing the hypercall may go past its 
timeslice and prevent other vCPU to run.

There are other problem with long running hypercalls. Anyway, in short, 
if you ever want to get you code supported then you will need the 
hypercall to be broken down.

> But if others feel that strongly as well about
> having to have continuation for this I don't really mind adding it.

I don't think the continuation work is going to be difficult, but if you 
want to delay it, then the minimum is to document such assumptions for 
your users.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 18/20] xen/mem_access: Use __get_gfn_type_access in set_mem_access
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 18/20] xen/mem_access: Use __get_gfn_type_access in set_mem_access Tamas K Lengyel
@ 2019-12-19  7:59   ` Alexandru Stefan ISAILA
  2019-12-19 16:00     ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Alexandru Stefan ISAILA @ 2019-12-19  7:59 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel
  Cc: Petre Ovidiu PIRCALABU, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, Roger Pau Monné



On 18.12.2019 21:40, Tamas K Lengyel wrote:
> Use __get_gfn_type_access instead of p2m->get_entry to trigger page-forking
> when the mem_access permission is being set on a page that has not yet been
> copied over from the parent.
> 
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.comReviewed-by: Alexandru Isaila <aisaila@bitdefender.com>

> ---
>   xen/arch/x86/mm/mem_access.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/mem_access.c b/xen/arch/x86/mm/mem_access.c
> index 320b9fe621..9caf08a5b2 100644
> --- a/xen/arch/x86/mm/mem_access.c
> +++ b/xen/arch/x86/mm/mem_access.c
> @@ -303,11 +303,10 @@ static int set_mem_access(struct domain *d, struct p2m_domain *p2m,
>       ASSERT(!ap2m);
>   #endif
>       {
> -        mfn_t mfn;
>           p2m_access_t _a;
>           p2m_type_t t;
> -
> -        mfn = p2m->get_entry(p2m, gfn, &t, &_a, 0, NULL, NULL);
> +        mfn_t mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &t, &_a,
> +                                          P2M_ALLOC, NULL, false);

Don't you want p2m_query_t to be 0 as it was in the p2m->get_entry() call ?

>           rc = p2m->set_entry(p2m, gfn, mfn, PAGE_ORDER_4K, t, a, -1);
>       }
>   
> 

Alex
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
                   ` (19 preceding siblings ...)
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 20/20] xen/tools: VM forking toolstack side Tamas K Lengyel
@ 2019-12-19  9:48 ` Roger Pau Monné
  2019-12-19 15:58   ` Tamas K Lengyel
  20 siblings, 1 reply; 96+ messages in thread
From: Roger Pau Monné @ 2019-12-19  9:48 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Tamas K Lengyel, Julien Grall, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Anthony PERARD, Stefano Stabellini, Jan Beulich,
	Alexandru Isaila, xen-devel

On Wed, Dec 18, 2019 at 11:40:37AM -0800, Tamas K Lengyel wrote:
> The following series implements VM forking for Intel HVM guests to allow for
> the fast creation of identical VMs without the assosciated high startup costs
> of booting or restoring the VM from a savefile.
> 
> JIRA issue: https://xenproject.atlassian.net/browse/XEN-89
> 
> The main design goal with this series has been to reduce the time of creating
> the VM fork as much as possible. To achieve this the VM forking process is
> split into two steps:
>     1) forking the VM on the hypervisor side;
>     2) starting QEMU to handle the backed for emulated devices.
> 
> Step 1) involves creating a VM using the new "xl fork-vm" command. The
> parent VM is expected to remain paused after forks are created from it (which
> is different then what process forking normally entails). During this forking
               ^ than
> operation the HVM context and VM settings are copied over to the new forked VM.
> This operation is fast and it allows the forked VM to be unpaused and to be
> monitored and accessed via VMI. Note however that without its device model
> running (depending on what is executing in the VM) it is bound to
> misbehave/crash when its trying to access devices that would be emulated by
> QEMU. We anticipate that for certain use-cases this would be an acceptable
> situation, in case for example when fuzzing is performed of code segments that
> don't access such devices.
> 
> Step 2) involves launching QEMU to support the forked VM, which requires the
> QEMU Xen savefile to be generated manually from the parent VM. This can be
> accomplished simply by connecting to its QMP socket and issuing the
> "xen-save-devices-state" command as documented by QEMU:
> https://github.com/qemu/qemu/blob/master/docs/xen-save-devices-state.txt
> Once the QEMU Xen savefile is generated the new "xl fork-launch-dm" command is
> used to launch QEMU and load the specified savefile for it.

IMO having two different commands is confusing for the end user, I
would rather have something like:

xl fork-vm [-d] ...

Where '-d' would prevent forking any user-space emulators. I don't
thinks there's a need for a separate command to fork the underlying
user-space emulators.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
  2019-12-19  0:15         ` Tamas K Lengyel
  2019-12-19  7:45           ` Julien Grall
@ 2019-12-19 11:06           ` Jan Beulich
  2019-12-19 16:02             ` Tamas K Lengyel
  1 sibling, 1 reply; 96+ messages in thread
From: Jan Beulich @ 2019-12-19 11:06 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Xen-devel, Julien Grall, Roger Pau Monné

On 19.12.2019 01:15, Tamas K Lengyel wrote:
> On Wed, Dec 18, 2019 at 4:02 PM Julien Grall <julien@xen.org> wrote:
>> On 18/12/2019 22:33, Tamas K Lengyel wrote:
>>> On Wed, Dec 18, 2019 at 3:00 PM Julien Grall <julien@xen.org> wrote:
>>>> You also have multiple loop on the page_list in this function. Given the
>>>> number of page_list can be quite big, this is a call for hogging the
>>>> pCPU and an RCU lock on the domain vCPU running this call.
>>>
>>> There is just one loop over page_list itself, the second loop is on
>>> the internal list that is being built here which will be a subset. The
>>> list itself in fact should be small (in our tests usually <100).
>>
>> For a first, nothing in this function tells me that there will be only
>> 100 pages. But then, I don't think this is right to implement your
>> hypercall based only the  "normal" scenario. You should also think about
>> the "worst" case scenario.
>>
>> In this case the worst case scenario is have hundreds of page in page_list.
> 
> Well, this is only an experimental system that's completely disabled
> by default. Making the assumption that people who make use of it will
> know what they are doing I think is fair.

FWIW I'm with Julien here: The preferred course of action is to make
the operation safe against abuse. The minimum requirement is to
document obvious missing pieces for this to become supported code.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 04/20] x86/mem_sharing: cleanup code and comments in various locations
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 04/20] x86/mem_sharing: cleanup code and comments in various locations Tamas K Lengyel
@ 2019-12-19 11:18   ` Andrew Cooper
  2019-12-19 16:20     ` Tamas K Lengyel
  2019-12-19 16:21     ` Tamas K Lengyel
  0 siblings, 2 replies; 96+ messages in thread
From: Andrew Cooper @ 2019-12-19 11:18 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel
  Cc: George Dunlap, Tamas K Lengyel, Wei Liu, Jan Beulich,
	Roger Pau Monné

On 18/12/2019 19:40, Tamas K Lengyel wrote:
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 5a3a962fbb..1e888b403b 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -1902,12 +1902,11 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>      if ( npfec.write_access && (p2mt == p2m_ram_shared) )
>      {
>          ASSERT(p2m_is_hostp2m(p2m));
> -        sharing_enomem = 
> -            (mem_sharing_unshare_page(currd, gfn, 0) < 0);
> +        sharing_enomem = mem_sharing_unshare_page(currd, gfn, 0);

This is a logical change.  Is it intended to be in a later patch?

> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> index efb8821768..319aaf3074 100644
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -198,24 +200,26 @@ static inline shr_handle_t get_next_handle(void)
>  #define mem_sharing_enabled(d) \
>      (is_hvm_domain(d) && (d)->arch.hvm.mem_sharing_enabled)
>  
> -static atomic_t nr_saved_mfns   = ATOMIC_INIT(0); 
> +static atomic_t nr_saved_mfns   = ATOMIC_INIT(0);
>  static atomic_t nr_shared_mfns  = ATOMIC_INIT(0);
>  
> -/** Reverse map **/
> -/* Every shared frame keeps a reverse map (rmap) of <domain, gfn> tuples that
> +/*
> + * Reverse map
> + *
> + * Every shared frame keeps a reverse map (rmap) of <domain, gfn> tuples that
>   * this shared frame backs. For pages with a low degree of sharing, a O(n)
>   * search linked list is good enough. For pages with higher degree of sharing,
> - * we use a hash table instead. */
> + * we use a hash table instead.
> + */
>  
>  typedef struct gfn_info
>  {
>      unsigned long gfn;
> -    domid_t domain; 
> +    domid_t domain;
>      struct list_head list;
>  } gfn_info_t;
>  
> -static inline void
> -rmap_init(struct page_info *page)
> +static inline void rmap_init(struct page_info *page)

Seeing as you're folding this, the inline can be dropped.  In .c files,
functions should just be static.

> @@ -437,25 +441,29 @@ static inline void mem_sharing_gfn_destroy(struct page_info *page,
>      xfree(gfn_info);
>  }
>  
> -static struct page_info* mem_sharing_lookup(unsigned long mfn)
> +static inline struct page_info* mem_sharing_lookup(unsigned long mfn)

Seeing as this is cleanup, the position of the * can move.  Similarly,
it shouldn't gain an inline.

I can fix all of this up on commit (and a few other brace position
issues) if you want, to save reworking a trivial v2.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2019-12-19  9:48 ` [Xen-devel] [PATCH v2 00/20] VM forking Roger Pau Monné
@ 2019-12-19 15:58   ` Tamas K Lengyel
  2019-12-30 17:59     ` Roger Pau Monné
  0 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-19 15:58 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Julien Grall, Petre Pircalabu, Stefano Stabellini,
	Tamas K Lengyel, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Anthony PERARD, Jan Beulich,
	Alexandru Isaila, Xen-devel

On Thu, Dec 19, 2019 at 2:48 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Wed, Dec 18, 2019 at 11:40:37AM -0800, Tamas K Lengyel wrote:
> > The following series implements VM forking for Intel HVM guests to allow for
> > the fast creation of identical VMs without the assosciated high startup costs
> > of booting or restoring the VM from a savefile.
> >
> > JIRA issue: https://xenproject.atlassian.net/browse/XEN-89
> >
> > The main design goal with this series has been to reduce the time of creating
> > the VM fork as much as possible. To achieve this the VM forking process is
> > split into two steps:
> >     1) forking the VM on the hypervisor side;
> >     2) starting QEMU to handle the backed for emulated devices.
> >
> > Step 1) involves creating a VM using the new "xl fork-vm" command. The
> > parent VM is expected to remain paused after forks are created from it (which
> > is different then what process forking normally entails). During this forking
>                ^ than
> > operation the HVM context and VM settings are copied over to the new forked VM.
> > This operation is fast and it allows the forked VM to be unpaused and to be
> > monitored and accessed via VMI. Note however that without its device model
> > running (depending on what is executing in the VM) it is bound to
> > misbehave/crash when its trying to access devices that would be emulated by
> > QEMU. We anticipate that for certain use-cases this would be an acceptable
> > situation, in case for example when fuzzing is performed of code segments that
> > don't access such devices.
> >
> > Step 2) involves launching QEMU to support the forked VM, which requires the
> > QEMU Xen savefile to be generated manually from the parent VM. This can be
> > accomplished simply by connecting to its QMP socket and issuing the
> > "xen-save-devices-state" command as documented by QEMU:
> > https://github.com/qemu/qemu/blob/master/docs/xen-save-devices-state.txt
> > Once the QEMU Xen savefile is generated the new "xl fork-launch-dm" command is
> > used to launch QEMU and load the specified savefile for it.
>
> IMO having two different commands is confusing for the end user, I
> would rather have something like:
>
> xl fork-vm [-d] ...
>
> Where '-d' would prevent forking any user-space emulators. I don't
> thinks there's a need for a separate command to fork the underlying
> user-space emulators.

Keeping it as two commands allows you to start up the fork and let it
run immediately and only start up QEMU when you notice it is needed.
The idea being that you can monitor the kernel and see when it tries
to do some I/O that would require the QEMU backend. If you combine the
commands that option goes away. Also, QEMU itself isn't getting forked
right now, we just start a new QEMU process with the saved-state
getting loaded into it. I looked into implementing a QEMU fork command
but it turns out that for the vast majority of our use-cases QEMU
isn't needed at all, so developing that addition was tabled.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 18/20] xen/mem_access: Use __get_gfn_type_access in set_mem_access
  2019-12-19  7:59   ` Alexandru Stefan ISAILA
@ 2019-12-19 16:00     ` Tamas K Lengyel
  0 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-19 16:00 UTC (permalink / raw)
  To: Alexandru Stefan ISAILA
  Cc: Petre Ovidiu PIRCALABU, Tamas K Lengyel, Wei Liu, George Dunlap,
	Andrew Cooper, Jan Beulich, xen-devel, Roger Pau Monné

On Thu, Dec 19, 2019 at 12:59 AM Alexandru Stefan ISAILA
<aisaila@bitdefender.com> wrote:
>
>
>
> On 18.12.2019 21:40, Tamas K Lengyel wrote:
> > Use __get_gfn_type_access instead of p2m->get_entry to trigger page-forking
> > when the mem_access permission is being set on a page that has not yet been
> > copied over from the parent.
> >
> > Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.comReviewed-by: Alexandru Isaila <aisaila@bitdefender.com>
>
> > ---
> >   xen/arch/x86/mm/mem_access.c | 5 ++---
> >   1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/xen/arch/x86/mm/mem_access.c b/xen/arch/x86/mm/mem_access.c
> > index 320b9fe621..9caf08a5b2 100644
> > --- a/xen/arch/x86/mm/mem_access.c
> > +++ b/xen/arch/x86/mm/mem_access.c
> > @@ -303,11 +303,10 @@ static int set_mem_access(struct domain *d, struct p2m_domain *p2m,
> >       ASSERT(!ap2m);
> >   #endif
> >       {
> > -        mfn_t mfn;
> >           p2m_access_t _a;
> >           p2m_type_t t;
> > -
> > -        mfn = p2m->get_entry(p2m, gfn, &t, &_a, 0, NULL, NULL);
> > +        mfn_t mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &t, &_a,
> > +                                          P2M_ALLOC, NULL, false);
>
> Don't you want p2m_query_t to be 0 as it was in the p2m->get_entry() call ?

No, the entire point of the patch is to have the P2M_ALLOC query flag
set. That triggers the fork's p2m population.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
  2019-12-19 11:06           ` Jan Beulich
@ 2019-12-19 16:02             ` Tamas K Lengyel
  0 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-19 16:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Xen-devel, Julien Grall, Roger Pau Monné

On Thu, Dec 19, 2019 at 4:05 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 19.12.2019 01:15, Tamas K Lengyel wrote:
> > On Wed, Dec 18, 2019 at 4:02 PM Julien Grall <julien@xen.org> wrote:
> >> On 18/12/2019 22:33, Tamas K Lengyel wrote:
> >>> On Wed, Dec 18, 2019 at 3:00 PM Julien Grall <julien@xen.org> wrote:
> >>>> You also have multiple loop on the page_list in this function. Given the
> >>>> number of page_list can be quite big, this is a call for hogging the
> >>>> pCPU and an RCU lock on the domain vCPU running this call.
> >>>
> >>> There is just one loop over page_list itself, the second loop is on
> >>> the internal list that is being built here which will be a subset. The
> >>> list itself in fact should be small (in our tests usually <100).
> >>
> >> For a first, nothing in this function tells me that there will be only
> >> 100 pages. But then, I don't think this is right to implement your
> >> hypercall based only the  "normal" scenario. You should also think about
> >> the "worst" case scenario.
> >>
> >> In this case the worst case scenario is have hundreds of page in page_list.
> >
> > Well, this is only an experimental system that's completely disabled
> > by default. Making the assumption that people who make use of it will
> > know what they are doing I think is fair.
>
> FWIW I'm with Julien here: The preferred course of action is to make
> the operation safe against abuse. The minimum requirement is to
> document obvious missing pieces for this to become supported code.

That's perfectly fine by me.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
  2019-12-19  7:45           ` Julien Grall
@ 2019-12-19 16:11             ` Tamas K Lengyel
  2019-12-19 16:57               ` Julien Grall
  0 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-19 16:11 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jan Beulich, Xen-devel, Roger Pau Monné

> > Well, this is only an experimental system that's completely disabled
> > by default. Making the assumption that people who make use of it will
> > know what they are doing I think is fair.
>
> I assume that if you submit to upstream this new hypercall then there is
> longer plan to have more people to use it and potentially making
> "stable". If not, then it raises the question why this is pushed upstream...

It's being pushed upstream so other people can make use of it, even if
it's not "production quality". Beyond what is being sent here there
are no longer term plans from Intel (at this point) to support this in
any way. The alternative would be that we just release a fork (or just
the patches) and walk away. If the Xen community wants to make the
announcement that only code that will have long term support and is
"stable" is accepted upstream that's IMHO drastically going to reduce
people's interest to share anything.

> In any case, all the known assumptions should be documented so they can
> be fixed rather than forgotten until it is rediscovered via an XSA.

Fair enough.

> >
> >>
> >>> Granted the list can grow larger, but in those cases its likely better
> >>> to just discard the fork and create a new one. So in my opinion adding
> >>> a hypercall continuation to this not needed
> >>
> >> How would the caller know it? What would happen if the caller ends up to
> >> call this with a growing list.
> >
> > The caller knows by virtue of knowing how long the VM was executed
> > for. In the usecase this is targeted at the VM was executing only for
> > a couple seconds at most. Usually much less then that (we get about
> > ~80 resets/s with AFL). During that time its extremely unlikely you
> > get more then a ~100 pages deduplicated (that is, written to). But
> > even if there are more pages, it just means the hypercall might take a
> > bit longer to run for that iteration.
>
> I assume if you upstream the code then you want more people to use it
> (otherwise what's the point?). In this case, you will likely have people
> that heard about the feature, wants to test but don't know the internal.
>
> Such users need to know how this can be call safely without reading the
> implementation. In other words, some documentation for your hypercall is
> needed.

Sure.

>
> > I don't see any issue with not
> > breaking up this hypercall with continuation even under the worst case
> > situation though.
>
> Xen only supports voluntary preemption, this means that an hypercall can
> only be preempted if there is code for it. Otherwise the preemption will
> mostly only happen when returning to the guest.
>
> In other words, the vCPU executing the hypercall may go past its
> timeslice and prevent other vCPU to run.
>
> There are other problem with long running hypercalls. Anyway, in short,
> if you ever want to get you code supported then you will need the
> hypercall to be broken down.
>
> > But if others feel that strongly as well about
> > having to have continuation for this I don't really mind adding it.
>
> I don't think the continuation work is going to be difficult, but if you
> want to delay it, then the minimum is to document such assumptions for
> your users.

I just don't see a use for it because it will never actually execute.
So to me it just looks like unnecessary dead glue. But documenting the
assumption under which this hypercall should execute is perfectly
fair.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 04/20] x86/mem_sharing: cleanup code and comments in various locations
  2019-12-19 11:18   ` Andrew Cooper
@ 2019-12-19 16:20     ` Tamas K Lengyel
  2019-12-19 16:21     ` Tamas K Lengyel
  1 sibling, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-19 16:20 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Jan Beulich, Xen-devel,
	Roger Pau Monné

On Thu, Dec 19, 2019 at 4:19 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 18/12/2019 19:40, Tamas K Lengyel wrote:
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index 5a3a962fbb..1e888b403b 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -1902,12 +1902,11 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
> >      if ( npfec.write_access && (p2mt == p2m_ram_shared) )
> >      {
> >          ASSERT(p2m_is_hostp2m(p2m));
> > -        sharing_enomem =
> > -            (mem_sharing_unshare_page(currd, gfn, 0) < 0);
> > +        sharing_enomem = mem_sharing_unshare_page(currd, gfn, 0);
>
> This is a logical change.  Is it intended to be in a later patch?

While it may look like one it's actually not. The variable
sharing_enomem is declared as an int and the function only has two
possible return values, 0 and -ENOMEM.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 04/20] x86/mem_sharing: cleanup code and comments in various locations
  2019-12-19 11:18   ` Andrew Cooper
  2019-12-19 16:20     ` Tamas K Lengyel
@ 2019-12-19 16:21     ` Tamas K Lengyel
  2019-12-19 18:51       ` Andrew Cooper
  1 sibling, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-19 16:21 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Jan Beulich, Xen-devel,
	Roger Pau Monné

On Thu, Dec 19, 2019 at 4:19 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 18/12/2019 19:40, Tamas K Lengyel wrote:
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index 5a3a962fbb..1e888b403b 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -1902,12 +1902,11 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
> >      if ( npfec.write_access && (p2mt == p2m_ram_shared) )
> >      {
> >          ASSERT(p2m_is_hostp2m(p2m));
> > -        sharing_enomem =
> > -            (mem_sharing_unshare_page(currd, gfn, 0) < 0);
> > +        sharing_enomem = mem_sharing_unshare_page(currd, gfn, 0);
>
> This is a logical change.  Is it intended to be in a later patch?
>
> > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > index efb8821768..319aaf3074 100644
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -198,24 +200,26 @@ static inline shr_handle_t get_next_handle(void)
> >  #define mem_sharing_enabled(d) \
> >      (is_hvm_domain(d) && (d)->arch.hvm.mem_sharing_enabled)
> >
> > -static atomic_t nr_saved_mfns   = ATOMIC_INIT(0);
> > +static atomic_t nr_saved_mfns   = ATOMIC_INIT(0);
> >  static atomic_t nr_shared_mfns  = ATOMIC_INIT(0);
> >
> > -/** Reverse map **/
> > -/* Every shared frame keeps a reverse map (rmap) of <domain, gfn> tuples that
> > +/*
> > + * Reverse map
> > + *
> > + * Every shared frame keeps a reverse map (rmap) of <domain, gfn> tuples that
> >   * this shared frame backs. For pages with a low degree of sharing, a O(n)
> >   * search linked list is good enough. For pages with higher degree of sharing,
> > - * we use a hash table instead. */
> > + * we use a hash table instead.
> > + */
> >
> >  typedef struct gfn_info
> >  {
> >      unsigned long gfn;
> > -    domid_t domain;
> > +    domid_t domain;
> >      struct list_head list;
> >  } gfn_info_t;
> >
> > -static inline void
> > -rmap_init(struct page_info *page)
> > +static inline void rmap_init(struct page_info *page)
>
> Seeing as you're folding this, the inline can be dropped.  In .c files,
> functions should just be static.
>
> > @@ -437,25 +441,29 @@ static inline void mem_sharing_gfn_destroy(struct page_info *page,
> >      xfree(gfn_info);
> >  }
> >
> > -static struct page_info* mem_sharing_lookup(unsigned long mfn)
> > +static inline struct page_info* mem_sharing_lookup(unsigned long mfn)
>
> Seeing as this is cleanup, the position of the * can move.  Similarly,
> it shouldn't gain an inline.
>
> I can fix all of this up on commit (and a few other brace position
> issues) if you want, to save reworking a trivial v2.
>

Provided nothing else is outstanding with the patch and it can be
committed I would certainly appreciate that.

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
  2019-12-19 16:11             ` Tamas K Lengyel
@ 2019-12-19 16:57               ` Julien Grall
  2019-12-19 17:23                 ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Julien Grall @ 2019-12-19 16:57 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jan Beulich, Xen-devel, Roger Pau Monné

Hi,

On 19/12/2019 16:11, Tamas K Lengyel wrote:
>>> Well, this is only an experimental system that's completely disabled
>>> by default. Making the assumption that people who make use of it will
>>> know what they are doing I think is fair.
>>
>> I assume that if you submit to upstream this new hypercall then there is
>> longer plan to have more people to use it and potentially making
>> "stable". If not, then it raises the question why this is pushed upstream...
> 
> It's being pushed upstream so other people can make use of it, even if
> it's not "production quality". Beyond what is being sent here there
> are no longer term plans from Intel (at this point) to support this in
> any way. 

So if this is merged, who is going to maintain it?

> The alternative would be that we just release a fork (or just
> the patches) and walk away.
>  If the Xen community wants to make the
> announcement that only code that will have long term support and is
> "stable" is accepted upstream that's IMHO drastically going to reduce
> people's interest to share anything.

Sharing is one thing, but if this code is not at least a minimum 
maintained then it is likely the code will not be functional in a year time.

Don't get me wrong, this is a cool feature to have but you make it 
sounds like this is going to be dumped in Xen and never going to be 
touched again. How is this going to be beneficial for the community?

> 
>> In any case, all the known assumptions should be documented so they can
>> be fixed rather than forgotten until it is rediscovered via an XSA.
> 
> Fair enough.
> 
>>>
>>>>
>>>>> Granted the list can grow larger, but in those cases its likely better
>>>>> to just discard the fork and create a new one. So in my opinion adding
>>>>> a hypercall continuation to this not needed
>>>>
>>>> How would the caller know it? What would happen if the caller ends up to
>>>> call this with a growing list.
>>>
>>> The caller knows by virtue of knowing how long the VM was executed
>>> for. In the usecase this is targeted at the VM was executing only for
>>> a couple seconds at most. Usually much less then that (we get about
>>> ~80 resets/s with AFL). During that time its extremely unlikely you
>>> get more then a ~100 pages deduplicated (that is, written to). But
>>> even if there are more pages, it just means the hypercall might take a
>>> bit longer to run for that iteration.
>>
>> I assume if you upstream the code then you want more people to use it
>> (otherwise what's the point?). In this case, you will likely have people
>> that heard about the feature, wants to test but don't know the internal.
>>
>> Such users need to know how this can be call safely without reading the
>> implementation. In other words, some documentation for your hypercall is
>> needed.
> 
> Sure.
> 
>>
>>> I don't see any issue with not
>>> breaking up this hypercall with continuation even under the worst case
>>> situation though.
>>
>> Xen only supports voluntary preemption, this means that an hypercall can
>> only be preempted if there is code for it. Otherwise the preemption will
>> mostly only happen when returning to the guest.
>>
>> In other words, the vCPU executing the hypercall may go past its
>> timeslice and prevent other vCPU to run.
>>
>> There are other problem with long running hypercalls. Anyway, in short,
>> if you ever want to get you code supported then you will need the
>> hypercall to be broken down.
>>
>>> But if others feel that strongly as well about
>>> having to have continuation for this I don't really mind adding it.
>>
>> I don't think the continuation work is going to be difficult, but if you
>> want to delay it, then the minimum is to document such assumptions for
>> your users.
> 
> I just don't see a use for it because it will never actually execute.

That's a very narrow view of how your hypercall can be used. I know that 
you said that should only be called early, but imagine for a moment the 
user decide to call it much later in the fork process.

> So to me it just looks like unnecessary dead glue. 

Try to call the hypercall after enough deduplication happen (maybe 
20min). Alternatively, give me access to your machine with the code and 
I can show how it can be misused ;).

> But documenting the
> assumption under which this hypercall should execute is perfectly
> fair.

You might want to think how the interface would look like with the 
preemption. So the day you decide to introduce preemption you don't have 
to create a new hypercall.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
  2019-12-19 16:57               ` Julien Grall
@ 2019-12-19 17:23                 ` Tamas K Lengyel
  2019-12-19 17:38                   ` Julien Grall
  0 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-19 17:23 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jan Beulich, Xen-devel, Roger Pau Monné

On Thu, Dec 19, 2019 at 9:58 AM Julien Grall <julien@xen.org> wrote:
>
> Hi,
>
> On 19/12/2019 16:11, Tamas K Lengyel wrote:
> >>> Well, this is only an experimental system that's completely disabled
> >>> by default. Making the assumption that people who make use of it will
> >>> know what they are doing I think is fair.
> >>
> >> I assume that if you submit to upstream this new hypercall then there is
> >> longer plan to have more people to use it and potentially making
> >> "stable". If not, then it raises the question why this is pushed upstream...
> >
> > It's being pushed upstream so other people can make use of it, even if
> > it's not "production quality". Beyond what is being sent here there
> > are no longer term plans from Intel (at this point) to support this in
> > any way.
>
> So if this is merged, who is going to maintain it?

It falls under mem_sharing for which I'm listed as "Odd Fixes"
maintainer, which I do in my personal free time. The status there
isn't changing with this new feature.

>
> > The alternative would be that we just release a fork (or just
> > the patches) and walk away.
> >  If the Xen community wants to make the
> > announcement that only code that will have long term support and is
> > "stable" is accepted upstream that's IMHO drastically going to reduce
> > people's interest to share anything.
>
> Sharing is one thing, but if this code is not at least a minimum
> maintained then it is likely the code will not be functional in a year time.

Surprisingly mem_sharing had only minor bitrots in the last ~5 years
in which time it has been pretty much abandoned.

>
> Don't get me wrong, this is a cool feature to have but you make it
> sounds like this is going to be dumped in Xen and never going to be
> touched again. How is this going to be beneficial for the community?

It adds an experimental feature to Xen that people can try out and
well experiment with! It may be useful, it may not be. You yourself
said that this is a cool feature. The fact that there is a JIRA ticket
tracking this also shows there is community interest in having such a
feature available. If down the road that changes and this proves to be
dead code, it can be removed. I don't think that will be necessary
since this isn't even compiled by default anymore though. But anyway,
it makes more sense to get it upstream then carry it out of tree
because it gets more exposure that way, more people may try it out.
Having it in-tree also means that in a couple releases people who are
interested in the feature don't have to go through the process of
rebasing the patchset on newer versions of Xen since it's already
in-tree.

> >>> But if others feel that strongly as well about
> >>> having to have continuation for this I don't really mind adding it.
> >>
> >> I don't think the continuation work is going to be difficult, but if you
> >> want to delay it, then the minimum is to document such assumptions for
> >> your users.
> >
> > I just don't see a use for it because it will never actually execute.
>
> That's a very narrow view of how your hypercall can be used. I know that
> you said that should only be called early, but imagine for a moment the
> user decide to call it much later in the fork process.
>
> > So to me it just looks like unnecessary dead glue.
>
> Try to call the hypercall after enough deduplication happen (maybe
> 20min). Alternatively, give me access to your machine with the code and
> I can show how it can be misused ;).

It will hang for a bit for sure and Linux in dom0 will complain that a
CPU is stuck. But it will eventually finish. It's not like it's doing
all that much. And anyway, if you notice that happening when you call
it it will be an obvious clue that you shouldn't be using it under the
situation you are using it under. Having continuation would hide that.

>
> > But documenting the
> > assumption under which this hypercall should execute is perfectly
> > fair.
>
> You might want to think how the interface would look like with the
> preemption. So the day you decide to introduce preemption you don't have
> to create a new hypercall.

Why would you need to introduce a new hypercall if preemption becomes
necessary? This is not a stable interfaces. It can be removed/changed
completely from one Xen release to the next.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
  2019-12-19 17:23                 ` Tamas K Lengyel
@ 2019-12-19 17:38                   ` Julien Grall
  2019-12-19 18:00                     ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Julien Grall @ 2019-12-19 17:38 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jan Beulich, Xen-devel, Roger Pau Monné



On 19/12/2019 17:23, Tamas K Lengyel wrote:
> On Thu, Dec 19, 2019 at 9:58 AM Julien Grall <julien@xen.org> wrote:
>>
>> Hi,
>>
>> On 19/12/2019 16:11, Tamas K Lengyel wrote:
>>>>> Well, this is only an experimental system that's completely disabled
>>>>> by default. Making the assumption that people who make use of it will
>>>>> know what they are doing I think is fair.
>>>>
>>>> I assume that if you submit to upstream this new hypercall then there is
>>>> longer plan to have more people to use it and potentially making
>>>> "stable". If not, then it raises the question why this is pushed upstream...
>>>
>>> It's being pushed upstream so other people can make use of it, even if
>>> it's not "production quality". Beyond what is being sent here there
>>> are no longer term plans from Intel (at this point) to support this in
>>> any way.
>>
>> So if this is merged, who is going to maintain it?
> 
> It falls under mem_sharing for which I'm listed as "Odd Fixes"
> maintainer, which I do in my personal free time. The status there
> isn't changing with this new feature.
> 
>>
>>> The alternative would be that we just release a fork (or just
>>> the patches) and walk away.
>>>   If the Xen community wants to make the
>>> announcement that only code that will have long term support and is
>>> "stable" is accepted upstream that's IMHO drastically going to reduce
>>> people's interest to share anything.
>>
>> Sharing is one thing, but if this code is not at least a minimum
>> maintained then it is likely the code will not be functional in a year time.
> 
> Surprisingly mem_sharing had only minor bitrots in the last ~5 years
> in which time it has been pretty much abandoned.
This falls under a "minimum maintained". This wasn't clear from your 
previous statement stating there will be no support.

[...]

> 
>>>>> But if others feel that strongly as well about
>>>>> having to have continuation for this I don't really mind adding it.
>>>>
>>>> I don't think the continuation work is going to be difficult, but if you
>>>> want to delay it, then the minimum is to document such assumptions for
>>>> your users.
>>>
>>> I just don't see a use for it because it will never actually execute.
>>
>> That's a very narrow view of how your hypercall can be used. I know that
>> you said that should only be called early, but imagine for a moment the
>> user decide to call it much later in the fork process.
>>
>>> So to me it just looks like unnecessary dead glue.
>>
>> Try to call the hypercall after enough deduplication happen (maybe
>> 20min). Alternatively, give me access to your machine with the code and
>> I can show how it can be misused ;).
> 
> It will hang for a bit for sure and Linux in dom0 will complain that a
> CPU is stuck. But it will eventually finish. It's not like it's doing
> all that much. And anyway, if you notice that happening when you call
> it it will be an obvious clue that you shouldn't be using it under the
> situation you are using it under. Having continuation would hide that.

I am not going to argue more as this is an experimental feature. But 
this will be a showstopper if we ever consider mem_sharing to be 
supported (or even security supported).

Meanwhile please document the assumption.

> 
>>
>>> But documenting the
>>> assumption under which this hypercall should execute is perfectly
>>> fair.
>>
>> You might want to think how the interface would look like with the
>> preemption. So the day you decide to introduce preemption you don't have
>> to create a new hypercall.
> 
> Why would you need to introduce a new hypercall if preemption becomes
> necessary? This is not a stable interfaces. It can be removed/changed
> completely from one Xen release to the next.

Sorry, I didn't realize the mem_sharing code was not a stable interfaces.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork
  2019-12-19 17:38                   ` Julien Grall
@ 2019-12-19 18:00                     ` Tamas K Lengyel
  0 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-19 18:00 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jan Beulich, Xen-devel, Roger Pau Monné

> >>> The alternative would be that we just release a fork (or just
> >>> the patches) and walk away.
> >>>   If the Xen community wants to make the
> >>> announcement that only code that will have long term support and is
> >>> "stable" is accepted upstream that's IMHO drastically going to reduce
> >>> people's interest to share anything.
> >>
> >> Sharing is one thing, but if this code is not at least a minimum
> >> maintained then it is likely the code will not be functional in a year time.
> >
> > Surprisingly mem_sharing had only minor bitrots in the last ~5 years
> > in which time it has been pretty much abandoned.
> This falls under a "minimum maintained". This wasn't clear from your
> previous statement stating there will be no support.

Sure, I meant there is no support from Intel (ie. it's not part of my
job-description nor do I get payed to support this long-term). I
usually check during the RC test days that it's at least functional by
doing some testing manually. But it's pretty ad-hoc when and if I do
that (hence the Odd fixes status).

> >
> >>>>> But if others feel that strongly as well about
> >>>>> having to have continuation for this I don't really mind adding it.
> >>>>
> >>>> I don't think the continuation work is going to be difficult, but if you
> >>>> want to delay it, then the minimum is to document such assumptions for
> >>>> your users.
> >>>
> >>> I just don't see a use for it because it will never actually execute.
> >>
> >> That's a very narrow view of how your hypercall can be used. I know that
> >> you said that should only be called early, but imagine for a moment the
> >> user decide to call it much later in the fork process.
> >>
> >>> So to me it just looks like unnecessary dead glue.
> >>
> >> Try to call the hypercall after enough deduplication happen (maybe
> >> 20min). Alternatively, give me access to your machine with the code and
> >> I can show how it can be misused ;).
> >
> > It will hang for a bit for sure and Linux in dom0 will complain that a
> > CPU is stuck. But it will eventually finish. It's not like it's doing
> > all that much. And anyway, if you notice that happening when you call
> > it it will be an obvious clue that you shouldn't be using it under the
> > situation you are using it under. Having continuation would hide that.
>
> I am not going to argue more as this is an experimental feature. But
> this will be a showstopper if we ever consider mem_sharing to be
> supported (or even security supported).
>
> Meanwhile please document the assumption.

Ack, already did.

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 04/20] x86/mem_sharing: cleanup code and comments in various locations
  2019-12-19 16:21     ` Tamas K Lengyel
@ 2019-12-19 18:51       ` Andrew Cooper
  2019-12-19 19:26         ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Andrew Cooper @ 2019-12-19 18:51 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Jan Beulich, Xen-devel,
	Roger Pau Monné

On 19/12/2019 16:21, Tamas K Lengyel wrote:
>>> @@ -437,25 +441,29 @@ static inline void mem_sharing_gfn_destroy(struct page_info *page,
>>>      xfree(gfn_info);
>>>  }
>>>
>>> -static struct page_info* mem_sharing_lookup(unsigned long mfn)
>>> +static inline struct page_info* mem_sharing_lookup(unsigned long mfn)
>> Seeing as this is cleanup, the position of the * can move.  Similarly,
>> it shouldn't gain an inline.
>>
>> I can fix all of this up on commit (and a few other brace position
>> issues) if you want, to save reworking a trivial v2.
>>
> Provided nothing else is outstanding with the patch and it can be
> committed I would certainly appreciate that.

I've pushed this and the previous patch, with some further fixups that I
spotted.

Hopefully the rebase won't be an issue.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible Tamas K Lengyel
@ 2019-12-19 19:07   ` Andrew Cooper
  2019-12-19 19:38     ` Tamas K Lengyel
  2019-12-20 16:46   ` Jan Beulich
  1 sibling, 1 reply; 96+ messages in thread
From: Andrew Cooper @ 2019-12-19 19:07 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel; +Cc: Wei Liu, Jan Beulich, Roger Pau Monné

On 18/12/2019 19:40, Tamas K Lengyel wrote:
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 614ed60fe4..5a3a962fbb 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -4072,16 +4072,17 @@ static int hvmop_set_evtchn_upcall_vector(
>  }
>  
>  static int hvm_allow_set_param(struct domain *d,
> -                               const struct xen_hvm_param *a)
> +                               uint32_t index,
> +                               uint64_t new_value)

These two lines can be folded together.

> @@ -4134,13 +4135,11 @@ static int hvm_allow_set_param(struct domain *d,
>      return rc;
>  }
>  
> -static int hvmop_set_param(
> +int hvmop_set_param(
>      XEN_GUEST_HANDLE_PARAM(xen_hvm_param_t) arg)

and here.

> @@ -4160,23 +4159,42 @@ static int hvmop_set_param(
>      if ( !is_hvm_domain(d) )
>          goto out;
>  
> -    rc = hvm_allow_set_param(d, &a);
> +    rc = hvm_set_param(d, a.index, a.value);
> +
> + out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
> +int hvm_set_param(
> +    struct domain *d,
> +    uint32_t index,
> +    uint64_t value)

and again.

> @@ -4340,34 +4358,33 @@ static int hvmop_set_param(
>           * 256 bits interrupt redirection bitmap + 64k bits I/O bitmap
>           * plus one padding byte).
>           */
> -        if ( (a.value >> 32) > sizeof(struct tss32) +
> +        if ( (value >> 32) > sizeof(struct tss32) +
>                                 (0x100 / 8) + (0x10000 / 8) + 1 )
> -            a.value = (uint32_t)a.value |
> +            value = (uint32_t)value |
>                        ((sizeof(struct tss32) + (0x100 / 8) +
>                                                 (0x10000 / 8) + 1) << 32);

Can you reindent the surrounding lines so they line up again?

> @@ -4429,42 +4446,60 @@ static int hvmop_get_param(
>      if ( !is_hvm_domain(d) )
>          goto out;
>  
> -    rc = hvm_allow_get_param(d, &a);
> +    rc = hvm_get_param(d, a.index, &a.value);
>      if ( rc )
>          goto out;
>  
> -    switch ( a.index )
> +    rc = __copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
> +
> +    HVM_DBG_LOG(DBG_LEVEL_HCALL, "get param %u = %"PRIx64,
> +                a.index, a.value);
> +
> + out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
> +int hvm_get_param(
> +    struct domain *d,
> +    uint32_t index,
> +    uint64_t *value)

Fold.

> +{
> +    int rc;
> +
> +    if ( index >= HVM_NR_PARAMS || !value )

No need for !value.  It had better only ever point onto the hypervisor
stack now that this is an internal function.

> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index 1d7b66f927..a6f4ae76a1 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -335,6 +335,10 @@ unsigned long hvm_cr4_guest_valid_bits(const struct domain *d, bool restore);
>  bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
>                          void *ctxt);
>  
> +/* Caller must hold domain locks */

How about asserting so?

No major problems so Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 02/20] xen/x86: Make hap_get_allocation accessible
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 02/20] xen/x86: Make hap_get_allocation accessible Tamas K Lengyel
@ 2019-12-19 19:08   ` Andrew Cooper
  2019-12-20 16:48   ` Jan Beulich
  1 sibling, 0 replies; 96+ messages in thread
From: Andrew Cooper @ 2019-12-19 19:08 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel
  Cc: George Dunlap, Wei Liu, Jan Beulich, Roger Pau Monné

On 18/12/2019 19:40, Tamas K Lengyel wrote:
> During VM forking we'll copy the parent domain's parameters to the client,
> including the HAP shadow memory setting that is used for storing the domain's
> EPT. We'll copy this in the hypervisor instead doing it during toolstack launch
> to allow the domain to start executing and unsharing memory before (or
> even completely without) the toolstack.
>
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 05/20] x86/mem_sharing: make get_two_gfns take locks conditionally
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 05/20] x86/mem_sharing: make get_two_gfns take locks conditionally Tamas K Lengyel
@ 2019-12-19 19:12   ` Andrew Cooper
  0 siblings, 0 replies; 96+ messages in thread
From: Andrew Cooper @ 2019-12-19 19:12 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel
  Cc: George Dunlap, Tamas K Lengyel, Wei Liu, Jan Beulich,
	Roger Pau Monné

On 18/12/2019 19:40, Tamas K Lengyel wrote:
> During VM forking the client lock will already be taken.
>
> Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 07/20] x86/mem_sharing: don't try to unshare twice during page fault
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 07/20] x86/mem_sharing: don't try to unshare twice during page fault Tamas K Lengyel
@ 2019-12-19 19:19   ` Andrew Cooper
  0 siblings, 0 replies; 96+ messages in thread
From: Andrew Cooper @ 2019-12-19 19:19 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel; +Cc: Wei Liu, Jan Beulich, Roger Pau Monné

On 18/12/2019 19:40, Tamas K Lengyel wrote:
> @@ -1959,19 +1965,21 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>       */
>      if ( paged )
>          p2m_mem_paging_populate(currd, gfn);
> +
> +#ifdef CONFIG_MEM_SHARING
>      if ( sharing_enomem )
>      {
> -        int rv;
> -
> -        if ( (rv = mem_sharing_notify_enomem(currd, gfn, true)) < 0 )
> +        if ( !vm_event_check_ring(currd->vm_event_share) )
>          {
>              gdprintk(XENLOG_ERR, "Domain %hu attempt to unshare "
> -                     "gfn %lx, ENOMEM and no helper (rc %d)\n",
> -                     currd->domain_id, gfn, rv);
> +                     "gfn %lx, ENOMEM and no helper\n",
> +                     currd->domain_id, gfn);

As observations on this and later patches.  Use %pd (gets d%u or d[NAME]
as applicable), especially were d[COW] is a liable domid to get.

Also, any action which crashes a domain must not be a gdprintk(),
because otherwise you end up with a domain_crash() and no
context/symptoms in release builds of Xen.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 04/20] x86/mem_sharing: cleanup code and comments in various locations
  2019-12-19 18:51       ` Andrew Cooper
@ 2019-12-19 19:26         ` Tamas K Lengyel
  0 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-19 19:26 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Tamas K Lengyel, Wei Liu, George Dunlap, Jan Beulich, Xen-devel,
	Roger Pau Monné

On Thu, Dec 19, 2019 at 11:51 AM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 19/12/2019 16:21, Tamas K Lengyel wrote:
> >>> @@ -437,25 +441,29 @@ static inline void mem_sharing_gfn_destroy(struct page_info *page,
> >>>      xfree(gfn_info);
> >>>  }
> >>>
> >>> -static struct page_info* mem_sharing_lookup(unsigned long mfn)
> >>> +static inline struct page_info* mem_sharing_lookup(unsigned long mfn)
> >> Seeing as this is cleanup, the position of the * can move.  Similarly,
> >> it shouldn't gain an inline.
> >>
> >> I can fix all of this up on commit (and a few other brace position
> >> issues) if you want, to save reworking a trivial v2.
> >>
> > Provided nothing else is outstanding with the patch and it can be
> > committed I would certainly appreciate that.
>
> I've pushed this and the previous patch, with some further fixups that I
> spotted.
>
> Hopefully the rebase won't be an issue.

Great! There were quite a few conflicts in the rebase so I hope I
don't accidentally revert some of your new fixes :)

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-19 19:07   ` Andrew Cooper
@ 2019-12-19 19:38     ` Tamas K Lengyel
  2019-12-19 19:40       ` Andrew Cooper
  0 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-19 19:38 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Xen-devel, Tamas K Lengyel, Jan Beulich, Wei Liu, Roger Pau Monné

> > --- a/xen/include/asm-x86/hvm/hvm.h
> > +++ b/xen/include/asm-x86/hvm/hvm.h
> > @@ -335,6 +335,10 @@ unsigned long hvm_cr4_guest_valid_bits(const struct domain *d, bool restore);
> >  bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
> >                          void *ctxt);
> >
> > +/* Caller must hold domain locks */
>
> How about asserting so?

AFAICT there is no "domain_locked_by_me" function, only
paging/p2m/gfn_locked_by_me. So any suggestion on how to achieve that?

Thanks!
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-19 19:38     ` Tamas K Lengyel
@ 2019-12-19 19:40       ` Andrew Cooper
  2019-12-19 19:49         ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Andrew Cooper @ 2019-12-19 19:40 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Xen-devel, Tamas K Lengyel, Jan Beulich, Wei Liu, Roger Pau Monné

On 19/12/2019 19:38, Tamas K Lengyel wrote:
>>> --- a/xen/include/asm-x86/hvm/hvm.h
>>> +++ b/xen/include/asm-x86/hvm/hvm.h
>>> @@ -335,6 +335,10 @@ unsigned long hvm_cr4_guest_valid_bits(const struct domain *d, bool restore);
>>>  bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
>>>                          void *ctxt);
>>>
>>> +/* Caller must hold domain locks */
>> How about asserting so?
> AFAICT there is no "domain_locked_by_me" function, only
> paging/p2m/gfn_locked_by_me. So any suggestion on how to achieve that?

spin_is_locked() gets you most of the way, and would be a start.

But yes - now you say this, I remember that we don't currently have
suitable infrastructure.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-19 19:40       ` Andrew Cooper
@ 2019-12-19 19:49         ` Tamas K Lengyel
  2019-12-19 19:57           ` Andrew Cooper
  0 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-19 19:49 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Xen-devel, Tamas K Lengyel, Jan Beulich, Wei Liu, Roger Pau Monné

On Thu, Dec 19, 2019 at 12:41 PM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 19/12/2019 19:38, Tamas K Lengyel wrote:
> >>> --- a/xen/include/asm-x86/hvm/hvm.h
> >>> +++ b/xen/include/asm-x86/hvm/hvm.h
> >>> @@ -335,6 +335,10 @@ unsigned long hvm_cr4_guest_valid_bits(const struct domain *d, bool restore);
> >>>  bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
> >>>                          void *ctxt);
> >>>
> >>> +/* Caller must hold domain locks */
> >> How about asserting so?
> > AFAICT there is no "domain_locked_by_me" function, only
> > paging/p2m/gfn_locked_by_me. So any suggestion on how to achieve that?
>
> spin_is_locked() gets you most of the way, and would be a start.
>
> But yes - now you say this, I remember that we don't currently have
> suitable infrastructure.

Is the domain lock even a spin lock (the on you use by
rcu_lock_live_remote_domain_by_id)? Looks to me it just goes down to
"rcu_read_lock" which only does a preempt_disable() call o.O

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-19 19:49         ` Tamas K Lengyel
@ 2019-12-19 19:57           ` Andrew Cooper
  2019-12-19 20:09             ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Andrew Cooper @ 2019-12-19 19:57 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Xen-devel, Tamas K Lengyel, Jan Beulich, Wei Liu, Roger Pau Monné

On 19/12/2019 19:49, Tamas K Lengyel wrote:
> On Thu, Dec 19, 2019 at 12:41 PM Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 19/12/2019 19:38, Tamas K Lengyel wrote:
>>>>> --- a/xen/include/asm-x86/hvm/hvm.h
>>>>> +++ b/xen/include/asm-x86/hvm/hvm.h
>>>>> @@ -335,6 +335,10 @@ unsigned long hvm_cr4_guest_valid_bits(const struct domain *d, bool restore);
>>>>>  bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
>>>>>                          void *ctxt);
>>>>>
>>>>> +/* Caller must hold domain locks */
>>>> How about asserting so?
>>> AFAICT there is no "domain_locked_by_me" function, only
>>> paging/p2m/gfn_locked_by_me. So any suggestion on how to achieve that?
>> spin_is_locked() gets you most of the way, and would be a start.
>>
>> But yes - now you say this, I remember that we don't currently have
>> suitable infrastructure.
> Is the domain lock even a spin lock (the on you use by
> rcu_lock_live_remote_domain_by_id)? Looks to me it just goes down to
> "rcu_read_lock" which only does a preempt_disable() call o.O

/sigh - of course we have two things called the domain lock.

The RCU one is to ensure that the struct domain can't get freed behind
our back, and shouldn't be interesting in this context (obtaining the d
pointer in the first place causes it to be safe).  If that is the only
one which matters, I would drop the comment.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-19 19:57           ` Andrew Cooper
@ 2019-12-19 20:09             ` Tamas K Lengyel
  0 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-19 20:09 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Xen-devel, Tamas K Lengyel, Jan Beulich, Wei Liu, Roger Pau Monné

On Thu, Dec 19, 2019 at 12:57 PM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 19/12/2019 19:49, Tamas K Lengyel wrote:
> > On Thu, Dec 19, 2019 at 12:41 PM Andrew Cooper
> > <andrew.cooper3@citrix.com> wrote:
> >> On 19/12/2019 19:38, Tamas K Lengyel wrote:
> >>>>> --- a/xen/include/asm-x86/hvm/hvm.h
> >>>>> +++ b/xen/include/asm-x86/hvm/hvm.h
> >>>>> @@ -335,6 +335,10 @@ unsigned long hvm_cr4_guest_valid_bits(const struct domain *d, bool restore);
> >>>>>  bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
> >>>>>                          void *ctxt);
> >>>>>
> >>>>> +/* Caller must hold domain locks */
> >>>> How about asserting so?
> >>> AFAICT there is no "domain_locked_by_me" function, only
> >>> paging/p2m/gfn_locked_by_me. So any suggestion on how to achieve that?
> >> spin_is_locked() gets you most of the way, and would be a start.
> >>
> >> But yes - now you say this, I remember that we don't currently have
> >> suitable infrastructure.
> > Is the domain lock even a spin lock (the on you use by
> > rcu_lock_live_remote_domain_by_id)? Looks to me it just goes down to
> > "rcu_read_lock" which only does a preempt_disable() call o.O
>
> /sigh - of course we have two things called the domain lock.
>
> The RCU one is to ensure that the struct domain can't get freed behind
> our back, and shouldn't be interesting in this context (obtaining the d
> pointer in the first place causes it to be safe).  If that is the only
> one which matters, I would drop the comment.

Yes, only the RCU lock gets taken right now by all callers, so I'll
drop the comment.

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible Tamas K Lengyel
  2019-12-19 19:07   ` Andrew Cooper
@ 2019-12-20 16:46   ` Jan Beulich
  2019-12-20 17:27     ` Tamas K Lengyel
  1 sibling, 1 reply; 96+ messages in thread
From: Jan Beulich @ 2019-12-20 16:46 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel, Roger Pau Monné, Wei Liu, Andrew Cooper

On 18.12.2019 20:40, Tamas K Lengyel wrote:
> Currently the hvm parameters are only accessible via the HVMOP hypercalls. By
> exposing hvm_{get/set}_param it will be possible for VM forking to copy the
> parameters directly into the clone domain.

Having peeked ahead at patch 17, where this gets used, I wonder why
you want a pair of one-by-one functions, rather than a copy-all one.
This then wouldn't require exposure of the functions you touch here.

> @@ -4429,42 +4446,60 @@ static int hvmop_get_param(
>      if ( !is_hvm_domain(d) )
>          goto out;
>  
> -    rc = hvm_allow_get_param(d, &a);
> +    rc = hvm_get_param(d, a.index, &a.value);
>      if ( rc )
>          goto out;
>  
> -    switch ( a.index )
> +    rc = __copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
> +
> +    HVM_DBG_LOG(DBG_LEVEL_HCALL, "get param %u = %"PRIx64,
> +                a.index, a.value);
> +
> + out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
> +int hvm_get_param(
> +    struct domain *d,

If this is to be non-static, I think it would be quite nice if
this parameter was const. This will take a prereq patch to
constify the XSM path involved, but other than this I can't
see anything getting in the way.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 02/20] xen/x86: Make hap_get_allocation accessible
  2019-12-18 19:40 ` [Xen-devel] [PATCH v2 02/20] xen/x86: Make hap_get_allocation accessible Tamas K Lengyel
  2019-12-19 19:08   ` Andrew Cooper
@ 2019-12-20 16:48   ` Jan Beulich
  1 sibling, 0 replies; 96+ messages in thread
From: Jan Beulich @ 2019-12-20 16:48 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: George Dunlap, xen-devel, Roger Pau Monné, Wei Liu, Andrew Cooper

On 18.12.2019 20:40, Tamas K Lengyel wrote:
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -321,8 +321,7 @@ static void hap_free_p2m_page(struct domain *d, struct page_info *pg)
>  }
>  
>  /* Return the size of the pool, rounded up to the nearest MB */
> -static unsigned int
> -hap_get_allocation(struct domain *d)
> +unsigned int hap_get_allocation(struct domain *d)

Along the lines of the comment on patch 1, please take the
opportunity and constify the parameter (which looks entirely
straightforward here).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-20 16:46   ` Jan Beulich
@ 2019-12-20 17:27     ` Tamas K Lengyel
  2019-12-20 17:32       ` Andrew Cooper
  0 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-20 17:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Xen-devel, Andrew Cooper, Tamas K Lengyel, Wei Liu, Roger Pau Monné

On Fri, Dec 20, 2019 at 9:47 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 18.12.2019 20:40, Tamas K Lengyel wrote:
> > Currently the hvm parameters are only accessible via the HVMOP hypercalls. By
> > exposing hvm_{get/set}_param it will be possible for VM forking to copy the
> > parameters directly into the clone domain.
>
> Having peeked ahead at patch 17, where this gets used, I wonder why
> you want a pair of one-by-one functions, rather than a copy-all one.
> This then wouldn't require exposure of the functions you touch here.

Well, provided there is no such function in existence today it was
just easier to use what's already available. I still wouldn't want to
implement a one-shot function like that because this same code-path is
shared by the save-restore operations on the toolstack side, so at
least I have a reasonable assumption that it won't break on me in the
future.

> > @@ -4429,42 +4446,60 @@ static int hvmop_get_param(
> >      if ( !is_hvm_domain(d) )
> >          goto out;
> >
> > -    rc = hvm_allow_get_param(d, &a);
> > +    rc = hvm_get_param(d, a.index, &a.value);
> >      if ( rc )
> >          goto out;
> >
> > -    switch ( a.index )
> > +    rc = __copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
> > +
> > +    HVM_DBG_LOG(DBG_LEVEL_HCALL, "get param %u = %"PRIx64,
> > +                a.index, a.value);
> > +
> > + out:
> > +    rcu_unlock_domain(d);
> > +    return rc;
> > +}
> > +
> > +int hvm_get_param(
> > +    struct domain *d,
>
> If this is to be non-static, I think it would be quite nice if
> this parameter was const. This will take a prereq patch to
> constify the XSM path involved, but other than this I can't
> see anything getting in the way.

Sure.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-20 17:27     ` Tamas K Lengyel
@ 2019-12-20 17:32       ` Andrew Cooper
  2019-12-20 17:36         ` Tamas K Lengyel
  2019-12-23  9:37         ` Jan Beulich
  0 siblings, 2 replies; 96+ messages in thread
From: Andrew Cooper @ 2019-12-20 17:32 UTC (permalink / raw)
  To: Tamas K Lengyel, Jan Beulich
  Cc: Xen-devel, Tamas K Lengyel, Wei Liu, Roger Pau Monné

On 20/12/2019 17:27, Tamas K Lengyel wrote:
> On Fri, Dec 20, 2019 at 9:47 AM Jan Beulich <jbeulich@suse.com> wrote:
>> On 18.12.2019 20:40, Tamas K Lengyel wrote:
>>> Currently the hvm parameters are only accessible via the HVMOP hypercalls. By
>>> exposing hvm_{get/set}_param it will be possible for VM forking to copy the
>>> parameters directly into the clone domain.
>> Having peeked ahead at patch 17, where this gets used, I wonder why
>> you want a pair of one-by-one functions, rather than a copy-all one.
>> This then wouldn't require exposure of the functions you touch here.
> Well, provided there is no such function in existence today it was
> just easier to use what's already available. I still wouldn't want to
> implement a one-shot function like that because this same code-path is
> shared by the save-restore operations on the toolstack side, so at
> least I have a reasonable assumption that it won't break on me in the
> future.

In particular, a number of the set operations are distinctly
non-trivial.  (OTOH, those are not long for this world, and should be
creation X86_EMU_* constants instead).

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-20 17:32       ` Andrew Cooper
@ 2019-12-20 17:36         ` Tamas K Lengyel
  2019-12-20 17:46           ` Andrew Cooper
  2019-12-23  9:37         ` Jan Beulich
  1 sibling, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-20 17:36 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Xen-devel, Tamas K Lengyel, Wei Liu, Jan Beulich, Roger Pau Monné

On Fri, Dec 20, 2019 at 10:32 AM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 20/12/2019 17:27, Tamas K Lengyel wrote:
> > On Fri, Dec 20, 2019 at 9:47 AM Jan Beulich <jbeulich@suse.com> wrote:
> >> On 18.12.2019 20:40, Tamas K Lengyel wrote:
> >>> Currently the hvm parameters are only accessible via the HVMOP hypercalls. By
> >>> exposing hvm_{get/set}_param it will be possible for VM forking to copy the
> >>> parameters directly into the clone domain.
> >> Having peeked ahead at patch 17, where this gets used, I wonder why
> >> you want a pair of one-by-one functions, rather than a copy-all one.
> >> This then wouldn't require exposure of the functions you touch here.
> > Well, provided there is no such function in existence today it was
> > just easier to use what's already available. I still wouldn't want to
> > implement a one-shot function like that because this same code-path is
> > shared by the save-restore operations on the toolstack side, so at
> > least I have a reasonable assumption that it won't break on me in the
> > future.
>
> In particular, a number of the set operations are distinctly
> non-trivial.  (OTOH, those are not long for this world, and should be
> creation X86_EMU_* constants instead).
>

I actually wanted to ask about that. In
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxc/xc_sr_save_x86_hvm.c;h=97a8c49807f192c47209525f51e4d79a50c66cec;hb=HEAD#l61
the toolstack only selects certain HVM params to be saved (and
restored later). I originally followed the same logic in the fork
code-path but after a lot of experiments it looks like it's actually
OK to grab all params but only call set_param on the ones that have a
non-zero value. So setting some params with a zero value has certainly
lead to crashes, but otherwise it seems to "just work" to copy all the
rest.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-20 17:36         ` Tamas K Lengyel
@ 2019-12-20 17:46           ` Andrew Cooper
  2019-12-20 17:50             ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Andrew Cooper @ 2019-12-20 17:46 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Xen-devel, Tamas K Lengyel, Wei Liu, Jan Beulich, Roger Pau Monné

On 20/12/2019 17:36, Tamas K Lengyel wrote:
> On Fri, Dec 20, 2019 at 10:32 AM Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 20/12/2019 17:27, Tamas K Lengyel wrote:
>>> On Fri, Dec 20, 2019 at 9:47 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>> On 18.12.2019 20:40, Tamas K Lengyel wrote:
>>>>> Currently the hvm parameters are only accessible via the HVMOP hypercalls. By
>>>>> exposing hvm_{get/set}_param it will be possible for VM forking to copy the
>>>>> parameters directly into the clone domain.
>>>> Having peeked ahead at patch 17, where this gets used, I wonder why
>>>> you want a pair of one-by-one functions, rather than a copy-all one.
>>>> This then wouldn't require exposure of the functions you touch here.
>>> Well, provided there is no such function in existence today it was
>>> just easier to use what's already available. I still wouldn't want to
>>> implement a one-shot function like that because this same code-path is
>>> shared by the save-restore operations on the toolstack side, so at
>>> least I have a reasonable assumption that it won't break on me in the
>>> future.
>> In particular, a number of the set operations are distinctly
>> non-trivial.  (OTOH, those are not long for this world, and should be
>> creation X86_EMU_* constants instead).
>>
> I actually wanted to ask about that. In
> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxc/xc_sr_save_x86_hvm.c;h=97a8c49807f192c47209525f51e4d79a50c66cec;hb=HEAD#l61
> the toolstack only selects certain HVM params to be saved (and
> restored later). I originally followed the same logic in the fork
> code-path but after a lot of experiments it looks like it's actually
> OK to grab all params but only call set_param on the ones that have a
> non-zero value. So setting some params with a zero value has certainly
> lead to crashes, but otherwise it seems to "just work" to copy all the
> rest.

I think you're trying to ascribe any form of design/plan to a system
which had none. :)

The code you quote was like that because that is how legacy migration
worked.  That said, eliding empty records was an effort-saving exercise
(avoid redundant hypercalls on destination side), not because there was
any suggestion that attempting to explicitly set 0 would crash.

Do you have any idea which param was causing problems?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-20 17:46           ` Andrew Cooper
@ 2019-12-20 17:50             ` Tamas K Lengyel
  2019-12-20 18:00               ` Andrew Cooper
  0 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-20 17:50 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Xen-devel, Tamas K Lengyel, Wei Liu, Jan Beulich, Roger Pau Monné

On Fri, Dec 20, 2019 at 10:47 AM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 20/12/2019 17:36, Tamas K Lengyel wrote:
> > On Fri, Dec 20, 2019 at 10:32 AM Andrew Cooper
> > <andrew.cooper3@citrix.com> wrote:
> >> On 20/12/2019 17:27, Tamas K Lengyel wrote:
> >>> On Fri, Dec 20, 2019 at 9:47 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>> On 18.12.2019 20:40, Tamas K Lengyel wrote:
> >>>>> Currently the hvm parameters are only accessible via the HVMOP hypercalls. By
> >>>>> exposing hvm_{get/set}_param it will be possible for VM forking to copy the
> >>>>> parameters directly into the clone domain.
> >>>> Having peeked ahead at patch 17, where this gets used, I wonder why
> >>>> you want a pair of one-by-one functions, rather than a copy-all one.
> >>>> This then wouldn't require exposure of the functions you touch here.
> >>> Well, provided there is no such function in existence today it was
> >>> just easier to use what's already available. I still wouldn't want to
> >>> implement a one-shot function like that because this same code-path is
> >>> shared by the save-restore operations on the toolstack side, so at
> >>> least I have a reasonable assumption that it won't break on me in the
> >>> future.
> >> In particular, a number of the set operations are distinctly
> >> non-trivial.  (OTOH, those are not long for this world, and should be
> >> creation X86_EMU_* constants instead).
> >>
> > I actually wanted to ask about that. In
> > https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxc/xc_sr_save_x86_hvm.c;h=97a8c49807f192c47209525f51e4d79a50c66cec;hb=HEAD#l61
> > the toolstack only selects certain HVM params to be saved (and
> > restored later). I originally followed the same logic in the fork
> > code-path but after a lot of experiments it looks like it's actually
> > OK to grab all params but only call set_param on the ones that have a
> > non-zero value. So setting some params with a zero value has certainly
> > lead to crashes, but otherwise it seems to "just work" to copy all the
> > rest.
>
> I think you're trying to ascribe any form of design/plan to a system
> which had none. :)
>
> The code you quote was like that because that is how legacy migration
> worked.  That said, eliding empty records was an effort-saving exercise
> (avoid redundant hypercalls on destination side), not because there was
> any suggestion that attempting to explicitly set 0 would crash.
>
> Do you have any idea which param was causing problems?

Yes, HVM_PARAM_IDENT_PT was one sure. There may have been others (I
don't recall now) but simply checking for non-zero value before
calling set_param resolved everything.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-20 17:50             ` Tamas K Lengyel
@ 2019-12-20 18:00               ` Andrew Cooper
  2019-12-20 18:05                 ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Andrew Cooper @ 2019-12-20 18:00 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Xen-devel, Tamas K Lengyel, Wei Liu, Jan Beulich, Roger Pau Monné

On 20/12/2019 17:50, Tamas K Lengyel wrote:
> On Fri, Dec 20, 2019 at 10:47 AM Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 20/12/2019 17:36, Tamas K Lengyel wrote:
>>> On Fri, Dec 20, 2019 at 10:32 AM Andrew Cooper
>>> <andrew.cooper3@citrix.com> wrote:
>>>> On 20/12/2019 17:27, Tamas K Lengyel wrote:
>>>>> On Fri, Dec 20, 2019 at 9:47 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>> On 18.12.2019 20:40, Tamas K Lengyel wrote:
>>>>>>> Currently the hvm parameters are only accessible via the HVMOP hypercalls. By
>>>>>>> exposing hvm_{get/set}_param it will be possible for VM forking to copy the
>>>>>>> parameters directly into the clone domain.
>>>>>> Having peeked ahead at patch 17, where this gets used, I wonder why
>>>>>> you want a pair of one-by-one functions, rather than a copy-all one.
>>>>>> This then wouldn't require exposure of the functions you touch here.
>>>>> Well, provided there is no such function in existence today it was
>>>>> just easier to use what's already available. I still wouldn't want to
>>>>> implement a one-shot function like that because this same code-path is
>>>>> shared by the save-restore operations on the toolstack side, so at
>>>>> least I have a reasonable assumption that it won't break on me in the
>>>>> future.
>>>> In particular, a number of the set operations are distinctly
>>>> non-trivial.  (OTOH, those are not long for this world, and should be
>>>> creation X86_EMU_* constants instead).
>>>>
>>> I actually wanted to ask about that. In
>>> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxc/xc_sr_save_x86_hvm.c;h=97a8c49807f192c47209525f51e4d79a50c66cec;hb=HEAD#l61
>>> the toolstack only selects certain HVM params to be saved (and
>>> restored later). I originally followed the same logic in the fork
>>> code-path but after a lot of experiments it looks like it's actually
>>> OK to grab all params but only call set_param on the ones that have a
>>> non-zero value. So setting some params with a zero value has certainly
>>> lead to crashes, but otherwise it seems to "just work" to copy all the
>>> rest.
>> I think you're trying to ascribe any form of design/plan to a system
>> which had none. :)
>>
>> The code you quote was like that because that is how legacy migration
>> worked.  That said, eliding empty records was an effort-saving exercise
>> (avoid redundant hypercalls on destination side), not because there was
>> any suggestion that attempting to explicitly set 0 would crash.
>>
>> Do you have any idea which param was causing problems?
> Yes, HVM_PARAM_IDENT_PT was one sure. There may have been others (I
> don't recall now) but simply checking for non-zero value before
> calling set_param resolved everything.

IDENT_PT is an Westmere(?) wrinkle.

There was one processor back in those days which supported EPT, but
didn't support VT-x running in unpaged mode.  Therefore, we had to fake
up unpaged mode by pointing vCR3 at an identity pagetable inside the
guests physical address space.

The crash won't be from the IDENT_PT itself, but the paging_update_cr3()
side effect.  Was it a host crash, or guest crash?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-20 18:00               ` Andrew Cooper
@ 2019-12-20 18:05                 ` Tamas K Lengyel
  0 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-20 18:05 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Xen-devel, Tamas K Lengyel, Wei Liu, Jan Beulich, Roger Pau Monné

On Fri, Dec 20, 2019 at 11:00 AM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 20/12/2019 17:50, Tamas K Lengyel wrote:
> > On Fri, Dec 20, 2019 at 10:47 AM Andrew Cooper
> > <andrew.cooper3@citrix.com> wrote:
> >> On 20/12/2019 17:36, Tamas K Lengyel wrote:
> >>> On Fri, Dec 20, 2019 at 10:32 AM Andrew Cooper
> >>> <andrew.cooper3@citrix.com> wrote:
> >>>> On 20/12/2019 17:27, Tamas K Lengyel wrote:
> >>>>> On Fri, Dec 20, 2019 at 9:47 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>> On 18.12.2019 20:40, Tamas K Lengyel wrote:
> >>>>>>> Currently the hvm parameters are only accessible via the HVMOP hypercalls. By
> >>>>>>> exposing hvm_{get/set}_param it will be possible for VM forking to copy the
> >>>>>>> parameters directly into the clone domain.
> >>>>>> Having peeked ahead at patch 17, where this gets used, I wonder why
> >>>>>> you want a pair of one-by-one functions, rather than a copy-all one.
> >>>>>> This then wouldn't require exposure of the functions you touch here.
> >>>>> Well, provided there is no such function in existence today it was
> >>>>> just easier to use what's already available. I still wouldn't want to
> >>>>> implement a one-shot function like that because this same code-path is
> >>>>> shared by the save-restore operations on the toolstack side, so at
> >>>>> least I have a reasonable assumption that it won't break on me in the
> >>>>> future.
> >>>> In particular, a number of the set operations are distinctly
> >>>> non-trivial.  (OTOH, those are not long for this world, and should be
> >>>> creation X86_EMU_* constants instead).
> >>>>
> >>> I actually wanted to ask about that. In
> >>> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxc/xc_sr_save_x86_hvm.c;h=97a8c49807f192c47209525f51e4d79a50c66cec;hb=HEAD#l61
> >>> the toolstack only selects certain HVM params to be saved (and
> >>> restored later). I originally followed the same logic in the fork
> >>> code-path but after a lot of experiments it looks like it's actually
> >>> OK to grab all params but only call set_param on the ones that have a
> >>> non-zero value. So setting some params with a zero value has certainly
> >>> lead to crashes, but otherwise it seems to "just work" to copy all the
> >>> rest.
> >> I think you're trying to ascribe any form of design/plan to a system
> >> which had none. :)
> >>
> >> The code you quote was like that because that is how legacy migration
> >> worked.  That said, eliding empty records was an effort-saving exercise
> >> (avoid redundant hypercalls on destination side), not because there was
> >> any suggestion that attempting to explicitly set 0 would crash.
> >>
> >> Do you have any idea which param was causing problems?
> > Yes, HVM_PARAM_IDENT_PT was one sure. There may have been others (I
> > don't recall now) but simply checking for non-zero value before
> > calling set_param resolved everything.
>
> IDENT_PT is an Westmere(?) wrinkle.
>
> There was one processor back in those days which supported EPT, but
> didn't support VT-x running in unpaged mode.  Therefore, we had to fake
> up unpaged mode by pointing vCR3 at an identity pagetable inside the
> guests physical address space.

Eh, yikes.

>
> The crash won't be from the IDENT_PT itself, but the paging_update_cr3()
> side effect.  Was it a host crash, or guest crash?
>

Yes, that's what I recall after I looked into it. It was a guest a
crash as I remember.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-20 17:32       ` Andrew Cooper
  2019-12-20 17:36         ` Tamas K Lengyel
@ 2019-12-23  9:37         ` Jan Beulich
  2019-12-23 14:55           ` Tamas K Lengyel
  1 sibling, 1 reply; 96+ messages in thread
From: Jan Beulich @ 2019-12-23  9:37 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Tamas K Lengyel, Tamas K Lengyel, Xen-devel, Wei Liu,
	Roger Pau Monné

On 20.12.2019 18:32, Andrew Cooper wrote:
> On 20/12/2019 17:27, Tamas K Lengyel wrote:
>> On Fri, Dec 20, 2019 at 9:47 AM Jan Beulich <jbeulich@suse.com> wrote:
>>> On 18.12.2019 20:40, Tamas K Lengyel wrote:
>>>> Currently the hvm parameters are only accessible via the HVMOP hypercalls. By
>>>> exposing hvm_{get/set}_param it will be possible for VM forking to copy the
>>>> parameters directly into the clone domain.
>>> Having peeked ahead at patch 17, where this gets used, I wonder why
>>> you want a pair of one-by-one functions, rather than a copy-all one.
>>> This then wouldn't require exposure of the functions you touch here.
>> Well, provided there is no such function in existence today it was
>> just easier to use what's already available. I still wouldn't want to
>> implement a one-shot function like that because this same code-path is
>> shared by the save-restore operations on the toolstack side, so at
>> least I have a reasonable assumption that it won't break on me in the
>> future.
> 
> In particular, a number of the set operations are distinctly
> non-trivial.

How is trivial or not related to there being one function doing
the looping wanted here vs the looping being done by the caller
around the two per-entity calls?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-23  9:37         ` Jan Beulich
@ 2019-12-23 14:55           ` Tamas K Lengyel
  2019-12-27  8:02             ` Jan Beulich
  0 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-23 14:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Tamas K Lengyel, Xen-devel, Wei Liu, Roger Pau Monné

On Mon, Dec 23, 2019 at 2:37 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 20.12.2019 18:32, Andrew Cooper wrote:
> > On 20/12/2019 17:27, Tamas K Lengyel wrote:
> >> On Fri, Dec 20, 2019 at 9:47 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>> On 18.12.2019 20:40, Tamas K Lengyel wrote:
> >>>> Currently the hvm parameters are only accessible via the HVMOP hypercalls. By
> >>>> exposing hvm_{get/set}_param it will be possible for VM forking to copy the
> >>>> parameters directly into the clone domain.
> >>> Having peeked ahead at patch 17, where this gets used, I wonder why
> >>> you want a pair of one-by-one functions, rather than a copy-all one.
> >>> This then wouldn't require exposure of the functions you touch here.
> >> Well, provided there is no such function in existence today it was
> >> just easier to use what's already available. I still wouldn't want to
> >> implement a one-shot function like that because this same code-path is
> >> shared by the save-restore operations on the toolstack side, so at
> >> least I have a reasonable assumption that it won't break on me in the
> >> future.
> >
> > In particular, a number of the set operations are distinctly
> > non-trivial.
>
> How is trivial or not related to there being one function doing
> the looping wanted here vs the looping being done by the caller
> around the two per-entity calls?

I don't really get why would it matter where the looping is being
done? Even if I were to add a single function to do this, it would do
the same looping and just call the now internally kept get/set params
functions.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-23 14:55           ` Tamas K Lengyel
@ 2019-12-27  8:02             ` Jan Beulich
  2019-12-27 13:10               ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Jan Beulich @ 2019-12-27  8:02 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Andrew Cooper, Tamas K Lengyel, Xen-devel, Wei Liu, Roger Pau Monné

(re-sending, as I still don't see the mail having appeared on the list)

On 23.12.2019 15:55, Tamas K Lengyel wrote:
> On Mon, Dec 23, 2019 at 2:37 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 20.12.2019 18:32, Andrew Cooper wrote:
>>> On 20/12/2019 17:27, Tamas K Lengyel wrote:
>>>> On Fri, Dec 20, 2019 at 9:47 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>> On 18.12.2019 20:40, Tamas K Lengyel wrote:
>>>>>> Currently the hvm parameters are only accessible via the HVMOP hypercalls. By
>>>>>> exposing hvm_{get/set}_param it will be possible for VM forking to copy the
>>>>>> parameters directly into the clone domain.
>>>>> Having peeked ahead at patch 17, where this gets used, I wonder why
>>>>> you want a pair of one-by-one functions, rather than a copy-all one.
>>>>> This then wouldn't require exposure of the functions you touch here.
>>>> Well, provided there is no such function in existence today it was
>>>> just easier to use what's already available. I still wouldn't want to
>>>> implement a one-shot function like that because this same code-path is
>>>> shared by the save-restore operations on the toolstack side, so at
>>>> least I have a reasonable assumption that it won't break on me in the
>>>> future.
>>>
>>> In particular, a number of the set operations are distinctly
>>> non-trivial.
>>
>> How is trivial or not related to there being one function doing
>> the looping wanted here vs the looping being done by the caller
>> around the two per-entity calls?
> 
> I don't really get why would it matter where the looping is being
> done? Even if I were to add a single function to do this, it would do
> the same looping and just call the now internally kept get/set params
> functions.

The difference (to me) is what level of control gets exposed outside
of the file. For example I also dislike external code doing this
somewhat odd (but necessary as per your communication with Andrew)
checking against zero values. Such are implementation details which
would better not be scatter around.

Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-27  8:02             ` Jan Beulich
@ 2019-12-27 13:10               ` Tamas K Lengyel
  2019-12-27 13:44                 ` Jan Beulich
  0 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-27 13:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Tamas K Lengyel, Xen-devel, Wei Liu, Roger Pau Monné

On Fri, Dec 27, 2019 at 1:04 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> (re-sending, as I still don't see the mail having appeared on the list)
>
> On 23.12.2019 15:55, Tamas K Lengyel wrote:
> > On Mon, Dec 23, 2019 at 2:37 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 20.12.2019 18:32, Andrew Cooper wrote:
> >>> On 20/12/2019 17:27, Tamas K Lengyel wrote:
> >>>> On Fri, Dec 20, 2019 at 9:47 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>> On 18.12.2019 20:40, Tamas K Lengyel wrote:
> >>>>>> Currently the hvm parameters are only accessible via the HVMOP hypercalls. By
> >>>>>> exposing hvm_{get/set}_param it will be possible for VM forking to copy the
> >>>>>> parameters directly into the clone domain.
> >>>>> Having peeked ahead at patch 17, where this gets used, I wonder why
> >>>>> you want a pair of one-by-one functions, rather than a copy-all one.
> >>>>> This then wouldn't require exposure of the functions you touch here.
> >>>> Well, provided there is no such function in existence today it was
> >>>> just easier to use what's already available. I still wouldn't want to
> >>>> implement a one-shot function like that because this same code-path is
> >>>> shared by the save-restore operations on the toolstack side, so at
> >>>> least I have a reasonable assumption that it won't break on me in the
> >>>> future.
> >>>
> >>> In particular, a number of the set operations are distinctly
> >>> non-trivial.
> >>
> >> How is trivial or not related to there being one function doing
> >> the looping wanted here vs the looping being done by the caller
> >> around the two per-entity calls?
> >
> > I don't really get why would it matter where the looping is being
> > done? Even if I were to add a single function to do this, it would do
> > the same looping and just call the now internally kept get/set params
> > functions.
>
> The difference (to me) is what level of control gets exposed outside
> of the file. For example I also dislike external code doing this
> somewhat odd (but necessary as per your communication with Andrew)
> checking against zero values. Such are implementation details which
> would better not be scatter around.

But you do realize that both of these functions are already exposed
via hypercalls? So it's OK to call them from the toolstack but not
from other parts of Xen itself? I don't see much reason there. But to
me it makes 0 difference where the loop that copies the params is
done, as long as it's within Xen. So if you really want it to be in
hvm.c I can do that, I just find it silly.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-27 13:10               ` Tamas K Lengyel
@ 2019-12-27 13:44                 ` Jan Beulich
  2019-12-27 14:06                   ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Jan Beulich @ 2019-12-27 13:44 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Andrew Cooper, Tamas K Lengyel, Xen-devel, Wei Liu, Roger Pau Monné

On 27.12.2019 14:10, Tamas K Lengyel wrote:
> On Fri, Dec 27, 2019 at 1:04 AM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> (re-sending, as I still don't see the mail having appeared on the list)
>>
>> On 23.12.2019 15:55, Tamas K Lengyel wrote:
>>> On Mon, Dec 23, 2019 at 2:37 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 20.12.2019 18:32, Andrew Cooper wrote:
>>>>> On 20/12/2019 17:27, Tamas K Lengyel wrote:
>>>>>> On Fri, Dec 20, 2019 at 9:47 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>> On 18.12.2019 20:40, Tamas K Lengyel wrote:
>>>>>>>> Currently the hvm parameters are only accessible via the HVMOP hypercalls. By
>>>>>>>> exposing hvm_{get/set}_param it will be possible for VM forking to copy the
>>>>>>>> parameters directly into the clone domain.
>>>>>>> Having peeked ahead at patch 17, where this gets used, I wonder why
>>>>>>> you want a pair of one-by-one functions, rather than a copy-all one.
>>>>>>> This then wouldn't require exposure of the functions you touch here.
>>>>>> Well, provided there is no such function in existence today it was
>>>>>> just easier to use what's already available. I still wouldn't want to
>>>>>> implement a one-shot function like that because this same code-path is
>>>>>> shared by the save-restore operations on the toolstack side, so at
>>>>>> least I have a reasonable assumption that it won't break on me in the
>>>>>> future.
>>>>>
>>>>> In particular, a number of the set operations are distinctly
>>>>> non-trivial.
>>>>
>>>> How is trivial or not related to there being one function doing
>>>> the looping wanted here vs the looping being done by the caller
>>>> around the two per-entity calls?
>>>
>>> I don't really get why would it matter where the looping is being
>>> done? Even if I were to add a single function to do this, it would do
>>> the same looping and just call the now internally kept get/set params
>>> functions.
>>
>> The difference (to me) is what level of control gets exposed outside
>> of the file. For example I also dislike external code doing this
>> somewhat odd (but necessary as per your communication with Andrew)
>> checking against zero values. Such are implementation details which
>> would better not be scatter around.
> 
> But you do realize that both of these functions are already exposed
> via hypercalls? So it's OK to call them from the toolstack but not
> from other parts of Xen itself? I don't see much reason there.

Right now we have exactly one path each allowing this get/set. Adding
a 2nd (from outside of hvm.c) opens the door for possible races
between the various (for now just two) possible call sites. Noticing
a possible problem when adding new code is imo quite a bit more
likely if everything lives centralized in one place. IOW "exposure"
here isn't meant so much in the sense of what entity in the system
gets to drive the data, but which entities within Xen may play with
it.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible
  2019-12-27 13:44                 ` Jan Beulich
@ 2019-12-27 14:06                   ` Tamas K Lengyel
  0 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-27 14:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Tamas K Lengyel, Xen-devel, Wei Liu, Roger Pau Monné

On Fri, Dec 27, 2019 at 6:44 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 27.12.2019 14:10, Tamas K Lengyel wrote:
> > On Fri, Dec 27, 2019 at 1:04 AM Jan Beulich <JBeulich@suse.com> wrote:
> >>
> >> (re-sending, as I still don't see the mail having appeared on the list)
> >>
> >> On 23.12.2019 15:55, Tamas K Lengyel wrote:
> >>> On Mon, Dec 23, 2019 at 2:37 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> On 20.12.2019 18:32, Andrew Cooper wrote:
> >>>>> On 20/12/2019 17:27, Tamas K Lengyel wrote:
> >>>>>> On Fri, Dec 20, 2019 at 9:47 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>> On 18.12.2019 20:40, Tamas K Lengyel wrote:
> >>>>>>>> Currently the hvm parameters are only accessible via the HVMOP hypercalls. By
> >>>>>>>> exposing hvm_{get/set}_param it will be possible for VM forking to copy the
> >>>>>>>> parameters directly into the clone domain.
> >>>>>>> Having peeked ahead at patch 17, where this gets used, I wonder why
> >>>>>>> you want a pair of one-by-one functions, rather than a copy-all one.
> >>>>>>> This then wouldn't require exposure of the functions you touch here.
> >>>>>> Well, provided there is no such function in existence today it was
> >>>>>> just easier to use what's already available. I still wouldn't want to
> >>>>>> implement a one-shot function like that because this same code-path is
> >>>>>> shared by the save-restore operations on the toolstack side, so at
> >>>>>> least I have a reasonable assumption that it won't break on me in the
> >>>>>> future.
> >>>>>
> >>>>> In particular, a number of the set operations are distinctly
> >>>>> non-trivial.
> >>>>
> >>>> How is trivial or not related to there being one function doing
> >>>> the looping wanted here vs the looping being done by the caller
> >>>> around the two per-entity calls?
> >>>
> >>> I don't really get why would it matter where the looping is being
> >>> done? Even if I were to add a single function to do this, it would do
> >>> the same looping and just call the now internally kept get/set params
> >>> functions.
> >>
> >> The difference (to me) is what level of control gets exposed outside
> >> of the file. For example I also dislike external code doing this
> >> somewhat odd (but necessary as per your communication with Andrew)
> >> checking against zero values. Such are implementation details which
> >> would better not be scatter around.
> >
> > But you do realize that both of these functions are already exposed
> > via hypercalls? So it's OK to call them from the toolstack but not
> > from other parts of Xen itself? I don't see much reason there.
>
> Right now we have exactly one path each allowing this get/set. Adding
> a 2nd (from outside of hvm.c) opens the door for possible races
> between the various (for now just two) possible call sites. Noticing
> a possible problem when adding new code is imo quite a bit more
> likely if everything lives centralized in one place. IOW "exposure"
> here isn't meant so much in the sense of what entity in the system
> gets to drive the data, but which entities within Xen may play with
> it.

Sure, I'll move the loop to hvm.c then.

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2019-12-19 15:58   ` Tamas K Lengyel
@ 2019-12-30 17:59     ` Roger Pau Monné
  2019-12-30 18:15       ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Roger Pau Monné @ 2019-12-30 17:59 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Julien Grall, Petre Pircalabu, Stefano Stabellini,
	Tamas K Lengyel, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Anthony PERARD, Jan Beulich,
	Alexandru Isaila, Xen-devel

On Thu, Dec 19, 2019 at 08:58:01AM -0700, Tamas K Lengyel wrote:
> On Thu, Dec 19, 2019 at 2:48 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Wed, Dec 18, 2019 at 11:40:37AM -0800, Tamas K Lengyel wrote:
> > > The following series implements VM forking for Intel HVM guests to allow for
> > > the fast creation of identical VMs without the assosciated high startup costs
> > > of booting or restoring the VM from a savefile.
> > >
> > > JIRA issue: https://xenproject.atlassian.net/browse/XEN-89
> > >
> > > The main design goal with this series has been to reduce the time of creating
> > > the VM fork as much as possible. To achieve this the VM forking process is
> > > split into two steps:
> > >     1) forking the VM on the hypervisor side;
> > >     2) starting QEMU to handle the backed for emulated devices.
> > >
> > > Step 1) involves creating a VM using the new "xl fork-vm" command. The
> > > parent VM is expected to remain paused after forks are created from it (which
> > > is different then what process forking normally entails). During this forking
> >                ^ than
> > > operation the HVM context and VM settings are copied over to the new forked VM.
> > > This operation is fast and it allows the forked VM to be unpaused and to be
> > > monitored and accessed via VMI. Note however that without its device model
> > > running (depending on what is executing in the VM) it is bound to
> > > misbehave/crash when its trying to access devices that would be emulated by
> > > QEMU. We anticipate that for certain use-cases this would be an acceptable
> > > situation, in case for example when fuzzing is performed of code segments that
> > > don't access such devices.
> > >
> > > Step 2) involves launching QEMU to support the forked VM, which requires the
> > > QEMU Xen savefile to be generated manually from the parent VM. This can be
> > > accomplished simply by connecting to its QMP socket and issuing the
> > > "xen-save-devices-state" command as documented by QEMU:
> > > https://github.com/qemu/qemu/blob/master/docs/xen-save-devices-state.txt
> > > Once the QEMU Xen savefile is generated the new "xl fork-launch-dm" command is
> > > used to launch QEMU and load the specified savefile for it.
> >
> > IMO having two different commands is confusing for the end user, I
> > would rather have something like:
> >
> > xl fork-vm [-d] ...
> >
> > Where '-d' would prevent forking any user-space emulators. I don't
> > thinks there's a need for a separate command to fork the underlying
> > user-space emulators.
> 
> Keeping it as two commands allows you to start up the fork and let it
> run immediately and only start up QEMU when you notice it is needed.
> The idea being that you can monitor the kernel and see when it tries
> to do some I/O that would require the QEMU backend. If you combine the
> commands that option goes away.

I'm not sure I see why, you could still provide a `xl fork-vm [-c]
...` that would just lunch a QEMU instance. End users using xl have
AFAICT no way to tell whether or when a QEMU is needed or not, and
hence the default behavior should be a fully functional one.

IMO I think fork-vm without any options should do a complete fork of a
VM, rather than a partial one without a device model clone.

> Also, QEMU itself isn't getting forked
> right now, we just start a new QEMU process with the saved-state
> getting loaded into it. I looked into implementing a QEMU fork command
> but it turns out that for the vast majority of our use-cases QEMU
> isn't needed at all, so developing that addition was tabled.

Starting a new process with the saved-state looks fine to me, it can
always be improved afterwards if there's a need for it.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2019-12-30 17:59     ` Roger Pau Monné
@ 2019-12-30 18:15       ` Tamas K Lengyel
  2019-12-30 18:43         ` Julien Grall
  0 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-30 18:15 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Julien Grall, Petre Pircalabu, Stefano Stabellini,
	Tamas K Lengyel, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Anthony PERARD, Jan Beulich,
	Alexandru Isaila, Xen-devel

On Mon, Dec 30, 2019 at 10:59 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Thu, Dec 19, 2019 at 08:58:01AM -0700, Tamas K Lengyel wrote:
> > On Thu, Dec 19, 2019 at 2:48 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >
> > > On Wed, Dec 18, 2019 at 11:40:37AM -0800, Tamas K Lengyel wrote:
> > > > The following series implements VM forking for Intel HVM guests to allow for
> > > > the fast creation of identical VMs without the assosciated high startup costs
> > > > of booting or restoring the VM from a savefile.
> > > >
> > > > JIRA issue: https://xenproject.atlassian.net/browse/XEN-89
> > > >
> > > > The main design goal with this series has been to reduce the time of creating
> > > > the VM fork as much as possible. To achieve this the VM forking process is
> > > > split into two steps:
> > > >     1) forking the VM on the hypervisor side;
> > > >     2) starting QEMU to handle the backed for emulated devices.
> > > >
> > > > Step 1) involves creating a VM using the new "xl fork-vm" command. The
> > > > parent VM is expected to remain paused after forks are created from it (which
> > > > is different then what process forking normally entails). During this forking
> > >                ^ than
> > > > operation the HVM context and VM settings are copied over to the new forked VM.
> > > > This operation is fast and it allows the forked VM to be unpaused and to be
> > > > monitored and accessed via VMI. Note however that without its device model
> > > > running (depending on what is executing in the VM) it is bound to
> > > > misbehave/crash when its trying to access devices that would be emulated by
> > > > QEMU. We anticipate that for certain use-cases this would be an acceptable
> > > > situation, in case for example when fuzzing is performed of code segments that
> > > > don't access such devices.
> > > >
> > > > Step 2) involves launching QEMU to support the forked VM, which requires the
> > > > QEMU Xen savefile to be generated manually from the parent VM. This can be
> > > > accomplished simply by connecting to its QMP socket and issuing the
> > > > "xen-save-devices-state" command as documented by QEMU:
> > > > https://github.com/qemu/qemu/blob/master/docs/xen-save-devices-state.txt
> > > > Once the QEMU Xen savefile is generated the new "xl fork-launch-dm" command is
> > > > used to launch QEMU and load the specified savefile for it.
> > >
> > > IMO having two different commands is confusing for the end user, I
> > > would rather have something like:
> > >
> > > xl fork-vm [-d] ...
> > >
> > > Where '-d' would prevent forking any user-space emulators. I don't
> > > thinks there's a need for a separate command to fork the underlying
> > > user-space emulators.
> >
> > Keeping it as two commands allows you to start up the fork and let it
> > run immediately and only start up QEMU when you notice it is needed.
> > The idea being that you can monitor the kernel and see when it tries
> > to do some I/O that would require the QEMU backend. If you combine the
> > commands that option goes away.
>
> I'm not sure I see why, you could still provide a `xl fork-vm [-c]
> ...` that would just lunch a QEMU instance. End users using xl have
> AFAICT no way to tell whether or when a QEMU is needed or not, and
> hence the default behavior should be a fully functional one.
>
> IMO I think fork-vm without any options should do a complete fork of a
> VM, rather than a partial one without a device model clone.

I understand your point but implementing that is outside the scope of
what we are doing right now. There are a lot more steps involved if
you want to create a fully functional VM fork with QEMU, for example
you also have to create a separate disk so you don't clobber the
parent VM's disk. Also, saving the QEMU device state is currently
hard-wired into the save/migration operation, so changing that
plumbing in libxl is quite involved. I actually found it way easier to
just write a script that connects to the socket and saves it to a
target file then going through the pain of adjusting libxl. So while
this could be implemented at this time it won't be.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2019-12-30 18:15       ` Tamas K Lengyel
@ 2019-12-30 18:43         ` Julien Grall
  2019-12-30 20:46           ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Julien Grall @ 2019-12-30 18:43 UTC (permalink / raw)
  To: Tamas K Lengyel, Roger Pau Monné
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Anthony PERARD, Jan Beulich, Alexandru Isaila, Xen-devel

Hi Tamas,

On 30/12/2019 18:15, Tamas K Lengyel wrote:
> On Mon, Dec 30, 2019 at 10:59 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>
>> On Thu, Dec 19, 2019 at 08:58:01AM -0700, Tamas K Lengyel wrote:
>>> On Thu, Dec 19, 2019 at 2:48 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>>
>>>> On Wed, Dec 18, 2019 at 11:40:37AM -0800, Tamas K Lengyel wrote:
>>>>> The following series implements VM forking for Intel HVM guests to allow for
>>>>> the fast creation of identical VMs without the assosciated high startup costs
>>>>> of booting or restoring the VM from a savefile.
>>>>>
>>>>> JIRA issue: https://xenproject.atlassian.net/browse/XEN-89
>>>>>
>>>>> The main design goal with this series has been to reduce the time of creating
>>>>> the VM fork as much as possible. To achieve this the VM forking process is
>>>>> split into two steps:
>>>>>      1) forking the VM on the hypervisor side;
>>>>>      2) starting QEMU to handle the backed for emulated devices.
>>>>>
>>>>> Step 1) involves creating a VM using the new "xl fork-vm" command. The
>>>>> parent VM is expected to remain paused after forks are created from it (which
>>>>> is different then what process forking normally entails). During this forking
>>>>                 ^ than
>>>>> operation the HVM context and VM settings are copied over to the new forked VM.
>>>>> This operation is fast and it allows the forked VM to be unpaused and to be
>>>>> monitored and accessed via VMI. Note however that without its device model
>>>>> running (depending on what is executing in the VM) it is bound to
>>>>> misbehave/crash when its trying to access devices that would be emulated by
>>>>> QEMU. We anticipate that for certain use-cases this would be an acceptable
>>>>> situation, in case for example when fuzzing is performed of code segments that
>>>>> don't access such devices.
>>>>>
>>>>> Step 2) involves launching QEMU to support the forked VM, which requires the
>>>>> QEMU Xen savefile to be generated manually from the parent VM. This can be
>>>>> accomplished simply by connecting to its QMP socket and issuing the
>>>>> "xen-save-devices-state" command as documented by QEMU:
>>>>> https://github.com/qemu/qemu/blob/master/docs/xen-save-devices-state.txt
>>>>> Once the QEMU Xen savefile is generated the new "xl fork-launch-dm" command is
>>>>> used to launch QEMU and load the specified savefile for it.
>>>>
>>>> IMO having two different commands is confusing for the end user, I
>>>> would rather have something like:
>>>>
>>>> xl fork-vm [-d] ...
>>>>
>>>> Where '-d' would prevent forking any user-space emulators. I don't
>>>> thinks there's a need for a separate command to fork the underlying
>>>> user-space emulators.
>>>
>>> Keeping it as two commands allows you to start up the fork and let it
>>> run immediately and only start up QEMU when you notice it is needed.
>>> The idea being that you can monitor the kernel and see when it tries
>>> to do some I/O that would require the QEMU backend. If you combine the
>>> commands that option goes away.
>>
>> I'm not sure I see why, you could still provide a `xl fork-vm [-c]
>> ...` that would just lunch a QEMU instance. End users using xl have
>> AFAICT no way to tell whether or when a QEMU is needed or not, and
>> hence the default behavior should be a fully functional one.
>>
>> IMO I think fork-vm without any options should do a complete fork of a
>> VM, rather than a partial one without a device model clone.
> 
> I understand your point but implementing that is outside the scope of
> what we are doing right now. There are a lot more steps involved if
> you want to create a fully functional VM fork with QEMU, for example
> you also have to create a separate disk so you don't clobber the
> parent VM's disk. Also, saving the QEMU device state is currently
> hard-wired into the save/migration operation, so changing that
> plumbing in libxl is quite involved. I actually found it way easier to
> just write a script that connects to the socket and saves it to a
> target file then going through the pain of adjusting libxl. So while
> this could be implemented at this time it won't be.
That's fine to not implement it right now, however the user interface 
should be able to cater it.

In this case, I agree with Roger that it is more intuitive to think that 
fork means a complete fork, not a partial one.

You could impose the user to always pass that option to not clone the 
device model and return an error if it is not there.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2019-12-30 18:43         ` Julien Grall
@ 2019-12-30 20:46           ` Tamas K Lengyel
  2019-12-31  0:20             ` Julien Grall
  0 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-30 20:46 UTC (permalink / raw)
  To: Julien Grall
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Anthony PERARD, Jan Beulich, Alexandru Isaila, Xen-devel,
	Roger Pau Monné

On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
>
> Hi Tamas,
>
> On 30/12/2019 18:15, Tamas K Lengyel wrote:
> > On Mon, Dec 30, 2019 at 10:59 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>
> >> On Thu, Dec 19, 2019 at 08:58:01AM -0700, Tamas K Lengyel wrote:
> >>> On Thu, Dec 19, 2019 at 2:48 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>>>
> >>>> On Wed, Dec 18, 2019 at 11:40:37AM -0800, Tamas K Lengyel wrote:
> >>>>> The following series implements VM forking for Intel HVM guests to allow for
> >>>>> the fast creation of identical VMs without the assosciated high startup costs
> >>>>> of booting or restoring the VM from a savefile.
> >>>>>
> >>>>> JIRA issue: https://xenproject.atlassian.net/browse/XEN-89
> >>>>>
> >>>>> The main design goal with this series has been to reduce the time of creating
> >>>>> the VM fork as much as possible. To achieve this the VM forking process is
> >>>>> split into two steps:
> >>>>>      1) forking the VM on the hypervisor side;
> >>>>>      2) starting QEMU to handle the backed for emulated devices.
> >>>>>
> >>>>> Step 1) involves creating a VM using the new "xl fork-vm" command. The
> >>>>> parent VM is expected to remain paused after forks are created from it (which
> >>>>> is different then what process forking normally entails). During this forking
> >>>>                 ^ than
> >>>>> operation the HVM context and VM settings are copied over to the new forked VM.
> >>>>> This operation is fast and it allows the forked VM to be unpaused and to be
> >>>>> monitored and accessed via VMI. Note however that without its device model
> >>>>> running (depending on what is executing in the VM) it is bound to
> >>>>> misbehave/crash when its trying to access devices that would be emulated by
> >>>>> QEMU. We anticipate that for certain use-cases this would be an acceptable
> >>>>> situation, in case for example when fuzzing is performed of code segments that
> >>>>> don't access such devices.
> >>>>>
> >>>>> Step 2) involves launching QEMU to support the forked VM, which requires the
> >>>>> QEMU Xen savefile to be generated manually from the parent VM. This can be
> >>>>> accomplished simply by connecting to its QMP socket and issuing the
> >>>>> "xen-save-devices-state" command as documented by QEMU:
> >>>>> https://github.com/qemu/qemu/blob/master/docs/xen-save-devices-state.txt
> >>>>> Once the QEMU Xen savefile is generated the new "xl fork-launch-dm" command is
> >>>>> used to launch QEMU and load the specified savefile for it.
> >>>>
> >>>> IMO having two different commands is confusing for the end user, I
> >>>> would rather have something like:
> >>>>
> >>>> xl fork-vm [-d] ...
> >>>>
> >>>> Where '-d' would prevent forking any user-space emulators. I don't
> >>>> thinks there's a need for a separate command to fork the underlying
> >>>> user-space emulators.
> >>>
> >>> Keeping it as two commands allows you to start up the fork and let it
> >>> run immediately and only start up QEMU when you notice it is needed.
> >>> The idea being that you can monitor the kernel and see when it tries
> >>> to do some I/O that would require the QEMU backend. If you combine the
> >>> commands that option goes away.
> >>
> >> I'm not sure I see why, you could still provide a `xl fork-vm [-c]
> >> ...` that would just lunch a QEMU instance. End users using xl have
> >> AFAICT no way to tell whether or when a QEMU is needed or not, and
> >> hence the default behavior should be a fully functional one.
> >>
> >> IMO I think fork-vm without any options should do a complete fork of a
> >> VM, rather than a partial one without a device model clone.
> >
> > I understand your point but implementing that is outside the scope of
> > what we are doing right now. There are a lot more steps involved if
> > you want to create a fully functional VM fork with QEMU, for example
> > you also have to create a separate disk so you don't clobber the
> > parent VM's disk. Also, saving the QEMU device state is currently
> > hard-wired into the save/migration operation, so changing that
> > plumbing in libxl is quite involved. I actually found it way easier to
> > just write a script that connects to the socket and saves it to a
> > target file then going through the pain of adjusting libxl. So while
> > this could be implemented at this time it won't be.
> That's fine to not implement it right now, however the user interface
> should be able to cater it.
>
> In this case, I agree with Roger that it is more intuitive to think that
> fork means a complete fork, not a partial one.
>
> You could impose the user to always pass that option to not clone the
> device model and return an error if it is not there.

Just to be clear, I can add the option to the "fork-vm" command to
load the QEMU state with it, effectively combining the "fork-vm" and
"fork-launch-dm" into one. But I still need the separate
"fork-launch-dm" command since in our model we need to be able to
launch the VM and run it without QEMU for a while, only launching QEMU
when it is determined to be necessary. So if that's what you are
asking, sure, I can do that.

But keep in mind that the "fork-vm" command even with this update
would still not produce for you a "fully functional" VM on its own.
The user still has to produce a new VM config file, create the new
disk, save the QEMU state, etc. So if your concern is that the
"fork-vm" command's name will imply that it is going to be producing
fully functional VM on its own I would rather just rename the command
because by itself it will never create a fully functional VM.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2019-12-30 20:46           ` Tamas K Lengyel
@ 2019-12-31  0:20             ` Julien Grall
  2019-12-31  0:37               ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Julien Grall @ 2019-12-31  0:20 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Alexandru Isaila, Jan Beulich, Xen-devel, Anthony PERARD,
	Julien Grall, Roger Pau Monné


[-- Attachment #1.1: Type: text/plain, Size: 754 bytes --]

Hi,

On Mon, 30 Dec 2019, 20:49 Tamas K Lengyel, <tamas@tklengyel.com> wrote:

> On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
> But keep in mind that the "fork-vm" command even with this update
> would still not produce for you a "fully functional" VM on its own.
> The user still has to produce a new VM config file, create the new
> disk, save the QEMU state, etc.


 If you fork then the configuration should be very similar. Right?

So why does the user requires to provide a new config rather than the
command to update the existing one? To me, it feels this is a call to make
mistake when forking.

How is the new config different from the original VM?

As a side note, I can't see any patch adding documentation.

Cheers,

[-- Attachment #1.2: Type: text/html, Size: 1625 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2019-12-31  0:20             ` Julien Grall
@ 2019-12-31  0:37               ` Tamas K Lengyel
  2019-12-31 10:40                 ` Roger Pau Monné
  0 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-31  0:37 UTC (permalink / raw)
  To: Julien Grall
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Alexandru Isaila, Jan Beulich, Xen-devel, Anthony PERARD,
	Julien Grall, Roger Pau Monné

On Mon, Dec 30, 2019 at 5:20 PM Julien Grall <julien.grall@gmail.com> wrote:
>
> Hi,
>
> On Mon, 30 Dec 2019, 20:49 Tamas K Lengyel, <tamas@tklengyel.com> wrote:
>>
>> On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
>> But keep in mind that the "fork-vm" command even with this update
>> would still not produce for you a "fully functional" VM on its own.
>> The user still has to produce a new VM config file, create the new
>> disk, save the QEMU state, etc.
>
>
>  If you fork then the configuration should be very similar. Right?
>
> So why does the user requires to provide a new config rather than the command to update the existing one? To me, it feels this is a call to make mistake when forking.
>
> How is the new config different from the original VM?

The config must be different at least by giving the fork a different
name. That's the minimum and it's enough only if the VM you are
forking has no disk at all. If it has a disk, you also have to update
the config to point to where the new disk is. I'm using LVM snapshots
but you could also use qcow2, or whatever else there is for disk-CoW.
The fork can also have different options enabled than it's parent. For
example in our test-case, the forks have altp2m enabled while the
parent VM doesn't. There could be other options like that someone
might want to enable for the fork(s). If there is networking involved
you likely also have to attach the fork to a new VLAN as to avoid
MAC-address collision on the bridge. So there are quite a lot of
variation possible, hence its better to have the user generate the new
config they want instead of xl coming up with something on its own.

>
> As a side note, I can't see any patch adding documentation.

It's only an experimental feature so adding documentation was not a
priority. The documentation is pretty much in the cover letter. I'm
happy to add its content as a file under docs in a patch (with the
above extra information).

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2019-12-31  0:37               ` Tamas K Lengyel
@ 2019-12-31 10:40                 ` Roger Pau Monné
  2019-12-31 15:00                   ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Roger Pau Monné @ 2019-12-31 10:40 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall

On Mon, Dec 30, 2019 at 05:37:38PM -0700, Tamas K Lengyel wrote:
> On Mon, Dec 30, 2019 at 5:20 PM Julien Grall <julien.grall@gmail.com> wrote:
> >
> > Hi,
> >
> > On Mon, 30 Dec 2019, 20:49 Tamas K Lengyel, <tamas@tklengyel.com> wrote:
> >>
> >> On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
> >> But keep in mind that the "fork-vm" command even with this update
> >> would still not produce for you a "fully functional" VM on its own.
> >> The user still has to produce a new VM config file, create the new
> >> disk, save the QEMU state, etc.

IMO the default behavior of the fork command should be to leave the
original VM paused, so that you can continue using the same disk and
network config in the fork and you won't need to pass a new config
file.

As Julien already said, maybe I wasn't clear in my previous replies:
I'm not asking you to implement all this, it's fine if the
implementation of the fork-vm xl command requires you to pass certain
options, and that the default behavior is not implemented.

We need an interface that's sane, and that's designed to be easy and
comprehensive to use, not an interface built around what's currently
implemented.

> >
> >  If you fork then the configuration should be very similar. Right?
> >
> > So why does the user requires to provide a new config rather than the command to update the existing one? To me, it feels this is a call to make mistake when forking.
> >
> > How is the new config different from the original VM?
> 
> The config must be different at least by giving the fork a different
> name. That's the minimum and it's enough only if the VM you are
> forking has no disk at all.

Adding an option to pass an explicit name for the fork would be handy,
or else xl could come up with a name by itself, like it's done for
migration, ie: <orignal name>--fork<digit>.

> If it has a disk, you also have to update
> the config to point to where the new disk is. I'm using LVM snapshots
> but you could also use qcow2, or whatever else there is for disk-CoW.
> The fork can also have different options enabled than it's parent. For
> example in our test-case, the forks have altp2m enabled while the
> parent VM doesn't. There could be other options like that someone
> might want to enable for the fork(s). If there is networking involved
> you likely also have to attach the fork to a new VLAN as to avoid
> MAC-address collision on the bridge. So there are quite a lot of
> variation possible, hence its better to have the user generate the new
> config they want instead of xl coming up with something on its own.

Passing a new config file for the fork is indeed fine, but maybe we
don't want this to be the default behavior, as said above I think it's
possible to fork a VM without passing a new config file.

> >
> > As a side note, I can't see any patch adding documentation.
> 
> It's only an experimental feature so adding documentation was not a
> priority. The documentation is pretty much in the cover letter. I'm
> happy to add its content as a file under docs in a patch (with the
> above extra information).

Please also document the new xl command(s) in the man page [0].

Thanks, Roger.

[0] https://xenbits.xen.org/docs/unstable/man/xl.1.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2019-12-31 10:40                 ` Roger Pau Monné
@ 2019-12-31 15:00                   ` Tamas K Lengyel
  2019-12-31 15:11                     ` Roger Pau Monné
  0 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-31 15:00 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall

On Tue, Dec 31, 2019 at 3:40 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Mon, Dec 30, 2019 at 05:37:38PM -0700, Tamas K Lengyel wrote:
> > On Mon, Dec 30, 2019 at 5:20 PM Julien Grall <julien.grall@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > On Mon, 30 Dec 2019, 20:49 Tamas K Lengyel, <tamas@tklengyel.com> wrote:
> > >>
> > >> On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
> > >> But keep in mind that the "fork-vm" command even with this update
> > >> would still not produce for you a "fully functional" VM on its own.
> > >> The user still has to produce a new VM config file, create the new
> > >> disk, save the QEMU state, etc.
>
> IMO the default behavior of the fork command should be to leave the
> original VM paused, so that you can continue using the same disk and
> network config in the fork and you won't need to pass a new config
> file.
>
> As Julien already said, maybe I wasn't clear in my previous replies:
> I'm not asking you to implement all this, it's fine if the
> implementation of the fork-vm xl command requires you to pass certain
> options, and that the default behavior is not implemented.
>
> We need an interface that's sane, and that's designed to be easy and
> comprehensive to use, not an interface built around what's currently
> implemented.

OK, so I think that would look like "xl fork-vm <parent_domid>" with
additional options for things like name, disk, vlan, or a completely
new config, all of which are currently not implemented, + an
additional option to not launch QEMU at all, which would be the only
one currently working. Also keeping the separate "xl fork-launch-dm"
as is. Is that what we are talking about?

>
> > >
> > >  If you fork then the configuration should be very similar. Right?
> > >
> > > So why does the user requires to provide a new config rather than the command to update the existing one? To me, it feels this is a call to make mistake when forking.
> > >
> > > How is the new config different from the original VM?
> >
> > The config must be different at least by giving the fork a different
> > name. That's the minimum and it's enough only if the VM you are
> > forking has no disk at all.
>
> Adding an option to pass an explicit name for the fork would be handy,
> or else xl could come up with a name by itself, like it's done for
> migration, ie: <orignal name>--fork<digit>.
>
> > If it has a disk, you also have to update
> > the config to point to where the new disk is. I'm using LVM snapshots
> > but you could also use qcow2, or whatever else there is for disk-CoW.
> > The fork can also have different options enabled than it's parent. For
> > example in our test-case, the forks have altp2m enabled while the
> > parent VM doesn't. There could be other options like that someone
> > might want to enable for the fork(s). If there is networking involved
> > you likely also have to attach the fork to a new VLAN as to avoid
> > MAC-address collision on the bridge. So there are quite a lot of
> > variation possible, hence its better to have the user generate the new
> > config they want instead of xl coming up with something on its own.
>
> Passing a new config file for the fork is indeed fine, but maybe we
> don't want this to be the default behavior, as said above I think it's
> possible to fork a VM without passing a new config file.
>
> > >
> > > As a side note, I can't see any patch adding documentation.
> >
> > It's only an experimental feature so adding documentation was not a
> > priority. The documentation is pretty much in the cover letter. I'm
> > happy to add its content as a file under docs in a patch (with the
> > above extra information).
>
> Please also document the new xl command(s) in the man page [0].

Ack.

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2019-12-31 15:00                   ` Tamas K Lengyel
@ 2019-12-31 15:11                     ` Roger Pau Monné
  2019-12-31 16:08                       ` Tamas K Lengyel
  2020-01-08 16:34                       ` George Dunlap
  0 siblings, 2 replies; 96+ messages in thread
From: Roger Pau Monné @ 2019-12-31 15:11 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall

On Tue, Dec 31, 2019 at 08:00:17AM -0700, Tamas K Lengyel wrote:
> On Tue, Dec 31, 2019 at 3:40 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Mon, Dec 30, 2019 at 05:37:38PM -0700, Tamas K Lengyel wrote:
> > > On Mon, Dec 30, 2019 at 5:20 PM Julien Grall <julien.grall@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Mon, 30 Dec 2019, 20:49 Tamas K Lengyel, <tamas@tklengyel.com> wrote:
> > > >>
> > > >> On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
> > > >> But keep in mind that the "fork-vm" command even with this update
> > > >> would still not produce for you a "fully functional" VM on its own.
> > > >> The user still has to produce a new VM config file, create the new
> > > >> disk, save the QEMU state, etc.
> >
> > IMO the default behavior of the fork command should be to leave the
> > original VM paused, so that you can continue using the same disk and
> > network config in the fork and you won't need to pass a new config
> > file.
> >
> > As Julien already said, maybe I wasn't clear in my previous replies:
> > I'm not asking you to implement all this, it's fine if the
> > implementation of the fork-vm xl command requires you to pass certain
> > options, and that the default behavior is not implemented.
> >
> > We need an interface that's sane, and that's designed to be easy and
> > comprehensive to use, not an interface built around what's currently
> > implemented.
> 
> OK, so I think that would look like "xl fork-vm <parent_domid>" with
> additional options for things like name, disk, vlan, or a completely
> new config, all of which are currently not implemented, + an
> additional option to not launch QEMU at all, which would be the only
> one currently working. Also keeping the separate "xl fork-launch-dm"
> as is. Is that what we are talking about?

I think fork-launch-vm should just be an option of fork-vm (ie:
--launch-dm-only or some such). I don't think there's a reason to have
a separate top-level command to just launch the device model.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2019-12-31 15:11                     ` Roger Pau Monné
@ 2019-12-31 16:08                       ` Tamas K Lengyel
  2019-12-31 16:36                         ` Tamas K Lengyel
  2020-01-08 16:34                       ` George Dunlap
  1 sibling, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-31 16:08 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall

On Tue, Dec 31, 2019 at 8:11 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Tue, Dec 31, 2019 at 08:00:17AM -0700, Tamas K Lengyel wrote:
> > On Tue, Dec 31, 2019 at 3:40 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >
> > > On Mon, Dec 30, 2019 at 05:37:38PM -0700, Tamas K Lengyel wrote:
> > > > On Mon, Dec 30, 2019 at 5:20 PM Julien Grall <julien.grall@gmail.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Mon, 30 Dec 2019, 20:49 Tamas K Lengyel, <tamas@tklengyel.com> wrote:
> > > > >>
> > > > >> On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
> > > > >> But keep in mind that the "fork-vm" command even with this update
> > > > >> would still not produce for you a "fully functional" VM on its own.
> > > > >> The user still has to produce a new VM config file, create the new
> > > > >> disk, save the QEMU state, etc.
> > >
> > > IMO the default behavior of the fork command should be to leave the
> > > original VM paused, so that you can continue using the same disk and
> > > network config in the fork and you won't need to pass a new config
> > > file.
> > >
> > > As Julien already said, maybe I wasn't clear in my previous replies:
> > > I'm not asking you to implement all this, it's fine if the
> > > implementation of the fork-vm xl command requires you to pass certain
> > > options, and that the default behavior is not implemented.
> > >
> > > We need an interface that's sane, and that's designed to be easy and
> > > comprehensive to use, not an interface built around what's currently
> > > implemented.
> >
> > OK, so I think that would look like "xl fork-vm <parent_domid>" with
> > additional options for things like name, disk, vlan, or a completely
> > new config, all of which are currently not implemented, + an
> > additional option to not launch QEMU at all, which would be the only
> > one currently working. Also keeping the separate "xl fork-launch-dm"
> > as is. Is that what we are talking about?
>
> I think fork-launch-vm should just be an option of fork-vm (ie:
> --launch-dm-only or some such). I don't think there's a reason to have
> a separate top-level command to just launch the device model.

It's just that the fork-launch-dm needs the domid of the fork, while
the fork-vm needs the parent's domid. But I guess we can interpret the
"domid" required input differently depending on which sub-option is
specified for the command. Let's see how it pans out.

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2019-12-31 16:08                       ` Tamas K Lengyel
@ 2019-12-31 16:36                         ` Tamas K Lengyel
  2020-01-08  9:42                           ` Julien Grall
  2020-01-08 15:08                           ` Roger Pau Monné
  0 siblings, 2 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2019-12-31 16:36 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall

On Tue, Dec 31, 2019 at 9:08 AM Tamas K Lengyel <tamas@tklengyel.com> wrote:
>
> On Tue, Dec 31, 2019 at 8:11 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Tue, Dec 31, 2019 at 08:00:17AM -0700, Tamas K Lengyel wrote:
> > > On Tue, Dec 31, 2019 at 3:40 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > >
> > > > On Mon, Dec 30, 2019 at 05:37:38PM -0700, Tamas K Lengyel wrote:
> > > > > On Mon, Dec 30, 2019 at 5:20 PM Julien Grall <julien.grall@gmail.com> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On Mon, 30 Dec 2019, 20:49 Tamas K Lengyel, <tamas@tklengyel.com> wrote:
> > > > > >>
> > > > > >> On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
> > > > > >> But keep in mind that the "fork-vm" command even with this update
> > > > > >> would still not produce for you a "fully functional" VM on its own.
> > > > > >> The user still has to produce a new VM config file, create the new
> > > > > >> disk, save the QEMU state, etc.
> > > >
> > > > IMO the default behavior of the fork command should be to leave the
> > > > original VM paused, so that you can continue using the same disk and
> > > > network config in the fork and you won't need to pass a new config
> > > > file.
> > > >
> > > > As Julien already said, maybe I wasn't clear in my previous replies:
> > > > I'm not asking you to implement all this, it's fine if the
> > > > implementation of the fork-vm xl command requires you to pass certain
> > > > options, and that the default behavior is not implemented.
> > > >
> > > > We need an interface that's sane, and that's designed to be easy and
> > > > comprehensive to use, not an interface built around what's currently
> > > > implemented.
> > >
> > > OK, so I think that would look like "xl fork-vm <parent_domid>" with
> > > additional options for things like name, disk, vlan, or a completely
> > > new config, all of which are currently not implemented, + an
> > > additional option to not launch QEMU at all, which would be the only
> > > one currently working. Also keeping the separate "xl fork-launch-dm"
> > > as is. Is that what we are talking about?
> >
> > I think fork-launch-vm should just be an option of fork-vm (ie:
> > --launch-dm-only or some such). I don't think there's a reason to have
> > a separate top-level command to just launch the device model.
>
> It's just that the fork-launch-dm needs the domid of the fork, while
> the fork-vm needs the parent's domid. But I guess we can interpret the
> "domid" required input differently depending on which sub-option is
> specified for the command. Let's see how it pans out.

How does the following look for the interface?

    { "fork-vm",
      &main_fork_vm, 0, 1,
      "Fork a domain from the running parent domid",
      "[options] <Domid>",
      "-h                           Print this help.\n"
      "-N <name>                    Assign name to VM fork.\n"
      "-D <disk>                    Assign disk to VM fork.\n"
      "-B <bridge                   Assign bridge to VM fork.\n"
      "-V <vlan>                    Assign vlan to VM fork.\n"
      "-C <config>                  Use config file for VM fork.\n"
      "-Q <qemu-save-file>          Use qemu save file for VM fork.\n"
      "--launch-dm  <yes|no|late>   Launch device model (QEMU) for VM fork.\n"
      "--fork-reset                 Reset VM fork.\n"
      "-p                           Do not unpause VMs after fork."
      "-h                           Print this help.\n"
      "-d                           Enable debug messages.\n"
    },

Currently the parts that are implemented would look like:
xl fork-vm -p --launch-dm no <parent_domid>
xl fork-vm -p --launch-dm late -C <config> -Q <qemu-save-file> <fork_domid>
xl fork-vm -p --fork-reset <fork_domid>

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2019-12-31 16:36                         ` Tamas K Lengyel
@ 2020-01-08  9:42                           ` Julien Grall
  2020-01-08 15:08                           ` Roger Pau Monné
  1 sibling, 0 replies; 96+ messages in thread
From: Julien Grall @ 2020-01-08  9:42 UTC (permalink / raw)
  To: Tamas K Lengyel, Roger Pau Monné
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Anthony PERARD,
	Xen-devel

Hi Tamas,

On 31/12/2019 16:36, Tamas K Lengyel wrote:
> On Tue, Dec 31, 2019 at 9:08 AM Tamas K Lengyel <tamas@tklengyel.com> wrote:
>>
>> On Tue, Dec 31, 2019 at 8:11 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>
>>> On Tue, Dec 31, 2019 at 08:00:17AM -0700, Tamas K Lengyel wrote:
>>>> On Tue, Dec 31, 2019 at 3:40 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>>>
>>>>> On Mon, Dec 30, 2019 at 05:37:38PM -0700, Tamas K Lengyel wrote:
>>>>>> On Mon, Dec 30, 2019 at 5:20 PM Julien Grall <julien.grall@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Mon, 30 Dec 2019, 20:49 Tamas K Lengyel, <tamas@tklengyel.com> wrote:
>>>>>>>>
>>>>>>>> On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
>>>>>>>> But keep in mind that the "fork-vm" command even with this update
>>>>>>>> would still not produce for you a "fully functional" VM on its own.
>>>>>>>> The user still has to produce a new VM config file, create the new
>>>>>>>> disk, save the QEMU state, etc.
>>>>>
>>>>> IMO the default behavior of the fork command should be to leave the
>>>>> original VM paused, so that you can continue using the same disk and
>>>>> network config in the fork and you won't need to pass a new config
>>>>> file.
>>>>>
>>>>> As Julien already said, maybe I wasn't clear in my previous replies:
>>>>> I'm not asking you to implement all this, it's fine if the
>>>>> implementation of the fork-vm xl command requires you to pass certain
>>>>> options, and that the default behavior is not implemented.
>>>>>
>>>>> We need an interface that's sane, and that's designed to be easy and
>>>>> comprehensive to use, not an interface built around what's currently
>>>>> implemented.
>>>>
>>>> OK, so I think that would look like "xl fork-vm <parent_domid>" with
>>>> additional options for things like name, disk, vlan, or a completely
>>>> new config, all of which are currently not implemented, + an
>>>> additional option to not launch QEMU at all, which would be the only
>>>> one currently working. Also keeping the separate "xl fork-launch-dm"
>>>> as is. Is that what we are talking about?
>>>
>>> I think fork-launch-vm should just be an option of fork-vm (ie:
>>> --launch-dm-only or some such). I don't think there's a reason to have
>>> a separate top-level command to just launch the device model.
>>
>> It's just that the fork-launch-dm needs the domid of the fork, while
>> the fork-vm needs the parent's domid. But I guess we can interpret the
>> "domid" required input differently depending on which sub-option is
>> specified for the command. Let's see how it pans out.
> 
> How does the following look for the interface?
> 
>      { "fork-vm",
>        &main_fork_vm, 0, 1,
>        "Fork a domain from the running parent domid",
>        "[options] <Domid>",
>        "-h                           Print this help.\n"
>        "-N <name>                    Assign name to VM fork.\n"
>        "-D <disk>                    Assign disk to VM fork.\n"
>        "-B <bridge                   Assign bridge to VM fork.\n"
>        "-V <vlan>                    Assign vlan to VM fork.\n"
>        "-C <config>                  Use config file for VM fork.\n"
>        "-Q <qemu-save-file>          Use qemu save file for VM fork.\n"
>        "--launch-dm  <yes|no|late>   Launch device model (QEMU) for VM fork.\n"
>        "--fork-reset                 Reset VM fork.\n"
>        "-p                           Do not unpause VMs after fork."
>        "-h                           Print this help.\n"
>        "-d                           Enable debug messages.\n"
>      },
> 
> Currently the parts that are implemented would look like:
> xl fork-vm -p --launch-dm no <parent_domid>
> xl fork-vm -p --launch-dm late -C <config> -Q <qemu-save-file> <fork_domid>
> xl fork-vm -p --fork-reset <fork_domid>

The interface looks good to me. Note that I don't think you need to 
describre all the unimplemented option in the help. It would be 
sufficient to describe what you only support and bail out if the user 
gives something different.

What matters is we are able to extend the command line option over the time.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2019-12-31 16:36                         ` Tamas K Lengyel
  2020-01-08  9:42                           ` Julien Grall
@ 2020-01-08 15:08                           ` Roger Pau Monné
  2020-01-08 15:32                             ` Tamas K Lengyel
  1 sibling, 1 reply; 96+ messages in thread
From: Roger Pau Monné @ 2020-01-08 15:08 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall

On Tue, Dec 31, 2019 at 09:36:01AM -0700, Tamas K Lengyel wrote:
> On Tue, Dec 31, 2019 at 9:08 AM Tamas K Lengyel <tamas@tklengyel.com> wrote:
> >
> > On Tue, Dec 31, 2019 at 8:11 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >
> > > On Tue, Dec 31, 2019 at 08:00:17AM -0700, Tamas K Lengyel wrote:
> > > > On Tue, Dec 31, 2019 at 3:40 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > >
> > > > > On Mon, Dec 30, 2019 at 05:37:38PM -0700, Tamas K Lengyel wrote:
> > > > > > On Mon, Dec 30, 2019 at 5:20 PM Julien Grall <julien.grall@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Mon, 30 Dec 2019, 20:49 Tamas K Lengyel, <tamas@tklengyel.com> wrote:
> > > > > > >>
> > > > > > >> On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
> > > > > > >> But keep in mind that the "fork-vm" command even with this update
> > > > > > >> would still not produce for you a "fully functional" VM on its own.
> > > > > > >> The user still has to produce a new VM config file, create the new
> > > > > > >> disk, save the QEMU state, etc.
> > > > >
> > > > > IMO the default behavior of the fork command should be to leave the
> > > > > original VM paused, so that you can continue using the same disk and
> > > > > network config in the fork and you won't need to pass a new config
> > > > > file.
> > > > >
> > > > > As Julien already said, maybe I wasn't clear in my previous replies:
> > > > > I'm not asking you to implement all this, it's fine if the
> > > > > implementation of the fork-vm xl command requires you to pass certain
> > > > > options, and that the default behavior is not implemented.
> > > > >
> > > > > We need an interface that's sane, and that's designed to be easy and
> > > > > comprehensive to use, not an interface built around what's currently
> > > > > implemented.
> > > >
> > > > OK, so I think that would look like "xl fork-vm <parent_domid>" with
> > > > additional options for things like name, disk, vlan, or a completely
> > > > new config, all of which are currently not implemented, + an
> > > > additional option to not launch QEMU at all, which would be the only
> > > > one currently working. Also keeping the separate "xl fork-launch-dm"
> > > > as is. Is that what we are talking about?
> > >
> > > I think fork-launch-vm should just be an option of fork-vm (ie:
> > > --launch-dm-only or some such). I don't think there's a reason to have
> > > a separate top-level command to just launch the device model.
> >
> > It's just that the fork-launch-dm needs the domid of the fork, while
> > the fork-vm needs the parent's domid. But I guess we can interpret the
> > "domid" required input differently depending on which sub-option is
> > specified for the command. Let's see how it pans out.
> 
> How does the following look for the interface?
> 
>     { "fork-vm",
>       &main_fork_vm, 0, 1,
>       "Fork a domain from the running parent domid",
>       "[options] <Domid>",
>       "-h                           Print this help.\n"
>       "-N <name>                    Assign name to VM fork.\n"
>       "-D <disk>                    Assign disk to VM fork.\n"
>       "-B <bridge                   Assign bridge to VM fork.\n"
>       "-V <vlan>                    Assign vlan to VM fork.\n"

IMO I think the name of fork is the only useful option. Being able to
assign disks or bridges from the command line seems quite complicated.
What about VMs with multiple disks? Or VMs with multiple nics on
different bridges?

I think it's easier for both the implementation and the user to just
use a config file in that case.

>       "-C <config>                  Use config file for VM fork.\n"
>       "-Q <qemu-save-file>          Use qemu save file for VM fork.\n"
>       "--launch-dm  <yes|no|late>   Launch device model (QEMU) for VM fork.\n"
>       "--fork-reset                 Reset VM fork.\n"
>       "-p                           Do not unpause VMs after fork."

I think the default behaviour should be to leave the original VM
paused and the forked one running, and hence this should be:

        "-p                           Leave forked VM paused."
	"-u                           Leave parent VM unpaused."

>       "-h                           Print this help.\n"
>       "-d                           Enable debug messages.\n"
>     },
> 
> Currently the parts that are implemented would look like:
> xl fork-vm -p --launch-dm no <parent_domid>
> xl fork-vm -p --launch-dm late -C <config> -Q <qemu-save-file> <fork_domid>

Why do you need a config file for launching the Qemu device model?
Doesn't the save-file contain all the information?

I think you also need something like:

# xl fork-vm --launch-dm late <parent_domid> <fork_domid>

So that a user doesn't need to pass a qemu-save-file?

Can you also list the command used to get the Qemu save-file from the
parent? (just for completeness purposes).

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2020-01-08 15:08                           ` Roger Pau Monné
@ 2020-01-08 15:32                             ` Tamas K Lengyel
  2020-01-08 18:00                               ` Roger Pau Monné
  0 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 15:32 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall

On Wed, Jan 8, 2020 at 8:08 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Tue, Dec 31, 2019 at 09:36:01AM -0700, Tamas K Lengyel wrote:
> > On Tue, Dec 31, 2019 at 9:08 AM Tamas K Lengyel <tamas@tklengyel.com> wrote:
> > >
> > > On Tue, Dec 31, 2019 at 8:11 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > >
> > > > On Tue, Dec 31, 2019 at 08:00:17AM -0700, Tamas K Lengyel wrote:
> > > > > On Tue, Dec 31, 2019 at 3:40 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > > >
> > > > > > On Mon, Dec 30, 2019 at 05:37:38PM -0700, Tamas K Lengyel wrote:
> > > > > > > On Mon, Dec 30, 2019 at 5:20 PM Julien Grall <julien.grall@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > On Mon, 30 Dec 2019, 20:49 Tamas K Lengyel, <tamas@tklengyel.com> wrote:
> > > > > > > >>
> > > > > > > >> On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
> > > > > > > >> But keep in mind that the "fork-vm" command even with this update
> > > > > > > >> would still not produce for you a "fully functional" VM on its own.
> > > > > > > >> The user still has to produce a new VM config file, create the new
> > > > > > > >> disk, save the QEMU state, etc.
> > > > > >
> > > > > > IMO the default behavior of the fork command should be to leave the
> > > > > > original VM paused, so that you can continue using the same disk and
> > > > > > network config in the fork and you won't need to pass a new config
> > > > > > file.
> > > > > >
> > > > > > As Julien already said, maybe I wasn't clear in my previous replies:
> > > > > > I'm not asking you to implement all this, it's fine if the
> > > > > > implementation of the fork-vm xl command requires you to pass certain
> > > > > > options, and that the default behavior is not implemented.
> > > > > >
> > > > > > We need an interface that's sane, and that's designed to be easy and
> > > > > > comprehensive to use, not an interface built around what's currently
> > > > > > implemented.
> > > > >
> > > > > OK, so I think that would look like "xl fork-vm <parent_domid>" with
> > > > > additional options for things like name, disk, vlan, or a completely
> > > > > new config, all of which are currently not implemented, + an
> > > > > additional option to not launch QEMU at all, which would be the only
> > > > > one currently working. Also keeping the separate "xl fork-launch-dm"
> > > > > as is. Is that what we are talking about?
> > > >
> > > > I think fork-launch-vm should just be an option of fork-vm (ie:
> > > > --launch-dm-only or some such). I don't think there's a reason to have
> > > > a separate top-level command to just launch the device model.
> > >
> > > It's just that the fork-launch-dm needs the domid of the fork, while
> > > the fork-vm needs the parent's domid. But I guess we can interpret the
> > > "domid" required input differently depending on which sub-option is
> > > specified for the command. Let's see how it pans out.
> >
> > How does the following look for the interface?
> >
> >     { "fork-vm",
> >       &main_fork_vm, 0, 1,
> >       "Fork a domain from the running parent domid",
> >       "[options] <Domid>",
> >       "-h                           Print this help.\n"
> >       "-N <name>                    Assign name to VM fork.\n"
> >       "-D <disk>                    Assign disk to VM fork.\n"
> >       "-B <bridge                   Assign bridge to VM fork.\n"
> >       "-V <vlan>                    Assign vlan to VM fork.\n"
>
> IMO I think the name of fork is the only useful option. Being able to
> assign disks or bridges from the command line seems quite complicated.
> What about VMs with multiple disks? Or VMs with multiple nics on
> different bridges?
>
> I think it's easier for both the implementation and the user to just
> use a config file in that case.

I agree but it sounded to me you guys wanted to have a "complete"
interface even if it's unimplemented. This is what a complete
interface would look to me.

>
> >       "-C <config>                  Use config file for VM fork.\n"
> >       "-Q <qemu-save-file>          Use qemu save file for VM fork.\n"
> >       "--launch-dm  <yes|no|late>   Launch device model (QEMU) for VM fork.\n"
> >       "--fork-reset                 Reset VM fork.\n"
> >       "-p                           Do not unpause VMs after fork."
>
> I think the default behaviour should be to leave the original VM
> paused and the forked one running, and hence this should be:

That is the default. I guess the text saying VMs was not correctly
worded, it just means don't unpause fork after it's created. The
parent remains always paused.

>
>         "-p                           Leave forked VM paused."
>         "-u                           Leave parent VM unpaused."

But you shouldn't unpause the parent VM at all. It should remain
paused as long as there are forks running that were split from it.
Unpausing it will lead to subtle and unexplainable crashes in the fork
since the fork now will use pages that are from a different execution
path. Technically in the future it would be possible to unpause the VM
but it requires to fully populate the pagetables in all forks made
from it with mem_shared entries and deduplicate to the forks all the
pages that's can't be be shared. That was what I originally tried to
do but it was extremely slow, hence the lazy-population of the
pagetable in the forks.

>
> >       "-h                           Print this help.\n"
> >       "-d                           Enable debug messages.\n"
> >     },
> >
> > Currently the parts that are implemented would look like:
> > xl fork-vm -p --launch-dm no <parent_domid>
> > xl fork-vm -p --launch-dm late -C <config> -Q <qemu-save-file> <fork_domid>
>
> Why do you need a config file for launching the Qemu device model?
> Doesn't the save-file contain all the information?

The config is used to populate xenstore, not just for QEMU. The QEMU
save file doesn't contain the xl config. This is not a full VM save
file, it is only the QEMU state that gets dumped with
xen-save-devices-state.

>
> I think you also need something like:
>
> # xl fork-vm --launch-dm late <parent_domid> <fork_domid>
>
> So that a user doesn't need to pass a qemu-save-file?

This doesn't make much sense to me. To launch QEMU you need the config
file to wire things up correctly. Like in order to launch QEMU you
need to tell it the name of the VM, disk path, etc. that are all
contained in the config.

>
> Can you also list the command used to get the Qemu save-file from the
> parent? (just for completeness purposes).

It's explained in the cover letter. You connect to the QEMU socket and
issue the xen-save-devices-state QMP command.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2019-12-31 15:11                     ` Roger Pau Monné
  2019-12-31 16:08                       ` Tamas K Lengyel
@ 2020-01-08 16:34                       ` George Dunlap
  2020-01-08 17:06                         ` Tamas K Lengyel
  2020-01-08 18:07                         ` Roger Pau Monné
  1 sibling, 2 replies; 96+ messages in thread
From: George Dunlap @ 2020-01-08 16:34 UTC (permalink / raw)
  To: Roger Pau Monné, Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall

On 12/31/19 3:11 PM, Roger Pau Monné wrote:
> On Tue, Dec 31, 2019 at 08:00:17AM -0700, Tamas K Lengyel wrote:
>> On Tue, Dec 31, 2019 at 3:40 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>
>>> On Mon, Dec 30, 2019 at 05:37:38PM -0700, Tamas K Lengyel wrote:
>>>> On Mon, Dec 30, 2019 at 5:20 PM Julien Grall <julien.grall@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On Mon, 30 Dec 2019, 20:49 Tamas K Lengyel, <tamas@tklengyel.com> wrote:
>>>>>>
>>>>>> On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
>>>>>> But keep in mind that the "fork-vm" command even with this update
>>>>>> would still not produce for you a "fully functional" VM on its own.
>>>>>> The user still has to produce a new VM config file, create the new
>>>>>> disk, save the QEMU state, etc.
>>>
>>> IMO the default behavior of the fork command should be to leave the
>>> original VM paused, so that you can continue using the same disk and
>>> network config in the fork and you won't need to pass a new config
>>> file.
>>>
>>> As Julien already said, maybe I wasn't clear in my previous replies:
>>> I'm not asking you to implement all this, it's fine if the
>>> implementation of the fork-vm xl command requires you to pass certain
>>> options, and that the default behavior is not implemented.
>>>
>>> We need an interface that's sane, and that's designed to be easy and
>>> comprehensive to use, not an interface built around what's currently
>>> implemented.
>>
>> OK, so I think that would look like "xl fork-vm <parent_domid>" with
>> additional options for things like name, disk, vlan, or a completely
>> new config, all of which are currently not implemented, + an
>> additional option to not launch QEMU at all, which would be the only
>> one currently working. Also keeping the separate "xl fork-launch-dm"
>> as is. Is that what we are talking about?
> 
> I think fork-launch-vm should just be an option of fork-vm (ie:
> --launch-dm-only or some such). I don't think there's a reason to have
> a separate top-level command to just launch the device model.

So first of all, Tamas -- do you actually need to exec xl here?  Would
it make sense for these to start out simply as libxl functions that are
called by your system?

I actually disagree that we want a single command to do all of these.
If we did want `exec xl` to be one of the supported interfaces, I think
it would break down something like this:

`xl fork-domain`: Only forks the domain.
`xl fork-launch-dm`: (or attach-dm?): Start up and attach the
devicemodel to the domain

Then `xl fork` (or maybe `xl fork-vm`) would be something implemented in
the future that would fork the entire domain.

(This is similar to how `git am` works for instance; internally it runs
several steps, including `git mailsplit`, `git mailinfo`, and `git
apply-patch`, each of which can be called individually.)

I think I would also have:

`xl fork-save-dm`: Connect over qmp to the parent domain and save the dm
file

Then have `xl fork-launch-dm` either take a filename (saved from the
previous step) or a parent domain id (in which case it would arrange to
save the file itself).

Although in fact, is there any reason we couldn't store the parent
domain ID in xenstore, so that `xl fork-launch-dm` could find the parent
by itself?  (Although that, of course, is something that could be added
later if it's not something Tamas needs.)

Thoughts?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2020-01-08 16:34                       ` George Dunlap
@ 2020-01-08 17:06                         ` Tamas K Lengyel
  2020-01-08 17:16                           ` George Dunlap
  2020-01-08 18:07                         ` Roger Pau Monné
  1 sibling, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:06 UTC (permalink / raw)
  To: George Dunlap
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall, Roger Pau Monné

On Wed, Jan 8, 2020 at 9:34 AM George Dunlap <george.dunlap@citrix.com> wrote:
>
> On 12/31/19 3:11 PM, Roger Pau Monné wrote:
> > On Tue, Dec 31, 2019 at 08:00:17AM -0700, Tamas K Lengyel wrote:
> >> On Tue, Dec 31, 2019 at 3:40 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>>
> >>> On Mon, Dec 30, 2019 at 05:37:38PM -0700, Tamas K Lengyel wrote:
> >>>> On Mon, Dec 30, 2019 at 5:20 PM Julien Grall <julien.grall@gmail.com> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> On Mon, 30 Dec 2019, 20:49 Tamas K Lengyel, <tamas@tklengyel.com> wrote:
> >>>>>>
> >>>>>> On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
> >>>>>> But keep in mind that the "fork-vm" command even with this update
> >>>>>> would still not produce for you a "fully functional" VM on its own.
> >>>>>> The user still has to produce a new VM config file, create the new
> >>>>>> disk, save the QEMU state, etc.
> >>>
> >>> IMO the default behavior of the fork command should be to leave the
> >>> original VM paused, so that you can continue using the same disk and
> >>> network config in the fork and you won't need to pass a new config
> >>> file.
> >>>
> >>> As Julien already said, maybe I wasn't clear in my previous replies:
> >>> I'm not asking you to implement all this, it's fine if the
> >>> implementation of the fork-vm xl command requires you to pass certain
> >>> options, and that the default behavior is not implemented.
> >>>
> >>> We need an interface that's sane, and that's designed to be easy and
> >>> comprehensive to use, not an interface built around what's currently
> >>> implemented.
> >>
> >> OK, so I think that would look like "xl fork-vm <parent_domid>" with
> >> additional options for things like name, disk, vlan, or a completely
> >> new config, all of which are currently not implemented, + an
> >> additional option to not launch QEMU at all, which would be the only
> >> one currently working. Also keeping the separate "xl fork-launch-dm"
> >> as is. Is that what we are talking about?
> >
> > I think fork-launch-vm should just be an option of fork-vm (ie:
> > --launch-dm-only or some such). I don't think there's a reason to have
> > a separate top-level command to just launch the device model.
>
> So first of all, Tamas -- do you actually need to exec xl here?  Would
> it make sense for these to start out simply as libxl functions that are
> called by your system?

For my current tools & tests - no. I don't start QEMU for the forks at
all. So at this point I don't even need libxl. But I can foresee that
at some point in the future it may become necessary in case we want
allow the forked VM to touch emulated devices. Wiring QEMU up and
making the system functional as a whole I found it easier to do it via
xl. There is just way too many moving components involved to do that
any other way.

>
> I actually disagree that we want a single command to do all of these.
> If we did want `exec xl` to be one of the supported interfaces, I think
> it would break down something like this:
>
> `xl fork-domain`: Only forks the domain.
> `xl fork-launch-dm`: (or attach-dm?): Start up and attach the
> devicemodel to the domain
>
> Then `xl fork` (or maybe `xl fork-vm`) would be something implemented in
> the future that would fork the entire domain.

I really don't have a strong opinion about this either way. I can see
it working either way. Having them all bundled under a single
top-level comment doesn't pollute the help text when someone is just
looking at what xl can do in general. Makes that command a lot more
complex for sure but I don't think it's too bad.

>
> (This is similar to how `git am` works for instance; internally it runs
> several steps, including `git mailsplit`, `git mailinfo`, and `git
> apply-patch`, each of which can be called individually.)
>
> I think I would also have:
>
> `xl fork-save-dm`: Connect over qmp to the parent domain and save the dm
> file

Aye, could be done. For now I didn't bother since its trivial to do
manually already.

>
> Then have `xl fork-launch-dm` either take a filename (saved from the
> previous step) or a parent domain id (in which case it would arrange to
> save the file itself).
>
> Although in fact, is there any reason we couldn't store the parent
> domain ID in xenstore, so that `xl fork-launch-dm` could find the parent
> by itself?  (Although that, of course, is something that could be added
> later if it's not something Tamas needs.)

Could be done. But I store ID internally in my tools anyway since I
need it to initialize VMI. So having it in Xenstore is not required
for me. In fact I would prefer to leave Xenstore out of these
operations as much as possible cause it would slow things down. In my
latest tests forking is down to 0.0007s, having to touch Xenstore for
each would slow things down considerably.

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2020-01-08 17:06                         ` Tamas K Lengyel
@ 2020-01-08 17:16                           ` George Dunlap
  2020-01-08 17:25                             ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: George Dunlap @ 2020-01-08 17:16 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall, Roger Pau Monné

On 1/8/20 5:06 PM, Tamas K Lengyel wrote:
> On Wed, Jan 8, 2020 at 9:34 AM George Dunlap <george.dunlap@citrix.com> wrote:
>>
>> On 12/31/19 3:11 PM, Roger Pau Monné wrote:
>>> On Tue, Dec 31, 2019 at 08:00:17AM -0700, Tamas K Lengyel wrote:
>>>> On Tue, Dec 31, 2019 at 3:40 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>>>
>>>>> On Mon, Dec 30, 2019 at 05:37:38PM -0700, Tamas K Lengyel wrote:
>>>>>> On Mon, Dec 30, 2019 at 5:20 PM Julien Grall <julien.grall@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Mon, 30 Dec 2019, 20:49 Tamas K Lengyel, <tamas@tklengyel.com> wrote:
>>>>>>>>
>>>>>>>> On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
>>>>>>>> But keep in mind that the "fork-vm" command even with this update
>>>>>>>> would still not produce for you a "fully functional" VM on its own.
>>>>>>>> The user still has to produce a new VM config file, create the new
>>>>>>>> disk, save the QEMU state, etc.
>>>>>
>>>>> IMO the default behavior of the fork command should be to leave the
>>>>> original VM paused, so that you can continue using the same disk and
>>>>> network config in the fork and you won't need to pass a new config
>>>>> file.
>>>>>
>>>>> As Julien already said, maybe I wasn't clear in my previous replies:
>>>>> I'm not asking you to implement all this, it's fine if the
>>>>> implementation of the fork-vm xl command requires you to pass certain
>>>>> options, and that the default behavior is not implemented.
>>>>>
>>>>> We need an interface that's sane, and that's designed to be easy and
>>>>> comprehensive to use, not an interface built around what's currently
>>>>> implemented.
>>>>
>>>> OK, so I think that would look like "xl fork-vm <parent_domid>" with
>>>> additional options for things like name, disk, vlan, or a completely
>>>> new config, all of which are currently not implemented, + an
>>>> additional option to not launch QEMU at all, which would be the only
>>>> one currently working. Also keeping the separate "xl fork-launch-dm"
>>>> as is. Is that what we are talking about?
>>>
>>> I think fork-launch-vm should just be an option of fork-vm (ie:
>>> --launch-dm-only or some such). I don't think there's a reason to have
>>> a separate top-level command to just launch the device model.
>>
>> So first of all, Tamas -- do you actually need to exec xl here?  Would
>> it make sense for these to start out simply as libxl functions that are
>> called by your system?
> 
> For my current tools & tests - no. I don't start QEMU for the forks at
> all. So at this point I don't even need libxl. But I can foresee that
> at some point in the future it may become necessary in case we want
> allow the forked VM to touch emulated devices. Wiring QEMU up and
> making the system functional as a whole I found it easier to do it via
> xl. There is just way too many moving components involved to do that
> any other way.
> 
>>
>> I actually disagree that we want a single command to do all of these.
>> If we did want `exec xl` to be one of the supported interfaces, I think
>> it would break down something like this:
>>
>> `xl fork-domain`: Only forks the domain.
>> `xl fork-launch-dm`: (or attach-dm?): Start up and attach the
>> devicemodel to the domain
>>
>> Then `xl fork` (or maybe `xl fork-vm`) would be something implemented in
>> the future that would fork the entire domain.
> 
> I really don't have a strong opinion about this either way. I can see
> it working either way. Having them all bundled under a single
> top-level comment doesn't pollute the help text when someone is just
> looking at what xl can do in general. Makes that command a lot more
> complex for sure but I don't think it's too bad.

One thing I don't like about having a single command is that since
you're not planning on implementing the end-to-end "vm fork" command,
then when running the base "fork-vm" command, you'll have to print an
error message that says "This command is not available in its
completeness; you'll have to implement your own via fork-vm --domain,
fork-vm --save-dm, and fork-vm --launch-dm."

Which we could do, but seem a bit strange. :-)

>> Then have `xl fork-launch-dm` either take a filename (saved from the
>> previous step) or a parent domain id (in which case it would arrange to
>> save the file itself).
>>
>> Although in fact, is there any reason we couldn't store the parent
>> domain ID in xenstore, so that `xl fork-launch-dm` could find the parent
>> by itself?  (Although that, of course, is something that could be added
>> later if it's not something Tamas needs.)
> 
> Could be done. But I store ID internally in my tools anyway since I
> need it to initialize VMI. So having it in Xenstore is not required
> for me. In fact I would prefer to leave Xenstore out of these
> operations as much as possible cause it would slow things down. In my
> latest tests forking is down to 0.0007s, having to touch Xenstore for
> each would slow things down considerably.

Right, that makes sense.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2020-01-08 17:16                           ` George Dunlap
@ 2020-01-08 17:25                             ` Tamas K Lengyel
  0 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 17:25 UTC (permalink / raw)
  To: George Dunlap
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall, Roger Pau Monné

> >> I actually disagree that we want a single command to do all of these.
> >> If we did want `exec xl` to be one of the supported interfaces, I think
> >> it would break down something like this:
> >>
> >> `xl fork-domain`: Only forks the domain.
> >> `xl fork-launch-dm`: (or attach-dm?): Start up and attach the
> >> devicemodel to the domain
> >>
> >> Then `xl fork` (or maybe `xl fork-vm`) would be something implemented in
> >> the future that would fork the entire domain.
> >
> > I really don't have a strong opinion about this either way. I can see
> > it working either way. Having them all bundled under a single
> > top-level comment doesn't pollute the help text when someone is just
> > looking at what xl can do in general. Makes that command a lot more
> > complex for sure but I don't think it's too bad.
>
> One thing I don't like about having a single command is that since
> you're not planning on implementing the end-to-end "vm fork" command,
> then when running the base "fork-vm" command, you'll have to print an
> error message that says "This command is not available in its
> completeness; you'll have to implement your own via fork-vm --domain,
> fork-vm --save-dm, and fork-vm --launch-dm."
>
> Which we could do, but seem a bit strange. :-)

Yea, it's not a single step to get to a fully functional fork but its close:
1. pause parent vm
2. generate qemu_save_file
3. xl fork-vm -C config -Q qemu_save_file <parent_domid>

For the second fork - provided it has its own config file ready to go
- it is enough to just run the 3. step. Technically we could integrate
all these three steps into one and then the user would only have to
generate the new config file. But I found this setup to be "good
enough" already.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2020-01-08 15:32                             ` Tamas K Lengyel
@ 2020-01-08 18:00                               ` Roger Pau Monné
  2020-01-08 18:14                                 ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Roger Pau Monné @ 2020-01-08 18:00 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall

On Wed, Jan 08, 2020 at 08:32:22AM -0700, Tamas K Lengyel wrote:
> On Wed, Jan 8, 2020 at 8:08 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Tue, Dec 31, 2019 at 09:36:01AM -0700, Tamas K Lengyel wrote:
> > > On Tue, Dec 31, 2019 at 9:08 AM Tamas K Lengyel <tamas@tklengyel.com> wrote:
> > > >
> > > > On Tue, Dec 31, 2019 at 8:11 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > >
> > > > > On Tue, Dec 31, 2019 at 08:00:17AM -0700, Tamas K Lengyel wrote:
> > > > > > On Tue, Dec 31, 2019 at 3:40 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > > > >
> > > > > > > On Mon, Dec 30, 2019 at 05:37:38PM -0700, Tamas K Lengyel wrote:
> > > > > > > > On Mon, Dec 30, 2019 at 5:20 PM Julien Grall <julien.grall@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > On Mon, 30 Dec 2019, 20:49 Tamas K Lengyel, <tamas@tklengyel.com> wrote:
> > > > > > > > >>
> > > > > > > > >> On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
> > > > > > > > >> But keep in mind that the "fork-vm" command even with this update
> > > > > > > > >> would still not produce for you a "fully functional" VM on its own.
> > > > > > > > >> The user still has to produce a new VM config file, create the new
> > > > > > > > >> disk, save the QEMU state, etc.
> > > > > > >
> > > > > > > IMO the default behavior of the fork command should be to leave the
> > > > > > > original VM paused, so that you can continue using the same disk and
> > > > > > > network config in the fork and you won't need to pass a new config
> > > > > > > file.
> > > > > > >
> > > > > > > As Julien already said, maybe I wasn't clear in my previous replies:
> > > > > > > I'm not asking you to implement all this, it's fine if the
> > > > > > > implementation of the fork-vm xl command requires you to pass certain
> > > > > > > options, and that the default behavior is not implemented.
> > > > > > >
> > > > > > > We need an interface that's sane, and that's designed to be easy and
> > > > > > > comprehensive to use, not an interface built around what's currently
> > > > > > > implemented.
> > > > > >
> > > > > > OK, so I think that would look like "xl fork-vm <parent_domid>" with
> > > > > > additional options for things like name, disk, vlan, or a completely
> > > > > > new config, all of which are currently not implemented, + an
> > > > > > additional option to not launch QEMU at all, which would be the only
> > > > > > one currently working. Also keeping the separate "xl fork-launch-dm"
> > > > > > as is. Is that what we are talking about?
> > > > >
> > > > > I think fork-launch-vm should just be an option of fork-vm (ie:
> > > > > --launch-dm-only or some such). I don't think there's a reason to have
> > > > > a separate top-level command to just launch the device model.
> > > >
> > > > It's just that the fork-launch-dm needs the domid of the fork, while
> > > > the fork-vm needs the parent's domid. But I guess we can interpret the
> > > > "domid" required input differently depending on which sub-option is
> > > > specified for the command. Let's see how it pans out.
> > >
> > > How does the following look for the interface?
> > >
> > >     { "fork-vm",
> > >       &main_fork_vm, 0, 1,
> > >       "Fork a domain from the running parent domid",
> > >       "[options] <Domid>",
> > >       "-h                           Print this help.\n"
> > >       "-N <name>                    Assign name to VM fork.\n"
> > >       "-D <disk>                    Assign disk to VM fork.\n"
> > >       "-B <bridge                   Assign bridge to VM fork.\n"
> > >       "-V <vlan>                    Assign vlan to VM fork.\n"
> >
> > IMO I think the name of fork is the only useful option. Being able to
> > assign disks or bridges from the command line seems quite complicated.
> > What about VMs with multiple disks? Or VMs with multiple nics on
> > different bridges?
> >
> > I think it's easier for both the implementation and the user to just
> > use a config file in that case.
> 
> I agree but it sounded to me you guys wanted to have a "complete"
> interface even if it's unimplemented. This is what a complete
> interface would look to me.

I would add those options afterwards if there's a need for them. I was
mainly concerned about introducing a top level command (ie: fork-vm)
that would require calling other commands in order to get a functional
fork. I'm not so concerned about having all the possible options
listed now, as long as the default behavior of fork-vm is something
sane that produces a working fork, even if not fully implemented at
this stage.

> >
> > >       "-C <config>                  Use config file for VM fork.\n"
> > >       "-Q <qemu-save-file>          Use qemu save file for VM fork.\n"
> > >       "--launch-dm  <yes|no|late>   Launch device model (QEMU) for VM fork.\n"
> > >       "--fork-reset                 Reset VM fork.\n"
> > >       "-p                           Do not unpause VMs after fork."
> >
> > I think the default behaviour should be to leave the original VM
> > paused and the forked one running, and hence this should be:
> 
> That is the default. I guess the text saying VMs was not correctly
> worded, it just means don't unpause fork after it's created. The
> parent remains always paused.

Ack.

> >
> >         "-p                           Leave forked VM paused."
> >         "-u                           Leave parent VM unpaused."
> 
> But you shouldn't unpause the parent VM at all. It should remain
> paused as long as there are forks running that were split from it.
> Unpausing it will lead to subtle and unexplainable crashes in the fork
> since the fork now will use pages that are from a different execution
> path. Technically in the future it would be possible to unpause the VM
> but it requires to fully populate the pagetables in all forks made
> from it with mem_shared entries and deduplicate to the forks all the
> pages that's can't be be shared.

Oh, OK, since I didn't look at the implementation yet I assumed that
the parent would also be switched to trap on memory writes, so that
you could duplicate the pages before the parent writes to them, and
hence the parent could be left running.

Anyway, let's forget about the "leave parent unpaused" option then.

> That was what I originally tried to
> do but it was extremely slow, hence the lazy-population of the
> pagetable in the forks.
> 
> >
> > >       "-h                           Print this help.\n"
> > >       "-d                           Enable debug messages.\n"
> > >     },
> > >
> > > Currently the parts that are implemented would look like:
> > > xl fork-vm -p --launch-dm no <parent_domid>
> > > xl fork-vm -p --launch-dm late -C <config> -Q <qemu-save-file> <fork_domid>
> >
> > Why do you need a config file for launching the Qemu device model?
> > Doesn't the save-file contain all the information?
> 
> The config is used to populate xenstore, not just for QEMU. The QEMU
> save file doesn't contain the xl config. This is not a full VM save
> file, it is only the QEMU state that gets dumped with
> xen-save-devices-state.

TBH I think it would be easier to have something like my proposal
below, where you tell xl the parent and the forked VM names and xl
does the rest. Even better would be to not have to tell xl the parent
VM name (since I guess this is already tracked internally somewhere?).

Anyway, I'm not going to insist on this but the workflow of the Qemu
forking seems to not be very user friendly unless you know exactly how
to use it.

> 
> >
> > I think you also need something like:
> >
> > # xl fork-vm --launch-dm late <parent_domid> <fork_domid>
> >
> > So that a user doesn't need to pass a qemu-save-file?
> 
> This doesn't make much sense to me. To launch QEMU you need the config
> file to wire things up correctly. Like in order to launch QEMU you
> need to tell it the name of the VM, disk path, etc. that are all
> contained in the config.

You could get all this information from the parent VM, IIRC libxl has
a json version of the config. For example for migration there's no
need to pass any config file, since the incoming VM can be recreated
from the data in the source VM.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2020-01-08 16:34                       ` George Dunlap
  2020-01-08 17:06                         ` Tamas K Lengyel
@ 2020-01-08 18:07                         ` Roger Pau Monné
  1 sibling, 0 replies; 96+ messages in thread
From: Roger Pau Monné @ 2020-01-08 18:07 UTC (permalink / raw)
  To: George Dunlap
  Cc: Petre Pircalabu, Tamas K Lengyel, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Stefano Stabellini, Jan Beulich,
	Xen-devel, Anthony PERARD, Julien Grall

On Wed, Jan 08, 2020 at 04:34:49PM +0000, George Dunlap wrote:
> On 12/31/19 3:11 PM, Roger Pau Monné wrote:
> > On Tue, Dec 31, 2019 at 08:00:17AM -0700, Tamas K Lengyel wrote:
> >> On Tue, Dec 31, 2019 at 3:40 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>>
> >>> On Mon, Dec 30, 2019 at 05:37:38PM -0700, Tamas K Lengyel wrote:
> >>>> On Mon, Dec 30, 2019 at 5:20 PM Julien Grall <julien.grall@gmail.com> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> On Mon, 30 Dec 2019, 20:49 Tamas K Lengyel, <tamas@tklengyel.com> wrote:
> >>>>>>
> >>>>>> On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
> >>>>>> But keep in mind that the "fork-vm" command even with this update
> >>>>>> would still not produce for you a "fully functional" VM on its own.
> >>>>>> The user still has to produce a new VM config file, create the new
> >>>>>> disk, save the QEMU state, etc.
> >>>
> >>> IMO the default behavior of the fork command should be to leave the
> >>> original VM paused, so that you can continue using the same disk and
> >>> network config in the fork and you won't need to pass a new config
> >>> file.
> >>>
> >>> As Julien already said, maybe I wasn't clear in my previous replies:
> >>> I'm not asking you to implement all this, it's fine if the
> >>> implementation of the fork-vm xl command requires you to pass certain
> >>> options, and that the default behavior is not implemented.
> >>>
> >>> We need an interface that's sane, and that's designed to be easy and
> >>> comprehensive to use, not an interface built around what's currently
> >>> implemented.
> >>
> >> OK, so I think that would look like "xl fork-vm <parent_domid>" with
> >> additional options for things like name, disk, vlan, or a completely
> >> new config, all of which are currently not implemented, + an
> >> additional option to not launch QEMU at all, which would be the only
> >> one currently working. Also keeping the separate "xl fork-launch-dm"
> >> as is. Is that what we are talking about?
> > 
> > I think fork-launch-vm should just be an option of fork-vm (ie:
> > --launch-dm-only or some such). I don't think there's a reason to have
> > a separate top-level command to just launch the device model.
> 
> So first of all, Tamas -- do you actually need to exec xl here?  Would
> it make sense for these to start out simply as libxl functions that are
> called by your system?
> 
> I actually disagree that we want a single command to do all of these.
> If we did want `exec xl` to be one of the supported interfaces, I think
> it would break down something like this:
> 
> `xl fork-domain`: Only forks the domain.
> `xl fork-launch-dm`: (or attach-dm?): Start up and attach the
> devicemodel to the domain
> 
> Then `xl fork` (or maybe `xl fork-vm`) would be something implemented in
> the future that would fork the entire domain.

I don't have a strong opinion on whether we should have a bunch of
fork-* commands or a single one. My preference would be for a single
one because I think other commands can be implemented as options.

What I would like to prevent is ending up with something like
fork-domain and fork-vm commands, which look like aliases, and can
lead to confusion.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2020-01-08 18:00                               ` Roger Pau Monné
@ 2020-01-08 18:14                                 ` Tamas K Lengyel
  2020-01-08 18:23                                   ` Tamas K Lengyel
  2020-01-08 18:36                                   ` Roger Pau Monné
  0 siblings, 2 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 18:14 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall

On Wed, Jan 8, 2020 at 11:01 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Wed, Jan 08, 2020 at 08:32:22AM -0700, Tamas K Lengyel wrote:
> > On Wed, Jan 8, 2020 at 8:08 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >
> > > On Tue, Dec 31, 2019 at 09:36:01AM -0700, Tamas K Lengyel wrote:
> > > > On Tue, Dec 31, 2019 at 9:08 AM Tamas K Lengyel <tamas@tklengyel.com> wrote:
> > > > >
> > > > > On Tue, Dec 31, 2019 at 8:11 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > > >
> > > > > > On Tue, Dec 31, 2019 at 08:00:17AM -0700, Tamas K Lengyel wrote:
> > > > > > > On Tue, Dec 31, 2019 at 3:40 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Dec 30, 2019 at 05:37:38PM -0700, Tamas K Lengyel wrote:
> > > > > > > > > On Mon, Dec 30, 2019 at 5:20 PM Julien Grall <julien.grall@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > On Mon, 30 Dec 2019, 20:49 Tamas K Lengyel, <tamas@tklengyel.com> wrote:
> > > > > > > > > >>
> > > > > > > > > >> On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
> > > > > > > > > >> But keep in mind that the "fork-vm" command even with this update
> > > > > > > > > >> would still not produce for you a "fully functional" VM on its own.
> > > > > > > > > >> The user still has to produce a new VM config file, create the new
> > > > > > > > > >> disk, save the QEMU state, etc.
> > > > > > > >
> > > > > > > > IMO the default behavior of the fork command should be to leave the
> > > > > > > > original VM paused, so that you can continue using the same disk and
> > > > > > > > network config in the fork and you won't need to pass a new config
> > > > > > > > file.
> > > > > > > >
> > > > > > > > As Julien already said, maybe I wasn't clear in my previous replies:
> > > > > > > > I'm not asking you to implement all this, it's fine if the
> > > > > > > > implementation of the fork-vm xl command requires you to pass certain
> > > > > > > > options, and that the default behavior is not implemented.
> > > > > > > >
> > > > > > > > We need an interface that's sane, and that's designed to be easy and
> > > > > > > > comprehensive to use, not an interface built around what's currently
> > > > > > > > implemented.
> > > > > > >
> > > > > > > OK, so I think that would look like "xl fork-vm <parent_domid>" with
> > > > > > > additional options for things like name, disk, vlan, or a completely
> > > > > > > new config, all of which are currently not implemented, + an
> > > > > > > additional option to not launch QEMU at all, which would be the only
> > > > > > > one currently working. Also keeping the separate "xl fork-launch-dm"
> > > > > > > as is. Is that what we are talking about?
> > > > > >
> > > > > > I think fork-launch-vm should just be an option of fork-vm (ie:
> > > > > > --launch-dm-only or some such). I don't think there's a reason to have
> > > > > > a separate top-level command to just launch the device model.
> > > > >
> > > > > It's just that the fork-launch-dm needs the domid of the fork, while
> > > > > the fork-vm needs the parent's domid. But I guess we can interpret the
> > > > > "domid" required input differently depending on which sub-option is
> > > > > specified for the command. Let's see how it pans out.
> > > >
> > > > How does the following look for the interface?
> > > >
> > > >     { "fork-vm",
> > > >       &main_fork_vm, 0, 1,
> > > >       "Fork a domain from the running parent domid",
> > > >       "[options] <Domid>",
> > > >       "-h                           Print this help.\n"
> > > >       "-N <name>                    Assign name to VM fork.\n"
> > > >       "-D <disk>                    Assign disk to VM fork.\n"
> > > >       "-B <bridge                   Assign bridge to VM fork.\n"
> > > >       "-V <vlan>                    Assign vlan to VM fork.\n"
> > >
> > > IMO I think the name of fork is the only useful option. Being able to
> > > assign disks or bridges from the command line seems quite complicated.
> > > What about VMs with multiple disks? Or VMs with multiple nics on
> > > different bridges?
> > >
> > > I think it's easier for both the implementation and the user to just
> > > use a config file in that case.
> >
> > I agree but it sounded to me you guys wanted to have a "complete"
> > interface even if it's unimplemented. This is what a complete
> > interface would look to me.
>
> I would add those options afterwards if there's a need for them. I was
> mainly concerned about introducing a top level command (ie: fork-vm)
> that would require calling other commands in order to get a functional
> fork. I'm not so concerned about having all the possible options
> listed now, as long as the default behavior of fork-vm is something
> sane that produces a working fork, even if not fully implemented at
> this stage.

OK

> > > Why do you need a config file for launching the Qemu device model?
> > > Doesn't the save-file contain all the information?
> >
> > The config is used to populate xenstore, not just for QEMU. The QEMU
> > save file doesn't contain the xl config. This is not a full VM save
> > file, it is only the QEMU state that gets dumped with
> > xen-save-devices-state.
>
> TBH I think it would be easier to have something like my proposal
> below, where you tell xl the parent and the forked VM names and xl
> does the rest. Even better would be to not have to tell xl the parent
> VM name (since I guess this is already tracked internally somewhere?).

The forked VM has no "name" when it's created. For performance reasons
when the VM fork is created with "--launch-dm no" we explicitly want
to avoid touching Xenstore. Even parsing the config file would be
unneeded overhead at that stage.

>
> Anyway, I'm not going to insist on this but the workflow of the Qemu
> forking seems to not be very user friendly unless you know exactly how
> to use it.
>
> >
> > >
> > > I think you also need something like:
> > >
> > > # xl fork-vm --launch-dm late <parent_domid> <fork_domid>
> > >
> > > So that a user doesn't need to pass a qemu-save-file?
> >
> > This doesn't make much sense to me. To launch QEMU you need the config
> > file to wire things up correctly. Like in order to launch QEMU you
> > need to tell it the name of the VM, disk path, etc. that are all
> > contained in the config.
>
> You could get all this information from the parent VM, IIRC libxl has
> a json version of the config. For example for migration there's no
> need to pass any config file, since the incoming VM can be recreated
> from the data in the source VM.
>

But again, creating a fork with the exact config of the parent is not
possible. Even if the tool would rename the fork on-the-fly as it does
during the migration, the fork would end up thrashing the parent VM's
disk and making it impossible to create any additional forks. It would
also mean that at no point can the original VM be unpaused after the
forks are gone. I don't see any usecase in which that would make any
sense at all.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2020-01-08 18:14                                 ` Tamas K Lengyel
@ 2020-01-08 18:23                                   ` Tamas K Lengyel
  2020-01-08 18:44                                     ` Roger Pau Monné
  2020-01-08 18:36                                   ` Roger Pau Monné
  1 sibling, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 18:23 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall

> > > > Why do you need a config file for launching the Qemu device model?
> > > > Doesn't the save-file contain all the information?
> > >
> > > The config is used to populate xenstore, not just for QEMU. The QEMU
> > > save file doesn't contain the xl config. This is not a full VM save
> > > file, it is only the QEMU state that gets dumped with
> > > xen-save-devices-state.
> >
> > TBH I think it would be easier to have something like my proposal
> > below, where you tell xl the parent and the forked VM names and xl
> > does the rest. Even better would be to not have to tell xl the parent
> > VM name (since I guess this is already tracked internally somewhere?).
>
> The forked VM has no "name" when it's created. For performance reasons
> when the VM fork is created with "--launch-dm no" we explicitly want
> to avoid touching Xenstore. Even parsing the config file would be
> unneeded overhead at that stage.

And to answer your question, no, the parent VM's name is not recorded
anywhere for the fork. Technically not even the parent's domain id is
kept by Xen. The fork only keeps a pointer to the parent's "struct
domain". So right now there is no hypercall interface to retrieve a
fork's parent's ID - it is assumed the tools using the interface are
keeping track of that. Could this information be dumped into Xenstore
as well? Yes. But we specifically want be able to create the fork as
fast possible without any unnecessary overhead.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2020-01-08 18:14                                 ` Tamas K Lengyel
  2020-01-08 18:23                                   ` Tamas K Lengyel
@ 2020-01-08 18:36                                   ` Roger Pau Monné
  2020-01-08 19:51                                     ` Tamas K Lengyel
  1 sibling, 1 reply; 96+ messages in thread
From: Roger Pau Monné @ 2020-01-08 18:36 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall

On Wed, Jan 08, 2020 at 11:14:46AM -0700, Tamas K Lengyel wrote:
> On Wed, Jan 8, 2020 at 11:01 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Wed, Jan 08, 2020 at 08:32:22AM -0700, Tamas K Lengyel wrote:
> > > On Wed, Jan 8, 2020 at 8:08 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > >
> > > > On Tue, Dec 31, 2019 at 09:36:01AM -0700, Tamas K Lengyel wrote:
> > > > > On Tue, Dec 31, 2019 at 9:08 AM Tamas K Lengyel <tamas@tklengyel.com> wrote:
> > > > > >
> > > > > > On Tue, Dec 31, 2019 at 8:11 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > > > >
> > > > > > > On Tue, Dec 31, 2019 at 08:00:17AM -0700, Tamas K Lengyel wrote:
> > > > > > > > On Tue, Dec 31, 2019 at 3:40 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Dec 30, 2019 at 05:37:38PM -0700, Tamas K Lengyel wrote:
> > > > > > > > > > On Mon, Dec 30, 2019 at 5:20 PM Julien Grall <julien.grall@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > On Mon, 30 Dec 2019, 20:49 Tamas K Lengyel, <tamas@tklengyel.com> wrote:
> > > > > > > > > > >>
> > > > > > > > > > >> On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
> > > > > > > > > > >> But keep in mind that the "fork-vm" command even with this update
> > > > > > > > > > >> would still not produce for you a "fully functional" VM on its own.
> > > > > > > > > > >> The user still has to produce a new VM config file, create the new
> > > > > > > > > > >> disk, save the QEMU state, etc.
> > > > > > > > >
> > > > > > > > > IMO the default behavior of the fork command should be to leave the
> > > > > > > > > original VM paused, so that you can continue using the same disk and
> > > > > > > > > network config in the fork and you won't need to pass a new config
> > > > > > > > > file.
> > > > > > > > >
> > > > > > > > > As Julien already said, maybe I wasn't clear in my previous replies:
> > > > > > > > > I'm not asking you to implement all this, it's fine if the
> > > > > > > > > implementation of the fork-vm xl command requires you to pass certain
> > > > > > > > > options, and that the default behavior is not implemented.
> > > > > > > > >
> > > > > > > > > We need an interface that's sane, and that's designed to be easy and
> > > > > > > > > comprehensive to use, not an interface built around what's currently
> > > > > > > > > implemented.
> > > > > > > >
> > > > > > > > OK, so I think that would look like "xl fork-vm <parent_domid>" with
> > > > > > > > additional options for things like name, disk, vlan, or a completely
> > > > > > > > new config, all of which are currently not implemented, + an
> > > > > > > > additional option to not launch QEMU at all, which would be the only
> > > > > > > > one currently working. Also keeping the separate "xl fork-launch-dm"
> > > > > > > > as is. Is that what we are talking about?
> > > > > > >
> > > > > > > I think fork-launch-vm should just be an option of fork-vm (ie:
> > > > > > > --launch-dm-only or some such). I don't think there's a reason to have
> > > > > > > a separate top-level command to just launch the device model.
> > > > > >
> > > > > > It's just that the fork-launch-dm needs the domid of the fork, while
> > > > > > the fork-vm needs the parent's domid. But I guess we can interpret the
> > > > > > "domid" required input differently depending on which sub-option is
> > > > > > specified for the command. Let's see how it pans out.
> > > > >
> > > > > How does the following look for the interface?
> > > > >
> > > > >     { "fork-vm",
> > > > >       &main_fork_vm, 0, 1,
> > > > >       "Fork a domain from the running parent domid",
> > > > >       "[options] <Domid>",
> > > > >       "-h                           Print this help.\n"
> > > > >       "-N <name>                    Assign name to VM fork.\n"
> > > > >       "-D <disk>                    Assign disk to VM fork.\n"
> > > > >       "-B <bridge                   Assign bridge to VM fork.\n"
> > > > >       "-V <vlan>                    Assign vlan to VM fork.\n"
> > > >
> > > > IMO I think the name of fork is the only useful option. Being able to
> > > > assign disks or bridges from the command line seems quite complicated.
> > > > What about VMs with multiple disks? Or VMs with multiple nics on
> > > > different bridges?
> > > >
> > > > I think it's easier for both the implementation and the user to just
> > > > use a config file in that case.
> > >
> > > I agree but it sounded to me you guys wanted to have a "complete"
> > > interface even if it's unimplemented. This is what a complete
> > > interface would look to me.
> >
> > I would add those options afterwards if there's a need for them. I was
> > mainly concerned about introducing a top level command (ie: fork-vm)
> > that would require calling other commands in order to get a functional
> > fork. I'm not so concerned about having all the possible options
> > listed now, as long as the default behavior of fork-vm is something
> > sane that produces a working fork, even if not fully implemented at
> > this stage.
> 
> OK
> 
> > > > Why do you need a config file for launching the Qemu device model?
> > > > Doesn't the save-file contain all the information?
> > >
> > > The config is used to populate xenstore, not just for QEMU. The QEMU
> > > save file doesn't contain the xl config. This is not a full VM save
> > > file, it is only the QEMU state that gets dumped with
> > > xen-save-devices-state.
> >
> > TBH I think it would be easier to have something like my proposal
> > below, where you tell xl the parent and the forked VM names and xl
> > does the rest. Even better would be to not have to tell xl the parent
> > VM name (since I guess this is already tracked internally somewhere?).
> 
> The forked VM has no "name" when it's created. For performance reasons
> when the VM fork is created with "--launch-dm no" we explicitly want
> to avoid touching Xenstore. Even parsing the config file would be
> unneeded overhead at that stage.

I think you need another option to tell xl to not name the forked VM,
abusing of "--launch-dm no" to not set a name is not expected I think.

> >
> > Anyway, I'm not going to insist on this but the workflow of the Qemu
> > forking seems to not be very user friendly unless you know exactly how
> > to use it.
> >
> > >
> > > >
> > > > I think you also need something like:
> > > >
> > > > # xl fork-vm --launch-dm late <parent_domid> <fork_domid>
> > > >
> > > > So that a user doesn't need to pass a qemu-save-file?
> > >
> > > This doesn't make much sense to me. To launch QEMU you need the config
> > > file to wire things up correctly. Like in order to launch QEMU you
> > > need to tell it the name of the VM, disk path, etc. that are all
> > > contained in the config.
> >
> > You could get all this information from the parent VM, IIRC libxl has
> > a json version of the config. For example for migration there's no
> > need to pass any config file, since the incoming VM can be recreated
> > from the data in the source VM.
> >
> 
> But again, creating a fork with the exact config of the parent is not
> possible. Even if the tool would rename the fork on-the-fly as it does
> during the migration, the fork would end up thrashing the parent VM's
> disk and making it impossible to create any additional forks. It would
> also mean that at no point can the original VM be unpaused after the
> forks are gone. I don't see any usecase in which that would make any
> sense at all.

You could have the disk(s) as read-only and the VM running completely
from RAM. Alpine-linux has (or had) a mode where it was completely
stateless and running from RAM. I think it's fine to require passing a
config file for the time being, we can look at other options
afterwards.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2020-01-08 18:23                                   ` Tamas K Lengyel
@ 2020-01-08 18:44                                     ` Roger Pau Monné
  2020-01-08 19:47                                       ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Roger Pau Monné @ 2020-01-08 18:44 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall

On Wed, Jan 08, 2020 at 11:23:29AM -0700, Tamas K Lengyel wrote:
> > > > > Why do you need a config file for launching the Qemu device model?
> > > > > Doesn't the save-file contain all the information?
> > > >
> > > > The config is used to populate xenstore, not just for QEMU. The QEMU
> > > > save file doesn't contain the xl config. This is not a full VM save
> > > > file, it is only the QEMU state that gets dumped with
> > > > xen-save-devices-state.
> > >
> > > TBH I think it would be easier to have something like my proposal
> > > below, where you tell xl the parent and the forked VM names and xl
> > > does the rest. Even better would be to not have to tell xl the parent
> > > VM name (since I guess this is already tracked internally somewhere?).
> >
> > The forked VM has no "name" when it's created. For performance reasons
> > when the VM fork is created with "--launch-dm no" we explicitly want
> > to avoid touching Xenstore. Even parsing the config file would be
> > unneeded overhead at that stage.
> 
> And to answer your question, no, the parent VM's name is not recorded
> anywhere for the fork. Technically not even the parent's domain id is
> kept by Xen. The fork only keeps a pointer to the parent's "struct
> domain"

There's the domain_id field inside of struct domain, so it seems quite
easy to get the parent domid from the fork if there's a pointer to the
parent's struct domain.

> So right now there is no hypercall interface to retrieve a
> fork's parent's ID - it is assumed the tools using the interface are
> keeping track of that. Could this information be dumped into Xenstore
> as well? Yes. But we specifically want be able to create the fork as
> fast possible without any unnecessary overhead.

I think it would be nice to identify forked domains using
XEN_DOMCTL_getdomaininfo: you could add a parent_domid field to
xen_domctl_getdomaininfo and if it's set to something different than
DOMID_INVALID then the domain is a fork of the given domid.

Not saying it should be done now, but AFAICT getting the parent's
domid is feasible and doesn't require xenstore.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2020-01-08 18:44                                     ` Roger Pau Monné
@ 2020-01-08 19:47                                       ` Tamas K Lengyel
  0 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 19:47 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall

On Wed, Jan 8, 2020 at 11:44 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Wed, Jan 08, 2020 at 11:23:29AM -0700, Tamas K Lengyel wrote:
> > > > > > Why do you need a config file for launching the Qemu device model?
> > > > > > Doesn't the save-file contain all the information?
> > > > >
> > > > > The config is used to populate xenstore, not just for QEMU. The QEMU
> > > > > save file doesn't contain the xl config. This is not a full VM save
> > > > > file, it is only the QEMU state that gets dumped with
> > > > > xen-save-devices-state.
> > > >
> > > > TBH I think it would be easier to have something like my proposal
> > > > below, where you tell xl the parent and the forked VM names and xl
> > > > does the rest. Even better would be to not have to tell xl the parent
> > > > VM name (since I guess this is already tracked internally somewhere?).
> > >
> > > The forked VM has no "name" when it's created. For performance reasons
> > > when the VM fork is created with "--launch-dm no" we explicitly want
> > > to avoid touching Xenstore. Even parsing the config file would be
> > > unneeded overhead at that stage.
> >
> > And to answer your question, no, the parent VM's name is not recorded
> > anywhere for the fork. Technically not even the parent's domain id is
> > kept by Xen. The fork only keeps a pointer to the parent's "struct
> > domain"
>
> There's the domain_id field inside of struct domain, so it seems quite
> easy to get the parent domid from the fork if there's a pointer to the
> parent's struct domain.
>
> > So right now there is no hypercall interface to retrieve a
> > fork's parent's ID - it is assumed the tools using the interface are
> > keeping track of that. Could this information be dumped into Xenstore
> > as well? Yes. But we specifically want be able to create the fork as
> > fast possible without any unnecessary overhead.
>
> I think it would be nice to identify forked domains using
> XEN_DOMCTL_getdomaininfo: you could add a parent_domid field to
> xen_domctl_getdomaininfo and if it's set to something different than
> DOMID_INVALID then the domain is a fork of the given domid.
>
> Not saying it should be done now, but AFAICT getting the parent's
> domid is feasible and doesn't require xenstore.
>

Of course it could be done. I was just pointing out that it's not
currently kept separately and there is no interface to retrieve it.
But TBH I have lost the train the though why we would need that in the
first place? When QEMU is being launched the fork is already created
and QEMU doesn't need to know anything about the parent.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2020-01-08 18:36                                   ` Roger Pau Monné
@ 2020-01-08 19:51                                     ` Tamas K Lengyel
  2020-01-09  9:47                                       ` Roger Pau Monné
  0 siblings, 1 reply; 96+ messages in thread
From: Tamas K Lengyel @ 2020-01-08 19:51 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall

On Wed, Jan 8, 2020 at 11:37 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Wed, Jan 08, 2020 at 11:14:46AM -0700, Tamas K Lengyel wrote:
> > On Wed, Jan 8, 2020 at 11:01 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >
> > > On Wed, Jan 08, 2020 at 08:32:22AM -0700, Tamas K Lengyel wrote:
> > > > On Wed, Jan 8, 2020 at 8:08 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > >
> > > > > On Tue, Dec 31, 2019 at 09:36:01AM -0700, Tamas K Lengyel wrote:
> > > > > > On Tue, Dec 31, 2019 at 9:08 AM Tamas K Lengyel <tamas@tklengyel.com> wrote:
> > > > > > >
> > > > > > > On Tue, Dec 31, 2019 at 8:11 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Dec 31, 2019 at 08:00:17AM -0700, Tamas K Lengyel wrote:
> > > > > > > > > On Tue, Dec 31, 2019 at 3:40 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 30, 2019 at 05:37:38PM -0700, Tamas K Lengyel wrote:
> > > > > > > > > > > On Mon, Dec 30, 2019 at 5:20 PM Julien Grall <julien.grall@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, 30 Dec 2019, 20:49 Tamas K Lengyel, <tamas@tklengyel.com> wrote:
> > > > > > > > > > > >>
> > > > > > > > > > > >> On Mon, Dec 30, 2019 at 11:43 AM Julien Grall <julien@xen.org> wrote:
> > > > > > > > > > > >> But keep in mind that the "fork-vm" command even with this update
> > > > > > > > > > > >> would still not produce for you a "fully functional" VM on its own.
> > > > > > > > > > > >> The user still has to produce a new VM config file, create the new
> > > > > > > > > > > >> disk, save the QEMU state, etc.
> > > > > > > > > >
> > > > > > > > > > IMO the default behavior of the fork command should be to leave the
> > > > > > > > > > original VM paused, so that you can continue using the same disk and
> > > > > > > > > > network config in the fork and you won't need to pass a new config
> > > > > > > > > > file.
> > > > > > > > > >
> > > > > > > > > > As Julien already said, maybe I wasn't clear in my previous replies:
> > > > > > > > > > I'm not asking you to implement all this, it's fine if the
> > > > > > > > > > implementation of the fork-vm xl command requires you to pass certain
> > > > > > > > > > options, and that the default behavior is not implemented.
> > > > > > > > > >
> > > > > > > > > > We need an interface that's sane, and that's designed to be easy and
> > > > > > > > > > comprehensive to use, not an interface built around what's currently
> > > > > > > > > > implemented.
> > > > > > > > >
> > > > > > > > > OK, so I think that would look like "xl fork-vm <parent_domid>" with
> > > > > > > > > additional options for things like name, disk, vlan, or a completely
> > > > > > > > > new config, all of which are currently not implemented, + an
> > > > > > > > > additional option to not launch QEMU at all, which would be the only
> > > > > > > > > one currently working. Also keeping the separate "xl fork-launch-dm"
> > > > > > > > > as is. Is that what we are talking about?
> > > > > > > >
> > > > > > > > I think fork-launch-vm should just be an option of fork-vm (ie:
> > > > > > > > --launch-dm-only or some such). I don't think there's a reason to have
> > > > > > > > a separate top-level command to just launch the device model.
> > > > > > >
> > > > > > > It's just that the fork-launch-dm needs the domid of the fork, while
> > > > > > > the fork-vm needs the parent's domid. But I guess we can interpret the
> > > > > > > "domid" required input differently depending on which sub-option is
> > > > > > > specified for the command. Let's see how it pans out.
> > > > > >
> > > > > > How does the following look for the interface?
> > > > > >
> > > > > >     { "fork-vm",
> > > > > >       &main_fork_vm, 0, 1,
> > > > > >       "Fork a domain from the running parent domid",
> > > > > >       "[options] <Domid>",
> > > > > >       "-h                           Print this help.\n"
> > > > > >       "-N <name>                    Assign name to VM fork.\n"
> > > > > >       "-D <disk>                    Assign disk to VM fork.\n"
> > > > > >       "-B <bridge                   Assign bridge to VM fork.\n"
> > > > > >       "-V <vlan>                    Assign vlan to VM fork.\n"
> > > > >
> > > > > IMO I think the name of fork is the only useful option. Being able to
> > > > > assign disks or bridges from the command line seems quite complicated.
> > > > > What about VMs with multiple disks? Or VMs with multiple nics on
> > > > > different bridges?
> > > > >
> > > > > I think it's easier for both the implementation and the user to just
> > > > > use a config file in that case.
> > > >
> > > > I agree but it sounded to me you guys wanted to have a "complete"
> > > > interface even if it's unimplemented. This is what a complete
> > > > interface would look to me.
> > >
> > > I would add those options afterwards if there's a need for them. I was
> > > mainly concerned about introducing a top level command (ie: fork-vm)
> > > that would require calling other commands in order to get a functional
> > > fork. I'm not so concerned about having all the possible options
> > > listed now, as long as the default behavior of fork-vm is something
> > > sane that produces a working fork, even if not fully implemented at
> > > this stage.
> >
> > OK
> >
> > > > > Why do you need a config file for launching the Qemu device model?
> > > > > Doesn't the save-file contain all the information?
> > > >
> > > > The config is used to populate xenstore, not just for QEMU. The QEMU
> > > > save file doesn't contain the xl config. This is not a full VM save
> > > > file, it is only the QEMU state that gets dumped with
> > > > xen-save-devices-state.
> > >
> > > TBH I think it would be easier to have something like my proposal
> > > below, where you tell xl the parent and the forked VM names and xl
> > > does the rest. Even better would be to not have to tell xl the parent
> > > VM name (since I guess this is already tracked internally somewhere?).
> >
> > The forked VM has no "name" when it's created. For performance reasons
> > when the VM fork is created with "--launch-dm no" we explicitly want
> > to avoid touching Xenstore. Even parsing the config file would be
> > unneeded overhead at that stage.
>
> I think you need another option to tell xl to not name the forked VM,
> abusing of "--launch-dm no" to not set a name is not expected I think.

See my reply below.

>
> > >
> > > Anyway, I'm not going to insist on this but the workflow of the Qemu
> > > forking seems to not be very user friendly unless you know exactly how
> > > to use it.
> > >
> > > >
> > > > >
> > > > > I think you also need something like:
> > > > >
> > > > > # xl fork-vm --launch-dm late <parent_domid> <fork_domid>
> > > > >
> > > > > So that a user doesn't need to pass a qemu-save-file?
> > > >
> > > > This doesn't make much sense to me. To launch QEMU you need the config
> > > > file to wire things up correctly. Like in order to launch QEMU you
> > > > need to tell it the name of the VM, disk path, etc. that are all
> > > > contained in the config.
> > >
> > > You could get all this information from the parent VM, IIRC libxl has
> > > a json version of the config. For example for migration there's no
> > > need to pass any config file, since the incoming VM can be recreated
> > > from the data in the source VM.
> > >
> >
> > But again, creating a fork with the exact config of the parent is not
> > possible. Even if the tool would rename the fork on-the-fly as it does
> > during the migration, the fork would end up thrashing the parent VM's
> > disk and making it impossible to create any additional forks. It would
> > also mean that at no point can the original VM be unpaused after the
> > forks are gone. I don't see any usecase in which that would make any
> > sense at all.
>
> You could have the disk(s) as read-only and the VM running completely
> from RAM. Alpine-linux has (or had) a mode where it was completely
> stateless and running from RAM. I think it's fine to require passing a
> config file for the time being, we can look at other options
> afterwards.
>

OK, there is that. But I would say that's a fairly niche use-case. You
wouldn't have any network access in that fork, no disk, no way to get
information in or out beside the serial console. So I wouldn't want
that setup to be considered the default. If someone wants to that I
would rather have an option that tells xl to automatically name the
fork for you instead of the other way around.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2020-01-08 19:51                                     ` Tamas K Lengyel
@ 2020-01-09  9:47                                       ` Roger Pau Monné
  2020-01-09 13:31                                         ` Tamas K Lengyel
  0 siblings, 1 reply; 96+ messages in thread
From: Roger Pau Monné @ 2020-01-09  9:47 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall

On Wed, Jan 08, 2020 at 12:51:35PM -0700, Tamas K Lengyel wrote:
> On Wed, Jan 8, 2020 at 11:37 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Wed, Jan 08, 2020 at 11:14:46AM -0700, Tamas K Lengyel wrote:
> > > On Wed, Jan 8, 2020 at 11:01 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > >
> > > > On Wed, Jan 08, 2020 at 08:32:22AM -0700, Tamas K Lengyel wrote:
> > > > > On Wed, Jan 8, 2020 at 8:08 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > > > I think you also need something like:
> > > > > >
> > > > > > # xl fork-vm --launch-dm late <parent_domid> <fork_domid>
> > > > > >
> > > > > > So that a user doesn't need to pass a qemu-save-file?
> > > > >
> > > > > This doesn't make much sense to me. To launch QEMU you need the config
> > > > > file to wire things up correctly. Like in order to launch QEMU you
> > > > > need to tell it the name of the VM, disk path, etc. that are all
> > > > > contained in the config.
> > > >
> > > > You could get all this information from the parent VM, IIRC libxl has
> > > > a json version of the config. For example for migration there's no
> > > > need to pass any config file, since the incoming VM can be recreated
> > > > from the data in the source VM.
> > > >
> > >
> > > But again, creating a fork with the exact config of the parent is not
> > > possible. Even if the tool would rename the fork on-the-fly as it does
> > > during the migration, the fork would end up thrashing the parent VM's
> > > disk and making it impossible to create any additional forks. It would
> > > also mean that at no point can the original VM be unpaused after the
> > > forks are gone. I don't see any usecase in which that would make any
> > > sense at all.
> >
> > You could have the disk(s) as read-only and the VM running completely
> > from RAM. Alpine-linux has (or had) a mode where it was completely
> > stateless and running from RAM. I think it's fine to require passing a
> > config file for the time being, we can look at other options
> > afterwards.
> >
> 
> OK, there is that. But I would say that's a fairly niche use-case. You
> wouldn't have any network access in that fork, no disk, no way to get
> information in or out beside the serial console.

Why won't the fork have network access?

If the parent VM is left paused the fork should behave like a local
migration regarding network access, and thus be fully functional.

> So I wouldn't want
> that setup to be considered the default. If someone wants to that I
> would rather have an option that tells xl to automatically name the
> fork for you instead of the other way around.

Ack, I just want to make sure that whatever interface we end up using
is designed taking into account other use cases apart from the one at
hand.

On an unrelated note, does forking work when using PV interfaces?

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Xen-devel] [PATCH v2 00/20] VM forking
  2020-01-09  9:47                                       ` Roger Pau Monné
@ 2020-01-09 13:31                                         ` Tamas K Lengyel
  0 siblings, 0 replies; 96+ messages in thread
From: Tamas K Lengyel @ 2020-01-09 13:31 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Petre Pircalabu, Stefano Stabellini, Tamas K Lengyel, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Julien Grall, Alexandru Isaila, Jan Beulich, Xen-devel,
	Anthony PERARD, Julien Grall

On Thu, Jan 9, 2020 at 2:48 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Wed, Jan 08, 2020 at 12:51:35PM -0700, Tamas K Lengyel wrote:
> > On Wed, Jan 8, 2020 at 11:37 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >
> > > On Wed, Jan 08, 2020 at 11:14:46AM -0700, Tamas K Lengyel wrote:
> > > > On Wed, Jan 8, 2020 at 11:01 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > >
> > > > > On Wed, Jan 08, 2020 at 08:32:22AM -0700, Tamas K Lengyel wrote:
> > > > > > On Wed, Jan 8, 2020 at 8:08 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > > > > I think you also need something like:
> > > > > > >
> > > > > > > # xl fork-vm --launch-dm late <parent_domid> <fork_domid>
> > > > > > >
> > > > > > > So that a user doesn't need to pass a qemu-save-file?
> > > > > >
> > > > > > This doesn't make much sense to me. To launch QEMU you need the config
> > > > > > file to wire things up correctly. Like in order to launch QEMU you
> > > > > > need to tell it the name of the VM, disk path, etc. that are all
> > > > > > contained in the config.
> > > > >
> > > > > You could get all this information from the parent VM, IIRC libxl has
> > > > > a json version of the config. For example for migration there's no
> > > > > need to pass any config file, since the incoming VM can be recreated
> > > > > from the data in the source VM.
> > > > >
> > > >
> > > > But again, creating a fork with the exact config of the parent is not
> > > > possible. Even if the tool would rename the fork on-the-fly as it does
> > > > during the migration, the fork would end up thrashing the parent VM's
> > > > disk and making it impossible to create any additional forks. It would
> > > > also mean that at no point can the original VM be unpaused after the
> > > > forks are gone. I don't see any usecase in which that would make any
> > > > sense at all.
> > >
> > > You could have the disk(s) as read-only and the VM running completely
> > > from RAM. Alpine-linux has (or had) a mode where it was completely
> > > stateless and running from RAM. I think it's fine to require passing a
> > > config file for the time being, we can look at other options
> > > afterwards.
> > >
> >
> > OK, there is that. But I would say that's a fairly niche use-case. You
> > wouldn't have any network access in that fork, no disk, no way to get
> > information in or out beside the serial console.
>
> Why won't the fork have network access?

If you have multiple forks you end up having MAC-address collision. I
don't see what would be the point of creating a single fork when the
parent remains paused - you could just keep running the parent since
you aren't gaining anything by creating the fork. The main reason to
create a fork would be to create multiples of them.

>
> If the parent VM is left paused the fork should behave like a local
> migration regarding network access, and thus be fully functional.
>
> > So I wouldn't want
> > that setup to be considered the default. If someone wants to that I
> > would rather have an option that tells xl to automatically name the
> > fork for you instead of the other way around.
>
> Ack, I just want to make sure that whatever interface we end up using
> is designed taking into account other use cases apart from the one at
> hand.
>
> On an unrelated note, does forking work when using PV interfaces?

As I recall yes, but In my Linux tests these were the config options I
tested and work with the fork. Not sure if the vif device by default
is PV or emulated:

vnc=1
vnclisten="0.0.0.0:1"

usb=1
usbdevice=['tablet']

disk = ['phy:/dev/t0vg/debian-stretch,xvda,w']
vif = ['bridge=xenbr0,mac=00:07:5B:BB:00:01']

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

end of thread, other threads:[~2020-01-09 13:32 UTC | newest]

Thread overview: 96+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-18 19:40 [Xen-devel] [PATCH v2 00/20] VM forking Tamas K Lengyel
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 01/20] x86: make hvm_{get/set}_param accessible Tamas K Lengyel
2019-12-19 19:07   ` Andrew Cooper
2019-12-19 19:38     ` Tamas K Lengyel
2019-12-19 19:40       ` Andrew Cooper
2019-12-19 19:49         ` Tamas K Lengyel
2019-12-19 19:57           ` Andrew Cooper
2019-12-19 20:09             ` Tamas K Lengyel
2019-12-20 16:46   ` Jan Beulich
2019-12-20 17:27     ` Tamas K Lengyel
2019-12-20 17:32       ` Andrew Cooper
2019-12-20 17:36         ` Tamas K Lengyel
2019-12-20 17:46           ` Andrew Cooper
2019-12-20 17:50             ` Tamas K Lengyel
2019-12-20 18:00               ` Andrew Cooper
2019-12-20 18:05                 ` Tamas K Lengyel
2019-12-23  9:37         ` Jan Beulich
2019-12-23 14:55           ` Tamas K Lengyel
2019-12-27  8:02             ` Jan Beulich
2019-12-27 13:10               ` Tamas K Lengyel
2019-12-27 13:44                 ` Jan Beulich
2019-12-27 14:06                   ` Tamas K Lengyel
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 02/20] xen/x86: Make hap_get_allocation accessible Tamas K Lengyel
2019-12-19 19:08   ` Andrew Cooper
2019-12-20 16:48   ` Jan Beulich
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 03/20] tools/libxc: clean up memory sharing files Tamas K Lengyel
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 04/20] x86/mem_sharing: cleanup code and comments in various locations Tamas K Lengyel
2019-12-19 11:18   ` Andrew Cooper
2019-12-19 16:20     ` Tamas K Lengyel
2019-12-19 16:21     ` Tamas K Lengyel
2019-12-19 18:51       ` Andrew Cooper
2019-12-19 19:26         ` Tamas K Lengyel
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 05/20] x86/mem_sharing: make get_two_gfns take locks conditionally Tamas K Lengyel
2019-12-19 19:12   ` Andrew Cooper
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 06/20] x86/mem_sharing: drop flags from mem_sharing_unshare_page Tamas K Lengyel
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 07/20] x86/mem_sharing: don't try to unshare twice during page fault Tamas K Lengyel
2019-12-19 19:19   ` Andrew Cooper
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 08/20] x86/mem_sharing: define mem_sharing_domain to hold some scattered variables Tamas K Lengyel
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 09/20] x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in relinquish_shared_pages Tamas K Lengyel
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 10/20] x86/mem_sharing: Make add_to_physmap static and shorten name Tamas K Lengyel
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 11/20] x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool Tamas K Lengyel
2019-12-18 21:29   ` Julien Grall
2019-12-18 22:19     ` Tamas K Lengyel
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 12/20] x86/mem_sharing: Replace MEM_SHARING_DEBUG with gdprintk Tamas K Lengyel
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 13/20] x86/mem_sharing: ASSERT that p2m_set_entry succeeds Tamas K Lengyel
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 14/20] x86/mem_sharing: Enable mem_sharing on first memop Tamas K Lengyel
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 15/20] x86/mem_sharing: Skip xen heap pages in memshr nominate Tamas K Lengyel
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 16/20] x86/mem_sharing: check page type count earlier Tamas K Lengyel
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 17/20] xen/mem_sharing: VM forking Tamas K Lengyel
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 18/20] xen/mem_access: Use __get_gfn_type_access in set_mem_access Tamas K Lengyel
2019-12-19  7:59   ` Alexandru Stefan ISAILA
2019-12-19 16:00     ` Tamas K Lengyel
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 19/20] x86/mem_sharing: reset a fork Tamas K Lengyel
2019-12-18 22:00   ` Julien Grall
2019-12-18 22:33     ` Tamas K Lengyel
2019-12-18 23:01       ` Julien Grall
2019-12-19  0:15         ` Tamas K Lengyel
2019-12-19  7:45           ` Julien Grall
2019-12-19 16:11             ` Tamas K Lengyel
2019-12-19 16:57               ` Julien Grall
2019-12-19 17:23                 ` Tamas K Lengyel
2019-12-19 17:38                   ` Julien Grall
2019-12-19 18:00                     ` Tamas K Lengyel
2019-12-19 11:06           ` Jan Beulich
2019-12-19 16:02             ` Tamas K Lengyel
2019-12-18 19:40 ` [Xen-devel] [PATCH v2 20/20] xen/tools: VM forking toolstack side Tamas K Lengyel
2019-12-19  9:48 ` [Xen-devel] [PATCH v2 00/20] VM forking Roger Pau Monné
2019-12-19 15:58   ` Tamas K Lengyel
2019-12-30 17:59     ` Roger Pau Monné
2019-12-30 18:15       ` Tamas K Lengyel
2019-12-30 18:43         ` Julien Grall
2019-12-30 20:46           ` Tamas K Lengyel
2019-12-31  0:20             ` Julien Grall
2019-12-31  0:37               ` Tamas K Lengyel
2019-12-31 10:40                 ` Roger Pau Monné
2019-12-31 15:00                   ` Tamas K Lengyel
2019-12-31 15:11                     ` Roger Pau Monné
2019-12-31 16:08                       ` Tamas K Lengyel
2019-12-31 16:36                         ` Tamas K Lengyel
2020-01-08  9:42                           ` Julien Grall
2020-01-08 15:08                           ` Roger Pau Monné
2020-01-08 15:32                             ` Tamas K Lengyel
2020-01-08 18:00                               ` Roger Pau Monné
2020-01-08 18:14                                 ` Tamas K Lengyel
2020-01-08 18:23                                   ` Tamas K Lengyel
2020-01-08 18:44                                     ` Roger Pau Monné
2020-01-08 19:47                                       ` Tamas K Lengyel
2020-01-08 18:36                                   ` Roger Pau Monné
2020-01-08 19:51                                     ` Tamas K Lengyel
2020-01-09  9:47                                       ` Roger Pau Monné
2020-01-09 13:31                                         ` Tamas K Lengyel
2020-01-08 16:34                       ` George Dunlap
2020-01-08 17:06                         ` Tamas K Lengyel
2020-01-08 17:16                           ` George Dunlap
2020-01-08 17:25                             ` Tamas K Lengyel
2020-01-08 18:07                         ` Roger Pau Monné

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.