xen-devel.lists.xenproject.org archive mirror
* [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
@ 2016-06-11 23:24 Tamas K Lengyel
  2016-06-11 23:24 ` [PATCH v5 2/2] tests/mem-sharing: Add bulk option to memshrtool Tamas K Lengyel
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Tamas K Lengyel @ 2016-06-11 23:24 UTC (permalink / raw)
  To: xen-devel; +Cc: Tamas K Lengyel

Currently mem-sharing can be performed on a page-by-page basis from the control
domain. However, when completely deduplicating (cloning) a VM, this requires
at least 3 hypercalls per page. As the user has to loop through all pages up
to max_gpfn, this process is very slow and wasteful.

This patch introduces a new mem_sharing memop for bulk deduplication where
the user doesn't have to separately nominate each page in both the source and
destination domain, and the looping over all pages happens in the hypervisor.
This significantly reduces the overhead of completely deduplicating entire
domains.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
Ian Jackson <ian.jackson@eu.citrix.com>
George Dunlap <george.dunlap@eu.citrix.com>
Jan Beulich <jbeulich@suse.com>
Andrew Cooper <andrew.cooper3@citrix.com>

v4: Add padding to bulk op and enforce it being 0
    Use correct XSM permission check
---
 tools/libxc/include/xenctrl.h |  15 ++++++
 tools/libxc/xc_memshr.c       |  19 +++++++
 xen/arch/x86/mm/mem_sharing.c | 121 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/memory.h   |  15 +++++-
 4 files changed, 169 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 6ae1a2b..34dcc72 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2327,6 +2327,21 @@ int xc_memshr_add_to_physmap(xc_interface *xch,
                     domid_t client_domain,
                     unsigned long client_gfn);
 
+/* Allows the entire memory of a client domain to be deduplicated in bulk.
+ * Using this function is equivalent to calling xc_memshr_nominate_gfn for
+ * each gfn in the two domains followed by xc_memshr_share_gfns. If
+ * successful, returns the number of shared pages in 'shared'. Both domains
+ * must be paused.
+ *
+ * May fail with -EINVAL if the source and client domain have different
+ * memory sizes or if memory sharing is not enabled on either of the domains.
+ * May also fail with -ENOMEM if there isn't enough memory available to store
+ * the sharing metadata before deduplication can happen.
+ */
+int xc_memshr_bulk_share(xc_interface *xch,
+                         domid_t source_domain,
+                         domid_t client_domain,
+                         uint64_t *shared);
+
 /* Debug calls: return the number of pages referencing the shared frame backing
  * the input argument. Should be one or greater. 
  *
diff --git a/tools/libxc/xc_memshr.c b/tools/libxc/xc_memshr.c
index deb0aa4..71350d2 100644
--- a/tools/libxc/xc_memshr.c
+++ b/tools/libxc/xc_memshr.c
@@ -181,6 +181,25 @@ int xc_memshr_add_to_physmap(xc_interface *xch,
     return xc_memshr_memop(xch, source_domain, &mso);
 }
 
+int xc_memshr_bulk_share(xc_interface *xch,
+                         domid_t source_domain,
+                         domid_t client_domain,
+                         uint64_t *shared)
+{
+    int rc;
+    xen_mem_sharing_op_t mso;
+
+    memset(&mso, 0, sizeof(mso));
+
+    mso.op = XENMEM_sharing_op_bulk_share;
+    mso.u.bulk.client_domain = client_domain;
+
+    rc = xc_memshr_memop(xch, source_domain, &mso);
+    if ( !rc && shared ) *shared = mso.u.bulk.shared;
+
+    return rc;
+}
+
 int xc_memshr_domain_resume(xc_interface *xch,
                             domid_t domid)
 {
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index a522423..ba06fb0 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1294,6 +1294,54 @@ int relinquish_shared_pages(struct domain *d)
     return rc;
 }
 
+static int bulk_share(struct domain *d, struct domain *cd, unsigned long limit,
+                      struct mem_sharing_op_bulk *bulk)
+{
+    int rc = 0;
+    shr_handle_t sh, ch;
+
+    while( limit > bulk->start )
+    {
+        /*
+         * We only break out if we run out of memory as individual pages may
+         * legitimately be unsharable and we just want to skip over those.
+         */
+        rc = mem_sharing_nominate_page(d, bulk->start, 0, &sh);
+        if ( rc == -ENOMEM )
+            break;
+        if ( !rc )
+        {
+            rc = mem_sharing_nominate_page(cd, bulk->start, 0, &ch);
+            if ( rc == -ENOMEM )
+                break;
+            if ( !rc )
+            {
+                /* If we get here this should be guaranteed to succeed. */
+                rc = mem_sharing_share_pages(d, bulk->start, sh,
+                                             cd, bulk->start, ch);
+                ASSERT(!rc);
+            }
+        }
+
+        /* Check for continuation if it's not the last iteration. */
+        if ( limit > ++bulk->start && hypercall_preempt_check() )
+        {
+            rc = 1;
+            break;
+        }
+    }
+
+    /*
+     * We only propagate -ENOMEM as individual pages may fail with -EINVAL,
+     * and for bulk sharing we only care if -ENOMEM was encountered so we reset
+     * rc here.
+     */
+    if ( rc < 0 && rc != -ENOMEM )
+        rc = 0;
+
+    return rc;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
     int rc;
@@ -1468,6 +1516,79 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
         }
         break;
 
+        case XENMEM_sharing_op_bulk_share:
+        {
+            unsigned long max_sgfn, max_cgfn;
+            struct domain *cd;
+
+            rc = -EINVAL;
+            if( mso.u.bulk._pad[0] || mso.u.bulk._pad[1] || mso.u.bulk._pad[2] )
+                goto out;
+
+            if ( !mem_sharing_enabled(d) )
+                goto out;
+
+            rc = rcu_lock_live_remote_domain_by_id(mso.u.bulk.client_domain,
+                                                   &cd);
+            if ( rc )
+                goto out;
+
+            /*
+             * We reuse XENMEM_sharing_op_share XSM check here as this is essentially
+             * the same concept repeated over multiple pages.
+             */
+            rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd, XENMEM_sharing_op_share);
+            if ( rc )
+            {
+                rcu_unlock_domain(cd);
+                goto out;
+            }
+
+            if ( !mem_sharing_enabled(cd) )
+            {
+                rcu_unlock_domain(cd);
+                rc = -EINVAL;
+                goto out;
+            }
+
+            if ( !atomic_read(&d->pause_count) ||
+                 !atomic_read(&cd->pause_count) )
+            {
+                rcu_unlock_domain(cd);
+                rc = -EINVAL;
+                goto out;
+            }
+
+            max_sgfn = domain_get_maximum_gpfn(d);
+            max_cgfn = domain_get_maximum_gpfn(cd);
+
+            if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
+            {
+                rcu_unlock_domain(cd);
+                rc = -EINVAL;
+                goto out;
+            }
+
+            rc = bulk_share(d, cd, max_sgfn + 1, &mso.u.bulk);
+            if ( rc > 0 )
+            {
+                if ( __copy_to_guest(arg, &mso, 1) )
+                    rc = -EFAULT;
+                else
+                    rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
+                                                       "lh", XENMEM_sharing_op,
+                                                       arg);
+            }
+            else
+            {
+                mso.u.bulk.start = 0;
+                mso.u.bulk.shared = atomic_read(&cd->shr_pages);
+            }
+
+            rcu_unlock_domain(cd);
+        }
+        break;
+
         case XENMEM_sharing_op_debug_gfn:
         {
             unsigned long gfn = mso.u.debug.u.gfn;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 29ec571..084f06e 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -465,6 +465,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
 #define XENMEM_sharing_op_debug_gref        5
 #define XENMEM_sharing_op_add_physmap       6
 #define XENMEM_sharing_op_audit             7
+#define XENMEM_sharing_op_bulk_share        8
 
 #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
 #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
@@ -500,7 +501,19 @@ struct xen_mem_sharing_op {
             uint64_aligned_t client_gfn;    /* IN: the client gfn */
             uint64_aligned_t client_handle; /* IN: handle to the client page */
             domid_t  client_domain; /* IN: the client domain id */
-        } share; 
+        } share;
+        struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
+            uint64_aligned_t start;          /* IN: start gfn. Set to 0 for
+                                                full deduplication. Field is
+                                                reset to 0 when hypercall
+                                                completes */
+            uint64_aligned_t shared;         /* OUT: the number of gfns
+                                                that are shared after this
+                                                operation, including pages
+                                                that were already shared */
+            domid_t client_domain;           /* IN: the client domain id */
+            uint16_t _pad[3];
+        } bulk;
         struct mem_sharing_op_debug {     /* OP_DEBUG_xxx */
             union {
                 uint64_aligned_t gfn;      /* IN: gfn to debug          */
-- 
2.8.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v5 2/2] tests/mem-sharing: Add bulk option to memshrtool
  2016-06-11 23:24 [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains Tamas K Lengyel
@ 2016-06-11 23:24 ` Tamas K Lengyel
  2016-06-12  0:08 ` [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains Tamas K Lengyel
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Tamas K Lengyel @ 2016-06-11 23:24 UTC (permalink / raw)
  To: xen-devel; +Cc: Tamas K Lengyel, Ian Jackson, Ian Campbell, Stefano Stabellini

Add the bulk option to the test tool to perform complete deduplication
between domains.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 tools/tests/mem-sharing/memshrtool.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/tools/tests/mem-sharing/memshrtool.c b/tools/tests/mem-sharing/memshrtool.c
index 437c7c9..3e8f467 100644
--- a/tools/tests/mem-sharing/memshrtool.c
+++ b/tools/tests/mem-sharing/memshrtool.c
@@ -24,6 +24,8 @@ static int usage(const char* prog)
     printf("  nominate <domid> <gfn>  - Nominate a page for sharing.\n");
     printf("  share <domid> <gfn> <handle> <source> <source-gfn> <source-handle>\n");
     printf("                          - Share two pages.\n");
+    printf("  bulk <source-domid> <destination-domid>\n");
+    printf("                          - Share all pages between domains.\n");
     printf("  unshare <domid> <gfn>   - Unshare a page by grabbing a writable map.\n");
     printf("  add-to-physmap <domid> <gfn> <source> <source-gfn> <source-handle>\n");
     printf("                          - Populate a page in a domain with a shared page.\n");
@@ -180,6 +182,26 @@ int main(int argc, const char** argv)
         }
         printf("Audit returned %d errors.\n", rc);
     }
+    else if( !strcasecmp(cmd, "bulk") )
+    {
+        domid_t sdomid, cdomid;
+        int rc;
+        unsigned long shared;
+
+        if( argc != 4 )
+            return usage(argv[0]);
+
+        sdomid = strtol(argv[2], NULL, 0);
+        cdomid = strtol(argv[3], NULL, 0);
 
+        rc = xc_memshr_bulk_share(xch, sdomid, cdomid, &shared);
+        if ( rc < 0 )
+        {
+            printf("error executing xc_memshr_bulk_share: %s\n", strerror(errno));
+            return rc;
+        }
+
+        printf("Successfully shared %lu pages between the domains\n", shared);
+    }
     return 0;
 }
-- 
2.8.1



* Re: [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-06-11 23:24 [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains Tamas K Lengyel
  2016-06-11 23:24 ` [PATCH v5 2/2] tests/mem-sharing: Add bulk option to memshrtool Tamas K Lengyel
@ 2016-06-12  0:08 ` Tamas K Lengyel
  2016-06-14  9:56 ` Jan Beulich
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Tamas K Lengyel @ 2016-06-12  0:08 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Ian Jackson, Jan Beulich

On Sat, Jun 11, 2016 at 5:24 PM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
> [...]
>
> Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>
> ---
> Ian Jackson <ian.jackson@eu.citrix.com>
> George Dunlap <george.dunlap@eu.citrix.com>
> Jan Beulich <jbeulich@suse.com>
> Andrew Cooper <andrew.cooper3@citrix.com>

Missed Cc: here.

> [...]


* Re: [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-06-11 23:24 [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains Tamas K Lengyel
  2016-06-11 23:24 ` [PATCH v5 2/2] tests/mem-sharing: Add bulk option to memshrtool Tamas K Lengyel
  2016-06-12  0:08 ` [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains Tamas K Lengyel
@ 2016-06-14  9:56 ` Jan Beulich
  2016-06-14 16:33 ` Konrad Rzeszutek Wilk
  2016-06-22 15:38 ` George Dunlap
  4 siblings, 0 replies; 13+ messages in thread
From: Jan Beulich @ 2016-06-14  9:56 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel

>>> On 12.06.16 at 01:24, <tamas@tklengyel.com> wrote:
> @@ -1468,6 +1516,79 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>          }
>          break;
>  
> +        case XENMEM_sharing_op_bulk_share:
> +        {
> +            unsigned long max_sgfn, max_cgfn;
> +            struct domain *cd;
> +
> +            rc = -EINVAL;
> +            if( mso.u.bulk._pad[0] || mso.u.bulk._pad[1] || mso.u.bulk._pad[2] )

With the missing blank added here (doable on commit as well),
hypervisor side
Reviewed-by: Jan Beulich <jbeulich@suse.com>

> +                goto out;
> +
> +            if ( !mem_sharing_enabled(d) )
> +                goto out;

Irrespective of the above I think a more descriptive error than
-EINVAL would be quite reasonable to expect for a sharing operation
attempted without having enabled sharing. By the name of it (but
not by its conventional meaning) -EILSEQ might be a candidate.

> +            rc = rcu_lock_live_remote_domain_by_id(mso.u.bulk.client_domain,
> +                                                   &cd);
> +            if ( rc )
> +                goto out;
> +
> +            /*
> +             * We reuse XENMEM_sharing_op_share XSM check here as this is essentially
> +             * the same concept repeated over multiple pages.
> +             */
> +            rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd, XENMEM_sharing_op_share);
> +            if ( rc )
> +            {
> +                rcu_unlock_domain(cd);
> +                goto out;
> +            }
> +
> +            if ( !mem_sharing_enabled(cd) )
> +            {
> +                rcu_unlock_domain(cd);
> +                rc = -EINVAL;
> +                goto out;
> +            }
> +
> +            if ( !atomic_read(&d->pause_count) ||
> +                 !atomic_read(&cd->pause_count) )
> +            {
> +                rcu_unlock_domain(cd);
> +                rc = -EINVAL;
> +                goto out;
> +            }
> +
> +            max_sgfn = domain_get_maximum_gpfn(d);
> +            max_cgfn = domain_get_maximum_gpfn(cd);
> +
> +            if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
> +            {
> +                rcu_unlock_domain(cd);

I'd also recommend adding a new label below to avoid these repeated
rcu_unlock_domain(cd) invocations.

But for both of these - you're the maintainer, so you know best.

Jan



* Re: [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-06-11 23:24 [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains Tamas K Lengyel
                   ` (2 preceding siblings ...)
  2016-06-14  9:56 ` Jan Beulich
@ 2016-06-14 16:33 ` Konrad Rzeszutek Wilk
       [not found]   ` <CABfawhmqhAsEk=J5F3y0tAvd74xfi1_2-A+tfKzwCapcmL1zPg@mail.gmail.com>
  2016-06-15  8:14   ` Jan Beulich
  2016-06-22 15:38 ` George Dunlap
  4 siblings, 2 replies; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-06-14 16:33 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel

> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> index a522423..ba06fb0 100644
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -1294,6 +1294,54 @@ int relinquish_shared_pages(struct domain *d)
>      return rc;
>  }
>  
> +static int bulk_share(struct domain *d, struct domain *cd, unsigned long limit,
> +                      struct mem_sharing_op_bulk *bulk)
> +{
> +    int rc = 0;
> +    shr_handle_t sh, ch;
> +
> +    while( limit > bulk->start )

You are missing a space there.
> +    {
> +        /*
> +         * We only break out if we run out of memory as individual pages may
> +         * legitimately be unsharable and we just want to skip over those.
> +         */
> +        rc = mem_sharing_nominate_page(d, bulk->start, 0, &sh);
> +        if ( rc == -ENOMEM )
> +            break;
> +        if ( !rc )
> +        {
> +            rc = mem_sharing_nominate_page(cd, bulk->start, 0, &ch);
> +            if ( rc == -ENOMEM )
> +                break;
> +            if ( !rc )
> +            {
> +                /* If we get here this should be guaranteed to succeed. */
> +                rc = mem_sharing_share_pages(d, bulk->start, sh,
> +                                             cd, bulk->start, ch);
> +                ASSERT(!rc);
> +            }
> +        }
> +
> +        /* Check for continuation if it's not the last iteration. */
> +        if ( limit > ++bulk->start && hypercall_preempt_check() )

I'm surprised the compiler didn't complain to you about the lack of parentheses.

> +        {
> +            rc = 1;
> +            break;
> +        }
> +    }
> +
> +    /*
> +     * We only propagate -ENOMEM as individual pages may fail with -EINVAL,
> +     * and for bulk sharing we only care if -ENOMEM was encountered so we reset
> +     * rc here.
> +     */
> +    if ( rc < 0 && rc != -ENOMEM )
> +        rc = 0;
> +
> +    return rc;
> +}
> +
>  int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>  {
>      int rc;
> @@ -1468,6 +1516,79 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>          }
>          break;
>  
> +        case XENMEM_sharing_op_bulk_share:
> +        {
> +            unsigned long max_sgfn, max_cgfn;
> +            struct domain *cd;
> +
> +            rc = -EINVAL;
> +            if( mso.u.bulk._pad[0] || mso.u.bulk._pad[1] || mso.u.bulk._pad[2] )

The "if("..


* Re: [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
       [not found]         ` <CABfawhkTvytb7NvHJ-AHJBCxA3wJ5Fo-ciV66JpSY-xdK+3v5Q@mail.gmail.com>
@ 2016-06-14 16:42           ` Tamas K Lengyel
  0 siblings, 0 replies; 13+ messages in thread
From: Tamas K Lengyel @ 2016-06-14 16:42 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Xen-devel



On Jun 14, 2016 10:33, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:
>
> > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > index a522423..ba06fb0 100644
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -1294,6 +1294,54 @@ int relinquish_shared_pages(struct domain *d)
> >      return rc;
> >  }
> >
> > +static int bulk_share(struct domain *d, struct domain *cd, unsigned long limit,
> > +                      struct mem_sharing_op_bulk *bulk)
> > +{
> > +    int rc = 0;
> > +    shr_handle_t sh, ch;
> > +
> > +    while( limit > bulk->start )
>
> You are missing a space there.

Ack.

> > +    {
> > +        /*
> > +         * We only break out if we run out of memory as individual pages may
> > +         * legitimately be unsharable and we just want to skip over those.
> > +         */
> > +        rc = mem_sharing_nominate_page(d, bulk->start, 0, &sh);
> > +        if ( rc == -ENOMEM )
> > +            break;
> > +        if ( !rc )
> > +        {
> > +            rc = mem_sharing_nominate_page(cd, bulk->start, 0, &ch);
> > +            if ( rc == -ENOMEM )
> > +                break;
> > +            if ( !rc )
> > +            {
> > +                /* If we get here this should be guaranteed to succeed. */
> > +                rc = mem_sharing_share_pages(d, bulk->start, sh,
> > +                                             cd, bulk->start, ch);
> > +                ASSERT(!rc);
> > +            }
> > +        }
> > +
> > +        /* Check for continuation if it's not the last iteration. */
> > +        if ( limit > ++bulk->start && hypercall_preempt_check() )
>
> I'm surprised the compiler didn't complain to you about the lack of parentheses.

This seems to be the standard way to create a continuation, used in multiple
places throughout Xen. I don't personally like it much, but I guess it's
better to be consistent.
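The short-circuit behaviour of that idiom can be sketched outside Xen; all names below are toy stand-ins for the hypervisor's interfaces (e.g. `fake_preempt_check()` for `hypercall_preempt_check()`), not real Xen code:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the continuation idiom. Because && short-circuits, the
 * preempt check is only consulted when at least one more iteration
 * remains -- never after the last page. */
static int preempt_checks;           /* how often the check was consulted */

static int fake_preempt_check(void)  /* stand-in for hypercall_preempt_check() */
{
    preempt_checks++;
    return 0;                        /* never request a continuation here */
}

/* Returns the number of "pages" processed. */
static uint64_t process(uint64_t start, uint64_t limit)
{
    uint64_t done = 0;

    while ( limit > start )
    {
        done++;                      /* stand-in for the per-page sharing work */

        /* Check for continuation if it's not the last iteration. */
        if ( limit > ++start && fake_preempt_check() )
            break;                   /* would create a continuation here */
    }

    return done;
}
```

Processing five pages consults the check only four times, since the final increment makes the left-hand comparison false.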

>
> > +        {
> > +            rc = 1;
> > +            break;
> > +        }
> > +    }
> > +
> > +    /*
> > +     * We only propagate -ENOMEM as individual pages may fail with -EINVAL,
> > +     * and for bulk sharing we only care if -ENOMEM was encountered so we reset
> > +     * rc here.
> > +     */
> > +    if ( rc < 0 && rc != -ENOMEM )
> > +        rc = 0;
> > +
> > +    return rc;
> > +}
> > +
> >  int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
> >  {
> >      int rc;
> > @@ -1468,6 +1516,79 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
> >          }
> >          break;
> >
> > +        case XENMEM_sharing_op_bulk_share:
> > +        {
> > +            unsigned long max_sgfn, max_cgfn;
> > +            struct domain *cd;
> > +
> > +            rc = -EINVAL;
> > +            if( mso.u.bulk._pad[0] || mso.u.bulk._pad[1] || mso.u.bulk._pad[2] )
>
> The "if("..

Ack.

Thanks,
Tamas


* Re: [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-06-14 16:33 ` Konrad Rzeszutek Wilk
       [not found]   ` <CABfawhmqhAsEk=J5F3y0tAvd74xfi1_2-A+tfKzwCapcmL1zPg@mail.gmail.com>
@ 2016-06-15  8:14   ` Jan Beulich
  2016-06-15 14:02     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2016-06-15  8:14 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Tamas K Lengyel

>>> On 14.06.16 at 18:33, <konrad.wilk@oracle.com> wrote:
>> +        /* Check for continuation if it's not the last iteration. */
>> +        if ( limit > ++bulk->start && hypercall_preempt_check() )
> 
> I'm surprised the compiler didn't complain to you about the lack of parentheses.

I'm puzzled - what kind of warning would you expect here? The
compiler - afaik - treats relational operators vs logical AND/OR as
having well-known precedence wrt one another; what iirc it
would warn about is a lack of parentheses in an expression using
both of the logical operators, as people frequently don't know
that && has higher precedence than ||.
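Jan's parsing point can be verified with a trivial standalone sketch (illustrative only, not the Xen code itself): relational operators bind tighter than &&, so `limit > start && flag` needs no parentheses, and compilers such as GCC with -Wparentheses only warn when && and || are mixed unparenthesized.

```c
#include <assert.h>

/* Parses as (limit > start) && flag -- NOT limit > (start && flag).
 * The two readings differ: with limit=2, start=1, flag=0 the first
 * yields 0 while the second would yield 2 > 0 == 1. */
static int demo(int limit, int start, int flag)
{
    return limit > start && flag;
}
```

The asserts below would fail if the expression were parsed with && binding tighter than the comparison.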

Jan



* Re: [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-06-15  8:14   ` Jan Beulich
@ 2016-06-15 14:02     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-06-15 14:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Tamas K Lengyel

On Wed, Jun 15, 2016 at 02:14:15AM -0600, Jan Beulich wrote:
> >>> On 14.06.16 at 18:33, <konrad.wilk@oracle.com> wrote:
> >> +        /* Check for continuation if it's not the last iteration. */
> >> +        if ( limit > ++bulk->start && hypercall_preempt_check() )
> > 
> > I'm surprised the compiler didn't complain to you about the lack of parentheses.
> 
> I'm puzzled - what kind of warning would you expect here? The
> compiler -  afaik - treats relational operators vs logical AND/OR as
> having well known precedence wrt to one another; what iirc it
> would warn about is lack of parentheses in an expression using
> both of the logical operators, as people frequently don't know
> that && has higher precedence than ||.

Maybe those are the warnings I had seen in the past. I recall
that with a more modern compiler it would give me suggestions to
use parentheses. But I honestly can't recall the details.

Either way - the code is fine.
> 
> Jan
> 


* Re: [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-06-11 23:24 [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains Tamas K Lengyel
                   ` (3 preceding siblings ...)
  2016-06-14 16:33 ` Konrad Rzeszutek Wilk
@ 2016-06-22 15:38 ` George Dunlap
  2016-06-23 15:42   ` Tamas K Lengyel
  4 siblings, 1 reply; 13+ messages in thread
From: George Dunlap @ 2016-06-22 15:38 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel

On Sun, Jun 12, 2016 at 12:24 AM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
> Currently mem-sharing can be performed on a page-by-page basis from the control
> domain. However, when completely deduplicating (cloning) a VM, this requires
> at least 3 hypercalls per page. As the user has to loop through all pages up
> to max_gpfn, this process is very slow and wasteful.
>
> This patch introduces a new mem_sharing memop for bulk deduplication where
> the user doesn't have to separately nominate each page in both the source and
> destination domain, and the looping over all pages happens in the hypervisor.
> This significantly reduces the overhead of completely deduplicating entire
> domains.
>
> Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>
> ---
> Ian Jackson <ian.jackson@eu.citrix.com>
> George Dunlap <george.dunlap@eu.citrix.com>
> Jan Beulich <jbeulich@suse.com>
> Andrew Cooper <andrew.cooper3@citrix.com>

I'm sorry I'm a bit late to this party -- I'm not sure how I managed
to miss the earlier posts of this.  A couple of questions...


> @@ -1468,6 +1516,79 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>          }
>          break;
>
> +        case XENMEM_sharing_op_bulk_share:
> +        {
> +            unsigned long max_sgfn, max_cgfn;
> +            struct domain *cd;
> +
> +            rc = -EINVAL;
> +            if( mso.u.bulk._pad[0] || mso.u.bulk._pad[1] || mso.u.bulk._pad[2] )
> +                goto out;
> +
> +            if ( !mem_sharing_enabled(d) )
> +                goto out;
> +
> +            rc = rcu_lock_live_remote_domain_by_id(mso.u.bulk.client_domain,
> +                                                   &cd);
> +            if ( rc )
> +                goto out;
> +
> +            /*
> +             * We reuse XENMEM_sharing_op_share XSM check here as this is essentially
> +             * the same concept repeated over multiple pages.
> +             */
> +            rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd, XENMEM_sharing_op_share);
> +            if ( rc )
> +            {
> +                rcu_unlock_domain(cd);
> +                goto out;
> +            }
> +
> +            if ( !mem_sharing_enabled(cd) )
> +            {
> +                rcu_unlock_domain(cd);
> +                rc = -EINVAL;
> +                goto out;
> +            }
> +
> +            if ( !atomic_read(&d->pause_count) ||
> +                 !atomic_read(&cd->pause_count) )
> +            {
> +                rcu_unlock_domain(cd);
> +                rc = -EINVAL;
> +                goto out;
> +            }

I realize that Jan asked for this, but I'm really not sure what good
this is supposed to do, since the guest can be un-paused at any point
halfway through the transaction.

Wouldn't it make more sense to have this function pause and unpause
the domains itself?

> +
> +            max_sgfn = domain_get_maximum_gpfn(d);
> +            max_cgfn = domain_get_maximum_gpfn(cd);
> +
> +            if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
> +            {
> +                rcu_unlock_domain(cd);
> +                rc = -EINVAL;
> +                goto out;
> +            }
> +
> +            rc = bulk_share(d, cd, max_sgfn + 1, &mso.u.bulk);
> +            if ( rc > 0 )
> +            {
> +                if ( __copy_to_guest(arg, &mso, 1) )
> +                    rc = -EFAULT;
> +                else
> +                    rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
> +                                                       "lh", XENMEM_sharing_op,
> +                                                       arg);
> +            }
> +            else
> +            {
> +                mso.u.bulk.start = 0;
> +                mso.u.bulk.shared = atomic_read(&cd->shr_pages);
> +            }
> +
> +            rcu_unlock_domain(cd);
> +        }
> +        break;
> +
>          case XENMEM_sharing_op_debug_gfn:
>          {
>              unsigned long gfn = mso.u.debug.u.gfn;
> diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
> index 29ec571..084f06e 100644
> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -465,6 +465,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
>  #define XENMEM_sharing_op_debug_gref        5
>  #define XENMEM_sharing_op_add_physmap       6
>  #define XENMEM_sharing_op_audit             7
> +#define XENMEM_sharing_op_bulk_share        8
>
>  #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
>  #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
> @@ -500,7 +501,19 @@ struct xen_mem_sharing_op {
>              uint64_aligned_t client_gfn;    /* IN: the client gfn */
>              uint64_aligned_t client_handle; /* IN: handle to the client page */
>              domid_t  client_domain; /* IN: the client domain id */
> -        } share;
> +        } share;
> +        struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
> +            uint64_aligned_t start;          /* IN: start gfn. Set to 0 for
> +                                                full deduplication. Field is
> +                                                reset to 0 when hypercall
> +                                                completes */

It took me a while to figure out that this value isn't actually
intended to be used as described.  You're actually intended to always
set this to zero, and the hypervisor just uses it for scratch space.

To start with, it seems like having a "bulk share" option which could
do just a specific range would be useful as well as a "bulk share"
which automatically deduped the entire set of memory.

Secondly, structuring the information like the other memory operations
-- for example, "nr_extents" and "nr_done" -- would make it more
consistent, and would also allow you to avoid overwriting one of
the "in" values and having to restore it when you were done.

Other than that, it looks OK.

 -George


* Re: [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-06-22 15:38 ` George Dunlap
@ 2016-06-23 15:42   ` Tamas K Lengyel
  2016-07-05 14:35     ` George Dunlap
  0 siblings, 1 reply; 13+ messages in thread
From: Tamas K Lengyel @ 2016-06-23 15:42 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel

On Wed, Jun 22, 2016 at 9:38 AM, George Dunlap <dunlapg@umich.edu> wrote:
> On Sun, Jun 12, 2016 at 12:24 AM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
>> Currently mem-sharing can be performed on a page-by-page basis from the control
>> domain. However, when completely deduplicating (cloning) a VM, this requires
>> at least 3 hypercalls per page. As the user has to loop through all pages up
>> to max_gpfn, this process is very slow and wasteful.
>>
>> This patch introduces a new mem_sharing memop for bulk deduplication where
>> the user doesn't have to separately nominate each page in both the source and
>> destination domain, and the looping over all pages happens in the hypervisor.
>> This significantly reduces the overhead of completely deduplicating entire
>> domains.
>>
>> Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
>> Acked-by: Wei Liu <wei.liu2@citrix.com>
>> ---
>> Ian Jackson <ian.jackson@eu.citrix.com>
>> George Dunlap <george.dunlap@eu.citrix.com>
>> Jan Beulich <jbeulich@suse.com>
>> Andrew Cooper <andrew.cooper3@citrix.com>
>
> I'm sorry I'm a bit late to this party -- I'm not sure how I managed
> to miss the earlier posts of this.  A couple of questions...
>
>
>> @@ -1468,6 +1516,79 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>>          }
>>          break;
>>
>> +        case XENMEM_sharing_op_bulk_share:
>> +        {
>> +            unsigned long max_sgfn, max_cgfn;
>> +            struct domain *cd;
>> +
>> +            rc = -EINVAL;
>> +            if( mso.u.bulk._pad[0] || mso.u.bulk._pad[1] || mso.u.bulk._pad[2] )
>> +                goto out;
>> +
>> +            if ( !mem_sharing_enabled(d) )
>> +                goto out;
>> +
>> +            rc = rcu_lock_live_remote_domain_by_id(mso.u.bulk.client_domain,
>> +                                                   &cd);
>> +            if ( rc )
>> +                goto out;
>> +
>> +            /*
>> +             * We reuse XENMEM_sharing_op_share XSM check here as this is essentially
>> +             * the same concept repeated over multiple pages.
>> +             */
>> +            rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd, XENMEM_sharing_op_share);
>> +            if ( rc )
>> +            {
>> +                rcu_unlock_domain(cd);
>> +                goto out;
>> +            }
>> +
>> +            if ( !mem_sharing_enabled(cd) )
>> +            {
>> +                rcu_unlock_domain(cd);
>> +                rc = -EINVAL;
>> +                goto out;
>> +            }
>> +
>> +            if ( !atomic_read(&d->pause_count) ||
>> +                 !atomic_read(&cd->pause_count) )
>> +            {
>> +                rcu_unlock_domain(cd);
>> +                rc = -EINVAL;
>> +                goto out;
>> +            }
>
> I realize that Jan asked for this, but I'm really not sure what good
> this is supposed to do, since the guest can be un-paused at any point
> halfway through the transaction.
>
> Wouldn't it make more sense to have this function pause and unpause
> the domains itself?

The domains are paused by the user when this op is called, so this is
just a sanity check ensuring you are not issuing the op on some other
domain. If this function paused the domain as well, all it would do is
increase the pause count. This is the only intended use-case for this
function at this time. It would make no sense to issue this op on
domains that are running, as that pretty much guarantees that some of
their memory has already diverged, and thus bulk-sharing their memory
would have some unintended side-effects.

>
>> +
>> +            max_sgfn = domain_get_maximum_gpfn(d);
>> +            max_cgfn = domain_get_maximum_gpfn(cd);
>> +
>> +            if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
>> +            {
>> +                rcu_unlock_domain(cd);
>> +                rc = -EINVAL;
>> +                goto out;
>> +            }
>> +
>> +            rc = bulk_share(d, cd, max_sgfn + 1, &mso.u.bulk);
>> +            if ( rc > 0 )
>> +            {
>> +                if ( __copy_to_guest(arg, &mso, 1) )
>> +                    rc = -EFAULT;
>> +                else
>> +                    rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
>> +                                                       "lh", XENMEM_sharing_op,
>> +                                                       arg);
>> +            }
>> +            else
>> +            {
>> +                mso.u.bulk.start = 0;
>> +                mso.u.bulk.shared = atomic_read(&cd->shr_pages);
>> +            }
>> +
>> +            rcu_unlock_domain(cd);
>> +        }
>> +        break;
>> +
>>          case XENMEM_sharing_op_debug_gfn:
>>          {
>>              unsigned long gfn = mso.u.debug.u.gfn;
>> diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
>> index 29ec571..084f06e 100644
>> --- a/xen/include/public/memory.h
>> +++ b/xen/include/public/memory.h
>> @@ -465,6 +465,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
>>  #define XENMEM_sharing_op_debug_gref        5
>>  #define XENMEM_sharing_op_add_physmap       6
>>  #define XENMEM_sharing_op_audit             7
>> +#define XENMEM_sharing_op_bulk_share        8
>>
>>  #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
>>  #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
>> @@ -500,7 +501,19 @@ struct xen_mem_sharing_op {
>>              uint64_aligned_t client_gfn;    /* IN: the client gfn */
>>              uint64_aligned_t client_handle; /* IN: handle to the client page */
>>              domid_t  client_domain; /* IN: the client domain id */
>> -        } share;
>> +        } share;
>> +        struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
>> +            uint64_aligned_t start;          /* IN: start gfn. Set to 0 for
>> +                                                full deduplication. Field is
>> +                                                reset to 0 when hypercall
>> +                                                completes */
>
> It took me a while to figure out that this value isn't actually
> intended to be used as described.  You're actually intended to always
> set this to zero, and the hypervisor just uses it for scratch space.

In my original patch there was no such field, but it was requested
by other maintainers. So technically it is now possible to start
bulk-sharing at a pfn other than 0 and the op would work; it would
just result in less than full sharing. I certainly have no use-case
for it, but it is possible to use it that way, so I simply documented

>
> To start with, it seems like having a "bulk share" option which could
> do just a specific range would be useful as well as a "bulk share"
> which automatically deduped the entire set of memory.

I have no need for such options.

>
> Secondly, structuring the information like the other memory operations
> -- for example, "nr_extents" and "nr_done" -- would make it more
> consistent, and would also allow you to avoid overwriting one of
> the "in" values and having to restore it when you were done.

I don't see any harm in clearing this value to 0 when the op finishes
so I don't think it would really make much difference if we have
another field just for scratch-space use. I'm fine with adding that
but it just seems a bit odd to me.

Tamas


* Re: [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-06-23 15:42   ` Tamas K Lengyel
@ 2016-07-05 14:35     ` George Dunlap
  2016-07-05 14:58       ` Tamas K Lengyel
  2016-07-05 14:59       ` Jan Beulich
  0 siblings, 2 replies; 13+ messages in thread
From: George Dunlap @ 2016-07-05 14:35 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel

On Thu, Jun 23, 2016 at 4:42 PM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
>>> +            if ( !atomic_read(&d->pause_count) ||
>>> +                 !atomic_read(&cd->pause_count) )
>>> +            {
>>> +                rcu_unlock_domain(cd);
>>> +                rc = -EINVAL;
>>> +                goto out;
>>> +            }
>>
>> I realize that Jan asked for this, but I'm really not sure what good
>> this is supposed to do, since the guest can be un-paused at any point
>> halfway through the transaction.
>>
>> Wouldn't it make more sense to have this function pause and unpause
>> the domains itself?
>
> The domains are paused by the user when this op is called, so this is
> just a sanity check ensuring you are not issuing the op on some other
> domain. So if this function would pause the domain as well all it
> would do is increase the pause count. This is the only intended
> use-case for this function at this time. It would make no sense to try
> to issue this op on domains that are running, as that pretty much
> guarantees that some of their memory has already diverged, and thus
> bulk-sharing their memory would have some unintended side-effects.

Yes, I understand all that.  But what I'm saying (and mostly this is
actually to Jan that I'm saying it) is that this check, as written,
seems pointless to me.  We're effectively *not* trusting the toolstack
to pause the domain, but we *are* trusting the toolstack not to unpause
the domain before the operation is completed.  It seems to me we
should either trust the toolstack to pause the domain and leave it
paused -- letting it suffer the consequences if it fails to do so --
or we should pause and unpause the domain ourselves.

Adding an extra pause count is simple and harmless.  If we're going to
make an effort to think about buggy toolstacks, we might as well just
make it 100% safe.
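The argument can be modelled with a toy pause counter — this is illustrative standalone C, not Xen's actual domain_pause()/domain_unpause() implementation:

```c
#include <assert.h>

/* Toy model: pausing is reference-counted. If the op takes its own pause
 * reference, a buggy toolstack unpausing mid-operation cannot drop the
 * count to zero while the op is in flight. */
struct toy_domain { int pause_count; };

static void toy_pause(struct toy_domain *d)   { d->pause_count++; }
static void toy_unpause(struct toy_domain *d) { if (d->pause_count) d->pause_count--; }

/* Returns the minimum pause count observed while the "op" runs. */
static int toy_bulk_op(struct toy_domain *d, int toolstack_unpauses_midway)
{
    int min;

    toy_pause(d);              /* op takes its own reference */
    min = d->pause_count;

    if ( toolstack_unpauses_midway )
        toy_unpause(d);        /* buggy toolstack drops its reference */
    if ( d->pause_count < min )
        min = d->pause_count;

    toy_unpause(d);            /* op drops its reference on exit */
    return min;
}
```

Even when the toolstack misbehaves midway, the observed minimum stays above zero, so the guest never actually runs during the operation — which is the "100% safe" property a mere pause-count check at entry cannot provide.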

>> To start with, it seems like having a "bulk share" option which could
>> do just a specific range would be useful as well as a "bulk share"
>> which automatically deduped the entire set of memory.
>
> I have no need for such options.

Yes, but it's not unlikely that somebody else may need them at some
point in the future; and it's only a tiny bit of adjustment to make
them more generally useful than just your current specific use case,
so that we can avoid changing the interface, or adding yet another
hypercall in the future.

>> Secondly, structuring the information like the other memory operations
>> -- for example, "nr_extents" and "nr_done" -- would make it more
>> consistent, and would allow you to also to avoid overwriting one of
>> the "in" values and having to restore it when you were done.
>
> I don't see any harm in clearing this value to 0 when the op finishes
> so I don't think it would really make much difference if we have
> another field just for scratch-space use. I'm fine with adding that
> but it just seems a bit odd to me.

Well clearing the value to zero seems a bit odd to me. :-)  But more
to the point, it's valuable to have the interface 1) be flexible
enough to bulk-share a range but not the entire address space 2) be
similar enough to existing interfaces that nobody needs to figure out
how it works.

 -George


* Re: [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-07-05 14:35     ` George Dunlap
@ 2016-07-05 14:58       ` Tamas K Lengyel
  2016-07-05 14:59       ` Jan Beulich
  1 sibling, 0 replies; 13+ messages in thread
From: Tamas K Lengyel @ 2016-07-05 14:58 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel

On Tue, Jul 5, 2016 at 8:35 AM, George Dunlap <george.dunlap@citrix.com> wrote:
> On Thu, Jun 23, 2016 at 4:42 PM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
>>>> +            if ( !atomic_read(&d->pause_count) ||
>>>> +                 !atomic_read(&cd->pause_count) )
>>>> +            {
>>>> +                rcu_unlock_domain(cd);
>>>> +                rc = -EINVAL;
>>>> +                goto out;
>>>> +            }
>>>
>>> I realize that Jan asked for this, but I'm really not sure what good
>>> this is supposed to do, since the guest can be un-paused at any point
>>> halfway through the transaction.
>>>
>>> Wouldn't it make more sense to have this function pause and unpause
>>> the domains itself?
>>
>> The domains are paused by the user when this op is called, so this is
>> just a sanity check ensuring you are not issuing the op on some other
>> domain. So if this function would pause the domain as well all it
>> would do is increase the pause count. This is the only intended
>> use-case for this function at this time. It would make no sense to try
>> to issue this op on domains that are running, as that pretty much
>> guarantees that some of their memory has already diverged, and thus
>> bulk-sharing their memory would have some unintended side-effects.
>
> Yes, I understand all that.  But what I'm saying (and mostly this is
> actually to Jan that I'm saying it) is that this check, as written,
> seems pointless to me.  We're effectively *not* trusting the toolstack
> to pause the domain, but we *are* trusting the toolstack not to unpause
> the domain before the operation is completed.  It seems to me we
> should either trust the toolstack to pause the domain and leave it
> paused -- letting it suffer the consequences if it fails to do so --
> or we should pause and unpause the domain ourselves.
>
> Adding an extra pause count is simple and harmless.  If we're going to
> make an effort to think about buggy toolstacks, we might as well just
> make it 100% safe.

Well, there are many more ways the toolstack could cause issues for
this op, so I would say let's just trust the toolstack to behave as
intended. I'll remove the pause check and point out in a comment that
the intended use-case assumes both domains are paused and remain
paused for the duration of the op.

>
>>> To start with, it seems like having a "bulk share" option which could
>>> do just a specific range would be useful as well as a "bulk share"
>>> which automatically deduped the entire set of memory.
>>
>> I have no need for such options.
>
> Yes, but it's not unlikely that somebody else may need them at some
> point in the future; and it's only a tiny bit of adjustment to make
> them more generally useful than just your current specific use case,
> so that we can avoid changing the interface, or adding yet another
> hypercall in the future.
>
>>> Secondly, structuring the information like the other memory operations
>>> -- for example, "nr_extents" and "nr_done" -- would make it more
>>> consistent, and would also allow you to avoid overwriting one of
>>> the "in" values and having to restore it when you were done.
>>
>> I don't see any harm in clearing this value to 0 when the op finishes
>> so I don't think it would really make much difference if we have
>> another field just for scratch-space use. I'm fine with adding that
>> but it just seems a bit odd to me.
>
> Well clearing the value to zero seems a bit odd to me. :-)  But more
> to the point, it's valuable to have the interface 1) be flexible
> enough to bulk-share a range but not the entire address space 2) be
> similar enough to existing interfaces that nobody needs to figure out
> how it works.

I really doubt there will be any users for this feature but it's
simple enough of an adjustment that I'm fine with the change.

Thanks,
Tamas


* Re: [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-07-05 14:35     ` George Dunlap
  2016-07-05 14:58       ` Tamas K Lengyel
@ 2016-07-05 14:59       ` Jan Beulich
  1 sibling, 0 replies; 13+ messages in thread
From: Jan Beulich @ 2016-07-05 14:59 UTC (permalink / raw)
  To: George Dunlap, Tamas K Lengyel; +Cc: xen-devel

>>> On 05.07.16 at 16:35, <george.dunlap@citrix.com> wrote:
> On Thu, Jun 23, 2016 at 4:42 PM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
>>>> +            if ( !atomic_read(&d->pause_count) ||
>>>> +                 !atomic_read(&cd->pause_count) )
>>>> +            {
>>>> +                rcu_unlock_domain(cd);
>>>> +                rc = -EINVAL;
>>>> +                goto out;
>>>> +            }
>>>
>>> I realize that Jan asked for this, but I'm really not sure what good
>>> this is supposed to do, since the guest can be un-paused at any point
>>> halfway through the transaction.
>>>
>>> Wouldn't it make more sense to have this function pause and unpause
>>> the domains itself?
>>
>> The domains are paused by the user when this op is called, so this is
>> just a sanity check ensuring you are not issuing the op on some other
>> domain. So if this function would pause the domain as well all it
>> would do is increase the pause count. This is the only intended
>> use-case for this function at this time. It would make no sense to try
>> to issue this op on domains that are running, as that pretty much
>> guarantees that some of their memory has already diverged, and thus
>> bulk-sharing their memory would have some unintended side-effects.
> 
> Yes, I understand all that.  But what I'm saying (and mostly this is
> actually to Jan that I'm saying it) is that this check, as written,
> seems pointless to me.  We're effectively *not* trusting the toolstack
> to pause the domain, but we *are* trusting the toolstack not to unpause
> the domain before the operation is completed.  It seems to me we
> should either trust the toolstack to pause the domain and leave it
> paused -- letting it suffer the consequences if it fails to do so --
> or we should pause and unpause the domain ourselves.
> 
> Adding an extra pause count is simple and harmless.  If we're going to
> make an effort to think about buggy toolstacks, we might as well just
> make it 100% safe.
> 
>>> To start with, it seems like having a "bulk share" option which could
>>> do just a specific range would be useful as well as a "bulk share"
>>> which automatically deduped the entire set of memory.
>>
>> I have no need for such options.
> 
> Yes, but it's not unlikely that somebody else may need them at some
> point in the future; and it's only a tiny bit of adjustment to make
> them more generally useful than just your current specific use case,
> so that we can avoid changing the interface, or adding yet another
> hypercall in the future.

Well, I don't view it as a guarantee, but simply a sanity check. Iirc
I did ask for it simply because similar checks exist elsewhere.

Jan



end of thread, other threads:[~2016-07-05 15:00 UTC | newest]

Thread overview: 13+ messages
2016-06-11 23:24 [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains Tamas K Lengyel
2016-06-11 23:24 ` [PATCH v5 2/2] tests/mem-sharing: Add bulk option to memshrtool Tamas K Lengyel
2016-06-12  0:08 ` [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains Tamas K Lengyel
2016-06-14  9:56 ` Jan Beulich
2016-06-14 16:33 ` Konrad Rzeszutek Wilk
     [not found]   ` <CABfawhmqhAsEk=J5F3y0tAvd74xfi1_2-A+tfKzwCapcmL1zPg@mail.gmail.com>
     [not found]     ` <CABfawh=Hw-mZi+d751pRhA-C-wLOJ3TQPkW6rLAYp+GGGbxx1Q@mail.gmail.com>
     [not found]       ` <CABfawhnZVFt2Hwrfb+DNH3ft3gbB3Bc4+h7EpJfXbf=1_P-LWQ@mail.gmail.com>
     [not found]         ` <CABfawhkTvytb7NvHJ-AHJBCxA3wJ5Fo-ciV66JpSY-xdK+3v5Q@mail.gmail.com>
2016-06-14 16:42           ` Tamas K Lengyel
2016-06-15  8:14   ` Jan Beulich
2016-06-15 14:02     ` Konrad Rzeszutek Wilk
2016-06-22 15:38 ` George Dunlap
2016-06-23 15:42   ` Tamas K Lengyel
2016-07-05 14:35     ` George Dunlap
2016-07-05 14:58       ` Tamas K Lengyel
2016-07-05 14:59       ` Jan Beulich
